Intel Nic (em) High Cpu Usage



  • Hi Guys been having this problem for days, I use 4 Intel Nic with 2 wan connection,
    Cpu usage is always high in one my Intel Nic's (em2). running 2.0.3 Amd64

    Following things have been tried:
    changed the nic = no (used dlink & broadcom = still cpu usage is high)
    Fresh install = no
    using only connection = no

    I've attached a screenshot of "system activity" on fresh install with only internet connectivity.



  • Netgate Administrator

    What are you using em2 for?

    That display doesn't show the type of CPU usage. Please post the output of 'top -SHd3' from the command line if you can.
    If you remove the cable from em2 does this effect the cpu load?

    Steve



  • Thanks for the reply steve,

    I used it for a public ip. Yeah if i disable interface or remove the cable from the nic. cpu usage goes low about 5-10%.

    here's the output:
    [2.0.3-RELEASE]/root(1): top -SHd3
    last pid:  6608;  load averages:  0.58,  0.80,  0.80    up 0+00:55:20  18:12:29
    123 processes: 3 running, 101 sleeping, 19 waiting
    CPU:  0.0% user,  0.0% nice, 32.5% system,  0.0% interrupt, 67.5% idle
    Mem: 108M Active, 48M Inact, 446M Wired, 652K Cache, 414M Buf, 3295M Free
    Swap: 10G Total, 10G Free

    PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
      11 root     171 ki31     0K      32K CPU0        0  44:02 99.76% {idle: cpu0}
       0 root     -68    0      0K      192K -            1  34:32 66.06% {em2 taskq}
      11 root     171 ki31     0K      32K RUN          1  19:12 33.89% {idle: cpu1}
       0 root     -68    0      0K       192K -            0   0:10  0.29% {em3 taskq}
     243 root      76   20     7004K  1308K kqread   0   9:05  0.00% check_reload_stat
       0 root      76    0       0K       192K sched      0   0:53  0.00% {swapper}
      12 root     -32    -      0K       304K WAIT       0   0:09  0.00% {swi4: clock}
       0 root     -68    0      0K        192K -            0   0:07  0.00% {em0 que}
      14 root      44    -      0K        16K -              0   0:06  0.00% yarrow
    41917 root      62    0    113M    32404K accept  1   0:04  0.00% php
       0 root     -68    0      0K       192K -              0   0:04  0.00% {em1 que}
       4 root      -8    -       0K       16K -               0   0:03  0.00% g_down
      12 root     -68    -     0K       304K WAIT         0   0:02  0.00% {irq19: em3 uhci1
       3 root      -8    -      0K        16K -               0   0:02  0.00% g_up
    25057 root      53    0   113M    34012K accept   0   0:01  0.00% php
      12 root     -64    -     0K       304K WAIT         0   0:01  0.00% {irq15: ata1}


  • Netgate Administrator

    Hmm, not interrupt load then. So it is having to deal with some traffic, are you seeing a lot of firewall hits in the logs?
    One thing that can cause an apparent high cpu load is if the cpu has gone into a low power/speed state in which case there is no problem.
    What do you see from 'sysctl dev.cpu'?

    Steve



  • you're right damn right, yes i see lot of firewall hits on the public ip.
    It's not a single public ip it lies on the sub-net /29.

    This is what i get, btw i'm using core 2 duo 2.2 ghz procy and gigabyte  g41 Mobo.

    [2.0.3-RELEASE]/root(1): sysctl dev.cpu
    dev.cpu.0.%desc: ACPI CPU
    dev.cpu.0.%driver: cpu
    dev.cpu.0.%location: handle=_PR_.CPU0
    dev.cpu.0.%pnpinfo: _HID=none UID=0
    dev.cpu.0.%parent: acpi0
    dev.cpu.0.freq: 2199
    dev.cpu.0.freq_levels: 2199/-1 1924/-1 1649/-1 1374/-1 1099/-1 824/-1 549/-1 274/-1
    dev.cpu.0.cx_supported: C1/0
    dev.cpu.0.cx_lowest: C1
    dev.cpu.0.cx_usage: 100.00% last 500us
    dev.cpu.1.%desc: ACPI CPU
    dev.cpu.1.%driver: cpu
    dev.cpu.1.%location: handle=_PR
    .CPU1
    dev.cpu.1.%pnpinfo: _HID=none _UID=0
    dev.cpu.1.%parent: acpi0
    dev.cpu.1.cx_supported: C1/0
    dev.cpu.1.cx_lowest: C1
    dev.cpu.1.cx_usage: 100.00% last 500us


  • Netgate Administrator

    @srk3461:

    dev.cpu.0.freq: 2199
    dev.cpu.0.freq_levels: 2199/-1 1924/-1 1649/-1 1374/-1 1099/-1 824/-1 549/-1 274/-1
    dev.cpu.0.cx_supported: C1/0
    dev.cpu.0.cx_lowest: C1
    dev.cpu.0.cx_usage: 100.00% last 500us

    Ok, so this shows it is supporting some type of power saving modes, could be speedstep or ACPI throttling, but isn't using it. It's still running at 2.2GHz. Also that it only supports C1 as it's lowest C state. States C2+3 provide much more power savings.
    I assume you haven't enabled powerd in pfSense?

    Presumably then it is just having to deal with a massive amount of unwanted packets on WAN? Weird  :-\ What type of connection is it? When you say 'a lot of firewall hits' how many, say, per minute are you talking about?

    Steve



  • Nope steve, I haven't enabled powerD. It's a 2Mbps leased line(optic). About 100 hits in a min.

    And one more thing steve, if i enable, after 5 min to 10 min my wan goes offline alternatively (mine is 2 wan connection).


  • Netgate Administrator

    Well 100 hits a minute isn't going to strain the CPU much. Perhaps there is something odd happening like malformed packets other traffic that causes the NIC to work but doesn't actaully end up as IP hits at the firewall. Pure speculation, I've never seen it. What does Status: Interfaces: show for that interface?

    The WAN going down after 5 mins could be a clue. What do the logs show when that happens?

    Can you prove the leased line connection with some other hardware?

    Running out of obvious causes here, or ones that I've seen in my limited experience. Anyone else?

    Steve



  • Thanks a lot steve! Here's the status: I've removed the cable!

    PUBTATA interface (em2)

    Status no carrier
    MAC address 00:07:e9:11:5c:b8
    IP address 11x.9x.15x.16x  
    Subnet mask 255.255.255.248
    Media autoselect
    In/out packets 10685420566/220737 (378.27 GB/188.34 MB)
    In/out packets (pass) 95431501/220737 (3.48 GB/188.34 MB)
    In/out packets (block) 10589989065/0 (374.79 GB/0 bytes)
    In/out errors 104092/0
    Collisions 0

    yeah i've tried two d-link nic's and same thing happens, also tried on pfsense 2.0.3, 2.1 = i386.
    The thing i'm confused is the wan interface for the same line works fine, the problem starts when I plug in the Public ip!

    Here's the log steve:

    Jun 17 19:54:38 dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such file or directory.
    Jun 17 19:54:38 check_reload_status: Updating all dyndns
    Jun 17 19:54:38 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:54:38 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:54:38 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:54:38 dnsmasq[59098]: started, version 2.65 cachesize 10000
    Jun 17 19:54:38 dnsmasq[59098]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN DHCP DHCPv6 no-Lua TFTP

    no-conntrack
    Jun 17 19:54:38 php: : SQUID is installed but not started. Not installing "nat" rules.
    Jun 17 19:54:38 dnsmasq[59098]: reading /etc/resolv.conf
    Jun 17 19:54:38 dnsmasq[59098]: using nameserver 121.242.190.210#53
    Jun 17 19:54:38 dnsmasq[59098]: using nameserver 121.242.190.181#53
    Jun 17 19:54:38 dnsmasq[59098]: using nameserver 202.138.103.100#53
    Jun 17 19:54:38 dnsmasq[59098]: using nameserver 202.138.96.2#53
    Jun 17 19:54:38 php: : SQUID is installed but not started. Not installing "pfearly" rules.
    Jun 17 19:54:38 dnsmasq[59098]: read /etc/hosts - 56 addresses
    Jun 17 19:54:38 dnsmasq[59098]: read /etc/hosts - 56 addresses
    Jun 17 19:54:38 php: : SQUID is installed but not started. Not installing "filter" rules.
    Jun 17 19:54:38 dnsmasq[59098]: read /etc/hosts - 56 addresses
    Jun 17 19:54:40 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:54:40 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:54:40 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:54:40 php: : SQUID is installed but not started. Not installing "nat" rules.
    Jun 17 19:54:40 php: : SQUID is installed but not started. Not installing "pfearly" rules.
    Jun 17 19:54:40 php: : SQUID is installed but not started. Not installing "filter" rules.
    Jun 17 19:54:40 check_reload_status: Restarting ipsec tunnels
    Jun 17 19:54:43 php: : Creating rrd update script
    Jun 17 19:54:43 php: : Restarting/Starting all packages.
    Jun 17 19:54:44 php: : Starting Squid
    Jun 17 19:54:44 squid[39102]: Squid Parent: child process 39569 started
    Jun 17 19:54:44 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
    Jun 17 19:54:45 php: : Reloading Squid for configuration sync
    Jun 17 19:54:45 php: : Reloading Squid for configuration sync
    Jun 17 19:54:45 php: : Not calling package sync code for dependency squid of squid because some include files

    are missing.
    Jun 17 19:54:45 php: : Reloading Squid for configuration sync
    Jun 17 19:54:47 php: : Reloading Squid for configuration sync
    Jun 17 19:54:47 php: : Reloading Squid for configuration sync
    Jun 17 19:54:47 check_reload_status: Reloading filter
    Jun 17 19:54:47 php: : Reloading Squid for configuration sync
    Jun 17 19:54:52 squid[39102]: Squid Parent: child process 39569 exited with status 0
    Jun 17 19:54:53 php: : The command '/usr/local/etc/rc.d/squid.sh stop' returned exit code '1', the output was ''
    Jun 17 19:54:54 apinger: ALARM: RELGW(220.227.74.82) *** down ***
    Jun 17 19:54:55 squid[27060]: Squid Parent: child process 27619 started
    Jun 17 19:54:56 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
    Jun 17 19:55:04 check_reload_status: Reloading filter
    Jun 17 19:55:05 php: : Reloading Squid for configuration sync
    Jun 17 19:55:06 check_reload_status: Syncing firewall
    Jun 17 19:55:06 squid[27619]: The url_rewriter helpers are crashing too rapidly, need help!
    Jun 17 19:55:06 squid[27060]: Squid Parent: child process 27619 exited due to signal 6
    Jun 17 19:55:06 kernel: pid 27619 (squid), uid 62: exited on signal 6
    Jun 17 19:55:06 apinger: ALARM: RELGW(220.227.74.82) *** down ***
    Jun 17 19:55:07 apinger: Error while feeding rrdtool: Broken pipe
    Jun 17 19:55:07 apinger: /usr/local/bin/rrdtool respawning too fast, waiting 300s.
    Jun 17 19:55:09 php: : IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
    Jun 17 19:55:09 squid[27060]: Squid Parent: child process 464 started
    Jun 17 19:55:10 login: login on ttyv0 as root
    Jun 17 19:55:10 login: login on ttyv1 as root
    Jun 17 19:55:10 sshlockout[2614]: sshlockout/webConfigurator v3.0 starting up
    Jun 17 19:55:14 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
    Jun 17 19:55:16 check_reload_status: Reloading filter

    apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
    Jun 17 19:55:16 check_reload_status: Reloading filter
    Jun 17 19:55:24 apinger: ALARM: RELGW(220.227.74.82) *** down ***
    Jun 17 19:55:34 check_reload_status: Reloading filter
    Jun 17 19:55:37 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:55:37 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:55:37 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:55:43 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
    Jun 17 19:55:53 apinger: ALARM: RELGW(220.227.74.82) *** down ***
    Jun 17 19:55:53 check_reload_status: Reloading filter
    Jun 17 19:56:03 check_reload_status: Reloading filter
    Jun 17 19:56:06 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:56:06 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:56:06 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:56:09 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
    Jun 17 19:56:13 check_reload_status: Syncing firewall
    Jun 17 19:56:13 dnsmasq[59098]: exiting on receipt of SIGTERM
    Jun 17 19:56:13 dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such

    process.
    Jun 17 19:56:14 dnsmasq[56617]: started, version 2.65 cachesize 10000
    Jun 17 19:56:14 dnsmasq[56617]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN DHCP DHCPv6 no-Lua TFTP

    no-conntrack
    Jun 17 19:56:14 dnsmasq[56617]: reading /etc/resolv.conf
    Jun 17 19:56:14 dnsmasq[56617]: using nameserver 121.242.190.210#53
    Jun 17 19:56:14 dnsmasq[56617]: using nameserver 121.242.190.181#53
    Jun 17 19:56:14 dnsmasq[56617]: using nameserver 202.138.103.100#53
    Jun 17 19:56:14 dnsmasq[56617]: using nameserver 202.138.96.2#53
    Jun 17 19:56:14 dnsmasq[56617]: read /etc/hosts - 56 addresses
    Jun 17 19:56:14 dnsmasq[56617]: read /etc/hosts - 56 addresses
    Jun 17 19:56:14 dnsmasq[56617]: read /etc/hosts - 56 addresses
    Jun 17 19:56:15 dhcpd: Internet Systems Consortium DHCP Server 4.2.4-P2
    Jun 17 19:56:15 dhcpd: Copyright 2004-2012 Internet Systems Consortium.
    Jun 17 19:56:15 dhcpd: All rights reserved.
    Jun 17 19:56:15 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
    Jun 17 19:56:15 dnsmasq[56617]: read /etc/hosts - 56 addresses
    Jun 17 19:56:15 dnsmasq[56617]: read /etc/hosts - 56 addresses
    Jun 17 19:56:18 dnsmasq[56617]: read /etc/hosts - 56 addresses
    Jun 17 19:56:19 apinger: ALARM: RELGW(220.227.74.82) *** down ***
    Jun 17 19:56:19 check_reload_status: Reloading filter
    Jun 17 19:56:23 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
    Jun 17 19:56:27 dnsmasq[56617]: read /etc/hosts - 56 addresses
    Jun 17 19:56:29 check_reload_status: Reloading filter
    Jun 17 19:56:35 apinger: ALARM: RELGW(220.227.74.82) *** down ***
    Jun 17 19:56:45 check_reload_status: Reloading filter
    Jun 17 19:56:47 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:56:47 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:56:47 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:56:56 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
    Jun 17 19:57:06 apinger: ALARM: RELGW(220.227.74.82) *** down ***
    Jun 17 19:57:06 check_reload_status: Reloading filter
    Jun 17 19:57:09 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:57:09 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:57:09 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:57:16 check_reload_status: Reloading filter
    Jun 17 19:57:19 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:57:19 php: : MONITOR: RELGW is down, removing from routing group
    Jun 17 19:57:19 php: : MONITOR: RELGW is down, removing from routing group


  • Netgate Administrator

    @srk3461:

    In/out errors 104092/0

    Well that doesn't look good.

    Also:

    @srk3461:

    In/out packets (block)    10589989065/0 (374.79 GB/0 bytes)

    379GB of blocked packets! Over what period is that?

    The fact that you've seen the same thing across multiple NICs does point to something upstream rather than a hardware/driver problem.

    @srk3461:

    the wan interface for the same line works fine, the problem starts when I plug in the Public ip!

    I'm confused by this statement. How are your interfaces connected? em2 is not a WAN interface? Are your two WANs different?

    Steve

    Edit: That log output would be much easier to read in a code box.  ;)



  • What Intel NIC's are you using? What is is plugging into?

    Duplex mismatch or bad patch cord?



  • Steve I use all intel nic's one is dual port and other two are single, yes two wan connections are different.
    379 GB over 5-6hrs. I noticed it the moment u told me to look at the status-> interface! It was right there staring at me but i wasn't looking!

    @patrick;  I use intel PT server nic and pro/1000. Duplex is half duplex but mostly i'll leave it to pfsense to decide.

    Guys i found out the issue, i.e one of my public ip which was used for ticketing(otrs) and as a apache server was compromised, (dunno how), But I did take pfsense off for a few days for a maintenance and this happnd. It was uploading at 80 Mbits/sec. which was causing all hell probs. I suspect some kinda malware or something.

    When i removed it out of my lan, everything is back to normal.

    Thanks a lot Steve & patrick!


  • Netgate Administrator

    Good catch.  :)
    I'm still not sure I understand how your network is configured then. I would have expected the compromised server to be behind pfSense and hence the outbound traffic to show up. I take it that wasn't the case.
    Whatever was on that server it has shifted a lot of data over a few days at that speed.

    Steve



  • Yeah my network is configured in a weird way i.e my way (it aint easy!)
    Yep you're abs right, it's not behind pfsense, but kinda like "dmz" configured by someone before (now have to configure dmz "properly")

    we're still trying to figure what the hell it sent out!  Thanks to you i was able to pin it!
    Thank you steve and Good day!  :)


Log in to reply