Intel Nic (em) High Cpu Usage

srk3461

Hi Guys been having this problem for days, I use 4 Intel Nic with 2 wan connection,
Cpu usage is always high in one my Intel Nic's (em2). running 2.0.3 Amd64

Following things have been tried:
changed the nic = no (used dlink & broadcom = still cpu usage is high)
Fresh install = no
using only connection = no

I've attached a screenshot of "system activity" on fresh install with only internet connectivity.

sc.jpg_thumb

stephenw10

What are you using em2 for?

That display doesn't show the type of CPU usage. Please post the output of 'top -SHd3' from the command line if you can.
If you remove the cable from em2 does this effect the cpu load?

Steve

srk3461

Thanks for the reply steve,

I used it for a public ip. Yeah if i disable interface or remove the cable from the nic. cpu usage goes low about 5-10%.

here's the output:
[2.0.3-RELEASE]/root(1): top -SHd3
last pid: 6608; load averages: 0.58, 0.80, 0.80 up 0+00:55:20 18:12:29
123 processes: 3 running, 101 sleeping, 19 waiting
CPU: 0.0% user, 0.0% nice, 32.5% system, 0.0% interrupt, 67.5% idle
Mem: 108M Active, 48M Inact, 446M Wired, 652K Cache, 414M Buf, 3295M Free
Swap: 10G Total, 10G Free

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 171 ki31 0K 32K CPU0 0 44:02 99.76% {idle: cpu0}
0 root -68 0 0K 192K - 1 34:32 66.06% {em2 taskq}
11 root 171 ki31 0K 32K RUN 1 19:12 33.89% {idle: cpu1}
0 root -68 0 0K 192K - 0 0:10 0.29% {em3 taskq}
243 root 76 20 7004K 1308K kqread 0 9:05 0.00% check_reload_stat
0 root 76 0 0K 192K sched 0 0:53 0.00% {swapper}
12 root -32 - 0K 304K WAIT 0 0:09 0.00% {swi4: clock}
0 root -68 0 0K 192K - 0 0:07 0.00% {em0 que}
14 root 44 - 0K 16K - 0 0:06 0.00% yarrow
41917 root 62 0 113M 32404K accept 1 0:04 0.00% php
0 root -68 0 0K 192K - 0 0:04 0.00% {em1 que}
4 root -8 - 0K 16K - 0 0:03 0.00% g_down
12 root -68 - 0K 304K WAIT 0 0:02 0.00% {irq19: em3 uhci1
3 root -8 - 0K 16K - 0 0:02 0.00% g_up
25057 root 53 0 113M 34012K accept 0 0:01 0.00% php
12 root -64 - 0K 304K WAIT 0 0:01 0.00% {irq15: ata1}

stephenw10

Hmm, not interrupt load then. So it is having to deal with some traffic, are you seeing a lot of firewall hits in the logs?
One thing that can cause an apparent high cpu load is if the cpu has gone into a low power/speed state in which case there is no problem.
What do you see from 'sysctl dev.cpu'?

Steve

srk3461

you're right damn right, yes i see lot of firewall hits on the public ip.
It's not a single public ip it lies on the sub-net /29.

This is what i get, btw i'm using core 2 duo 2.2 ghz procy and gigabyte g41 Mobo.

[2.0.3-RELEASE]/root(1): sysctl dev.cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=_PR_.CPU0
dev.cpu.0.%pnpinfo: _HID=none UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.freq: 2199
dev.cpu.0.freq_levels: 2199/-1 1924/-1 1649/-1 1374/-1 1099/-1 824/-1 549/-1 274/-1
dev.cpu.0.cx_supported: C1/0
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% last 500us
dev.cpu.1.%desc: ACPI CPU
dev.cpu.1.%driver: cpu
dev.cpu.1.%location: handle=_PR.CPU1
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%parent: acpi0
dev.cpu.1.cx_supported: C1/0
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_usage: 100.00% last 500us

stephenw10

@srk3461:

dev.cpu.0.freq: 2199
dev.cpu.0.freq_levels: 2199/-1 1924/-1 1649/-1 1374/-1 1099/-1 824/-1 549/-1 274/-1
dev.cpu.0.cx_supported: C1/0
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% last 500us

Ok, so this shows it is supporting some type of power saving modes, could be speedstep or ACPI throttling, but isn't using it. It's still running at 2.2GHz. Also that it only supports C1 as it's lowest C state. States C2+3 provide much more power savings.
I assume you haven't enabled powerd in pfSense?

Presumably then it is just having to deal with a massive amount of unwanted packets on WAN? Weird :-\ What type of connection is it? When you say 'a lot of firewall hits' how many, say, per minute are you talking about?

Steve

srk3461

Nope steve, I haven't enabled powerD. It's a 2Mbps leased line(optic). About 100 hits in a min.

And one more thing steve, if i enable, after 5 min to 10 min my wan goes offline alternatively (mine is 2 wan connection).

stephenw10

Well 100 hits a minute isn't going to strain the CPU much. Perhaps there is something odd happening like malformed packets other traffic that causes the NIC to work but doesn't actaully end up as IP hits at the firewall. Pure speculation, I've never seen it. What does Status: Interfaces: show for that interface?

The WAN going down after 5 mins could be a clue. What do the logs show when that happens?

Can you prove the leased line connection with some other hardware?

Running out of obvious causes here, or ones that I've seen in my limited experience. Anyone else?

Steve

srk3461

Thanks a lot steve! Here's the status: I've removed the cable!

PUBTATA interface (em2)

Status no carrier
MAC address 00:07:e9:11:5c:b8
IP address 11x.9x.15x.16x
Subnet mask 255.255.255.248
Media autoselect
In/out packets 10685420566/220737 (378.27 GB/188.34 MB)
In/out packets (pass) 95431501/220737 (3.48 GB/188.34 MB)
In/out packets (block) 10589989065/0 (374.79 GB/0 bytes)
In/out errors 104092/0
Collisions 0

yeah i've tried two d-link nic's and same thing happens, also tried on pfsense 2.0.3, 2.1 = i386.
The thing i'm confused is the wan interface for the same line works fine, the problem starts when I plug in the Public ip!

Here's the log steve:

Jun 17 19:54:38 dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such file or directory.
Jun 17 19:54:38 check_reload_status: Updating all dyndns
Jun 17 19:54:38 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:54:38 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:54:38 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:54:38 dnsmasq[59098]: started, version 2.65 cachesize 10000
Jun 17 19:54:38 dnsmasq[59098]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN DHCP DHCPv6 no-Lua TFTP

no-conntrack
Jun 17 19:54:38 php: : SQUID is installed but not started. Not installing "nat" rules.
Jun 17 19:54:38 dnsmasq[59098]: reading /etc/resolv.conf
Jun 17 19:54:38 dnsmasq[59098]: using nameserver 121.242.190.210#53
Jun 17 19:54:38 dnsmasq[59098]: using nameserver 121.242.190.181#53
Jun 17 19:54:38 dnsmasq[59098]: using nameserver 202.138.103.100#53
Jun 17 19:54:38 dnsmasq[59098]: using nameserver 202.138.96.2#53
Jun 17 19:54:38 php: : SQUID is installed but not started. Not installing "pfearly" rules.
Jun 17 19:54:38 dnsmasq[59098]: read /etc/hosts - 56 addresses
Jun 17 19:54:38 dnsmasq[59098]: read /etc/hosts - 56 addresses
Jun 17 19:54:38 php: : SQUID is installed but not started. Not installing "filter" rules.
Jun 17 19:54:38 dnsmasq[59098]: read /etc/hosts - 56 addresses
Jun 17 19:54:40 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:54:40 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:54:40 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:54:40 php: : SQUID is installed but not started. Not installing "nat" rules.
Jun 17 19:54:40 php: : SQUID is installed but not started. Not installing "pfearly" rules.
Jun 17 19:54:40 php: : SQUID is installed but not started. Not installing "filter" rules.
Jun 17 19:54:40 check_reload_status: Restarting ipsec tunnels
Jun 17 19:54:43 php: : Creating rrd update script
Jun 17 19:54:43 php: : Restarting/Starting all packages.
Jun 17 19:54:44 php: : Starting Squid
Jun 17 19:54:44 squid[39102]: Squid Parent: child process 39569 started
Jun 17 19:54:44 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
Jun 17 19:54:45 php: : Reloading Squid for configuration sync
Jun 17 19:54:45 php: : Reloading Squid for configuration sync
Jun 17 19:54:45 php: : Not calling package sync code for dependency squid of squid because some include files

are missing.
Jun 17 19:54:45 php: : Reloading Squid for configuration sync
Jun 17 19:54:47 php: : Reloading Squid for configuration sync
Jun 17 19:54:47 php: : Reloading Squid for configuration sync
Jun 17 19:54:47 check_reload_status: Reloading filter
Jun 17 19:54:47 php: : Reloading Squid for configuration sync
Jun 17 19:54:52 squid[39102]: Squid Parent: child process 39569 exited with status 0
Jun 17 19:54:53 php: : The command '/usr/local/etc/rc.d/squid.sh stop' returned exit code '1', the output was ''
Jun 17 19:54:54 apinger: ALARM: RELGW(220.227.74.82) *** down ***
Jun 17 19:54:55 squid[27060]: Squid Parent: child process 27619 started
Jun 17 19:54:56 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
Jun 17 19:55:04 check_reload_status: Reloading filter
Jun 17 19:55:05 php: : Reloading Squid for configuration sync
Jun 17 19:55:06 check_reload_status: Syncing firewall
Jun 17 19:55:06 squid[27619]: The url_rewriter helpers are crashing too rapidly, need help!
Jun 17 19:55:06 squid[27060]: Squid Parent: child process 27619 exited due to signal 6
Jun 17 19:55:06 kernel: pid 27619 (squid), uid 62: exited on signal 6
Jun 17 19:55:06 apinger: ALARM: RELGW(220.227.74.82) *** down ***
Jun 17 19:55:07 apinger: Error while feeding rrdtool: Broken pipe
Jun 17 19:55:07 apinger: /usr/local/bin/rrdtool respawning too fast, waiting 300s.
Jun 17 19:55:09 php: : IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
Jun 17 19:55:09 squid[27060]: Squid Parent: child process 464 started
Jun 17 19:55:10 login: login on ttyv0 as root
Jun 17 19:55:10 login: login on ttyv1 as root
Jun 17 19:55:10 sshlockout[2614]: sshlockout/webConfigurator v3.0 starting up
Jun 17 19:55:14 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
Jun 17 19:55:16 check_reload_status: Reloading filter

apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
Jun 17 19:55:16 check_reload_status: Reloading filter
Jun 17 19:55:24 apinger: ALARM: RELGW(220.227.74.82) *** down ***
Jun 17 19:55:34 check_reload_status: Reloading filter
Jun 17 19:55:37 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:55:37 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:55:37 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:55:43 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
Jun 17 19:55:53 apinger: ALARM: RELGW(220.227.74.82) *** down ***
Jun 17 19:55:53 check_reload_status: Reloading filter
Jun 17 19:56:03 check_reload_status: Reloading filter
Jun 17 19:56:06 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:56:06 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:56:06 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:56:09 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
Jun 17 19:56:13 check_reload_status: Syncing firewall
Jun 17 19:56:13 dnsmasq[59098]: exiting on receipt of SIGTERM
Jun 17 19:56:13 dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such

process.
Jun 17 19:56:14 dnsmasq[56617]: started, version 2.65 cachesize 10000
Jun 17 19:56:14 dnsmasq[56617]: compile time options: IPv6 GNU-getopt no-DBus i18n IDN DHCP DHCPv6 no-Lua TFTP

no-conntrack
Jun 17 19:56:14 dnsmasq[56617]: reading /etc/resolv.conf
Jun 17 19:56:14 dnsmasq[56617]: using nameserver 121.242.190.210#53
Jun 17 19:56:14 dnsmasq[56617]: using nameserver 121.242.190.181#53
Jun 17 19:56:14 dnsmasq[56617]: using nameserver 202.138.103.100#53
Jun 17 19:56:14 dnsmasq[56617]: using nameserver 202.138.96.2#53
Jun 17 19:56:14 dnsmasq[56617]: read /etc/hosts - 56 addresses
Jun 17 19:56:14 dnsmasq[56617]: read /etc/hosts - 56 addresses
Jun 17 19:56:14 dnsmasq[56617]: read /etc/hosts - 56 addresses
Jun 17 19:56:15 dhcpd: Internet Systems Consortium DHCP Server 4.2.4-P2
Jun 17 19:56:15 dhcpd: Copyright 2004-2012 Internet Systems Consortium.
Jun 17 19:56:15 dhcpd: All rights reserved.
Jun 17 19:56:15 dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Jun 17 19:56:15 dnsmasq[56617]: read /etc/hosts - 56 addresses
Jun 17 19:56:15 dnsmasq[56617]: read /etc/hosts - 56 addresses
Jun 17 19:56:18 dnsmasq[56617]: read /etc/hosts - 56 addresses
Jun 17 19:56:19 apinger: ALARM: RELGW(220.227.74.82) *** down ***
Jun 17 19:56:19 check_reload_status: Reloading filter
Jun 17 19:56:23 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
Jun 17 19:56:27 dnsmasq[56617]: read /etc/hosts - 56 addresses
Jun 17 19:56:29 check_reload_status: Reloading filter
Jun 17 19:56:35 apinger: ALARM: RELGW(220.227.74.82) *** down ***
Jun 17 19:56:45 check_reload_status: Reloading filter
Jun 17 19:56:47 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:56:47 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:56:47 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:56:56 apinger: alarm canceled: RELGW(220.227.74.82) *** down ***
Jun 17 19:57:06 apinger: ALARM: RELGW(220.227.74.82) *** down ***
Jun 17 19:57:06 check_reload_status: Reloading filter
Jun 17 19:57:09 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:57:09 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:57:09 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:57:16 check_reload_status: Reloading filter
Jun 17 19:57:19 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:57:19 php: : MONITOR: RELGW is down, removing from routing group
Jun 17 19:57:19 php: : MONITOR: RELGW is down, removing from routing group

stephenw10

@srk3461:

In/out errors 104092/0

Well that doesn't look good.

Also:

@srk3461:

In/out packets (block) 10589989065/0 (374.79 GB/0 bytes)

379GB of blocked packets! Over what period is that?

The fact that you've seen the same thing across multiple NICs does point to something upstream rather than a hardware/driver problem.

@srk3461:

the wan interface for the same line works fine, the problem starts when I plug in the Public ip!

I'm confused by this statement. How are your interfaces connected? em2 is not a WAN interface? Are your two WANs different?

Steve

Edit: That log output would be much easier to read in a code box. ;)

ZGamer

What Intel NIC's are you using? What is is plugging into?

Duplex mismatch or bad patch cord?

srk3461

Steve I use all intel nic's one is dual port and other two are single, yes two wan connections are different.
379 GB over 5-6hrs. I noticed it the moment u told me to look at the status-> interface! It was right there staring at me but i wasn't looking!

@patrick; I use intel PT server nic and pro/1000. Duplex is half duplex but mostly i'll leave it to pfsense to decide.

Guys i found out the issue, i.e one of my public ip which was used for ticketing(otrs) and as a apache server was compromised, (dunno how), But I did take pfsense off for a few days for a maintenance and this happnd. It was uploading at 80 Mbits/sec. which was causing all hell probs. I suspect some kinda malware or something.

When i removed it out of my lan, everything is back to normal.

Thanks a lot Steve & patrick!

stephenw10

Good catch. :)
I'm still not sure I understand how your network is configured then. I would have expected the compromised server to be behind pfSense and hence the outbound traffic to show up. I take it that wasn't the case.
Whatever was on that server it has shifted a lot of data over a few days at that speed.

Steve

srk3461

Yeah my network is configured in a weird way i.e my way (it aint easy!)
Yep you're abs right, it's not behind pfsense, but kinda like "dmz" configured by someone before (now have to configure dmz "properly")

we're still trying to figure what the hell it sent out! Thanks to you i was able to pin it!
Thank you steve and Good day! :)