Firebox x700 CPU pinned at 100% by system.
-
I have a Firebox x700 running today's (June 6th) live CD build, and I have a problem with CPU usage. (This has actually been going on for quite some time; I just figured enough was enough and it was time to report it.) When I SSH in and run top, I get this:
last pid: 30401; load averages: 2.58, 2.16, 1.79 up 0+04:11:13 13:15:45
49 processes: 2 running, 47 sleeping
CPU: 0.0% user, 0.0% nice, 100% system, 0.0% interrupt, 0.0% idle
Mem: 25M Active, 10M Inact, 24M Wired, 68K Cache, 17M Buf, 175M Free
Swap: 1024M Total, 1024M Free

PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
17919 root 1 76 0 40356K 14068K piperd 0:13 0.00% php
20775 root 1 44 0 3316K 1336K select 0:04 0.00% apinger
25424 root 1 44 0 6556K 4372K kqread 0:03 0.00% lighttpd
39978 root 1 76 20 3656K 1448K wait 0:02 0.00% sh
15534 root 1 44 0 3448K 1448K select 0:01 0.00% syslogd
16326 root 1 44 0 3316K 912K piperd 0:01 0.00% logger
16179 root 1 44 0 5912K 2404K bpf 0:01 0.00% tcpdump
17511 root 1 44 0 7992K 3540K select 0:01 0.00% sshd
57980 root 12 64 20 4812K 1392K nanslp 0:00 0.00% check_reload_st
46803 root 1 44 0 3352K 1316K select 0:00 0.00% miniupnpd
13537 _dhcp 1 44 0 3316K 1420K select 0:00 0.00% dhclient
37911 _ntp 1 44 0 3316K 1332K select 0:00 0.00% ntpd
47460 root 1 48 0 3404K 1364K nanslp 0:00 0.00% cron
25774 root 1 50 0 40356K 8528K wait 0:00 0.00% php
31187 root 1 44 0 4696K 2380K pause 0:00 0.00% tcsh
48728 root 1 44 0 3316K 1356K select 0:00 0.00% ntpd
55597 root 1 64 0 3316K 1024K nanslp 0:00 0.00% minicron
21044 root 1 44 0 4480K 1616K piperd 0:00 0.00% rrdtool
890 root 1 57 0 3684K 1576K wait 0:00 0.00% login
23665 root 1 56 0 3656K 1500K wait 0:00 0.00% sh
17138 root 1 44 0 3436K 1416K select 0:00 0.00% inetd
1200 root 1 44 0 3316K 960K piperd 0:00 0.00% sshlockout_pf
1509 root 1 76 0 3656K 1436K wait 0:00 0.00% sh
3772 root 1 76 0 3656K 1436K ttyin 0:00 0.00% sh
30401 root 1 45 0 3712K 1808K RUN 0:00 0.00% top
29985 root 1 76 0 3524K 1280K pipewr 0:00 0.00% grep
29577 root 1 76 0 3656K 1480K wait 0:00 0.00% sh
30085 root 1 96 0 3524K 1276K RUN 0:00 0.00% grep
30179 root 1 50 0 3316K 972K piperd 0:00 0.00% tail
7698 root 1 44 0 5272K 3040K select 0:00 0.00% sshd
131 root 1 44 0 1888K 532K select 0:00 0.00% devd
28975 nobody 1 76 0 4528K 2408K select 0:00 0.00% dnsmasq
29669 root 1 76 0 2072K 868K pipdwt 0:00 0.00% clog
55826 root 1 51 0 3316K 1024K nanslp 0:00 0.00% minicron
22676 root 1 76 20 1564K 580K nanslp 0:00 0.00% sleep
7888 root 1 76 0 3316K 1284K select 0:00 0.00% dhclient
56083 root 1 57 0 3316K 1024K nanslp 0:00 0.00% minicron
29534 root 1 76 0 3316K 1288K select 0:00 0.00% dhcrelay

I am using a 40GB Toshiba laptop hard drive instead of a CF card or microdrive, so I don't know if that is the problem. I also upgraded the RAM from 256MB to 512MB.
-
Could you give the output of "top -o cpu"?
-
$ top -o cpu
last pid: 30823; load averages: 2.26, 2.45, 2.47 up 0+07:01:57 16:06:29
44 processes: 1 running, 43 sleeping
Mem: 24M Active, 11M Inact, 25M Wired, 68K Cache, 18M Buf, 174M Free
Swap: 1024M Total, 1024M Free

PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
12705 root 1 54 0 40356K 13328K piperd 0:00 0.98% php
20775 root 1 44 0 3316K 1336K select 0:07 0.00% apinger
25424 root 1 44 0 7580K 4688K kqread 0:05 0.00% lighttpd
39978 root 1 76 20 3656K 1448K wait 0:04 0.00% sh
17511 root 1 44 0 7992K 3540K select 0:02 0.00% sshd
15534 root 1 44 0 3448K 1448K select 0:02 0.00% syslogd
16326 root 1 44 0 3316K 912K piperd 0:01 0.00% logger
16179 root 1 44 0 5912K 2488K bpf 0:01 0.00% tcpdump
57980 root 12 64 20 4812K 1392K nanslp 0:01 0.00% check_reload_status
46803 root 1 44 0 3352K 1316K select 0:01 0.00% miniupnpd
13537 _dhcp 1 44 0 3316K 1420K select 0:00 0.00% dhclient
37911 _ntp 1 44 0 3316K 1332K select 0:00 0.00% ntpd
47460 root 1 44 0 3404K 1364K nanslp 0:00 0.00% cron
25774 root 1 50 0 40356K 8528K wait 0:00 0.00% php
31187 root 1 44 0 4696K 2384K ttyin 0:00 0.00% tcsh
48728 root 1 44 0 3316K 1356K select 0:00 0.00% ntpd
55597 root 1 64 0 3316K 1024K nanslp 0:00 0.00% minicron
21044 root 1 44 0 4480K 1616K piperd 0:00 0.00% rrdtool
-
I'm not quite sure, but another user had similar problems with apinger; see this link:
http://forum.pfsense.org/index.php/topic,26360.msg137213.html#msg137213
N.B.: your memory upgrade seems not to be recognized, since top is showing 256MB instead of 512MB.
Try what was suggested in that post (a reboot or a new snapshot).
-
That's not showing what's responsible. Try: top -S
It's not apinger; that wouldn't show up under system.
If you're using polling, it's probably just normal behavior from idle_poll.
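If you want to confirm before rebooting, you can check the polling state from the shell. A rough sketch (the interface name sk0 is just an example; use whatever NICs the x700 actually has, and note the kern.polling sysctls only exist if the kernel was built with DEVICE_POLLING):

```shell
# List the polling-related sysctls, if the kernel supports polling at all
sysctl kern.polling 2>/dev/null

# idle_poll=1 means devices are polled from the idle loop,
# which shows up in top as ~100% "system" time
sysctl kern.polling.idle_poll

# POLLING appears in the interface flags when polling is enabled on it
# (sk0 is an assumed example interface name)
ifconfig sk0 | grep -i polling
```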
-
$ top -S
last pid: 853; load averages: 2.59, 2.47, 2.47 up 0+23:18:29 08:23:01
83 processes: 5 running, 68 sleeping, 10 waiting
Mem: 24M Active, 11M Inact, 26M Wired, 232K Cache, 18M Buf, 173M Free
Swap: 1024M Total, 1024M Free

PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
16 root 1 76 ki-6 0K 8K RUN 21.3H 100.00% idlepoll
11 root 12 -60 - 0K 96K WAIT 6:26 0.00% intr
0 root 7 8 0 0K 48K - 0:43 0.00% kernel
20775 root 1 44 0 3316K 1336K select 0:24 0.00% apinger
13 root 1 -16 - 0K 8K - 0:22 0.00% yarrow
25424 root 1 44 0 7580K 4748K kqread 0:20 0.00% lighttpd
10 root 1 171 ki31 0K 8K RUN 0:17 0.00% idle
39978 root 1 76 20 3656K 1448K wait 0:13 0.00% sh
17511 root 1 44 0 7992K 3540K select 0:07 0.00% sshd
15534 root 1 44 0 3448K 1448K select 0:07 0.00% syslogd
19 root 1 44 - 0K 8K syncer 0:05 0.00% syncer
35110 root 1 73 0 40356K 13324K piperd 0:04 0.00% php
16326 root 1 44 0 3316K 912K piperd 0:04 0.00% logger
16179 root 1 44 0 5912K 2496K bpf 0:03 0.00% tcpdump
2 root 1 -8 - 0K 8K - 0:03 0.00% g_event
57980 root 12 64 20 4812K 1392K nanslp 0:03 0.00% check_reload_status
3 root 1 -8 - 0K 8K - 0:03 0.00% g_up
4 root 1 -8 - 0K 8K - 0:02 0.00% g_down

It is in fact from idlepoll. Is there any real advantage to having device polling turned on?
-
Is there any real advantage to having device polling turned on?
From an earlier reply:
I've done extensive testing over the last couple weeks that consistently shows significant performance drops (25-30+%) in firewalling scenarios with polling enabled, regardless of hz setting.
-
I've done extensive testing over the last couple weeks that consistently shows significant performance drops (25-30+%) in firewalling scenarios with polling enabled, regardless of hz setting.
That's actually the point - limiting interrupts so the management interfaces don't become unresponsive at very high load (where "very high" depends on the specs of your hardware). It will reduce your maximum achievable throughput to some extent.
In the vast majority of cases you shouldn't have enough throughput to max out your hardware, and shouldn't use polling.
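If you want to turn it off from the shell rather than the GUI, something like the following should work (a sketch; the interface name sk0 is an assumed example, and on pfSense 2.0 the supported way is unchecking the device polling option under System > Advanced so the setting persists):

```shell
# Disable polling per interface; repeat for each NIC
# (sk0 is an assumed example interface name)
ifconfig sk0 -polling

# Stop the idle loop from polling devices, which is what
# pins the CPU at 100% "system" in top
sysctl kern.polling.idle_poll=0
```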
-
Polling has been disabled, and my firebox is now much happier. Thank you guys for the help!
Other than that one issue, I have to say I have been having a much better experience with 2.0 than I did with 1.2.3 on my firebox. I assume a lot of that is FreeBSD 8.1, but even still: Bravo!