503 - Service Not Available and web interface really slow after upgrade
-
CPU busy?
Errors on LAN interface?
-
CPU busy?
ooops CPU usage constantly near 100% !!!
How can I identify what's eating all the cpu?
thanks and sorry but I'm a newbie
Max
-
pfSense shell command # top -S -H should give some clues.
Perhaps you have inadvertently enabled polling - see System -> Advanced, click on the Networking tab.
-
pfSense shell command # top -S -H should give some clues.
Perhaps you have inadvertently enabled polling - see System -> Advanced, click on the Networking tab.
Device polling disabled. This is a screenshot of top -S -H
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 171 ki31 0K 32K CPU3 3 188:44 100.00% {idle: cpu3}
11 root 171 ki31 0K 32K CPU1 1 188:40 100.00% {idle: cpu1}
11 root 171 ki31 0K 32K CPU2 2 188:37 100.00% {idle: cpu2}
11 root 171 ki31 0K 32K RUN 0 188:14 100.00% {idle: cpu0}
0 root 76 0 0K 48K sched 0 1:31 0.00% {swapper}
12 root -68 - 0K 160K WAIT 3 0:43 0.00% {irq256: bce0}
12 root -68 - 0K 160K WAIT 1 0:29 0.00% {irq258: bce2}
12 root -32 - 0K 160K WAIT 2 0:28 0.00% {swi4: clock}
15465 root 44 0 3448K 1448K select 1 0:12 0.00% syslogd
37140 root 44 0 53888K 24024K accept 0 0:08 0.00% php
37404 root 67 0 53888K 22256K accept 1 0:04 0.00% php
14 root -16 - 0K 8K - 2 0:04 0.00% yarrow
16118 root 44 0 3316K 924K piperd 0 0:03 0.00% logger
12 root -68 - 0K 160K WAIT 2 0:02 0.00% {irq259: bce3}
16028 root 44 0 5912K 2852K bpf 2 0:01 0.00% tcpdump
37009 root 58 0 53888K 20684K accept 0 0:01 0.00% php
7835 root 76 20 3656K 1512K wait 3 0:01 0.00% sh
36705 root 44 0 54912K 19280K accept 0 0:01 0.00% php
35755 root 44 0 6588K 3824K kqread 2 0:01 0.00% lighttpd
8 root 44 - 0K 8K pftm 2 0:00 0.00% pfpurge
3 root -8 - 0K 8K - 2 0:00 0.00% g_up
58156 root 44 0 3316K 1344K select 0 0:00 0.00% apinger
2 root -8 - 0K 8K - 2 0:00 0.00% g_event
22 root 44 - 0K 8K syncer 2 0:00 0.00% syncer
12 root -64 - 0K 160K WAIT 1 0:00 0.00% {irq20: atapci0}
12 root -64 - 0K 160K WAIT 2 0:00 0.00% {irq22: ehci0 ehc}
15 root -64 - 0K 64K - 0 0:00 0.00% {usbus1}
4 root -8 - 0K 8K - 3 0:00 0.00% g_down
12 root -32 - 0K 160K WAIT 0 0:00 0.00% {swi4: clock}
15 root -64 - 0K 64K - 2 0:00 0.00% {usbus1}
15 root -64 - 0K 64K - 3 0:00 0.00% {usbus0}
39094 root 44 0 5116K 3404K select 2 0:00 0.00% openvpn
15 root -64 - 0K 64K - 2 0:00 0.00% {usbus0}
60860 nobody 44 0 5556K 2624K select 0 0:00 0.00% dnsmasq
12 root -32 - 0K 160K WAIT 1 0:00 0.00% {swi4: clock}
15311 root 44 0 7992K 3532K select 2 0:00 0.00% sshd
40 root -8 - 0K 8K mdwait 0 0:00 0.00% md0
36190 root 76 0 52864K 10600K wait 2 0:00 0.00% php
-
Here is the rrd_graph
[update]
I also tried twice a new fresh install but the machine hangs while configuring the interfaces. I have a spare firewall pc (same hardware) and tomorrow I'll make some tests on it.
-
I think the RRD graph is showing a sudden jump in the number of processes (not CPU utilisation) from about 40 to about 100 at around 12:30.
The top output doesn't suggest your system is CPU bound - with four CPUs apparently idle there should be plenty of spare CPU.
What about errors on the interfaces? What is reported by pfSense shell command # netstat -i
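For anyone who wants to check the netstat -i counters quickly, here is a rough sketch in Python. It assumes the FreeBSD-style column header (Name, Mtu, Network, Address, Ipkts, Ierrs, Idrop, Opkts, Oerrs, Coll); the sample text, MAC addresses and counter values are made up for illustration, and the function name is hypothetical.

```python
def interfaces_with_errors(netstat_output):
    """Return {interface: (ierrs, oerrs)} for link-level rows with nonzero error counters."""
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    header = lines[0].split()
    # Index the counters from the right, because the Address column can be empty.
    ierrs_from_end = len(header) - header.index("Ierrs")
    oerrs_from_end = len(header) - header.index("Oerrs")
    flagged = {}
    for line in lines[1:]:
        # Only the <Link#N> rows carry per-interface error counters.
        if "<Link#" not in line:
            continue
        fields = line.split()
        try:
            ierrs = int(fields[-ierrs_from_end])
            oerrs = int(fields[-oerrs_from_end])
        except (ValueError, IndexError):
            continue  # skip rows that do not match the assumed layout
        if ierrs or oerrs:
            flagged[fields[0]] = (ierrs, oerrs)
    return flagged


# Made-up sample output, NOT taken from the machines in this thread.
SAMPLE = """\
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
bce0   1500 <Link#1>      00:11:22:33:44:55    81234   152     0    70321     0     0
bce0   1500 192.168.1.0   192.168.1.1           1200     -     -     1300     -     -
bce1   1500 <Link#2>      00:11:22:33:44:56    50000     0     0    48000     3     0
lo0   16384 <Link#7>                             500     0     0      500     0     0
"""

print(interfaces_with_errors(SAMPLE))
```

Any interface that shows up in the result has nonzero Ierrs or Oerrs and is worth a closer look (cable, switch port, duplex settings).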
-
Is this issue resolved? I'm having the exact same problem with my Dell R310. Tested with i386 and amd64 builds.
-
Is this issue resolved?
Apparently not in that the information flow dried up leaving a couple of unanswered questions.
I'm having the exact same problem with my Dell R310.
Then please provide the information requested in earlier replies. And when providing the top -S -H output please don't cut off the top few lines which give the number of processes and memory and swap use summary.
-
Anybody using Dell hardware with a multiport broadcom based NIC should at least try the broadcom settings here:
http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards
This seems to cause a range of symptoms.
Steve
-
It seems I have narrowed my problem to the bcm network cards. Whenever I plug a network cable into one of these, the system becomes unresponsive.
I can't even get a terminal session, so I can't run the top command. I'll take stephenw10's advice and give the tweaking a shot. I'll post back with the results.
-
I disabled the ports connected on my switch connected to the broadcom network cards and my system became responsive again. I then logged in to the box with ssh, started top -S -H and enabled the ports again. Voila the box became totally unresponsive again. Top command also seems to be locked up. Here is the output from the top command:
last pid: 17723; load averages: 0.00, 0.00, 0.00 up 0+19:29:52 10:45:26
143 processes: 9 running, 94 sleeping, 40 waiting
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 40M Active, 13M Inact, 115M Wired, 84K Cache, 24M Buf, 2820M Free
Swap: 8192M Total, 8192M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 171 ki31 0K 64K CPU7 7 19.4H 100.00% {idle: cpu7}
11 root 171 ki31 0K 64K CPU6 6 19.4H 100.00% {idle: cpu6}
11 root 171 ki31 0K 64K CPU5 5 19.4H 100.00% {idle: cpu5}
11 root 171 ki31 0K 64K CPU4 4 19.4H 100.00% {idle: cpu4}
11 root 171 ki31 0K 64K CPU3 3 19.4H 100.00% {idle: cpu3}
11 root 171 ki31 0K 64K CPU1 1 19.4H 100.00% {idle: cpu1}
11 root 171 ki31 0K 64K RUN 2 19.4H 100.00% {idle: cpu2}
11 root 171 ki31 0K 64K CPU0 0 19.4H 99.27% {idle: cpu0}
12 root -32 - 0K 320K WAIT 1 2:40 0.00% {swi4: clock}
0 root 76 0 0K 176K sched 0 1:18 0.00% {swapper}
18965 root 44 0 3316K 1340K select 2 0:18 0.00% apinger
12 root -44 - 0K 320K WAIT 3 0:05 0.00% {swi1: netisr 0}
3 root -8 - 0K 8K - 0 0:02 0.00% g_up
12 root -64 - 0K 320K WAIT 7 0:02 0.00% {irq20: atapci0}
12 root -68 - 0K 320K WAIT 0 0:02 0.00% {irq256: igb0:que}
12 root -64 - 0K 320K WAIT 7 0:02 0.00% {irq22: ehci0 ehc}
14 root -16 - 0K 8K - 2 0:02 0.00% yarrow
28786 root 44 0 53596K 20084K lockf 5 0:01 0.00% php
12 root -32 - 0K 320K WAIT 1 0:01 0.00% {swi4: clock}
2 root -8 - 0K 8K - 2 0:01 0.00% g_event
15 root -64 - 0K 64K - 5 0:01 0.00% {usbus0}
12 root -68 - 0K 320K WAIT 7 0:01 0.00% {irq274: bce0}
9 root -8 - 0K 8K m:w1 5 0:01 0.00% g_mirror pfSenseMir
4 root -8 - 0K 8K - 0 0:01 0.00% g_down
15 root -64 - 0K 64K - 2 0:01 0.00% {usbus1}
12 root -32 - 0K 320K WAIT 4 0:01 0.00% {swi4: clock}
15 root -64 - 0K 64K - 3 0:01 0.00% {usbus0}
23 root 44 - 0K 8K syncer 2 0:01 0.00% syncer
15 root -64 - 0K 64K - 6 0:01 0.00% {usbus1}
12 root -32 - 0K 320K WAIT 2 0:00 0.00% {swi4: clock}
12 root -68 - 0K 320K WAIT 5 0:00 0.00% {irq261: igb0:que}
12474 root 44 0 5912K 2352K bpf 0 0:00 0.00% tcpdump
24959 root 57 0 54620K 20024K keglim 2 0:00 0.00% php
12 root -24 - 0K 320K WAIT 0 0:00 0.00% {swi6: task queue}
12 root -32 - 0K 320K WAIT 2 0:00 0.00% {swi4: clock}
12 root -32 - 0K 320K WAIT 1 0:00 0.00% {swi4: clock}
12 root -68 - 0K 320K WAIT 1 0:00 0.00% {irq257: igb0:que}
12 root -68 - 0K 320K WAIT 3 0:00 0.00% {irq259: igb0:que}
25118 root 44 0 53596K 16780K accept 0 0:00 0.00% php
8 root 44 - 0K 8K pftm 2 0:00 0.00% pfpurge
24 root 59 - 0K 8K sdflus 2 0:00 0.00% softdepflush
22 root 59 - 0K 8K vlruwt 2 0:00 0.00% vnlru
12 root -68 - 0K 320K WAIT 7 0:00 0.00% {irq263: igb0:que}
21 root 59 - 0K 8K psleep 2 0:00 0.00% bufdaemon
13518 root 45 0 53596K 19708K keglim 5 0:00 0.00% php
12 root -32 - 0K 320K WAIT 3 0:00 0.00% {swi4: clock}
12091 root 44 0 4944K 2492K select 7 0:00 0.00% syslogd
12 root -32 - 0K 320K WAIT 7 0:00 0.00% {swi4: clock}
42301 root 44 0 3712K 2028K CPU2 2 0:00 0.00% top
25939 root 64 20 6588K 4336K keglim 6 0:00 0.00% lighttpd
42511 _ntp 44 0 3316K 1344K select 2 0:00 0.00% ntpd
12 root -68 - 0K 320K WAIT 6 0:00 0.00% {irq262: igb0:que}
20 root 76 ki-6 0K 8K pollid 2 0:00 0.00% idlepoll
58160 root 44 0 3404K 1372K nanslp 2 0:00 0.00% cron
40 root -8 - 0K 8K mdwait 2 0:00 0.00% md0
1389 root 64 20 7992K 3544K select 7 0:00 0.00% sshd
6888 root 68 0 3316K 1040K nanslp 0 0:00 0.00% minicron
17 root 59 - 0K 8K psleep 0 0:00 0.00% pagedaemon
-
After taking this textdump I had to reboot since the system became unresponsive. Currently the system is not booting properly. It hangs after Starting Cron … Done.
No debug messages. I haven't added the tuning parameters to loader.conf.local yet, so I know that's not the problem. I'm one inch from using the two R310s as paperweights and getting new hardware ;D
-
This looks a lot like the broadcom problem mentioned in the guide. Something in the driver/hardware combination causes it to use way more mbufs than other cards. If you get it to reboot what is your mbuf usage on the dashboard?
Alternatively: No problems on my boxes with that driver, but if you are on the 64-bit build, do a netstat -m and check the following line:
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
If they are NOT zero then increase the value of kern.ipc.nmbclusters.
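As a rough illustration of that check, the sketch below pulls the three denied counters out of netstat -m output and reports whether any of them is nonzero. The function names are hypothetical, and the second sample line (with a nonzero counter) is invented for the example; only the "0/0/0" line comes from the post above.

```python
import re


def denied_mbuf_requests(netstat_m_output):
    """Return (mbufs, clusters, mbuf+clusters) denied counts, or None if the line is missing."""
    match = re.search(r"(\d+)/(\d+)/(\d+) requests for mbufs denied", netstat_m_output)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def needs_more_clusters(netstat_m_output):
    """True when any denied counter is nonzero, i.e. kern.ipc.nmbclusters looks too low."""
    denied = denied_mbuf_requests(netstat_m_output)
    return denied is not None and any(count > 0 for count in denied)


healthy = "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
# Invented line showing what an exhausted cluster pool might look like.
starved = "0/1432/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"

print(needs_more_clusters(healthy))
print(needs_more_clusters(starved))
```

The first call reports no problem and the second one flags the box as a candidate for a larger kern.ipc.nmbclusters value.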
Are you running 64bit?
Steve
-
At first I went for 64-bit, but after a while of head-bashing I swapped it for 32-bit.
I didn't manage to get the 32-bit install booting again, so now I'm back on 64-bit.
I'll do some checking today and post all the debug information I can get hold of.
-
Here is a little update.
I have just reinstalled pfSense with 64-bit. I'm currently only using the two Intel cards.
System seems responsive so far but the MBUF is: 17670/25600.
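To put that figure in perspective, here is a small worked example. It assumes the dashboard value reads as "mbufs in use / configured maximum"; the function name is hypothetical.

```python
def mbuf_usage_percent(dashboard_value):
    """Convert a 'used/limit' dashboard string into a usage percentage."""
    used, limit = (int(part) for part in dashboard_value.split("/"))
    return 100.0 * used / limit


# Figure reported above: 17670 of 25600.
print(f"{mbuf_usage_percent('17670/25600'):.1f}%")  # prints 69.0%
```

So roughly two thirds of the limit is already in use with only the Intel cards active, which leaves limited headroom if the Broadcom ports need many more mbufs.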
-
Added the following to loader.conf.local:
kern.ipc.nmbclusters="131072"
hw.bce.tso_enable=0
hw.pci.enable_msix=0
This seems to have fixed the issues. I'll let it run for a week and do some stress testing, but so far it seems stable.
A big thanks to wallabybob and stephenw10 for helping me out!
-
Hello,
Are these settings available on pfSense 2.0.1 amd64?
Thanks
-
Hi,
Yes, you have to create the loader.conf.local in /boot/
Add these three lines:
kern.ipc.nmbclusters="131072"
hw.bce.tso_enable=0
hw.pci.enable_msix=0
And reboot.
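If the file already exists, it may be safer to merge the settings than to append them blindly. The sketch below only builds the new file text; it does not touch /boot, and the helper name merge_tunables is hypothetical. The three values are the ones quoted in this thread.

```python
TUNABLES = {
    "kern.ipc.nmbclusters": '"131072"',
    "hw.bce.tso_enable": "0",
    "hw.pci.enable_msix": "0",
}


def merge_tunables(existing_text, tunables=TUNABLES):
    """Return loader.conf.local text with each tunable set exactly once."""
    kept = []
    for line in existing_text.splitlines():
        key = line.split("=", 1)[0].strip()
        # Drop old definitions of the keys we are about to set; keep everything else.
        if key not in tunables:
            kept.append(line)
    kept.extend(f"{key}={value}" for key, value in tunables.items())
    return "\n".join(kept) + "\n"


# Starting from an empty file yields exactly the three lines from the post.
print(merge_tunables(""), end="")
```

Copying the result to /boot/loader.conf.local and rebooting is still a manual step, as described above.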
-
Sorry for the long delay. As the one who started this thread, I owe you at least an update ;-)
I can confirm that modifying the above-mentioned file did the trick! Everything is running flawlessly (using a Dell R210 with a Broadcom card and an Intel i3 CPU).
Thanks so much for your help!
Regards
max
Italy