High Load Average after upgrade [SOLVED]
-
I've just upgraded to 2.3.1 and I'm seeing high load averages on a fairly beefy server.
I've taken some readings and the CPUs look fairly idle, so I guess it's waiting on some kind of I/O:
top -SH
last pid: 13233; load averages: 1.06, 1.08, 1.06 up 12+02:56:29 11:45:44
271 processes: 18 running, 207 sleeping, 46 waiting
CPU: 1.8% user, 0.9% nice, 2.3% system, 0.5% interrupt, 94.6% idle
Mem: 731M Active, 12G Inact, 1564M Wired, 1396K Cache, 1579M Buf, 1818M Free
Swap: 32G Total, 53M Used, 32G Free

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0K 256K CPU10 10 286.8H 100.00% idle{idle: cpu10}
11 root 155 ki31 0K 256K CPU0 0 284.0H 100.00% idle{idle: cpu0}
11 root 155 ki31 0K 256K CPU5 5 283.4H 100.00% idle{idle: cpu5}
11 root 155 ki31 0K 256K CPU12 12 282.7H 100.00% idle{idle: cpu12}
11 root 155 ki31 0K 256K CPU1 1 282.5H 100.00% idle{idle: cpu1}
11 root 155 ki31 0K 256K CPU7 7 282.3H 100.00% idle{idle: cpu7}
11 root 155 ki31 0K 256K CPU8 8 281.2H 100.00% idle{idle: cpu8}
11 root 155 ki31 0K 256K CPU6 6 282.1H 99.76% idle{idle: cpu6}
11 root 155 ki31 0K 256K RUN 2 281.5H 99.66% idle{idle: cpu2}
11 root 155 ki31 0K 256K RUN 4 283.6H 98.19% idle{idle: cpu4}
11 root 155 ki31 0K 256K CPU14 14 279.8H 97.75% idle{idle: cpu14}
11 root 155 ki31 0K 256K CPU13 13 283.0H 97.46% idle{idle: cpu13}
11 root 155 ki31 0K 256K CPU15 15 284.5H 97.27% idle{idle: cpu15}
11 root 155 ki31 0K 256K CPU9 9 281.2H 96.29% idle{idle: cpu9}
11 root 155 ki31 0K 256K CPU3 3 281.6H 95.36% idle{idle: cpu3}
11 root 155 ki31 0K 256K CPU11 11 282.0H 87.70% idle{idle: cpu11}
2653 root 49 20 13280M 11296M bpf 11 67:07 15.87% snort{snort}
1844 squid 28 0 571M 321M kqread 11 574:16 14.89% squid
23142 clamav 28 0 496M 420M uwait 11 0:06 8.40% clamd{clamd}
0 root -92 - 0K 224K - 4 331:51 8.06% kernel{em1 taskq}
0 root -92 - 0K 224K - 14 669:43 7.76% kernel{em0 taskq}
23142 clamav 24 0 496M 420M uwait 13 0:14 1.56% clamd{clamd}
23142 clamav 23 0 496M 420M uwait 7 0:02 1.56% clamd{clamd}
23142 clamav 21 0 496M 420M select 12 241:02 1.07% clamd{clamd}
6517 root 52 0 262M 20960K accept 3 0:00 0.88% php-fpm
24716 clamav 20 0 55376K 4476K select 6 0:00 0.78% c-icap{c-icap}
12 root -60 - 0K 736K WAIT 0 48:07 0.68% intr{swi4: clock}
25828 root 20 0 21856K 3740K CPU0 0 0:04 0.49% top
23142 clamav 22 0 496M 420M uwait 2 0:03 0.39% clamd{clamd}
12 root -72 - 0K 736K WAIT 10 74:00 0.00% intr{swi1: netisr 10}
8 root -16 - 0K 32K psleep 4 56:11 0.00% pagedaemon{pagedaemon}
6 root -16 - 0K 16K pftm 3 43:27 0.00% pf purge
90087 root 20 0 236M 29284K nanslp 4 37:50 0.00% php
20 root 16 - 0K 16K syncer 14 30:48 0.00% syncer
80674 root 20 0 60724K 9412K kqread 4 19:11 0.00% lighttpd_pfb
14 root -8 - 0K 48K - 6 17:17 0.00% geom{g_up}
19 root 20 - 0K 32K sdflus 14 13:15 0.00% bufdaemon{/ worker}
9492 root 20 0 16676K 1860K bpf 2 9:38 0.00% filterlog
57199 root 52 20 260M 147M urdlck 13 9:11 0.00% snort{snort}
15 root -16 - 0K 16K - 3 7:56 0.00% rand_harvestq
12 root -88 - 0K 736K WAIT 8 5:34 0.00% intr{irq28: aac0}
74829 root 20 0 14516K 1920K select 13 5:12 0.00% syslogd
68737 root 52 0 262M 11756K kqread 4 4:54 0.00% php-fpm
0 root -16 - 0K 224K swapin 15 4:12 0.00% kernel{swapper}
12301 root 20 0 19108K 2056K nanslp 0 3:48 0.00% dpinger{dpinger}
11892 root 20 0 19108K 2056K nanslp 1 3:36 0.00% dpinger{dpinger}
23142 clamav 20 0 496M 420M select 9 3:19 0.00% clamd{clamd}
47516 root 20 0 30140K 17968K select 9 3:16 0.00% ntpd{ntpd}
30682 squid 20 0 37752K 3884K select 8 3:10 0.00% pinger
17206 root 20 0 28608K 2844K kqread 8 3:04 0.00% nginx
56396 squid 20 0 37752K 3892K select 5 2:50 0.00% pinger
16877 root 20 0 28608K 2940K kqread 1 2:46 0.00% nginx
87981 squid 20 0 37752K 3892K select 13 2:40 0.00% pinger
72469 squid 20 0 37752K 3892K select 1 2:22 0.00% pinger
64219 squid 20 0 37752K 3892K select 6 2:08 0.00% pinger
43200 unbound 20 0 224M 153M kqread 5 1:59 0.00% unbound{unbound}
72460 squid 20 0 37752K 3892K select 14 1:56 0.00% pinger
62363 squid 20 0 37752K 3920K select 15 1:49 0.00% pinger
55801 clamav 20 0 25292K 2936K select 4 1:40 0.00% c-icap
65037 squid 20 0 37752K 3928K select 2 1:32 0.00% pinger

iostat 15
tty aacd0 md0 cd0 cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
0 5 61.47 7 0.42 0.00 0 0.00 0.00 0 0.00 1 1 1 0 97
0 16 50.81 6 0.30 0.00 0 0.00 0.00 0 0.00 1 1 2 0 96
0 5 57.38 5 0.27 0.00 0 0.00 0.00 0 0.00 1 1 2 0 96
0 5 52.39 6 0.31 0.00 0 0.00 0.00 0 0.00 1 0 1 0 97
0 5 60.76 5 0.28 0.00 0 0.00 0.00 0 0.00 1 1 2 0 97
0 5 48.44 6 0.28 0.00 0 0.00 0.00 0 0.00 1 1 2 0 96
0 5 52.83 5 0.27 0.00 0 0.00 0.00 0 0.00 1 1 2 0 96
0 5 51.45 7 0.33 0.00 0 0.00 0.00 0 0.00 1 0 1 0 98
0 5 54.21 5 0.24 0.00 0 0.00 0.00 0 0.00 1 1 2 0 95

vmstat 15
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr aa0 md0 in sy cs us sy id
0 0 7 30634M 1819M 1099 0 1 1 1305 1057 0 0 2260 160 2216 2 1 97
0 0 7 30621M 1820M 3619 0 0 0 4018 296 6 0 5939 8469 16927 2 1 96
2 0 7 30621M 1820M 2459 0 0 0 2745 313 5 0 5852 6947 16585 2 2 96
0 0 7 30621M 1824M 3396 0 0 0 3913 316 5 0 5906 8224 16667 3 3 94
0 0 7 30642M 1819M 3813 0 0 1 4282 316 3 0 6131 9414 17290 2 2 96

gstat -I 15s
dT: 15.003s w: 15.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
0 12 0 0 0.0 12 477 0.7 0.3| aacd0
0 0 0 0 0.0 0 0 0.0 0.0| cd0
0 12 0 0 0.0 12 477 0.8 0.3| aacd0s1
0 12 0 0 0.0 12 477 0.8 0.3| aacd0s1a
0 0 0 0 0.0 0 0 0.0 0.0| aacd0s1b
0 12 0 0 0.0 12 477 0.8 0.3| ufsid/575956061402fb6c
0 0 0 0 0.0 0 0 0.0 0.0| label/swap0
0 0 0 0 0.0 0 2 0.1 0.0| md0

netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen Local Address
tcp4 0/0/128 127.0.0.1.3128
tcp4 0/0/128 192.168.158.88.3128
tcp4 0/0/128 *.1344
tcp4 0/0/128 127.0.0.1.953
tcp4 0/0/128 *.53
tcp6 0/0/128 *.53
tcp4 0/0/128 *.8443
tcp4 0/0/128 *.8081
tcp6 0/0/128 *.80
tcp4 0/0/128 *.80
tcp4 0/0/128 *.22
tcp6 0/0/128 *.22
Netgraph sockets
Type Recv-Q Send-Q Node Address #Hooks
ctrl 0 0 [7e]: 0
ctrl 0 0 [48]: 0
unix 0/0/30 /var/run/qstats
unix 0/0/128 /var/run/clamav/clamd.sock
unix 0/0/128 /var/run/php-fpm.socket
unix 0/0/5 /var/run/dpinger_###ROUTER01_192.168.158.88_192.168.158.119.sock
unix 0/0/5 /var/run/dpinger_GW_WAN_###.###.###.###_###.###.###.###.sock
unix 0/0/4 /var/run/devd.pipe
unix 0/0/30 /var/run/check_reload_status
unix 0/0/4 /var/run/devd.seqpacket.pipe

Nothing seems to be jumping out at me; can anyone with a more trained eye see where the bottleneck is?
Edit: More interesting things in log files:
resolver log entries
Jun 21 14:36:17 pfSense unbound: [43200:9] notice: sendmsg failed: No buffer space available
---a good run of these---

gateway log messages
Jun 22 09:25:57 pfSense dpinger: xxxROUTER01 192.168.158.119: sendto error: 55
---for all 1000 entries---

192.168.158.119 is a router. It can be successfully pinged from the pfSense box.
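Error 55 is ENOBUFS, i.e. the same 'No buffer space available' that unbound is logging. Two stock FreeBSD checks that apply here (a sketch; the header lookup assumes a full FreeBSD install, as pfSense may not ship /usr/include):

grep -w ENOBUFS /usr/include/sys/errno.h   # #define ENOBUFS 55 /* No buffer space available */
netstat -Q                                 # netisr queues; non-zero QDrops means packets dropped on a full queue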
-
Edit: Nope, scratch that. The WCPU-gobbling processes are indeed the kernel's virtual 'idle' threads. The CPUs are definitely not busy at all.
Any suggestions as to what to consider next gladly accepted!
Edit2:
Seeing this while doing a test ping:
[2.3.1-RELEASE][root@pfSense.xxxxxxxx.co.uk]/root: ping 192.168.158.119
PING 192.168.158.119 (192.168.158.119): 56 data bytes
64 bytes from 192.168.158.119: icmp_seq=0 ttl=64 time=35.216 ms
64 bytes from 192.168.158.119: icmp_seq=1 ttl=64 time=22.264 ms
64 bytes from 192.168.158.119: icmp_seq=2 ttl=64 time=28.457 ms
64 bytes from 192.168.158.119: icmp_seq=3 ttl=64 time=48.814 ms
64 bytes from 192.168.158.119: icmp_seq=4 ttl=64 time=33.435 ms
64 bytes from 192.168.158.119: icmp_seq=5 ttl=64 time=35.962 ms
64 bytes from 192.168.158.119: icmp_seq=6 ttl=64 time=14.627 ms
64 bytes from 192.168.158.119: icmp_seq=7 ttl=64 time=38.668 ms
ping: sendto: No buffer space available
ping: sendto: No buffer space available
64 bytes from 192.168.158.119: icmp_seq=10 ttl=64 time=36.417 ms
64 bytes from 192.168.158.119: icmp_seq=11 ttl=64 time=52.092 ms
64 bytes from 192.168.158.119: icmp_seq=12 ttl=64 time=22.819 ms
64 bytes from 192.168.158.119: icmp_seq=13 ttl=64 time=38.137 ms
64 bytes from 192.168.158.119: icmp_seq=14 ttl=64 time=34.070 ms
64 bytes from 192.168.158.119: icmp_seq=15 ttl=64 time=23.527 ms
64 bytes from 192.168.158.119: icmp_seq=16 ttl=64 time=44.394 ms
64 bytes from 192.168.158.119: icmp_seq=17 ttl=64 time=15.814 ms
64 bytes from 192.168.158.119: icmp_seq=18 ttl=64 time=38.633 ms
^C
--- 192.168.158.119 ping statistics ---
19 packets transmitted, 17 packets received, 10.5% packet loss
round-trip min/avg/max/stddev = 14.627/33.138/52.092/10.356 ms

Sooo, somehow running out of MBUFs?
netstat -m looks fine
[2.3.1-RELEASE][root@pfSense.xxxxxxxx.co.uk]/root: netstat -m
2261/15079/17340 mbufs in use (current/cache/total)
1375/8335/9710/26584 mbuf clusters in use (current/cache/total/max)
1375/8309 mbuf+clusters out of packet secondary zone in use (current/cache)
6/3066/3072/13291 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/3938 9k jumbo clusters in use (current/cache/total/max)
0/0/0/2215 16k jumbo clusters in use (current/cache/total/max)
3390K/32703K/36094K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
97 requests for I/O initiated by sendfile

Edit 3:
Jacking up kern.ipc.somaxconn, net.local.stream.sendspace, net.local.stream.recvspace and net.inet.udp.recvspace seems to have stopped the error 55 messages.
Doing that has increased the load average though; it's over 1.5 now.
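For reference, the runtime equivalents of what I bumped (values as per the list further down; on pfSense these normally go in under System > Advanced > System Tunables rather than being set by hand):

sysctl kern.ipc.somaxconn=4096           # listen queue depth
sysctl net.local.stream.sendspace=164240 # unix socket send buffer
sysctl net.local.stream.recvspace=164240 # unix socket receive buffer
sysctl net.inet.udp.recvspace=65536      # UDP receive buffer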
-
Right, I seem to have been able to solve this.
loader.conf.local
kern.ipc.nmbclusters="131072"
net.inet.tcp.tso="0"
kern.ipc.somaxconn="4096"
net.inet.tcp.recvspace="65228"
net.inet.tcp.sendspace="65228"
hw.em.rxd="4096"
hw.em.txd="4096"
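Those loader values only take effect at boot; a quick way to sanity-check them after rebooting (standard FreeBSD tools, which pfSense ships):

sysctl kern.ipc.nmbclusters net.inet.tcp.tso kern.ipc.somaxconn   # should echo the values above
kenv hw.em.rxd   # loader-only tunables show up in the kernel environment
kenv hw.em.txd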
System Tunables:

net.inet.tcp.sendbuf_max = 16777216 (set autotuning maximum to at least 16MB)
net.inet.tcp.recvbuf_max = 16777216 (set autotuning maximum to at least 16MB)
net.inet.tcp.sendbuf_auto = 1 (enable send autotuning)
net.inet.tcp.recvbuf_auto = 1 (enable recv autotuning)
net.inet.tcp.sendbuf_inc = 524288 (autotuning step size)
net.inet.tcp.recvbuf_inc = 524288 (autotuning step size)
net.inet.tcp.slowstart_flightsize = 64 (Squid optimisation: it is more beneficial to increase the slow-start flightsize than to disable delayed ACKs; default = 1, and 262144/1460 gives the maximum sensible value)
net.inet.udp.recvspace = 65536 (optimised)
net.local.stream.recvspace = 164240 (optimised: 10 x (MTU 16384 + 40))
net.local.stream.sendspace = 164240 (optimised: 10 x (MTU 16384 + 40))
kern.ipc.somaxconn = 4096 (optimised for Squid)
net.inet.tcp.mssdflt = 1460 (optimised; default = 512)
net.inet.tcp.inflight.min = 6144 (the FreeBSD manual recommends 6144)
net.inet.tcp.local_slowstart_flightsize = 10 (loopback optimised for MTU 16384; see net.local.stream.*space)
net.inet.tcp.nolocaltimewait = 1 (loopback optimised)
net.inet.tcp.delayed_ack = 1 (left enabled; see net.inet.tcp.slowstart_flightsize)

This now gives the following under full 20/20Mb load, which is much more what I would expect from such a powerful server:

Load average: 0.04, 0.16, 0.19

I must say the support here was disappointing. Could no-one suggest anything? Even stating 'this is likely a FreeBSD issue, so look there' would have been preferable to just being ignored.
-
i have no clue what the issue here is exactly. maybe mbufs, maybe i/o, maybe nic tuning
but you do realize that a load of 1.06 on a 16-core system is around 6-7% cpu load (if it were cpu-related)? so nothing to be concerned about if everything works as intended.
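the arithmetic, for anyone following along:

echo "scale=3; 1.06 / 16" | bc   # .066, i.e. roughly 6-7% if it were all CPU time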
The new webgui is more resource-intensive as well; does the load drop if you close the GUI?

squid & clamd can be resource hogs … does removing them change anything?
about the lack of support:
you posted on a friday & today is monday .... most forum members & all of the paid staff are fairly idle over the weekends
-
Thanks for the reply. I couldn't seem to pin it down with the tools I knew how to use; possibly someone more experienced with FreeBSD could have made a more useful diagnosis. All the tools showed that everything was OK. I'm still unaware of a way to check what HW interrupts/low-level driver activity are doing on a FreeBSD system; I suspect that might have shown something unusual.
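(For reference, these stock FreeBSD tools do show per-device interrupt activity; nothing pfSense-specific here:)

vmstat -i          # per-device interrupt counts and rates since boot
systat -vmstat 1   # live view of interrupts, disk I/O and CPU, refreshed each second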
I think you are right about the NIC tuning; nothing made a big difference by itself, but after tweaking the heck out of all of those tunables I got the results above.
I didn't check the load without the GUI; I'm sure that's contributing.
I didn't try without squid and clamd, but squid did feel 'slow' before: time to initial page load on non-cached domains was on the order of 5 seconds. You can barely notice the 'waiting for' message in the browser now.
I didn't realise when I started that the load average works differently from Linux. This has made Squid perform better though, so possibly something was overloaded in some way.

Admittedly I wasn't able to go about this very scientifically; I just browsed this forum and tried every tuning trick I could see that might apply to my system.
Sorry if I seemed rude; I wasn't trying to be. My initial post was on the 22nd of June, not last Friday, so I was feeling a little bit frustrated.
-
oh june ;)
be sure to bump a topic every week if nobody replied to it :)