Fabiatech FX5625 improving throughput
-
The 100% CPU usage only seems to happen in the early hours of the morning, always at the same time. I'll try to get onto it remotely tomorrow morning, take a look, and post an update.
-
After manually pushing some data through to generate this load, the main process responsible is 'intr{irq257: em0:rx0}', with similar processes for the other interfaces alongside it, though not quite as high (understandably, as em0 is the WAN interface).
Sample output:
  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
   11 root      155 ki31     0K    64K CPU1   1  23.6H  87.26% [idle{idle: cpu1}]
   12 root      -92    -     0K   832K CPU0   0 233:12  79.73% [intr{irq257: em0:rx0}]
   11 root      155 ki31     0K    64K RUN    3  23.9H  76.87% [idle{idle: cpu3}]
   11 root      155 ki31     0K    64K CPU2   2  23.2H  49.23% [idle{idle: cpu2}]
   12 root      -92    -     0K   832K WAIT   2   4:05  34.74% [intr{irq278: em5:rx0}]
    0 root      -92    -     0K   816K -      3   7:47  20.59% [kernel{em0 rxq (cpuid 0)}]
   11 root      155 ki31     0K    64K RUN    0  18.6H  13.41% [idle{idle: cpu0}]
   12 root      -92    -     0K   832K WAIT   2  51:47  11.30% [intr{irq261: em1:rx0}]
   12 root      -92    -     0K   832K WAIT   0 107:33   5.89% [intr{irq265: em2:rx0}]
    0 root      -92    -     0K   816K -      2  23:04   5.05% [kernel{dummynet}]
   12 root      -92    -     0K   832K WAIT   1  16:14   4.75% [intr{irq258: em0:tx0}]
   12 root      -92    -     0K   832K WAIT   3   0:16   4.39% [intr{irq279: em5:tx0}]
   12 root      -92    -     0K   832K WAIT   3   6:09   1.87% [intr{irq262: em1:tx0}]
   12 root      -92    -     0K   832K WAIT   2  13:49   1.40% [intr{irq269: em3:rx0}]
    0 root      -92    -     0K   816K -      1   1:41   0.75% [kernel{em5 rxq (cpuid 2)}]
   12 root      -92    -     0K   832K WAIT   1  15:43   0.58% [intr{irq266: em2:tx0}]
74844 root       20    0  9868K  4700K CPU3   3   0:00   0.53% top -aSH
    0 root      -92    -     0K   816K -      1   2:42   0.46% [kernel{em1 rxq (cpuid 2)}]
   12 root      -92    -     0K   832K WAIT   0  11:15   0.42% [intr{irq281: em6:rx0}]
   12 root      -60    -     0K   832K WAIT   1   3:25   0.27% [intr{swi4: clock (0)}]
   12 root      -92    -     0K   832K WAIT   3   2:18   0.26% [intr{irq270: em3:tx0}]
Checking things like mbuf usage, there appears to be plenty of headroom:
35554/14801/50355 mbufs in use (current/cache/total)
33501/13093/46594/249500 mbuf clusters in use (current/cache/total/max)
33501/13051 mbuf+clusters out of packet secondary zone in use (current/cache)
0/34/34/124749 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/36962 9k jumbo clusters in use (current/cache/total/max)
0/0/0/20791 16k jumbo clusters in use (current/cache/total/max)
75890K/30022K/105912K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
Current MBUF limit set as:
[2.4.5-RELEASE][admin@firewall1.midlandcomputers.com]/root: sysctl kern.ipc.nmbclusters
kern.ipc.nmbclusters: 249500
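For reference, since the denied/delayed counters above are all zero, the limit isn't being hit here. If denials ever did appear, the ceiling can be raised with a boot-time tunable; a sketch (the value below is purely illustrative and should be sized to available RAM):

```shell
# /boot/loader.conf.local -- pfSense preserves this file across upgrades
# Raise the mbuf cluster ceiling (example value only; each cluster is 2KB,
# so 1000000 clusters reserves up to ~2GB for network buffers)
kern.ipc.nmbclusters="1000000"
```

A reboot is required for loader tunables to take effect.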
-
em uses a single receive and transmit queue so you're unlikely to exhaust the mbufs.
What throughput were you seeing when that was taken?
Between which interfaces? What throughput do you see without any of those loader variables, just using the em defaults?
What output do you get from vmstat -i and sysctl net.isr?
Steve
-
Output from sysctl net.isr:
net.isr.numthreads: 4
net.isr.maxprot: 16
net.isr.defaultqlimit: 256
net.isr.maxqlimit: 10240
net.isr.bindthreads: 0
net.isr.maxthreads: 4
net.isr.dispatch: direct
Output from vmstat -i:
interrupt                          total       rate
irq18: uhci2+                     304106          3
cpu0:timer                     108772857       1036
cpu1:timer                      68073061        648
cpu2:timer                       9281390         88
cpu3:timer                      19118159        182
irq257: em0:rx0                194215751       1850
irq258: em0:tx0                229258370       2183
irq259: em0:link                       1          0
irq261: em1:rx0                 48310327        460
irq262: em1:tx0                 82599543        787
irq263: em1:link                       1          0
irq265: em2:rx0                113082535       1077
irq266: em2:tx0                193176467       1840
irq267: em2:link                       1          0
irq269: em3:rx0                 23497096        224
irq270: em3:tx0                 39913436        380
irq271: em3:link                       1          0
irq273: em4:rx0                   157084          1
irq274: em4:tx0                   104642          1
irq275: em4:link                       1          0
irq277: pcib8                          1          0
irq278: em5:rx0                  3537702         34
irq279: em5:tx0                  3615446         34
irq280: em5:link                       1          0
irq281: em6:rx0                 11959127        114
irq282: em6:tx0                 15965140        152
irq283: em6:link                       1          0
irq284: em7:rx0                   421216          4
irq285: em7:tx0                    21775          0
irq286: em7:link                       9          0
Total                         1165385247      11098
In the example I posted above I was simply downloading large files between two hosts without bandwidth caps, where em0 is the WAN interface and em1 & em5 are where the hosts reside.
I will remove what I have entered from the loader.conf, reboot and retry, but rebooting the firewall during office hours is a pain to arrange. I'll get this done this evening.
-
You might try setting:
net.isr.bindthreads=1
The core affinity might give you better distribution.
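A minimal sketch of applying this: net.isr.bindthreads is a boot-time tunable, so on pfSense it would go in a loader configuration file and take effect after a reboot.

```shell
# /boot/loader.conf.local
# Pin each netisr worker thread to its own CPU core instead of
# letting the scheduler migrate them (0 = unbound, the default)
net.isr.bindthreads="1"
```

After rebooting, the active value can be confirmed with: sysctl net.isr.bindthreads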
-
Hi,
I've set that and rebooted, and will test over the weekend.
I might be going down completely the wrong track here, but would
net.isr.direct=1
possibly also help?
-
@SimonB256 said in Fabiatech FX5625 improving throughput:
net.isr.direct
That doesn't exist in FreeBSD after 9 (pfSense 2.4.5 is built on 11.3), that's what
net.isr.dispatch: direct
does.
Steve
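For context, the old net.isr.direct and net.isr.direct_force knobs from FreeBSD 9 and earlier were folded into the single net.isr.dispatch policy. A hedged sketch of setting it explicitly as a tunable (direct is already the value shown in the sysctl output above, so this would be a no-op here):

```shell
# /boot/loader.conf.local
# netisr dispatch policy on FreeBSD 10+: "direct", "hybrid", or "deferred"
# (replaces the pre-FreeBSD-10 net.isr.direct / net.isr.direct_force tunables)
net.isr.dispatch="direct"
```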
-
Just to update, it appears that I am now getting better throughput after adding
net.isr.bindthreads=1
Thank you for your help.
-
Ah, good to hear. What sort of improvement are you seeing?
-
In terms of throughput I'm only seeing a 15-20Mbps increase (so we're up to 470Mbps). But we're seeing far less packet loss at the top end of these speeds.
Looking further at the kind of traffic we're handling. We're talking around 600-700 flows at any given time (according to ntop I have running elsewhere in the network), and around 15k-20k states listed on the firewall itself.
So I imagine that for this small device, handling that many small connections at any one time might explain why we aren't reaching the theoretical 600Mbps+ maximum.
-
Yes, that seems reasonable. You would only see 600Mbps+ if all the traffic used full-size packets.
Steve