Packet loss and bandwidth limitations
-
We have a failover pair of Dell 610s; each server has 4 x 1Gbps onboard NICs (Broadcom - bce), assigned as follows:
bce0 - WAN
bce1 - LAN
bce2 - SYNC

They are connected to a pair of 6500 switches on modules that are wire speed.
Although the bandwidth on the WAN side never really gets above 300-400 Mbps combined, we see that when the packet rate exceeds around 950,000 pps we get some packet loss, and as this is a VoIP setup it starts affecting call quality (they generally run fine at 500-700 Mbps).
I have modified the buffer sizes in loader.conf.local as per the Netgate docs and this seems to have made some improvement, i.e. there are still some drops but a lot fewer than before.
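For reference, the sort of entries involved look roughly like this (illustrative values only, not necessarily what we ended up using - check the Netgate hardware tuning docs for values appropriate to your RAM and NICs):

# /boot/loader.conf.local - example buffer tunables (illustrative values)
kern.ipc.nmbclusters="1000000"  # more mbuf clusters so bursts of small packets don't exhaust the pool
hw.bce.rx_pages="8"             # bce(4) receive buffer pages
hw.bce.tx_pages="8"             # bce(4) transmit buffer pages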
Are there any limits I am hitting in terms of the number of pps pfSense can process, or is that purely down to CPU/memory resources etc.?
I am assuming that, as the bandwidth is not getting anywhere near 1Gbps, making a LAGG would not help in this situation.
Any suggestions/pointers much appreciated.
-
It could be a PPS limit, although <1Mpps is not huge. What CPUs are in those?
Do you see any CPU cores at the limit?
Do you see multiple queues in use on each NIC?
Steve
-
Steve
Thanks for the reply.
The CPUs are dual quad-core 2.4GHz Xeons. I'll check again today, but at no time does the overall CPU usage on the firewall go above about 10%.
This is the current snapshot, but there are no issues at the moment.
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0K 384K CPU3 3 ??? 100.00% [idle{idle: cpu3}]
11 root 155 ki31 0K 384K CPU4 4 ??? 100.00% [idle{idle: cpu4}]
11 root 155 ki31 0K 384K CPU6 6 ??? 100.00% [idle{idle: cpu6}]
11 root 155 ki31 0K 384K CPU16 16 ??? 100.00% [idle{idle: cpu16}]
11 root 155 ki31 0K 384K CPU14 14 ??? 100.00% [idle{idle: cpu14}]
11 root 155 ki31 0K 384K CPU22 22 ??? 99.89% [idle{idle: cpu22}]
11 root 155 ki31 0K 384K RUN 20 ??? 99.84% [idle{idle: cpu20}]
11 root 155 ki31 0K 384K CPU12 12 ??? 99.79% [idle{idle: cpu12}]
11 root 155 ki31 0K 384K CPU17 17 ??? 99.77% [idle{idle: cpu17}]
11 root 155 ki31 0K 384K CPU18 18 ??? 99.77% [idle{idle: cpu18}]
11 root 155 ki31 0K 384K CPU0 0 ??? 99.53% [idle{idle: cpu0}]
11 root 155 ki31 0K 384K CPU15 15 ??? 98.94% [idle{idle: cpu15}]
11 root 155 ki31 0K 384K CPU21 21 ??? 96.78% [idle{idle: cpu21}]
11 root 155 ki31 0K 384K CPU23 23 ??? 93.30% [idle{idle: cpu23}]
11 root 155 ki31 0K 384K CPU19 19 ??? 91.84% [idle{idle: cpu19}]
11 root 155 ki31 0K 384K CPU1 1 ??? 91.80% [idle{idle: cpu1}]
11 root 155 ki31 0K 384K CPU10 10 ??? 89.88% [idle{idle: cpu10}]
11 root 155 ki31 0K 384K CPU8 8 ??? 84.86% [idle{idle: cpu8}]
11 root 155 ki31 0K 384K CPU13 13 ??? 77.45% [idle{idle: cpu13}]
11 root 155 ki31 0K 384K CPU5 5 ??? 24.94% [idle{idle: cpu5}]
12 root -92 - 0K 1056K WAIT 8 533.4H 15.13% [intr{irq256: bce0}]
12 root -92 - 0K 1056K WAIT 10 304.6H 10.12% [intr{irq257: bce1}]
12 root -72 - 0K 1056K WAIT 15 36.1H 1.08% [intr{swi1: pfsync}]
12 root -92 - 0K 1056K WAIT 12 407:54 0.22% [intr{irq258: bce2}]

In terms of NIC queues - how do I view those on pfSense? I can't seem to find where they are shown.
-
I would check the boot log first.
vmstat -i
should also show you the queues.
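On a multi-queue NIC each queue gets its own interrupt line in that output, so a quick check is something like this (just a sketch - adjust the interface name to match yours):

vmstat -i | grep bce

If each bce interface only shows a single irq line, the driver is running one queue per NIC.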
That top output indicates you have 24 CPU cores (including hyperthreading), so hex-core Xeons? It shows 20 of them: 19 are basically idle and one is only about 25% idle (roughly 75% loaded). The other 4 are not shown, so they could be 100% loaded.
Try running `top -HaSP`; that will show the per-core loading.

Steve
-
Hi Steve
Apologies, been a bit busy the last couple of days.
Just wanted to say thanks for the suggestions. I'll have a look at the CPU usage the next time we're seeing packet drops, and if I find anything definitive I'll update the thread.