Packet loss and bandwidth limitations
-
We have a failover pair of Dell 610s; each server has 4 x 1Gbps onboard NICs (Broadcom - bce), assigned as follows:
bce0 - WAN
bce1 - LAN
bce2 - SYNC

They are connected to a pair of 6500 switches on modules that are wire speed.
Although the bandwidth on the WAN side never really gets above 300-400 Mbps combined, we see that when the packet rate exceeds around 950,000 pps we get some packet loss, and as this is a VoIP setup it starts affecting call quality (they generally run fine at 500-700 Mbps).
I have modified the buffer sizes in loader.conf.local as per the Netgate docs and this seems to have made some improvement, i.e. there are still some drops but a lot fewer than before.
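For reference, the sort of entries involved look roughly like this (illustrative values only, not necessarily what we ended up using - check the Netgate hardware tuning docs for values appropriate to your RAM and NICs):

# /boot/loader.conf.local - example buffer tunables (illustrative values)
kern.ipc.nmbclusters="1000000"  # more mbuf clusters so bursts of small packets don't exhaust the pool
hw.bce.rx_pages="8"             # bce(4) receive buffer pages
hw.bce.tx_pages="8"             # bce(4) transmit buffer pages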
Are there any limits I am hitting in terms of the number of pps pfSense can process, or is that purely down to CPU/memory resources etc.?
I am assuming that, as the bandwidth is not getting anywhere near 1Gbps, making a LAGG would not help in this situation.
Any suggestions/pointers much appreciated.
-
It could be a PPS limit, although <1Mpps is not huge. What CPUs are in those?
Do you see any CPU cores at the limit?
Do you see multiple queues in use on each NIC?
Steve
-
Steve
Thanks for the reply.
The CPUs are dual quad-core 2.4GHz Xeons. I'll check again today, but at no time does the overall CPU usage on the firewall go above about 10%.
This is the current snapshot, but there are no issues at the moment.
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0K 384K CPU3 3 ??? 100.00% [idle{idle: cpu3}]
11 root 155 ki31 0K 384K CPU4 4 ??? 100.00% [idle{idle: cpu4}]
11 root 155 ki31 0K 384K CPU6 6 ??? 100.00% [idle{idle: cpu6}]
11 root 155 ki31 0K 384K CPU16 16 ??? 100.00% [idle{idle: cpu16}]
11 root 155 ki31 0K 384K CPU14 14 ??? 100.00% [idle{idle: cpu14}]
11 root 155 ki31 0K 384K CPU22 22 ??? 99.89% [idle{idle: cpu22}]
11 root 155 ki31 0K 384K RUN 20 ??? 99.84% [idle{idle: cpu20}]
11 root 155 ki31 0K 384K CPU12 12 ??? 99.79% [idle{idle: cpu12}]
11 root 155 ki31 0K 384K CPU17 17 ??? 99.77% [idle{idle: cpu17}]
11 root 155 ki31 0K 384K CPU18 18 ??? 99.77% [idle{idle: cpu18}]
11 root 155 ki31 0K 384K CPU0 0 ??? 99.53% [idle{idle: cpu0}]
11 root 155 ki31 0K 384K CPU15 15 ??? 98.94% [idle{idle: cpu15}]
11 root 155 ki31 0K 384K CPU21 21 ??? 96.78% [idle{idle: cpu21}]
11 root 155 ki31 0K 384K CPU23 23 ??? 93.30% [idle{idle: cpu23}]
11 root 155 ki31 0K 384K CPU19 19 ??? 91.84% [idle{idle: cpu19}]
11 root 155 ki31 0K 384K CPU1 1 ??? 91.80% [idle{idle: cpu1}]
11 root 155 ki31 0K 384K CPU10 10 ??? 89.88% [idle{idle: cpu10}]
11 root 155 ki31 0K 384K CPU8 8 ??? 84.86% [idle{idle: cpu8}]
11 root 155 ki31 0K 384K CPU13 13 ??? 77.45% [idle{idle: cpu13}]
11 root 155 ki31 0K 384K CPU5 5 ??? 24.94% [idle{idle: cpu5}]
12 root -92 - 0K 1056K WAIT 8 533.4H 15.13% [intr{irq256: bce0}]
12 root -92 - 0K 1056K WAIT 10 304.6H 10.12% [intr{irq257: bce1}]
12 root -72 - 0K 1056K WAIT 15 36.1H 1.08% [intr{swi1: pfsync}]
12 root -92 - 0K 1056K WAIT 12 407:54 0.22% [intr{irq258: bce2}]

In terms of NIC queues - how do I view those on pfSense? I can't seem to find where they are shown.
-
I would check the boot log first.
vmstat -i
should also show you the queues.
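On a multi-queue NIC each queue gets its own interrupt line in that output, so a quick check is something like this (just a sketch - adjust the interface name to match yours):

vmstat -i | grep bce

If each bce interface only shows a single irq line, the driver is running one queue per NIC.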
That top output indicates you have 24 CPU cores (including hyperthreading), so hex-core Xeons? It shows 20 of them: 19 are basically idle and one is only about 25% idle (roughly 75% loaded). The other 4 are not shown, so they could be 100% loaded.
Try running `top -HaSP`; that will show the per-core loading.

Steve
-
Hi Steve
Apologies, been a bit busy the last couple of days.
Just wanted to say thanks for the suggestions. I'll have a look at the CPU usage the next time we're seeing packet drops, and if I find anything definitive I'll update the thread.