High CPU (Atom) and low network throughput (Intel Quad Port NIC)
-
Hi there, this is my first post on this forum :)
I have been using internally and recommending pfSense to customers for a few years now. It's an amazing firewall and I have always been extremely happy with it.
Typically I've always used pfSense in a VMware virtualized environment and haven't really encountered any performance-related issues at all. Anyway, I have recently bought a physical server (Atom D525 CPU and Intel quad-port NIC - igb driver).
After taking the config across from the existing pfSense (version 2.0.1 amd64) VMware instance to the physical server (also 2.0.1 amd64), I ran a performance test: traffic passing from a host behind the igb0 interface through to a host behind the igb1 interface. Basically I wanted to test the real throughput between physical interfaces.
The resulting performance was a surprise. I was only able to achieve 460 Mbit/s of throughput and the CPU was 40% utilized. I have done quite a bit of Googling and tried changing interrupts and buffers per various suggestions. None of these changes made a massive difference; I failed to achieve over 500 Mbit/s of throughput.
After making no inroads I decided to try an alternative firewall distribution on the same physical hardware as a benchmark. I installed MikroTik 5.14 and performed exactly the same test, between the same hosts and using the same interfaces. On the exact same hardware, doing nothing special at all, I was able to achieve 865 Mbit/s of throughput and CPU utilization was only 15-20%. In round numbers you could argue this equates to a 400% improvement (roughly twice the network throughput at half the CPU utilization).
I really want to resolve the performance issue with pfSense and don't want to change firewall distros (I find MikroTik's GUI horrendous!).
Has anyone encountered similar performance issues? And ideally, are there any resolutions/changes that can be made?
Regards,
Andrew -
I use an Atom D510 with Intel (em) NICs, and my testing with iperf showed results very similar to yours. The one thing I tried that made a significant difference was to enable "net.inet.ip.fastforwarding" in System: Advanced: System Tunables.
You may also want to have a look at http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards.
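For anyone who prefers the shell over the GUI, the same tunable can be set at runtime and persisted in /etc/sysctl.conf. A minimal sketch using standard FreeBSD sysctl(8) usage (pfSense 2.0.x is based on FreeBSD 8.x); the GUI entry under System: Advanced: System Tunables has the same effect:

```shell
# Enable IP fastforwarding at runtime (takes effect immediately)
sysctl net.inet.ip.fastforwarding=1

# Verify the current value
sysctl net.inet.ip.fastforwarding

# Persist across reboots; equivalent to adding the tunable in
# System: Advanced: System Tunables in the pfSense GUI
echo 'net.inet.ip.fastforwarding=1' >> /etc/sysctl.conf
```

Note that fastforwarding trades some IP-layer processing (it bypasses the normal ip_input path) for speed, which is why it is off by default.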
-
Thanks very much for the reply. I'd found that link in my search and tried those suggestions, to no avail unfortunately. I also found another (I think generic FreeBSD-related) page which suggested a large number of system parameter changes. Those actually ended up bypassing the firewall rules entirely :o and still didn't achieve any better throughput.
I'll have a look into the fast forwarding though and report back.
-
OK, so I've enabled the fast forwarding on the hardware firewall and also tested with the following:
- Enable device polling: tried both enabled and disabled
- Disable hardware checksum offload: tried both enabled and disabled
- Disable hardware TCP segmentation offload: tried both enabled and disabled
- Disable hardware large receive offload: tried both enabled and disabled
- kern.ipc.nmbclusters="131072" and hw.igb.num_queues="1" in the '/boot/loader.conf' file
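For reference, the offload toggles above map to per-interface ifconfig(8) flags on FreeBSD, and the loader tunables belong in /boot/loader.conf. A sketch of the equivalent shell commands, assuming igb0 and igb1 are the interfaces under test (the flag names are the standard FreeBSD ones; I'm assuming the pfSense GUI checkboxes flip the same switches):

```shell
# Disable hardware checksum offload, TSO and LRO per interface
ifconfig igb0 -rxcsum -txcsum -tso -lro
ifconfig igb1 -rxcsum -txcsum -tso -lro

# /boot/loader.conf entries (loader tunables; require a reboot):
#   kern.ipc.nmbclusters="131072"   # enlarge the mbuf cluster pool
#   hw.igb.num_queues="1"           # single RX/TX queue per NIC,
#                                   # avoids queue/interrupt contention
#                                   # on a 2-core Atom
```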
The maximum I have been able to get is 483 Mbit/s.
I also tried updating to the 2.1 development build to see if perhaps some driver-related issue had been resolved between builds. No improvement, sadly.
To test my sanity I put the system back to a MikroTik 5.14 install and created firewall rules to pass traffic between two interfaces. Immediately throughput jumped to a massive 861 Mbit/s at 20% max CPU.
The discrepancy between those throughput and utilization numbers is insane. I really and genuinely love pfSense (see posts on our website about pfSense for proof - https://www.xuridisa.com/blog), but unless it can get up to the same levels on the exact same hardware, you really have to start asking questions about efficiency, performance and scalability.
Anyone have any other suggestions that might be the silver bullet?
Cheers,
Andrew -
iperf -c crag -t 20
------------------------------------------------------------
Client connecting to crag, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.85.2 port 55421 connected with 192.168.250.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-20.0 sec  1.02 GBytes   438 Mbits/sec
CPU usage on pfSense was ~75-80%.
iperf -c crag -t 20 -P4
------------------------------------------------------------
Client connecting to crag, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  6] local 192.168.85.2 port 55425 connected with 192.168.250.1 port 5001
[  4] local 192.168.85.2 port 55422 connected with 192.168.250.1 port 5001
[  5] local 192.168.85.2 port 55423 connected with 192.168.250.1 port 5001
[  3] local 192.168.85.2 port 55424 connected with 192.168.250.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-20.0 sec   285 MBytes   119 Mbits/sec
[  5]  0.0-20.0 sec   288 MBytes   121 Mbits/sec
[  3]  0.0-20.0 sec   299 MBytes   125 Mbits/sec
[  6]  0.0-20.0 sec   268 MBytes   112 Mbits/sec
[SUM]  0.0-20.0 sec  1.11 GBytes   478 Mbits/sec
Throughput went up slightly with 4 streams, but pfSense CPU usage hit 100% on this test.
This is with fastforwarding enabled, and checksum offload, TSO and LRO all enabled. This system is live and passing ~10-15 Mbit/s of "real" traffic as I test it. I'm sure I saw over 600 Mbit/s with iperf in bench testing, but that would have been on a 2.0 beta. I'm not sure what else would have changed since then.
So there you go. I don't know if these numbers can be improved on, but we're certainly seeing similar performance.
-
There isn't much doubt (in my mind at least!) that a Linux-based system is likely to give better performance than one based on FreeBSD, especially with newer hardware. There are simply more people working to optimise the drivers.
That's a pretty general statement though, and I'm sure you could find exceptions. The difference in speed is similar to what I found using msk(4) NICs on similar processing power.
However the difference here does seem large because you're using Intel NICs which are using the best supported drivers in FreeBSD. The question then is, are these two OSs actually doing the same thing?
It would be interesting to bridge two interfaces and switch off filtering on the bridge members. How much traffic can it pass when it's purely forwarding packets?
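A rough sketch of that test using FreeBSD's if_bridge(4), assuming igb0 and igb1 are the two ports under test; the net.link.bridge sysctls below are the standard knobs that control whether pf filters traffic on bridge members:

```shell
# Create a software bridge and add both test interfaces as members
ifconfig bridge0 create
ifconfig bridge0 addm igb0 addm igb1 up

# Turn off packet filtering on member interfaces and on the bridge
# itself, so traffic is forwarded with no firewall processing at all
sysctl net.link.bridge.pfil_member=0
sysctl net.link.bridge.pfil_bridge=0
```

Re-running the same iperf test through the bridge would show how much of the gap is pf/firewall overhead versus raw forwarding capacity.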
Steve
-
Your performance figures are similar to the ones I get on my Atom D510 board: 200-ish Mbit/s with Snort running, 300-ish without.
-
Sadly, I ended up going the MikroTik way. I frankly spent far too long messing with the kernel etc. attempting to get additional performance. All those efforts failed :(