Slow iperf between pfsense and clients?
-
Hi, sorry for posting a stupid question. Here is my pfsense box setup:
pfSense 2.2.4-amd64 with a nearly clean install (without QoS/traffic shaping/VPN)
Intel Celeron N2930
4GB RAM
32GB SSD hard-drive
JetWay JNF9HG-2930 motherboard with 4 Intel i211AT Gigabit Ethernet onboard ports
I have also disabled hardware checksum offload, TCP segmentation offload, and large receive offload, and set kern.ipc.nmbclusters to 1 million as suggested in the doc.
Test methods and results: (pfsense with 10.0.1.1 and clients, linux at 10.0.1.10, windows at 10.0.1.13, both plugged into LAN (bridged) ports of pfsense ):
pf: iperf -s
linux: iperf -c 10.0.1.1
Client connecting to 10.0.1.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.10 port 43836 connected with 10.0.1.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 490 MBytes 410 Mbits/sec
linux: iperf -s
windows: iperf -c 10.0.1.10
and the result is similar, with around 80 KByte TCP window size and 400 Mbits/sec throughput.
I am quite confused by this number, because if I plug the linux and windows computers into the LAN ports of my Linksys wrt1200ac home router I can achieve 800 Mbits/sec throughput using exactly the same iperf test.
During the pfsense<->linux iperf test, the CPU consumption is moderate:
CPU: 3.9% user, 0.0% nice, 45.5% system, 0.4% interrupt, 50.2% idle
So I did another pfsense<->linux iperf test with 4 parallel connections (-P 4):
iperf -c 10.0.1.1 -P 4
------------------------------------------------------------
Client connecting to 10.0.1.1, TCP port 5001
TCP window size: 85.0 KByte (default)
...
[SUM] 0.0-10.0 sec 892 MBytes 747 Mbits/sec
Also, if I use a UDP test I can almost max out the gigabit port as well:
pfsense: iperf -s -u
linux: iperf -c 10.0.1.1 -u -b 780M
Result on the server side:
[2.2.4-RELEASE][root@pfSense.homenetwork]/root: iperf -s -u
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size: 41.1 KByte (default)
[ 3] local 10.0.1.1 port 5001 connected with 10.0.1.10 port 50319
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0-10.0 sec 932 MBytes 782 Mbits/sec 0.020 ms 174/664786 (0.026%)
[ 3] 0.0-10.0 sec 1 datagrams received out-of-order
So the UDP test shows a similar result to the multi-threaded TCP test and the earlier test done on my wrt1200ac home router. The only dramatic difference is the single-threaded TCP iperf test against pfsense, where I could not break even 500 Mbit/s no matter what options and parameters I used for iperf. I also tried iperf3 on pfsense, without any improvement.
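As a sanity check on the figures quoted above, the bandwidth column can be recomputed from the transfer size and the interval. This is only a sketch: it assumes iperf counts a MByte as 1024*1024 bytes and a Mbit as 10^6 bits, and that the interval was exactly 10.0 seconds (the real run is usually a little longer, so small differences are expected). The function names are made up for illustration.

```python
# Recompute iperf's reported numbers from the raw figures in the output.
# Assumption: MBytes are 2**20 bytes, Mbits are 10**6 bits.

MIB = 1024 * 1024


def mbits_per_sec(mbytes, seconds):
    """Convert an iperf 'Transfer' figure in MBytes into Mbits/sec."""
    return mbytes * MIB * 8 / seconds / 1e6


def udp_summary(total, lost, payload_bytes, seconds):
    """Return (MBytes received, Mbits/sec, loss percent) for a UDP run."""
    received_bytes = (total - lost) * payload_bytes
    mbytes = received_bytes / MIB
    return mbytes, mbits_per_sec(mbytes, seconds), 100.0 * lost / total


if __name__ == "__main__":
    print("single TCP stream: %.1f Mbits/sec" % mbits_per_sec(490, 10.0))
    print("four TCP streams:  %.1f Mbits/sec" % mbits_per_sec(892, 10.0))
    mbytes, rate, loss = udp_summary(664786, 174, 1470, 10.0)
    print("UDP: %.0f MBytes, %.1f Mbits/sec, %.3f%% lost" % (mbytes, rate, loss))
```

Under those assumptions each recomputed rate lands within about 1 Mbit/s of what iperf printed, so the reported numbers are internally consistent and the gap between the single-stream and multi-stream results is real rather than a reporting quirk.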
So my questions are:
Is this result supposed to be normal?
and if the answer is yes, then
The hardware in my wrt1200ac should, I suppose, be nowhere near as capable as my pfsense box, so why can it achieve a better single-threaded result in the same iperf test?
Many thanks in advance :)
-
"LAN (bridged) ports of pfsense ):"
So you're expecting pfsense to be as fast as a SWITCH? Or to answer traffic sent directly to it as fast as a workstation OS?
Pfsense is meant to be a router/firewall and route/firewall traffic - it's not a switch.. Nor does it have any specific reason to answer traffic directed at its own interfaces very fast.. It's designed to route those packets and firewall them between 2 clients talking..
-
Thank you for the reminder. I deleted the bridge, plugged the linux computer into the LAN port and ran the test again with a plain iperf -c 10.0.1.1 command, but I am still getting around 450 Mbps between pfsense<->linux.
Or maybe I misinterpreted your reply, and you actually mean that there is NO WAY I could achieve a gigabit single-threaded TCP connection to my pfsense box, no matter what? I am a little confused here. Thanks
-
I've found that many of the higher end chips have a lot of small features that can make a huge difference for IO related workloads. Your Celeron may have the clock cycles, but it could be missing a lot of little things that all add up to make a large difference in performance. Another thing to note is that iperf is a userland program. Routing and firewalling run pretty much entirely in the kernel. CPU load for passing traffic may be quite a bit lower than for actually interacting with the traffic.
In my case, my CPU usage was lower doing iperf through my firewall than to my firewall. I had a machine on the WAN side and another on the LAN side. I set up NAT and ran iperf between the two devices. Total bandwidth throughput was a bit over 3Gb/s, because two 1Gb full duplex ports is a maximum of 4Gb/s. My CPU usage was around 15% on my 3.1GHz i5 quad-core Haswell. When I ran iperf between pfSense and my desktop, the bandwidth was about 1.3Gb/s and my CPU was about 20%.
CPU: 3.9% user, 0.0% nice, 45.5% system, 0.4% interrupt, 50.2% idle
50% CPU usage means you have the equivalent of 2 of your 4 cores pegged at 100%, depending on what tool you're using to report that. I'm not very Unix savvy.
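The arithmetic behind that estimate, as a small sketch. It assumes the quoted CPU line is an average over all cores (rather than a per-core figure) and that the box has 4 cores; `busy_cores` is just an illustrative name.

```python
def busy_cores(idle_percent, cores):
    """Average number of fully busy cores implied by an aggregate idle figure.

    Assumption: idle_percent is averaged across all cores, so this says
    nothing about how the load is actually spread between them.
    """
    return (100.0 - idle_percent) / 100.0 * cores


if __name__ == "__main__":
    # 50.2% idle is taken from the CPU line quoted above; 4 cores assumed.
    print("about %.2f cores busy" % busy_cores(50.2, 4))
```

An aggregate figure like this cannot tell one saturated core from load spread thinly over several, so a per-core view would be needed to confirm where the limit is.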
-
Thank you so much for your reply Harvy66. I just connected my linux computer to the WAN port and my windows laptop to the LAN port and ran the iperf test again. Just as you said, with a slightly larger window size (-w 128k) I could easily reach 950+ Mbps with less than 10% CPU usage and around 30% interrupt. The best I could achieve between pfsense itself and a computer on either the WAN or LAN port was still around 500 Mbps with high CPU usage, no matter how large a TCP window I chose.
I also thought that if pfsense can achieve gigabit between WAN and LAN, it doesn't seem reasonable to say it cannot achieve similar throughput between LAN and LAN. I ran iperf again with a 128k window size and this time it reached 800~900 Mbps even with all the LAN ports bridged together. I cannot for the life of me remember why I was not able to reach this speed with the same window size this afternoon; perhaps I in fact forgot to test LAN<->LAN with a different window size, as I naively assumed LAN<->LAN would definitely be slower than LAN<->pfsense, so there was no need for a further test >:(. Your explanation about routing running in kernel space and iperf in user space makes a lot of sense.
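For anyone wondering why the window size matters for the forwarded tests: a single TCP stream cannot move more than one window of data per round trip, so throughput is capped at roughly window / RTT. A rough sketch of that bound follows. The 1 ms round-trip time is purely a hypothetical value chosen for illustration (it was not measured in this thread), and the function name is made up.

```python
def window_limited_mbits(window_bytes, rtt_seconds):
    """Upper bound on single-stream TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


if __name__ == "__main__":
    rtt = 0.001  # hypothetical 1 ms round trip under load, NOT a measured value
    windows = (
        ("85.0 KByte (default)", 85 * 1024),
        ("128 KByte (-w 128k)", 128 * 1024),
    )
    for label, size in windows:
        print("%-22s <= %.0f Mbits/sec" % (label, window_limited_mbits(size, rtt)))
```

With that assumed RTT the default window would cap a stream below gigabit while 128k would not. This bound only concerns traffic passing through the box; it does not explain the ~500 Mbps ceiling when iperf runs on pfsense itself, which did not move with the window size.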
Now I am confident that if I move to a place with gigabit fiber connection this pfsense box surely won't be the bottleneck. Thank you again :D