Slow iperf between pfsense and clients?



  • Hi, sorry for posting a stupid question. Here is my pfsense box setup:

    pfSense 2.2.4-amd64, nearly clean install (no QoS/traffic shaping/VPN)
    Intel Celeron N2930
    4GB RAM
    32GB SSD
    JetWay JNF9HG-2930 motherboard with 4 Intel i211AT Gigabit Ethernet onboard ports

    I have also disabled hardware checksum offload, TCP segmentation offload and large receive offload, and set kern.ipc.nmbclusters to 1 million as suggested in the docs.

    Test methods and results (pfSense at 10.0.1.1; clients are Linux at 10.0.1.10 and Windows at 10.0.1.13, both plugged into bridged LAN ports of the pfSense box):

    pf:          iperf -s
    linux:      iperf -c 10.0.1.1

    Client connecting to 10.0.1.1, TCP port 5001
    TCP window size: 85.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 10.0.1.10 port 43836 connected with 10.0.1.1 port 5001
    [ ID] Interval      Transfer    Bandwidth
    [  3]  0.0-10.0 sec  490 MBytes  410 Mbits/sec

    linux:            iperf -s
    windows:        iperf -c 10.0.1.10

    The result is similar: a TCP window size of around 80 KByte and about 400 Mbits/sec throughput.

    I am quite confused by this number, because if I plug the Linux and Windows computers into the LAN ports of my Linksys WRT1200AC home router I can reach 800 Mbits/sec using exactly the same iperf test.

    During the pfsense<-->linux iperf test, CPU consumption is moderate:

    CPU:  3.9% user,  0.0% nice, 45.5% system,  0.4% interrupt, 50.2% idle

    So I ran another pfsense<-->linux iperf test with 4 parallel connections (-P 4):

    iperf -c 10.0.1.1 -P 4
    ------------------------------------------------------------
    Client connecting to 10.0.1.1, TCP port 5001
    TCP window size: 85.0 KByte (default)

    ...
    [SUM]  0.0-10.0 sec  892 MBytes  747 Mbits/sec

    Also, with a UDP test I can come close to maxing out the gigabit port:

    pfsense: iperf -s -u
    linux:    iperf -c 10.0.1.1 -u -b 780M

    Result on the server side:

    [2.2.4-RELEASE][root@pfSense.homenetwork]/root: iperf -s -u
    ------------------------------------------------------------
    Server listening on UDP port 5001
    Receiving 1470 byte datagrams
    UDP buffer size: 41.1 KByte (default)

    [  3] local 10.0.1.1 port 5001 connected with 10.0.1.10 port 50319
    [ ID] Interval      Transfer    Bandwidth        Jitter  Lost/Total Datagrams
    [  3]  0.0-10.0 sec  932 MBytes  782 Mbits/sec  0.020 ms  174/664786 (0.026%)
    [  3]  0.0-10.0 sec  1 datagrams received out-of-order

    So the UDP test gives a similar result to the multi-threaded TCP test and to the earlier test on my WRT1200AC home router. The only dramatic difference is the single-threaded TCP iperf test against pfSense, where I could not break even 500 Mbit/s no matter which iperf options and parameters I used. I also tried iperf3 on pfSense, without any improvement.
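
    The figures iperf printed above can be cross-checked with a little arithmetic. This is only a rough sketch: it assumes iperf 2's convention that "MBytes" means 2^20 bytes while "Mbits/sec" means 10^6 bits per second, and that each run lasted exactly 10.0 s (iperf rounds the interval, so small differences are expected).

```python
MIB = 1024 * 1024  # assumption: iperf 2 "MBytes" are 2**20 bytes


def mbits_per_sec(mbytes, seconds=10.0):
    """Convert an iperf transfer total (MBytes) to decimal Mbits/sec."""
    return mbytes * MIB * 8 / seconds / 1e6


# Transfer totals copied from the runs above.
runs = {
    "TCP, 1 stream": 490,
    "TCP, -P 4": 892,
    "UDP, -b 780M": 932,
}
for name, mbytes in runs.items():
    print(f"{name}: {mbits_per_sec(mbytes):.0f} Mbits/sec")

# UDP server report: lost/total datagrams, 1470-byte payload each.
lost, total, payload = 174, 664786, 1470
udp_mbytes = (total - lost) * payload / MIB  # bytes actually received
loss_pct = 100 * lost / total
print(f"UDP: {udp_mbytes:.0f} MBytes received, {loss_pct:.3f}% loss")
```

    Under those assumptions the computed rates land within about 1 Mbit/s of what iperf reported, and the UDP datagram count matches the 932 MBytes total, so the numbers are at least internally consistent.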

    So my questions are:
    Is this result normal?
    And if the answer is yes:
    The hardware in my WRT1200AC should be nowhere near as capable as my pfSense box, so why does it achieve a better single-threaded result in the same iperf test?

    Many thanks in advance :)


  • LAYER 8 Global Moderator

    "LAN (bridged) ports of pfsense ):"

    So you're expecting pfSense to be as fast as a SWITCH?  Or to answer traffic sent directly to it as fast as a workstation OS?

    pfSense is meant to be a router/firewall and to route/firewall traffic. It's not a switch, nor does it have any specific reason to answer traffic directed at its own interfaces very fast. It's designed to route and firewall packets between 2 clients talking to each other.



  • Thank you for the reminder. I deleted the bridge, plugged the Linux computer into the LAN port and ran the test again with a plain iperf -c 10.0.1.1, but I am still getting around 450 Mbps between pfsense<-->linux.

    Or maybe I misinterpreted your reply, and you actually mean that there is NO WAY to achieve a gigabit single-threaded TCP connection to my pfSense box, no matter what? I am a little confused here. Thanks



  • I've found that many of the higher-end chips have a lot of small features that can make a huge difference for IO-related workloads. Your Celeron may have the clock cycles, but it could be missing a lot of little things that add up to a large difference in performance. Another thing to note is that iperf is a userland program, while routing and firewalling run pretty much entirely in the kernel. CPU load for passing traffic may be quite a bit lower than for actually terminating the traffic.

    In my case, CPU usage was lower running iperf through my firewall than to my firewall. I had one machine on the WAN side and another on the LAN side. I set up NAT and ran iperf between the two devices. Total throughput was a bit over 3 Gb/s, out of a theoretical maximum of 4 Gb/s for two full-duplex 1 Gb ports. My CPU usage was around 15% on my 3.1 GHz quad-core Haswell i5. When I ran iperf between pfSense and my desktop, the bandwidth was about 1.3 Gb/s and CPU usage was about 20%.

    CPU:  3.9% user,  0.0% nice, 45.5% system,  0.4% interrupt, 50.2% idle

    50% CPU usage means you have 2 of your 4 cores pegged at 100%, depending on what tool you're using to report that. I'm not very Unix savvy.
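
    To make that reading concrete, here is a rough sketch. It assumes the quoted CPU line is an aggregate over all cores and that the box has 4 cores; the thread does not confirm which tool or view produced the line.

```python
def busy_core_equivalents(idle_pct, cores):
    """Cores' worth of CPU time in use, given an aggregate idle percentage."""
    return (100.0 - idle_pct) / 100.0 * cores


# 50.2% idle comes from the quoted CPU line; 4 cores is an assumption.
busy = busy_core_equivalents(50.2, 4)
print(f"about {busy:.1f} cores' worth of CPU busy")
```

    The aggregate figure alone cannot distinguish two saturated cores from four half-loaded ones; a per-CPU view (for example top -P on FreeBSD) would be needed to tell them apart.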



  • Thank you so much for your reply, Harvy66. I just connected my Linux computer to the WAN port and my Windows laptop to the LAN port and ran iperf again. Just as you said: with a slightly larger window size (-w 128k) I could easily reach 950+ Mbps with less than 10% CPU usage and around 30% interrupt. The best I could achieve between pfSense and a computer on either the WAN or LAN port was still around 500 Mbps with high CPU usage, no matter how large a TCP window I chose.

    It also occurred to me that if pfSense can achieve gigabit between WAN and LAN, it is not reasonable to say it cannot achieve similar throughput between LAN and LAN. I ran iperf again with a 128k window size and this time it reached 800~900 Mbps even with all the LAN ports bridged together. I cannot for the life of me remember why I did not reach this speed with the same window size this afternoon; perhaps I simply forgot to test LAN<-->LAN with different window sizes, because I naively assumed LAN<-->LAN would definitely be slower than LAN<-->pfSense and so needed no further testing >:(. Your explanation about routing running in kernel space and iperf in user space makes a lot of sense.
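
    The window-size effect follows from a simple bound: a single TCP stream cannot carry more than one window of data per round trip, so throughput is at most window / RTT. A minimal sketch, using hypothetical RTT values since none were measured in this thread:

```python
def max_tcp_mbits(window_bytes, rtt_seconds):
    """Upper bound on single-stream TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def window_needed_kbytes(target_mbits, rtt_seconds):
    """Window (in KByte of 1024 bytes) needed to sustain a target rate."""
    return target_mbits * 1e6 * rtt_seconds / 8 / 1024


# RTT values are assumptions for illustration only; measure with ping
# (ideally while the link is loaded) to get real numbers.
for rtt_ms in (0.5, 1.0, 2.0):
    rtt = rtt_ms / 1000
    for window_kb in (85, 128):
        cap = max_tcp_mbits(window_kb * 1024, rtt)
        print(f"RTT {rtt_ms} ms, window {window_kb} KByte: <= {cap:.0f} Mbits/sec")
```

    With an assumed 1 ms RTT under load, an 85 KByte window caps out near 700 Mbits/sec while 128 KByte clears gigabit, which would be consistent with the jump reported above. It is only an upper bound, and it does not explain the roughly 500 Mbps ceiling when pfSense itself is an endpoint, where the CPU looks like the limit.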

    Now I am confident that if I move to a place with gigabit fiber connection this pfsense box surely won't be the bottleneck. Thank you again  :D

