Performance test: more cores = lower routing performance?



  • We are looking to use pfSense as LAN routers with ACLs in a virtualized environment.

    We made a test set-up.
    The hardware is a Supermicro server with dual Xeon E5-2860 CPUs @ 2.7 GHz (8 cores each, plus HT), 128 GB RAM @ 1333 MHz, and a 10 Gbit Ethernet uplink.
    The server is running Windows Server 2016 Hyper-V.
    All tests are done on the local server; no other VMs are running on that system.

    pfSense version used: 2.3.3, up to date as of April 2017.

    We used 1 pfSense VM as a router (no NAT). No rules are applied other than allow any-any between the 2 subnets.

    2 VMs with Windows Server 2016 and iperf3.

    The Windows VMs each have 4 cores and 4 GB RAM.

    Both Windows and pfSense see their NIC as 10Gbit connected.

    We established a baseline by first connecting the 2 Windows servers to the virtual switch without routing.
    Speed in Gbit/s:
    If we use 1 TCP stream: 2.28
    If we use 2 TCP streams: 3.71
    If we use 3 TCP streams: 4.5
    If we use 4 TCP streams: 4.99
    If we use 5 TCP streams: 5.33
    With more than 5 TCP streams it levels out, and the performance starts to degrade around 10 streams.
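
For anyone who wants to repeat this kind of sweep, here is a minimal sketch of how it could be scripted. It is not the exact procedure used for the numbers above. It assumes iperf3 is on the PATH, that an iperf3 server is already listening on the other VM, and that the address passed to `run_sweep` is a placeholder you replace with your own.

```python
import json
import subprocess


def gbits_from_iperf3_json(text):
    """Return the received TCP throughput in Gbit/s from iperf3 JSON output."""
    report = json.loads(text)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


def run_sweep(server, max_streams=5, seconds=10):
    """Run iperf3 with 1..max_streams parallel TCP streams against `server`.

    `server` is a placeholder for the address of the VM running `iperf3 -s`.
    """
    results = {}
    for streams in range(1, max_streams + 1):
        # -P: number of parallel streams, -t: duration in seconds, -J: JSON output
        cmd = ["iperf3", "-c", server, "-P", str(streams), "-t", str(seconds), "-J"]
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        results[streams] = gbits_from_iperf3_json(out)
    return results


# Example call (hypothetical address, not taken from the test setup):
# for streams, gbits in run_sweep("192.0.2.10").items():
#     print(f"{streams} TCP stream(s): {gbits:.2f} Gbit/s")
```

Using the received-side sum rather than the sent-side sum is a choice made here so that the figure reflects what actually arrived at the far end.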

    Now with pfSense routing (pfSense has only 1 CPU assigned):
    If we use 1 TCP stream: 1.26
    If we use 2 TCP streams: 1.9
    If we use 3 TCP streams: 2.38
    If we use 4 TCP streams: 2.43
    If we use 5 TCP streams: 2.61
    With more than 5 TCP streams it levels out, and the performance starts to degrade around 10 streams.
    The CPU load is around 90-95%

    I thought that if I assigned multiple CPUs to pfSense I should get better performance.
    Now with pfSense routing (pfSense has 2 CPUs assigned):
    If we use 1 TCP stream: 0.99
    If we use 2 TCP streams: 1.3
    If we use 3 TCP streams: 1.45
    If we use 4 TCP streams: 1.53
    If we use 5 TCP streams: 1.49
    With more than 5 TCP streams it levels out, and the performance starts to degrade around 10 streams.
    The CPU load is also around 90-95%
    We see a huge performance drop when assigning 2 CPUs. :o

    With 4 CPUs assigned to pfSense the performance rises a little compared to 2 CPUs, but it is still much worse than the single-CPU setup.
    With 4 CPUs we do see that the CPU load is 45-50%.
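
To put the drop in perspective, the figures posted above can be expressed as percentages of a reference run. This is only arithmetic on the numbers from this post; the 4-CPU run is left out because only its trend is described here.

```python
# Throughput in Gbit/s per number of TCP streams, copied from the post.
baseline = {1: 2.28, 2: 3.71, 3: 4.5, 4: 4.99, 5: 5.33}   # no routing
one_cpu = {1: 1.26, 2: 1.9, 3: 2.38, 4: 2.43, 5: 2.61}    # pfSense, 1 CPU
two_cpu = {1: 0.99, 2: 1.3, 3: 1.45, 4: 1.53, 5: 1.49}    # pfSense, 2 CPUs


def percent_of(reference, measured):
    """Return measured throughput as a percentage of the reference, per stream count."""
    return {n: round(100 * measured[n] / reference[n], 1) for n in reference}


one_cpu_vs_baseline = percent_of(baseline, one_cpu)
two_cpu_vs_baseline = percent_of(baseline, two_cpu)
two_cpu_vs_one_cpu = percent_of(one_cpu, two_cpu)

for n in sorted(baseline):
    print(f"{n} stream(s): 1 CPU = {one_cpu_vs_baseline[n]}% of baseline, "
          f"2 CPUs = {two_cpu_vs_baseline[n]}% of baseline, "
          f"2 CPUs = {two_cpu_vs_one_cpu[n]}% of 1 CPU")
```

With 1 CPU the router delivers roughly half of the unrouted baseline. With 2 CPUs it delivers about 57% to 79% of what the single-CPU router manages, with the gap widening as streams are added.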

    We managed to boost the performance a little bit by disabling the VMQ setting on the NIC in Hyper-V.

    It confuses me that the performance drops when assigning more CPU power to pfSense. I can understand that routing is done by a single process which isn't multithreaded, but then I cannot see why the total CPU load with 2 CPUs is also at 95%; this would indicate that 2 threads are running. With 4 CPUs I get 45-50% CPU load, which would also indicate that 2 threads are being used. Yet with 2 CPUs assigned, and both apparently in use, the performance is still much lower than with a single CPU assigned.
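
The reasoning about thread count can be made explicit with a quick calculation. This is a rough sketch that assumes the reported CPU load is the average across all assigned vCPUs, which is not confirmed anywhere in this thread.

```python
def busy_cores(total_load_percent, vcpus):
    """Estimate how many vCPUs are fully busy, assuming the load is averaged over all vCPUs."""
    return total_load_percent / 100 * vcpus


# (low, high) CPU load in percent per number of assigned vCPUs, from the post.
observed_load = {1: (90, 95), 2: (90, 95), 4: (45, 50)}

estimates = {
    vcpus: (busy_cores(low, vcpus), busy_cores(high, vcpus))
    for vcpus, (low, high) in observed_load.items()
}

for vcpus, (low, high) in estimates.items():
    print(f"{vcpus} vCPU(s): about {low:.2f} to {high:.2f} cores busy")
```

Under that assumption, both the 2-CPU and 4-CPU runs keep roughly two cores busy, while the 1-CPU run keeps just under one core busy.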

    If we switch to production, the setup would be very similar, only with HA (CARP). Does anyone have any idea how we can boost the performance above 2.6 Gbit?

    All my test results can be found in the attached XLS file.

    Iperf-Results.zip



  • I would suspect some kind of configuration problem.
    The network performance between two systems on the same server is basically limited only by the memory bandwidth,
    therefore 5 Gbit seems poor.
    Which version of iperf are you using?

    Have a look at https://www.bsdcan.org/2016/schedule/events/681.en.html

    Be aware that it is not advisable to activate TSO and LRO on a routing device.

    Is it possible to do the following things:

    • Pin CPUs (as HT CPUs will harm the performance)
    • Increase RX and TX queues?