pfSense performance test



  • Hi guys,
    I have two complex questions:

    1. I would like to know whether pfSense (pf and OpenVPN) can properly use multi-core processors.
    2. I would like to understand how to determine the maximum throughput of my firewall (in terms of packet filtering capacity and encryption performance).

    Regarding question 1, I found this page:
    https://doc.pfsense.org/index.php/Does_pfSense_support_SMP_(multi-processor_and/or_core)_systems
    So I think the answer is YES, but searching on Google I found conflicting information, and my own tests also give conflicting results.

    Regarding question 2, in the past I used these two tests:
    A) iperf loopback test
    On one console I run iperf -s and on another console I run iperf -c 127.0.0.1 -P8 -w 128k -t30
    The -P option uses the multi-thread feature, but I think it applies only to iperf in client mode, because only one core runs at 100% while the others run at 20% - 30%. I think the core at 100% is running the iperf server.
    Is it iperf or pf that is unable to use multiple threads/cores?
    I also turned off pf (pfctl -d) and repeated the test. In this test the throughput was about 30% higher than with pf turned on, but still only one core was used at 100%.
    From my experience I can conclude that iperf loopback testing is not the best way to determine the maximum throughput available from my hardware. What test can I run to get this information?

    B) OpenVPN crypto test
    On the console I run:
    openvpn --genkey --secret /tmp/secret
    time openvpn --test-crypto --secret /tmp/secret --verb 0 --tun-mtu 20000 --cipher aes-256-cbc

    In this test, too, I see only one core running at 100%.
    I tried running multiple instances simultaneously, and this way I used all the cores at 100%.
    Each instance / core gave the same performance, so can I use multiple cores only by running multiple instances of OpenVPN?
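
    A minimal sketch of the multi-instance run described above, assuming openvpn is installed and /tmp/secret has already been generated as shown. The helper names timed_run and run_parallel are hypothetical; the command line is the one from the post.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

# Command taken from the post above. Assumption: openvpn is on the PATH and
# /tmp/secret was created with "openvpn --genkey --secret /tmp/secret".
CRYPTO_TEST = ["openvpn", "--test-crypto", "--secret", "/tmp/secret",
               "--verb", "0", "--tun-mtu", "20000", "--cipher", "aes-256-cbc"]

def timed_run(cmd):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

def run_parallel(cmd, instances):
    """Start `instances` copies of cmd at the same time; return each duration."""
    # Threads only wait on the child processes, so the real work still
    # happens in separate OS processes that the scheduler can spread over cores.
    with ThreadPoolExecutor(max_workers=instances) as pool:
        return list(pool.map(timed_run, [cmd] * instances))
```

    Calling run_parallel(CRYPTO_TEST, 4) would start four copies at once. If the per-instance times stay close to the single-instance time as the count approaches the number of cores, the crypto work is probably scaling per process rather than per thread.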

    I compared different processors and got the results you can see in the attached file.
    Analyzing the results, it seems that the processor clock frequency is much more relevant than the number of cores.
    The result is paradoxical: the Intel Celeron G1620 2.7 GHz processor performs similarly to, and has the same throughput as, the Intel Xeon D-1528 1.9 GHz processor.

    From my experience I can conclude that iperf loopback testing is not the best way to determine the maximum throughput available from my hardware.
    OpenVPN is not able to exploit multi-core processors, and the best performance is achieved only by increasing the clock frequency or using AES-NI extensions.

    What do you think?
    What is the best way to measure the maximum throughput of a firewall (in terms of packet filtering capacity and encryption performance)?

    Thanks
    Fabio




  • OpenVPN was limited to a single thread when I last read about it, maybe ~8 months ago.



  • I assume OpenVPN's "single-threaded" limit is per tunnel. I would hope it uses more cores as more tunnels are opened.

    Using the loopback or anything local will not represent load over the network at all. You really need to run iperf through pfSense, never on it or to it. You may also want to reduce the iperf MTU if possible. TCP and UDP will give different performance characteristics. Actual network performance is typically measured in packets per second. Packet size accounts for only a minor portion of the work, although it can become memory-bandwidth constrained in certain situations.
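
    To illustrate why packets per second is the more telling figure, here is a small sketch that computes the theoretical Ethernet line rate in pps for a few frame sizes. The function name line_rate_pps is hypothetical; the 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap) come from standard Ethernet framing, not from this thread.

```python
# Per-frame overhead on the wire that is not part of the Ethernet frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

def line_rate_pps(link_bps, frame_bytes):
    """Theoretical maximum frames per second on an Ethernet link."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)

# Compare small, medium and full-size frames on a 1 Gbps link.
for frame in (64, 512, 1518):
    pps = line_rate_pps(1_000_000_000, frame)
    print(f"{frame:>5}-byte frames at 1 Gbps: {pps:,.0f} pps")
```

    At the same bit rate, 64-byte frames mean roughly 18 times more packets per second than 1518-byte frames, which is why a test with large packets alone can make a firewall look faster than it is.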



    1. I would like to know whether pfSense (pf and OpenVPN) can properly use multi-core processors.

    These two pieces of software come from different vendors or code writers, so please treat them as two separate
    parts that we have to talk about: OpenVPN can't speed up pfSense and vice versa! Based on the information
    that pfSense version 3.0 will be completely rewritten, it is reasonable to think that this version will then be
    capable of using multiple CPU cores! (my opinion and view of things)

    There are some hints here and there that let us hope that, in the near or far future, there will be such a version of pfSense:
    …because sometimes I run an advanced variant of "pfSense" and it knows how to use all the cores. ;-)
    SG4860 vs SG-8860 (9 months ago)

    And OpenVPN is only single-threaded, but each tunnel can use one CPU core, and some activities were "merged"
    into one or more activities; that's the only thing that could be done at this time, based on information from OpenVPN.
    Perhaps OpenVPN 3.0 will no longer be only single-threaded, because there are many users who demand this, but who
    really knows?

    2. I would like to understand how to determine the maximum throughput of my firewall (in terms of packet filtering capacity and encryption performance).

    This is not really clear to me, because these are two different things in my eyes:

    • Do you want to get the maximum performance out of your existing hardware?
    • Or do you want to reach the maximum available?

    Regarding question 2, in the past I used these two tests:

    It would be nice to see whether you are able to get two modern PCs or laptops and run this test once over the LAN (LAN-LAN)
    through pfSense and once over the WAN interface (WAN-LAN)! This would be clearer and would match reality more closely.

    B) OpenVPN crypto test

    Why? If you run that test on a single machine, you will see results far away from what you get inside
    a real-world VPN tunnel, and then most users are really pissed off by those results too! OpenSSL is able to use multiple cores
    of a CPU, but OpenVPN is not! And the main thing that test leaves out is that a VPN is a two-ended connection! It
    is also very important that both ends of the connection are fast enough to encrypt and decrypt the tunnel, not just
    one side!

    I compared different processors and got the results you can see in the attached file.
    Analyzing the results, it seems that the processor clock frequency is much more relevant than the number of cores.
    The result is paradoxical: the Intel Celeron G1620 2.7 GHz processor performs similarly to, and has the same throughput as, the Intel Xeon D-1528 1.9 GHz processor.

    Why do you think pfSense and Netgate are betting on the Xeon D platform and not on Celeron G based CPUs?
    Because they scale better? Which do you think would be better if both CPUs were running at 2.7 GHz? For sure, and that
    said, only for the home use case and

    From my experience I can conclude that iperf loopback testing is not the best way to determine the maximum throughput available from my hardware.
    OpenVPN is not able to exploit multi-core processors, and the best performance is achieved only by increasing the clock frequency or using AES-NI extensions.

    AES-NI does not speed up OpenVPN as much as many users think, and that is the reason it is disabled
    by default in pfSense. Multiple OpenVPN tunnels can indeed spread over more CPU cores, each tunnel
    on one CPU core; fairly said, this is not real multi-core usage, but it is still better than all tunnels on one CPU core.

    What do you think?
    What is the best way to measure the maximum throughput of a firewall (as packet filtering capacity and encryption performance)?

    Getting something strong, with enough horsepower but also with power-saving capabilities on top. For me personally this would be
    the Intel Xeon E3 series from v3 on, for sure, or a newer series such as the Xeon D series. For others, or for home usage only, it
    will perhaps be best to go with a Celeron or Core i3, i5 or i7; this might depend on other people's budget, and/or on whether
    performance alone counts or the choice is based on other things.



  • Thanks to everyone for your answers, but I think I have not explained my intention well.
    I do not want to get the maximum performance out of my existing hardware, nor reach the maximum available.

    I'm just looking for a method to establish a "performance index" for the processors that I use with pfSense.
    Another aim is to understand the best way to use processors (similar to the Intel Xeon D) that have a high number of cores and a relatively low clock frequency.

    More specifically, I am interested in the ability to handle traffic (pps or Gbps) and the ability of OpenVPN to encrypt/decrypt traffic.

    I know that the best test requires more than one machine to simulate a real network, as you suggested. To tell the truth, I also run these tests, but besides those results I'm interested in the maximum capacity of the processor (I know this is only theoretical).

    The choice of testing with pf and OpenVPN is dictated by the fact that in my projects these are the two most used/stressed components.

    I often use pfSense not only as a firewall but also as a VLAN router. In these cases it can easily have to handle many tens of Gbps, especially if multiple links are used in LACP. In one project I have a 40 Gbps backbone that carries 20 VLANs, and routing is handled by a pfSense cluster.

    In other situations, I need to manage OpenVPN servers with hundreds of simultaneous connections. So the maximum speed that can be reached on a single point-to-point connection is not important (at first); it is much more important to understand whether the processor can encrypt/decrypt traffic fast enough to serve everyone. If, as it turns out, OpenVPN is not able to exploit multiple cores, then I know that I can achieve the result by creating more OpenVPN servers to take full advantage of the processor.
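
    A rough sketch of that sizing reasoning, assuming each OpenVPN server process is bound to roughly one core. The function name openvpn_instances_needed and all the numbers are hypothetical; the per-instance throughput has to be measured on the real hardware.

```python
import math

def openvpn_instances_needed(clients, per_client_mbps, per_instance_mbps, cores):
    """Estimate how many single-threaded OpenVPN servers are needed.

    per_instance_mbps is the throughput one instance sustains on one core.
    Returns (instances, fits), where fits is False if more instances are
    needed than there are cores to run them on.
    """
    total_mbps = clients * per_client_mbps
    instances = max(1, math.ceil(total_mbps / per_instance_mbps))
    return instances, instances <= cores

# Hypothetical numbers, for illustration only.
instances, fits = openvpn_instances_needed(
    clients=300, per_client_mbps=5, per_instance_mbps=400, cores=6)
print(instances, fits)
```

    This ignores pf, interrupt handling and everything else that competes for the same cores, so it is better read as a lower bound on the instance count than as a guarantee.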

    I often find appliances equipped with numerous network interfaces and/or a 10 Gbps NIC and a poor processor. This is unhelpful for me, because in addition to mounting a 10 Gbps card I need a processor able to handle 10 Gbps of traffic.
    If my processor has a raw capacity of 4 Mbps, mounting two 10 Gbps cards is crazy, and I do not even need to run an iperf test with more than one machine to figure it out; it is enough to have the processor capability measured with a method similar to the iperf loopback test.

    In any case, your responses have confirmed to me that OpenVPN and pf, in current versions, cannot fully use multi-core architectures, so when it comes to getting a lot of power, clock frequency plays a very important role. Is this true?

    Fabio


  • Netgate Administrator

    You need to run at least some actual throughput tests to determine if your indexing test is at all accurate I would say.

    The Xeon-D CPUs you tested both have turbo speeds of 2.5 and 2.6GHz.

    pf is somewhat multithreaded but OpenVPN is not. You are not testing the complete system though so you might hit some other restriction you're not aware of.

    Steve