OpenVPN 2.4 AES-NI speed



  • Upgraded my Super Micro C2758 a few hours ago and ran some OpenVPN 2.4 speed tests.
    I tried a UDP server with TLS encryption and authentication, DH parameters set to ECDH Only, the default ECDH curve, AES-256-GCM, and no LZO compression. OpenVPN listens on 127.0.0.1 with port forwards.
    The client was the latest Tunnelblick beta, switched to its OpenVPN 2.4 OpenSSL build (I also tried my Windows 10 VM with the native OpenVPN client and an SMB transfer).
    The iperf3 server was on the LAN; the MacBook Pro 2014 was connected to my WAN switch.
    Tests were run with 3 streams, first with -R and then without. I rebooted after turning AES-NI on or off.

    With the above OpenVPN parameters I'm getting roughly 250 Mbit/s when downloading from the LAN and 200 Mbit/s when uploading to it. It makes no difference whether AES-NI is enabled or not: OpenVPN always uses 100% of one core.

    Am I using the wrong config for AES-NI to work?



  • Something's not right with AES-NI support in 2.4. I dropped from 250+ Mbps of throughput to under 100 Mbps on the exact same link with no changes, except that 2.4 wouldn't let me select AES-NI acceleration in my client settings. Something about merging BSD cryptodev into the kernel. Nobody admitted there was a problem, though…



  • @BlackDwarf:

    Something's not right with AES-NI support in 2.4. I dropped from 250+ Mbps of throughput to under 100 Mbps on the exact same link with no changes, except that 2.4 wouldn't let me select AES-NI acceleration in my client settings. Something about merging BSD cryptodev into the kernel. Nobody admitted there was a problem, though…

    Same here: from 245+ Mbps down to 97 Mbps.



  • There has to be a better way than speed tests to determine whether AES-NI is being used, as throughput testing depends on network conditions.

    Also, some TCP tunables have different defaults in FreeBSD 11.

    Can anyone post the link to the cryptodev merge? As cryptodev slows things down, it should not be enabled.
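
One way to take the network out of the picture is to benchmark OpenSSL itself, once normally and once with the AES-NI and PCLMULQDQ capability bits masked through the OPENSSL_ia32cap environment variable. The sketch below is only an illustration: the helper names are made up for this example, it assumes an x86 machine with the openssl binary on the PATH, and it measures OpenSSL alone rather than an OpenVPN tunnel.

```python
import os
import subprocess

# Capability bits as documented for OPENSSL_ia32cap on x86:
# bit 57 is AES-NI, bit 33 is PCLMULQDQ (carry-less multiply, used by GCM).
AESNI_BIT = 57
PCLMULQDQ_BIT = 33
DISABLE_MASK = (1 << AESNI_BIT) | (1 << PCLMULQDQ_BIT)


def speed_command(cipher="aes-256-gcm"):
    """Build the `openssl speed` invocation for one EVP cipher."""
    return ["openssl", "speed", "-elapsed", "-evp", cipher]


def speed_env(disable_aesni):
    """Copy the environment, optionally masking AES-NI and PCLMULQDQ."""
    env = dict(os.environ)
    if disable_aesni:
        # A leading "~" tells OpenSSL to clear the listed capability bits.
        env["OPENSSL_ia32cap"] = "~" + hex(DISABLE_MASK)
    return env


def run_comparison(cipher="aes-256-gcm"):
    """Run the benchmark twice; compare the two throughput tables by eye."""
    for disable in (False, True):
        label = "AES-NI masked" if disable else "default"
        print(f"== {label} ==")
        subprocess.run(speed_command(cipher), env=speed_env(disable), check=True)
```

Calling run_comparison() prints both tables. If the default run is several times faster than the masked run, the OpenSSL build is using the hardware instructions without any extra setting.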



  • @chrcoluk:

    There has to be a better way than speed tests to determine whether AES-NI is being used, as throughput testing depends on network conditions.

    Also, some TCP tunables have different defaults in FreeBSD 11.

    Can anyone post the link to the cryptodev merge? As cryptodev slows things down, it should not be enabled.

    I did A LOT of testing, and my VPN provider had no issues with throughput. See, in 2.3.2, I'd have AES-NI acceleration enabled in System>Advanced, and select the "bsd cryptodev" in the OpenVPN client settings. 2.4 wouldn't let me select anything in the client.

    The link to the ticket is here, I believe: https://redmine.pfsense.org/issues/5976



  • AES-NI is basically impossible to turn off in OpenSSL+OpenVPN. The old button in pfsense just confused a lot of people into turning on cryptodev, which used AES-NI in a different way and which was actually slower than the built-in mechanism that didn't need anything selected. So there may be a problem, but it's not because you can't shoot yourself in the foot with cryptodev.



  • Furthermore, we should not load the AES-NI module at all.
    Leaving everything at default might give the best AES-NI acceleration for OpenVPN…

    https://www.reddit.com/r/PFSENSE/comments/5nnwzy/openvpn_throughput_question/
    https://www.reddit.com/r/PFSENSE/comments/5lric3/aesni_not_selectable_in_24_beta/

    Turning off the AES-NI module gave me another 15 Mbit/s in "real life" usage; I'm at 115 Mbit/s now.

    Anyway, the only really speedy VPN solution I have tested over the last few years was SoftEther. You can select the number of concurrent HTTPS connections there. No problems maxing out my 400 Mbit/s line with this thing.



  • @athurdent:

    Furthermore, we should not load the AES-NI module at all.

    You need the aesni.ko module for AES accelerated ipsec. The key is that you don't need the cryptodev.ko module. (Unless you're running something exotic and in most cases obsolete.)



  • OK, I had a read of https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux which is interesting to say the least, as it has some information I was not aware of.

    What is interesting is that the document says that, to use AES-NI offloading, one should add the 'engine aesni' flag to OpenVPN.

    I checked the generated config on my pfsense unit and the flag is not set.

    First I enabled jumbo frames and wow.

    These results are on my braswell N3150 unit.

    Previous results using MTU 1500:

    root@PFSENSE etc # iperf3 -c 192.168.0.1
    Connecting to host 192.168.0.1, port 5201
    [  4] local 192.168.0.2 port 27231 connected to 192.168.0.1 port 5201
    [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
    [  4]   0.00-1.00   sec  1.13 MBytes  9.47 Mbits/sec    6    130 MBytes       
    [  4]   1.00-2.00   sec  1.11 MBytes  9.34 Mbits/sec    0    148 MBytes       
    [  4]   2.00-3.00   sec  1.28 MBytes  10.8 Mbits/sec    0    164 MBytes       
    [  4]   3.00-4.00   sec  1.40 MBytes  11.8 Mbits/sec    0    182 MBytes       
    [  4]   4.00-5.00   sec  1.57 MBytes  13.2 Mbits/sec    0    200 MBytes       
    [  4]   5.00-6.00   sec  1.69 MBytes  14.1 Mbits/sec    0    216 MBytes       
    [  4]   6.00-7.02   sec  1.29 MBytes  10.7 Mbits/sec    1   1.60 MBytes       
    [  4]   7.02-8.00   sec   278 KBytes  2.31 Mbits/sec  178   12.8 MBytes       
    [  4]   8.00-9.00   sec   267 KBytes  2.19 Mbits/sec    0   45.1 MBytes       
    [  4]   9.00-10.00  sec   459 KBytes  3.75 Mbits/sec    0   62.9 MBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Retr
    [  4]   0.00-10.00  sec  10.5 MBytes  8.77 Mbits/sec  185             sender
    [  4]   0.00-10.00  sec  10.1 MBytes  8.49 Mbits/sec                  receiver
    
    iperf Done.
    

    New results using MTU 6000 with fragmentation disabled:

    root@PFSENSE etc # iperf3 -c 192.168.0.1
    Connecting to host 192.168.0.1, port 5201
    [  4] local 192.168.0.2 port 10646 connected to 192.168.0.1 port 5201
    [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
    [  4]   0.00-1.00   sec   892 KBytes  7.31 Mbits/sec    0    636 MBytes       
    [  4]   1.00-2.00   sec  1.36 MBytes  11.4 Mbits/sec    0    941 MBytes       
    [  4]   2.00-3.00   sec  1.88 MBytes  15.8 Mbits/sec    0   1.25 GBytes       
    [  4]   3.00-4.00   sec  1.81 MBytes  15.2 Mbits/sec    3    723 MBytes       
    [  4]   4.00-5.00   sec   854 KBytes  6.99 Mbits/sec    1    643 MBytes       
    [  4]   5.00-6.00   sec  1.34 MBytes  11.2 Mbits/sec    0    982 MBytes       
    [  4]   6.00-7.00   sec  1.96 MBytes  16.4 Mbits/sec    0   1.29 GBytes       
    [  4]   7.00-8.00   sec  1.49 MBytes  12.5 Mbits/sec    7    237 MBytes       
    [  4]   8.00-9.00   sec   964 KBytes  7.91 Mbits/sec    0    677 MBytes       
    [  4]   9.00-10.00  sec  1.40 MBytes  11.7 Mbits/sec    0   1016 MBytes       
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Retr
    [  4]   0.00-10.00  sec  13.9 MBytes  11.6 Mbits/sec   11             sender
    [  4]   0.00-10.00  sec  13.6 MBytes  11.4 Mbits/sec                  receiver
    
    iperf Done.
    

    Now the next issue is that my pfSense unit is not cpu bound.

    Adding 'engine aesni' prevents OpenVPN from starting, so I believe that document is now out of date and that AES-NI is used by default when supported.

    I then switched the GUI setting from Intel RDRAND to no acceleration.

    Results were the same, allowing for random variance.

    Finally I compared CPU usage during the test runs. It seems the same regardless of what the option is set to, although I did see higher spikes with it off, but not sustained usage.

    In my case around 11 Mbit/s of throughput uses about 1.5-2% sustained CPU; this is with AES-256-GCM and a SHA1 hash.

    By comparison, my previous VPN endpoint, an Asus AC68 overclocked 50% over its stock settings, used about 70% CPU for just 6 Mbit/s of throughput.

    The performance difference is far bigger than the raw CPU power difference, which suggests that AES offloading is working on my unit.

    I hope this helps others.
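
The comparison in this post can be put into numbers as throughput per percentage point of CPU. The snippet below is a rough sketch using only the figures quoted above; the function name is made up, and it assumes both CPU percentages were measured the same way (as a share of the whole CPU), which the post does not state.

```python
def mbit_per_cpu_percent(throughput_mbit, cpu_percent):
    """Throughput delivered per percentage point of CPU usage."""
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return throughput_mbit / cpu_percent


# Figures quoted in the post above.
n3150_low = mbit_per_cpu_percent(11, 2.0)    # pessimistic end: 2% CPU
n3150_high = mbit_per_cpu_percent(11, 1.5)   # optimistic end: 1.5% CPU
ac68 = mbit_per_cpu_percent(6, 70)

print(f"N3150: {n3150_low:.2f} to {n3150_high:.2f} Mbit/s per CPU %")
print(f"AC68:  {ac68:.3f} Mbit/s per CPU %")
print(f"Ratio: {n3150_low / ac68:.0f}x to {n3150_high / ac68:.0f}x")
```

On these numbers the N3150 delivers roughly 64 to 86 times more throughput per CPU percent than the AC68. That gap is what the post reads as evidence of hardware AES, although the two devices also differ in architecture and clock speed.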



  • @chrcoluk:

    OK, I had a read of https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux which is interesting to say the least, as it has some information I was not aware of.

    What is interesting is that the document says that, to use AES-NI offloading, one should add the 'engine aesni' flag to OpenVPN.

    Note at the bottom where it says "last modified 6 years ago"; out of date documents tend to propagate confused ideas that are really hard to stamp out. (Witness all the people convinced that turning on cryptodev aes makes their cpu perform aes in essentially zero time, because someone screwed up a command line for openssl a decade ago.) In general, if there's a simple configuration tweak that's supposedly a major performance improvement, but years later it isn't the default, there's probably a reason.

    First I enabled jumbo frames and wow.

    Yup, absolutely killer for running iperf on a LAN. Not great for real world use.



  • AES 256-GCM and SHA1 hash

    SHA1 for the data channel is not used in this case, OpenVPN will ignore it because AES-GCM already includes hashing.

    Side note:
    Selecting a different --cipher, AES-CBC for instance, can lead to it being ignored if the client (2.4) supports a "better" cipher such as AES-GCM.
    This behaviour is called NCP (Negotiable Crypto Parameters); if one specifically wants AES-CBC, one has to set --ncp-disable in the config.
    See the 2.4 manual, starting at "--auth alg".
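
The NCP behaviour described in this post can be modelled in a few lines. This is a deliberately simplified sketch, not OpenVPN's actual negotiation code: the function and parameter names are invented for illustration, and it assumes the server pushes the first entry of its NCP cipher list to any client that supports negotiation.

```python
def effective_data_cipher(configured_cipher, server_ncp_ciphers,
                          client_supports_ncp, ncp_disabled=False):
    """Return the data-channel cipher this simplified model would end up with.

    configured_cipher   -- value of --cipher in the config
    server_ncp_ciphers  -- ordered list, e.g. ["AES-256-GCM", "AES-128-GCM"]
    client_supports_ncp -- True for a 2.4 client, False for an older one
    ncp_disabled        -- True when --ncp-disable is set
    """
    if ncp_disabled or not client_supports_ncp or not server_ncp_ciphers:
        # No negotiation: the configured --cipher is used as-is.
        return configured_cipher
    # Negotiation: the configured --cipher is overridden.
    return server_ncp_ciphers[0]


ncp_list = ["AES-256-GCM", "AES-128-GCM"]
print(effective_data_cipher("AES-256-CBC", ncp_list, True))         # negotiated
print(effective_data_cipher("AES-256-CBC", ncp_list, True, True))   # forced CBC
print(effective_data_cipher("AES-256-CBC", ncp_list, False))        # old client
```

The point is that a --cipher line alone does not tell you which cipher is in use, so it is worth checking the cipher reported in the connection log before comparing benchmark results.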



  • The settings are different now; I was just reporting what was in use at the time of the test.

    For reference, I now use AES-128-GCM, as I consider AES-256 a waste of resources, and SHA256 for the reason you gave: with GCM, SHA is only used for the control channel, so the impact of strengthening it won't be noticeable.

    I am pretty sure AES hardware offloading is working in my case regardless of the settings, though.



  • @VAMike:

    AES-NI is basically impossible to turn off in OpenSSL+OpenVPN. The old button in pfsense just confused a lot of people into turning on cryptodev, which used AES-NI in a different way and which was actually slower than the built-in mechanism that didn't need anything selected. So there may be a problem, but it's not because you can't shoot yourself in the foot with cryptodev.

    How is it that in 2.3, with the AES-NI module and cryptodev selected, I can saturate my 300 Mbps connection over VPN, but in 2.4, with various combinations of modules loaded/unloaded/selected/unselected, I lost nearly 75% of my throughput through the same link?



  • @BlackDwarf:

    How is it that in 2.3, with the AES-NI module and cryptodev selected, I can saturate my 300 Mbps connection over VPN, but in 2.4, with various combinations of modules loaded/unloaded/selected/unselected, I lost nearly 75% of my throughput through the same link?

    I don't know, except that it's not because you lack AES-NI. A lot of things changed, and it's really hard to debug your box from here. I've never seen  a system where aesni.ko + cryptodev was faster than not, so you could probably speed things up on 2.3 by turning it off. In 2.4 are you running 100% cpu? If not, that would suggest a networking bottleneck somewhere.



  • @VAMike:

    I don't know, except that it's not because you lack AES-NI. A lot of things changed, and it's really hard to debug your box from here. I've never seen  a system where aesni.ko + cryptodev was faster than not, so you could probably speed things up on 2.3 by turning it off. In 2.4 are you running 100% cpu? If not, that would suggest a networking bottleneck somewhere.

    In 2.3 turning off cryptodev reduces throughput.
    Got dual E5-2670's with 4 cores provisioned (not that it matters for single-threaded), and CPU hits ~50% in 2.3, and when I tried 2.4 it didn't go over 10%, suggesting it wasn't working properly.



  • @BlackDwarf:

    CPU hits ~50% in 2.3, and when I tried 2.4 it didn't go over 10%, suggesting it wasn't working properly.

    That pretty much confirms that it isn't an AES-NI problem, or you'd be pegged at 100% of a core doing crypto operations. So it's most likely either network or OpenVPN config related.


  • @VAMike:

    @BlackDwarf:

    CPU hits ~50% in 2.3, and when I tried 2.4 it didn't go over 10%, suggesting it wasn't working properly.

    That pretty much confirms that it isn't an AES-NI problem, or you'd be pegged at 100% of a core doing crypto operations. So it's most likely either network or OpenVPN config related.

    Depends on how you're viewing CPU usage. In a dual core box, 50% could mean one core is 100% utilized in some utilities.



  • @jimp:

    @VAMike:

    @BlackDwarf:

    CPU hits ~50% in 2.3, and when I tried 2.4 it didn't go over 10%, suggesting it wasn't working properly.

    That pretty much confirms that it isn't an AES-NI problem, or you'd be pegged at 100% of a core doing crypto operations. So it's most likely either network or OpenVPN config related.

    Depends on how you're viewing CPU usage. In a dual core box, 50% could mean one core is 100% utilized in some utilities.

    Yeah, but 10% utilization might indicate a CPU-bottlenecked process on a 10-core system, not a 4-core system.

    Rereading, this talk of "provisioning" suggests a VM is involved; possible regression in that area?
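
The back-and-forth about percentages comes down to simple arithmetic: a tool that averages over all cores hides a single busy core. The helper below is a minimal sketch (the function name is made up) that turns an all-core average into the highest load one core could be carrying.

```python
def busiest_core_upper_bound(aggregate_percent, n_cores):
    """Highest load a single core could have, given an all-core average."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return min(100.0, aggregate_percent * n_cores)


# Cases raised in the thread.
print(busiest_core_upper_bound(50, 2))   # dual core at 50%: one core may be at 100%
print(busiest_core_upper_bound(10, 4))   # 4 cores at 10%: no core above 40%
print(busiest_core_upper_bound(10, 10))  # 10 cores at 10%: one core may be at 100%
```

So a reading of 10% on a 4-core box cannot come from a single-threaded process pegging a core, while 50% on the same box could.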



  • A little OT regarding AES-NI, but I just came across those and put them into the Custom Options box:

    sndbuf 393216;
    rcvbuf 393216;
    push "sndbuf 393216";
    push "rcvbuf 393216";
    

    Source:
    http://winaero.com/blog/speed-up-openvpn-and-get-faster-speed-over-its-channel/

    That doubled my Windows OpenVPN speed over the Internet. I'm now close to the 200 Mbit/s maximum that I normally only get when connecting directly via Ethernet to my WAN switch :)
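
A rough way to reason about buffer sizes like these is the bandwidth-delay product: if at most one buffer's worth of data can be in flight per round trip, throughput is capped at buffer size divided by round-trip time. The sketch below uses that rule of thumb; the function names and the 20 ms round-trip time are assumptions made for illustration, and the model fits windowed (TCP-style) transfers better than a UDP tunnel, where the socket buffer mainly absorbs bursts.

```python
def max_throughput_mbit(buffer_bytes, rtt_ms):
    """Upper bound in Mbit/s when one buffer is in flight per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return buffer_bytes * 8 / (rtt_ms / 1000) / 1_000_000


def buffer_needed_bytes(target_mbit, rtt_ms):
    """Buffer size in bytes needed to sustain target_mbit at the given RTT."""
    return int(target_mbit * 1_000_000 / 8 * rtt_ms / 1000)


RTT_MS = 20  # hypothetical round-trip time, not taken from the thread

print(f"{max_throughput_mbit(393216, RTT_MS):.1f} Mbit/s with the 393216-byte buffers above")
print(f"{max_throughput_mbit(65536, RTT_MS):.1f} Mbit/s with a 64 KiB buffer")
print(f"{buffer_needed_bytes(200, RTT_MS)} bytes needed for 200 Mbit/s")
```

Under these assumptions a small socket buffer, rather than the cipher, can be the limiting factor on a fast link, which would be consistent with the doubling reported in this post.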



  • Was curious if anyone else is still experiencing this? Testing OpenVPN with AES-256-GCM on 2.4.0.b.20170311.1958, my C2758 is pegging 1 CPU core and tapping out at ~150 Mbit/s tops. Should drastic performance improvements be expected in the future, or do I need to bite the bullet and upgrade to faster hardware to hit ~400 Mbit/s? IPsec isn't an option; I need policy-based routing over the VPN.



  • @diablo266:

    Was curious if anyone else is still experiencing this? Testing OpenVPN with AES-256-GCM on 2.4.0.b.20170311.1958, my C2758 is pegging 1 CPU core and tapping out at ~150 Mbit/s tops. Should drastic performance improvements be expected in the future, or do I need to bite the bullet and upgrade to faster hardware to hit ~400 Mbit/s? IPsec isn't an option; I need policy-based routing over the VPN.

    That's about what you can expect out of a C2758. You can run multiple OpenVPN processes to scale across more cores, at the cost of configuration complexity.



  • On somewhat the same note: I found that running pfSense in a Hyper-V VM negates having AES-NI, even though it shows up. I moved my router to a 7700K, and now my PIA VPN connection gets 100% of my speed. Without a VPN I get 340 Mbps; with the VPN, I still get 340. Inside Hyper-V, I was lucky to get 120 Mbps.

    I will never put pfSense in a VM again, I guess, lol.


  • @diablo266:

    Was curious if anyone else is still experiencing this? Testing OpenVPN with AES-256-GCM on 2.4.0.b.20170311.1958, my C2758 is pegging 1 CPU core and tapping out at ~150 Mbit/s tops. Should drastic performance improvements be expected in the future, or do I need to bite the bullet and upgrade to faster hardware to hit ~400 Mbit/s? IPsec isn't an option; I need policy-based routing over the VPN.

    Try AES-128

    @psulions5:

    I found that running pfSense in a Hyper-V VM negates having AES-NI, even though it shows up. I moved my router to a 7700K, and now my PIA VPN connection gets 100% of my speed. Without a VPN I get 340 Mbps; with the VPN, I still get 340. Inside Hyper-V, I was lucky to get 120 Mbps.

    I will never put pfSense in a VM again, I guess, lol.

    There are plenty of people with working AES-NI in a VM.
    And of course a 7700K maxes out a 340 Mbps VPN; so will a G3950.



  • @pfBasic:

    @diablo266:

    Was curious if anyone else is still experiencing this? Testing OpenVPN with AES-256-GCM on 2.4.0.b.20170311.1958, my C2758 is pegging 1 CPU core and tapping out at ~150 Mbit/s tops. Should drastic performance improvements be expected in the future, or do I need to bite the bullet and upgrade to faster hardware to hit ~400 Mbit/s? IPsec isn't an option; I need policy-based routing over the VPN.

    Try AES-128

    @psulions5:

    I found that running pfSense in a Hyper-V VM negates having AES-NI, even though it shows up. I moved my router to a 7700K, and now my PIA VPN connection gets 100% of my speed. Without a VPN I get 340 Mbps; with the VPN, I still get 340. Inside Hyper-V, I was lucky to get 120 Mbps.

    I will never put pfSense in a VM again, I guess, lol.

    There are plenty of people with working AES-NI in a VM.
    And of course a 7700K maxes out a 340 Mbps VPN; so will a G3950.

    Don't be a buzzkill, I'm excited, haha :). Now if I could get this darn thing to reboot, I'd be in business! :) Only 2.4 supports AES-NI, right?



  • Only 2.4 supports AES-NI, right?

    No.
    OpenVPN uses OpenSSL for the crypto part.
    Support for the AES-NI instruction set was included in OpenSSL 1.0.0.



  • OpenVPN 2.4 adds support for the AES-GCM algorithm, which takes full advantage of the AES-NI hardware acceleration without also requiring the CPU to compute the hash for authentication. Up until OpenVPN 2.4, the only way to use that algorithm with pfSense was IPSEC, I believe. That lets you use your CPU for other functions rather than supporting the VPN connection. (yeah, technically it's all built into the processor, so it's really doing everything anyway, but AES-NI with AES-GCM doesn't affect CPU cycles available for other tasks).



  • @virgiliomi:

    OpenVPN 2.4 adds support for the AES-GCM algorithm, which takes full advantage of the AES-NI hardware acceleration without also requiring the CPU to compute the hash for authentication. Up until OpenVPN 2.4, the only way to use that algorithm with pfSense was IPSEC, I believe. That lets you use your CPU for other functions rather than supporting the VPN connection. (yeah, technically it's all built into the processor, so it's really doing everything anyway, but AES-NI with AES-GCM doesn't affect CPU cycles available for other tasks).

    This is mostly not true/confused. AES-GCM is a new cryptographic mode that combines encryption and authentication instead of using a separate algorithm for authentication. (As was historically the case with AES+SHA1 or AES+SHA256 or AES+UMAC, etc.) GCM is dramatically faster than AES-CBC+HMAC on amd/intel architecture CPUs, especially those with the carry-less multiplication operators (PCLMULQDQ, etc.), because it pipelines well. It is not the case that AES-GCM "uses the AES-NI more", it's that the algorithm is simply more efficient on current CPUs. (The catch is that it's either slower or impossible to implement on other kinds of cryptographic accelerators, so it's generally less efficient on older mobile devices or things like intel's quick assist.) AES-GCM doesn't affect CPU cycles for other tasks any differently than AES-CBC except insofar as it may require fewer cycles. (You may be confusing AES-NI with older architectures which used a distinct processor for crypto: in those, you could do other things with the main CPU while the coprocessor was doing crypto.) You generally won't see a dramatic speedup moving OpenVPN to AES-GCM because its architecture prevents the CPU from being able to really crunch on large blocks of data. It'll be a somewhat more efficient (and more secure) option, but it won't work miracles.

    FWIW, the latest Intel/AMD CPUs include SHA acceleration, so there's hardware acceleration for both encryption and authentication with AES-CBC-SHA1 just as there is with AES-GCM (using AES-NI+PCLMULQDQ). AES-GCM is still faster. The fact that there is a faster cipher mode doesn't make a different cipher mode less accelerated: AES-CBC with AES-NI is still tremendously faster than AES-CBC without AES-NI.

