OpenVPN 2.4 AES-NI speed
-
Upgraded my Super Micro C2758 a few hours ago and ran some OpenVPN 2.4 speedtests.
I tried a UDP server with TLS encryption and authentication, DH set to ECDH only, the default ECDH curve, AES-256-GCM, and no LZO compression. OpenVPN runs on 127.0.0.1 with port forwards.
The client was the latest Tunnelblick beta switched to OpenVPN 2.4 with OpenSSL (I also tried my Windows 10 VM with the native OpenVPN client and an SMB transfer).
The iperf3 server is in the LAN; a 2014 MacBook Pro is connected to my WAN switch.
Tests were run with 3 streams, first with and then without -R. I rebooted after turning AES-NI on or off.
With the above OpenVPN parameters I'm getting roughly 250 Mbit/s when downloading from the LAN and 200 Mbit/s when uploading to the LAN. It makes no difference whether AES-NI is enabled or not; OpenVPN always uses 100% of one core.
Am I using the wrong config for AES-NI to work?
-
Something's not right with AES-NI support in 2.4. I dropped from 250Mbps+ throughput to <100Mbps on the exact same link with no changes, except 2.4 wouldn't let me select AES-NI acceleration in my client settings. Something about merging BSD cryptodev into the kernel. Nobody admitted there was a problem though…
-
Something's not right with AES-NI support in 2.4. I dropped from 250Mbps+ throughput to <100Mbps on the exact same link with no changes, except 2.4 wouldn't let me select AES-NI acceleration in my client settings. Something about merging BSD cryptodev into the kernel. Nobody admitted there was a problem though…
same here, from 245Mbps+ to 97Mbps
-
There has to be a better way than speed tests to determine whether AES-NI is being used, as throughput testing depends on network conditions.
Also, in FreeBSD 11 some TCP tunables have different defaults.
Can anyone post the link to the cryptodev merge? As cryptodev slows things down, it should not be enabled.
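One network-independent check, as a rough sketch: run `openssl speed -evp aes-256-gcm` twice, once normally and once with the `OPENSSL_ia32cap` environment variable masking out the AES-NI and PCLMULQDQ CPU capability bits, then compare the two throughput rows. This assumes an x86 box with the `openssl` binary on the PATH; the helper names `build_runs` and `run_all` are made up for illustration. It only shows what the OpenSSL library does on that CPU, not what a particular OpenVPN process is doing.

```python
import os
import subprocess

# Mask described in the OpenSSL OPENSSL_ia32cap documentation: clears the
# AES-NI and PCLMULQDQ capability bits so OpenSSL falls back to software AES.
NO_AESNI_MASK = "~0x200000200000000"


def build_runs(cipher="aes-256-gcm", base_env=None):
    """Return (label, argv, env) for a default run and an AES-NI-masked run."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop("OPENSSL_ia32cap", None)  # make sure the default run is unmasked
    masked = dict(env, OPENSSL_ia32cap=NO_AESNI_MASK)
    argv = ["openssl", "speed", "-evp", cipher]
    return [("default", argv, env), ("aesni-masked", argv, masked)]


def run_all():
    """Run both benchmarks and print the last line of each result table."""
    for label, argv, env in build_runs():
        result = subprocess.run(argv, env=env, capture_output=True, text=True)
        lines = result.stdout.strip().splitlines()
        print(label, "->", lines[-1] if lines else result.stderr.strip())
```

Calling `run_all()` takes a little while because `openssl speed` times several block sizes. If the masked run is clearly slower than the default run, the library is using the hardware instructions by default.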
-
There has to be a better way than speed tests to determine whether AES-NI is being used, as throughput testing depends on network conditions.
Also, in FreeBSD 11 some TCP tunables have different defaults.
Can anyone post the link to the cryptodev merge? As cryptodev slows things down, it should not be enabled.
I did A LOT of testing, and my VPN provider had no issues with throughput. See, in 2.3.2, I'd have AES-NI acceleration enabled in System>Advanced, and select the "bsd cryptodev" in the OpenVPN client settings. 2.4 wouldn't let me select anything in the client.
The link to the ticket is here, I believe: https://redmine.pfsense.org/issues/5976
-
AES-NI is basically impossible to turn off in OpenSSL+OpenVPN. The old button in pfsense just confused a lot of people into turning on cryptodev, which used AES-NI in a different way and which was actually slower than the built-in mechanism that didn't need anything selected. So there may be a problem, but it's not because you can't shoot yourself in the foot with cryptodev.
-
Furthermore, we should not load the AES-NI module at all.
Leaving everything at default might give the best AES-NI acceleration for OpenVPN:
https://www.reddit.com/r/PFSENSE/comments/5nnwzy/openvpn_throughput_question/
https://www.reddit.com/r/PFSENSE/comments/5lric3/aesni_not_selectable_in_24_beta/
Turning off AES-NI gave me another 15 Mbit/s in "real life" usage; I'm at 115 Mbit/s now.
Anyway, the only really speedy VPN solution I tested over the last years was Softether. You can select the number of concurrent HTTPS connections there. No problems maxing out my 400 MBit line with this thing.
-
Furthermore, we should not load the AES-NI module at all.
You need the aesni.ko module for AES accelerated ipsec. The key is that you don't need the cryptodev.ko module. (Unless you're running something exotic and in most cases obsolete.)
-
Ok, I had a read of https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux which is interesting to say the least, as some of the information there was new to me.
What is interesting is that the document states that to use AES-NI offloading one should add the 'engine aesni' flag to OpenVPN.
I checked the generated config on my pfsense unit and the flag is not set.
First I enabled jumbo frames and wow.
These results are on my braswell N3150 unit.
Previous settings using mtu 1500
root@PFSENSE etc # iperf3 -c 192.168.0.1
Connecting to host 192.168.0.1, port 5201
[  4] local 192.168.0.2 port 27231 connected to 192.168.0.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.13 MBytes  9.47 Mbits/sec    6    130 MBytes
[  4]   1.00-2.00   sec  1.11 MBytes  9.34 Mbits/sec    0    148 MBytes
[  4]   2.00-3.00   sec  1.28 MBytes  10.8 Mbits/sec    0    164 MBytes
[  4]   3.00-4.00   sec  1.40 MBytes  11.8 Mbits/sec    0    182 MBytes
[  4]   4.00-5.00   sec  1.57 MBytes  13.2 Mbits/sec    0    200 MBytes
[  4]   5.00-6.00   sec  1.69 MBytes  14.1 Mbits/sec    0    216 MBytes
[  4]   6.00-7.02   sec  1.29 MBytes  10.7 Mbits/sec    1   1.60 MBytes
[  4]   7.02-8.00   sec   278 KBytes  2.31 Mbits/sec  178   12.8 MBytes
[  4]   8.00-9.00   sec   267 KBytes  2.19 Mbits/sec    0   45.1 MBytes
[  4]   9.00-10.00  sec   459 KBytes  3.75 Mbits/sec    0   62.9 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  10.5 MBytes  8.77 Mbits/sec  185             sender
[  4]   0.00-10.00  sec  10.1 MBytes  8.49 Mbits/sec                  receiver
iperf Done.
new results using mtu 6000 and disabled fragmentation
root@PFSENSE etc # iperf3 -c 192.168.0.1
Connecting to host 192.168.0.1, port 5201
[  4] local 192.168.0.2 port 10646 connected to 192.168.0.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   892 KBytes  7.31 Mbits/sec    0    636 MBytes
[  4]   1.00-2.00   sec  1.36 MBytes  11.4 Mbits/sec    0    941 MBytes
[  4]   2.00-3.00   sec  1.88 MBytes  15.8 Mbits/sec    0   1.25 GBytes
[  4]   3.00-4.00   sec  1.81 MBytes  15.2 Mbits/sec    3    723 MBytes
[  4]   4.00-5.00   sec   854 KBytes  6.99 Mbits/sec    1    643 MBytes
[  4]   5.00-6.00   sec  1.34 MBytes  11.2 Mbits/sec    0    982 MBytes
[  4]   6.00-7.00   sec  1.96 MBytes  16.4 Mbits/sec    0   1.29 GBytes
[  4]   7.00-8.00   sec  1.49 MBytes  12.5 Mbits/sec    7    237 MBytes
[  4]   8.00-9.00   sec   964 KBytes  7.91 Mbits/sec    0    677 MBytes
[  4]   9.00-10.00  sec  1.40 MBytes  11.7 Mbits/sec    0   1016 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  13.9 MBytes  11.6 Mbits/sec   11             sender
[  4]   0.00-10.00  sec  13.6 MBytes  11.4 Mbits/sec                  receiver
iperf Done.
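To compare two runs like these without eyeballing the tables, a small sketch that pulls the sender and receiver summary rates out of iperf3's text output and computes the relative change. The regex assumes the summary rows are reported in Mbits/sec, as they are here; `summary_mbits` and `percent_change` are illustrative names, not part of iperf3.

```python
import re

# Matches iperf3 summary rows such as
# "[ 4] 0.00-10.00 sec 10.5 MBytes 8.77 Mbits/sec 185 sender"
# (the retransmit count is present only on the sender row).
SUMMARY = re.compile(r"([\d.]+)\s+Mbits/sec(?:\s+\d+)?\s+(sender|receiver)")


def summary_mbits(text):
    """Return {'sender': rate, 'receiver': rate} in Mbit/s from iperf3 output."""
    return {role: float(rate) for rate, role in SUMMARY.findall(text)}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Summary rows copied from the two runs above.
mtu1500 = ("[ 4] 0.00-10.00 sec 10.5 MBytes 8.77 Mbits/sec 185 sender\n"
           "[ 4] 0.00-10.00 sec 10.1 MBytes 8.49 Mbits/sec receiver")
mtu6000 = ("[ 4] 0.00-10.00 sec 13.9 MBytes 11.6 Mbits/sec 11 sender\n"
           "[ 4] 0.00-10.00 sec 13.6 MBytes 11.4 Mbits/sec receiver")

before = summary_mbits(mtu1500)
after = summary_mbits(mtu6000)
for role in ("sender", "receiver"):
    print(role, f"{percent_change(before[role], after[role]):.1f}% change")
```

For these two runs the gain works out to roughly a third, but each is a single 10-second run with retransmits, so the comparison is indicative at best.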
Now the next issue is that my pfSense unit is not cpu bound.
Adding 'engine aesni' prevents OpenVPN from starting, so I believe that document is now out of date and that AES-NI is used by default when supported.
I then switched Intel RDRAND to no acceleration in the GUI settings.
Results were the same, allowing for random variance.
Finally I compared cpu usage during the test run. It seems the same regardless of what the option is set to although I did see higher spikes with it off but not sustained usage.
In my case around 11mbit/sec of throughput uses about 1.5-2% sustained cpu usage, this is using AES 256-GCM and SHA1 hash.
Compared to my previous vpn endpoint my asus ac68 which was overclocked 50% over its stock settings, that used about 70% cpu usage for just 6mbit/sec throughput.
The performance difference is way bigger than the raw cpu power difference so this suggests on my unit AES offloading is working.
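The comparison above can be put into numbers as CPU cost per unit of throughput. A rough sketch using only the figures quoted in this post; it assumes the CPU percentages on the two devices are measured the same way, which may not hold exactly.

```python
def cpu_per_mbit(cpu_percent, mbit_per_sec):
    """CPU percentage points spent per Mbit/s of VPN throughput."""
    return cpu_percent / mbit_per_sec


# N3150: 1.5-2% CPU at about 11 Mbit/s (both ends of the quoted range).
n3150_low = cpu_per_mbit(1.5, 11.0)
n3150_high = cpu_per_mbit(2.0, 11.0)

# Overclocked Asus AC68: about 70% CPU at 6 Mbit/s.
ac68 = cpu_per_mbit(70.0, 6.0)

print(f"N3150: {n3150_low:.2f} to {n3150_high:.2f} % per Mbit/s")
print(f"AC68:  {ac68:.2f} % per Mbit/s")
print(f"AC68 costs {ac68 / n3150_high:.0f}x to {ac68 / n3150_low:.0f}x more CPU per Mbit/s")
```

A gap of well over an order of magnitude per Mbit/s is what makes the case that more than raw clock speed is at work.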
I hope this helps others.
-
Ok, I had a read of https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux which is interesting to say the least, as some of the information there was new to me.
What is interesting is that the document states that to use AES-NI offloading one should add the 'engine aesni' flag to OpenVPN.
Note at the bottom where it says "last modified 6 years ago"; out of date documents tend to propagate confused ideas that are really hard to stamp out. (Witness all the people convinced that turning on cryptodev aes makes their cpu perform aes in essentially zero time, because someone screwed up a command line for openssl a decade ago.) In general, if there's a simple configuration tweak that's supposedly a major performance improvement, but years later it isn't the default, there's probably a reason.
First I enabled jumbo frames and wow.
Yup, absolutely killer for running iperf on a LAN. Not great for real world use.
-
AES 256-GCM and SHA1 hash
SHA1 for the data channel is not used in this case, OpenVPN will ignore it because AES-GCM already includes hashing.
Side note:
Selecting a different --cipher, AES-CBC for instance, can lead to it being ignored if the client (2.4) supports a "better" cipher, AES-GCM.
This behaviour is called NCP (Negotiable Crypto Parameters) and if one specifically wants to have AES-CBC then one has to set
--ncp-disable in config.
See manual 2.4, start reading at "--auth alg"
-
The settings are different now; I was just reporting what was in use at the time of the test.
For reference, I now have it using AES-128-GCM, as I consider AES-256 a waste of resources, and SHA256 for the reason you gave: with GCM, SHA is only used for the control channel, so the impact of strengthening it won't be noticeable.
I am pretty sure AES hardware offloading is working in my case regardless of the settings, though.
-
AES-NI is basically impossible to turn off in OpenSSL+OpenVPN. The old button in pfsense just confused a lot of people into turning on cryptodev, which used AES-NI in a different way and which was actually slower than the built-in mechanism that didn't need anything selected. So there may be a problem, but it's not because you can't shoot yourself in the foot with cryptodev.
How is it that in 2.3, with the AES-NI module and cryptodev selected, I can saturate my 300Mbps connection over VPN, but in 2.4, with any combination of modules loaded/unloaded/selected/unselected, I lose nearly 75% of my throughput through the same link?
-
How is it that in 2.3, with the AES-NI module and cryptodev selected, I can saturate my 300Mbps connection over VPN, but in 2.4, with any combination of modules loaded/unloaded/selected/unselected, I lose nearly 75% of my throughput through the same link?
I don't know, except that it's not because you lack AES-NI. A lot of things changed, and it's really hard to debug your box from here. I've never seen a system where aesni.ko + cryptodev was faster than not, so you could probably speed things up on 2.3 by turning it off. In 2.4 are you running 100% cpu? If not, that would suggest a networking bottleneck somewhere.
-
I don't know, except that it's not because you lack AES-NI. A lot of things changed, and it's really hard to debug your box from here. I've never seen a system where aesni.ko + cryptodev was faster than not, so you could probably speed things up on 2.3 by turning it off. In 2.4 are you running 100% cpu? If not, that would suggest a networking bottleneck somewhere.
In 2.3 turning off cryptodev reduces throughput.
Got dual E5-2670's with 4 cores provisioned (not that it matters for single-threaded), and CPU hits ~50% in 2.3, and when I tried 2.4 it didn't go over 10%, suggesting it wasn't working properly.
-
CPU hits ~50% in 2.3, and when I tried 2.4 it didn't go over 10%, suggesting it wasn't working properly.
That pretty much confirms that it isn't an AES-NI problem, or you'd be pegged at 100% of a core doing crypto operations. So it's most likely either network or OpenVPN config related.
-
CPU hits ~50% in 2.3, and when I tried 2.4 it didn't go over 10%, suggesting it wasn't working properly.
That pretty much confirms that it isn't an AES-NI problem, or you'd be pegged at 100% of a core doing crypto operations. So it's most likely either network or OpenVPN config related.
Depends on how you're viewing CPU usage. In a dual core box, 50% could mean one core is 100% utilized in some utilities.
-
CPU hits ~50% in 2.3, and when I tried 2.4 it didn't go over 10%, suggesting it wasn't working properly.
That pretty much confirms that it isn't an AES-NI problem, or you'd be pegged at 100% of a core doing crypto operations. So it's most likely either network or OpenVPN config related.
Depends on how you're viewing CPU usage. In a dual core box, 50% could mean one core is 100% utilized in some utilities.
Yeah, but 10% utilization might indicate a cpu-bottlenecked process on a 10 core system, not a 4 core system.
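The arithmetic behind this exchange, as a minimal sketch: it assumes the utility reports one percentage averaged over all cores, and `busy_cores` is just an illustrative name.

```python
def busy_cores(total_percent, cores):
    """Convert an all-cores-averaged CPU percentage into 'cores fully busy'."""
    return total_percent / 100.0 * cores


# 50% on a dual-core box can be one core fully pegged.
print(busy_cores(50, 2))
# 10% would mean one pegged core only on a 10-core system...
print(busy_cores(10, 10))
# ...on 4 cores it is well under half of one core.
print(busy_cores(10, 4))
# 50% on 4 cores is two cores' worth, more than one single-threaded process can use.
print(busy_cores(50, 4))
```

Since OpenVPN is single-threaded, a value near 1.0 points to a CPU-bound tunnel, while a value well below 1.0 points elsewhere.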
Rereading, this talk of "provisioning" suggests a VM is involved. Possible regression in that area?
-
A little OT regarding AES-NI, but I just came across those and put them into the Custom Options box:
sndbuf 393216; rcvbuf 393216; push "sndbuf 393216"; push "rcvbuf 393216";
Source:
http://winaero.com/blog/speed-up-openvpn-and-get-faster-speed-over-its-channel/
Doubled my Windows OpenVPN speed over the Internet. Now I'm close to the max. 200 Mbit/s that I normally only get when connecting directly via Ethernet to my WAN switch :)
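One possible way to reason about buffer sizes like these is the bandwidth-delay product: how much data is in flight on a link at a given rate and round-trip time. The sketch below is a rule of thumb only. Whether the tunnel's socket buffers really need to hold that much is an assumption, and the 20 ms round-trip time and the 65536-byte comparison buffer are hypothetical values, not taken from the post.

```python
def max_rate_mbit(buffer_bytes, rtt_ms):
    """Rate in Mbit/s if at most one buffer of data is in flight per round trip."""
    return buffer_bytes * 8 / (rtt_ms / 1000.0) / 1e6


def buffer_for(rate_mbit, rtt_ms):
    """Bytes in flight (bandwidth-delay product) at a given rate and RTT."""
    return rate_mbit * 1e6 / 8 * (rtt_ms / 1000.0)


RTT_MS = 20  # hypothetical round-trip time

print(f"393216-byte buffer: {max_rate_mbit(393216, RTT_MS):.0f} Mbit/s")
print(f"65536-byte buffer:  {max_rate_mbit(65536, RTT_MS):.0f} Mbit/s")
print(f"200 Mbit/s at {RTT_MS} ms: {buffer_for(200, RTT_MS):.0f} bytes in flight")
```

Under this model a larger buffer raises the ceiling roughly in proportion, which would fit the speed-up reported here, but it has not been checked against what OpenVPN actually does with these options.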
-
Was curious if anyone else is still experiencing this? Testing OpenVPN with aes-256-gcm on 2.4.0.b.20170311.1958, my C2758 is pegging 1 CPU core and topping out at ~150Mbit/s. Should drastic performance improvements be expected in the future, or do I need to bite the bullet and upgrade to faster hardware to hit ~400Mbit/s? IPsec isn't an option; I need policy-based routing over the VPN.