OpenVPN and AES NI Hardware Crypto Acceleration
-
I added a VPN connection from my pfSense box (FW-7525) to NordVPN following their online tutorial. The VPN connection comes up and works. I'm happy with everything but would like to make sure I've optimized the throughput of my box.
I enabled hardware acceleration and came up with more questions than answers. Despite my efforts, my CPU appears to max out at about 25-30% (one core saturated?) and my VPN speed tops out at 100 Mbps despite having a 300 Mbps connection. I'd like to better understand my results regarding software/hardware crypto acceleration and, hopefully, fix the problem.
Details:
Without VPN:
Current ISP speed without VPN is 300/300Mbps.
CPU usage without VPN is ~15-20% during speed test at 300/300.
With VPN with no hardware acceleration:
Speed over VPN is 100/100Mbps.
CPU usage with VPN is ~25-30% (I assume this means one core is at max).
With VPN with hardware acceleration:
Speed over VPN is 100/100Mbps.
CPU usage with VPN is ~25-30% (I assume this means one core is at max).
At this point it is important to consider whether my VPN provider simply cannot support 300/300. This is definitely a fair question. My assumption is that because one of my CPU cores is maxed out, my box is the limitation on the link.
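As a quick sanity check on that assumption, here is a minimal sketch of the arithmetic. It assumes the dashboard percentage is an average over all four cores and that the OpenVPN data path runs on a single thread; the function name is made up for illustration.

```python
def aggregate_cpu_percent(busy_cores: float, total_cores: int) -> float:
    """Overall CPU % if `busy_cores` cores are fully busy and the rest are idle.

    Assumption: the reported figure is a plain average across all cores.
    """
    return 100.0 * busy_cores / total_cores


# The C2518 in this box reports 4 cores.
one_core = aggregate_cpu_percent(1, 4)
print(f"one saturated core out of 4 -> {one_core:.0f}% overall")
```

Under those assumptions, one pegged core shows up as 25% overall, which lines up with the ~25-30% I see during the VPN speed tests.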
The basic NordVPN setup disables hardware encryption acceleration.
I turned hardware acceleration on under:
System/Advanced/Misc/Cryptographic Hardware: AES-NI BSD Crypto.
Confirmed on System Screen:
pfSense 2.4.4-RELEASE-p3 (amd64)
built on Wed May 15 18:53:44 EDT 2019
FreeBSD 11.2-RELEASE-p10
CPU Type Intel(R) Atom(TM) CPU C2518 @ 1.74GHz
4 CPUs: 1 package(s) x 4 core(s)
AES-NI CPU Crypto: Yes (active)
Hardware crypto AES-CBC,AES-XTS,AES-GCM,AES-ICM
Set OpenVPN to use hardware encryption:
VPN/OpenVPN/CLIENTS/Edit/Hardware Crypto: BSD CryptoDev Engine
I also forced the VPN connection to use AES-256-CBC rather than its default AES-256-GCM because CBC can be hardware accelerated but GCM cannot. (Logs confirm that the encryption method on the link changes as requested.)
Using the console, verify that hardware encryption is available:
openssl engine -c -t
(cryptodev) BSD cryptodev engine
[RSA, DSA, DH, AES-128-CBC, AES-192-CBC, AES-256-CBC]
[ available ]
(rdrand) Intel RDRAND engine
[RAND]
[ available ]
(dynamic) Dynamic engine loading support
[ unavailable ]
Test Hardware Acceleration:
openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 480838 aes-256-cbc's in 0.55s
Doing aes-256-cbc for 3s on 64 size blocks: 463801 aes-256-cbc's in 0.52s
Doing aes-256-cbc for 3s on 256 size blocks: 382791 aes-256-cbc's in 0.50s
Doing aes-256-cbc for 3s on 1024 size blocks: 241160 aes-256-cbc's in 0.25s
Doing aes-256-cbc for 3s on 8192 size blocks: 51912 aes-256-cbc's in 0.08s
OpenSSL 1.0.2o-freebsd 27 Mar 2018
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     14067.95k    57567.54k   195988.99k   987791.36k  5443367.73k
Test without Hardware Acceleration:
openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 3897780 aes-256 cbc's in 3.02s
Doing aes-256 cbc for 3s on 64 size blocks: 1067949 aes-256 cbc's in 3.03s
Doing aes-256 cbc for 3s on 256 size blocks: 276845 aes-256 cbc's in 3.04s
Doing aes-256 cbc for 3s on 1024 size blocks: 190522 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 8192 size blocks: 24472 aes-256 cbc's in 3.05s
OpenSSL 1.0.2o-freebsd 27 Mar 2018
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc     20680.45k    22548.04k    23320.45k    64862.60k    65628.52k
Questions:
- Can I assume based on the console tests and the significant difference in CPU time listed in the encryption tests that the hardware encryption is actually working?
- I see no reduction in CPU loading and little or no change in throughput with or without hardware acceleration. Is this normal for this hardware at these data rates?
- I've tried changing the send and receive buffer sizes from 512 kB to 1 and 2 MB with no apparent effect. However, the log contains a PUSH message that seems to suggest the buffer size is being forced to 512k from the VPN server:
  PUSH: Received control message: 'PUSH_REPLY,redirect-gateway def1,dhcp-option DNS xxx.xxx.xxx.xxx,dhcp-option DNS xxx.xxx.xxx.xxx,sndbuf 524288,rcvbuf 524288,explicit-exit-notify,comp-lzo no,route-gateway xxx.xxx.xxx.xxx,topology subnet,ping 60,ping-restart 180,ifconfig xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx,peer-id 6'
  Is there something else to do in this regard?
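To double-check what the server is actually pushing, the PUSH_REPLY line from the log can be pulled apart with a few lines of Python. This is only a rough sketch: `parse_push_reply` is a hypothetical helper, not part of OpenVPN, and the sample string is the log line above with the redacted address fields left out.

```python
def parse_push_reply(message: str) -> dict:
    """Split a PUSH_REPLY string into {option name: [argument strings]}.

    Hypothetical helper for reading the log line; options that repeat
    (such as dhcp-option) end up with more than one list entry.
    """
    options = {}
    for item in message.strip("'").split(","):
        item = item.strip()
        if not item or item == "PUSH_REPLY":
            continue
        name, _, value = item.partition(" ")
        options.setdefault(name, []).append(value)
    return options


# Log line from this post, minus the redacted xxx.xxx.xxx.xxx fields.
push = ("PUSH_REPLY,redirect-gateway def1,sndbuf 524288,rcvbuf 524288,"
        "explicit-exit-notify,comp-lzo no,topology subnet,ping 60,"
        "ping-restart 180,peer-id 6")

opts = parse_push_reply(push)
for name in ("sndbuf", "rcvbuf"):
    size = int(opts[name][0])
    print(f"{name}: {size} bytes = {size // 1024} KiB")
```

Both buffers come out as 524288 bytes, i.e. 512 KiB, which matches the suspicion that the server-pushed values are replacing my local 1 MB and 2 MB settings. OpenVPN 2.4 documents a `pull-filter` directive for ignoring pushed options; I have not tested whether using it for `sndbuf` and `rcvbuf` changes anything here.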
Sorry for the long post. I tried to be thorough to help in the analysis. I really would like to get better performance out of my box, or at least get enough understanding of what is going on so that if I were to purchase other hardware, I'd end up with something that would work at these speeds.
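One thing worth checking in the benchmark output above: `openssl speed` divides by the CPU time printed at the end of each "Doing ..." line (0.25s, 0.08s, ...), not by the 3 seconds the run was asked to take. The sketch below redoes the arithmetic both ways using only the numbers posted above. The 3-second wall-clock figure is an assumption (I did not time the runs), and `mbit_per_s` is just a helper name for this example.

```python
NOMINAL_SECONDS = 3.0  # assumption: each run lasted the full "for 3s" window

# (block size in bytes, operations completed, CPU seconds reported),
# copied from the two openssl speed outputs in this post.
cryptodev_runs = [(16, 480838, 0.55), (64, 463801, 0.52), (256, 382791, 0.50),
                  (1024, 241160, 0.25), (8192, 51912, 0.08)]
software_runs = [(16, 3897780, 3.02), (64, 1067949, 3.03), (256, 276845, 3.04),
                 (1024, 190522, 3.01), (8192, 24472, 3.05)]


def mbit_per_s(block_bytes: int, ops: int, seconds: float) -> float:
    """Throughput in megabits per second for `ops` blocks in `seconds`."""
    return block_bytes * ops * 8 / seconds / 1e6


print("block  cryptodev/CPU-time  cryptodev/3s-wall  software")
for (size, hw_ops, hw_cpu), (_, sw_ops, sw_cpu) in zip(cryptodev_runs, software_runs):
    by_cpu = mbit_per_s(size, hw_ops, hw_cpu)
    by_wall = mbit_per_s(size, hw_ops, NOMINAL_SECONDS)
    soft = mbit_per_s(size, sw_ops, sw_cpu)
    print(f"{size:>5}  {by_cpu:>18.0f}  {by_wall:>17.0f}  {soft:>8.0f}")
```

For 1024-byte blocks this works out to roughly 7,900 Mbit/s when dividing by CPU time, about 660 Mbit/s over an assumed 3-second window, and about 520 Mbit/s for the software run. If the wall-clock assumption holds, the gap between cryptodev and software is much smaller than the table suggests, and both are well above the 100 Mbps I see over the VPN. Running `openssl speed -elapsed -evp aes-256-cbc` should measure wall-clock time directly instead of relying on this estimate.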
-
-
I had this same issue with Nord. I believe it has something to do with their config. When I set up Nord on my 7th-gen Intel pfSense box, I would get 120 Mbps down before I routed my traffic through it. When I routed my traffic through my Nord connection, I would lose approximately 75 to 80% of my bandwidth. I was told by Nord that it was an OpenVPN issue, so I decided to test with ExpressVPN and did not experience any bandwidth loss there. Take it for what it's worth.