OpenVPN speed on AES-NI supported CPU



  • Hi all.

    Recently I built a DIY router using a Gigabyte N3150N-D3V motherboard, which has an N3150 CPU. This is a 4-core CPU with AES-NI support (https://ark.intel.com/products/87258/Intel-Celeron-Processor-N3150-2M-Cache-up-to-2_08-GHz). The base frequency is 1.60 GHz. I installed pfSense (nightly 2.4.0.b build). I set up an OpenVPN client (which uses AES-256-CBC, SHA-512 and 1 MiB buffers) and activated 'AES-NI CPU-based acceleration' in 'System / Advanced / Miscellaneous' (the default value was 'BSD crypto device (cryptodev)'). The main page shows 'AES-NI CPU Crypto: Yes (active)' and 'Hardware crypto AES-CBC,AES-XTS,AES-GCM,AES-ICM'.

    Regardless, the openvpn process uses around 25-33% of the CPU while giving around 24 Mbps on a 50 Mbps link. Am I doing something wrong? I expected a CPU with AES-NI support to lower the CPU usage of openvpn, and I think the speed is only about 24 Mbps because of that high CPU usage.

    Thanks in advance.

    P.S. Some testing I did:

    'openssl speed -evp aes-256-cbc' gives me:

    Doing aes-256-cbc for 3s on 16 size blocks: 22791621 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 64 size blocks: 7994445 aes-256-cbc's in 2.99s
    Doing aes-256-cbc for 3s on 256 size blocks: 2301198 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 1024 size blocks: 607224 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 8192 size blocks: 75187 aes-256-cbc's in 3.03s
    OpenSSL 1.0.2k-freebsd  26 Jan 2017
    built on: date not available
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: clang
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-256-cbc     121555.31k   170993.46k   196368.90k   207265.79k   203194.03k
    


  • @pf_newbie:

    Recently I built a DIY router using a Gigabyte N3150N-D3V motherboard, which has an N3150 CPU. This is a 4-core CPU with AES-NI support (https://ark.intel.com/products/87258/Intel-Celeron-Processor-N3150-2M-Cache-up-to-2_08-GHz). The base frequency is 1.60 GHz. I installed pfSense (nightly 2.4.0.b build). I set up an OpenVPN client (which uses AES-256-CBC, SHA-512 and 1 MiB buffers) and activated 'AES-NI CPU-based acceleration' in 'System / Advanced / Miscellaneous' (the default value was 'BSD crypto device (cryptodev)'). The main page shows 'AES-NI CPU Crypto: Yes (active)' and 'Hardware crypto AES-CBC,AES-XTS,AES-GCM,AES-ICM'.

    It hasn't been necessary to play with any settings to get AES-NI acceleration in years. People should just stay out of the crypto acceleration screen at this point; don't be confused by guides written a decade ago.

    Regardless, the openvpn process uses around 25-33% of the CPU while giving around 24 Mbps on a 50 Mbps link. Am I doing something wrong? I expected a CPU with AES-NI support to lower the CPU usage of openvpn, and I think the speed is only about 24 Mbps because of that high CPU usage.

    Thanks in advance.

    P.S. Some testing I did:

    'openssl speed -evp aes-256-cbc' gives me:

    The benchmark numbers look reasonable. If possible, try switching to AES-128-GCM. Note that in your scenario the crypto bottleneck is the SHA-512, at somewhere around 50 MByte/s, not the AES itself (GCM mode doesn't need SHA). SHA-1 would increase performance, but there is a security tradeoff there; ditto for switching to AES-128-CBC. (Note the mode: AES-128-GCM is roughly as secure as AES-128-CBC.) In general you should be able to go a bit faster, but you'll bottleneck inside openvpn long before you hit the theoretical crypto limits of the hardware. (The N3150 just isn't clocked very high.)
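
    To put the figures above on the same scale as the link speed, here is a minimal Python sketch that converts them to Mbit/s. It uses the 1024-byte aes-256-cbc result from the benchmark in the first post and treats the "around 50MByte/s" SHA-512 figure as exactly 50,000 thousand bytes per second, which is an assumption made only for illustration. The function and variable names are hypothetical.

```python
# Convert `openssl speed` figures (reported in 1000s of bytes per second)
# into Mbit/s so they can be compared with the tunnel throughput.

def kbytes_per_s_to_mbit(k_bytes_per_s: float) -> float:
    """Take a value in thousands of bytes per second, return megabits per second."""
    return k_bytes_per_s * 1000 * 8 / 1_000_000

# Figures quoted in this thread.
aes_256_cbc_1024 = 207265.79  # aes-256-cbc, 1024-byte blocks, from the benchmark
sha512_estimate = 50_000.0    # assumption: "around 50MByte/s" taken as exactly 50 MB/s
observed_vpn_mbit = 24.0      # tunnel throughput reported by the original poster

aes_mbit = kbytes_per_s_to_mbit(aes_256_cbc_1024)
sha_mbit = kbytes_per_s_to_mbit(sha512_estimate)

print(f"AES-256-CBC benchmark: {aes_mbit:.0f} Mbit/s")
print(f"SHA-512 estimate:      {sha_mbit:.0f} Mbit/s")
print(f"Observed tunnel:       {observed_vpn_mbit:.0f} Mbit/s")
```

    Under these assumptions both crypto figures come out far above the observed 24 Mbps, which is consistent with the point that the limit is reached inside openvpn well before the raw crypto throughput of the CPU.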



  • This is exactly the thread I was looking for, as I want to order a small AES-NI enabled box for pfSense that can reach at least 75 Mbps on OpenVPN (256). I am willing to wait if pfSense needs some time/tuning to improve its performance, but I don't want to be limited by the hardware.

    Can anyone give me a hint on a cheap box that meets these requirements?

    • 4 NICs, preferably Intel ones
    • AES-NI enabled CPU, pfSense 2.4 ready
    • fanless, or with a fan that starts only when it gets hot (I hate noise)
    • available in the UK or shippable to it

    Regarding your results, am I wrong to assume that OpenVPN is probably saturating one of the cores and not using any of the others, and that this is what limits performance? Does this mean that with two OpenVPN connections in parallel you could roughly double your speed? I am wondering how hard it would be to do this with pfSense, especially ensuring that those connections run on different cores and that the routing balances the traffic between the two.
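
    The single-core question can be checked with some quick arithmetic. The sketch below assumes the reported 25-33% is a whole-system figure averaged over all 4 cores, which the original post does not state, and relies on OpenVPN 2.x handling each tunnel in a single thread. The helper name is hypothetical.

```python
def per_core_load(total_cpu_percent: float, cores: int) -> float:
    """Translate a whole-system CPU percentage into percent of a single core."""
    return total_cpu_percent * cores

cores = 4  # N3150 core count, from the first post

# Assumption: 25% and 33% are system-wide figures, not per-core readings.
for reported in (25.0, 33.0):
    one_core = per_core_load(reported, cores)
    print(f"{reported:.0f}% of {cores} cores = {one_core:.0f}% of one core")
```

    If that assumption holds, a result at or above 100% of one core would be consistent with a single saturated core. If the figure was already per-core, the conclusion does not follow.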

    PS. Please don't propose repurposing other hardware; I do value having a small box.

    Thanks
    Sorin



  • @pf_newbie:

    Weird. With my small box based on the same CPU, I have no problem reaching 120 Mbps with OpenVPN (256) when connecting to PIA, IPVanish or PureVPN.

    https://forum.pfsense.org/index.php?topic=115673.0

    https://it.aliexpress.com/item/New-Braswell-mini-pc-M150S-with-2G-ram-8G-SSD-celeron-N3150-Dual-H-D-M/32533935685.html



  • FWIW, I just tried with my firewall, an HP 5150 computer with an AMD64 3200+ CPU and 4 GB of memory, running 2.3.4-RELEASE-p1. I ran speedtest.net on my notebook computer, connected to my local LAN. I verified that the test was going through the VPN and not direct to my firewall. I have a 60/10 service from my ISP, but generally get mid 70s down and about 11 Mb up. Running speedtest, I couldn't notice any difference between going through the VPN and not. Any difference was within the normal variation on speedtest.net. During the download the CPU was running at about 70%, and at about 30% on upload. OpenVPN is configured for AES-256-CBC and a 2048-bit static key.



  • Figured I'd post my results from tonight…

    SG-4860 w/ 4 tunnels in a load-balanced gw group spread across 2 WANs. NordVPN. 256k buffer, comp-lzo, fast-io + RDRAND.

    Was able to sustain 250Mbit/s with CPU load between 9-12%
    Pretty happy with this, but will continue striving for higher highs.
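
    For a rough sense of what each tunnel carries in the result above, here is a small Python sketch. It assumes the gateway group spreads traffic evenly across the 4 tunnels, which is not stated and may not hold in practice. The function name is hypothetical.

```python
def per_tunnel_mbit(aggregate_mbit: float, tunnels: int) -> float:
    """Average throughput per tunnel, assuming an even split across tunnels."""
    return aggregate_mbit / tunnels

aggregate = 250.0  # sustained Mbit/s reported above
tunnels = 4        # tunnels in the load-balanced gateway group

print(f"{per_tunnel_mbit(aggregate, tunnels):.1f} Mbit/s per tunnel on average")
```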