AES-NI actually slowing things DOWN?!



  • Hi,

    I'm trying to understand hardware acceleration for VPN encryption and have observed the following on my 2.2.1-RELEASE install:

    Without aesni.ko loaded:

    $ /usr/bin/openssl speed -evp aes-256-cbc -elapsed -engine cryptodev
    engine "cryptodev" set.
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-256-cbc for 3s on 16 size blocks: 94426350 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 64 size blocks: 25233829 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 256 size blocks: 6420564 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 1024 size blocks: 1612343 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 8192 size blocks: 201323 aes-256-cbc's in 3.00s
    OpenSSL 1.0.1l-freebsd 15 Jan 2015
    built on: date not available
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: clang
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-256-cbc     503607.20k   538321.69k   547888.13k   550346.41k   549746.01k
    

    But if I now load the kernel module (kldload aesni), throughput drops by up to 50x (at the 16-byte block size)!

    With aesni.ko loaded:

    $ /usr/bin/openssl speed -evp aes-256-cbc -elapsed -engine cryptodev
    engine "cryptodev" set.
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-256-cbc for 3s on 16 size blocks: 1876612 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 64 size blocks: 1797552 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 256 size blocks: 1452274 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 1024 size blocks: 828219 aes-256-cbc's in 3.00s
    Doing aes-256-cbc for 3s on 8192 size blocks: 165276 aes-256-cbc's in 3.01s
    OpenSSL 1.0.1l-freebsd 15 Jan 2015
    built on: date not available
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: clang
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-256-cbc      10008.60k    38347.78k   123927.38k   282698.75k   450141.42k
    

    Why is that?! Thanks for any pointers!

    I'm running a HP Proliant DL380 G6, with this processor:

    $ sysctl hw.model hw.machine hw.ncpu
    hw.model: Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
    hw.machine: amd64
    hw.ncpu: 12
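
    To put a number on the gap, here is a minimal Python sketch (not part of the original test run) that recomputes one table entry from its raw count and then divides the two tables column by column. The helper names are made up for illustration; the figures are copied from the two runs above.

```python
# Figures copied from the two `openssl speed` runs above (in 1000s of bytes/s).
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
WITHOUT_MODULE = [503607.20, 538321.69, 547888.13, 550346.41, 549746.01]
WITH_MODULE = [10008.60, 38347.78, 123927.38, 282698.75, 450141.42]


def throughput_k(ops, block_size, elapsed_s):
    """Throughput in 1000s of bytes per second, as printed by `openssl speed`."""
    return ops * block_size / elapsed_s / 1000.0


def slowdown(without, with_module):
    """Per-block-size ratio of throughput without the module to throughput with it."""
    return [a / b for a, b in zip(without, with_module)]


if __name__ == "__main__":
    # 94426350 operations on 16-byte blocks in 3.00 s reproduces the first table cell.
    print(f"16-byte throughput: {throughput_k(94426350, 16, 3.00):.2f}k")
    for size, ratio in zip(BLOCK_SIZES, slowdown(WITHOUT_MODULE, WITH_MODULE)):
        print(f"{size:>5} bytes: {ratio:.1f}x slower with aesni.ko loaded")
```

    On these figures the ratio is about 50x at 16 bytes but only about 1.2x at 8192 bytes, so the penalty shrinks as the block size grows.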
    

  • Banned

    Wrong test. You cannot use engine for this. Random link: https://bbs.archlinux.org/viewtopic.php?id=173876



  • Thanks for that!

    But I get more or less identical results just using openssl speed -elapsed -evp aes-256-cbc, so without the "engine" setting.

    But generally, on my system, should I select AES-NI in the pfSense System > Advanced > Miscellaneous settings for Crypto Hardware Acceleration or not? And do I need to also set it (or not set it) on the OpenVPN server config?


  • Banned

    Please re-read the linked discussion. You simply cannot test this the way you are trying to. OpenSSL uses AES-NI by default when it is available, so the command you tried is NOT a way to test without AES-NI. (And yes, you obviously should configure this correctly in the GUI.)



  • I see, so you are saying that while openssl (as used by me from the command line) uses AES-NI automagically and without loading additional kernel modules, pfSense / OpenVPN itself will not, i.e. I should make those settings in the GUI regardless. Sorry, that wasn't entirely clear to me.



  • But generally, on my system, should I select AES-NI in the pfSense System > Advanced > Miscellaneous settings for Crypto Hardware Acceleration or not? And do I need to also set it (or not set it) on the OpenVPN server config?

    Testing is one way to find out, but in practice, if your CPU comes with the AES-NI
    instruction set, I would activate it everywhere the pfSense WebGUI lets me.

    Perhaps it makes no difference now, but under heavier load pfSense will use it and you will get more
    throughput or other benefits from it.


  • Netgate Administrator

    OpenVPN will use the AES-NI module if it's loaded, regardless of the setting in the OpenVPN config. If the module is not loaded, OpenSSL uses its own internal code.
    My own testing showed that OpenVPN is currently faster without the AES-NI module loaded. It was a pretty crude test though.

    Steve



  • It looks like AES-GCM is accelerated but AES-CBC is not.

    WITH AES-NI: aes-256-cbc       3504.26k    13830.31k    45897.56k   109815.19k   182678.87k
    WITHOUT:     aes-256-cbc       3572.98k    13953.79k    46092.03k   110340.10k   181253.85k
    
    WITH AES-NI: aes-256-gcm      74446.93k   137183.21k   173182.54k   187543.55k   189565.61k
    WITHOUT:     aes-256-gcm      17320.93k    19921.12k    49907.36k    54386.01k    55413.42k
    

    NOTE: I disabled AES-NI by prefixing the openssl commands with the following:

    env OPENSSL_ia32cap="~0x200000200000000"
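
    To see what that mask switches off, the short Python sketch below lists the bits set in 0x200000200000000. The bit-to-feature mapping is taken from OpenSSL's OPENSSL_ia32cap documentation (bit 33 is PCLMULQDQ, bit 57 is AES-NI); the helper names are illustrative only.

```python
# Mask from the env line above; the leading "~" tells OpenSSL to clear these bits.
MASK = 0x200000200000000

# Bit positions as described in OpenSSL's OPENSSL_ia32cap documentation.
# Bits 32..63 of the capability vector correspond to ECX from CPUID leaf 1.
FEATURES = {33: "PCLMULQDQ", 57: "AES-NI"}


def set_bits(value):
    """Return the positions of all 1 bits in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def disabled_features(mask):
    """Name each capability bit that the mask would clear."""
    return [FEATURES.get(bit, f"bit {bit}") for bit in set_bits(mask)]


if __name__ == "__main__":
    print(set_bits(MASK))
    print(disabled_features(MASK))
```

    Because PCLMULQDQ is cleared along with AES-NI, the WITHOUT rows for AES-GCM most likely also lose the carry-less multiply that OpenSSL uses for GHASH, not just the AES instructions.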
    

    NOTE: This testing was done on an RCC-VE 2440 with an Intel Atom C2358 processor and pfSense 2.2.1.

    UPDATE: Okay, it's about to get weird. Unloading the AES-NI module (kldunload aesni) dramatically changes the scenario for AES-CBC (AES-GCM is unchanged):

    WITH AES-NI*: aes-256-cbc     123681.24k   177682.26k   206451.11k   214186.33k   217617.75k
    WITHOUT*:     aes-256-cbc      19431.43k    22690.37k    22996.82k    23134.55k    23251.63k
    
    WITH AES-NI*: aes-256-gcm      77262.42k   137245.82k   173551.53k   187439.10k   190726.14k
    WITHOUT*:     aes-256-gcm      15771.72k    19914.07k    49912.92k    54334.10k    55410.69k
    

    It looks like this issue has been discussed at length on this forum in the past. From that thread, it seems as though the AES-NI acceleration within OpenSSL is better than that of the kernel module, and when the kernel module is loaded, OpenSSL will use that instead. Hopefully this is on their roadmap to address.


  • Netgate Administrator

    Unfortunately you don't see anything like that speed improvement in OpenVPN when you unload the module.  ;)

    Steve



  • OpenVPN uses OpenSSL, so I wonder why? Maybe it doesn't use the EVP library?


  • Netgate Administrator

    The encryption alone is not everything. There is some improvement.

    Steve



  • @mwp821:

    It looks like AES-GCM is accelerated but AES-CBC is not.

    AES-CBC is accelerated (as you've shown by unloading the kernel module), but AES-GCM is faster, for a number of reasons.

    One of these happens to be that calls into the kernel from a userland process are still expensive.

    Someday OpenVPN 2.4 will ship, and pfSense will implement it, and you'll have a better time with AES-GCM: https://community.openvpn.net/openvpn/ticket/301

    But no matter how fast we make the crypto go, at the end of the day, OpenVPN is still going to be hampered by the fact that it's implemented in userspace, and by the TUN/TAP interface. tl;dr: Context switches suck.

    There are plans to address even this, but now is not the time or place to do so.
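
    The cost of those kernel calls can be roughly estimated from the figures in the first post of this thread. The Python sketch below is a back-of-the-envelope calculation only: it assumes the whole difference in time per operation between the two runs is the round trip through /dev/crypto, and the names in it are illustrative.

```python
# (operations, elapsed seconds) per block size, copied from the first post.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
WITHOUT_MODULE = [(94426350, 3.00), (25233829, 3.00), (6420564, 3.00),
                  (1612343, 3.00), (201323, 3.00)]
WITH_MODULE = [(1876612, 3.00), (1797552, 3.00), (1452274, 3.00),
               (828219, 3.00), (165276, 3.01)]


def micros_per_op(ops, elapsed_s):
    """Average wall-clock time of one encrypt call, in microseconds."""
    return elapsed_s / ops * 1e6


def per_call_overhead_us():
    """Extra time per call with aesni.ko loaded, for each block size."""
    return [micros_per_op(*w) - micros_per_op(*wo)
            for w, wo in zip(WITH_MODULE, WITHOUT_MODULE)]


if __name__ == "__main__":
    for size, extra in zip(BLOCK_SIZES, per_call_overhead_us()):
        print(f"{size:>5} bytes: about {extra:.2f} us extra per call")
```

    On those numbers the extra cost is roughly 1.5 to 1.8 microseconds per call for blocks up to 1024 bytes, compared with about 0.03 microseconds to encrypt a 16-byte block in userland.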



  • @gonzopancho:

    But no matter how fast we make the crypto go, at the end of the day, OpenVPN is still going to be hampered by the fact that it's implemented in userspace, and the TUN/TAP interface.

    Thanks Gonzo. Can I call you Gonzo? :-) I appreciate all the information you've shared on this forum; it's been a fun learning process since getting my 2440 just last week! I was running pfSense on an Atom D525 before then but I was definitely not taking advantage of all it has to offer.

    So I take it I shouldn't use OpenVPN when I'm setting up my VPN this weekend. L2TP/IPsec with AES-GCM it is, then.

    Mike

