AMD AES-NI performance issues? Faster when off

  • Whenever I enable AES-NI acceleration on AMD platforms (both my Athlon 5150 and A4-5300), VPN performance drops a lot.


    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc      3410.06k    13240.97k    48273.89k   145732.61k   356466.69k


    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-cbc    265019.69k   364898.52k   453860.48k   480797.35k   487650.65k

    Is anyone else having issues with AES-NI on AMD platforms?

    I'm running pfSense installed directly on the hardware (no virtualization), but this also happens in virtualized environments.


  • AFAIK AES-NI will only help with AES-GCM.

  • I read on this forum that it only works with CBC.

    Anyway, I tried GCM and the results are the same with AES-NI on and off.

  • Yes, that explains the huge performance drop when the option is enabled.

    Thank you!

  • The AES-NI checkbox in the GUI enables AES-NI for AES 128/192/256 CBC via cryptodev. That means that for each block of data to encrypt, the OpenSSL library issues an ioctl to send that block to the kernel, incurring a context-switch penalty. Since the computation being performed is exactly the same as what OpenSSL would do without cryptodev (and in that case, without the context switch), it is necessarily slower; there is no advantage at all in enabling AES-NI via cryptodev. You do not see a penalty for GCM modes because those are not implemented in cryptodev, so OpenSSL continues to use its internal routines.

    So why does the AES-NI kernel module exist at all? If you are using IPsec, which does all of its encryption in the kernel, then you need the AES-NI kernel module to let the IPsec module use AES-NI, and in that case it's a performance gain because everything happens in-kernel. Ideally, pfSense would offer a configuration in which you can load aesni.ko for IPsec without loading cryptodev, so you get the benefits without the drawbacks.

    So when would you ever want cryptodev? The /dev/crypto interface is only worth using with external crypto processors, like the old VIA PadLock or the Hifn cards. Even then, you're generally much better off throwing out such hardware and buying something new if you care at all about crypto performance: the crypto accelerator on the old ALIX boards, for example, was about as fast as a new Raspberry Pi or an APU1 without hardware crypto, and an order of magnitude slower than an APU2. In theory it might also have a benefit for QuickAssist, but I think that's implemented in OpenSSL in a way that avoids using /dev/crypto. There's been speculation over the years that cryptodev might help reduce CPU utilization, but I haven't seen results on modern hardware where any speculative gain outweighs the performance penalty of the context-switching overhead.
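
The context-switch argument can be sanity-checked against the two aes-128-cbc rows from the first post. The following is only a rough sketch in Python: it converts each throughput figure into time per encrypt call and subtracts the two rows. The tables in the first post are unlabelled, so treating the slower row as the cryptodev run is an assumption, and the function names are made up for illustration.

```python
# Block sizes (bytes) and throughput (1000s of bytes/s), copied from the two
# aes-128-cbc rows in the first post. Which row is the cryptodev run is an
# assumption: the slower one is taken to be "AES-NI via cryptodev enabled".
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
SLOW_ROW_K = [3410.06, 13240.97, 48273.89, 145732.61, 356466.69]
FAST_ROW_K = [265019.69, 364898.52, 453860.48, 480797.35, 487650.65]


def microseconds_per_call(block_size, throughput_k):
    """Time per encrypt call, given bytes per call and throughput in 1000s of bytes/s."""
    bytes_per_second = throughput_k * 1000.0
    return block_size / bytes_per_second * 1e6


def per_call_table():
    """Return (block_size, slow_us, fast_us, extra_us) for each block size."""
    rows = []
    for size, slow_k, fast_k in zip(BLOCK_SIZES, SLOW_ROW_K, FAST_ROW_K):
        slow_us = microseconds_per_call(size, slow_k)
        fast_us = microseconds_per_call(size, fast_k)
        rows.append((size, slow_us, fast_us, slow_us - fast_us))
    return rows


if __name__ == "__main__":
    print(f"{'bytes':>6} {'slow us':>9} {'fast us':>9} {'extra us':>9}")
    for size, slow_us, fast_us, extra_us in per_call_table():
        print(f"{size:>6} {slow_us:>9.2f} {fast_us:>9.2f} {extra_us:>9.2f}")
```

With these figures the extra time per call stays within a few microseconds of itself (roughly 4.6 to 6.2) while the block size grows by a factor of 512. That is what you would expect from a mostly fixed per-call cost such as an ioctl round trip, rather than from slower encryption.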