Netgate Discussion Forum

    AES-NI performance

    • LucaTo

      AMD Athlon™ 5350 APU with Radeon(tm) R3
      4 CPUs: 1 package(s) x 4 core(s)
      AES-NI CPU Crypto: Yes (active)

      openssl speed -evp aes-256-cbc
      Doing aes-256-cbc for 3s on 16 size blocks: 52378144 aes-256-cbc's in 3.00s
      Doing aes-256-cbc for 3s on 64 size blocks: 17296394 aes-256-cbc's in 3.00s
      Doing aes-256-cbc for 3s on 256 size blocks: 5031667 aes-256-cbc's in 3.00s
      Doing aes-256-cbc for 3s on 1024 size blocks: 1307810 aes-256-cbc's in 3.00s
      Doing aes-256-cbc for 3s on 8192 size blocks: 165573 aes-256-cbc's in 3.00s
      OpenSSL 1.0.2k-freebsd  26 Jan 2017
      built on: date not available
      options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
      compiler: clang
      The 'numbers' are in 1000s of bytes per second processed.
      type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
      aes-256-cbc     279350.10k   368989.74k   429368.92k   446399.15k   452124.67k
      
      • kejianshi

        You know what?  I still don't know what is good or bad or what these results mean to me in the real world:

        openssl speed -evp aes-256-cbc -elapsed
        You have chosen to measure elapsed time instead of user CPU time.
        Doing aes-256-cbc for 3s on 16 size blocks: 50744813 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 64 size blocks: 13939575 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 256 size blocks: 3914297 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 1024 size blocks: 1010884 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 8192 size blocks: 127631 aes-256-cbc's in 3.00s
        OpenSSL 1.0.2g  1 Mar 2016
        built on: reproducible build, date unspecified
        options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
        compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
        The 'numbers' are in 1000s of bytes per second processed.
        type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
        aes-256-cbc    270639.00k  297377.60k  334020.01k  345048.41k  348517.72k

        cpuid | grep -i aes
              AES instruction                        = true
              AES instruction                        = true
              AES instruction                        = true
              AES instruction                        = true
              AES instruction                        = true
              AES instruction                        = true
              AES instruction                        = true
              AES instruction                        = true

        Interestingly enough, I ran the same test on a VM running on an i7 Q70 that has no AES acceleration at all, and the numbers were about half of what the AES-accelerated chip produced.
        The first test ran on an 8-core AMD 8150, and the second (values all approximately half) ran on a very old, wimpy quad-core i7 with no AES-NI.

        I would expect the AMD to run 2 or 3 times faster even if it had no AES-NI.  Basically, I don't feel these tests mean very much, and I think the only way to gauge performance is an actual throughput test using VPN traffic.
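
For readers with the same question about what the numbers mean: the summary row appears to be derived directly from the counts printed above it, as operations × block size ÷ elapsed seconds, reported in thousands of bytes per second. A minimal Python sketch using the counts from this post (the function name `throughput_k` is made up for illustration):

```python
def throughput_k(operations: int, block_bytes: int, seconds: float) -> float:
    """Return throughput in openssl's 'k' units (1000s of bytes per second)."""
    return operations * block_bytes / seconds / 1000.0


# (block size, operation count) pairs copied from the
# "Doing aes-256-cbc for 3s ..." lines in the post above.
runs = [(16, 50744813), (64, 13939575), (256, 3914297),
        (1024, 1010884), (8192, 127631)]

for block_bytes, operations in runs:
    value = throughput_k(operations, block_bytes, 3.00)
    print(f"{block_bytes:>5} bytes: {value:.2f}k")
```

The printed values reproduce the aes-256-cbc row in the post (270639.00k through 348517.72k), so the table is a per-block-size rate for one thread, not a VPN throughput figure.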

        • VAMike

          @kejianshi:

          type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
          aes-256-cbc    270639.00k  297377.60k  334020.01k  345048.41k  348517.72k
          […]
          Interestingly enough, I ran the same test on a VM running on an i7 Q70 that has no AES acceleration at all, and the numbers were about half of what the AES-accelerated chip produced.
          The first test ran on an 8-core AMD 8150, and the second (values all approximately half) ran on a very old, wimpy quad-core i7 with no AES-NI.

          I would expect the AMD to run 2 or 3 times faster even if it had no AES-NI.  Basically, I don't feel these tests mean very much, and I think the only way to gauge performance is an actual throughput test using VPN traffic.

          The number of cores is irrelevant; it's a single-threaded test. (It's also worth pointing out that your Bulldozer-era chip isn't really 8 cores; it's 4 cores that have a multi-thread implementation similar to Intel's hyperthreading, and the early releases weren't tuned very well.) I don't have any numbers for the FX-8150, but it's an old CPU, so your results aren't necessarily unreasonable. I have tested Bulldozer-based Opterons and I'd have expected your results to be a bit higher based on clock speed, but I don't have the data points to know how the results should scale on the desktop chips of that line. I would double-check that you have the cryptodev checkbox turned off, because that will slow things down, but that might be as good as it gets.

          It's important to remember that AES-NI implementations have evolved a lot over the years, so there's a whole lot more to performance than its simple presence. You are correct that the openssl speed results alone aren't going to predict OpenVPN performance, but they are a datapoint that can help predict performance relative to other known systems, and can help establish a ceiling on performance. (E.g., a system that can only perform AES-256-CBC at 30MByte/s is never going to get more than 240Mbit/s of VPN, and much less in practice.)
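
The ceiling arithmetic in that example is just a bytes-to-bits conversion. A small Python sketch, assuming decimal units throughout (openssl's "k" is 1000 bytes); both function names are hypothetical:

```python
def openssl_k_to_mbyte(k_value: float) -> float:
    """Convert an openssl speed figure (1000s of bytes/s) to MByte/s."""
    return k_value / 1000.0


def vpn_ceiling_mbit(cipher_mbyte_per_s: float) -> float:
    """Upper bound on VPN throughput in Mbit/s for a given cipher rate.

    This is only the cipher's raw rate expressed in bits; real VPN
    throughput will be lower because of everything else the tunnel does.
    """
    return cipher_mbyte_per_s * 8.0


# The example from the post: 30 MByte/s of AES-256-CBC.
print(vpn_ceiling_mbit(30.0))  # 240.0

# The 8192-byte figure from the earlier FX-8150 run (348517.72k).
print(vpn_ceiling_mbit(openssl_k_to_mbyte(348517.72)))
```

The second figure is a ceiling in the same sense as the first: a single process cannot encrypt faster than that, and in practice it will do much less.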

          • kejianshi

            The AES test is single threaded?  Is openssl also single threaded during normal use?

            • VAMike

              @kejianshi:

              The AES test is single threaded?  Is openssl also single threaded during normal use?

              Yes, as is OpenVPN (what you probably mean to be asking about).

              • kejianshi

                Nope - I know that openvpn is single threaded in that each instance gets a single thread.

                What I'm wondering is do multiple instances of openvpn, which result in multiple openvpn threads each also result in multiple threads of openssl?

                Example.  Do 4 openvpn instances rely on a single instance of openssl working on the crypt or 4 threads?

                • VAMike

                  @kejianshi:

                  Nope - I know that openvpn is single threaded in that each instance gets a single thread.

                  What I'm wondering is do multiple instances of openvpn, which result in multiple openvpn threads each also result in multiple threads of openssl?

                  The "openssl" command-line utility is single threaded unless you pass -multi (which produces output that is pretty meaningless and hard to compare across platforms; just don't do that). The SSL library is single threaded within a process. If you run multiple instances of OpenVPN, you are running multiple independent processes, not threads, and each process can use a different core.

                  You didn't answer whether the cryptodev stuff was disabled in the gui.

                  • kejianshi

                    Yes - cryptodev is disabled and AES-NI is enabled.  The pfSense VM gets about the same scores as the physical machine, which is pretty nice to see.

                    I was only in the box to find out why it's getting random crashes, so I was just playing around and running processes to stress the machine while waiting for the crash.

                    And it died…  I think the power supply is failing.  Going to have to get that replaced before I can further study the mysteries of AES-NI on the AMD 8150.

                    • jazzl0ver

                      Hi all,

                      Version 	2.4.3-RELEASE-p1 (amd64) 
                      CPU Type 	Intel(R) Xeon(R) CPU X5650 @ 2.67GHz 24 CPUs: 2 package(s) x 6 core(s) x 2 hardware threads
                      AES-NI CPU Crypto: Yes (active) 
                      

                      I performed several tests with the following commands:

                      openssl speed -evp aes-128-cbc -elapsed
                      openssl speed -evp aes-128-gcm -elapsed
                      

                      with different Cryptographic Hardware and Kernel PTI settings (+PTI means Kernel PTI is enabled):

                      +------------------------+--------------------------+--------------------------+--------------+--------------+-----------------+-----------------+
                      |                        | AES-NI + Cryptodev + PTI | AES-NI + Cryptodev - PTI | AES-NI + PTI | AES-NI - PTI | Cryptodev + PTI | Cryptodev - PTI |
                      +------------------------+--------------------------+--------------------------+--------------+--------------+-----------------+-----------------+
                      | aes-128-cbc 16 bytes   |                     7189 |                     7794 |       612843 |       612249 |          605915 |          588186 |
                      | aes-128-cbc 8192 bytes |                   568785 |                   591544 |       765053 |       763943 |          763748 |          764321 |
                      | aes-128-gcm 16 bytes   |                   243029 |                   243885 |       238457 |       251084 |          250158 |          229928 |
                      | aes-128-gcm 8192 bytes |                   942211 |                   943865 |       944693 |       943185 |          944543 |          946034 |
                      +------------------------+--------------------------+--------------------------+--------------+--------------+-----------------+-----------------+
                      
                      

                      The router was rebooted after changing each setting.

                      Can anybody explain the very small values in the aes-128-cbc 16 bytes test, as well as the noticeably smaller values in the aes-128-cbc 8192 bytes test, when both AES-NI and Cryptodev are enabled?

                      Thanks in advance!

                      • stephenw10 Netgate Administrator

                        I suggest that when both are enabled the AES-NI module registers itself as a crypto device in the framework for AES-CBC and openssl tries to use it. That results in massive additional switching especially for small packets.
                        Though there is a load of misinformation surrounding this and I have managed to get it wrong before!

                        Perhaps more interesting is that you seem to be seeing a better result with PTI enabled in some cases there. I have no explanation for that.

                        Steve

                        • jazzl0ver @stephenw10

                          @stephenw10 , thanks for your prompt reply!

                          What is the best Cryptographic Hardware setting then? The router mainly serves as a proxy (haproxy) and openvpn server.
                          And why does the option "AES-NI and Cryptodev" even exist if it degrades performance?

                          Regarding better results with PTI enabled - they look more like a measurement error.

                          • stephenw10 Netgate Administrator

                            Cryptodev exists because other cryptographic accelerators are in use on other hardware, though almost everything easily available is now relatively ancient and surpassed by general software encryption on modern CPUs.
                            The AES-NI option exists because some code was not written or compiled to use the AES instructions directly, and it provides a way to access them.

                            Personally I use AES-NI only.

                            Steve

                            • VAMike @stephenw10

                              @stephenw10 said in AES-NI performance:

                              Perhaps more interesting is that you seem to be seeing a better result with PTI enabled in some cases there. I have no explanation for that.

                              Run-to-run variation. The effects of PTI should be minimal for this sort of workload. Note that the AES-NI and the cryptodev columns are effectively identical (they're executing the same code), yet they differ significantly in some cases; those differences are just testing artifacts. Likewise, the AES-GCM tests should be identical in all three columns PTI/non-PTI, but there's noise between runs and not enough samples to average. Mostly, the only significant result is the performance of aes-ni cbc+cryptodev: don't do that!
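
One rough way to see this in the posted table is to compare how much the single-setting columns disagree with each other against how far the combined AES-NI + Cryptodev columns fall below them. A Python sketch using the table values as posted; `relative_spread` is a hypothetical helper and a crude measure, not a statistical test:

```python
# Values copied from jazzl0ver's table, in column order:
# AES-NI+Cryptodev +PTI, AES-NI+Cryptodev -PTI, AES-NI +PTI, AES-NI -PTI,
# Cryptodev +PTI, Cryptodev -PTI. Keys are (cipher, block size in bytes).
RESULTS = {
    ("aes-128-cbc", 16): [7189, 7794, 612843, 612249, 605915, 588186],
    ("aes-128-cbc", 8192): [568785, 591544, 765053, 763943, 763748, 764321],
    ("aes-128-gcm", 16): [243029, 243885, 238457, 251084, 250158, 229928],
    ("aes-128-gcm", 8192): [942211, 943865, 944693, 943185, 944543, 946034],
}


def relative_spread(values):
    """(max - min) / mean: how much a set of runs disagree with each other."""
    return (max(values) - min(values)) / (sum(values) / len(values))


for (cipher, block), row in RESULTS.items():
    combined = row[:2]  # AES-NI and Cryptodev both enabled
    single = row[2:]    # only one of the two enabled
    slowdown = max(single) / min(combined)
    print(f"{cipher} {block:>4} bytes: "
          f"spread among single-setting runs {relative_spread(single):.1%}, "
          f"best single / worst combined {slowdown:.1f}x")
```

The spread among the single-setting runs stays in the single-digit percent range, while the combined setting is slower by a large factor for aes-128-cbc at 16 bytes and by a visible margin at 8192 bytes. With one run per cell, the small differences cannot be distinguished from noise.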

                              @jazzl0ver said in AES-NI performance:

                              And why does the option "AES-NI and Cryptodev" even exist if it degrades performance?

                              Bad UI design, basically. And a lot of really misinformed people running tests which confused a lot of other people.

                              • stephenw10 Netgate Administrator

                                Re-reading this thread is.... painful.

                                Steve

                                • jazzl0ver

                                  Just to confirm: if I leave only AES-NI selected under Cryptographic Hardware, will this affect OpenVPN performance, given that its Hardware Crypto setting is BSD cryptodev engine? Or will I have to change that to No Hardware Crypto Acceleration (since it will still use OpenSSL's internal AES-NI code)?

                                  • stephenw10 Netgate Administrator

                                    It shouldn't make any difference, since the cryptodev module will not be loaded. I would set that to No Hardware Crypto Acceleration anyway, though.

                                    Steve

                                    • jazzl0ver @stephenw10

                                      Thanks @stephenw10 ! I appreciate your help!

                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.