Netgate Discussion Forum

    AES-NI performance

    Hardware
    83 Posts 23 Posters 24.3k Views
    • aesguy

      When they stop laughing at the idea of a raspberry pi getting 7GByte/s AES 256 CBC (the hardware is capable of something around 40Mbyte/s) they'll tell you what I explained above.

      Who ever said anything about 7GB/s?!  We're comparing the relative performance of different AES-NI hardware implementations.

      • VAMike

        @aesguy:

        Who ever said anything about 7GB/s?!  We're comparing the relative performance of different AES-NI hardware implementations.

        No, you aren't. You're comparing their context switching rates, which have nothing to do with their crypto processing rates. You can't rationalize this into something positive. And the icing on the cake is that when people post the right numbers you turn them down because they don't fit your misconceptions.

        Edit to add: actually, it's worse than that–given two cpus that are otherwise equal, this methodology will actually penalize the one with the more efficient crypto implementation (because it will spend relatively less time doing crypto in kernel space where the time isn't counted and more time in user space doing context switches, which are the only time counted).

        • aesguy

          VAMike, the keyword is "relative" - and in this context refers to comparing results from different hardware.

          • VAMike

            @aesguy:

            VAMike, the keyword is "relative" - and in this context refers to comparing results from different hardware.

            Again, the words put together don't make any sense. Why would you reject solid data in favor of bogus data? It's not like it's any harder to gather. If you collected the real numbers, then you'd have an actual "relative" comparison rather than an "irrelevant" comparison. Is it really that difficult to just admit you were wrong and move on?

            • aesguy

              Again, the words put together don't make any sense.

              Of course they don't make any sense to you - because you're either not reading or not trying to understand what I'm saying.

              • Guest

                @Engineer

                SuperMicro Board: X11SBA-LN4F with Intel N3700.

                Why aren't you using IPSec VPN with AES-GCM on that CPU?

                @aesguy

                …a big difference and hence why AES-NI offers a big gain in performance - if you can harness it properly.

                Another member of this forum posted on reddit that he got a real-world throughput of nearly 500 MBit/s
                with IPSec VPN using AES-GCM, on a pfSense SG-4860 with a 1 GBit/s Internet connection.

                @VAMike
                I think a real-life VPN connection is a better measure than any test run on the bare
                hardware. What I can get out of a device can't be tested on that device alone with only
                an OpenSSL test. OpenSSL uses multiple cores, and OpenVPN isn't using that right now.

                Also, talking about crypto cards such as the soekris vpn14x1 is a little outdated today, in view
                of today's Internet speeds. But in the past, getting ~42 MBit/s with it (vpn1411) instead of
                ~14 MBit/s without it was really impressive to me on a net5501 or an Alix board. It was nearly 3x the speed!

                For site-to-site VPN I only use IPSec with AES-GCM.

                • VAMike

                  @BlueKobold:

                  I think a real-life VPN connection is a better measure than any test run on the bare
                  hardware. What I can get out of a device can't be tested on that device alone with only
                  an OpenSSL test.

                  Certainly it makes sense to optimize for the actual application. That said, running the openssl speed routine will give a ceiling for your performance. If you need to get N and openssl speed (the real results, not the meaningless /dev/crypto ones without -elapsed) says your hardware delivers N/2, no amount of tweaking is going to get you the results you need. The results (again, the real ones) are also useful to compare hardware: you may find (this is really a thing) that at a given price point three different systems have order of magnitude differences in their crypto processing rate–that's valuable information that's definitely worth knowing if crypto processing is a factor in choosing a solution. Actual VPN throughput would be a better basis for comparison, but that's much more configuration dependent and hard to communicate as a single repeatable value that you can ask someone for.
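To line an openssl speed figure up against a VPN throughput target, the units need converting: the tool reports 1000s of bytes per second, while link speeds are usually quoted in Mbit/s. A minimal sketch of that ceiling check follows. The function names are mine, not anything from openssl, and the answer is only an upper bound, since real VPN traffic adds authentication, packet handling and so on.

```python
def k_to_mbit(kbytes_per_s: float) -> float:
    """Convert an openssl speed figure (1000s of bytes/s) to Mbit/s."""
    return kbytes_per_s * 1000 * 8 / 1_000_000


def can_possibly_reach(target_mbit: float, kbytes_per_s: float) -> bool:
    """Ceiling check only: False means no amount of tuning will get there."""
    return k_to_mbit(kbytes_per_s) >= target_mbit


if __name__ == "__main__":
    # Hypothetical input: a large-block cipher result of 304955.39k.
    rate_k = 304955.39
    print(f"{k_to_mbit(rate_k):.1f} Mbit/s raw cipher ceiling")
    print("1 Gbit/s possible in principle:", can_possibly_reach(1000, rate_k))
```

If the check comes back False, the hardware is ruled out; if it comes back True, the VPN may still fall well short of the ceiling.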

                  OpenSSL uses multiple cores, and OpenVPN isn't using that right now.

                  The openssl speed routine is single threaded. If you add the -multi N parameter with a new enough version it will launch N single threaded processes and combine the results.

                  Also, talking about crypto cards such as the soekris vpn14x1 is a little outdated today, in view
                  of today's Internet speeds. But in the past, getting ~42 MBit/s with it (vpn1411) instead of
                  ~14 MBit/s without it was really impressive to me on a net5501 or an Alix board. It was nearly 3x the speed!
                  For site-to-site VPN I only use IPSec with AES-GCM.

                  If 42Mbit/s is acceptable for you, then you're golden. Almost anything modern will run rings around that, though, without the vpn card. You're right that AES GCM is generally a winner. People sometimes compare it to AES CBC and get disappointed, but the proper comparison is to AES CBC + SHA HMAC (because GCM includes MAC) and that changes things, especially on the lower end where GCM isn't optimized as well as it is on the better architectures so the comparison between GCM and CBC without HMAC looks worse:

                  (GX-412TC / APU2 @1GHz)
                  type                       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
                  aes-128-cbc              129634.49k   180885.10k   218398.55k   230835.20k   233439.23k
                  aes-128-cbc-hmac-sha1     37836.21k    57600.58k    64128.26k    76066.47k    81273.41k
                  aes-128-gcm               66775.78k   171264.79k   256270.08k   293397.23k   304955.39k

                  (silvermont @2.4GHz)
                  type                       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
                  aes-128-cbc              212028.47k   324202.84k   387726.68k   407513.09k   412789.42k
                  aes-128-cbc-hmac-sha1     82160.17k   127661.48k   144141.78k   152486.91k   155320.32k
                  aes-128-gcm              127695.16k   218440.90k   280572.06k   304679.94k   310804.48k

                  (sandy bridge @2.5GHz)
                  type                       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
                  aes-128-cbc              466082.23k   481016.45k   527165.95k   575894.53k   572252.16k
                  aes-128-cbc-hmac-sha1    177098.15k   224449.86k   325795.96k   379470.51k   400328.52k
                  aes-128-gcm              237922.04k   572122.82k   761623.45k   835320.15k   919997.10k

                  (haswell @2.6GHz)
                  type                       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
                  aes-128-cbc              617610.13k   704447.23k   724869.63k   723632.81k   718821.97k
                  aes-128-cbc-hmac-sha1    214205.59k   341683.78k   514962.58k   617723.90k   656337.58k
                  aes-128-gcm              422036.43k  1069918.31k  1470884.44k  1609671.68k  1635520.47k

                  (skylake @3.7GHz)
                  type                       16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
                  aes-128-cbc              919971.09k  1366752.68k  1394404.61k  1400528.21k  1400209.41k  1400182.10k
                  aes-128-cbc-hmac-sha1    335511.72k   513717.82k   675188.48k   804944.90k   874438.66k   882191.02k
                  aes-128-gcm              643566.37k  1481056.34k  2880229.72k  4531479.55k  5638567.25k  5748069.72k

                  You can see the silvermont GCM is actually slower than CBC, but ~2x CBC+HMAC. And for skylake the difference between CBC & GCM is huge and CBC+HMAC is just blown away. You can also see where the relatively inefficient PCLMULQDQ implementation hurts silvermont, to the point that an APU2 running at half the speed is actually competitive (real world VPN performance won't be nearly as close because the overall platform is much slower, and the small block results highlight the difference in a case where the crypto instructions have less room to run). And you can see the really impressive improvements intel has made over the past few years from sandy bridge to haswell to skylake. N.b., I didn't make any attempt to quiesce the systems or do real multi-trial benchmarking, but the numbers should be within about 20% or so across platforms and pretty consistent within a platform–certainly good enough for the discussion. It's also worth noting those are fairly recent versions of openssl, and older versions don't implement the CBC+HMAC EVP mode (so don't try to compare apples and oranges).
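To put rough numbers on those comparisons, here is a small sketch that takes the 8192-byte column from the tables above and prints GCM relative to plain CBC and to CBC+HMAC. The dictionary layout and function name are just for illustration. The ratios inherit the same roughly 20% measurement slop as the inputs.

```python
# 8192-byte results (1000s of bytes/s) copied from the tables above.
RESULTS = {
    "GX-412TC / APU2 @1GHz": {"cbc": 233439.23, "cbc_hmac": 81273.41, "gcm": 304955.39},
    "silvermont @2.4GHz": {"cbc": 412789.42, "cbc_hmac": 155320.32, "gcm": 310804.48},
    "sandy bridge @2.5GHz": {"cbc": 572252.16, "cbc_hmac": 400328.52, "gcm": 919997.10},
    "haswell @2.6GHz": {"cbc": 718821.97, "cbc_hmac": 656337.58, "gcm": 1635520.47},
    "skylake @3.7GHz": {"cbc": 1400209.41, "cbc_hmac": 874438.66, "gcm": 5638567.25},
}


def gcm_ratios(row):
    """Return (GCM / CBC, GCM / CBC+HMAC) for one platform."""
    return row["gcm"] / row["cbc"], row["gcm"] / row["cbc_hmac"]


if __name__ == "__main__":
    for name, row in RESULTS.items():
        vs_cbc, vs_hmac = gcm_ratios(row)
        print(f"{name:24s} GCM/CBC = {vs_cbc:5.2f}   GCM/(CBC+HMAC) = {vs_hmac:5.2f}")
```

The second ratio is the like-for-like one, since GCM already includes the MAC.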

                  • MoonKnight

                    Here is my test. pfsense spec is in the signature :)

                    [2.3.2-RELEASE][root@pfsense.local]/root: openssl speed -evp aes-256-cbc
                    Doing aes-256-cbc for 3s on 16 size blocks: 1697468 aes-256-cbc's in 0.23s
                    Doing aes-256-cbc for 3s on 64 size blocks: 1735785 aes-256-cbc's in 0.27s
                    Doing aes-256-cbc for 3s on 256 size blocks: 1514519 aes-256-cbc's in 0.28s
                    Doing aes-256-cbc for 3s on 1024 size blocks: 1025506 aes-256-cbc's in 0.22s
                    Doing aes-256-cbc for 3s on 8192 size blocks: 253309 aes-256-cbc's in 0.05s
                    OpenSSL 1.0.1s-freebsd  1 Mar 2016
                    built on: date not available
                    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
                    compiler: clang
                    The 'numbers' are in 1000s of bytes per second processed.
                    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                    aes-256-cbc    115880.48k  418222.08k  1378548.85k  4800540.09k 37944819.71k

                    --- 24.11 ---
                    Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz
                    Kingston DDR4 2666MHz 16GB ECC
                    2 x HyperX Fury SSD 120GB (ZFS-mirror)
                    2 x Intel i210 (ports)
                    4 x Intel i350 (ports)

                    • VAMike

                      @CiscoX:

                      Here is my test. pfsense spec is in the signature :)

                      [2.3.2-RELEASE][root@pfsense.local]/root: openssl speed -evp aes-256-cbc
                      Doing aes-256-cbc for 3s on 16 size blocks: 1697468 aes-256-cbc's in 0.23s
                      Doing aes-256-cbc for 3s on 64 size blocks: 1735785 aes-256-cbc's in 0.27s
                      Doing aes-256-cbc for 3s on 256 size blocks: 1514519 aes-256-cbc's in 0.28s
                      Doing aes-256-cbc for 3s on 1024 size blocks: 1025506 aes-256-cbc's in 0.22s
                      Doing aes-256-cbc for 3s on 8192 size blocks: 253309 aes-256-cbc's in 0.05s
                      OpenSSL 1.0.1s-freebsd  1 Mar 2016
                      built on: date not available
                      options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
                      compiler: clang
                      The 'numbers' are in 1000s of bytes per second processed.
                      type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                      aes-256-cbc    115880.48k  418222.08k  1378548.85k  4800540.09k 37944819.71k

                      So the real number is 691702, which is actually a bit low for a skylake @2.7GHz.
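For anyone wondering where that figure comes from, it is just the block count from the quoted output divided by the 3 seconds the test was asked to run, rather than by the tiny user CPU time the tool recorded. A minimal sketch of the arithmetic follows. It assumes the loop really did run for the full 3 s of wall-clock time, and the function name is mine.

```python
def throughput_k(blocks: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit openssl speed prints."""
    return blocks * block_size / seconds / 1000


if __name__ == "__main__":
    # 8192-byte line from the quoted run: 253309 blocks, reported "in 0.05s".
    blocks, size = 253309, 8192
    print(f"wall clock (3 s):      {throughput_k(blocks, size, 3.0):.2f}k")
    print(f"user CPU time (0.05 s): {throughput_k(blocks, size, 0.05):.2f}k")
```

The second figure will not exactly match the 37944819.71k the tool printed, presumably because openssl divides by the unrounded time rather than the 0.05 shown, but it is the same kind of inflated number.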

                      • aesguy

                        Thanks CiscoX.  I've added your results to the list to get a sense compared to others:

                        170926276.61k	unknown (China)	gen 5 i5	Koenig	
                        150749577.22k	Microserver Gen 8	ESXi 6.0	biggsy	
                        91090845.70k	Zotac ZBOX ID92	Core i5 4570T	highwire	
                        48454172.67k	SuperMicro Board: X11SBA-LN4F	Intel N3700	Engineer	
                        48351936.51k	SuperMicro 2758	Intel(R) Atom(TM) CPU C2758 @ 2.40GHz 8 CPUs	AR15USR	
                        42008576.00k	Gigabyte GA-N3150N-D3V board	Celeron N3150 with AES-NI		https://forum.pfsense.org/index.php?topic=108119.0
                        37944819.71k		Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz (Skylake)	CiscoX	
                        32321306.62k	SuperMicro 2758	Intel(R) Atom(TM) CPU C2758 @ 2.40GHz 8 CPUs	AR15USR	
                        32267479.72k	Supermicro	Intel N3700	Engineer	
                        29080158.21k	hp microserver gen 8	Xeon 1265Lv2	iorx	
                        27986842.97k	Gigabyte GA-N3150N-D3V	Celeron N3150 with AES-NI		https://forum.pfsense.org/index.php?topic=105114.msg601520#msg601520
                        24435715.51k	unknown (China)	gen 5 i5	Koenig	
                        24345837.57k	Lanner FW-7525D	Quad-core Atom C2558 @ 2.40GHz	RMB	
                        24332468.22k	Netgate SG-4860  	Intel(R) Atom(TM) CPU C2558 @ 2.40GHz 4 CPUs	bytesizedalex	
                        21142437.89k	Partaker B5	Intel N3150	albatorsk	https://forum.pfsense.org/index.php?topic=75415.msg609564#msg609564
                        19462619.14k	SuperMicro 2758	Intel(R) Atom(TM) CPU C2758 @ 2.40GHz 8 CPUs	AR15USR	
                        18390712.32k	AM1	Athlon 5370	W4RH34D	
                        14241549.52k	pfSense SG-2440	Dual-core Atom C2358 @ 1.74GHz	RMB	
                        7123763.20k	Raspberry Pi 3	ARMv7l	aesguy	
                        405686.95k	Mini-ITX Build	Intel i7-4510U + 2x Intel 82574 + 2x Intel i350 		https://forum.pfsense.org/index.php?topic=115627.msg646395#msg646395
                        230708.57k	ci323 nano u	Celeron N3150 with AES-NI w/ -engine cryptodev		https://forum.pfsense.org/index.php?topic=115673.msg656602#msg656602
                        217617.75k	RCC-VE 2440	Intel Atom C2358		https://forum.pfsense.org/index.php?topic=91974.0
                        124788.74k	ALIX.APU2B4/APU2C4	1 GHz Quad Core AMD GX-412TC		http://wiki.ipfire.org/en/hardware/pcengines/apu2b4
                        34204.33k	ALIX.APU1C/APU1D	1 GHz Dual Core AMD G-T40E		http://wiki.ipfire.org/en/hardware/pcengines/apu1c
                        
                        • highwire

                          Revisiting this thread, many have noted that results are all over the map.  Here are some more results from my Zotac ZBOX ID92.  This is the exact same command being run.  I noticed that the openssl command is single threaded (edit: somebody else mentioned that) as it only loads one of the available four CPUs.  The highest 8192 bytes result is 182,250,897k.  The lowest is 18,290,730k.  I don't know what to make of these results.

                          The 'numbers' are in 1000s of bytes per second processed.
                          type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                          aes-256-cbc      56825.17k  333369.17k  1653957.15k  5188806.84k 45544898.56k

                          type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                          aes-256-cbc    120866.00k  307110.87k  1461971.31k  4371936.81k 182250897.41k

                          type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                          aes-256-cbc      50362.80k  383251.87k  1430783.82k  4384267.66k 90994900.99k

                          type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                          aes-256-cbc    102366.01k  357036.89k  1305581.46k  4944793.78k 60739463.85k

                          type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                          aes-256-cbc      79577.36k  400752.86k  1147821.89k  4919229.04k 30295283.03k

                          type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                          aes-256-cbc      73808.26k  317724.33k  1134589.02k  3696661.67k 18290730.60k

                          type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                          aes-256-cbc      81048.09k  295471.58k  1208339.18k  8731986.39k 91067777.02k

                          • highwire

                            I tried it with the 'multi' option to load up all four CPUs (2 physical, 2 SMT).  Here are the results of the first try.  Can anyone decipher what this means?

                            [2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -multi 4 -evp aes-256-cbc
                            Forked child 0
                            Forked child 1
                            Forked child 2
                            +DT:aes-256-cbc:3:16
                            Forked child 3
                            +DT:aes-256-cbc:3:16
                            +DT:aes-256-cbc:3:16
                            +DT:aes-256-cbc:3:16
                            +R:1162717:aes-256-cbc:3.000000
                            +R:1169710:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:64
                            +DT:aes-256-cbc:3:64
                            +R:1168829:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:64
                            +R:1170790:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:64
                            +R:1334722:aes-256-cbc:3.000000
                            +R:1334881:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:256
                            +DT:aes-256-cbc:3:256
                            +R:1326770:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:256
                            +R:1335193:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:256
                            +R:1135822:aes-256-cbc:3.000000
                            +R:1138869:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:1024
                            +DT:aes-256-cbc:3:1024
                            +R:1129522:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:1024
                            +R:1138978:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:1024
                            +R:727690:aes-256-cbc:3.000000
                            +R:731525:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:8192
                            +DT:aes-256-cbc:3:8192
                            +R:726865:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:8192
                            +R:728322:aes-256-cbc:3.000000
                            +DT:aes-256-cbc:3:8192
                            +R:157520:aes-256-cbc:3.000000
                            +R:158319:aes-256-cbc:3.000000
                            Got: +H:16:64:256:1024:8192 from 0
                            Got: +F:22:aes-256-cbc:6201157.33:28474069.33:97183488.00:248384853.33:430134613.33 from 0
                            Got: +H:16:64:256:1024:8192 from 1
                            Got: +F:22:aes-256-cbc:6238453.33:28477461.33:96923477.33:249693866.67:432316416.00 from 1
                            +R:157175:aes-256-cbc:3.000000
                            Got: +H:16:64:256:1024:8192 from 2
                            Got: +F:22:aes-256-cbc:6233754.67:28304426.67:96385877.33:248103253.33:429192533.33 from 2
                            +R:158173:aes-256-cbc:3.000000
                            Got: +H:16:64:256:1024:8192 from 3
                            Got: +F:22:aes-256-cbc:6244213.33:28484117.33:97192789.33:248600576.00:431917738.67 from 3
                            OpenSSL 1.0.1s-freebsd  1 Mar 2016
                            built on: date not available
                            options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
                            compiler: clang
                            evp              24917.58k  113740.07k  387685.63k  994782.55k  1723561.30k

                            • VAMike

                              @highwire:

                              Revisiting this thread, many have noted that results are all over the map.  Here are some more results from my Zotac ZBOX ID92.  This is the exact same command being run.  I noticed that the openssl command is single threaded (edit: somebody else mentioned that) as it only loads one of the available four CPUs.  The highest 8192 bytes result is 182,250,897k.  The lowest is 18,290,730k.  I don't know what to make of these results.

                              Well, if you'd read what I wrote above you'd understand completely: the posted results are useless noise because people are using cryptodev in their testing without the -elapsed flag and aren't actually measuring anything to do with crypto performance. It's immediately obvious to anyone familiar with the openssl implementation just by looking. Your system isn't capable of transferring 182GByte/s, full stop. So any result showing that it is can be immediately discounted. Run again with the -elapsed flag and you'll see consistent numbers that actually reflect what you're trying to measure. Or turn off aesni.ko, it's probably only slowing you down anyway.
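A quick way to spot the bogus runs without eyeballing them: each "Doing ..." line states how long the test was asked to run and how much time the tool actually counted. When the counted time is a small fraction of the requested time, the throughput line that follows can't be trusted. A minimal sketch of that check follows. The regex is written against the output format pasted in this thread, and the 0.9 cutoff is an arbitrary assumption of mine, not anything openssl defines.

```python
import re

# Matches lines like:
#   Doing aes-256-cbc for 3s on 8192 size blocks: 253309 aes-256-cbc's in 0.05s
DOING = re.compile(
    r"Doing (?P<alg>\S+) for (?P<want>\d+)s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) \S+ in (?P<got>[\d.]+)s"
)


def suspicious_lines(output: str, min_fraction: float = 0.9):
    """Return (block_size, counted_seconds) for lines whose counted time
    is well below the requested run time."""
    flagged = []
    for line in output.splitlines():
        m = DOING.search(line)
        if m and float(m["got"]) < min_fraction * float(m["want"]):
            flagged.append((int(m["size"]), float(m["got"])))
    return flagged


if __name__ == "__main__":
    sample = """\
Doing aes-256-cbc for 3s on 16 size blocks: 1697468 aes-256-cbc's in 0.23s
Doing aes-256-cbc for 3s on 8192 size blocks: 253309 aes-256-cbc's in 0.05s
"""
    for size, secs in suspicious_lines(sample):
        print(f"{size}-byte blocks: only {secs}s counted, result is not meaningful")
```

A run made with -elapsed should show counted times at or very near the requested 3s and come back with nothing flagged.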

                              • VAMike

                                @highwire:

                                I tried it with the 'multi' option to load up all four CPUs (2 physical, 2 SMT).  Here are the results of the first try.  Can anyone decipher what this means?

                                [2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -multi 4 -evp aes-256-cbc

                                -multi forces -elapsed, so you're actually seeing a real number which is shockingly low compared to the artificial numbers that people have been drooling over. Run "kldunload aesni.ko" to kill the cryptodev implementation and rerun; you should see an order of magnitude improvement for smaller block sizes and a smaller but still substantial improvement in large blocks.
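As for deciphering the output itself: judging by the numbers posted, each "Got: +F:" line carries one child's bytes-per-second figure per block size, and the final "evp" line is the sum across children, shown in 1000s of bytes. A minimal sketch that reproduces the summary line from the per-child lines follows. The field interpretation is inferred from this output rather than from any documentation (I'm assuming the first two fields are an index and the algorithm name), and the function name is mine.

```python
# Per-child result lines copied from the -multi 4 run quoted above.
SAMPLE = """\
Got: +F:22:aes-256-cbc:6201157.33:28474069.33:97183488.00:248384853.33:430134613.33 from 0
Got: +F:22:aes-256-cbc:6238453.33:28477461.33:96923477.33:249693866.67:432316416.00 from 1
Got: +F:22:aes-256-cbc:6233754.67:28304426.67:96385877.33:248103253.33:429192533.33 from 2
Got: +F:22:aes-256-cbc:6244213.33:28484117.33:97192789.33:248600576.00:431917738.67 from 3
"""


def combine_children(output: str):
    """Sum the per-child rates and return them in 1000s of bytes/s."""
    totals = []
    for line in output.splitlines():
        if "+F:" not in line:
            continue
        fields = line.split("+F:", 1)[1].split(" from ")[0].split(":")
        rates = [float(x) for x in fields[2:]]  # skip index and algorithm name
        if not totals:
            totals = [0.0] * len(rates)
        totals = [t + r for t, r in zip(totals, rates)]
    return [t / 1000 for t in totals]


if __name__ == "__main__":
    print("  ".join(f"{v:.2f}k" for v in combine_children(SAMPLE)))
```

The printed values can be compared against the "evp" line at the bottom of the quoted run.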

                                • Engineer

                                  @VAMike:

                                  @highwire:

                                  I tried it with the 'multi' option to load up all four CPUs (2 physical, 2 SMT).  Here are the results of the first try.  Can anyone decipher what this means?

                                  [2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -multi 4 -evp aes-256-cbc

                                  -multi forces -elapsed, so you're actually seeing a real number which is shockingly low compared to the artificial numbers that people have been drooling over. Run "kldunload aesni.ko" to kill the cryptodev implementation and rerun; you should see an order of magnitude improvement for smaller block sizes and a smaller but still substantial improvement in large blocks.

                                  That makes sense.  I tried the -multi 4 and -multi 2 options and the results scaled almost perfectly from my original single-core -elapsed score.

                                  • highwire

                                    This makes more sense.

                                    [2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -elapsed -evp aes-256-cbc
                                    You have chosen to measure elapsed time instead of user CPU time.
                                    Doing aes-256-cbc for 3s on 16 size blocks: 1826319 aes-256-cbc's in 3.00s
                                    Doing aes-256-cbc for 3s on 64 size blocks: 1872707 aes-256-cbc's in 3.00s
                                    Doing aes-256-cbc for 3s on 256 size blocks: 1517032 aes-256-cbc's in 3.01s
                                    Doing aes-256-cbc for 3s on 1024 size blocks: 866718 aes-256-cbc's in 3.00s
                                    Doing aes-256-cbc for 3s on 8192 size blocks: 173745 aes-256-cbc's in 3.00s
                                    OpenSSL 1.0.1s-freebsd  1 Mar 2016
                                    built on: date not available
                                    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
                                    compiler: clang
                                    The 'numbers' are in 1000s of bytes per second processed.
                                    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                                    aes-256-cbc      9740.37k    39951.08k  129117.15k  295839.74k  474439.68k

                                    • Chucko

                                      For reference, Atom D525 w/ hyperthreading disabled:

                                      
                                      openssl speed -evp aes-256-cbc
                                      Doing aes-256-cbc for 3s on 16 size blocks: 3336818 aes-256-cbc's in 2.98s
                                      Doing aes-256-cbc for 3s on 64 size blocks: 913146 aes-256-cbc's in 2.98s
                                      Doing aes-256-cbc for 3s on 256 size blocks: 233424 aes-256-cbc's in 2.98s
                                      Doing aes-256-cbc for 3s on 1024 size blocks: 58628 aes-256-cbc's in 2.98s
                                      Doing aes-256-cbc for 3s on 8192 size blocks: 7337 aes-256-cbc's in 2.98s
                                      OpenSSL 1.0.1s-freebsd  1 Mar 2016
                                      built on: date not available
                                      options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
                                      compiler: clang
                                      The 'numbers' are in 1000s of bytes per second processed.
                                      type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
                                      aes-256-cbc      17889.54k    19582.44k    20023.14k    20116.46k    20139.80k
                                      
                                      
                                      • Chucko

                                        Adding -elapsed to the above command only changed results by ~2%.

                                        Here's the multi-process (-multi 2) result:

                                        
                                        openssl speed -multi 2 -evp aes-256-cbc
                                        Forked child 0
                                        Forked child 1
                                        +DT:aes-256-cbc:3:16
                                        +DT:aes-256-cbc:3:16
                                        +R:3311914:aes-256-cbc:3.000000
                                        +DT:aes-256-cbc:3:64
                                        +R:3377542:aes-256-cbc:3.000000
                                        +DT:aes-256-cbc:3:64
                                        +R:886867:aes-256-cbc:3.000000
                                        +DT:aes-256-cbc:3:256
                                        +R:913678:aes-256-cbc:3.000000
                                        +DT:aes-256-cbc:3:256
                                        +R:226698:aes-256-cbc:3.000000
                                        +DT:aes-256-cbc:3:1024
                                        +R:233562:aes-256-cbc:3.000000
                                        +DT:aes-256-cbc:3:1024
                                        +R:57329:aes-256-cbc:3.000000
                                        +DT:aes-256-cbc:3:8192
                                        +R:58852:aes-256-cbc:3.000000
                                        +DT:aes-256-cbc:3:8192
                                        +R:7285:aes-256-cbc:3.000000
                                        +R:7406:aes-256-cbc:3.000000
                                        Got: +H:16:64:256:1024:8192 from 0
                                        Got: +F:22:aes-256-cbc:17663541.33:18919829.33:19344896.00:19568298.67:19892906.67 from 0
                                        Got: +H:16:64:256:1024:8192 from 1
                                        Got: +F:22:aes-256-cbc:18013557.33:19491797.33:19930624.00:20088149.33:20223317.33 from 1
                                        OpenSSL 1.0.1s-freebsd  1 Mar 2016
                                        built on: date not available
                                        options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
                                        compiler: clang
                                        evp              35677.10k    38411.63k    39275.52k    39656.45k    40116.22k
                                        
                                        
                                          Chucko

                                          And for more perspective, here are results from my NAS4Free box running FreeBSD 11.0-RELEASE. This is a Core 2 Quad Q9550 @ 2.83 GHz.

                                          
                                          nas4free ~/ chucko~$ openssl speed -elapsed -evp aes-256-cbc
                                          You have chosen to measure elapsed time instead of user CPU time.
                                          Doing aes-256-cbc for 3s on 16 size blocks: 28607257 aes-256-cbc's in 3.01s
                                          Doing aes-256-cbc for 3s on 64 size blocks: 8038838 aes-256-cbc's in 3.00s
                                          Doing aes-256-cbc for 3s on 256 size blocks: 2078627 aes-256-cbc's in 3.00s
                                          Doing aes-256-cbc for 3s on 1024 size blocks: 521836 aes-256-cbc's in 3.00s
                                          Doing aes-256-cbc for 3s on 8192 size blocks: 65551 aes-256-cbc's in 3.00s
                                          OpenSSL 1.0.2j-freebsd  26 Sep 2016
                                          built on: date not available
                                          options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
                                          compiler: clang
                                          The 'numbers' are in 1000s of bytes per second processed.
                                          type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
                                          aes-256-cbc     152175.75k   171495.21k   177376.17k   178120.02k   178997.93k
                                          nas4free ~/ chucko~$ openssl speed -multi 4 -evp aes-256-cbc
                                          Forked child 0
                                          Forked child 1
                                          Forked child 2
                                          +DT:aes-256-cbc:3:16
                                          +DT:aes-256-cbc:3:16
                                          +DT:aes-256-cbc:3:16
                                          +DT:aes-256-cbc:3:16
                                          Forked child 3
                                          +R:28661984:aes-256-cbc:3.000000
                                          +R:28561131:aes-256-cbc:3.007813
                                          +R:28616238:aes-256-cbc:3.000000
                                          +DT:aes-256-cbc:3:64
                                          +DT:aes-256-cbc:3:64
                                          +DT:aes-256-cbc:3:64
                                          +R:28653210:aes-256-cbc:3.007813
                                          +DT:aes-256-cbc:3:64
                                          +R:8221475:aes-256-cbc:3.054688
                                          +R:8216875:aes-256-cbc:3.054688
                                          +R:8222598:aes-256-cbc:3.054688
                                          +R:8199168:aes-256-cbc:3.054688
                                          +DT:aes-256-cbc:3:256
                                          +DT:aes-256-cbc:3:256
                                          +DT:aes-256-cbc:3:256
                                          +DT:aes-256-cbc:3:256
                                          +R:2088535:aes-256-cbc:3.000000
                                          +R:2088077:aes-256-cbc:3.000000
                                          +R:2081254:aes-256-cbc:3.000000
                                          +DT:aes-256-cbc:3:1024
                                          +R:2087901:aes-256-cbc:3.000000
                                          +DT:aes-256-cbc:3:1024
                                          +DT:aes-256-cbc:3:1024
                                          +DT:aes-256-cbc:3:1024
                                          +R:526763:aes-256-cbc:3.007813
                                          +R:526629:aes-256-cbc:3.007813
                                          +R:526698:aes-256-cbc:3.007813
                                          +DT:aes-256-cbc:3:8192
                                          +R:525146:aes-256-cbc:3.007813
                                          +DT:aes-256-cbc:3:8192
                                          +DT:aes-256-cbc:3:8192
                                          +DT:aes-256-cbc:3:8192
                                          +R:65963:aes-256-cbc:3.000000
                                          +R:65715:aes-256-cbc:3.000000
                                          +R:65940:aes-256-cbc:3.000000
                                          +R:65937:aes-256-cbc:3.000000
                                          Got: +H:16:64:256:1024:8192 from 0
                                          Got: +F:22:aes-256-cbc:151930379.97:171784102.96:177600341.33:178784250.68:180122965.33 from 0
                                          Got: +H:16:64:256:1024:8192 from 1
                                          Got: +F:22:aes-256-cbc:152420192.42:172251465.98:178221653.33:179334753.08:179445760.00 from 1
                                          Got: +H:16:64:256:1024:8192 from 2
                                          Got: +F:22:aes-256-cbc:152863914.67:172155089.51:178167552.00:179312624.04:180051968.00 from 2
                                          Got: +H:16:64:256:1024:8192 from 3
                                          Got: +F:22:aes-256-cbc:152619936.00:172274994.41:178182570.67:179289133.22:180060160.00 from 3
                                          OpenSSL 1.0.2j-freebsd  26 Sep 2016
                                          built on: date not available
                                          options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
                                          compiler: clang
                                          evp             609834.42k   688465.65k   712172.12k   716720.76k   719680.85k
                                          nas4free ~/ chucko~$ 
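
For anyone wondering how the summary lines relate to the raw counts: as far as I can tell, each figure is just count × block size ÷ seconds, and the -multi total is the sum of the per-child figures. Here is a rough Python sketch of that arithmetic using the 16-byte numbers from the run above. The function name is made up for illustration, and the assumption that the printed 3.007813 is really 3.0078125 (rounded) is mine, not something openssl states.

```python
def throughput(count, block_size, seconds):
    """Bytes per second for one '+R:count:cipher:seconds' result line."""
    return count * block_size / seconds


# (count, seconds) for the four children at 16-byte blocks, copied from the
# -multi 4 run above. Assumption: the printed 3.007813 is 3.0078125 rounded.
children_16 = [
    (28661984, 3.0),
    (28561131, 3.0078125),
    (28616238, 3.0),
    (28653210, 3.0078125),
]

per_child = [throughput(count, 16, secs) for count, secs in children_16]

# openssl reports the summary in thousands of bytes per second.
total_k = sum(per_child) / 1000.0
print(f"evp 16 bytes, 4 children: {total_k:.2f}k")

# Single-process -elapsed run: 28607257 operations, shown as "3.01s".
single_k = throughput(28607257, 16, 3.0078125) / 1000.0
print(f"aes-256-cbc 16 bytes, 1 process: {single_k:.2f}k")
```

Under those assumptions the two results line up with the 609834.42k and 152175.75k figures in the output above, which is roughly 4x scaling across the four cores.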
                                          
                                          
                                            VAMike

                                            @Chucko:

                                            Adding -elapsed to the above command only changed results by ~2%.

                                            Yeah, without AES-NI, cryptodev isn't in play. And while -elapsed gives a less accurate result when OpenSSL is using its internal crypto routines, the two numbers should be pretty close.
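
To illustrate the two clocks involved (elapsed wall-clock time versus user CPU time, per the "You have chosen to measure elapsed time instead of user CPU time" message), here is a small Python sketch. It is only an analogy and not how openssl measures anything internally: `time.sleep` is a stand-in I chose for time the process spends outside user space, and the helper names are hypothetical.

```python
import os
import time


def timed(work):
    """Run work() and return (elapsed_seconds, user_cpu_seconds)."""
    wall_start = time.perf_counter()
    user_start = os.times().user
    work()
    user_used = os.times().user - user_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, user_used


def mostly_waiting():
    # Stand-in for work done outside user space: the process waits,
    # so almost no user CPU time is charged to it.
    time.sleep(0.2)


def mostly_computing():
    # Stand-in for in-process crypto routines: plain user-space computation.
    sum(i * i for i in range(500_000))


for name, work in [("waiting", mostly_waiting), ("computing", mostly_computing)]:
    elapsed, user = timed(work)
    print(f"{name}: elapsed={elapsed:.3f}s user_cpu={user:.3f}s")
```

In the "waiting" case the user CPU figure stays near zero while elapsed time is about 0.2 s, so dividing bytes processed by user CPU time would wildly overstate throughput. In the "computing" case the two clocks should land close together on an otherwise idle machine, which is the situation with the internal routines.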

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.