AES-NI performance
-
Who ever said anything about 7GB/s?! We're comparing the relative performance of different AES-NI hardware implementations.
No, you aren't. You're comparing their context switching rates, which has nothing to do with their crypto processing rates. You can't rationalize this into something positive. And the icing on the cake is that when people post the right numbers you turn them down because they don't fit your misconceptions.
Edit to add: actually, it's worse than that. Given two CPUs that are otherwise equal, this methodology will actually penalize the one with the more efficient crypto implementation (because it will spend relatively less time doing crypto in kernel space, where the time isn't counted, and more time in user space doing context switches, which are the only time counted).
-
VAMike, the keyword is "relative" - and in this context refers to comparing results from different hardware.
-
VAMike, the keyword is "relative" - and in this context refers to comparing results from different hardware.
Again, the words put together don't make any sense. Why would you reject solid data in favor of bogus data? It's not like it's any harder to gather. If you collected the real numbers, then you'd have an actual "relative" comparison rather than an "irrelevant" comparison. Is it really that difficult to just admit you were wrong and move on?
-
Again, the words put together don't make any sense.
Of course they don't make any sense to you - because you're either not reading or not trying to understand what I'm saying.
-
SuperMicro Board: X11SBA-LN4F with Intel N3700.
Why are you not using IPSec VPN together with AES-GCM on that CPU?
…a big difference and hence why AES-NI offers a big gain in performance - if you can harness it properly.
Another member of this forum posted on reddit that he got a real throughput of nearly 500 MBit/s with IPSec VPN over AES-GCM, on a pfSense SG-4860 with a 1 GBit/s Internet connection.
@VAMike
In my view, a real-life VPN connection is a better test than any benchmark run on the bare hardware. What I can get out of a device cannot be tested on that device alone, and not with an OpenSSL test only; OpenSSL uses multiple cores and OpenVPN isn't using that right now.
Also, talking about crypto cards such as the Soekris vpn14x1 is a little outdated today, given today's Internet speeds. But in the past, getting ~42 MBit/s with it (vpn1411) instead of ~14 MBit/s without it was really impressive for me on a net5501 or an ALIX board. It was nearly 3x the speed!
If I am using site-to-site VPN I only use IPSec with AES-GCM.
-
@BlueKobold:
In my view, a real-life VPN connection is a better test than any benchmark run on the bare hardware. What I can get out of a device cannot be tested on that device alone, and not with an OpenSSL test only
Certainly it makes sense to optimize for the actual application. That said, running the openssl speed routine will give a ceiling for your performance. If you need to get N and openssl speed (the real results, not the meaningless /dev/crypto ones without -elapsed) says your hardware delivers N/2, no amount of tweaking is going to get you the results you need. The results (again, the real ones) are also useful to compare hardware: you may find (this is really a thing) that at a given price point three different systems have order-of-magnitude differences in their crypto processing rate. That's valuable information that's definitely worth knowing if crypto processing is a factor in choosing a solution. Actual VPN throughput would be a better basis for comparison, but that's much more configuration dependent and hard to communicate as a single repeatable value that you can ask someone for.
OpenSSL uses multiple cores and OpenVPN isn't using that right now.
The openssl speed routine is single threaded. If you add the -multi N parameter with a new enough version it will launch N single threaded processes and combine the results.
Also, talking about crypto cards such as the Soekris vpn14x1 is a little outdated today, given today's Internet speeds. But in the past, getting ~42 MBit/s with it (vpn1411) instead of ~14 MBit/s without it was really impressive for me on a net5501 or an ALIX board. It was nearly 3x the speed!
If I am using site-to-site VPN I only use IPSec with AES-GCM.
If 42Mbit/s is acceptable for you, then you're golden. Almost anything modern will run rings around that, though, without the vpn card. You're right that AES GCM is generally a winner. People sometimes compare it to AES CBC and get disappointed, but the proper comparison is to AES CBC + SHA HMAC (because GCM includes MAC) and that changes things, especially on the lower end where GCM isn't optimized as well as it is on the better architectures so the comparison between GCM and CBC without HMAC looks worse:
(GX-412TC / APU2 @1GHZ)
type                     16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc            129634.49k    180885.10k    218398.55k    230835.20k    233439.23k
aes-128-cbc-hmac-sha1   37836.21k     57600.58k     64128.26k     76066.47k     81273.41k
aes-128-gcm             66775.78k    171264.79k    256270.08k    293397.23k    304955.39k
(silvermont @2.4GHz)
type                     16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc            212028.47k    324202.84k    387726.68k    407513.09k    412789.42k
aes-128-cbc-hmac-sha1   82160.17k    127661.48k    144141.78k    152486.91k    155320.32k
aes-128-gcm            127695.16k    218440.90k    280572.06k    304679.94k    310804.48k
(sandy bridge @2.5GHz)
type                     16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc            466082.23k    481016.45k    527165.95k    575894.53k    572252.16k
aes-128-cbc-hmac-sha1  177098.15k    224449.86k    325795.96k    379470.51k    400328.52k
aes-128-gcm            237922.04k    572122.82k    761623.45k    835320.15k    919997.10k
(haswell @2.6GHz)
type                     16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc            617610.13k    704447.23k    724869.63k    723632.81k    718821.97k
aes-128-cbc-hmac-sha1  214205.59k    341683.78k    514962.58k    617723.90k    656337.58k
aes-128-gcm            422036.43k   1069918.31k   1470884.44k   1609671.68k   1635520.47k
(skylake @3.7GHz)
type                     16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes   16384 bytes
aes-128-cbc            919971.09k   1366752.68k   1394404.61k   1400528.21k   1400209.41k   1400182.10k
aes-128-cbc-hmac-sha1  335511.72k    513717.82k    675188.48k    804944.90k    874438.66k    882191.02k
aes-128-gcm            643566.37k   1481056.34k   2880229.72k   4531479.55k   5638567.25k   5748069.72k
You can see the silvermont GCM is actually slower than CBC, but ~2x CBC+HMAC. And for skylake the difference between CBC & GCM is huge and CBC+HMAC is just blown away. You can also see where the relatively inefficient PCLMULQDQ implementation hurts silvermont, to the point that an APU2 running at half the speed is actually competitive (real world VPN performance won't be nearly as close because the overall platform is much slower, and the small block results highlight the difference in a case where the crypto instructions have less room to run). And you can see the really impressive improvements Intel has made over the past few years from sandy bridge to haswell to skylake. N.b., I didn't make any attempt to quiesce the systems or do real multi-trial benchmarking, but the numbers should be within about 20% or so across platforms and pretty consistent within a platform; certainly good enough for the discussion. It's also worth noting those are fairly recent versions of openssl, and older versions don't implement the CBC+HMAC EVP mode (so don't try to compare apples and oranges).
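To make the GCM comparison concrete, here is a small Python sketch that computes the ratios from the 8192-byte column of the tables in this post. The dictionary layout and the function name are only for illustration; the numbers are copied from the post.

```python
# 8192-byte results in 1000s of bytes per second, copied from the tables above.
RESULTS = {
    "GX-412TC":     {"cbc": 233439.23,  "cbc_hmac": 81273.41,  "gcm": 304955.39},
    "silvermont":   {"cbc": 412789.42,  "cbc_hmac": 155320.32, "gcm": 310804.48},
    "sandy bridge": {"cbc": 572252.16,  "cbc_hmac": 400328.52, "gcm": 919997.10},
    "haswell":      {"cbc": 718821.97,  "cbc_hmac": 656337.58, "gcm": 1635520.47},
    "skylake":      {"cbc": 1400209.41, "cbc_hmac": 874438.66, "gcm": 5638567.25},
}


def gcm_ratios(row):
    """Return (GCM / CBC, GCM / CBC+HMAC) for one platform."""
    return row["gcm"] / row["cbc"], row["gcm"] / row["cbc_hmac"]


for name, row in RESULTS.items():
    vs_cbc, vs_hmac = gcm_ratios(row)
    print(f"{name:13s} GCM/CBC = {vs_cbc:.2f}   GCM/(CBC+HMAC) = {vs_hmac:.2f}")
```

A ratio below 1 in the first column means GCM is slower than plain CBC on that platform; the second column is the like-for-like comparison, since GCM already includes the MAC.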
-
Here is my test. pfsense spec is in the signature :)
[2.3.2-RELEASE][root@pfsense.local]/root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 1697468 aes-256-cbc's in 0.23s
Doing aes-256-cbc for 3s on 64 size blocks: 1735785 aes-256-cbc's in 0.27s
Doing aes-256-cbc for 3s on 256 size blocks: 1514519 aes-256-cbc's in 0.28s
Doing aes-256-cbc for 3s on 1024 size blocks: 1025506 aes-256-cbc's in 0.22s
Doing aes-256-cbc for 3s on 8192 size blocks: 253309 aes-256-cbc's in 0.05s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 115880.48k 418222.08k 1378548.85k 4800540.09k 37944819.71k
-
@CiscoX:
Here is my test. pfsense spec is in the signature :)
[2.3.2-RELEASE][root@pfsense.local]/root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 1697468 aes-256-cbc's in 0.23s
Doing aes-256-cbc for 3s on 64 size blocks: 1735785 aes-256-cbc's in 0.27s
Doing aes-256-cbc for 3s on 256 size blocks: 1514519 aes-256-cbc's in 0.28s
Doing aes-256-cbc for 3s on 1024 size blocks: 1025506 aes-256-cbc's in 0.22s
Doing aes-256-cbc for 3s on 8192 size blocks: 253309 aes-256-cbc's in 0.05s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
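aes-256-cbc 115880.48k 418222.08k 1378548.85k 4800540.09k 37944819.71k
So the real number is 691702k (253309 blocks x 8192 bytes over the 3 seconds the test actually ran, in 1000s of bytes per second), which is actually a bit low for a skylake @2.7GHz.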
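The 691702 figure can be reproduced from the "Doing ..." line itself. The following Python sketch assumes the run really lasted the requested 3 seconds of wall-clock time and ignores the CPU time printed at the end of the line; the regular expression and function name are illustrative, not part of openssl.

```python
import re

# Matches lines such as:
# Doing aes-256-cbc for 3s on 8192 size blocks: 253309 aes-256-cbc's in 0.05s
DOING_LINE = re.compile(
    r"Doing (?P<alg>\S+) for (?P<secs>\d+)s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) \S+ in (?P<cpu>[\d.]+)s"
)


def wall_clock_rate_k(line):
    """Throughput in 1000s of bytes/s, using the requested duration, not CPU time."""
    m = DOING_LINE.search(line)
    if m is None:
        raise ValueError("not an 'openssl speed' progress line: " + line)
    total_bytes = int(m.group("count")) * int(m.group("size"))
    return total_bytes / int(m.group("secs")) / 1000.0


line = "Doing aes-256-cbc for 3s on 8192 size blocks: 253309 aes-256-cbc's in 0.05s"
print(f"{wall_clock_rate_k(line):.2f}k")
```

Dividing by the 0.05s of user CPU time instead is what produces the multi-gigabyte figures in the summary table.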
-
Thanks CiscoX. I've added your results to the list to get a sense compared to others:
170926276.61k unknown (China) gen 5 i5 Koenig
150749577.22k Microserver Gen 8 ESXi 6.0 biggsy
91090845.70k Zotac ZBOX ID92 Core i5 4570T highwire
48454172.67k SuperMicro Board: X11SBA-LN4F Intel N3700 Engineer
48351936.51k SuperMicro 2758 Intel(R) Atom(TM) CPU C2758 @ 2.40GHz 8 CPUs AR15USR
42008576.00k Gigabyte GA-N3150N-D3V board Celeron N3150 with AES-NI https://forum.pfsense.org/index.php?topic=108119.0
37944819.71k Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz (Skylake) CiscoX
32321306.62k SuperMicro 2758 Intel(R) Atom(TM) CPU C2758 @ 2.40GHz 8 CPUs AR15USR
32267479.72k Supermicro Intel N3700 Engineer
29080158.21k hp microserver gen 8 Xeon 1265Lv2 iorx
27986842.97k Gigabyte GA-N3150N-D3V Celeron N3150 with AES-NI https://forum.pfsense.org/index.php?topic=105114.msg601520#msg601520
24435715.51k unknown (China) gen 5 i5 Koenig
24345837.57k Lanner FW-7525D Quad-core Atom C2558 @ 2.40GHz RMB
24332468.22k Netgate SG-4860 Intel(R) Atom(TM) CPU C2558 @ 2.40GHz 4 CPUs bytesizedalex
21142437.89k Partaker B5 Intel N3150 albatorsk https://forum.pfsense.org/index.php?topic=75415.msg609564#msg609564
19462619.14k SuperMicro 2758 Intel(R) Atom(TM) CPU C2758 @ 2.40GHz 8 CPUs AR15USR
18390712.32k AM1 Athlon 5370 W4RH34D
14241549.52k pfSense SG-2440 Dual-core Atom C2358 @ 1.74GHz RMB
7123763.20k Raspberry Pi 3 ARMv7l aesguy
405686.95k Mini-ITX Build Intel i7-4510U + 2x Intel 82574 + 2x Intel i350 https://forum.pfsense.org/index.php?topic=115627.msg646395#msg646395
230708.57k ci323 nano u Celeron N3150 with AES-NI w/ -engine cryptodev https://forum.pfsense.org/index.php?topic=115673.msg656602#msg656602
217617.75k RCC-VE 2440 Intel Atom C2358 https://forum.pfsense.org/index.php?topic=91974.0
124788.74k ALIX.APU2B4/APU2C4 1 GHz Quad Core AMD GX-412TC http://wiki.ipfire.org/en/hardware/pcengines/apu2b4
34204.33k ALIX.APU1C/APU1D 1 GHz Dual Core AMD G-T40E http://wiki.ipfire.org/en/hardware/pcengines/apu1c
-
Revisiting this thread, many have noted that results are all over the map. Here are some more results from my Zotac ZBOX ID92. This is the exact same command being run. I noticed that the openssl command is single threaded (edit: somebody else mentioned that) as it only loads one of the available four CPUs. The highest 8192 bytes result is 182,250,897k. The lowest is 18,290,730k. I don't know what to make of these results.
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 56825.17k 333369.17k 1653957.15k 5188806.84k 45544898.56k
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 120866.00k 307110.87k 1461971.31k 4371936.81k 182250897.41k
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 50362.80k 383251.87k 1430783.82k 4384267.66k 90994900.99k
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 102366.01k 357036.89k 1305581.46k 4944793.78k 60739463.85k
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 79577.36k 400752.86k 1147821.89k 4919229.04k 30295283.03k
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 73808.26k 317724.33k 1134589.02k 3696661.67k 18290730.60k
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 81048.09k 295471.58k 1208339.18k 8731986.39k 91067777.02k
-
I tried it with the 'multi' option to load up all four CPUs (2 physical, 2 SMT). Here are the results of the first try. Can anyone decipher what this means?
[2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -multi 4 -evp aes-256-cbc
Forked child 0
Forked child 1
Forked child 2
+DT:aes-256-cbc:3:16
Forked child 3
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
+R:1162717:aes-256-cbc:3.000000
+R:1169710:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:64
+DT:aes-256-cbc:3:64
+R:1168829:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:64
+R:1170790:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:64
+R:1334722:aes-256-cbc:3.000000
+R:1334881:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:256
+DT:aes-256-cbc:3:256
+R:1326770:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:256
+R:1335193:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:256
+R:1135822:aes-256-cbc:3.000000
+R:1138869:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+DT:aes-256-cbc:3:1024
+R:1129522:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+R:1138978:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+R:727690:aes-256-cbc:3.000000
+R:731525:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:8192
+DT:aes-256-cbc:3:8192
+R:726865:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:8192
+R:728322:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:8192
+R:157520:aes-256-cbc:3.000000
+R:158319:aes-256-cbc:3.000000
Got: +H:16:64:256:1024:8192 from 0
Got: +F:22:aes-256-cbc:6201157.33:28474069.33:97183488.00:248384853.33:430134613.33 from 0
Got: +H:16:64:256:1024:8192 from 1
Got: +F:22:aes-256-cbc:6238453.33:28477461.33:96923477.33:249693866.67:432316416.00 from 1
+R:157175:aes-256-cbc:3.000000
Got: +H:16:64:256:1024:8192 from 2
Got: +F:22:aes-256-cbc:6233754.67:28304426.67:96385877.33:248103253.33:429192533.33 from 2
+R:158173:aes-256-cbc:3.000000
Got: +H:16:64:256:1024:8192 from 3
Got: +F:22:aes-256-cbc:6244213.33:28484117.33:97192789.33:248600576.00:431917738.67 from 3
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
evp 24917.58k 113740.07k 387685.63k 994782.55k 1723561.30k
-
Revisiting this thread, many have noted that results are all over the map. Here are some more results from my Zotac ZBOX ID92. This is the exact same command being run. I noticed that the openssl command is single threaded (edit: somebody else mentioned that) as it only loads one of the available four CPUs. The highest 8192 bytes result is 182,250,897k. The lowest is 18,290,730k. I don't know what to make of these results.
Well, if you'd read what I wrote above you'd understand completely: the posted results are useless noise because people are using cryptodev in their testing without the -elapsed flag and aren't actually measuring anything to do with crypto performance. It's immediately obvious for anyone familiar with the openssl implementation just by looking. Your system isn't capable of transferring 182GByte/s, full stop. So any result showing that it is can be immediately discounted. Run again with the -elapsed flag and you'll see consistent numbers which actually reflect what you're trying to see. Or turn off aesni.ko, it's probably only slowing you down anyway.
-
I tried it with the 'multi' option to load up all four CPUs (2 physical, 2 SMT). Here are the results of the first try. Can anyone decipher what this means?
[2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -multi 4 -evp aes-256-cbc
-multi forces -elapsed, so you're actually seeing a real number which is shockingly low compared to the artificial numbers that people have been drooling over. run "kldunload aesni.ko" to kill the cryptodev implementation and rerun, you should see an order of magnitude improvement for smaller block sizes and a smaller but still substantial improvement in large blocks.
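As for deciphering the -multi output: judging from the log above, each child reports one "+F" line with its bytes per second for each block size, and the final "evp" line is the sum over all children in 1000s of bytes per second. The Python sketch below checks that reading against the posted log. The field layout is inferred from this one log, not taken from the openssl documentation, and the function name is made up for illustration.

```python
def combine_children(lines):
    """Sum the per-child '+F' rates and return them in 1000s of bytes/s."""
    totals = None
    for line in lines:
        marker = line.find("+F:")
        if marker < 0:
            continue  # not a per-child result line
        # Assumed layout: +F:<number>:<algorithm>:<rate per block size>...
        fields = line[marker:].split()[0].split(":")
        rates = [float(value) for value in fields[3:]]
        if totals is None:
            totals = rates
        else:
            totals = [a + b for a, b in zip(totals, rates)]
    return [t / 1000.0 for t in (totals or [])]


LOG = [
    "Got: +F:22:aes-256-cbc:6201157.33:28474069.33:97183488.00:248384853.33:430134613.33 from 0",
    "Got: +F:22:aes-256-cbc:6238453.33:28477461.33:96923477.33:249693866.67:432316416.00 from 1",
    "Got: +F:22:aes-256-cbc:6233754.67:28304426.67:96385877.33:248103253.33:429192533.33 from 2",
    "Got: +F:22:aes-256-cbc:6244213.33:28484117.33:97192789.33:248600576.00:431917738.67 from 3",
]

print(" ".join(f"{rate:.2f}k" for rate in combine_children(LOG)))
```

The printed values can be compared with the "evp" line at the end of the posted output.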
-
I tried it with the 'multi' option to load up all four CPUs (2 physical, 2 SMT). Here are the results of the first try. Can anyone decipher what this means?
[2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -multi 4 -evp aes-256-cbc
-multi forces -elapsed, so you're actually seeing a real number which is shockingly low compared to the artificial numbers that people have been drooling over. run "kldunload aesni.ko" to kill the cryptodev implementation and rerun, you should see an order of magnitude improvement for smaller block sizes and a smaller but still substantial improvement in large blocks.
That makes sense. I tried the multi 4 and multi 2 options and it pretty much scaled perfectly with my original one core - elapsed score.
-
This makes more sense.
[2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 1826319 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1872707 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 1517032 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 1024 size blocks: 866718 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 173745 aes-256-cbc's in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 9740.37k 39951.08k 129117.15k 295839.74k 474439.68k
-
For reference, Atom D525 w/ hyperthreading disabled:
openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 3336818 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 913146 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 233424 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 1024 size blocks: 58628 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 8192 size blocks: 7337 aes-256-cbc's in 2.98s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 17889.54k 19582.44k 20023.14k 20116.46k 20139.80k
-
Adding -elapsed to the above command only changed results by ~2%.
Here's the multi-threaded result:
openssl speed -multi 2 -evp aes-256-cbc
Forked child 0
Forked child 1
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
+R:3311914:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:64
+R:3377542:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:64
+R:886867:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:256
+R:913678:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:256
+R:226698:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+R:233562:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+R:57329:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:8192
+R:58852:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:8192
+R:7285:aes-256-cbc:3.000000
+R:7406:aes-256-cbc:3.000000
Got: +H:16:64:256:1024:8192 from 0
Got: +F:22:aes-256-cbc:17663541.33:18919829.33:19344896.00:19568298.67:19892906.67 from 0
Got: +H:16:64:256:1024:8192 from 1
Got: +F:22:aes-256-cbc:18013557.33:19491797.33:19930624.00:20088149.33:20223317.33 from 1
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
evp 35677.10k 38411.63k 39275.52k 39656.45k 40116.22k
-
And for more perspective, my NAS4Free box running FreeBSD 11.0-RELEASE. This is a Core 2 Quad Q9550 @ 2.83 GHz.
nas4free ~/ chucko~$ openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 28607257 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 8038838 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2078627 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 521836 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 65551 aes-256-cbc's in 3.00s
OpenSSL 1.0.2j-freebsd 26 Sep 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 152175.75k 171495.21k 177376.17k 178120.02k 178997.93k
nas4free ~/ chucko~$ openssl speed -multi 4 -evp aes-256-cbc
Forked child 0
Forked child 1
Forked child 2
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
Forked child 3
+R:28661984:aes-256-cbc:3.000000
+R:28561131:aes-256-cbc:3.007813
+R:28616238:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:64
+DT:aes-256-cbc:3:64
+DT:aes-256-cbc:3:64
+R:28653210:aes-256-cbc:3.007813
+DT:aes-256-cbc:3:64
+R:8221475:aes-256-cbc:3.054688
+R:8216875:aes-256-cbc:3.054688
+R:8222598:aes-256-cbc:3.054688
+R:8199168:aes-256-cbc:3.054688
+DT:aes-256-cbc:3:256
+DT:aes-256-cbc:3:256
+DT:aes-256-cbc:3:256
+DT:aes-256-cbc:3:256
+R:2088535:aes-256-cbc:3.000000
+R:2088077:aes-256-cbc:3.000000
+R:2081254:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+R:2087901:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+DT:aes-256-cbc:3:1024
+DT:aes-256-cbc:3:1024
+R:526763:aes-256-cbc:3.007813
+R:526629:aes-256-cbc:3.007813
+R:526698:aes-256-cbc:3.007813
+DT:aes-256-cbc:3:8192
+R:525146:aes-256-cbc:3.007813
+DT:aes-256-cbc:3:8192
+DT:aes-256-cbc:3:8192
+DT:aes-256-cbc:3:8192
+R:65963:aes-256-cbc:3.000000
+R:65715:aes-256-cbc:3.000000
+R:65940:aes-256-cbc:3.000000
+R:65937:aes-256-cbc:3.000000
Got: +H:16:64:256:1024:8192 from 0
Got: +F:22:aes-256-cbc:151930379.97:171784102.96:177600341.33:178784250.68:180122965.33 from 0
Got: +H:16:64:256:1024:8192 from 1
Got: +F:22:aes-256-cbc:152420192.42:172251465.98:178221653.33:179334753.08:179445760.00 from 1
Got: +H:16:64:256:1024:8192 from 2
Got: +F:22:aes-256-cbc:152863914.67:172155089.51:178167552.00:179312624.04:180051968.00 from 2
Got: +H:16:64:256:1024:8192 from 3
Got: +F:22:aes-256-cbc:152619936.00:172274994.41:178182570.67:179289133.22:180060160.00 from 3
OpenSSL 1.0.2j-freebsd 26 Sep 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
evp 609834.42k 688465.65k 712172.12k 716720.76k 719680.85k
nas4free ~/ chucko~$
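For a quick look at how -multi scales, this Python sketch divides the combined 8192-byte result by the single-process result for three machines in this thread. The function name is illustrative; the numbers are copied from the posts (the D525 single-process figure was measured without -elapsed, which the poster reports changes it by only ~2%).

```python
def scaling(single_k, multi_k, processes):
    """Return (speedup, per-process efficiency) of a -multi run."""
    speedup = multi_k / single_k
    return speedup, speedup / processes


# (single-process 8192-byte result, -multi result, number of processes)
RUNS = {
    "Core 2 Quad Q9550": (178997.93, 719680.85, 4),
    "Atom D525":         (20139.80, 40116.22, 2),
    "Zotac ZBOX ID92":   (474439.68, 1723561.30, 4),  # 2 physical cores + SMT
}

for name, (single_k, multi_k, procs) in RUNS.items():
    speedup, efficiency = scaling(single_k, multi_k, procs)
    print(f"{name:18s} {speedup:.2f}x over {procs} processes ({efficiency:.0%} per process)")
```

The machines with one physical core per process come out at essentially linear scaling, while the ZBOX, where two of the four processes share cores via SMT, lands somewhat below 4x.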
-
Adding -elapsed to the above command only changed results by ~2%.
Yeah, without AES-NI cryptodev isn't in play, and while -elapsed gives a less accurate result when using openssl's internal crypto routines, the two numbers should be pretty close.
-
Hopefully this is of value/help to others:
Quad Core Celeron J1900 Bay Trail 2.0GHz
(specifically this "Chinese" appliance: https://www.amazon.com/Firewall-micro-appliance-Gigabit-pfSense/dp/B01JHJGG5M)
CPU has no AES-NI, so no difference in these two tests (based on what I've read in this thread) …
openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 5619317 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 1475355 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 256 size blocks: 373757 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 1024 size blocks: 94034 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 8192 size blocks: 11800 aes-256-cbc's in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 29891.85k 31392.49k 31977.20k 32013.57k 32221.87k
and
openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 5627119 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1472526 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 256 size blocks: 375127 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 1024 size blocks: 94726 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 8192 size blocks: 11769 aes-256-cbc's in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 30011.30k 31332.29k 31927.69k 32082.50k 32137.22k