AES-NI performance
-
Revisiting this thread, many have noted that results are all over the map. Here are some more results from my Zotac ZBOX ID92, running the exact same command each time. I noticed that the openssl command is single-threaded (edit: somebody else mentioned that), as it only loads one of the four available CPUs. The highest 8192-byte result is 182,250,987k; the lowest is 18,290,730k. I don't know what to make of these results.
Well, if you'd read what I wrote above you'd understand completely: the posted results are useless noise because people are using cryptodev in their testing without the -elapsed flag and aren't actually measuring anything to do with crypto performance. It's immediately obvious to anyone familiar with the openssl implementation just by looking. Your system isn't capable of transferring 182GByte/s, full stop, so any result showing that it is can be immediately discounted. Run again with the -elapsed flag and you'll see consistent numbers that actually reflect what you're trying to measure. Or turn off aesni.ko; it's probably only slowing you down anyway.
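To make the 182GByte/s point concrete: openssl speed prints its figures in 1000s of bytes per second, so the two 8192-byte results quoted above convert as in this small Python sketch. The function name is my own for illustration and is not part of openssl.

```python
def k_to_gbytes_per_s(k_value: float) -> float:
    """Convert an 'openssl speed' figure (1000s of bytes/s) to GByte/s (10^9 bytes/s)."""
    return k_value * 1000 / 1e9

highest = k_to_gbytes_per_s(182_250_987)  # highest 8192-byte result quoted above
lowest = k_to_gbytes_per_s(18_290_730)    # lowest 8192-byte result quoted above

print(f"highest: {highest:.2f} GByte/s, lowest: {lowest:.2f} GByte/s")
```

Either figure is far beyond what a small box like this could plausibly encrypt on one core, which is the reason to distrust runs made without -elapsed when cryptodev is in play.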
-
I tried it with the 'multi' option to load up all four CPUs (2 physical, 2 SMT). Here are the results of the first try. Can anyone decipher what this means?
[2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -multi 4 -evp aes-256-cbc
-multi forces -elapsed, so you're actually seeing a real number, which is shockingly low compared to the artificial numbers that people have been drooling over. Run "kldunload aesni.ko" to kill the cryptodev implementation and rerun; you should see an order of magnitude improvement for smaller block sizes and a smaller but still substantial improvement for large blocks.
-
That makes sense. I tried the -multi 4 and -multi 2 options and the results scaled pretty much perfectly from my original single-core -elapsed score.
-
This makes more sense.
[2.3.2-RELEASE][root@pfSense.home]/root: openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 1826319 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1872707 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 1517032 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 1024 size blocks: 866718 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 173745 aes-256-cbc's in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 9740.37k 39951.08k 129117.15k 295839.74k 474439.68k
-
For reference, Atom D525 w/ hyperthreading disabled:
openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 3336818 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 913146 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 256 size blocks: 233424 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 1024 size blocks: 58628 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 8192 size blocks: 7337 aes-256-cbc's in 2.98s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 17889.54k 19582.44k 20023.14k 20116.46k 20139.80k
-
Adding -elapsed to the above command only changed results by ~2%.
Here's the multi-threaded result:
openssl speed -multi 2 -evp aes-256-cbc
Forked child 0
Forked child 1
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
+R:3311914:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:64
+R:3377542:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:64
+R:886867:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:256
+R:913678:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:256
+R:226698:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+R:233562:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+R:57329:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:8192
+R:58852:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:8192
+R:7285:aes-256-cbc:3.000000
+R:7406:aes-256-cbc:3.000000
Got: +H:16:64:256:1024:8192 from 0
Got: +F:22:aes-256-cbc:17663541.33:18919829.33:19344896.00:19568298.67:19892906.67 from 0
Got: +H:16:64:256:1024:8192 from 1
Got: +F:22:aes-256-cbc:18013557.33:19491797.33:19930624.00:20088149.33:20223317.33 from 1
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
evp 35677.10k 38411.63k 39275.52k 39656.45k 40116.22k
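For anyone trying to read the -multi output: judging from the numbers in this run, the final "evp" row appears to be the per-child "+F:" values (bytes per second) added together and divided by 1000. Here is a rough Python sketch that checks this against the two child lines above; the helper names are mine, not anything from openssl.

```python
# Final per-child lines copied from the -multi 2 output above.
child_lines = [
    "+F:22:aes-256-cbc:17663541.33:18919829.33:19344896.00:19568298.67:19892906.67",
    "+F:22:aes-256-cbc:18013557.33:19491797.33:19930624.00:20088149.33:20223317.33",
]

def parse_child(line: str) -> list[float]:
    """Return the per-block-size bytes/s values from one '+F:' line."""
    fields = line.split(":")
    # fields[0..2] are '+F', an index and the cipher name; the rest are results.
    return [float(value) for value in fields[3:]]

def combine(lines: list[str]) -> list[float]:
    """Sum the children column by column and convert to 1000s of bytes/s."""
    per_child = [parse_child(line) for line in lines]
    return [sum(column) / 1000 for column in zip(*per_child)]

totals = combine(child_lines)
print("evp " + " ".join(f"{total:.2f}k" for total in totals))
```

So the combined row is an aggregate over all forked children, not a per-core figure, which is worth keeping in mind when comparing it with single-process runs.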
-
And for more perspective, my NAS4Free box running FreeBSD 11.0-RELEASE. This is a Core 2 Quad Q9550 @ 2.83 GHz.
nas4free ~/ chucko~$ openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 28607257 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 8038838 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2078627 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 521836 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 65551 aes-256-cbc's in 3.00s
OpenSSL 1.0.2j-freebsd 26 Sep 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 152175.75k 171495.21k 177376.17k 178120.02k 178997.93k
nas4free ~/ chucko~$ openssl speed -multi 4 -evp aes-256-cbc
Forked child 0
Forked child 1
Forked child 2
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
+DT:aes-256-cbc:3:16
Forked child 3
+R:28661984:aes-256-cbc:3.000000
+R:28561131:aes-256-cbc:3.007813
+R:28616238:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:64
+DT:aes-256-cbc:3:64
+DT:aes-256-cbc:3:64
+R:28653210:aes-256-cbc:3.007813
+DT:aes-256-cbc:3:64
+R:8221475:aes-256-cbc:3.054688
+R:8216875:aes-256-cbc:3.054688
+R:8222598:aes-256-cbc:3.054688
+R:8199168:aes-256-cbc:3.054688
+DT:aes-256-cbc:3:256
+DT:aes-256-cbc:3:256
+DT:aes-256-cbc:3:256
+DT:aes-256-cbc:3:256
+R:2088535:aes-256-cbc:3.000000
+R:2088077:aes-256-cbc:3.000000
+R:2081254:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+R:2087901:aes-256-cbc:3.000000
+DT:aes-256-cbc:3:1024
+DT:aes-256-cbc:3:1024
+DT:aes-256-cbc:3:1024
+R:526763:aes-256-cbc:3.007813
+R:526629:aes-256-cbc:3.007813
+R:526698:aes-256-cbc:3.007813
+DT:aes-256-cbc:3:8192
+R:525146:aes-256-cbc:3.007813
+DT:aes-256-cbc:3:8192
+DT:aes-256-cbc:3:8192
+DT:aes-256-cbc:3:8192
+R:65963:aes-256-cbc:3.000000
+R:65715:aes-256-cbc:3.000000
+R:65940:aes-256-cbc:3.000000
+R:65937:aes-256-cbc:3.000000
Got: +H:16:64:256:1024:8192 from 0
Got: +F:22:aes-256-cbc:151930379.97:171784102.96:177600341.33:178784250.68:180122965.33 from 0
Got: +H:16:64:256:1024:8192 from 1
Got: +F:22:aes-256-cbc:152420192.42:172251465.98:178221653.33:179334753.08:179445760.00 from 1
Got: +H:16:64:256:1024:8192 from 2
Got: +F:22:aes-256-cbc:152863914.67:172155089.51:178167552.00:179312624.04:180051968.00 from 2
Got: +H:16:64:256:1024:8192 from 3
Got: +F:22:aes-256-cbc:152619936.00:172274994.41:178182570.67:179289133.22:180060160.00 from 3
OpenSSL 1.0.2j-freebsd 26 Sep 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
evp 609834.42k 688465.65k 712172.12k 716720.76k 719680.85k
nas4free ~/ chucko~$
-
Adding -elapsed to the above command only changed results by ~2%.
Yeah, without AES-NI cryptodev isn't in play, and while -elapsed gives a less accurate result when using openssl's internal crypto routines, the two numbers should be pretty close.
-
Hopefully this is of value/help to others:
Quad Core Celeron J1900 Bay Trail 2.0GHz
(specifically this "Chinese" appliance: https://www.amazon.com/Firewall-micro-appliance-Gigabit-pfSense/dp/B01JHJGG5M)
CPU has no AES-NI, so no difference in these two tests (based on what I've read in this thread) …
openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 5619317 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 1475355 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 256 size blocks: 373757 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 1024 size blocks: 94034 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 8192 size blocks: 11800 aes-256-cbc's in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 29891.85k 31392.49k 31977.20k 32013.57k 32221.87k
and
openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 5627119 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1472526 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 256 size blocks: 375127 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 1024 size blocks: 94726 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 8192 size blocks: 11769 aes-256-cbc's in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 30011.30k 31332.29k 31927.69k 32082.50k 32137.22k
-
In case it helps anyone
System Specs
- ASRock H270M-ITX/ac
- Intel(R) Core(TM) i5-7500
- Adaptive {PowerD}
uname
[2.4.0-BETA][admin@pfsense.localdomain]/root: uname -a
FreeBSD pfsense.localdomain 11.0-RELEASE-p10 FreeBSD 11.0-RELEASE-p10 #75 51c8a24f312(RELENG_2_4): Fri May 12 19:55:27 CDT 2017 root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense amd64
dmesg cpu
[2.4.0-BETA][admin@pfsense.localdomain]/: dmesg | grep CPU
CPU: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz (3408.16-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
coretemp2: <CPU On-Die Thermal Sensors> on cpu2
coretemp3: <CPU On-Die Thermal Sensors> on cpu3
pciconf -lv
[2.4.0-BETA][admin@pfsense.localdomain]/: pciconf -lv
hostb0@pci0:0:0:0: class=0x060000 card=0x591f1849 chip=0x591f8086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    class = bridge
    subclass = HOST-PCI
pcib1@pci0:0:1:0: class=0x060400 card=0x19011849 chip=0x19018086 rev=0x05 hdr=0x01
    vendor = 'Intel Corporation'
    device = 'Skylake PCIe Controller (x16)'
    class = bridge
    subclass = PCI-PCI
vgapci0@pci0:0:2:0: class=0x030000 card=0x59121849 chip=0x59128086 rev=0x04 hdr=0x00
    vendor = 'Intel Corporation'
    class = display
    subclass = VGA
xhci0@pci0:0:20:0: class=0x0c0330 card=0xa2af1849 chip=0xa2af8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = serial bus
    subclass = USB
none0@pci0:0:20:2: class=0x118000 card=0xa2b11849 chip=0xa2b18086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = dasp
none1@pci0:0:22:0: class=0x078000 card=0xa2ba1849 chip=0xa2ba8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = simple comms
ahci0@pci0:0:23:0: class=0x010601 card=0xa2821849 chip=0xa2828086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = mass storage
    subclass = SATA
pcib2@pci0:0:28:0: class=0x060400 card=0xa2921849 chip=0xa2928086 rev=0xf0 hdr=0x01
    vendor = 'Intel Corporation'
    class = bridge
    subclass = PCI-PCI
pcib3@pci0:0:28:5: class=0x060400 card=0xa2951849 chip=0xa2958086 rev=0xf0 hdr=0x01
    vendor = 'Intel Corporation'
    class = bridge
    subclass = PCI-PCI
pcib4@pci0:0:29:0: class=0x060400 card=0xa2981849 chip=0xa2988086 rev=0xf0 hdr=0x01
    vendor = 'Intel Corporation'
    class = bridge
    subclass = PCI-PCI
isab0@pci0:0:31:0: class=0x060100 card=0xa2c41849 chip=0xa2c48086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = bridge
    subclass = PCI-ISA
none2@pci0:0:31:2: class=0x058000 card=0xa2a11849 chip=0xa2a18086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = memory
none3@pci0:0:31:4: class=0x0c0500 card=0xa2a31849 chip=0xa2a38086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = serial bus
    subclass = SMBus
em0@pci0:0:31:6: class=0x020000 card=0x15b81849 chip=0x15b88086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection (2) I219-V'
    class = network
    subclass = ethernet
igb0@pci0:1:0:0: class=0x020000 card=0x00018086 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb1@pci0:1:0:1: class=0x020000 card=0x00018086 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb2@pci0:1:0:2: class=0x020000 card=0x00018086 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb3@pci0:1:0:3: class=0x020000 card=0x00018086 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb4@pci0:3:0:0: class=0x020000 card=0x15391849 chip=0x15398086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I211 Gigabit Network Connection'
    class = network
    subclass = ethernet
nvme0@pci0:4:0:0: class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
    vendor = 'Samsung Electronics Co Ltd'
    class = mass storage
    subclass = NVM
aesni unloaded
{-engine omitted} versus {-engine=cryptodev}
[2.4.0-BETA][admin@pfsense.localdomain]/root: kldunload aesni
[2.4.0-BETA][admin@pfsense.localdomain]/root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 150632064 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 41237969 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 10550741 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 2695765 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 8192 size blocks: 335120 aes-256-cbc's in 3.00s
OpenSSL 1.0.2k-freebsd 26 Jan 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 805468.58k 879743.34k 900329.90k 922556.95k 915101.01k
[2.4.0-BETA][admin@pfsense.localdomain]/root: kldunload aesni
[2.4.0-BETA][admin@pfsense.localdomain]/: openssl speed -evp aes-256-cbc -engine cryptodev
engine "cryptodev" set.
Doing aes-256-cbc for 3s on 16 size blocks: 146575420 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 41172378 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 10626707 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 2699103 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 8192 size blocks: 332528 aes-256-cbc's in 3.00s
OpenSSL 1.0.2k-freebsd 26 Jan 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 783776.66k 878344.06k 906812.33k 923699.29k 908023.13k
{-engine omitted} versus {-engine=cryptodev} && {-elapsed}
[2.4.0-BETA][admin@pfsense.localdomain]/root: kldunload aesni
[2.4.0-BETA][admin@pfsense.localdomain]/root: openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 148406148 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 41268481 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 10574324 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 2695729 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 334470 aes-256-cbc's in 3.00s
OpenSSL 1.0.2k-freebsd 26 Jan 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 789443.61k 880394.26k 902342.31k 920142.17k 913326.08k
[2.4.0-BETA][admin@pfsense.localdomain]/root: kldunload aesni
[2.4.0-BETA][admin@pfsense.localdomain]/: openssl speed -evp aes-256-cbc -elapsed -engine cryptodev
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 146175678 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 41289379 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 10663194 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 2674432 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 334106 aes-256-cbc's in 3.00s
OpenSSL 1.0.2k-freebsd 26 Jan 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 779603.62k 880840.09k 909925.89k 912872.79k 912332.12k
aesni loaded
{-engine omitted} versus {-engine=cryptodev}
[2.4.0-BETA][admin@pfsense.localdomain]/root: kldload aesni
[2.4.0-BETA][admin@pfsense.localdomain]/root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 1792739 aes-256-cbc's in 0.34s
Doing aes-256-cbc for 3s on 64 size blocks: 1996478 aes-256-cbc's in 0.35s
Doing aes-256-cbc for 3s on 256 size blocks: 1750550 aes-256-cbc's in 0.21s
Doing aes-256-cbc for 3s on 1024 size blocks: 1202918 aes-256-cbc's in 0.25s
Doing aes-256-cbc for 3s on 8192 size blocks: 296024 aes-256-cbc's in 0.05s
OpenSSL 1.0.2k-freebsd 26 Jan 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 83443.85k 363447.73k 2124519.35k 4927152.13k 44343380.26k
[2.4.0-BETA][admin@pfsense.localdomain]/root: kldload aesni
[2.4.0-BETA][admin@pfsense.localdomain]/: openssl speed -evp aes-256-cbc -engine cryptodev
engine "cryptodev" set.
Doing aes-256-cbc for 3s on 16 size blocks: 1821618 aes-256-cbc's in 0.41s
Doing aes-256-cbc for 3s on 64 size blocks: 2000941 aes-256-cbc's in 0.28s
Doing aes-256-cbc for 3s on 256 size blocks: 1770129 aes-256-cbc's in 0.23s
Doing aes-256-cbc for 3s on 1024 size blocks: 1193860 aes-256-cbc's in 0.15s
Doing aes-256-cbc for 3s on 8192 size blocks: 299654 aes-256-cbc's in 0.03s
OpenSSL 1.0.2k-freebsd 26 Jan 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 70390.07k 455325.24k 1933452.90k 8235874.63k 78552498.18k
{-engine omitted} versus {-engine=cryptodev} && {-elapsed}
[2.4.0-BETA][admin@pfsense.localdomain]/root: kldload aesni
[2.4.0-BETA][admin@pfsense.localdomain]/root: openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 1945418 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 2012669 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 1750631 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1200128 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 298092 aes-256-cbc's in 3.00s
OpenSSL 1.0.2k-freebsd 26 Jan 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 10375.56k 42936.94k 149387.18k 409643.69k 813989.89k
[2.4.0-BETA][admin@pfsense.localdomain]/root: kldload aesni
[2.4.0-BETA][admin@pfsense.localdomain]/: openssl speed -evp aes-256-cbc -elapsed -engine cryptodev
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 1907305 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 64 size blocks: 2009783 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 1773813 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1205382 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 296249 aes-256-cbc's in 3.00s
OpenSSL 1.0.2k-freebsd 26 Jan 2017
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 10145.87k 42875.37k 151365.38k 411437.06k 808957.27k
-
AMD Athlon 5350 APU with Radeon(tm) R3
4 CPUs: 1 package(s) x 4 core(s)
AES-NI CPU Crypto: Yes (active)
openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 52378144 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 17296394 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 5031667 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1307810 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 165573 aes-256-cbc's in 3.00s
OpenSSL 1.0.2k-freebsd 26 Jan 2017
built on: date not available
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 279350.10k 368989.74k 429368.92k 446399.15k 452124.67k
-
You know what? I still don't know what is good or bad or what these results mean to me in the real world:
openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 50744813 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 13939575 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 3914297 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1010884 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 127631 aes-256-cbc's in 3.00s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 270639.00k 297377.60k 334020.01k 345048.41k 348517.72k
cpuid | grep -i aes
AES instruction = true
AES instruction = true
AES instruction = true
AES instruction = true
AES instruction = true
AES instruction = true
AES instruction = true
AES instruction = true
Interestingly enough, I ran the same test on a VM running on an i7 Q70 that has no AES acceleration at all, and the numbers were about half what the AES-accelerated chip did.
The first test is running on an 8-core AMD 8150 and the second (values are all approx. half) ran on a very old, wimpy i7 quad core with no AES-NI. I would expect the AMD to run 2 or 3 times faster even if it had no AES-NI. Basically, I don't feel these tests mean very much, and I think the only way to gauge performance is an actual throughput test using VPN traffic.
-
The number of cores is irrelevant; it's a single-threaded test. (It's also worth pointing out that your Bulldozer-era chip isn't really 8 cores; it's 4 cores with a multi-thread implementation similar to Intel's hyperthreading, and the early releases weren't tuned very well.) I don't have any numbers for the FX-8150, but it is an old CPU, so your results aren't necessarily unreasonable. I have tested Bulldozer-based Opterons and I'd have expected your results to be a bit higher based on clock speed, but I don't have the data points to know how the results should scale on the desktop chips of that line. I would double-check that you have the cryptodev checkbox turned off, because that will slow things down, but that might be as good as it gets.
It's important to remember that AES-NI implementations have evolved a lot over the years, so there's a whole lot more to performance than its simple presence. You are correct that the openssl speed results alone aren't going to predict OpenVPN performance, but they are a datapoint that can help predict performance relative to other known systems, and can help establish a ceiling on performance. (E.g., a system that can only perform AES-256-CBC at 30MByte/s is never going to get more than 240Mbit/s of VPN, and much less in practice.)
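The ceiling arithmetic in that last example is just bytes to bits. A tiny Python sketch (the function name is hypothetical, purely for illustration):

```python
def vpn_ceiling_mbit(cipher_mbyte_per_s: float) -> float:
    """Upper bound on VPN throughput in Mbit/s, given cipher speed in MByte/s."""
    return cipher_mbyte_per_s * 8  # 8 bits per byte

ceiling = vpn_ceiling_mbit(30)  # the 30MByte/s example from the post
print(f"{ceiling} Mbit/s at best")
```

Real VPN throughput will land below this bound, since the same core also has to handle packet processing, authentication (HMAC) and tunnel overhead.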
-
The AES test is single threaded? Is openssl also single threaded during normal use?
-
Yes, as is OpenVPN (what you probably mean to be asking about).
-
Nope - I know that openvpn is single-threaded in that each instance gets a single thread.
What I'm wondering is whether multiple instances of openvpn, which result in multiple openvpn threads, each also result in multiple threads of openssl.
Example: do 4 openvpn instances rely on a single instance of openssl working on the crypto, or on 4 threads?
-
The "openssl" command-line utility is single-threaded unless you pass -multi (which produces output that is pretty meaningless and hard to compare across platforms; just don't do that). The ssl library is single-threaded within a process. If you run multiple instances of openvpn you are running multiple independent processes, not threads, and each process can utilize a different core.
You didn't answer whether the cryptodev stuff was disabled in the gui.
-
Yes - cryptodev is disabled and AES-NI is enabled. The pfsense VM gets about the same scores as the physical machine too, which is pretty nice to see.
I was only in the box to test why it's getting random crashes, so I was just playing around and running processes to stress the machine while waiting for the crash.
And it died… I think the power supply is failing. Going to have to get that replaced before I can further study the mysteries of AES-NI on the AMD 8150.
-
Hi all,
Version 2.4.3-RELEASE-p1 (amd64)
CPU Type Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
24 CPUs: 2 package(s) x 6 core(s) x 2 hardware threads
AES-NI CPU Crypto: Yes (active)
I performed several tests with the following commands:
openssl speed -evp aes-128-cbc -elapsed
openssl speed -evp aes-128-gcm -elapsed
with different Cryptographic Hardware and Kernel PTI settings (+PTI means Kernel PTI is enabled):
+------------------------+--------------------------+--------------------------+--------------+--------------+-----------------+-----------------+
|                        | AES-NI + Cryptodev + PTI | AES-NI + Cryptodev - PTI | AES-NI + PTI | AES-NI - PTI | Cryptodev + PTI | Cryptodev - PTI |
+------------------------+--------------------------+--------------------------+--------------+--------------+-----------------+-----------------+
| aes-128-cbc 16 bytes   | 7189                     | 7794                     | 612843       | 612249       | 605915          | 588186          |
| aes-128-cbc 8192 bytes | 568785                   | 591544                   | 765053       | 763943       | 763748          | 764321          |
| aes-128-gcm 16 bytes   | 243029                   | 243885                   | 238457       | 251084       | 250158          | 229928          |
| aes-128-gcm 8192 bytes | 942211                   | 943865                   | 944693       | 943185       | 944543          | 946034          |
+------------------------+--------------------------+--------------------------+--------------+--------------+-----------------+-----------------+
The router was rebooted after changing each setting.
Can anybody explain the very small values in the aes-128-cbc 16 bytes test, as well as the noticeably smaller values in the aes-128-cbc 8192 bytes test, when both AES-NI and Cryptodev are enabled?
Thanks in advance!
-
I suggest that when both are enabled, the AES-NI module registers itself as a crypto device in the framework for AES-CBC and openssl tries to use it. That results in massive additional switching, especially for small packets.
Though there is a load of misinformation surrounding this and I have managed to get it wrong before!
Perhaps more interesting is that you seem to be seeing a better result with PTI enabled in some cases there. I have no explanation for that.
Steve
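To put rough numbers on the gap in that table, here is a short Python sketch that divides the "AES-NI + PTI" column by the "AES-NI + Cryptodev + PTI" column. The dictionary keys are labels I made up; the values are copied from the table.

```python
# Values copied from the "+ PTI" columns of the table above.
results = {
    "aes-128-cbc 16 bytes":   {"aesni+cryptodev": 7189,   "aesni": 612843},
    "aes-128-cbc 8192 bytes": {"aesni+cryptodev": 568785, "aesni": 765053},
    "aes-128-gcm 16 bytes":   {"aesni+cryptodev": 243029, "aesni": 238457},
    "aes-128-gcm 8192 bytes": {"aesni+cryptodev": 942211, "aesni": 944693},
}

# Ratio > 1 means AES-NI alone was faster than AES-NI plus cryptodev.
slowdown = {test: row["aesni"] / row["aesni+cryptodev"] for test, row in results.items()}

for test, factor in slowdown.items():
    print(f"{test}: AES-NI alone is {factor:.2f}x the combined setting")
```

The CBC rows show a large penalty at 16 bytes and a much smaller one at 8192 bytes, while the GCM rows are essentially unchanged. That pattern would fit a fixed per-call cost that only applies to AES-CBC, though this is only a reading of the numbers, not a confirmed explanation.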