AES-NI performance
-
I was thinking of firing up the 6-core Xeon, but I just don't really care for epeen stuff anymore. I mean, if someone needs to see it I'll do it; no time for "just for grins" these days.
-
I have a Chinese "mini-computer" (gen 5 i5).
I ran the test twice and got very different results:
Try one:
Doing aes-256-cbc for 3s on 16 size blocks: 1704782 aes-256-cbc's in 0.28s
Doing aes-256-cbc for 3s on 64 size blocks: 1762586 aes-256-cbc's in 0.31s
Doing aes-256-cbc for 3s on 256 size blocks: 1417931 aes-256-cbc's in 0.32s
Doing aes-256-cbc for 3s on 1024 size blocks: 811284 aes-256-cbc's in 0.13s
Doing aes-256-cbc for 3s on 8192 size blocks: 163126 aes-256-cbc's in 0.05s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 96983.15k 360977.61k 1133238.12k 6646038.53k 24435715.51k
Try two:
Doing aes-256-cbc for 3s on 16 size blocks: 1727740 aes-256-cbc's in 0.41s
Doing aes-256-cbc for 3s on 64 size blocks: 1742973 aes-256-cbc's in 0.38s
Doing aes-256-cbc for 3s on 256 size blocks: 1414059 aes-256-cbc's in 0.29s
Doing aes-256-cbc for 3s on 1024 size blocks: 815243 aes-256-cbc's in 0.13s
Doing aes-256-cbc for 3s on 8192 size blocks: 163008 aes-256-cbc's in 0.01s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 68046.38k 291396.63k 1252321.22k 6285619.44k 170926276.61k
-
Thanks all for the results, keep them coming!
Here are the results so far:
170926276.61k gen 5 i5
91090845.70k Zotac ZBOX ID92 Core i5 4570T
29080158.21k hp microserver gen 8 Xeon 1265Lv2
24435715.51k gen 5 i5
24345837.57k Lanner FW-7525D Quad-core Atom C2558 @ 2.40GHz
19462619.14k SuperMicro 2758
18390712.32k AM1 Athlon 5370
14241549.52k pfSense SG-2440 Dual-core Atom C2358 @ 1.74GHz
7123763.20k Raspberry Pi 3 ARMv7l
-
AR15USR,
There doesn't seem to be any difference in your tests. Can you try running without the "-evp" option?
openssl speed aes-256-cbc
-
Koenig,
Can you provide the make and model of your "gen 5 i5"?
-
Here's an updated list of results:
170926276.61k gen 5 i5
91090845.70k Zotac ZBOX ID92 Core i5 4570T
42008576.00k Gigabyte GA-N3150N-D3V board Celeron N3150 with AES-NI https://forum.pfsense.org/index.php?topic=108119.0
29080158.21k hp microserver gen 8 Xeon 1265Lv2
27986842.97k Gigabyte GA-N3150N-D3V Celeron N3150 with AES-NI https://forum.pfsense.org/index.php?topic=105114.msg601520#msg601520
24435715.51k gen 5 i5
24345837.57k Lanner FW-7525D Quad-core Atom C2558 @ 2.40GHz
19462619.14k SuperMicro 2758
18390712.32k AM1 Athlon 5370
14241549.52k pfSense SG-2440 Dual-core Atom C2358 @ 1.74GHz
7123763.20k Raspberry Pi 3 ARMv7l
405686.95k Intel i7-4510U + 2x Intel 82574 + 2x Intel i350 Mini-ITX Build https://forum.pfsense.org/index.php?topic=115627.msg646395#msg646395
230708.57k ci323 nano u Celeron N3150 with AES-NI w/ -engine cryptodev https://forum.pfsense.org/index.php?topic=115673.msg656602#msg656602
217617.75k RCC-VE 2440 Intel Atom C2358 https://forum.pfsense.org/index.php?topic=91974.0
124788.74k ALIX.APU2B4/APU2C4 1 GHz Quad Core AMD GX-412TC http://wiki.ipfire.org/en/hardware/pcengines/apu2b4
34204.33k ALIX.APU1C/APU1D 1 GHz Dual Core AMD G-T40E http://wiki.ipfire.org/en/hardware/pcengines/apu1c
-
Koenig,
Can you provide the make and model of your "gen 5 i5"?
There's no brand or model on it…
Something like this: https://www.aliexpress.com/item/Fanless-PC-Intel-NUC-Core-i7-5500u-i5-5257u-Iris-6100-Barebone-Mini-PC-Windows-2HDMI/32755490163.html?spm=2114.01010208.3.100.Dtd346&ws_ab_test=searchweb0_0,searchweb201602_2_10091_10090_10088_10089,searchweb201603_1&btsid=6d47dcd0-df75-47e8-84cf-86813f160f8e
Some more results:
aes-256-cbc 99810.65k 375805.41k 1454872.58k 4844784.55k 28507460.95k
aes-256-cbc 62518.77k 350371.84k 1217122.52k 5055197.38k 34182738.74k
aes-256-cbc 76404.78k 341786.43k 1224697.10k 4425564.16k 34284240.90k
aes-256-cbc 91091.47k 242748.12k 1191453.72k 5068092.37k 85483061.25k
aes-256-cbc 100148.30k 299186.69k 1330803.04k 6668591.10k 86076555.26k
aes-256-cbc 105877.45k 377916.58k 1538361.48k 6694084.61k 57179897.86k
aes-256-cbc 84355.12k 320069.81k 1420017.17k 6647087.10k 57598978.73k
aes-256-cbc 106102.67k 260300.35k 1792681.83k 9638188.87k 34206646.27k
All from the same machine.
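A rough way to see where the spread in the 8192-byte column comes from, written as a Python sketch under two assumptions that are not stated in the post: each run encrypted roughly 163,000 blocks (about the count in the two full runs posted earlier for this machine), and openssl's CPU time is counted in ticks of 1/128 s (inferred from the figures in this thread, not from documentation). Under those assumptions, every run's implied CPU time lands close to a whole number of ticks (2, 3, 5 or 6), which suggests the variation comes from timing only a few ticks rather than from real changes in speed.

```python
# The eight 8192-byte results above, all from the same machine, in 1000s of bytes/s.
results_k = [28507460.95, 34182738.74, 34284240.90, 85483061.25,
             86076555.26, 57179897.86, 57598978.73, 34206646.27]

ASSUMED_BLOCKS = 163_000  # assumption: about the block count of the earlier full runs
BLOCK_SIZE = 8192         # bytes per block for this column
TICK = 1 / 128            # assumption: CPU time is measured in 1/128 s ticks


def implied_ticks(rate_k):
    """CPU time implied by a reported rate, expressed in ticks."""
    bytes_done = ASSUMED_BLOCKS * BLOCK_SIZE
    cpu_seconds = bytes_done / (rate_k * 1000)
    return cpu_seconds / TICK


ticks = [implied_ticks(k) for k in results_k]
for k, t in zip(results_k, ticks):
    print(f"{k:>15,.2f}k -> {t:.2f} ticks (~{round(t)})")
```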
-
14241549.52k pfSense SG-2440 Dual-core Atom C2358 @ 1.74GHz
217617.75k RCC-VE 2440 Intel Atom C2358 https://forum.pfsense.org/index.php?topic=91974.0
Obviously something off there.
-
First my system details -
System: Netgate SG-4860
Version: 2.3.2-RELEASE-p1 (amd64) built on Fri Sep 30 14:36:56 CDT 2016 FreeBSD 10.3-RELEASE-p9
CPU Type: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz 4 CPUs: 1 package(s) x 4 core(s)
Hardware crypto: AES-CBC,AES-XTS,AES-GCM,AES-ICM
Results (the system is pretty active, so the results may be skewed):
[2.3.2-RELEASE][admin@pfSense.localdomain]/root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 984814 aes-256-cbc's in 0.35s
Doing aes-256-cbc for 3s on 64 size blocks: 920037 aes-256-cbc's in 0.30s
Doing aes-256-cbc for 3s on 256 size blocks: 759776 aes-256-cbc's in 0.26s
Doing aes-256-cbc for 3s on 1024 size blocks: 452100 aes-256-cbc's in 0.15s
Doing aes-256-cbc for 3s on 8192 size blocks: 92821 aes-256-cbc's in 0.03s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 44819.98k 193254.95k 754434.54k 3118823.75k 24332468.22k
-
AR15USR,
There doesn't seem to be any difference in your tests. Can you try running without the "-evp" option?
openssl speed aes-256-cbc
/root: openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 5517180 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 64 size blocks: 1544753 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 399657 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 258521 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 32712 aes-256 cbc's in 2.99s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256 cbc 29348.53k 32954.73k 34104.06k 88241.83k 89558.79k
For comparison:
/root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 957210 aes-256-cbc's in 0.39s
Doing aes-256-cbc for 3s on 64 size blocks: 893869 aes-256-cbc's in 0.24s
Doing aes-256-cbc for 3s on 256 size blocks: 751299 aes-256-cbc's in 0.27s
Doing aes-256-cbc for 3s on 1024 size blocks: 450002 aes-256-cbc's in 0.10s
Doing aes-256-cbc for 3s on 8192 size blocks: 92472 aes-256-cbc's in 0.02s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 39207.32k 236212.09k 724075.46k 4537127.86k 32321306.62k
-
I have no idea what it means or whether the output is good or bad, as I don't understand this, but I thought I'd try it on my box :)
Any good?
[2.3.2-RELEASE][admin@pfSense]/root: openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 12830479 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 3389641 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 858407 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 217919 aes-256 cbc's in 3.03s
Doing aes-256 cbc for 3s on 8192 size blocks: 27176 aes-256 cbc's in 3.02s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256 cbc 68429.22k 72312.34k 73250.73k 73616.18k 73633.34k
[2.3.2-RELEASE][admin@pfSense]/root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 77185949 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 20190084 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 5139740 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 1024 size blocks: 1286608 aes-256-cbc's in 3.02s
Doing aes-256-cbc for 3s on 8192 size blocks: 160088 aes-256-cbc's in 3.00s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 411658.39k 430721.79k 435191.22k 436886.75k 437146.97k
tnx
-
~~Means you don't have AES-NI, or it is disabled, or ...?
The first figure (3s) indicates clock time. The second time interval indicates CPU time. Note that on the accelerated systems they are performing operations on more data in < 1/10 the CPU time.~~ Don't listen to that guy.
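For reference, each figure in the summary tables is the number of blocks processed times the block size, divided by the time printed after "in", divided by 1000. The Python sketch below assumes the printed time is really a count of 1/128 s CPU ticks, rounded to two decimals for display; that tick size is an inference from the numbers in this thread, not something taken from OpenSSL or FreeBSD documentation. With it, both "gen 5 i5" 8192-byte figures and a plain (non-evp) 16-byte figure from a full 3 s run are reproduced to the last digit, while the printed 0.01s gives a different number.

```python
TICK = 1 / 128  # assumption: CPU time is counted in 1/128 s ticks


def kbytes_per_sec(blocks, block_size, seconds):
    """Throughput as openssl speed prints it: bytes processed / time / 1000."""
    return blocks * block_size / seconds / 1000


# "gen 5 i5", try two, 8192-byte blocks: "163008 aes-256-cbc's in 0.01s"
print(round(kbytes_per_sec(163008, 8192, 0.01), 2))      # 133536153.6 (printed time)
print(round(kbytes_per_sec(163008, 8192, 1 * TICK), 2))  # 170926276.61 (one tick)

# "gen 5 i5", try one, 8192-byte blocks: "163126 aes-256-cbc's in 0.05s"
print(round(kbytes_per_sec(163126, 8192, 7 * TICK), 2))  # 24435715.51 (seven ticks)

# plain aes-256-cbc, 16-byte blocks: "5517180 aes-256 cbc's in 3.01s"
print(round(kbytes_per_sec(5517180, 16, 385 * TICK), 2))  # 29348.53 (385 ticks)
```

When the measured time is a single tick, a one-tick error changes the result enormously; over roughly 3 s (about 384 ticks) the same error is negligible.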
-
SuperMicro with Intel N3700. Not bad for a 6W CPU (the system pulls 11 watts from the wall).
$ openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 991459 aes-256-cbc's in 0.25s
Doing aes-256-cbc for 3s on 64 size blocks: 971848 aes-256-cbc's in 0.26s
Doing aes-256-cbc for 3s on 256 size blocks: 785303 aes-256-cbc's in 0.28s
Doing aes-256-cbc for 3s on 1024 size blocks: 393543 aes-256-cbc's in 0.16s
Doing aes-256-cbc for 3s on 8192 size blocks: 92318 aes-256-cbc's in 0.02s
OpenSSL 1.0.1l-freebsd 15 Jan 2015
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 63453.38k 241253.90k 714800.24k 2579123.40k 32267479.72k
-
@Koenig: thanks, I've labelled it as Unknown(China)
@Derelict: I'm taking an indiscriminate approach and keeping every data point provided. The difference might be due to the OS version, random timing, etc.
@bytesizedalex: thanks and added to the list
@AR15USR: as I suspected, with -evp OpenSSL determines for itself whether AES-NI is present and uses it; it doesn't matter what you set in pfSense.
@NEK4TE: can you tell us what your box and CPU are?
@Engineer: which Supermicro box is it?
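To put a number on the -evp point, here is a small Python sketch over NEK4TE's two posted runs. Every block size in both runs reports about 3 s of CPU time, so these figures are not distorted by tiny timing intervals. The code only computes ratios of the posted figures; that the -evp path is the AES-NI one is the thread's reading, not something this sketch checks.

```python
# NEK4TE's posted results, in 1000s of bytes/s, for each block size.
sizes = [16, 64, 256, 1024, 8192]
plain = [68429.22, 72312.34, 73250.73, 73616.18, 73633.34]     # openssl speed aes-256-cbc
evp = [411658.39, 430721.79, 435191.22, 436886.75, 437146.97]  # openssl speed -evp aes-256-cbc

ratios = [e / p for p, e in zip(plain, evp)]
for size, r in zip(sizes, ratios):
    print(f"{size:>5}-byte blocks: -evp is {r:.1f}x the plain figure")
```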
-
Updated results list:
170926276.61k unknown (China) gen 5 i5
91090845.70k Zotac ZBOX ID92 Core i5 4570T
42008576.00k Gigabyte GA-N3150N-D3V board Celeron N3150 with AES-NI https://forum.pfsense.org/index.php?topic=108119.0
32321306.62k SuperMicro 2758
32267479.72k Supermicro Intel N3700
29080158.21k hp microserver gen 8 Xeon 1265Lv2
27986842.97k Gigabyte GA-N3150N-D3V Celeron N3150 with AES-NI https://forum.pfsense.org/index.php?topic=105114.msg601520#msg601520
24435715.51k unknown (China) gen 5 i5
24345837.57k Lanner FW-7525D Quad-core Atom C2558 @ 2.40GHz
24332468.22k Netgate SG-4860 Intel(R) Atom(TM) CPU C2558 @ 2.40GHz 4 CPUs
19462619.14k SuperMicro 2758
18390712.32k AM1 Athlon 5370
14241549.52k pfSense SG-2440 Dual-core Atom C2358 @ 1.74GHz
7123763.20k Raspberry Pi 3 ARMv7l
405686.95k Intel i7-4510U + 2x Intel 82574 + 2x Intel i350 Mini-ITX Build https://forum.pfsense.org/index.php?topic=115627.msg646395#msg646395
230708.57k ci323 nano u Celeron N3150 with AES-NI w/ -engine cryptodev https://forum.pfsense.org/index.php?topic=115673.msg656602#msg656602
217617.75k RCC-VE 2440 Intel Atom C2358 https://forum.pfsense.org/index.php?topic=91974.0
124788.74k ALIX.APU2B4/APU2C4 1 GHz Quad Core AMD GX-412TC http://wiki.ipfire.org/en/hardware/pcengines/apu2b4
34204.33k ALIX.APU1C/APU1D 1 GHz Dual Core AMD G-T40E http://wiki.ipfire.org/en/hardware/pcengines/apu1c
-
iorx,
I was interested to see you're running the same processor in a MicroServer Gen 8 as I am.
Mine is running ESXi 6.0 though and produces slightly different numbers:
[2.3.2-RELEASE] /root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 1767436 aes-256-cbc's in 0.38s
Doing aes-256-cbc for 3s on 64 size blocks: 1616969 aes-256-cbc's in 0.35s
Doing aes-256-cbc for 3s on 256 size blocks: 1308617 aes-256-cbc's in 0.27s
Doing aes-256-cbc for 3s on 1024 size blocks: 723750 aes-256-cbc's in 0.13s
Doing aes-256-cbc for 3s on 8192 size blocks: 143766 aes-256-cbc's in 0.01s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 75410.60k 294360.22k 1225164.62k 5580197.65k 150749577.22k
Hi!
For fun or reference :). A Hyper-V hosted pfSense on an HP MicroServer Gen 8 with a Xeon 1265Lv2.
[2.3.2-RELEASE][n23]/root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 1084848 aes-256-cbc's in 0.45s
Doing aes-256-cbc for 3s on 64 size blocks: 1345250 aes-256-cbc's in 0.24s
Doing aes-256-cbc for 3s on 256 size blocks: 709374 aes-256-cbc's in 0.23s
Doing aes-256-cbc for 3s on 1024 size blocks: 472042 aes-256-cbc's in 0.19s
Doing aes-256-cbc for 3s on 8192 size blocks: 110932 aes-256-cbc's in 0.03s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 38978.40k 355493.16k 774825.57k 2577978.71k 29080158.21k
-
@Engineer: which Supermicro box is it?
SuperMicro Board: X11SBA-LN4F with Intel N3700.
I'm running 2.2.5 and whatever FreeBSD version comes with it; not sure whether newer versions have brought any improvements.
I just re-ran the test with nobody using the Internet (wife and two kids on Facebook, Snapchat, YouTube, etc. really change the results) and got this:
$ openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 951002 aes-256-cbc's in 0.28s
Doing aes-256-cbc for 3s on 64 size blocks: 961593 aes-256-cbc's in 0.26s
Doing aes-256-cbc for 3s on 256 size blocks: 770095 aes-256-cbc's in 0.23s
Doing aes-256-cbc for 3s on 1024 size blocks: 454015 aes-256-cbc's in 0.14s
Doing aes-256-cbc for 3s on 8192 size blocks: 92419 aes-256-cbc's in 0.02s
OpenSSL 1.0.1l-freebsd 15 Jan 2015
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 54101.45k 238708.18k 870154.24k 3306036.34k 48454172.67k
Makes more sense compared to the N3150 in the chart now.
-
@aesguy, here's the stats on my board if you want them:
Intel(R) Atom(TM) CPU C2758 @ 2.40GHz 8 CPUs
/root: openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 944591 aes-256-cbc's in 0.33s
Doing aes-256-cbc for 3s on 64 size blocks: 888807 aes-256-cbc's in 0.26s
Doing aes-256-cbc for 3s on 256 size blocks: 743989 aes-256-cbc's in 0.23s
Doing aes-256-cbc for 3s on 1024 size blocks: 445355 aes-256-cbc's in 0.11s
Doing aes-256-cbc for 3s on 8192 size blocks: 92224 aes-256-cbc's in 0.02s
OpenSSL 1.0.1s-freebsd 1 Mar 2016
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 46060.06k 220639.60k 840656.26k 4169540.75k 48351936.51k
-
I'm amazed that nobody has pointed out yet that most of these results are COMPLETELY BOGUS. If your openssl speed result is based on a time of less than 3 seconds, it is invalid. By default, openssl bases its timing on the CPU time charged to the openssl process rather than on elapsed time, because when using software encryption on a loaded system you may not get 100% of the CPU, and the CPU time figure gives a better accounting of the work actually done. But when using the FreeBSD crypto device, most of the work is done in kernel space rather than user space, so the CPU time measurement consists entirely of the time spent making system calls. BUT YOU DID NOT ACTUALLY GET THREE SECONDS OF COMPUTATION DONE IN .01 SECONDS!!!! If you are using the FreeBSD crypto device, you MUST add -elapsed to the command line to get a better idea of the real performance. If you do not, you are basing your conclusions on a meaningless number.
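As a rough Python illustration of the difference -elapsed makes, the sketch below recomputes the C2758 8192-byte figure posted above (92224 blocks "in 0.02s", reported as 48351936.51k) using the nominal 3 s of wall-clock time instead of the CPU time. Two assumptions: the printed 0.02s is two 1/128 s ticks (which reproduces the posted figure exactly), and the run really took about 3 s of wall time. This is an estimate, not a substitute for rerunning with openssl speed -elapsed -evp aes-256-cbc.

```python
BLOCKS = 92224          # "92224 aes-256-cbc's in 0.02s" (C2758, 8192-byte blocks)
BLOCK_SIZE = 8192
CPU_SECONDS = 2 / 128   # the printed 0.02s, assuming 1/128 s CPU-time ticks
WALL_SECONDS = 3.0      # assumption: nominal test length; -elapsed would measure this

bytes_done = BLOCKS * BLOCK_SIZE
cpu_rate_k = bytes_done / CPU_SECONDS / 1000    # what openssl printed by default
wall_rate_k = bytes_done / WALL_SECONDS / 1000  # estimate of an -elapsed figure

print(f"CPU-time figure:  {cpu_rate_k:,.2f}k")
print(f"wall-time figure: {wall_rate_k:,.2f}k")
print(f"overstated by a factor of {cpu_rate_k / wall_rate_k:.0f}")
```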
A simple sanity check shows that many (most?) of the results listed here imply the machines are performing crypto at a rate greater than their theoretical peak performance (based on the number of operations performed * clock rate of the machine). Any result that shows 170GB/s of work performed by a commodity PC is OBVIOUSLY INCORRECT. A report of 48GByte/s on an Atom with a 25GB/s memory implementation is OBVIOUSLY INCORRECT.
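The per-clock sanity check can be sketched in Python with a deliberately generous ceiling. The assumptions are mine, not the poster's: openssl speed runs on a single core; CBC encryption cannot overlap blocks because each block is chained to the previous ciphertext; AES-256 uses 14 rounds per 16-byte block; and no round completes in less than one cycle. Under those assumptions no implementation can beat the ceiling, so a figure above it cannot be right.

```python
AES256_ROUNDS = 14
BLOCK_BYTES = 16


def cbc_encrypt_ceiling_bytes_per_sec(clock_hz, cycles_per_round=1):
    """Generous single-core ceiling for AES-256-CBC encryption.

    Assumes each block's 14 rounds run back to back (CBC chaining forbids
    overlapping blocks) and that a round costs at least `cycles_per_round` cycles.
    """
    cycles_per_block = AES256_ROUNDS * cycles_per_round
    return clock_hz * BLOCK_BYTES / cycles_per_block


def plausible(reported_k, clock_hz):
    """True if a reported figure (1000s of bytes/s) fits under the ceiling."""
    return reported_k * 1000 <= cbc_encrypt_ceiling_bytes_per_sec(clock_hz)


# C2758 @ 2.40GHz, 8192-byte figure posted above
ceiling = cbc_encrypt_ceiling_bytes_per_sec(2.4e9)
print(f"ceiling: {ceiling / 1e9:.2f} GB/s")
print(plausible(48351936.51, 2.4e9))
```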
It's been hard to get people to stop using the FreeBSD crypto interface because they really, really want these numbers to be true. But if you compare openssl performance with and without cryptodev on an AES-NI system USING THE REAL NUMBERS, you'll find that cryptodev is slower than openssl's native AES-NI (it basically has to be, because both perform the same crypto operations, but the kernel module pays a penalty for going into and out of kernel space).
The fastest AES-NI implementation I'm aware of is AES-GCM on the Skylake core, where you should see somewhere in the neighborhood of 6 GByte/s per core, depending on clock speed. (Yes, a commodity Skylake desktop will completely stomp a Broadwell Xeon; can't wait to see the Skylake Xeons.) On the later Intel cores, the GCM implementation is significantly faster than CBC at larger block sizes when PCLMULQDQ is available.
-
VAMike said:
"But when using the freebsd crypto device most of the work is done in kernel space rather than user space, so the cpu time measurement consists entirely of the time spent making system calls."
It does not appear that the crypto device is being used: OpenSSL invokes the appropriate CPU instructions directly. For example, on ARMv8 the AESE instruction is invoked directly: https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aesv8-armx.pl
Second, we see evidence to support this: it makes no difference whether you enable AES-NI in pfSense, but it does matter whether you invoke openssl with "-evp" or not.
I am not convinced that your assumption about kernel vs. userland is valid, and therefore I don't think these numbers are as meaningless as you say.