OpenSSL AES-NI benchmark bug?
-
Hi,
We run pfSense on several networks now. Cheers to the devs!
I've been keeping track of AES-NI performance for various platforms as we go. On one network we run a pair of SG-2440s: https://www.pfsense.org/products/product-family.html#sg-2440
It has blown the doors off anything I've ever tested. It did over 15 GB/s for AES-128-CBC on 8192-byte blocks with the -evp flag. However, looking more closely, the output says:
Doing aes-128-cbc for 3s on 8192 size blocks: 87407 aes-128-cbc's in 0.05s
Shouldn't it run for 3s, not 0.05s?
All the benchmarks that I've run on other platforms have run for the full 3s. I assume this must be a bug in OpenSSL? I'd like to know for sure the actual AES-CBC, AES-GCM, etc. performance of this box and others, but I don't know whom to ask about digging into this. Does anybody know where to point me?
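As a sanity check on the 15 GB/s figure, the throughput column can be recomputed from the "Doing …" line as operations × block size ÷ measured seconds. The snippet below is a minimal sketch of that arithmetic; the function name `throughput_kbytes` is made up for illustration, and it assumes the table is derived this way, which is consistent with the numbers in this post.

```python
def throughput_kbytes(ops, block_size, seconds):
    """Return throughput in 1000s of bytes per second, like the openssl table."""
    return ops * block_size / seconds / 1000.0

# SG-2440, -evp run, 8192-byte blocks, using the printed 0.05s.
as_printed = throughput_kbytes(87407, 8192, 0.05)

# Same operation count, if the run had really been timed over the full 3s.
if_full_3s = throughput_kbytes(87407, 8192, 3.0)

# Minnowboard Max -evp run for comparison (84544 operations in 3.00s).
minnow = throughput_kbytes(84544, 8192, 3.00)

print(f"{as_printed:.2f}k  {if_full_3s:.2f}k  {minnow:.2f}k")
```

With the printed 0.05s this gives about 14.3 GB/s; the table's 15275480.41k implies the unrounded time was closer to 0.047s. If the same operation count really took 3s of wall-clock time, it works out to roughly 239 MB/s, which is in line with the Minnowboard result. So the headline number depends entirely on the tiny measured time. It may be worth rerunning with `openssl speed -elapsed -evp aes-128-cbc`, which measures wall-clock time instead of user CPU time, to see whether the two timers disagree on this box.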
For your reference, here are the full test results, along with some of the other boxes I've tested:
Intel(R) Atom(TM) CPU C2358 @ 1.74GHz (testnet PFSense)
[2.2.4-RELEASE][admin@fw1. … ]/root: openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 5470659 aes-128 cbc's in 2.93s
Doing aes-128 cbc for 3s on 64 size blocks: 1436850 aes-128 cbc's in 2.74s
Doing aes-128 cbc for 3s on 256 size blocks: 406790 aes-128 cbc's in 2.97s
Doing aes-128 cbc for 3s on 1024 size blocks: 256634 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 8192 size blocks: 24238 aes-128 cbc's in 2.23s
OpenSSL 1.0.1l-freebsd 15 Jan 2015
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 29877.09k 33534.69k 35078.14k 87826.45k 89176.79k
[2.2.4-RELEASE][admin@fw1. …]/root: openssl speed -evp AES128
Doing aes-128-cbc for 3s on 16 size blocks: 530644 aes-128-cbc's in 0.27s
Doing aes-128-cbc for 3s on 64 size blocks: 643047 aes-128-cbc's in 0.35s
Doing aes-128-cbc for 3s on 256 size blocks: 501377 aes-128-cbc's in 0.23s
Doing aes-128-cbc for 3s on 1024 size blocks: 357574 aes-128-cbc's in 0.21s
Doing aes-128-cbc for 3s on 8192 size blocks: 87407 aes-128-cbc's in 0.05s
OpenSSL 1.0.1l-freebsd 15 Jan 2015
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 31050.25k 117063.13k 547637.38k 1735849.60k 15275480.41k
Intel(R) Atom(TM) CPU E3825 @ 1.33GHz (minnowboard max rev b)
minnow@minnow:~$ openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 4151351 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 1202836 aes-128 cbc's in 3.01s
Doing aes-128 cbc for 3s on 256 size blocks: 313375 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 196386 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 25012 aes-128 cbc's in 3.01s
OpenSSL 1.0.1k 8 Jan 2015
built on: Fri Jun 12 18:48:03 2015
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 22140.54k 25575.25k 26741.33k 67033.09k 68072.53k
minnow@minnow:~$ openssl speed -evp AES128
Doing aes-128-cbc for 3s on 16 size blocks: 21417497 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 8464074 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 2534467 aes-128-cbc's in 3.01s
Doing aes-128-cbc for 3s on 1024 size blocks: 666627 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 84544 aes-128-cbc's in 3.00s
OpenSSL 1.0.1k 8 Jan 2015
built on: Fri Jun 12 18:48:03 2015
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 114226.65k 180566.91k 215556.00k 227542.02k 230861.48k
Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
➜ ~ openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 16758137 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 4843536 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1242016 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 315306 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 39514 aes-128 cbc's in 2.99s
OpenSSL 1.0.1k 8 Jan 2015
built on: Fri Jun 12 18:48:03 2015
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 89376.73k 103328.77k 105985.37k 107624.45k 108260.43k
➜ ~ openssl speed -evp AES128
Doing aes-128-cbc for 3s on 16 size blocks: 48365979 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 15012135 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 3886896 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 985474 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 123056 aes-128-cbc's in 3.00s
OpenSSL 1.0.1k 8 Jan 2015
built on: Fri Jun 12 18:48:03 2015
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 257951.89k 320258.88k 331681.79k 336375.13k 336024.92k
Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
➜ ~ openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 23212288 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 6228873 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1592634 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 403202 aes-128 cbc's in 2.96s
Doing aes-128 cbc for 3s on 8192 size blocks: 50351 aes-128 cbc's in 3.00s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Jun 11 15:28:12 UTC 2015
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 124212.91k 132882.62k 135904.77k 139486.10k 137491.80k
➜ ~ openssl speed -evp AES128
Doing aes-128-cbc for 3s on 16 size blocks: 138490506 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 64 size blocks: 37586083 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 256 size blocks: 9594697 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2408292 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 301327 aes-128-cbc's in 3.00s
OpenSSL 1.0.1f 6 Jan 2014
built on: Thu Jun 11 15:28:12 UTC 2015
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 741086.32k 804518.16k 818747.48k 822030.34k 822823.59k
Many thanks in advance!
-
I don't have a 2440 handy, but I do have several quad-core 2.4 GHz boxes around.
[2.3-ALPHA][jim@tonkawa-gw.netgate.com]/home/jim: sysctl hw.model kern.version
hw.model: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz
kern.version: FreeBSD 10.2-STABLE #164 2e02b14(devel): Thu Nov 12 22:27:20 CST 2015
root@pfs23-amd64-builder:/usr/home/pfsense/pfsense/tmp/obj/usr/home/pfsense/pfsense/tmp/FreeBSD-src/sys/pfSense
[2.3-ALPHA][jim@tonkawa-gw.netgate.com]/home/jim: openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 7406984 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 2175159 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 563793 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 353416 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 45016 aes-128 cbc's in 3.00s
OpenSSL 1.0.1p-freebsd 9 Jul 2015
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 39503.91k 46403.39k 48110.34k 120632.66k 122923.69k
[2.3-ALPHA][jim@tonkawa-gw.netgate.com]/home/jim: openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 39123806 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 15151715 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 4549847 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 1198668 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 152164 aes-128-cbc's in 3.00s
OpenSSL 1.0.1p-freebsd 9 Jul 2015
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 208660.30k 323236.59k 388253.61k 409145.34k 415509.16k
(So tempting to post a Minnowboard Turbot result here…)
-
(So tempting to post a Minnowboard Turbot result here…)
Post it! We can compare against the Minnowboard Max (in my post). All the other platforms in my test were running bare vanilla Debian jessie, btw.
-
Also, thank you for posting your results. It's actually encouraging to see that other Atoms on a similar platform perform at a level that I can reason about. To be honest, I was almost worried that I was missing something and that some subset of new Intel processors had magically figured out how to parallelize CBC… lol
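For anyone collecting results across boxes like this, a small script can flag runs whose measured time falls well short of the requested time, which is what gave away the SG-2440 numbers. This is only a minimal sketch: the regex is fitted to the "Doing …" lines pasted in this thread, and the name `short_runs` and the 0.5 threshold are arbitrary choices for illustration.

```python
import re

# Matches lines such as:
# Doing aes-128-cbc for 3s on 8192 size blocks: 87407 aes-128-cbc's in 0.05s
LINE = re.compile(
    r"Doing (?P<name>.+?) for (?P<want>\d+)s on (?P<size>\d+) size blocks: "
    r"(?P<ops>\d+) .+? in (?P<got>[\d.]+)s"
)

def short_runs(output, min_ratio=0.5):
    """Return (block_size, measured_seconds) for suspiciously short runs."""
    flagged = []
    for m in LINE.finditer(output):
        want = float(m.group("want"))
        got = float(m.group("got"))
        # Flag the run if the measured time is under min_ratio of the request.
        if got < min_ratio * want:
            flagged.append((int(m.group("size")), got))
    return flagged

sample = """\
Doing aes-128-cbc for 3s on 8192 size blocks: 87407 aes-128-cbc's in 0.05s
Doing aes-128-cbc for 3s on 8192 size blocks: 84544 aes-128-cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 24238 aes-128 cbc's in 2.23s
"""
print(short_runs(sample))
```

On the three sample lines, only the 0.05s run is flagged; the 2.23s run stays above the 0.5 threshold, so tighten `min_ratio` if you want to catch those too.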