OpenSSL AES-NI benchmark bug?



  • Hi,

    We run pfSense on several networks now. Cheers to the devs!

    I've been keeping track of AES-NI performance on various platforms as we go. On one network we run a pair of SG-2440s: https://www.pfsense.org/products/product-family.html#sg-2440

    It has blown the doors off anything I've ever tested: over 15 GB/s for AES-128-CBC on 8192-byte blocks with the -evp flag. However, looking more closely, the output says:

    Doing aes-128-cbc for 3s on 8192 size blocks: 87407 aes-128-cbc's in 0.05s

    Shouldn't it run for 3s, not 0.05s?

    All the benchmarks I've run on other platforms ran for the full 3s. I assume this must be a bug in OpenSSL? I'd like to know for sure the actual AES CBC, GCM, etc. performance of this box and others, but I don't know whom to ask about digging into this. Does anybody know where to point me?

    For your reference, here's the full test result, along with results from some of the other boxes I've tested:

    Intel(R) Atom(TM) CPU C2358 @ 1.74GHz (testnet PFSense)
    [2.2.4-RELEASE][admin@fw1. … ]/root: openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 5470659 aes-128 cbc's in 2.93s
    Doing aes-128 cbc for 3s on 64 size blocks: 1436850 aes-128 cbc's in 2.74s
    Doing aes-128 cbc for 3s on 256 size blocks: 406790 aes-128 cbc's in 2.97s
    Doing aes-128 cbc for 3s on 1024 size blocks: 256634 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 8192 size blocks: 24238 aes-128 cbc's in 2.23s
    OpenSSL 1.0.1l-freebsd 15 Jan 2015
    built on: date not available
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: clang
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128 cbc      29877.09k    33534.69k    35078.14k    87826.45k    89176.79k
    [2.2.4-RELEASE][admin@fw1. …]/root: openssl speed -evp AES128
    Doing aes-128-cbc for 3s on 16 size blocks: 530644 aes-128-cbc's in 0.27s
    Doing aes-128-cbc for 3s on 64 size blocks: 643047 aes-128-cbc's in 0.35s
    Doing aes-128-cbc for 3s on 256 size blocks: 501377 aes-128-cbc's in 0.23s
    Doing aes-128-cbc for 3s on 1024 size blocks: 357574 aes-128-cbc's in 0.21s
    Doing aes-128-cbc for 3s on 8192 size blocks: 87407 aes-128-cbc's in 0.05s
    OpenSSL 1.0.1l-freebsd 15 Jan 2015
    built on: date not available
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: clang
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128-cbc      31050.25k  117063.13k  547637.38k  1735849.60k 15275480.41k

    Intel(R) Atom(TM) CPU E3825 @ 1.33GHz (minnowboard max rev b)
    minnow@minnow:~$ openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 4151351 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 64 size blocks: 1202836 aes-128 cbc's in 3.01s
    Doing aes-128 cbc for 3s on 256 size blocks: 313375 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 1024 size blocks: 196386 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 25012 aes-128 cbc's in 3.01s
    OpenSSL 1.0.1k 8 Jan 2015
    built on: Fri Jun 12 18:48:03 2015
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128 cbc      22140.54k    25575.25k    26741.33k    67033.09k    68072.53k
    minnow@minnow:~$ openssl speed -evp AES128
    Doing aes-128-cbc for 3s on 16 size blocks: 21417497 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 8464074 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 2534467 aes-128-cbc's in 3.01s
    Doing aes-128-cbc for 3s on 1024 size blocks: 666627 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 84544 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1k 8 Jan 2015
    built on: Fri Jun 12 18:48:03 2015
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128-cbc    114226.65k  180566.91k  215556.00k  227542.02k  230861.48k

    Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
    ➜  ~  openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 16758137 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 64 size blocks: 4843536 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 256 size blocks: 1242016 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 1024 size blocks: 315306 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 39514 aes-128 cbc's in 2.99s
    OpenSSL 1.0.1k 8 Jan 2015
    built on: Fri Jun 12 18:48:03 2015
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128 cbc      89376.73k  103328.77k  105985.37k  107624.45k  108260.43k
    ➜  ~  openssl speed -evp AES128
    Doing aes-128-cbc for 3s on 16 size blocks: 48365979 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 15012135 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 3886896 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 985474 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 123056 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1k 8 Jan 2015
    built on: Fri Jun 12 18:48:03 2015
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128-cbc    257951.89k  320258.88k  331681.79k  336375.13k  336024.92k

    Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
    ➜  ~  openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 23212288 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 64 size blocks: 6228873 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 256 size blocks: 1592634 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 1024 size blocks: 403202 aes-128 cbc's in 2.96s
    Doing aes-128 cbc for 3s on 8192 size blocks: 50351 aes-128 cbc's in 3.00s
    OpenSSL 1.0.1f 6 Jan 2014
    built on: Thu Jun 11 15:28:12 UTC 2015
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128 cbc    124212.91k  132882.62k  135904.77k  139486.10k  137491.80k
    ➜  ~  openssl speed -evp AES128
    Doing aes-128-cbc for 3s on 16 size blocks: 138490506 aes-128-cbc's in 2.99s
    Doing aes-128-cbc for 3s on 64 size blocks: 37586083 aes-128-cbc's in 2.99s
    Doing aes-128-cbc for 3s on 256 size blocks: 9594697 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 2408292 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 301327 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1f 6 Jan 2014
    built on: Thu Jun 11 15:28:12 UTC 2015
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
    compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128-cbc    741086.32k  804518.16k  818747.48k  822030.34k  822823.59k

    Many thanks in advance!
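
As a sanity check on the figures in the post above, here is a minimal Python sketch of how the summary table can be recomputed from the "Doing ..." lines. It assumes throughput is simply count × block size ÷ time ÷ 1000, which reproduces the E3825 `-evp` result exactly; the function name `throughput_k` is made up for illustration. The last case assumes the C2358 loop really ran for 3 s of wall-clock time, which is not confirmed anywhere in this thread.

```python
def throughput_k(count: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit `openssl speed` prints."""
    return count * block_size / seconds / 1000.0


# E3825 -evp run, 8192-byte blocks: matches the reported 230861.48k.
e3825 = throughput_k(84544, 8192, 3.00)

# C2358 -evp run, 8192-byte blocks, using the printed (rounded) 0.05s.
c2358_printed = throughput_k(87407, 8192, 0.05)

# Same count, ASSUMING the loop actually ran for the full 3s of wall-clock time.
c2358_wall = throughput_k(87407, 8192, 3.0)

for label, value in [
    ("E3825, 3.00s", e3825),
    ("C2358, printed 0.05s", c2358_printed),
    ("C2358, assumed 3s wall", c2358_wall),
]:
    print(f"{label:<24} {value:>14.2f}k")
```

Under that assumption the C2358 lands at roughly 239 MB/s, in the same range as the E3825. `openssl speed` measures CPU user time unless `-elapsed` is given, so rerunning with `openssl speed -elapsed -evp aes-128-cbc` is one way to check; whether that accounts for the 0.05s on this box is a guess.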


  • Netgate

    I don't have a 2440 handy, but I do have several quad-core boxes @ 2.4 GHz around.

    [2.3-ALPHA][jim@tonkawa-gw.netgate.com]/home/jim: sysctl hw.model kern.version
    hw.model: Intel(R) Atom(TM) CPU  C2558  @ 2.40GHz
    kern.version: FreeBSD 10.2-STABLE #164 2e02b14(devel): Thu Nov 12 22:27:20 CST 2015
        root@pfs23-amd64-builder:/usr/home/pfsense/pfsense/tmp/obj/usr/home/pfsense/pfsense/tmp/FreeBSD-src/sys/pfSense
    [2.3-ALPHA][jim@tonkawa-gw.netgate.com]/home/jim: openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 7406984 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 64 size blocks: 2175159 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 256 size blocks: 563793 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 1024 size blocks: 353416 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 45016 aes-128 cbc's in 3.00s
    OpenSSL 1.0.1p-freebsd 9 Jul 2015
    built on: date not available
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: clang
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128 cbc      39503.91k    46403.39k    48110.34k  120632.66k  122923.69k

    [2.3-ALPHA][jim@tonkawa-gw.netgate.com]/home/jim: openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 39123806 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 15151715 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 4549847 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 1198668 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 152164 aes-128-cbc's in 3.00s
    OpenSSL 1.0.1p-freebsd 9 Jul 2015
    built on: date not available
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
    compiler: clang
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
    aes-128-cbc    208660.30k  323236.59k  388253.61k  409145.34k  415509.16k

    (So tempting to post a Minnowboard Turbot result here…)
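
To put the C2558 numbers next to the earlier ones, here is a rough Python sketch of the `-evp` to non-`-evp` ratio at 8192-byte blocks for every CPU posted so far. The figures are copied from the tables in this thread; the dictionary labels and variable names are only for illustration.

```python
# 8192-byte column from the tables in this thread, in 1000s of bytes/s:
# (plain "aes-128 cbc", "-evp" "aes-128-cbc")
results = {
    "Atom C2358 (pfSense 2.2.4)": (89176.79, 15275480.41),
    "Atom E3825": (68072.53, 230861.48),
    "Xeon X5670": (108260.43, 336024.92),
    "Core i7-3930K": (137491.80, 822823.59),
    "Atom C2558 (pfSense 2.3-ALPHA)": (122923.69, 415509.16),
}

# Ratio of the -evp figure to the plain figure for each CPU.
speedups = {cpu: evp / plain for cpu, (plain, evp) in results.items()}

# Print from smallest to largest ratio so an outlier ends up last.
for cpu, ratio in sorted(speedups.items(), key=lambda kv: kv[1]):
    print(f"{cpu:<32} {ratio:7.1f}x")
```

Every box except the C2358 comes out at a single-digit ratio, while the C2358 is above 100x, which suggests its `-evp` figure is the one to distrust.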



  • (So tempting to post a Minnowboard Turbot result here…)

    Post it! We can compare against the MinnowBoard MAX (in my post). All other platforms in my tests were running bare vanilla Debian jessie, btw.



  • Also, thank you for posting your results. It's actually encouraging to see that other Atoms on a similar platform perform at a level that I can reason about. To be honest, I was almost worried that I was missing something and that some subset of new Intel processors had magically figured out how to parallelize CBC… lol

