OpenSSL & QAT
-
Is it expected that OpenSSL can't use QAT on supported Plus hardware? Here's my setup:
Netgate SG-2440, pfSense Plus 21.02-RELEASE-p1 (amd64)
Under pfSense settings System->Miscellaneous I have Intel QuickAssist enabled (and I've rebooted to be sure.)
The QAT driver is loaded.
# kldstat
Id Refs Address                Size Name
 1   19 0xffffffff80200000  3aedcb0 kernel
 2    1 0xffffffff83f21000     1000 cpuctl.ko
 3    1 0xffffffff83f22000    146e0 qat.ko
 4    1 0xffffffff83f37000    40336 qat_c2xxxfw.ko
 5    1 0xffffffff83f78000      b28 coretemp.ko
 6    1 0xffffffff83f79000     8cd0 aesni.ko
 7    1 0xffffffff83f82000     37f8 cryptodev.ko
But OpenSSL doesn't see the QAT engine.
# openssl engine
(devcrypto) /dev/crypto engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support
It isn't present on the filesystem either.
# ls -l /usr/lib/engines/
total 44
-r--r--r--  1 root  wheel  27288 Feb 22 10:17 ateccx08.so
-r--r--r--  1 root  wheel   4040 Feb 22 10:17 capi.so
-r--r--r--  1 root  wheel   8392 Feb 22 10:17 padlock.so
Thanks.
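A possible way to dig further into the engine listing above: `openssl engine -c` prints each engine's capability list in square brackets, and `-t` adds an availability check. The sketch below assumes that bracketed output format; `engine_offers` is a hypothetical helper name, not part of OpenSSL.

```shell
#!/bin/sh
# engine_offers ALGO (hypothetical helper): reads `openssl engine -c <id>`
# output on stdin and succeeds if ALGO is in the bracketed capability list.
# Assumes capabilities are printed as " [A, B, C]".
engine_offers() {
    tr -d '[]' | tr ',' '\n' | sed 's/^ *//; s/ *$//' | grep -qixF "$1"
}

# On the firewall (commented out so the sketch does not depend on /dev/crypto):
#   openssl engine -t -c devcrypto
#   openssl engine -c devcrypto | engine_offers AES-128-CBC && echo "offered via devcrypto"
```

If an algorithm is not in that list, selecting the engine for it would presumably not change anything, so this may be worth checking before benchmarking.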
-
It should use it through devcrypto, not directly like it does with AES-NI.
-
@jimp Thanks for the quick response. I tried some of the QAT-accelerated operations mentioned by Intel. The speeds are the same across runs.
root@host:~ # openssl speed -engine rdrand rsa2048
engine "rdrand" set.
Doing 2048 bits private rsa's for 10s: 1431 2048 bits private RSA's in 9.68s
Doing 2048 bits public rsa's for 10s: 49840 2048 bits public RSA's in 9.80s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.006764s 0.000197s    147.8   5083.3

root@host:~ # openssl speed -engine devcrypto rsa2048
engine "devcrypto" set.
Doing 2048 bits private rsa's for 10s: 1454 2048 bits private RSA's in 9.81s
Doing 2048 bits public rsa's for 10s: 50498 2048 bits public RSA's in 9.89s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.006749s 0.000196s    148.2   5105.6

root@host:~ # openssl speed -engine rdrand -async_jobs 8 rsa2048
engine "rdrand" set.
Doing 2048 bits private rsa's for 10s: 1448 2048 bits private RSA's in 9.74s
Doing 2048 bits public rsa's for 10s: 50138 2048 bits public RSA's in 9.84s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.006728s 0.000196s    148.6   5093.4

root@host:~ # openssl speed -engine devcrypto -async_jobs 8 rsa2048
engine "devcrypto" set.
Doing 2048 bits private rsa's for 10s: 1457 2048 bits private RSA's in 9.84s
Doing 2048 bits public rsa's for 10s: 50398 2048 bits public RSA's in 9.88s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.006751s 0.000196s    148.1   5099.6

root@host:~ # openssl speed -engine rdrand ecdhx25519
engine "rdrand" set.
Doing 253 bits ecdh's for 10s: 27718 253-bits ECDH ops in 9.74s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                              op      op/s
 253 bits ecdh (X25519)   0.0004s   2845.2

root@host:~ # openssl speed -engine devcrypto ecdhx25519
engine "devcrypto" set.
Doing 253 bits ecdh's for 10s: 27362 253-bits ECDH ops in 9.67s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                              op      op/s
 253 bits ecdh (X25519)   0.0004s   2829.0

root@host:~ # openssl speed -engine rdrand -evp aes-128-gcm
engine "rdrand" set.
Doing aes-128-gcm for 3s on 16 size blocks: 13577226 aes-128-gcm's in 2.80s
Doing aes-128-gcm for 3s on 64 size blocks: 7394730 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 256 size blocks: 2674064 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 721573 aes-128-gcm's in 2.88s
Doing aes-128-gcm for 3s on 8192 size blocks: 98022 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 16384 size blocks: 46999 aes-128-gcm's in 2.88s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm    77454.48k   158166.13k   228186.79k   257005.48k   267665.41k   267111.24k

root@host:~ # openssl speed -engine devcrypto -evp aes-128-gcm
engine "devcrypto" set.
Doing aes-128-gcm for 3s on 16 size blocks: 14191873 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 7190949 aes-128-gcm's in 2.91s
Doing aes-128-gcm for 3s on 256 size blocks: 2673595 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 757603 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 93181 aes-128-gcm's in 2.88s
Doing aes-128-gcm for 3s on 16384 size blocks: 49169 aes-128-gcm's in 3.00s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm    75689.99k   157930.98k   228146.77k   258595.16k   265509.13k   268528.30k
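To put a number on "the same", the relative difference between two runs can be computed from the sign/s or throughput columns. This is only a minimal sketch: `pct_diff` is a hypothetical helper, and the inputs are copied from the runs above with the trailing `k` dropped.

```shell
#!/bin/sh
# pct_diff BASE OTHER (hypothetical helper): percentage change from BASE to OTHER.
pct_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (b - a) / a * 100 }'
}

# rsa2048 sign/s: rdrand run (147.8) vs devcrypto run (148.2)
pct_diff 147.8 148.2
# aes-128-gcm at 16384 bytes, in 1000s of bytes/s: rdrand run vs devcrypto run
pct_diff 267111.24 268528.30
```

Both differences come out well under one percent, which looks more like run-to-run noise than hardware offload.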
-
Just to be sure, I disabled QAT entirely and rebooted. Same numbers. Seems like this doesn't work without the compiled QAT engine for OpenSSL to load directly.
root@host:~ # kldstat
Id Refs Address                Size Name
 1    8 0xffffffff80200000  3aedcb0 kernel
 2    1 0xffffffff83f21000     1000 cpuctl.ko
 3    1 0xffffffff83f22000     8cd0 aesni.ko
 4    1 0xffffffff83f2b000      b28 coretemp.ko

root@host:~ # openssl engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support

root@host:~ # openssl speed rsa2048
Doing 2048 bits private rsa's for 10s: 1314 2048 bits private RSA's in 9.05s
Doing 2048 bits public rsa's for 10s: 50573 2048 bits public RSA's in 9.92s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.006891s 0.000196s    145.1   5097.1

root@host:~ # openssl speed ecdhx25519
Doing 253 bits ecdh's for 10s: 25317 253-bits ECDH ops in 9.01s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                              op      op/s
 253 bits ecdh (X25519)   0.0004s   2810.6

root@host:~ # openssl speed -evp aes-128-gcm
Doing aes-128-gcm for 3s on 16 size blocks: 11912348 aes-128-gcm's in 2.48s
Doing aes-128-gcm for 3s on 64 size blocks: 5182524 aes-128-gcm's in 2.19s
Doing aes-128-gcm for 3s on 256 size blocks: 2541379 aes-128-gcm's in 2.92s
Doing aes-128-gcm for 3s on 1024 size blocks: 699205 aes-128-gcm's in 2.81s
Doing aes-128-gcm for 3s on 8192 size blocks: 70298 aes-128-gcm's in 2.20s
Doing aes-128-gcm for 3s on 16384 size blocks: 36594 aes-128-gcm's in 2.27s
OpenSSL 1.1.1i-freebsd 8 Dec 2020
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm    76718.52k   151625.85k   222662.85k   254572.77k   261392.89k   263722.27k
-
Confirming the same results on 21.05. It would be great to have a full implementation of QAT via OpenSSL so more crypto tasks could be offloaded (SSH, HAProxy, OpenVPN, etc.). Even WireGuard eventually, as ChaCha20-Poly1305 is currently being integrated into DPDK for QAT offload.
[21.05-DEVELOPMENT][root@gw01]/: kldstat
Id Refs Address                Size Name
 1   25 0xffffffff80200000  3aebf68 kernel
 2    1 0xffffffff83cec000   3bbb70 zfs.ko
 3    2 0xffffffff840a8000     a448 opensolaris.ko
 4    1 0xffffffff844e6000     1000 cpuctl.ko
 5    1 0xffffffff844e7000    146e0 qat.ko
 6    1 0xffffffff844fc000    9f521 qat_c3xxxfw.ko
 7    1 0xffffffff8459c000      b28 coretemp.ko
 8    1 0xffffffff8459d000     37f8 cryptodev.ko

[21.05-DEVELOPMENT][root@gw01]/: openssl engine
(devcrypto) /dev/crypto engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support

[21.05-DEVELOPMENT][root@gw01]/: openssl speed rsa2048
Doing 2048 bits private rsa's for 10s: 3651 2048 bits private RSA's in 9.84s
Doing 2048 bits public rsa's for 10s: 125823 2048 bits public RSA's in 9.84s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.002694s 0.000078s    371.2  12792.2

[21.05-DEVELOPMENT][root@gw01]/: openssl speed ecdhx25519
Doing 253 bits ecdh's for 10s: 77026 253-bits ECDH ops in 9.85s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                              op      op/s
 253 bits ecdh (X25519)   0.0001s   7818.7

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -evp aes-128-gcm
Doing aes-128-gcm for 3s on 16 size blocks: 32281714 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 18676449 aes-128-gcm's in 2.70s
Doing aes-128-gcm for 3s on 256 size blocks: 8272172 aes-128-gcm's in 2.90s
Doing aes-128-gcm for 3s on 1024 size blocks: 2564473 aes-128-gcm's in 2.98s
Doing aes-128-gcm for 3s on 8192 size blocks: 336340 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 16384 size blocks: 160840 aes-128-gcm's in 2.84s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm   172169.14k   443470.93k   730626.77k   879923.05k   920830.42k   926664.64k

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -evp aes-128-gcm
Doing aes-128-gcm for 3s on 16 size blocks: 32255868 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 64 size blocks: 20729035 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 8243093 aes-128-gcm's in 2.88s
Doing aes-128-gcm for 3s on 1024 size blocks: 2532296 aes-128-gcm's in 2.95s
Doing aes-128-gcm for 3s on 8192 size blocks: 336769 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 16384 size blocks: 168888 aes-128-gcm's in 2.98s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm   172480.46k   442219.41k   732004.53k   878076.99k   922004.94k   927182.74k

[21.05-DEVELOPMENT][root@gw01]/: clear
[21.05-DEVELOPMENT][root@gw01]/: openssl engine
(devcrypto) /dev/crypto engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -engine rdrand rsa2048
engine "rdrand" set.
Doing 2048 bits private rsa's for 10s: 3630 2048 bits private RSA's in 9.79s
Doing 2048 bits public rsa's for 10s: 125602 2048 bits public RSA's in 9.83s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.002697s 0.000078s    370.8  12779.9

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -engine devcrypto rsa2048
engine "devcrypto" set.
Doing 2048 bits private rsa's for 10s: 3638 2048 bits private RSA's in 9.80s
Doing 2048 bits public rsa's for 10s: 125636 2048 bits public RSA's in 9.84s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.002695s 0.000078s    371.0  12773.2

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -engine rdrand -async_jobs 8 rsa2048
engine "rdrand" set.
Doing 2048 bits private rsa's for 10s: 3543 2048 bits private RSA's in 9.56s
Doing 2048 bits public rsa's for 10s: 124895 2048 bits public RSA's in 9.80s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.002699s 0.000079s    370.5  12738.3

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -engine devcrypto -async_jobs 8 rsa2048
engine "devcrypto" set.
Doing 2048 bits private rsa's for 10s: 3648 2048 bits private RSA's in 9.84s
Doing 2048 bits public rsa's for 10s: 125619 2048 bits public RSA's in 9.83s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.002696s 0.000078s    370.9  12781.6

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -engine rdrand ecdhx25519
engine "rdrand" set.
Doing 253 bits ecdh's for 10s: 74337 253-bits ECDH ops in 9.52s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                              op      op/s
 253 bits ecdh (X25519)   0.0001s   7805.7

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -engine devcrypto ecdhx25519
engine "devcrypto" set.
Doing 253 bits ecdh's for 10s: 75745 253-bits ECDH ops in 9.69s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
                              op      op/s
 253 bits ecdh (X25519)   0.0001s   7818.8

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -engine rdrand -evp aes-128-gcm
engine "rdrand" set.
Doing aes-128-gcm for 3s on 16 size blocks: 30782618 aes-128-gcm's in 2.86s
Doing aes-128-gcm for 3s on 64 size blocks: 20747977 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 256 size blocks: 8580626 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 2572191 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 8192 size blocks: 320648 aes-128-gcm's in 2.85s
Doing aes-128-gcm for 3s on 16384 size blocks: 169422 aes-128-gcm's in 2.98s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm   172248.09k   443779.18k   732213.42k   880266.89k   921161.09k   930114.36k

[21.05-DEVELOPMENT][root@gw01]/: openssl speed -engine devcrypto -evp aes-128-gcm
engine "devcrypto" set.
Doing aes-128-gcm for 3s on 16 size blocks: 32215536 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 64 size blocks: 19703584 aes-128-gcm's in 2.85s
Doing aes-128-gcm for 3s on 256 size blocks: 8581766 aes-128-gcm's in 2.99s
Doing aes-128-gcm for 3s on 1024 size blocks: 2575100 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 320421 aes-128-gcm's in 2.84s
Doing aes-128-gcm for 3s on 16384 size blocks: 169559 aes-128-gcm's in 3.00s
OpenSSL 1.1.1k-freebsd 25 Mar 2021
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm   172264.80k   442224.00k   734222.74k   878967.47k   923037.83k   926018.22k
[21.05-DEVELOPMENT][root@gw01]/:
-
Should we file this as a bug report?
-
@jimp I see some changes have been made related to this area in 21.05, but this issue still occurs.
-
@ensnare +1. When I purchased the 6100, I assumed QAT (as long as it was enabled) would work for OpenVPN.
-
It will work with DCO if you're able to try that. That uses the kernel crypto framework, and the QAT driver is available there.
Steve
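Before testing DCO, it may be worth confirming that the kernel-side pieces are actually loaded. The sketch below parses `kldstat` output; `kld_loaded` is a hypothetical helper, and `if_ovpn.ko` as the DCO module name is an assumption on my part, so adjust it to whatever your release ships.

```shell
#!/bin/sh
# kld_loaded NAME (hypothetical helper): reads `kldstat` output on stdin and
# succeeds if a module file called NAME appears in the last column.
kld_loaded() {
    awk -v want="$1" 'NR > 1 && $NF == want { found = 1 } END { exit !found }'
}

# On the firewall (commented out; needs FreeBSD's kldstat):
#   kldstat | kld_loaded qat.ko     && echo "QAT driver loaded"
#   kldstat | kld_loaded if_ovpn.ko && echo "DCO module loaded (module name assumed)"
```

This only shows that the modules are present; it does not by itself prove that a given tunnel's crypto is being handled by QAT.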