AES-NI not working?
-
kldstat:
id refs address size name
1 4 0xc0400000 13a57e8 kernel
2 1 0xc858a000 4000 aesni.ko
kldstat aesni:
can't load aesni: File exists
So it's already loaded. Try
kldunload aesni
kldload aesni
The output of the latter command indicates whether the aesni driver thinks AES-NI is supported by your hardware.
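The "is it already loaded?" check can also be scripted. This is only a minimal sketch: module_loaded is a made-up helper name, and it assumes nothing beyond the kldstat layout shown in this thread (module name in the last column).

```shell
# module_loaded NAME (hypothetical helper, not part of FreeBSD or pfSense):
# reads kldstat output on stdin and exits 0 if a row whose last column
# equals NAME is present, 1 otherwise.
module_loaded() {
    awk -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}
```

On the firewall itself it would be used as `kldstat | module_loaded aesni.ko && echo "aesni.ko is loaded"`.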
[2.1-BETA0][admin@pfSense.localdomain]/root(9): kldunload aesni
[2.1-BETA0][admin@pfSense.localdomain]/root(10): kldstat
Id Refs Address Size Name
1 1 0xc0400000 13a57e8 kernel
[2.1-BETA0][admin@pfSense.localdomain]/root(11): kldload aesni
[2.1-BETA0][admin@pfSense.localdomain]/root(12): kldstat
Id Refs Address Size Name
1 4 0xc0400000 13a57e8 kernel
2 1 0xc813c000 4000 aesni.ko
-
also:
dmesg | grep -i aes
It may be that the driver isn't attaching to your chip. Your chip may not support AES-NI or it may be a newer chip than the AES-NI driver knows about.
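Those two questions (does the CPU advertise the flag, and did the driver stay attached) can be read off the dmesg text mechanically. A rough sketch follows; aesni_status is a hypothetical helper name, and it assumes the message format seen in this thread: a "Features2=" line listing CPU flags and "aesni0:" lines for attach/detach events.

```shell
# aesni_status (hypothetical helper): reads dmesg text on stdin and prints
#   cpu_flag=yes|no  whether "aesni" appears in a CPU Features2 flag list
#   attached=yes|no  whether the most recent aesni0 line is an attach
#                    rather than "aesni0: detached"
aesni_status() {
    awk '
        /Features2=/ && tolower($0) ~ /[<,]aesni[,>]/ { flag = "yes" }
        /^aesni0:/ { last = $0 }
        END {
            if (flag == "") flag = "no"
            attached = (last != "" && last !~ /detached/) ? "yes" : "no"
            printf "cpu_flag=%s attached=%s\n", flag, attached
        }
    '
}
```

Usage would be `dmesg | aesni_status`. If cpu_flag is no, the chip does not report AES-NI; if cpu_flag is yes but attached is no, the driver is not attaching (or was unloaded last).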
[2.1-BETA0][admin@pfSense.localdomain]/root(13): dmesg | grep -i aes
Features2=0x77bae3ff<sse3,pclmulqdq,dtes64,mon,ds_cpl,vmx,smx,est,tm2,ssse3,cx16,xtpr,pdcm,pcid,sse4.1,sse4.2,x2apic,popcnt,tscdlt,aesni,xsave,avx,f16c,<b30>>
aesni0: <aes-cbc,aes-xts> on motherboard
aesni0: detached
aesni0: <aes-cbc,aes-xts> on motherboard
aesni0: detached
aesni0: <aes-cbc,aes-xts> on motherboard
aesni0: detached
aesni0: <aes-cbc,aes-xts> on motherboard
-
any updates regarding aes-ni not working?
i have a test environment with aes-ni capabilities that i'd be more than happy to let you use for testing.
-
I just registered to express my strong interest in this topic.
We built our latest internal test appliance on a Xeon E3-1220L v2 (dual-core, 2.3 GHz, low-voltage), which has AES-NI built in.
With the latest pfSense 2.1 we expected IPsec VPN performance with AES-256 to reach 1 Gbit/s.
But in fact we only get around 230 Mbit/s, or in practice 27 MB/s for file transfers between sites.
Without IPsec we can transfer at almost 1 Gbit/s.
With Quad-Core (E3-1260L) we get almost the same.
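As a sanity check on the 230 Mbit/s and 27 MB/s figures (plain arithmetic, nothing measured): 230 Mbit/s divided by 8 is 28.75 MB/s in decimal megabytes, or about 27.4 MiB/s if the transfer tool counts in units of 2^20 bytes, so the two numbers agree. The helper name mbit_to_mbyte below is made up for illustration.

```shell
# mbit_to_mbyte RATE (hypothetical helper): convert a rate in Mbit/s
# (10^6 bits per second) to MB/s (10^6 bytes) and MiB/s (2^20 bytes).
mbit_to_mbyte() {
    awk -v r="$1" 'BEGIN {
        printf "%.2f MB/s %.2f MiB/s\n", r / 8, r * 1000000 / 8 / 1048576
    }'
}
```

For example, `mbit_to_mbyte 230` prints `28.75 MB/s 27.42 MiB/s`.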
Cryptostats tells:
79369 symmetric crypto ops (0 errors, 0 times driver blocked)
0 key ops (0 errors, 0 times driver blocked)
0 crypto dispatch thread activations
0 crypto return thread activations
--> it's not being used. Besides that, we have the same output as the previous posters: dmesg reports AES-NI, and the device/driver is loaded and activated.
Also, the performance is exactly the same with AES-NI enabled or not.
Which leaves two big "downsides" right now with pfSense and high-performance hardware:
First: IPsec is not multithreaded. It uses only one core, so only clock speed matters, not core count. For mixed usage that is OK, e.g. with 500,000 sessions hitting the packet filter plus some 100 Mbit/s VPN tunnels you get good overall performance. But for a site-to-site link we only care about net IPsec transfer rates.
With two cores we could get 400 Mbit/s with that E3-1220L (CPU load is 55 percent at 230 Mbit/s with 2 cores).
Second: AES-NI is not working. With one core and AES-NI, I calculated that the performance should be 2 Gbit/s.
This can also be verified with VMware and AES-NI capable CPUs, as VMware passes that feature through.
I think this should be a focus, as AES-capable CPUs will be standard on all systems and this has been supported for two generations of Intel CPUs (Westmere and Sandy Bridge).
Everything else in pfSense already exceeds the minimum we need. With IPsec NAT arriving in the latest 2.1, this is becoming a serious option for companies, missing only central management.
-
As far as I know, we still don't have any routers capable of AES-NI in the hands of any developers for testing.
In absence of that, it's going to take some debugging from those that have the hardware.
First step would be to try configuring/using AES-NI on a stock FreeBSD 8.3 image to see if it works for them there.
We are loading the module, which is supposed to be sufficient for actually using it. So the first big question is whether or not we're doing something else in the OS that breaks it, or perhaps it is broken or not configured correctly in the stock OS without our changes.
It's possible that the backporting of AES-NI to FreeBSD 8.3 from 9.x missed something. If that is the case, this probably won't work 100% until we move to a FreeBSD 9.x base. Checking that means comparing the results of the stock FreeBSD 8.3 test with a stock FreeBSD 9.1 test.
-
As i've previously stated, if you want to borrow my test-setup for testing please just pm me.
I can set it up with the snapshot of your choice, and provide a jumphost from which you can reach the physical servers.
-
As i've previously stated, if you want to borrow my test-setup for testing please just pm me.
I can set it up with the snapshot of your choice, and provide a jumphost from which you can reach the physical servers.
Having remote access in this case isn't really all that helpful. It would take a ton of coordination to make the tests happen, since it would involve multiple reinstalls of a few different operating systems (pfSense, FreeBSD 8.3, FreeBSD 9.1) and various tests.
Ideally either someone can run the tests directly on their own hardware, or eventually we'll get hardware on hand that supports it.
-
i can install vmware esxi on the hardware… with a jumphost you can do snapshots and reinstall as much as you like. :)
-
Ran across something today that might narrow something down.
Can you run this on your board?
# /usr/bin/openssl engine -t -c
# /usr/local/bin/openssl engine -t -c
Also the next round of 1.1 images should have OpenSSL 1.0.1, and from what I've read, that contains better support for AES-NI.
-
sorry for the late reply… i've been very busy.
image: pfSense-memstick-2.1-BETA1-i386-20130130-0420.img
/usr/bin/openssl engine -t -c
(cryptodev) BSD cryptodev engine
[RSA, RSA, DH]
[available]
(padlock) VIA PadLock (no-RNG, no-ACE)
[unavailable]
(dynamic) Dynamic engine loading support
[unavailable]
/usr/local/bin/openssl engine -t -c
(cryptodev) BSD cryptodev engine
[RSA, RSA, DH]
[available]
(rdrand) Intel RDRAND engine
[RAND]
[available]
(dynamic) Dynamic engine loading support
[unavailable]
(padlock) VIA PadLock: not supported
[unavailable]
-
Is aesni.ko loaded during those tests? (check the output of kldstat)
I would expect to see at least AES-128-CBC in the cryptodev list if it attached, but then again, some others have reported that OpenSSL 1.0.1 did use AES-NI but didn't ever report it as being present, so it may take some more speed tests to tell for sure…
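To make that check less error-prone than eyeballing the list, the cryptodev capability line can be pulled out of the engine output and searched for AES-CBC entries. This is a sketch only: both function names are made up, and it assumes the layout seen in this thread, with the bracketed capability list on the line right after the "(cryptodev)" line.

```shell
# cryptodev_ciphers (hypothetical helper): reads `openssl engine -t -c`
# output on stdin and prints the bracketed capability list that follows
# the "(cryptodev)" line, without the surrounding brackets.
cryptodev_ciphers() {
    awk '
        /^\(cryptodev\)/ { want = 1; next }
        want && /^[ \t]*\[/ {
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            print
            exit
        }
    '
}

# has_aes_cbc (hypothetical helper): exit 0 if that list names any
# AES-<bits>-CBC cipher, 1 otherwise.
has_aes_cbc() {
    cryptodev_ciphers | grep -q 'AES-[0-9]*-CBC'
}
```

Usage would be `/usr/bin/openssl engine -t -c | has_aes_cbc && echo "cryptodev offers AES-CBC"`. As noted above, a missing entry does not prove AES-NI is unused by OpenSSL 1.0.1, so speed tests are still needed.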
-
i entered the commands in the shell of a fresh image i just booted up. i haven't configured/enabled anything at all.
if i enter the command "kldload aesni" i get this output:
padlock0: No ACE support
aesni0: AES-CBC,AES-XTS on motherboard
-
Does that openssl engine output change after having run the kldload?
-
yes…
/usr/bin/openssl engine -t -c
(cryptodev) BSD cryptodev engine
[RSA, RSA, DH, AES-128-CBC]
[available]
(padlock) VIA PadLock (no-RNG, no-ACE)
[unavailable]
(dynamic) Dynamic engine loading support
[unavailable]
/usr/local/bin/openssl engine -t -c
(cryptodev) BSD cryptodev engine
[RSA, RSA, DH, AES-128-CBC, AES-192-CBC, AES-256-CBC]
[available]
(rdrand) Intel RDRAND engine
[RAND]
[available]
(dynamic) Dynamic engine loading support
[unavailable]
(padlock) VIA PadLock: not supported
[unavailable]
-
ok, great.
One more thing if you have some time:
1. Reboot so aes-ni is not loaded.
2. Run the following in order:
Test speed before
/usr/bin/openssl speed -evp aes-128-cbc -elapsed
/usr/local/bin/openssl speed -evp aes-128-cbc -elapsed
Load AES-NI
kldload aesni
Test OpenSSL with default engine
/usr/bin/openssl speed -evp aes-128-cbc -elapsed
/usr/local/bin/openssl speed -evp aes-128-cbc -elapsed
Test OpenSSL with cryptodev engine
/usr/bin/openssl speed -evp aes-128-cbc -elapsed -engine cryptodev
/usr/local/bin/openssl speed -evp aes-128-cbc -elapsed -engine cryptodev
-
heres your wall of text. :)
[2.1-BETA1][admin@pfSense.localdomain]/root(1): /usr/bin/openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
To get the most accurate results, try to run this
program when this computer is idle.
Doing aes-128-cbc for 3s on 16 size blocks: 18546805 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 5035121 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 1289095 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 325137 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 40722 aes-128-cbc's in 3.00s
OpenSSL 0.9.8q 2 Dec 2010
built on: date not available
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: gettimeofday
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 98891.21k 107382.09k 109967.84k 110944.78k 111161.64k
[2.1-BETA1][admin@pfSense.localdomain]/root(2): /usr/local/bin/openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 108688414 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 28926457 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 7348512 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 1844550 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 230842 aes-128-cbc's in 3.00s
OpenSSL 1.0.1c 10 May 2012
built on: Sun Jan 27 13:05:44 EST 2013
options:bn(64,32) md2(int) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -pthread -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -DL_ENDIAN -DTERMIOS -O3 -fomit-frame-pointer -Wall -O2 -pipe -fno-strict-aliasing -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 579671.54k 617097.75k 627073.02k 629606.40k 630352.55k
[2.1-BETA1][admin@pfSense.localdomain]/root(3): kldload aesni
[2.1-BETA1][admin@pfSense.localdomain]/root(4): /usr/bin/openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
To get the most accurate results, try to run this
program when this computer is idle.
Doing aes-128-cbc for 3s on 16 size blocks: 2725774 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 2507908 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 1925032 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 1029235 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 147766 aes-128-cbc's in 3.00s
OpenSSL 0.9.8q 2 Dec 2010
built on: date not available
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: gettimeofday
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 14535.69k 53485.26k 164217.56k 351201.58k 403372.36k
[2.1-BETA1][admin@pfSense.localdomain]/root(5): /usr/local/bin/openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 2719290 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 2505062 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 1919653 aes-128-cbc's in 3.01s
Doing aes-128-cbc for 3s on 1024 size blocks: 1028277 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 147809 aes-128-cbc's in 3.00s
OpenSSL 1.0.1c 10 May 2012
built on: Sun Jan 27 13:05:44 EST 2013
options:bn(64,32) md2(int) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -pthread -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -DL_ENDIAN -DTERMIOS -O3 -fomit-frame-pointer -Wall -O2 -pipe -fno-strict-aliasing -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 14502.88k 53441.32k 163384.91k 350985.22k 403617.11k
[2.1-BETA1][admin@pfSense.localdomain]/root(6): /usr/bin/openssl speed -evp aes-128-cbc -elapsed -engine cryptodev
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
To get the most accurate results, try to run this
program when this computer is idle.
Doing aes-128-cbc for 3s on 16 size blocks: 2721627 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 2516799 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 1926157 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 1029088 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 147941 aes-128-cbc's in 3.00s
OpenSSL 0.9.8q 2 Dec 2010
built on: date not available
options:bn(64,32) md2(int) rc4(idx,int) des(ptr,risc1,16,long) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: gettimeofday
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 14514.45k 53674.88k 164313.53k 351151.19k 403847.11k
[2.1-BETA1][admin@pfSense.localdomain]/root(7): /usr/local/bin/openssl speed -evp aes-128-cbc -elapsed -engine cryptodev
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 2733266 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 2512115 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 1928735 aes-128-cbc's in 3.01s
Doing aes-128-cbc for 3s on 1024 size blocks: 1031083 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 147874 aes-128-cbc's in 3.00s
OpenSSL 1.0.1c 10 May 2012
built on: Sun Jan 27 13:05:44 EST 2013
options:bn(64,32) md2(int) rc4(8x,mmx) des(ptr,risc1,16,long) aes(partial) idea(int) blowfish(idx)
compiler: cc -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -pthread -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -Wa,--noexecstack -DL_ENDIAN -DTERMIOS -O3 -fomit-frame-pointer -Wall -O2 -pipe -fno-strict-aliasing -DOPENSSL_BN_ASM_PART_WORDS -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DRMD160_ASM -DAES_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 14577.42k 53591.79k 164157.89k 351943.00k 403794.60k
-
heres your wall of text. :)
Thanks :-)
Summarizing a little…
@miloman:
OpenSSL 0.9.8q, aesni.ko UNloaded:
aes-128-cbc 98891.21k 107382.09k 109967.84k 110944.78k 111161.64k
OpenSSL 1.0.1c, aesni.ko UNloaded:
aes-128-cbc 579671.54k 617097.75k 627073.02k 629606.40k 630352.55k
OpenSSL 0.9.8q, aesni.ko loaded:
aes-128-cbc 14535.69k 53485.26k 164217.56k 351201.58k 403372.36k
OpenSSL 1.0.1c, aesni.ko loaded:
aes-128-cbc 14502.88k 53441.32k 163384.91k 350985.22k 403617.11k
OpenSSL 0.9.8q, aesni.ko loaded, cryptodev engine:
aes-128-cbc 14514.45k 53674.88k 164313.53k 351151.19k 403847.11k
OpenSSL 1.0.1c, aesni.ko loaded, cryptodev engine:
aes-128-cbc 14577.42k 53591.79k 164157.89k 351943.00k 403794.60k
It looks like loading aesni.ko does make it get used, since there is a substantial difference in the base system OpenSSL results before and after it is loaded.
Oddly, OpenSSL 1.0.1c without aesni.ko loaded is even faster. I'm not sure if that's somehow linked to OpenSSL's internal AES-NI support that may be getting dragged down by cryptodev or what.
If you repeat that test (just the first two commands), are the results the same each time?
Once aesni.ko is loaded, it doesn't seem to matter which version of OpenSSL or which engine is used, suggesting that at least the speed command is autoselecting the engine based on the cipher being used. (I confirmed this is also the case on ALIX with glxsb.) So the last two commands can apparently be ignored.
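To put numbers on "substantial difference", here is a small sketch that divides one result row by another, column by column. speed_ratio is a made-up helper name; it only assumes the row format that openssl speed prints above (a label followed by values with a trailing "k"). For the 0.9.8q rows, loading aesni.ko works out to roughly 0.15x at 16-byte blocks, rising to about 3.6x at 8192-byte blocks. For 1.0.1c at 8192 bytes it is about 0.64x, i.e. slower with the module loaded.

```shell
# speed_ratio BEFORE AFTER (hypothetical helper): both arguments are
# "aes-128-cbc <n>k <n>k ..." rows from `openssl speed`. Prints AFTER/BEFORE
# for each block-size column. Adding 0 makes awk use the leading numeric
# part of each field, which drops the trailing "k".
speed_ratio() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 {
            for (i = 2; i <= NF; i++) before[i] = $i + 0
            next
        }
        {
            for (i = 2; i <= NF; i++) {
                printf "%s%.2f", (i > 2 ? " " : ""), ($i + 0) / before[i]
            }
            print ""
        }
    '
}
```

Example: pass the "aesni.ko UNloaded" row as the first argument and the matching "aesni.ko loaded" row as the second. Values below 1.00 mean the run with the module loaded was slower at that block size.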
-
If you repeat that test (just the first two commands), are the results the same each time?
yes… i ran the commands a couple of times to see if the speed/results were consistent.
let me know if you need me to test anything else. :)
-
It may be helpful if others with capable hardware could run the same test. I started a spreadsheet here:
https://docs.google.com/spreadsheet/ccc?key=0AojFUXcbH0ROdE15eHB4dndHTXZYcU1mQm9Dc3V2elE
The only other thing to try is a similar test but with actual VPN traffic (e.g. OpenVPN using AES-128-CBC) to see if (a) throughput is improved and/or (b) CPU usage is reduced under load.
-
Thought of one more thing:
cryptotest -va aes128