Testing Requested: AESNI
-
FreeBSD 8.3 includes AES-NI support, but we don't have any hardware on hand with that CPU feature to test with.
If you have an AES-NI capable CPU (newer i5s and i7s, even some i3s, and some AMD processors; see this list), try loading the module:
- Go to System > Advanced, on the Misc tab, and pick AES-NI from the Cryptographic Hardware drop-down (on new snapshots)
And then see what shows up in the system log, and see if your VPN performance goes up (or at least CPU consumption for the VPN goes down).
If your CPU is not supported, you'll see something like:
Jun 11 08:41:40 pfSense kernel: aesni0: No AESNI support.
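For anyone who wants to script the log check described above, here is a minimal sketch. It assumes the two message formats quoted in this thread ("No AESNI support" and "<...> on motherboard") are the only ones the driver prints; `aesni_status` is a made-up helper name, not a pfSense tool.

```shell
# Classify AES-NI driver status from kernel log text read on stdin.
# Prints "supported", "unsupported", or "unknown" (no aesni0 line seen).
# Assumption: only the two message formats quoted in this thread occur.
aesni_status() {
    awk '
        /aesni0: No AESNI support/    { state = "unsupported" }
        /aesni0: <.*> on motherboard/ { state = "supported" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

Usage would be something like `dmesg | aesni_status`. If the module was loaded and unloaded several times, the last matching line wins.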
-
Hi,
I just tested it on a new HP DL120G7 with an Intel(R) Xeon(R) CPU E31220 @ 3.10GHz (3092.99-MHz K8-class CPU). The module loads correctly:
dmesg:
padlock0: No ACE support.
aesni0: <aes-cbc,aes-xts> on motherboard
but openssl doesn't seem to recognize the engine:
$ openssl engine
(cryptodev) BSD cryptodev engine
(dynamic) Dynamic engine loading support
@jimp
The server isn't in production yet; if you want, you can have the credentials for some testing.
-
OpenSSL/OpenVPN won't see it directly; you just tell it "cryptodev" and it'll use it (select that from the drop-down in the GUI for OpenVPN tunnels).
Credentials wouldn't help much, a system that already had working tunnels and performance data to compare against would be most useful (I also lack time as well as equipment, the more we can "crowdsource" the better ;-)
What you show is promising though, enough that I think we might need to include that .ko in the builds.
-
Can this be tested with cryptotest do you know?
Steve
-
In theory cryptotest and cryptostats should work since it's part of the BSD crypto framework.
IPsec should use it automatically, and OpenVPN will use it if you select cryptodev from the hardware crypto drop-down.
The modules should appear in tomorrow's build, and I'm going to add a selector to enable it, much like the current glxsb option. I will probably make the two mutually exclusive, via either a radio selector or a drop-down: since the required hardware is mutually exclusive, there is no reason to set both, and doing so can only lead to problems.
-
GUI support is in:
https://github.com/bsdperimeter/pfsense/commit/7530177c7c59795b4e5c0767453444837ee5d622
The next new snapshot after this post will have it. I'm manually restarting the builders now.
-
This is all there in current snapshots.
Now to wait for someone who actually has the hardware to show up… :-)
-
Do you still need someone with hardware to do some testing? I might be able to wrangle access to something and run some AES-NI/no AES-NI OpenVPN comparisons, and I'm definitely extremely interested in AES-NI support.
How easy is it to create a boot USB Flash drive? If I can do anything, I'm going to have to detach existing hard drives and re-attach them afterwards.
-
Yes, we need all the testing we can get.
There are bootable USB images up at http://snapshots.pfsense.org/
-
Ok; give me a week or two, and I'll see what I can do.
-
I've enabled it for OpenVPN and am using an IPsec tunnel as well. I haven't noticed any problems with it, though performance measurements are difficult because the hardware it's running on is so overpowered.
What processes should I look for to determine what kind of performance impact it has?
-
Running cryptostats with and without it enabled might be interesting.
For the processes, I think for OpenVPN it would show up in the actual openvpn process, not sure if IPsec would show as racoon or just kernel cpu.
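A rough way to put a number on the per-process side of this is to sum the CPU column of `ps` for the daemon in question, once with the option on and once with it off. This is only a sketch: `cpu_for` is a hypothetical helper, and it assumes input in the shape produced by `ps -axo pcpu,comm` (percentage first, command name second).

```shell
# Sum the %CPU column for every process whose command name equals $1.
# Expects `ps -axo pcpu,comm` style input on stdin; the header line is
# ignored because its second field never equals a process name.
cpu_for() {
    awk -v name="$1" '
        $2 == name { total += $1 }
        END { printf "%.1f\n", total }
    '
}
```

For example, `ps -axo pcpu,comm | cpu_for openvpn`. This only covers userland processes; crypto work done inside the kernel for IPsec would not appear under a process name here, so something like `top -SH` is probably needed for that part.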
-
Testing with the option on/off in openvpn made no difference (33% cpu at 105mbit openvpn-aes128cbc)
There seems to be a problem with the aesni driver, as this shows:
session = 0x0
device = aesni0
count = 1, size = 16
iv:
0000: 61 38 32 39 6a 6f 6f 73 6e 31 65 74 73 62 6f 75
cleartext:
0000: 6f 38 32 75 74 74 6a 34 62 62 62 21 69 74 6e 6f
cleartext:
0000: 6f 38 32 75 74 74 6a 34 62 62 62 21 69 74 6e 6f
0.000 sec, 2 aes256 crypts, 16 bytes, 16000000 byte/sec, 122.1 Mb/sec
A dual-core 3.4 GHz i7 should be in the 2000MB range if it were working as it should. Have there been any changes to the crypto device in recent builds? This test machine is:
2.1-BETA0 (i386)
built on Fri Jun 22 21:43:34 EDT 2012
-
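The throughput line in the cryptotest output above can be checked by hand. The sketch below assumes the elapsed time was about 2 microseconds (it is displayed as 0.000 sec, so this is inferred from the reported rate, not read from the output) and that "Mb" means 2^20 bits, which is the divisor that reproduces 122.1. The helper names are invented for illustration.

```shell
# Bytes per second for `ops` operations of `size` bytes in `secs` seconds.
rate_bps() {
    awk -v ops="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.0f\n", ops * size / secs }'
}

# Convert a byte/sec rate to Mb/sec, assuming 2^20 bits per Mb.
bytes_to_mbit() {
    awk -v bps="$1" 'BEGIN { printf "%.1f\n", bps * 8 / (1024 * 1024) }'
}

# rate_bps 2 16 0.000002    prints 16000000
# bytes_to_mbit 16000000    prints 122.1
```

If that reading is right, the run timed only two 16-byte operations, so the figure says very little about sustained throughput; a much larger count and buffer size would give a more meaningful comparison.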
Tried this on my up and coming firewall that's running a 3570k, which has support for AES-NI.
Running openssl speed shows no difference between activating AES-NI and running stock, but from what I understand this is normal, since it doesn't seem to show an improvement on other systems either.
What tools can I use to measure the performance of the AES-NI implementation without needing to measure routing performance over VPN?
Edit: When adding -evp to the command "openssl speed", I got a large increase in numbers, a very large one. I read that aesni has been moved onto evp for openssl, however I don't know what's going on here.
$ openssl speed -evp aes-128-cbc
To get the most accurate results, try to run this program when this computer is idle.
Doing aes-128-cbc for 3s on 16 size blocks: 2857523 aes-128-cbc's in 0.15s
Doing aes-128-cbc for 3s on 64 size blocks: 2661994 aes-128-cbc's in 0.16s
Doing aes-128-cbc for 3s on 256 size blocks: 2061427 aes-128-cbc's in 0.17s
Doing aes-128-cbc for 3s on 1024 size blocks: 1093513 aes-128-cbc's in 0.04s
Doing aes-128-cbc for 3s on 8192 size blocks: 180211 aes-128-cbc's in 0.01s
OpenSSL 0.9.8q 2 Dec 2010
built on: date not available
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type                16 bytes       64 bytes      256 bytes     1024 bytes     8192 bytes
aes-128-cbc      303771.65k    1078651.53k    3196959.56k   29730964.39k  195095614.11k
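A note on reading the -evp figures above: each run reports only 0.01 to 0.17 s against a nominal 3 s, and the header says "timing function used: getrusage". A likely explanation (an assumption, not something confirmed in this thread) is that the benchmark counts user CPU time only, so work handed to the kernel through cryptodev is barely counted and the bytes-per-second figures come out inflated. The sketch below redoes the arithmetic for the 16-byte row both ways; `speed_k` is a hypothetical helper name.

```shell
# Throughput in 1000s of bytes per second, the unit openssl speed uses:
# ops operations of size bytes each, divided by secs seconds.
speed_k() {
    awk -v ops="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", ops * size / secs / 1000 }'
}

# Using the reported 0.15 s (close to the 303771.65k in the table;
# the small gap comes from the time being rounded to two decimals):
# speed_k 2857523 16 0.15    prints 304802.45
# Same operation count spread over the nominal 3 s of wall-clock time:
# speed_k 2857523 16 3       prints 15240.12
```

openssl speed also has an -elapsed option that measures real time instead of user CPU time, which may give figures that are easier to compare against the stock run.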
$ openssl engine
(cryptodev) BSD cryptodev engine
(dynamic) Dynamic engine loading support
$ openssl speed
To get the most accurate results, try to run this program when this computer is idle.
Doing md2 for 3s on 16 size blocks: 401850 md2's in 3.00s
Doing md2 for 3s on 64 size blocks: 207132 md2's in 3.00s
Doing md2 for 3s on 256 size blocks: 70482 md2's in 3.00s
Doing md2 for 3s on 1024 size blocks: 19381 md2's in 3.00s
Doing md2 for 3s on 8192 size blocks: 2495 md2's in 3.00s
Doing mdc2 for 3s on 16 size blocks: 2457072 mdc2's in 3.00s
Doing mdc2 for 3s on 64 size blocks: 693512 mdc2's in 3.00s
Doing mdc2 for 3s on 256 size blocks: 179479 mdc2's in 3.00s
Doing mdc2 for 3s on 1024 size blocks: 45252 mdc2's in 3.00s
Doing mdc2 for 3s on 8192 size blocks: 5672 mdc2's in 3.00s
Doing md4 for 3s on 16 size blocks: 7247323 md4's in 3.00s
Doing md4 for 3s on 64 size blocks: 6755531 md4's in 3.00s
Doing md4 for 3s on 256 size blocks: 4883120 md4's in 3.00s
Doing md4 for 3s on 1024 size blocks: 2337344 md4's in 3.00s
Doing md4 for 3s on 8192 size blocks: 398012 md4's in 3.00s
Doing md5 for 3s on 16 size blocks: 5978420 md5's in 3.00s
Doing md5 for 3s on 64 size blocks: 5199277 md5's in 3.00s
Doing md5 for 3s on 256 size blocks: 3398475 md5's in 3.00s
Doing md5 for 3s on 1024 size blocks: 1425765 md5's in 3.00s
Doing md5 for 3s on 8192 size blocks: 222316 md5's in 3.00s
Doing hmac(md5) for 3s on 16 size blocks: 5328855 hmac(md5)'s in 3.00s
Doing hmac(md5) for 3s on 64 size blocks: 4671326 hmac(md5)'s in 3.00s
Doing hmac(md5) for 3s on 256 size blocks: 3159369 hmac(md5)'s in 3.00s
Doing hmac(md5) for 3s on 1024 size blocks: 1380441 hmac(md5)'s in 3.00s
Doing hmac(md5) for 3s on 8192 size blocks: 220887 hmac(md5)'s in 3.00s
Doing sha1 for 3s on 16 size blocks: 6069814 sha1's in 3.00s
Doing sha1 for 3s on 64 size blocks: 4931770 sha1's in 3.00s
Doing sha1 for 3s on 256 size blocks: 2928970 sha1's in 3.00s
Doing sha1 for 3s on 1024 size blocks: 1116292 sha1's in 3.00s
Doing sha1 for 3s on 8192 size blocks: 164988 sha1's in 3.00s
Doing sha256 for 3s on 16 size blocks: 4598139 sha256's in 3.00s
Doing sha256 for 3s on 64 size blocks: 3068521 sha256's in 3.00s
Doing sha256 for 3s on 256 size blocks: 1494362 sha256's in 3.00s
Doing sha256 for 3s on 1024 size blocks: 494263 sha256's in 3.00s
Doing sha256 for 3s on 8192 size blocks: 66836 sha256's in 3.00s
Doing sha512 for 3s on 16 size blocks: 3727036 sha512's in 3.00s
Doing sha512 for 3s on 64 size blocks: 3726802 sha512's in 3.00s
Doing sha512 for 3s on 256 size blocks: 1805883 sha512's in 3.00s
Doing sha512 for 3s on 1024 size blocks: 701803 sha512's in 3.00s
Doing sha512 for 3s on 8192 size blocks: 104335 sha512's in 3.00s
Doing rmd160 for 3s on 16 size blocks: 5014298 rmd160's in 3.00s
Doing rmd160 for 3s on 64 size blocks: 3627977 rmd160's in 3.00s
Doing rmd160 for 3s on 256 size blocks: 1901799 rmd160's in 3.00s
Doing rmd160 for 3s on 1024 size blocks: 654739 rmd160's in 3.00s
Doing rmd160 for 3s on 8192 size blocks: 92034 rmd160's in 3.00s
Doing rc4 for 3s on 16 size blocks: 96819688 rc4's in 3.00s
Doing rc4 for 3s on 64 size blocks: 26486918 rc4's in 3.00s
Doing rc4 for 3s on 256 size blocks: 6797096 rc4's in 3.00s
Doing rc4 for 3s on 1024 size blocks: 1726539 rc4's in 3.00s
Doing rc4 for 3s on 8192 size blocks: 216861 rc4's in 3.00s
Doing des cbc for 3s on 16 size blocks: 12666236 des cbc's in 3.00s
Doing des cbc for 3s on 64 size blocks: 3249366 des cbc's in 3.00s
Doing des cbc for 3s on 256 size blocks: 816695 des cbc's in 3.00s
Doing des cbc for 3s on 1024 size blocks: 204628 des cbc's in 3.00s
Doing des cbc for 3s on 8192 size blocks: 25605 des cbc's in 3.00s
Doing des ede3 for 3s on 16 size blocks: 4772607 des ede3's in 3.00s
Doing des ede3 for 3s on 64 size blocks: 1208541 des ede3's in 3.00s
Doing des ede3 for 3s on 256 size blocks: 303278 des ede3's in 3.00s
Doing des ede3 for 3s on 1024 size blocks: 75881 des ede3's in 3.00s
Doing des ede3 for 3s on 8192 size blocks: 9489 des ede3's in 3.00s
Doing aes-128 cbc for 3s on 16 size blocks: 35100626 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 9214812 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 2316893 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 581234 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 73086 aes-128 cbc's in 3.00s
Doing aes-192 cbc for 3s on 16 size blocks: 30975192 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 64 size blocks: 8065315 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 256 size blocks: 2027724 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 1024 size blocks: 508231 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 8192 size blocks: 63850 aes-192 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 27504566 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 7169168 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 1801224 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 451344 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 56671 aes-256 cbc's in 3.00s
Doing aes-128 ige for 3s on 16 size blocks: 36653955 aes-128 ige's in 3.00s
Doing aes-128 ige for 3s on 64 size blocks: 9656932 aes-128 ige's in 3.00s
Doing aes-128 ige for 3s on 256 size blocks: 2435978 aes-128 ige's in 3.00s
Doing aes-128 ige for 3s on 1024 size blocks: 611488 aes-128 ige's in 3.00s
Doing aes-128 ige for 3s on 8192 size blocks: 76654 aes-128 ige's in 3.00s
Doing aes-192 ige for 3s on 16 size blocks: 32107099 aes-192 ige's in 3.00s
Doing aes-192 ige for 3s on 64 size blocks: 8400913 aes-192 ige's in 3.00s
Doing aes-192 ige for 3s on 256 size blocks: 2116313 aes-192 ige's in 3.00s
Doing aes-192 ige for 3s on 1024 size blocks: 530959 aes-192 ige's in 3.00s
Doing aes-192 ige for 3s on 8192 size blocks: 66545 aes-192 ige's in 3.00s
Doing aes-256 ige for 3s on 16 size blocks: 28555853 aes-256 ige's in 3.00s
Doing aes-256 ige for 3s on 64 size blocks: 7434226 aes-256 ige's in 3.00s
Doing aes-256 ige for 3s on 256 size blocks: 1871325 aes-256 ige's in 3.00s
Doing aes-256 ige for 3s on 1024 size blocks: 469310 aes-256 ige's in 3.00s
Doing aes-256 ige for 3s on 8192 size blocks: 58792 aes-256 ige's in 3.00s
Doing camellia-128 cbc for 3s on 16 size blocks: 21802881 camellia-128 cbc's in 3.00s
Doing camellia-128 cbc for 3s on 64 size blocks: 5667379 camellia-128 cbc's in 3.00s
Doing camellia-128 cbc for 3s on 256 size blocks: 1432879 camellia-128 cbc's in 3.00s
Doing camellia-128 cbc for 3s on 1024 size blocks: 358708 camellia-128 cbc's in 3.00s
Doing camellia-128 cbc for 3s on 8192 size blocks: 44922 camellia-128 cbc's in 3.00s
Doing camellia-192 cbc for 3s on 16 size blocks: 16488434 camellia-192 cbc's in 3.00s
Doing camellia-192 cbc for 3s on 64 size blocks: 4282365 camellia-192 cbc's in 3.00s
Doing camellia-192 cbc for 3s on 256 size blocks: 1078711 camellia-192 cbc's in 3.00s
Doing camellia-192 cbc for 3s on 1024 size blocks: 269872 camellia-192 cbc's in 3.00s
Doing camellia-192 cbc for 3s on 8192 size blocks: 33773 camellia-192 cbc's in 3.00s
Doing camellia-256 cbc for 3s on 16 size blocks: 16350769 camellia-256 cbc's in 3.00s
Doing camellia-256 cbc for 3s on 64 size blocks: 4282388 camellia-256 cbc's in 3.00s
Doing camellia-256 cbc for 3s on 256 size blocks: 1078704 camellia-256 cbc's in 3.00s
Doing camellia-256 cbc for 3s on 1024 size blocks: 269870 camellia-256 cbc's in 3.00s
Doing camellia-256 cbc for 3s on 8192 size blocks: 33773 camellia-256 cbc's in 3.00s
Doing rc2 cbc for 3s on 16 size blocks: 9444067 rc2 cbc's in 3.00s
Doing rc2 cbc for 3s on 64 size blocks: 2392481 rc2 cbc's in 3.00s
Doing rc2 cbc for 3s on 256 size blocks: 602464 rc2 cbc's in 3.00s
Doing rc2 cbc for 3s on 1024 size blocks: 150533 rc2 cbc's in 3.00s
Doing rc2 cbc for 3s on 8192 size blocks: 18834 rc2 cbc's in 3.00s
Doing rc5-32/12 cbc for 3s on 16 size blocks: 46814041 rc5-32/12 cbc's in 3.00s
Doing rc5-32/12 cbc for 3s on 64 size blocks: 12733424 rc5-32/12 cbc's in 3.00s
Doing rc5-32/12 cbc for 3s on 256 size blocks: 3225428 rc5-32/12 cbc's in 3.00s
Doing rc5-32/12 cbc for 3s on 1024 size blocks: 813624 rc5-32/12 cbc's in 3.00s
Doing rc5-32/12 cbc for 3s on 8192 size blocks: 102347 rc5-32/12 cbc's in 3.00s
Doing blowfish cbc for 3s on 16 size blocks: 21725463 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 64 size blocks: 5676725 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 256 size blocks: 1434179 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 1024 size blocks: 358834 blowfish cbc's in 3.00s
Doing blowfish cbc for 3s on 8192 size blocks: 44988 blowfish cbc's in 3.00s
Doing cast cbc for 3s on 16 size blocks: 17178577 cast cbc's in 3.00s
Doing cast cbc for 3s on 64 size blocks: 4439759 cast cbc's in 3.00s
Doing cast cbc for 3s on 256 size blocks: 1116964 cast cbc's in 3.00s
Doing cast cbc for 3s on 1024 size blocks: 279388 cast cbc's in 3.00s
Doing cast cbc for 3s on 8192 size blocks: 35044 cast cbc's in 3.00s
Doing 512 bit private rsa's for 10s: 61189 512 bit private RSA's in 9.98s
Doing 512 bit public rsa's for 10s: 601215 512 bit public RSA's in 10.00s
Doing 1024 bit private rsa's for 10s: 20334 1024 bit private RSA's in 9.99s
Doing 1024 bit public rsa's for 10s: 340325 1024 bit public RSA's in 10.00s
Doing 2048 bit private rsa's for 10s: 4119 2048 bit private RSA's in 10.00s
Doing 2048 bit public rsa's for 10s: 136034 2048 bit public RSA's in 10.00s
Doing 4096 bit private rsa's for 10s: 706 4096 bit private RSA's in 10.01s
Doing 4096 bit public rsa's for 10s: 45624 4096 bit public RSA's in 10.00s
Doing 512 bit sign dsa's for 10s: 105931 512 bit DSA signs in 9.96s
Doing 512 bit verify dsa's for 10s: 106824 512 bit DSA verify in 10.00s
Doing 1024 bit sign dsa's for 10s: 46416 1024 bit DSA signs in 9.99s
Doing 1024 bit verify dsa's for 10s: 42466 1024 bit DSA verify in 10.00s
Doing 2048 bit sign dsa's for 10s: 16911 2048 bit DSA signs in 9.99s
Doing 2048 bit verify dsa's for 10s: 14245 2048 bit DSA verify in 10.00s
OpenSSL 0.9.8q 2 Dec 2010
built on: date not available
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type                   16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md2                    2141.30k     4417.44k     6012.57k     6613.21k     6808.50k
mdc2                  13100.85k    14790.31k    15310.75k    15441.23k    15482.13k
md4                   38643.98k   144091.67k   416565.02k   797565.38k  1086500.20k
md5                   31875.01k   110883.42k   289913.04k   486509.49k   606880.13k
hmac(md5)             28411.91k    99623.97k   269515.58k   471044.03k   602978.83k
sha1                  32362.42k   105178.42k   249861.15k   380908.95k   450385.51k
rmd160                26734.79k    77372.78k   162236.45k   223414.84k   251234.70k
rc4                  516214.74k   564878.95k   579838.72k   589142.68k   591990.60k
des cbc               67532.32k    69298.28k    69669.64k    69824.69k    69896.47k
des ede3              25446.22k    25774.19k    25871.64k    25892.63k    25902.58k
idea cbc                   0.00         0.00         0.00         0.00         0.00
seed cbc                   0.00         0.00         0.00         0.00         0.00
rc2 cbc               50353.49k    51023.74k    51394.29k    51365.90k    51413.47k
rc5-32/12 cbc        249597.68k   271562.10k   275151.04k   277663.96k   279387.16k
blowfish cbc         115834.12k   121065.94k   122345.23k   122443.88k   122808.75k
cast cbc              91590.96k    94685.38k    95284.60k    95334.60k    95661.91k
aes-128 cbc          187150.56k   196521.54k   197646.67k   198332.79k   199511.26k
aes-192 cbc          165149.83k   172006.56k   172978.82k   173422.19k   174297.99k
aes-256 cbc          146646.34k   152894.80k   153656.61k   154011.21k   154699.80k
camellia-128 cbc     116247.24k   120866.54k   122249.07k   122401.22k   122626.91k
camellia-192 cbc      87912.53k    91328.81k    92021.42k    92087.61k    92193.36k
camellia-256 cbc      87177.89k    91329.24k    92020.70k    92087.14k    92192.72k
sha256                24515.91k    65445.03k   127480.26k   168655.93k   182448.88k
sha512                19871.51k    79480.47k   154054.52k   239474.36k   284814.01k
aes-128 ige          195429.52k   205950.70k   207805.56k   208656.41k   209249.49k
aes-192 ige          171186.45k   179163.88k   180536.02k   181177.48k   181655.84k
aes-256 ige          152250.74k   158547.62k   159636.75k   160141.45k   160489.78k
                     sign      verify    sign/s  verify/s
rsa  512 bits   0.000163s   0.000017s    6132.5   60117.0
rsa 1024 bits   0.000491s   0.000029s    2034.9   34031.0
rsa 2048 bits   0.002428s   0.000074s     411.9   13602.8
rsa 4096 bits   0.014179s   0.000219s      70.5    4562.3
                     sign      verify    sign/s  verify/s
dsa  512 bits   0.000094s   0.000094s   10632.7   10681.8
dsa 1024 bits   0.000215s   0.000235s    4644.8    4246.4
dsa 2048 bits   0.000591s   0.000702s    1692.3    1424.4
$ dmesg | grep -i aes
Features2=0x779ae3bf<sse3,pclmulqdq,dtes64,mon,ds_cpl,vmx,est,tm2,ssse3,cx16,xtpr,pdcm,pcid,sse4.1,sse4.2,popcnt,tscdlt,aesni,xsave,avx,f16c,<b30>>
aesni0: <aes-cbc,aes-xts> on motherboard
$ kldstat
Id Refs Address            Size     Name
 1    6 0xffffffff80100000 155a000  kernel
 2    1 0xffffffff81812000 1a49     aesni.ko
 3    1 0xffffffff81814000 c63      coretemp.ko
$ kldunload aesni
aesni0: detached
$ kldload aesni
padlock0: No ACE support.
aesni0: <aes-cbc,aes-xts> on motherboard
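When switching between runs with and without the module, it can help to confirm its state before each benchmark. A small sketch, assuming the five-column kldstat layout shown above (Id, Refs, Address, Size, Name); `module_loaded` is an invented helper name.

```shell
# Succeed (exit 0) if module file $1 appears in the Name column of
# kldstat-style text on stdin, fail (exit 1) otherwise.
# Assumption: Name is the fifth whitespace-separated column.
module_loaded() {
    awk -v mod="$1" '$5 == mod { found = 1 } END { exit !found }'
}
```

For example, `kldstat | module_loaded aesni.ko && echo loaded` before a run with AES-NI, and the same check expecting failure after `kldunload aesni` for the baseline run.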
-
Any updates regarding AES-NI not working?
I have a test environment with AES-NI capabilities that I'd be more than happy to let you use for testing.