23.09d - Is QAT Broken?
-
@jimp
Am I not correct in presuming that encrypted traffic originating from the router that uses TLS will use QAT? I don't know which daemons running on the router reach out. I'm not good at guessing, but I would think router updates/checks, package updates, DoT (unbound), pfBlocker et al. I would be surprised if any of the services hosted on pfSense used unencrypted traffic but again, that is a guess from a non-developer.
The use of QAT has changed from 23.05.1 (and below) to 23.09 dev. I genuinely don't know why or how routine TLS traffic or a test openssl session with Microsoft fails to show any QAT usage. I read through the Intel QAT white-paper on testing and used the confirmation methods they gave.
Perhaps 'broken' is the wrong verb in dev-speak so if I need to use English in a different way please suggest a better one. As a customer I am trying my best here.
-
What I'm saying is you need to be a lot more specific. Yes, there is a change in behavior but you haven't even clearly defined what that is.
I ran some tests locally and it doesn't appear to be getting used by ssh, nginx, or openvpn at least and I seem to recall it worked at least with nginx in the past. I am not sure about outbound. I don't have anything left on 23.05.1 with QAT to confirm all of the old behavior.
Primarily what gets accelerated is use of the accelerated algorithms in the kernel, so IPsec and OpenVPN DCO are the main consumers.
What you'd need to do is look at the interrupts on the QAT device and see if they go up in proportion with the traffic you send to things like SSH (type some lines, check the count, or SCP a large file), the GUI (check after refreshing the page), and so on. For testing outbound, you can try `pkg update -f` or try to `curl https://<something>`.
-
Speaking with some others here the only things that would use QAT would be encryption in the kernel, so IPsec and OpenVPN DCO as I mentioned. It wouldn't get used for userspace daemons or clients.
So it would help to confirm what exactly was using QAT on 23.05.1 that you aren't seeing now.
-
I am trying to be as specific as I can but I just don't have your knowledge. I can only give you exactly what I have tested and paste in the exact results given. Not that it matters, I did not state 'QAT is broken', I posted a question asking if it was, specific to a firmware load.
I have tried the testing methods listed in the Intel QAT paper, I have monitored the routine traffic originating from the router, I have run updates, basic `pkg update -f` & `curl` to HTTPS sites and the `openssl` test session with Microsoft (as per their developer guide). There remain zero QAT interrupts, whatever I try.
I am sure there are different crypto uses I have not tried that may work, but I can say that all the different (and somewhat basic) things I would expect QAT to work with now produce zero interrupts.
Have you tried any of the tests I have used and, if so, is everything working ok for you?
Apologies for being a pfSense newcomer and not having enough knowledge to resolve this on my own. If you need someone who can fly a fighter-jet or run a complex flight test profile I'm your guy but I need help with pfSense!
-
@jimp said in 23.09d - Is QAT Broken?:
Speaking with some others here the only things that would use QAT would be encryption in the kernel, so IPsec and OpenVPN DCO as I mentioned. It wouldn't get used for userspace daemons or clients.
So it would help to confirm what exactly was using QAT on 23.05.1 that you aren't seeing now.
Our messages crossed. How would you like me to test for that on a 23.05.1 snapshot?
-
Test that exactly as I mentioned before -- try running traffic through each daemon individually and see if you see interrupts on the QAT device when you do.
I just set up and tested an OpenVPN DCO tunnel here and it is also using QAT just like IPsec, both of which are in the kernel. That's what we expect to see.
So we need to figure out what you were seeing using DCO on 23.05.1 to narrow down what has changed in your environment.
-
@jimp
Thanks @jimp and happy to do so. To avoid more errata, how specifically do you want to achieve traffic through each daemon?
Does this not conflict with your note that QAT is not used for userspace daemons?
Does the lack of QAT interrupts when using TLS, or a `curl` to HTTPS, or `pkg update -f` or an `openssl` test session mean nothing and a zero result is actually the expected behaviour?
I've become slightly confused as to whether I have a problem or not.
I need more tea.
-
@jimp One of the big changes in 23.09 is the switch to OpenSSL v3. OpenSSL has a QAT engine if it's built with it. Is it possible that OpenSSL 1.1.x was built with QAT engine support and 3.x is not?
[23.05.1-RELEASE][root@pfsense.bayside.sara-jeff.nyc]/root: openssl engine
(devcrypto) /dev/crypto engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support
[23.09-DEVELOPMENT][root@hostname]/root: openssl engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support
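To compare the two listings mechanically rather than by eye, here is a minimal Python sketch. The helper name `engine_ids` is made up for illustration, and the two sample strings are simply the engine lines copied from the output above; it only reports which engine ids appear in one listing and not the other, and says nothing about whether that difference matters for QAT.

```python
import re


def engine_ids(listing: str) -> set[str]:
    """Return the engine ids, i.e. the names in parentheses, from `openssl engine` output."""
    return set(re.findall(r"\((\w+)\)", listing))


# Engine lines copied from the two listings above.
release_23_05_1 = (
    "(devcrypto) /dev/crypto engine "
    "(rdrand) Intel RDRAND engine "
    "(dynamic) Dynamic engine loading support"
)
snapshot_23_09 = (
    "(rdrand) Intel RDRAND engine "
    "(dynamic) Dynamic engine loading support"
)

only_in_release = engine_ids(release_23_05_1) - engine_ids(snapshot_23_09)
only_in_snapshot = engine_ids(snapshot_23_09) - engine_ids(release_23_05_1)

print("only in 23.05.1:", sorted(only_in_release))
print("only in 23.09:", sorted(only_in_snapshot))
```

On these two samples the only id that differs is `devcrypto`, which is listed on 23.05.1 and absent on 23.09.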
I suspect that the OpenSSL QAT engine is unavailable as part of 23.05 because qatengine.so is not installed in the default location and /etc/ssl/openssl.cnf does not configure it.
[23.05.1-RELEASE][root@pfsense.bayside.sara-jeff.nyc]/usr/bin: ./openssl engine -t -c -v qatengine
59772833685504:error:25066067:DSO support routines:dlfcn_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/dso/dso_dlfcn.c:118:filename(/usr/lib/engines/qatengine.so): Cannot open "/usr/lib/engines/qatengine.so"
59772833685504:error:25070067:DSO support routines:DSO_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/dso/dso_lib.c:162:
59772833685504:error:260B6084:engine routines:dynamic_load:dso not found:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/engine/eng_dyn.c:434:
59772833685504:error:2606A074:engine routines:ENGINE_by_id:no such engine:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/engine/eng_list.c:421:id=qatengine
The Intel QAT engine is supported for FreeBSD so it might be worthwhile adding to a future release:
https://github.com/intel/QAT_Engine/blob/master/README.md
https://www.intel.com/content/www/us/en/download/19735/intel-quickassist-technology-driver-for-freebsd-hw-version-1-x.html?
-
@RobbieTT said in 23.09d - Is QAT Broken?:
@jimp
Thanks @jimp and happy to do so. To avoid more errata, how specifically do you want to achieve traffic through each daemon?
openvpn: Pinging the tunnel IP addresses on the other end and checking is sufficient.
IPsec: For tunnel mode, ping LAN to LAN, for VTI, pinging the VTI address on the far side is enough.
nginx: Try reloading various GUI pages in a browser and see if the interrupts increase, or try transferring data from a remote curl client. s_client alone may not do enough to be meaningful since it's just negotiating the connection.
ssh: Even just running the command to check the interrupts over SSH should increase the count, but using scp to transfer a large-ish file and checking before and after would really show it.
outbound/curl: Try fetching a remote file using https and see if the interrupts increase.
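The before/after checks in the list above can be scripted. The following is only a rough Python sketch under a few assumptions: Python is available on the machine where it runs, `vmstat -i` prints lines such as `irq175: qat0:b1 48 0` with the total in the second-to-last column (as seen elsewhere in this thread), and the names `read_qat_total` and `interrupt_deltas` are invented for illustration.

```python
import subprocess
from typing import Callable, Dict


def read_qat_total() -> int:
    """Sum the 'total' column of every qat line printed by `vmstat -i` (FreeBSD)."""
    out = subprocess.run(
        ["vmstat", "-i"], capture_output=True, text=True, check=True
    ).stdout
    # A line looks like "irq175: qat0:b1 48 0": the last two fields are total and rate.
    return sum(int(line.split()[-2]) for line in out.splitlines() if "qat" in line)


def interrupt_deltas(
    workloads: Dict[str, Callable[[], None]],
    read_total: Callable[[], int] = read_qat_total,
) -> Dict[str, int]:
    """Run each workload once and report how far the counter moved while it ran."""
    deltas = {}
    for name, run in workloads.items():
        before = read_total()
        run()
        deltas[name] = read_total() - before
    return deltas
```

A call might look like `interrupt_deltas({"curl": lambda: subprocess.run(["curl", "-so", "/dev/null", "https://www.netgate.com"])})`. Anything else touching the device during a run also moves the counter, so repeating each workload a few times and comparing against an idle run is sensible.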
Does this not conflict with your note that QAT is not used for userspace daemons?
We expect it only to be used by the kernel. You're seeing some difference on 23.05.1, which is why we need more data to isolate what that might be.
Does the lack of QAT interrupts when using TLS, or a `curl` to HTTPS, or `pkg update -f` or an `openssl` test session mean nothing and a zero result is actually the expected behaviour?
We don't expect any of those to cause interrupts on QAT since they aren't running through the kernel. So not seeing an increase is the correct behavior.
So far I haven't seen anything that suggests there is a problem on 23.09 but we need more data about what you were seeing on 23.05.1 to say for sure.
-
I reloaded 23.05.1 on a 4100 and I don't see any QAT activity on there at all for the GUI, ssh, curl, etc.
Are you certain you don't have any IPsec or OpenVPN DCO tunnels on 23.05.1 or 23.09?
Maybe if you had an OpenVPN DCO tunnel on 23.05.1 it was using AES-GCM (accelerated) but on 23.09 it may be using ChaCha20-Poly1305 (not accelerated).
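As a small illustration of that cipher distinction, here is a Python sketch that encodes only what is stated in this post: AES-GCM is treated as accelerated, ChaCha20-Poly1305 as not, and everything else is left as unknown. The function name `accelerated_per_thread` is hypothetical, and the cipher names passed in are assumed to be in the usual OpenVPN spelling (for example `AES-256-GCM`).

```python
from typing import Optional


def accelerated_per_thread(cipher: str) -> Optional[bool]:
    """Classify a data-channel cipher using only the two cases named in this thread.

    Returns True for AES-GCM, False for ChaCha20-Poly1305, and None for any
    cipher the thread does not mention (no claim is made about those).
    """
    name = cipher.upper()
    if name.startswith("AES-") and name.endswith("-GCM"):
        return True
    if name == "CHACHA20-POLY1305":
        return False
    return None


for candidate in ("AES-256-GCM", "CHACHA20-POLY1305", "AES-256-CBC"):
    print(candidate, "->", accelerated_per_thread(candidate))
```

Comparing the cipher each tunnel actually negotiated on both versions would confirm or rule out this explanation.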
-
@jimp
I'm up on 23.05.1 right now and have re-run all the testing. To me, this version is working correctly, including in user-land.
Immediately after rebooting with 23.05.1:
Last login: Mon Oct 2 13:32:39 on ttys000
~ % ssh admin@router-8
(admin@router-8) Password for admin@Router-8.redacted.me:
Netgate 6100 - Serial: 20######8 - Netgate Device ID: ee72########851a
*** Welcome to Netgate pfSense Plus 23.05.1-RELEASE (amd64) on Router-8 ***
Current Boot Environment: quick-20230831162514
Next Boot Environment: default
WAN (wan) -> pppoe0 -> v4/PPPoE: .../32 v6/DHCP6: 2a02::/64
MAN (lan) -> igc0 -> v4: 192.168.1.1/24 v6/t6: 2a02::/64
VLAN (opt1) -> ix1.1003 -> v4: 172.16.1.1/24 v6/t6: 2a02::/64
LAN (opt3) -> ix1 -> v4: 10.0.1.1/24 v6/t6: 2a02::/64
ONT (opt4) -> igc3 -> v4: 10.0.0.1/24
0) Logout (SSH only) 9) pfTop
1) Assign Interfaces 10) Filter Logs
2) Set interface(s) IP address 11) Restart webConfigurator
3) Reset webConfigurator password 12) PHP shell + Netgate pfSense Plus tools
4) Reset to factory defaults 13) Update from console
5) Reboot system 14) Disable Secure Shell (sshd)
6) Halt system 15) Restore recent configuration
7) Ping host 16) Restart PHP-FPM
8) Shell
Enter an option: 8
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
irq175: qat0:b1 48 0
irq176: qat0:b2 32 0
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
The QAT interrupts are present and increment in the manner I expected for all the things I was expecting QAT to process (I acknowledge that you don't expect QAT to be running for anything that is not running through the kernel).
Intel seems to expect QAT to be used for the examples I have given previously and from what I understand the only reason they would not would be if the QAT API wasn't being used.
QAT interrupts increment with:
`curl` (eg curl https://www.netgate.com)
`pkg update -f`
Any GUI action triggering the use of an external source (eg Package Manager)
Any package or service updating (eg pfBlocker)
openSSL testing (Microsoft)
File transfers
DoT
Basically I cannot find an example where QAT is not incrementing the interrupts for any encrypted task run on the router itself with 23.05.1.
Again, I have no VPNs in use or even a VPN profile set.
I'm at a loss as to why this behaviour is not seen as the expected behaviour, with the full QAT implementation. pfSense+ appears to behave in accordance with Intel's QAT developers guide; at least to a layman and pre-23.09 dev.
Meanwhile QAT, pre-23.09 dev keeps on ticking upwards:
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
irq175: qat0:b1 1466 0
irq176: qat0:b2 1722 0
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
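To put numbers on the growth between the two readings in this post, here is a small Python sketch. The function name `qat_counters` is my own, and the two sample strings are the counter lines copied from the `vmstat -i | grep qat` output just after boot and from the later reading; it assumes each entry has the form `qatN:bank total rate`.

```python
import re

# Matches e.g. "qat0:b1 48 0" -> name, total, rate.
QAT_LINE = re.compile(r"(qat\d+:\w+)\s+(\d+)\s+(\d+)")


def qat_counters(vmstat_text: str) -> dict[str, int]:
    """Map each QAT interrupt source to its total count."""
    return {name: int(total) for name, total, _rate in QAT_LINE.findall(vmstat_text)}


# Counter lines copied from the two readings in this post.
after_boot = "irq175: qat0:b1 48 0 irq176: qat0:b2 32 0"
later = "irq175: qat0:b1 1466 0 irq176: qat0:b2 1722 0"

first = qat_counters(after_boot)
second = qat_counters(later)
growth = {name: second[name] - first[name] for name in first}

print(growth)  # {'qat0:b1': 1418, 'qat0:b2': 1690}
```

The time between the two readings was not recorded, so this gives a count only, not a rate.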
I appreciate that we are already 'at the age of not believing' and my inability to understand the underpinnings of pfSense is proven above but in case anyone gets this far, I did note a small config change for QAT between 23.05 and 23.09. In 23.09 there is this extra line in sysctl:
dev.qat_ocf.0.enable: 1
Just an observation, nothing more.
I think I have scratched my head enough as an end-user. I'd be grateful if Netgate could go over their notes once more. However, I can see that the associated redmine has now been shelved.
-
@RobbieTT what do "openssl engine" and "openssl engine -t -c -v qatengine" report on your device when running 23.05.1?
-
The redmine has not been "shelved"; it's waiting on more information because without more information there is nothing we can do.
`dev.qat_ocf.0.enable` is to tie QAT into the opencrypto framework (cryptodev) which isn't really a change in how it operates but the newer base OS displays it differently.
Another difference in your case is that your WAN is PPPoE which may be using encryption of its own. It's possible that is the difference. There may be some difference in how PPPoE is doing encryption which is now not going through QAT when it was before.
Anything you're trying is also going over your WAN which would be causing PPPoE traffic.
So the next thing I would look at is the PPP connection logs from both versions to see if there is anything different there.
I still wouldn't normally expect that to be using QAT since mpd is a userspace daemon, but some parts of what happens for PPP do end up in the kernel so it's not 100% clear either way at the moment.
-
@jaltman said in 23.09d - Is QAT Broken?:
openssl engine -t -c -v qatengine
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: openssl engine
(devcrypto) /dev/crypto engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: openssl engine -t -c -v qatengine
14607459921920:error:25066067:DSO support routines:dlfcn_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/dso/dso_dlfcn.c:118:filename(/usr/lib/engines/qatengine.so): Cannot open "/usr/lib/engines/qatengine.so"
14607459921920:error:25070067:DSO support routines:DSO_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/dso/dso_lib.c:162:
14607459921920:error:260B6084:engine routines:dynamic_load:dso not found:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/engine/eng_dyn.c:434:
14607459921920:error:2606A074:engine routines:ENGINE_by_id:no such engine:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/engine/eng_list.c:421:id=qatengine
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
-
@jimp said in 23.09d - Is QAT Broken?:
Another difference in your case is that your WAN is PPPoE which may be using encryption of its own. It's possible that is the difference. There may be some difference in how PPPoE is doing encryption which is now not going through QAT when it was before.
Anything you're trying is also going over your WAN which would be causing PPPoE traffic.
So that's the next thing I would look at is the PPP connection logs from both to see if there is anything different there.
Nothing of note in the PPPoE logs, just the regular handshakes.
The QAT interrupt rate does not vary with the amount of traffic the WAN (PPPoE) is handling; there is no observable change between near-idle and almost saturated.
-
And you didn't do anything different with the installation, like maybe turn on filesystem encryption in the installer?
Any differences in sysctl tunables? loader.conf.local contents? Any packages that might be doing something with encryption? Any difference in kernel modules (not just qat modules)?
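One way to answer the sysctl part of that question is to save the output of `sysctl -a` on each version and diff the two dumps. Below is a minimal Python sketch of such a diff. The function names are made up for illustration, it assumes one `name: value` pair per line (multi-line values are not handled), and the two sample dumps are illustrative inputs rather than captures from a real router; only the `dev.qat_ocf.0.enable: 1` line comes from this thread.

```python
def parse_sysctl(dump: str) -> dict[str, str]:
    """Parse `sysctl -a` style output, one 'name: value' pair per line."""
    values = {}
    for line in dump.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines that are not 'name: value'
            values[name.strip()] = value.strip()
    return values


def sysctl_diff(old_dump: str, new_dump: str) -> dict[str, tuple]:
    """Return {name: (old_value, new_value)} for every OID that differs."""
    old, new = parse_sysctl(old_dump), parse_sysctl(new_dump)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Illustrative inputs only, not real captures.
dump_23_05_1 = "kern.ostype: FreeBSD"
dump_23_09 = "kern.ostype: FreeBSD\ndev.qat_ocf.0.enable: 1"

print(sysctl_diff(dump_23_05_1, dump_23_09))
```

A value of `None` on one side means the OID is absent from that dump. The same approach works for `kldstat` output or loader.conf.local if each is saved to a file first.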
There has to be something more fundamentally different at play here. I can't replicate your experience here at all on 23.05.1 with a 4100.
For me, both before and after, it gets used with kernel encryption for IPsec and OpenVPN DCO and that's it. That's all that it should work with the way it's been implemented in the current driver on FreeBSD.
I even tried forcing on userspace QAT but the driver failed saying that only works on 4xxx QAT devices, so again, that isn't something that would have possibly been functional now or before.
-
@jimp said in 23.09d - Is QAT Broken?:
And you didn't do anything different with the installation, like maybe turn on filesystem encryption in the installer?
Any differences in sysctl tunables? loader.conf.local contents? Any packages that might be doing something with encryption? Any difference in kernel modules (not just qat modules)?
irq175: qat0:b1 3344 0
irq176: qat0:b2 3684 0
No changes to the kernel and nothing via loader.conf.local. I do have 1 small change via sysctl tunables, as you would expect:
net.isr.dispatch=deferred
Along with PPPoE I also use IPv6 and use FQ_CoDel. Installed packages and services actually active are:
I don't think these are significant but mentioned for completeness.
-
@RobbieTT said in 23.09d - Is QAT Broken?:
@jaltman said in 23.09d - Is QAT Broken?:
openssl engine -t -c -v qatengine
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: openssl engine
(devcrypto) /dev/crypto engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: openssl engine -t -c -v qatengine
14607459921920:error:25066067:DSO support routines:dlfcn_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/dso/dso_dlfcn.c:118:filename(/usr/lib/engines/qatengine.so): Cannot open "/usr/lib/engines/qatengine.so"
14607459921920:error:25070067:DSO support routines:DSO_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/dso/dso_lib.c:162:
14607459921920:error:260B6084:engine routines:dynamic_load:dso not found:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/engine/eng_dyn.c:434:
14607459921920:error:2606A074:engine routines:ENGINE_by_id:no such engine:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/engine/eng_list.c:421:id=qatengine
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
I'm getting the same error as you on a C3xxx board running pfSense+ 23.05.1. Is this normal? Should we ignore it?
-
Not much different on today's 23.09d. I don't know if it is a significant error or not:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: openssl engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: openssl engine -t -c -v qatengine
0020E1AF5B420000:error:12800067:DSO support routines:dlfcn_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/sources/FreeBSD-src-plus-devel-main/crypto/openssl/crypto/dso/dso_dlfcn.c:118:filename(/usr/lib/engines-3/qatengine.so): Cannot open "/usr/lib/engines-3/qatengine.so"
0020E1AF5B420000:error:12800067:DSO support routines:DSO_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/sources/FreeBSD-src-plus-devel-main/crypto/openssl/crypto/dso/dso_lib.c:152:
0020E1AF5B420000:error:13000084:engine routines:dynamic_load:dso not found:/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/sources/FreeBSD-src-plus-devel-main/crypto/openssl/crypto/engine/eng_dyn.c:442:
0020E1AF5B420000:error:13000074:engine routines:ENGINE_by_id:no such engine:/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/sources/FreeBSD-src-plus-devel-main/crypto/openssl/crypto/engine/eng_list.c:430:id=qatengine
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
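The error dumps are hard to read, so here is a short Python sketch that pulls out just the file OpenSSL failed to open. The function name `missing_engine_paths` is invented for illustration, and the two sample strings are fragments copied from the 23.05.1 and 23.09 dumps in this thread; it relies on the `filename(...)` fragment that appears in the first error line of each.

```python
import re


def missing_engine_paths(error_text: str) -> list[str]:
    """Extract the shared-object paths from OpenSSL 'filename(...)' error fragments."""
    return re.findall(r"filename\(([^)]+)\)", error_text)


# Fragments copied from the first error line of each dump in this thread.
release_error = (
    "dso_dlfcn.c:118:filename(/usr/lib/engines/qatengine.so): "
    'Cannot open "/usr/lib/engines/qatengine.so"'
)
snapshot_error = (
    "dso_dlfcn.c:118:filename(/usr/lib/engines-3/qatengine.so): "
    'Cannot open "/usr/lib/engines-3/qatengine.so"'
)

print("23.05.1:", missing_engine_paths(release_error))
print("23.09:", missing_engine_paths(snapshot_error))
```

Both versions fail on a missing qatengine.so, just in different directories (engines versus engines-3), so on this evidence the OpenSSL QAT engine does not load on either version.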
-
@RobbieTT I can understand this happening on a development snapshot, but why am I also getting it on 23.05.1, which is a production version? I know you don't have the answers, but maybe someone will help with a reply.