23.09d - Is QAT Broken?
-
@RobbieTT said in 23.09d - Is QAT Broken?:
@jimp
Thanks @jimp and happy to do so. To avoid more errata, how specifically do you want to achieve traffic through each daemon?
openvpn: Pinging the tunnel IP addresses on the other end and checking is sufficient.
IPsec: For tunnel mode, ping LAN to LAN; for VTI, pinging the VTI address on the far side is enough.
nginx: Try reloading various GUI pages in a browser and see if the interrupts increase, or try transferring data from a remote curl client. s_client alone may not do enough to be meaningful since it's just negotiating the connection.
ssh: Even just running the command to check the interrupts should increase the interrupts, but using scp to transfer a large-ish file would really show it when checking before and after.
outbound/curl: Try fetching a remote file using https and see if the interrupts increase.
Does this not conflict with your note that QAT is not used for userspace daemons?
We expect it only to be used by the kernel. You're seeing some difference on 23.05.1, which is why we need more data to isolate what that might be.
Does the lack of QAT interrupts when using TLS, or a curl to HTTPS, or pkg update -f or an openssl test session mean nothing and a zero result is actually the expected behaviour?
We don't expect any of those to cause interrupts on QAT since they aren't running through the kernel. So not seeing an increase is the correct behavior.
So far I haven't seen anything that suggests there is a problem on 23.09 but we need more data about what you were seeing on 23.05.1 to say for sure.
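A minimal sketch of the before-and-after interrupt check described above. It assumes the `vmstat -i | grep qat` layout shown in this thread (name, total, rate), and the helper names `qat_total` and `qat_delta` are hypothetical, not part of pfSense.

```shell
#!/bin/sh
# Sum the "total" column of every qat line read from stdin.
# Assumes lines like "irq175: qat0:b1 48 0", where the total is
# the second-to-last field and the rate is the last field.
qat_total() {
    awk '/qat/ { sum += $(NF-1) } END { printf "%d\n", sum }'
}

# Hypothetical helper: print how many QAT interrupts were added
# between two saved snapshots (file paths given as $1 and $2).
qat_delta() {
    before=$(qat_total < "$1")
    after=$(qat_total < "$2")
    echo $((after - before))
}
```

One way to use it: save `vmstat -i | grep qat` to a file, run a single test such as an scp transfer, save the output again to a second file, then call `qat_delta` on the two files. Testing one daemon at a time keeps the counts attributable to a single source.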
-
I reloaded 23.05.1 on a 4100 and I don't see any QAT activity on there at all for the GUI, ssh, curl, etc.
Are you certain you don't have any IPsec or OpenVPN DCO tunnels on 23.05.1 or 23.09?
Maybe if you had an OpenVPN DCO tunnel on 23.05.1 it was using AES-GCM (accelerated) but on 23.09 it may be using ChaCha20-Poly1305 (not accelerated).
-
@jimp
I'm up on 23.05.1 right now and have re-run all the testing. To me, this version is working correctly, including in user-land.
Immediately after rebooting with 23.05.1:
Last login: Mon Oct 2 13:32:39 on ttys000
~ % ssh admin@router-8
(admin@router-8) Password for admin@Router-8.redacted.me:
Netgate 6100 - Serial: 20######8 - Netgate Device ID: ee72########851a
*** Welcome to Netgate pfSense Plus 23.05.1-RELEASE (amd64) on Router-8 ***
Current Boot Environment: quick-20230831162514
Next Boot Environment: default
WAN (wan) -> pppoe0 -> v4/PPPoE: .../32 v6/DHCP6: 2a02::/64
MAN (lan) -> igc0 -> v4: 192.168.1.1/24 v6/t6: 2a02::/64
VLAN (opt1) -> ix1.1003 -> v4: 172.16.1.1/24 v6/t6: 2a02::/64
LAN (opt3) -> ix1 -> v4: 10.0.1.1/24 v6/t6: 2a02::/64
ONT (opt4) -> igc3 -> v4: 10.0.0.1/24
0) Logout (SSH only)                  9) pfTop
1) Assign Interfaces                 10) Filter Logs
2) Set interface(s) IP address       11) Restart webConfigurator
3) Reset webConfigurator password    12) PHP shell + Netgate pfSense Plus tools
4) Reset to factory defaults         13) Update from console
5) Reboot system                     14) Disable Secure Shell (sshd)
6) Halt system                       15) Restore recent configuration
7) Ping host                         16) Restart PHP-FPM
8) Shell
Enter an option: 8
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
irq175: qat0:b1 48 0
irq176: qat0:b2 32 0
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
The QAT interrupts are present and increment in the manner I expected for all the things I was expecting QAT to process (I acknowledge that you don't expect QAT to be running for anything that is not running through the kernel).
Intel seems to expect QAT to be used for the examples I have given previously and from what I understand the only reason they would not would be if the QAT API wasn't being used.
QAT interrupts increment with:
curl (eg curl https://www.netgate.com)
pkg update -f
Any GUI action triggering the use of an external source (eg Package Manager)
Any package or service updating (eg pfBlocker)
openSSL testing (Microsoft)
File transfers
DoT
Basically I cannot find an example where QAT is not incrementing the interrupts for any encrypted task run on the router itself with 23.05.1.
Again, I have no VPNs in use or even a VPN profile set.
I'm at a loss as to why this behaviour is not seen as the expected behaviour, with the full QAT implementation. pfSense+ appears to behave in accordance with Intel's QAT developers guide; at least to a layman and pre-23.09 dev.
Meanwhile QAT, pre-23.09 dev keeps on ticking upwards:
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
irq175: qat0:b1 1466 0
irq176: qat0:b2 1722 0
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
I appreciate that we are already 'at the age of not believing' and my inability to understand the underpinnings of pfSense is proven above but in case anyone gets this far, I did note a small config change for QAT between 23.05 and 23.09. In 23.09 there is this extra line in sysctl:
dev.qat_ocf.0.enable: 1
Just an observation, nothing more.
I think I have scratched my head enough as an end-user. I'd be grateful if Netgate could go over their notes once more. However, I can see that the associated redmine has now been shelved.
-
@RobbieTT what do "openssl engine" and "openssl engine -t -c -v qatengine" report on your device when running 23.05.1?
-
The redmine has not been "shelved"; it's waiting on more information, because without more information there is nothing we can do.
dev.qat_ocf.0.enable is to tie QAT into the opencrypto framework (cryptodev), which isn't really a change in how it operates, but the newer base OS displays it differently.
Another difference in your case is that your WAN is PPPoE which may be using encryption of its own. It's possible that is the difference. There may be some difference in how PPPoE is doing encryption which is now not going through QAT when it was before.
Anything you're trying is also going over your WAN which would be causing PPPoE traffic.
So the next thing I would look at is the PPP connection logs from both, to see if there is anything different there.
I still wouldn't normally expect that to be using QAT since mpd is a userspace daemon, but some parts of what happens for PPP do end up in the kernel so it's not 100% clear either way at the moment.
-
@jaltman said in 23.09d - Is QAT Broken?:
openssl engine -t -c -v qatengine
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: openssl engine
(devcrypto) /dev/crypto engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: openssl engine -t -c -v qatengine
14607459921920:error:25066067:DSO support routines:dlfcn_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/dso/dso_dlfcn.c:118:filename(/usr/lib/engines/qatengine.so): Cannot open "/usr/lib/engines/qatengine.so"
14607459921920:error:25070067:DSO support routines:DSO_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/dso/dso_lib.c:162:
14607459921920:error:260B6084:engine routines:dynamic_load:dso not found:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/engine/eng_dyn.c:434:
14607459921920:error:2606A074:engine routines:ENGINE_by_id:no such engine:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05_1-main/sources/FreeBSD-src-plus-RELENG_23_05_1/crypto/openssl/crypto/engine/eng_list.c:421:id=qatengine
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
-
@jimp said in 23.09d - Is QAT Broken?:
Another difference in your case is that your WAN is PPPoE which may be using encryption of its own. It's possible that is the difference. There may be some difference in how PPPoE is doing encryption which is now not going through QAT when it was before.
Anything you're trying is also going over your WAN which would be causing PPPoE traffic.
So the next thing I would look at is the PPP connection logs from both, to see if there is anything different there.
Nothing of note in the PPPoE logs, just the regular handshakes.
The QAT interrupt rates do not vary with the traffic the WAN (PPPoE) is handling; there is no observable change between near-idle and almost saturated.
-
And you didn't do anything different with the installation, like maybe turn on filesystem encryption in the installer?
Any differences in sysctl tunables? loader.conf.local contents? Any packages that might be doing something with encryption? Any difference in kernel modules (not just qat modules)?
There has to be something more fundamentally different at play here. I can't replicate your experience here at all on 23.05.1 with a 4100.
For me, both before and after, it gets used with kernel encryption for IPsec and OpenVPN DCO and that's it. That's all that it should work with the way it's been implemented in the current driver on FreeBSD.
I even tried forcing on userspace QAT but the driver failed saying that only works on 4xxx QAT devices, so again, that isn't something that would have possibly been functional now or before.
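A rough sketch of how the sysctl comparison asked about above could be done between two versions. It assumes each version's `sysctl -a` output has been saved to a file; the function name `sysctl_diff` and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Compare two saved `sysctl -a` dumps (old file $1, new file $2).
# Prints "+" for keys only in the new dump, "~" for keys whose value
# changed, and "-" for keys only in the old dump.
# Assumes one "name: value" pair per line; multi-line values
# (for example dev.qat.0.dev_cfg) are not handled cleanly.
sysctl_diff() {
    awk -F': ' '
        NR == FNR { old[$1] = $0; next }            # first file: index every line by key
        {
            seen[$1] = 1
            if (!($1 in old))       print "+ " $0   # key is new in the second dump
            else if (old[$1] != $0) print "~ " $0   # key exists but the line differs
        }
        END { for (k in old) if (!(k in seen)) print "- " old[k] }
    ' "$1" "$2"
}
```

For example, run `sysctl -a > /root/sysctl-2305.txt` on 23.05.1 and `sysctl -a > /root/sysctl-2309.txt` on 23.09, then `sysctl_diff /root/sysctl-2305.txt /root/sysctl-2309.txt | grep qat`. Saved `kldstat` output from each version can be compared with plain `diff` in the same way.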
-
@jimp said in 23.09d - Is QAT Broken?:
And you didn't do anything different with the installation, like maybe turn on filesystem encryption in the installer?
Any differences in sysctl tunables? loader.conf.local contents? Any packages that might be doing something with encryption? Any difference in kernel modules (not just qat modules)?
irq175: qat0:b1 3344 0
irq176: qat0:b2 3684 0
No changes to the kernel and nothing via loader.conf.local. I do have 1 small change via sysctl tunables, as you would expect:
net.isr.dispatch=deferred
Along with PPPoE I also use IPv6 and use FQ_CoDel. Installed packages and services actually active are:
I don't think these are significant but mentioned for completeness.
-
@RobbieTT said in 23.09d - Is QAT Broken?:
@jaltman said in 23.09d - Is QAT Broken?:
openssl engine -t -c -v qatengine
I'm getting the same error as you on a C3xxx board running pfSense+ 23.05.1. Is it normal? Should we ignore this?
-
Not much different on today's 23.09d. I don't know if it is a significant error or not:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: openssl engine
(rdrand) Intel RDRAND engine
(dynamic) Dynamic engine loading support
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: openssl engine -t -c -v qatengine
0020E1AF5B420000:error:12800067:DSO support routines:dlfcn_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/sources/FreeBSD-src-plus-devel-main/crypto/openssl/crypto/dso/dso_dlfcn.c:118:filename(/usr/lib/engines-3/qatengine.so): Cannot open "/usr/lib/engines-3/qatengine.so"
0020E1AF5B420000:error:12800067:DSO support routines:DSO_load:could not load the shared library:/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/sources/FreeBSD-src-plus-devel-main/crypto/openssl/crypto/dso/dso_lib.c:152:
0020E1AF5B420000:error:13000084:engine routines:dynamic_load:dso not found:/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/sources/FreeBSD-src-plus-devel-main/crypto/openssl/crypto/engine/eng_dyn.c:442:
0020E1AF5B420000:error:13000074:engine routines:ENGINE_by_id:no such engine:/var/jenkins/workspace/pfSense-Plus-snapshots-master-main/sources/FreeBSD-src-plus-devel-main/crypto/openssl/crypto/engine/eng_list.c:430:id=qatengine
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
-
@RobbieTT I can understand this happening on a development snapshot, but why am I also getting it on 23.05.1, which is a production version? I know you don't have the answers, but maybe someone will help with a reply.
-
@NRgia said in 23.09d - Is QAT Broken?:
I'm getting the same error as you on a C3xxx board runnig pfSense+ 23.05.1. Is it normal, should we ignore this?
As far as I can tell the openssl qatengine is not built for pfSense and has never been. Therefore it is normal that these commands will fail.
If the openssl qatengine were present then nginx, curl, sshd, ssh and other userspace tools would be able to leverage the QAT hardware.
@jimp has said that QAT is only expected to be enabled for kernel functions. The kernel QAT support does not rely upon the openssl qatengine.
These failures can be ignored.
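A small sketch for confirming that the engine library really is absent, rather than reading it off the error text. The two directories are the ones named in the error messages earlier in this thread; the function name `find_qatengine` is hypothetical.

```shell
#!/bin/sh
# Print the path of the first qatengine.so found in the given
# directories and return 0; return 1 if none of them contains it.
find_qatengine() {
    for dir in "$@"; do
        if [ -f "$dir/qatengine.so" ]; then
            echo "$dir/qatengine.so"
            return 0
        fi
    done
    return 1
}
```

On the router this could be called as `find_qatengine /usr/lib/engines /usr/lib/engines-3 || echo "qatengine.so not installed"`. Going by the errors posted above, the expected result on both 23.05.1 and 23.09-dev is the "not installed" message.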
-
@jaltman Thanks, I feel relieved now.
-
@jaltman said in 23.09d - Is QAT Broken?:
@jimp has said that QAT is only expected to be enabled for kernel functions. The kernel QAT support does not rely upon the openssl qatengine.
We still have the open question as to why pre 23.09d the QAT interrupts increment concurrently with on-device TLS/SSL type tasks.
I would be surprised if Netgate deliberately neutered the QAT support in the latest FreeBSD offerings, given their close ties with upstream. They push QAT as a feature for pfSense+ so it would make more sense to unleash it wherever possible.
-
That kind of FUD is completely uncalled for. Netgate didn't reduce the functionality of the driver at all, we have put significant resources into its development.
-
I am encountering a similar problem with QAT not working on a fresh install, but on 23.05.1.
I started a post before I found this one here.
Similar to the OP, here is my dmesg and vmstat output:
qat0: <Intel 200xx QuickAssist> mem 0xfe600000-0xfe63ffff,0xfe640000-0xfe67ffff irq 16 at device 0.0 on pci2
qat0: qat_dev0 started 6 acceleration engines
qat0: FW version: 4.18.0
qat0: Excessive clock measure delay
qat_ocf0: <QAT engine>
[23.05.1-RELEASE][admin@pfSense.home.arpa]/root: vmstat -i | grep qat
[23.05.1-RELEASE][admin@pfSense.home.arpa]/root:
I have not configured any VPNs as I was trying to get the dashboard QAT Crypto status to change to YES as a first step.
I'm a QAT newb, so it's possible I have overlooked something simple...
-
@jimp said in 23.09d - Is QAT Broken?:
That kind of FUD is completely uncalled for. Netgate didn't reduce the functionality of the driver at all, we have put significant resources into its development.
@jimp I'm not sure if you are aiming at @jaltman (who was merely repeating your words) or myself, when I expressed surprise if Netgate's desire was to limit QAT functionality, especially as you push QAT as a feature. I was expressing doubt that Netgate would do this.
Jim, you have been PA or a bit combative on this issue for no real reason that I can see. You have spun my questions back on me by asking me what traffic, TLS/algorithms do I expect to be accelerated and even misstated my questions as statements, which you then baulk at. Meanwhile the original, simple A vs B question remains unexplained and sidestepped.
I have run all the tests you have asked for, checked all the configurations that you requested, and spent hours booting in and out of pfSense versions, which has only reinforced the original query. Simply put, what is shown as accelerated by QAT in 23.05 is not in 23.09d. This is customer feedback on a technical issue that has arisen. Can we just sidestep the emotive and go back to 'the data is the data'?
For clarity:
- I do understand that your instance of 23.05 is not triggering QAT interrupts for the traffic types I have given. We need to understand why those of us above have a different experience to yours.
- I understand that you do not think QAT should be active on pfSense with SSH, nginx, curl, TLS/SSL, openSSL etc. This would equate to a reduced feature-set from that stated in the Intel / FreeBSD QAT documentation. That it appeared to work on 23.05 is in doubt as you have opined that this may be false reporting.
- I understand that your current thinking is that the only things that would use QAT on pfSense+ are in kernel space; more specifically only IPsec and OpenVPN DCO.
- I understand that you do not expect pfSense+ to utilise QAT for any daemons or user-space. This reduced QAT functionality on pfSense+ would explain what I and others are observing on 23.09d (albeit not fully explaining the interrupts reported on 23.05).
- However, you have also stated that Netgate has not reduced the Intel QAT functionality at all. This appears to be a contradiction to the bullet above.
Perhaps we are divided by a common language and the barrier of the written word but I really am investing time and effort to understand this and have been drawn down into a depth of the system that I don't think I should be as a customer.
So far we have learned that the Intel QAT / FreeBSD implementation on the latest pfSense+ appears to be missing all functionality save for that executed through the kernel. This has (apparently) excluded all QAT user space functions including those frameworks directly enabled by Intel (eg OpenSSL, libcrypto etc) or via the QAT API (compression / decompression, SSL, TLS, nginx et al) and the QAT User Space Additional Functions.
I understand that the attempt to force a user space QAT driver on your device produced an error stating it only works on 4xxx QAT devices. The Intel QAT software release for FreeBSD (which includes QAT user space support) makes no reference to the 4xxx QAT devices; it states that:
This software release is intended for platforms that contain:
• Intel C62x Chipset
• Intel Atom C3000 processor product family
• Intel QuickAssist Adapter 8960/ Intel QuickAssist Adapter 8970 (formerly known as "Lewis Hill")
• Intel Communications Chipset 8925 to 8955 Series
• Intel Atom P5300 processor product family
Refs: Package Version: QAT.B.3.12.0-00004 - June 2022 & GitHub - Intel - Asynch Mode for NGINX
@jimp I am sure you can see why some of us are confused as to the functionality of QAT in pfSense+ given the apparent (or at least appearance of) technical contradictions. This is not an attack on Netgate devs. We either have full QAT functionality on the C3xxx platforms or we don't. If we don't then this may be due to sound technical reasons, an error or oversight, a bug or just work in progress.
Regards, Rob
-
@RobbieTT said in 23.09d - Is QAT Broken?:
I understand that you do not think QAT should be active on pfSense with SSH, nginx, curl, TLS/SSL, openSSL etc. This would equate to reduced feature-set from that stated in the Intel / freeBSD QAT documentation. That it appeared to work on 23.05 is in doubt as you have opined that this may be false reporting.
To be honest, I don't understand why QAT would be active for ssh, sshd, nginx, curl, or anything else linked against openssl's libcrypto when the openssl qatengine is not present on either 23.05.1 or 23.09-dev. There is no driver installed that exposes QAT to userspace nor is there a userspace library to call it. All of the above processes are linked to openssl's libcrypto. In 23.05.1 it's an openssl 1.1.x library and in 23.09-dev it's an openssl 3.0.x library, but in neither case would I expect QAT to be used.
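As a rough illustration of the linkage point, the libcrypto a given binary resolves to can be read from `ldd` output. This is only a sketch: the function name `libcrypto_path` is hypothetical, and it assumes the usual "libname => /path (address)" layout that `ldd` prints.

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the resolved path of libcrypto.
# Prints nothing if the binary does not link against libcrypto.
libcrypto_path() {
    awk '$1 ~ /^libcrypto\.so/ && $2 == "=>" { print $3 }'
}
```

For example, `ldd /usr/sbin/sshd | libcrypto_path` (the sshd path is an assumption about the install). This only shows which libcrypto is linked; it says nothing about whether any engine is loaded at runtime, which is what the `openssl engine` output covers.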
The QAT interrupts we are seeing must be coming from some kernel packet processing. I've tried obtaining a packet capture for the WAN and separately for the LAN while doing various things but there aren't any packets that jump out at me as something that would use QAT. I'm almost wondering if there is something from the WAN that appears to be attempting to establish a tunnel that doesn't exist and perhaps that is triggering the QAT activity with 23.05.1, whereas in 23.09-dev the trigger is correctly filtered out.
I'm not worried that QAT is not being used in 23.09-dev for userspace because I don't think it was being used for userspace in 23.05.1. However, I would like it to be used for userspace in the future. I would also appreciate it if the Netgate pfSense documentation was a bit more specific about when QAT can be used and when it cannot. The text on the System->Advanced->Miscellaneous page doesn't explicitly mention QAT:
A cryptographic accelerator module will use hardware support to speed up some cryptographic functions on systems which have the chip. Loading the BSD Crypto Device module will allow access to acceleration devices using drivers built into the kernel, such as Hifn or ubsec chipsets. If the firewall does not contain a crypto chip, this option will have no effect. To unload the selected module, set this option to "none" and then reboot.
-
@jaltman said in 23.09d - Is QAT Broken?:
...the openssl qatengine is not present on either 23.05.1 or 23.09-dev. There is no driver installed that exposes QAT to userspace nor is there a userspace library to call it.
So what do you think we are missing? The kernel files listed in the Intel documents all appear to be in place on pfSense, including the common and API modules:
/boot/kernel/qat_4xxx_fw.ko
/boot/kernel/qat_dh895xcc_fw.ko
/boot/kernel/qat_hw.ko
/boot/kernel/qat_c2xxxfw.ko
/boot/kernel/qat_c4xxx_fw.ko
/boot/kernel/qat_common.ko
/boot/kernel/qat_api.ko
/boot/kernel/qat_c3xxx_fw.ko
/boot/kernel/qat_c2xxx.ko
/boot/kernel/qat_c62x_fw.ko
/boot/kernel/qat.ko
/boot/kernel/qat_200xx_fw.ko
The QAT engine is there and nothing stands out as missing, at least to my eyes:
qat0: <Intel c3xxx QuickAssist> mem 0x81500000-0x8153ffff,0x81540000-0x8157ffff at device 0.0 on pci1
qat0: qat_dev0 started 6 acceleration engines
qat0: FW version: 4.18.0
qat0: Excessive clock measure delay
qat_ocf0: <QAT engine>
irq175: qat0:b1:353 @cpu0(domain0): 790224
irq176: qat0:b2:355 @cpu0(domain0): 659108
dev.qat_ocf.0.%parent: nexus0
dev.qat_ocf.0.%pnpinfo:
dev.qat_ocf.0.%location:
dev.qat_ocf.0.%driver: qat_ocf
dev.qat_ocf.0.%desc: QAT engine
dev.qat_ocf.%parent:
dev.qat.0.frequency: 685000000
dev.qat.0.cnv_error:
dev.qat.0.fw_counters:
dev.qat.0.mmp_version: 6.0.0
dev.qat.0.hw_version: 17
dev.qat.0.fw_version: 4.18.0
dev.qat.0.heartbeat: 1
dev.qat.0.heartbeat_failed: 0
dev.qat.0.heartbeat_sent: 7
dev.qat.0.dev_cfg: [GENERAL]
dev.qat.0.%parent: pci1
dev.qat.0.%pnpinfo: vendor=0x8086 device=0x19e2 subvendor=0x8086 subdevice=0x19e2 class=0x0b4000
dev.qat.0.%location: slot=0 function=0 dbsf=pci0:1:0:0 handle=\_SB_.PCI0.VRP2.PXSX
dev.qat.0.%driver: qat
dev.qat.0.%desc: Intel c3xxx QuickAssist
dev.qat.%parent:
Is the openssl qatengine supposed to be located somewhere?