23.09d - Is QAT Broken?
-
I'm not well-placed to test this as the main work VPN has moved to a different method, so I have no active VPN to test with.
However, even without VPN use I usually see minor QAT activity, typically from traffic originating from pfSense itself (e.g. using TLS). With 23.09d I now see zero use of QAT.
For example, on 23.05 with no VPN use, vmstat would show:
[23.05-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
irq175: qat0:b1         58766       0
irq176: qat0:b2         50306       0
[23.05-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i
interrupt               total       rate
irq16: sdhci_pci0+      851         0
cpu0:timer              165496222   999
cpu1:timer              12993190    78
cpu2:timer              12832475    77
cpu3:timer              13726339    83
irq132: igc0:aq         2           0
irq143: igc3:rxq0       19867218    120
irq144: igc3:rxq1       5589075     34
irq145: igc3:rxq2       5026290     30
irq146: igc3:rxq3       3987482     24
irq147: igc3:aq         4           0
irq148: nvme0:admin     109         0
irq149: nvme0:io0       867687      5
irq150: nvme0:io1       848464      5
irq151: nvme0:io2       844418      5
irq152: nvme0:io3       844492      5
irq159: ix1:rxq0        13228573    80
irq160: ix1:rxq1        13794738    83
irq161: ix1:rxq2        10259754    62
irq162: ix1:rxq3        10325767    62
irq163: ix1:aq          15          0
irq175: qat0:b1         60980       0
irq176: qat0:b2         51956       0
Total                   290646101   1755
[23.05-RELEASE][admin@Router-8.redacted.me]/root:
Now, with 23.09d, nothing uses QAT, whatever traffic pfSense itself generates (updates, packages and anything else using TLS):
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: vmstat -i
interrupt               total       rate
irq16: sdhci_pci0+      813         0
cpu0:timer              71390683    1000
cpu1:timer              3199138     45
cpu2:timer              3128430     44
cpu3:timer              3099071     43
irq132: igc0:aq         1           0
irq143: igc3:rxq0       11236994    157
irq144: igc3:rxq1       2856745     40
irq145: igc3:rxq2       1878628     26
irq146: igc3:rxq3       2812566     39
irq147: igc3:aq         3           0
irq148: nvme0:admin     76          0
irq149: nvme0:io0       522650      7
irq150: nvme0:io1       489191      7
irq151: nvme0:io2       498077      7
irq152: nvme0:io3       501175      7
irq153: xhci0           72013       1
irq159: ix1:rxq0        5322352     75
irq160: ix1:rxq1        6048255     85
irq161: ix1:rxq2        4150278     58
irq162: ix1:rxq3        3683258     52
irq163: ix1:aq          18          0
Total                   120890415   1693
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
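To compare the two builds with one number instead of scanning the lists, the QAT rows of vmstat -i can be summed. A minimal POSIX sh sketch, assuming the column layout shown above (name, device, total, rate); the function name qat_irq_total is made up for illustration:

```shell
#!/bin/sh
# qat_irq_total (hypothetical helper): read `vmstat -i` output on stdin and
# print the sum of the "total" column for every QAT interrupt row.
# Assumes rows shaped like "irq175: qat0:b1  58766  0" (name, device, total, rate),
# as in the output above; prints 0 when no QAT rows are present.
qat_irq_total() {
    awk '$2 ~ /^qat[0-9]+:/ { sum += $3 } END { print sum + 0 }'
}
```

On the router this would be used as vmstat -i | qat_irq_total. For the first 23.05 listing above it adds up to 109072 (58766 + 50306), and on 23.09d, where no QAT rows appear, it prints 0.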
The QAT details all appear to be in place (although I don't recall it being attached to pci1 before; my memory may be at fault):
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: sysctl -a | grep 'qat'
qat0: <Intel c3xxx QuickAssist> mem 0x81500000-0x8153ffff,0x81540000-0x8157ffff at device 0.0 on pci1
qat0: qat_dev0 started 6 acceleration engines
qat0: FW version: 4.18.0
qat0: Excessive clock measure delay
qat_ocf0: <QAT engine>
irq174: qat0:b0:351 @cpu0(domain0): 0
irq175: qat0:b1:353 @cpu0(domain0): 0
irq176: qat0:b2:355 @cpu0(domain0): 0
irq177: qat0:b3:357 @cpu0(domain0): 0
irq178: qat0:b4:359 @cpu0(domain0): 0
irq179: qat0:b5:361 @cpu0(domain0): 0
irq180: qat0:b6:363 @cpu0(domain0): 0
irq181: qat0:b7:365 @cpu0(domain0): 0
irq182: qat0:b8:367 @cpu0(domain0): 0
irq183: qat0:b9:369 @cpu0(domain0): 0
irq184: qat0:b10:371 @cpu0(domain0): 0
irq185: qat0:b11:373 @cpu0(domain0): 0
irq186: qat0:b12:375 @cpu0(domain0): 0
irq187: qat0:b13:377 @cpu0(domain0): 0
irq188: qat0:b14:379 @cpu0(domain0): 0
irq189: qat0:b15:381 @cpu0(domain0): 0
irq190: qat0:ae:383 @cpu0(domain0): 0
dev.qat_ocf.0.enable: 1
dev.qat_ocf.0.%parent: nexus0
dev.qat_ocf.0.%pnpinfo:
dev.qat_ocf.0.%location:
dev.qat_ocf.0.%driver: qat_ocf
dev.qat_ocf.0.%desc: QAT engine
dev.qat_ocf.%parent:
dev.qat.0.frequency: 685000000
dev.qat.0.cnv_error:
dev.qat.0.fw_counters:
dev.qat.0.mmp_version: 6.0.0
dev.qat.0.hw_version: 17
dev.qat.0.fw_version: 4.18.0
dev.qat.0.heartbeat: 1
dev.qat.0.heartbeat_failed: 0
dev.qat.0.heartbeat_sent: 2
dev.qat.0.dev_cfg: [GENERAL]
dev.qat.0.num_user_processes: 0
dev.qat.0.cfg_mode: ks
dev.qat.0.cfg_services: sym;dc
dev.qat.0.state: up
dev.qat.0.%parent: pci1
dev.qat.0.%pnpinfo: vendor=0x8086 device=0x19e2 subvendor=0x8086 subdevice=0x19e2 class=0x0b4000
dev.qat.0.%location: slot=0 function=0 dbsf=pci0:1:0:0 handle=\_SB_.PCI0.VRP2.PXSX
dev.qat.0.%driver: qat
dev.qat.0.%desc: Intel c3xxx QuickAssist
dev.qat.%parent:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
And, reassuringly, if I try to load the kernel module manually I get the message you would expect:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: kldload qat_c3xxx
kldload: can't load qat_c3xxx: module already loaded or in kernel
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
The kldstat output looks OK to my eyes too:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: kldstat -v | grep qat
11    1 0xffffffff84437000     4378 qat.ko (/boot/kernel/qat.ko)
        699 nexus/qat
12    6 0xffffffff8443c000    14d60 qat_hw.ko (/boot/kernel/qat_hw.ko)
        697 pci/qat_c4xxx
        692 pci/qat_200xx
        696 pci/qat_dh895xcc
        693 pci/qat_4xxx
        695 pci/qat_c3xxx
        691 pci/qat_c62x
        694 pci/qat_4xxxvf
13    9 0xffffffff84451000    2ff70 qat_common.ko (/boot/kernel/qat_common.ko)
        689 qat_common
14    8 0xffffffff84481000    68cd8 qat_api.ko (/boot/kernel/qat_api.ko)
        690 qat_api
15    1 0xffffffff844ea000   122c18 qat_c3xxx_fw.ko (/boot/kernel/qat_c3xxx_fw.ko)
        698 qat_c3xxx_fw_fw
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
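The same module check can be scripted so it is easy to repeat after each snapshot update. A sketch, assuming the five QAT kernel files from the c3xxx listing above (other QAT chips load different firmware modules); qat_missing_modules is a hypothetical name:

```shell
#!/bin/sh
# qat_missing_modules (hypothetical helper): read `kldstat -v` output on stdin and
# print the name of each QAT kernel file from the listing above that does not appear.
# The file list is copied from this 6100 (c3xxx); other QAT chips load other firmware.
qat_missing_modules() {
    listing=$(cat)
    for ko in qat.ko qat_hw.ko qat_common.ko qat_api.ko qat_c3xxx_fw.ko; do
        # Match " name.ko" or "/name.ko" literally so qat.ko does not match qat_hw.ko.
        printf '%s\n' "$listing" | grep -qF -e " $ko" -e "/$ko" || echo "$ko"
    done
}
```

Run as kldstat -v | qat_missing_modules, it prints nothing when all five files are loaded, as in the listing above.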
I'm not sure what dmesg is trying to say about the clock:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: dmesg | grep qat
qat0: <Intel c3xxx QuickAssist> mem 0x81500000-0x8153ffff,0x81540000-0x8157ffff at device 0.0 on pci1
qat0: qat_dev0 started 6 acceleration engines
qat0: FW version: 4.18.0
qat0: Excessive clock measure delay
qat_ocf0: <QAT engine>
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
Heartbeat is there:
dev.qat_ocf.0.enable: 1
dev.qat_ocf.0.%parent: nexus0
dev.qat_ocf.0.%pnpinfo:
dev.qat_ocf.0.%location:
dev.qat_ocf.0.%driver: qat_ocf
dev.qat_ocf.0.%desc: QAT engine
dev.qat_ocf.%parent:
dev.qat.0.frequency: 685000000
dev.qat.0.cnv_error:
+-----------------------------------------------------------------+
| CNV Error Freq Statistics for Qat Device |
+-----------------------------------------------------------------+
|[AE 0]: TotalErrors: 0 : LastError: No Error [ 0] |
|[AE 1]: TotalErrors: 0 : LastError: No Error [ 0] |
|[AE 2]: TotalErrors: 0 : LastError: No Error [ 0] |
|[AE 3]: TotalErrors: 0 : LastError: No Error [ 0] |
|[AE 4]: TotalErrors: 0 : LastError: No Error [ 0] |
|[AE 5]: TotalErrors: 0 : LastError: No Error [ 0] |
dev.qat.0.fw_counters:
+------------------------------------------------+
| FW Statistics for Qat Device |
+------------------------------------------------+
AE 5 Firmware Responses:0 Firmware Requests:0
AE 4 Firmware Responses:0 Firmware Requests:0
AE 3 Firmware Responses:0 Firmware Requests:0
AE 2 Firmware Responses:0 Firmware Requests:0
AE 1 Firmware Responses:0 Firmware Requests:0
AE 0 Firmware Responses:0 Firmware Requests:0
dev.qat.0.mmp_version: 6.0.0
dev.qat.0.hw_version: 17
dev.qat.0.fw_version: 4.18.0
dev.qat.0.heartbeat: 1
dev.qat.0.heartbeat_failed: 0
dev.qat.0.heartbeat_sent: 1
dev.qat.0.dev_cfg: [GENERAL]
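The fw_counters table offers a second check that does not depend on interrupts. A minimal sketch that totals the per-AE request counts, assuming the "Firmware Requests:<n>" format shown in the paste above and assuming these counters advance whenever work reaches the engines; qat_fw_requests is a hypothetical name:

```shell
#!/bin/sh
# qat_fw_requests (hypothetical helper): read `sysctl -n dev.qat.0.fw_counters` output
# on stdin and print the total of all "Firmware Requests:<n>" values across the AEs.
# Assumes the counter format seen in the paste above; a total that stays at 0 while
# traffic flows would suggest nothing is being submitted to the acceleration engines.
qat_fw_requests() {
    awk '{
        line = $0
        # A line may hold several AE entries, so consume every match in turn.
        while (match(line, /Firmware Requests: *[0-9]+/)) {
            tok = substr(line, RSTART, RLENGTH)
            sub(/^Firmware Requests: */, "", tok)
            total += tok
            line = substr(line, RSTART + RLENGTH)
        }
    } END { print total + 0 }'
}
```

Running sysctl -n dev.qat.0.fw_counters | qat_fw_requests before and after generating traffic shows whether the engines received any work in between; for the paste above the total is 0.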
So QAT appears to be there, at least for the most part, but it is not being used at all.
Is this user error or is QAT broken in 23.09d?
-
It's working fine for me here on C3000:
: dmesg | grep qat
qat0: <Intel c3xxx QuickAssist> mem 0xdfd00000-0xdfd3ffff,0xdfd40000-0xdfd7ffff irq 18 at device 0.0 on pci1
qat0: qat_dev0 started 6 acceleration engines
qat0: FW version: 4.18.0
qat0: Excessive clock measure delay
qat_ocf0: <QAT engine>
: vmstat -i | grep qat
irq62: qat0:b1          40210       6
irq63: qat0:b2          11846       2
That's on the latest build from today.
There was a problem with C2000 QAT but that was fixed several days ago.
-
@jimp said in 23.09d - Is QAT Broken?:
Updated to today's snapshot but still no QAT on my 6100:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: dmesg | grep qat
qat0: <Intel c3xxx QuickAssist> mem 0x81500000-0x8153ffff,0x81540000-0x8157ffff at device 0.0 on pci1
qat0: qat_dev0 started 6 acceleration engines
qat0: FW version: 4.18.0
qat0: Excessive clock measure delay
qat_ocf0: <QAT engine>
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
A bit of a puzzle.
-
-
Do you also have IPsec-MB / IIMB loaded?
If so, it may be handling whatever encryption has been requested.
I don't see any interrupts on qat when I have both loaded here, but if I disable IPsec-MB, I do:
irq158: qat0:b2 1 0
-
@jimp said in 23.09d - Is QAT Broken?:
Do you also have IPsec-MB / IIMB loaded?
On my 4100, IPsec-MB is unchecked in the UI and "Intel QuickAssist (QAT)" is selected as the cryptographic hardware.
-
@jimp Tonight I will boot a 23.05.1 snapshot to confirm the prior behavior.
-
@jimp said in 23.09d - Is QAT Broken?:
Do you also have IPsec-MB / IIMB loaded?
I've not selected anything different, so just the default.
-
@jaltman said in 23.09d - Is QAT Broken?:
@jimp Tonight I will boot a 23.05.1 snapshot to confirm the prior behavior.
@jimp
Just rolled back to 23.05.1 and QAT works as expected:
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
irq175: qat0:b1         46          0
irq176: qat0:b2         46          0
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
irq175: qat0:b1         114         0
irq176: qat0:b2         90          0
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
Back to the latest dev load and things go south:
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
irq175: qat0:b1         176         0
irq176: qat0:b2         208         0
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root:
Netgate 6100 - Serial: 2xxxxxxxx8 - Netgate Device ID: redacted
*** Welcome to Netgate pfSense Plus 23.09-DEVELOPMENT (amd64) on Router-8 ***
Current Boot Environment: default
Next Boot Environment: quick-20230930155240
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
It's only the 23.09d load that I have QAT issues with - something has broken, at least on my 6100.
-
@jimp
Loaded 23.09.a.20231002.0600 dev this morning - still no functioning QAT on my 6100:
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:
-
If you check on the dashboard, does the list of accelerated algorithms match an algorithm in use by your VPNs?
For example on 23.09 the dashboard shows:
AES-CBC, AES-CCM, AES-GCM, AES-ICM, AES-XTS, SHA1, SHA256, SHA384, SHA512
I have two tunnels on the 4100 I am looking at. One has a P2 using AES-128, the other has a P2 using AES128-GCM. As I pass traffic over the tunnel, I see the interrupts on the QAT device increase. Note that it won't show any activity until the tunnel is connected and passing traffic.
irq157: qat0:b1         7           0
irq158: qat0:b2         6           0
It's also possible that something else that used to use QAT on 23.05.1 isn't using the same algorithm on 23.09 and now isn't being accelerated. For example if something used AES-128 before but now selected ChaCha20-Poly1305, then it wouldn't be using QAT.
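To rule out an algorithm mismatch quickly, a name can be checked against the dashboard list. A sketch, assuming the 23.09 list quoted above (your own dashboard is the authority, and other hardware or builds may differ); is_qat_accelerated and QAT_ALGS are hypothetical names:

```shell
#!/bin/sh
# Algorithms the 23.09 dashboard lists as accelerated (copied from the post above).
# The list is an assumption for other hardware or builds; check your own dashboard.
QAT_ALGS="AES-CBC AES-CCM AES-GCM AES-ICM AES-XTS SHA1 SHA256 SHA384 SHA512"

# is_qat_accelerated (hypothetical helper): succeed if $1 names an algorithm in
# QAT_ALGS, compared case-insensitively; fail otherwise.
is_qat_accelerated() {
    want=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    for alg in $QAT_ALGS; do
        [ "$alg" = "$want" ] && return 0
    done
    return 1
}
```

For example, is_qat_accelerated AES-GCM succeeds while is_qat_accelerated CHACHA20-POLY1305 fails. Names must be given in the dashboard's form, so an IPsec P2 label such as AES128-GCM has to be mapped to AES-GCM before checking.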
-
@jimp
I'm not using a VPN at the moment, so that is not a factor. As described earlier, in this state the QAT would only be used for encrypted traffic that is requested and received by the router itself. I guess that almost all (if not all) will be TLS traffic. If I actively reach out from the router CLI with a given cipher, e.g.:
openssl s_client -host sdcstest.blob.core.windows.net -port 443 -cipher ECDHE-RSA-AES256-GCM-SHA384
I can complete a correct SSL handshake with the chosen cipher, but this traffic also shows no QAT offload:
---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: ECDH, secp384r1, 384 bits
---
SSL handshake has read 5764 bytes and written 442 bytes
Verification: OK
---
New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol : TLSv1.2
    Cipher : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: 552B0000932731A77EE20EBC73D3535D7A55980427FAC5D3...
    Session-ID-ctx:
    Master-Key: 99ED51BB752865C07DC71A98871DFEE6EBA67368F54CAE50...
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1696249971
    Timeout : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: yes
---
There may be other test methods you prefer for diagnostics, so happy to try them too.
-
@RobbieTT said in 23.09d - Is QAT Broken?:
@jimp
I'm not using a VPN at the moment, so that is not a factor. As described earlier, in this state the QAT would only be used for encrypted traffic that is requested and received by the router itself. I guess that almost all (if not all) will be TLS traffic.
What traffic exactly? What daemons do you have running on the firewall that would be using TLS/algorithms you expect to be accelerated?
Just the GUI (nginx) or something else? The s_client example you showed would only be hitting the GUI (nginx).
"QAT is broken" is a completely different statement than "QAT isn't working for daemons X, Y, Z"
We have already demonstrated that the former statement is untrue, what remains is determining the latter.
-
@jimp
Am I not correct in presuming that encrypted traffic originating from the router that uses TLS will use QAT? I don't know which daemons running on the router reach out. I'm not good at guessing, but I'd expect router updates/checks, package updates, DoT (unbound), pfBlocker et al. I would be surprised if any of the services hosted on pfSense used unencrypted traffic, but again, that is a guess from a non-developer.
The use of QAT has changed from 23.05.1 (and below) to 23.09 dev. I genuinely don't know why or how routine TLS traffic or a test openssl session with Microsoft fail to show any QAT usage. I read through the Intel QAT white paper on testing and used the confirmation methods it gave.
Perhaps 'broken' is the wrong word in dev-speak, so if I need to use English in a different way, please suggest a better one. As a customer I am trying my best here.
-
What I'm saying is you need to be a lot more specific. Yes, there is a change in behavior but you haven't even clearly defined what that is.
I ran some tests locally, and it doesn't appear to be used by ssh, nginx, or OpenVPN, at least, though I seem to recall it worked with nginx in the past. I am not sure about outbound traffic. I don't have anything left on 23.05.1 with QAT to confirm all of the old behavior.
Primarily what gets accelerated is use of the accelerated algorithms in the kernel, so IPsec and OpenVPN DCO are the main consumers.
What you'd need to do is look at the interrupts on the QAT device and see whether they go up in proportion with the traffic you send to things like SSH (type some lines and check the count, or SCP a large file), the GUI (check after refreshing the page), and so on. For testing outbound, you can try pkg update -f or try to curl https://<something>.
-
Speaking with some others here the only things that would use QAT would be encryption in the kernel, so IPsec and OpenVPN DCO as I mentioned. It wouldn't get used for userspace daemons or clients.
So it would help to confirm what exactly was using QAT on 23.05.1 that you aren't seeing now.
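The check-the-count, send-traffic, check-again procedure above can be wrapped in a small helper so each daemon or client is tested the same way. A minimal sketch under stated assumptions: qat_irq_total and qat_delta are hypothetical names, the parsing assumes the vmstat -i layout shown earlier in the thread, and QAT_VMSTAT exists only so the helper can be tested without QAT hardware:

```shell
#!/bin/sh
# Hypothetical helpers for the "check the count, send traffic, check again" test.
qat_irq_total() {
    # Sum the total column of QAT rows in `vmstat -i` output read from stdin.
    awk '$2 ~ /^qat[0-9]+:/ { sum += $3 } END { print sum + 0 }'
}

# qat_delta CMD [ARGS...]: run CMD and print how many QAT interrupts occurred
# meanwhile. QAT_VMSTAT (an assumption, for testing) overrides the sampler;
# on the router it defaults to `vmstat -i`. Other QAT activity is counted too.
qat_delta() {
    before=$(${QAT_VMSTAT:-vmstat -i} | qat_irq_total)
    "$@" >/dev/null 2>&1
    after=$(${QAT_VMSTAT:-vmstat -i} | qat_irq_total)
    echo $((after - before))
}
```

For example, run qat_delta pkg update -f on the router. Per the explanation above, a result of 0 for userspace traffic like this would be consistent with QAT serving only in-kernel crypto, whereas traffic through an IPsec or OpenVPN DCO tunnel should raise the count. Any other QAT users active at the same time are included in the result.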
-
I am trying to be as specific as I can, but I just don't have your knowledge. I can only give you exactly what I have tested and paste in the exact results. Not that it matters, but I did not state 'QAT is broken'; I posted a question asking if it was, specific to a firmware load.
I have tried the testing methods listed in the Intel QAT paper, I have monitored the routine traffic originating from the router, I have run updates, a basic pkg update -f and curl to HTTPS sites, and the openssl test session with Microsoft (as per their developer guide). There are still zero QAT interrupts, whatever I try. I am sure there are different crypto uses I have not tried that may work, but I can say that all the different (and somewhat basic) things I would expect QAT to work with now produce zero interrupts.
Have you tried any of the tests I have used and, if so, is everything working ok for you?
Apologies for being a pfSense newcomer and not having enough knowledge to resolve this on my own. If you need someone who can fly a fighter-jet or run a complex flight test profile I'm your guy but I need help with pfSense!
-
@jimp said in 23.09d - Is QAT Broken?:
Speaking with some others here the only things that would use QAT would be encryption in the kernel, so IPsec and OpenVPN DCO as I mentioned. It wouldn't get used for userspace daemons or clients.
So it would help to confirm what exactly was using QAT on 23.05.1 that you aren't seeing now.
Our messages crossed. How would you like me to test for that on a 23.05.1 snapshot?
-
Test that exactly as I mentioned before -- try running traffic through each daemon individually and see if you see interrupts on the QAT device when you do.
I just set up and tested an OpenVPN DCO tunnel here, and it is also using QAT just like IPsec, both of which are in the kernel. That's what we expect to see.
So we need to figure out what you were seeing using DCO on 23.05.1 to narrow down what has changed in your environment.
-
@jimp
Thanks @jimp, happy to do so. To avoid more errata, how specifically do you want me to generate traffic through each daemon? Does this not conflict with your note that QAT is not used for userspace daemons?
Does the lack of QAT interrupts when using TLS, a curl to HTTPS, pkg update -f or an openssl test session mean nothing, and is a zero result actually the expected behaviour? I've become slightly confused about whether I have a problem or not.
I need more tea.