pfSense Active CARP Member Crashed: aesni_process -> crypto_dispatch ...
-
Hello,
We experienced a pfSense Firewall hard crash and reboot yesterday. Reviewing the debug information, we suspect an issue with IPSec/AES-NI. At the time, the firewall was processing ~70Mb/s of inbound IPSec traffic and 30Mb/s of outbound IPSec traffic. This is considerably below our peak and the Firewalls have been stable since we upgraded them to pfSense 2.4.4-p3 late last year.
The Firewall itself is a Dell R330 with an Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz. AES-NI CPU Crypto is active. We are using IPSec with AES256-CBC/SHA-256 for both phase 1 and phase 2. It is a site-to-site configuration.
Confirmation of pfSense version:
FreeBSD 11.2-RELEASE-p10 #9 4a2bfdce133(RELENG_2_4_4): Wed May 15 18:54:42 EDT 2019 root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-244/obj/amd64/ZfGpH5cd/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/sys/pfSense
Buffer:
Fatal trap 12: page fault while in kernel mode cpuid = 0; Fatal trap 12: page fault while in kernel mode Fatal trap 12: page fault while in kernel mode cpuid = 7; apic id = 07 cpuid = 5; apic id = 05 Fatal trap 12: page fault while in kernel mode fault virtual address = 0x1 cpuid = 1; fault code = supervisor read data, page not present apic id = 01 instruction pointer = 0x20:0xffffffff80f447a4 stack pointer = 0x28:0xfffffe0000309e90 frame pointer = 0x28:0xfffffe000030a0c0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 fault virtual address = 0x1 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80f447a4 stack pointer = 0x28:0xfffffe000031de90 frame pointer = 0x28:0xfffffe000031e0c0 fault virtual address = 0x1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq287: igb2:que 5)
Stack trace:
Tracing pid 12 tid 100126 td 0xfffff80004704620
pf_test() at pf_test+0x1d24/frame 0xfffffe000030a0c0
pf_check_out() at pf_check_out+0x1d/frame 0xfffffe000030a0e0
pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe000030a170
ip_output() at ip_output+0xb1d/frame 0xfffffe000030a2a0
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe000030a2f0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe000030a350
aesni_process() at aesni_process+0x151/frame 0xfffffe000030a400
crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe000030a440
esp_output() at esp_output+0x5cc/frame 0xfffffe000030a4e0
ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe000030a580
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe000030a5b0
ip_forward() at ip_forward+0x221/frame 0xfffffe000030a650
ip_input() at ip_input+0x72a/frame 0xfffffe000030a6b0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe000030a700
ether_demux() at ether_demux+0x173/frame 0xfffffe000030a730
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe000030a790
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe000030a7e0
ether_input() at ether_input+0x26/frame 0xfffffe000030a800
igb_rxeof() at igb_rxeof+0x6e1/frame 0xfffffe000030a890
igb_msix_que() at igb_msix_que+0x110/frame 0xfffffe000030a8e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe000030a920
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe000030a970
fork_exit() at fork_exit+0x83/frame 0xfffffe000030a9b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000030a9b0
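Backtraces like the one above are easier to compare across crashes once each frame is reduced to its function name, offset, and frame pointer. The following is only a minimal sketch of that idea: the `FRAME_RE` pattern and the `parse_backtrace` helper are hypothetical names, not part of pfSense or FreeBSD, and the sample text is a three-frame excerpt of the trace above. The pattern works whether the frames are separated by spaces or by newlines.

```python
import re

# Matches one DDB frame such as
#   "ip_output() at ip_output+0xb1d/frame 0xfffffe000030a2a0".
# The "+0x..." offset is optional because some frames
# (for example "trap_pfault() at trap_pfault/frame ...") are printed without one.
FRAME_RE = re.compile(
    r"(?P<func>\w+)\(\) at (?P=func)"
    r"(?:\+(?P<offset>0x[0-9a-f]+))?"
    r"/frame (?P<frame>0x[0-9a-f]+)"
)


def parse_backtrace(text):
    """Return (function, offset, frame_pointer) tuples, innermost frame first."""
    frames = []
    for m in FRAME_RE.finditer(text):
        offset = int(m.group("offset"), 16) if m.group("offset") else 0
        frames.append((m.group("func"), offset, int(m.group("frame"), 16)))
    return frames


# Excerpt of the first backtrace in this thread.
sample = (
    "Tracing pid 12 tid 100126 td 0xfffff80004704620 "
    "pf_test() at pf_test+0x1d24/frame 0xfffffe000030a0c0 "
    "pf_check_out() at pf_check_out+0x1d/frame 0xfffffe000030a0e0 "
    "pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe000030a170"
)

for func, offset, frame in parse_backtrace(sample):
    print(f"{func:<20} +{offset:#x}  frame={frame:#x}")
```

The first tuple returned is the frame where the fault occurred, which is usually the most useful item to quote in a bug report.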
Has this issue been seen before? The only similar issue we can find reported on the forum is this one:
https://forum.netgate.com/topic/122238/pfsense-2-4-1-ikev2-ipsec-tunnel-under-load-crashes-whole-firewall-vm/27
Thanks
M
-
That doesn't quite look like the other thread you mentioned, which I believe was also tied to https://redmine.pfsense.org/issues/8070 -- your crash has a slightly different backtrace, and those reports involved AES-GCM, not AES-256.
I'm not finding anything else that lines up exactly with what you are seeing here on current versions of FreeBSD.
Does the behavior change if you disable AES-NI/cryptodev?
Could you possibly try a 2.4.5-RC snapshot and see if it happens there?
-
Thank you Jimp.
We had a recurrence last night, 12 days after the first incident. We are going to leave it another 12 days and then upgrade to 2.4.5.
-
This has happened again, still on 2.4.4-p3:
Tracing pid 12 tid 100124 td 0xfffff80004760000
pf_test() at pf_test+0x1d24/frame 0xfffffe00003000c0
pf_check_out() at pf_check_out+0x1d/frame 0xfffffe00003000e0
pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe0000300170
ip_output() at ip_output+0xb1d/frame 0xfffffe00003002a0
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe00003002f0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0000300350
aesni_process() at aesni_process+0x151/frame 0xfffffe0000300400
crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe0000300440
esp_output() at esp_output+0x5cc/frame 0xfffffe00003004e0
ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe0000300580
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe00003005b0
ip_forward() at ip_forward+0x221/frame 0xfffffe0000300650
ip_input() at ip_input+0x72a/frame 0xfffffe00003006b0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0000300700
ether_demux() at ether_demux+0x173/frame 0xfffffe0000300730
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0000300790
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe00003007e0
ether_input() at ether_input+0x26/frame 0xfffffe0000300800
igb_rxeof() at igb_rxeof+0x6e1/frame 0xfffffe0000300890
igb_msix_que() at igb_msix_que+0x110/frame 0xfffffe00003008e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0000300920
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0000300970
fork_exit() at fork_exit+0x83/frame 0xfffffe00003009b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00003009b0
That's the third time, 19 days since the last crash.
We have started rolling 2.4.5 out across our sites, and I dare say this firewall will be upgraded to 2.4.5 before it crashes again.
We will advise whether the upgrade resolves the problem.
-
We upgraded to 2.4.5 on Monday (2020-05-04).
Tonight the primary firewall crashed again. The stack trace looks very similar from line 7 onwards.
We are going to turn off AES-NI acceleration later tonight.
Tracing pid 12 tid 100135 td 0xfffff8000470f620
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe0000336e10
vpanic() at vpanic+0x19b/frame 0xfffffe0000336e70
panic() at panic+0x43/frame 0xfffffe0000336ed0
trap_pfault() at trap_pfault/frame 0xfffffe0000336f20
trap_pfault() at trap_pfault+0x49/frame 0xfffffe0000336f80
trap() at trap+0x29d/frame 0xfffffe0000337090
calltrap() at calltrap+0x8/frame 0xfffffe0000337090
--- trap 0xc, rip = 0xffffffff80e89c3b, rsp = 0xfffffe0000337160, rbp = 0xfffffe0000337280 ---
ip_output() at ip_output+0x12fb/frame 0xfffffe0000337280
ipsec_process_done() at ipsec_process_done+0x1c7/frame 0xfffffe00003372d0
esp_output_cb() at esp_output_cb+0xea/frame 0xfffffe0000337330
aesni_process() at aesni_process+0x151/frame 0xfffffe00003373e0
crypto_dispatch() at crypto_dispatch+0x14d/frame 0xfffffe0000337410
esp_output() at esp_output+0x601/frame 0xfffffe00003374b0
ipsec4_perform_request() at ipsec4_perform_request+0x38c/frame 0xfffffe0000337550
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe0000337580
ip_forward() at ip_forward+0x230/frame 0xfffffe0000337620
ip_input() at ip_input+0x724/frame 0xfffffe00003376b0
netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe0000337700
ether_demux() at ether_demux+0x15b/frame 0xfffffe0000337730
ether_nh_input() at ether_nh_input+0x32c/frame 0xfffffe0000337790
netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe00003377e0
ether_input() at ether_input+0x26/frame 0xfffffe0000337800
igb_rxeof() at igb_rxeof+0x6d5/frame 0xfffffe0000337890
igb_msix_que() at igb_msix_que+0x101/frame 0xfffffe00003378e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0000337920
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0000337970
fork_exit() at fork_exit+0x83/frame 0xfffffe00003379b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00003379b0
FreeBSD 11.3-STABLE #236 21cbb70bbd1(RELENG_2_4_5): Tue Mar 24 15:26:53 EDT 2020 root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-245/obj/amd64/YNx4Qq3j/build/ce-crossbuild-245/sources/FreeBSD-src/sys/pfSense
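To make "looks very similar" concrete, the call paths of the 2.4.4-p3 and 2.4.5 crashes can be compared by function name alone, ignoring offsets and frame addresses (which change between kernel builds). This is a rough sketch: the two lists are copied by hand from the backtraces in this thread and cut off at ip_input(), the 2.4.5 list omits the panic/trap frames printed above the fault, and `shared_callers` is a hypothetical helper name.

```python
# Function names from two backtraces in this thread, innermost frame first,
# truncated at ip_input().
TRACE_244P3 = [
    "pf_test", "pf_check_out", "pfil_run_hooks", "ip_output",
    "ipsec_process_done", "esp_output_cb", "aesni_process",
    "crypto_dispatch", "esp_output", "ipsec4_perform_request",
    "ipsec4_forward", "ip_forward", "ip_input",
]
# kdb_enter ... calltrap frames are left out; they describe the panic itself.
TRACE_245 = [
    "ip_output", "ipsec_process_done", "esp_output_cb", "aesni_process",
    "crypto_dispatch", "esp_output", "ipsec4_perform_request",
    "ipsec4_forward", "ip_forward", "ip_input",
]


def shared_callers(a, b):
    """Longest run of identical functions, matched from the outermost caller inward.

    The result is returned innermost first, like the inputs.
    """
    shared = []
    for fa, fb in zip(reversed(a), reversed(b)):
        if fa != fb:
            break
        shared.append(fa)
    return list(reversed(shared))


common = shared_callers(TRACE_244P3, TRACE_245)
print("shared path (innermost first):", " <- ".join(common))
print("only in 2.4.4-p3 trace:", TRACE_244P3[: len(TRACE_244P3) - len(common)])
```

For these two lists the shared path starts at ip_output(); the 2.4.4-p3 trace goes three frames deeper into pf before faulting.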
-
We turned off AES-NI, but that capped IPsec VPN traffic at 8 Mb/s, which seems very drastic.
We've turned AES-NI back on.
We've got physically identical firewalls ready to go in at one of our other sites. Once we have those in situ, we will look at firmware upgrades and tweaking AES ciphers.
-
Quick update, we experienced an identical crash on 2020-05-08.
We will keep you apprised of developments as we roll out firmware upgrades, hopefully in the coming weeks.
-
Under VPN > IPsec, Advanced settings tab, do you have Asynchronous Cryptography checked? If so, try unchecking it.
-
Hi Jimp,
We don't have the asynchronous cryptography option checked.
Thank you.
-
Another crash late last night.
Different stack trace this time.
We haven't made any changes as yet, as we have been waiting for the physically identical firewalls at our other site to get some bedding-in time.
Tracing pid 12 tid 100067 td 0xfffff8000435e000
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe003e10e4a0
vpanic() at vpanic+0x19b/frame 0xfffffe003e10e500
panic() at panic+0x43/frame 0xfffffe003e10e560
trap_pfault() at trap_pfault/frame 0xfffffe003e10e5b0
trap_pfault() at trap_pfault+0x49/frame 0xfffffe003e10e610
trap() at trap+0x29d/frame 0xfffffe003e10e720
calltrap() at calltrap+0x8/frame 0xfffffe003e10e720
--- trap 0xc, rip = 0xffffffff80e8127a, rsp = 0xfffffe003e10e7f0, rbp = 0xfffffe003e10e870 ---
ip_input() at ip_input+0x5da/frame 0xfffffe003e10e870
swi_net() at swi_net+0x143/frame 0xfffffe003e10e8e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe003e10e920
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe003e10e970
fork_exit() at fork_exit+0x83/frame 0xfffffe003e10e9b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe003e10e9b0
-
I'm not sure whether this will help in shining any light on the issue, but I've collated the relevant contents of msgbuf.txt from each crash:
2020-05-26
Fatal trap 12: page fault while in kernel mode cpuid = 7; apic id = 07 fault virtual address = 0x3800 Fatal trap 12: page fault while in kernel mode fault code = supervisor write data, page not present cpuid = 3; apic id = 03 instruction pointer = 0x20:0xffffffff80e89c3b fault virtual address = 0x1 stack pointer = 0x0:0xfffffe000037d160 frame pointer = 0x0:0xfffffe000037d280 Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 Fatal trap 12: page fault while in kernel mode fault virtual address = 0x800 cpuid = 5; apic id = 05 fault virtual address = 0x1 fault code = supervisor read data, page not present fault code = supervisor write data, page not present instruction pointer = 0x20:0xffffffff80e8127a stack pointer = 0x0:0xfffffe003e10e7f0 frame pointer = 0x0:0xfffffe003e10e870 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (swi1: netisr 6) trap number = 12 panic: page fault cpuid = 5 KDB: enter: panic
2020-05-16
Fatal trap 12: page fault while in kernel mode Fatal trap 12: page fault while in kernel mode cpuid = 2; apic id = 02 fault virtual address = 0x1000 fault code = supervisor write data, page not present Fatal trap 12: page fault while in kernel mode Fatal trap 12: page fault while in kernel mode cpuid = 6; apic id = 06 fault virtual address = 0x3000 fault code = supervisor write data, page not present instruction pointer = 0x20:0xffffffff80e89c3b stack pointer = 0x28:0xfffffe0000373160 frame pointer = 0x28:0xfffffe0000373280 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq288: igb2:que 6) trap number = 12 panic: page fault cpuid = 6 KDB: enter: panic
2020-05-08
Fatal trap 12: page fault while in kernel mode cpuid = 2; apic id = 02 fault virtual address = 0x1000 fault code = supervisor write data, page not present instruction pointer = 0x20:0xffffffff80e89c3b Fatal trap 12: page fault while in kernel mode stack pointer = 0x28:0xfffffe000034b160 frame pointer = 0x28:0xfffffe000034b280 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 Fatal trap 12: page fault while in kernel mode processor eflags = interrupt enabled, cpuid = 6; apic id = 06 fault virtual address = 0x3000 fault code = supervisor write data, page not present instruction pointer = 0x20:0xffffffff80e89c3b stack pointer = 0x28:0xfffffe0000373160 frame pointer = 0x28:0xfffffe0000373280 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 12 (irq284: igb2:que 2) trap number = 12 panic: page fault cpuid = 2 KDB: enter: panic
2020-05-06
Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x0 fault code = supervisor write data, page not present instruction pointer = 0x20:0xffffffff80e89c3b stack pointer = 0x28:0xfffffe0000337160 frame pointer = 0x28:0xfffffe0000337280 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq282: igb2:que 0) trap number = 12 panic: page fault cpuid = 0 KDB: enter: panic
2020-04-14
Fatal trap 12: page fault while in kernel mode cpuid = 4; apic id = 04 fault virtual address = 0x1 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80f447a4 stack pointer = 0x28:0xfffffe00002ffe90 frame pointer = 0x28:0xfffffe00003000c0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq286: igb2:que 4)
2020-03-15
Fatal trap 12: page fault while in kernel mode cpuid = 0; Fatal trap 12: page fault while in kernel mode Fatal trap 12: page fault while in kernel mode cpuid = 7; apic id = 07 cpuid = 5; apic id = 05 Fatal trap 12: page fault while in kernel mode fault virtual address = 0x1 cpuid = 1; fault code = supervisor read data, page not present apic id = 01 instruction pointer = 0x20:0xffffffff80f447a4 stack pointer = 0x28:0xfffffe0000309e90 frame pointer = 0x28:0xfffffe000030a0c0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 fault virtual address = 0x1 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80f447a4 stack pointer = 0x28:0xfffffe000031de90 frame pointer = 0x28:0xfffffe000031e0c0 fault virtual address = 0x1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq287: igb2:que 5)
We are going to do some benchmarking with OpenVPN today and see if we can use it instead of IPsec for a couple of weeks. We're also discussing swapping the master/standby firewalls around, in order to rule in or rule out a hardware problem.
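Because several CPUs print into msgbuf at once, the fields in the dumps above are interleaved, but the individual "instruction pointer" and "fault virtual address" values can still be pulled out with a pattern match and grouped per crash date. The sketch below is an illustration only: `summarize` and the two regular expressions are hypothetical names, and the input strings are shortened excerpts of the dumps above rather than the full text.

```python
import re
from collections import defaultdict

# "instruction pointer = 0x20:0xffffffff80e89c3b" -> capture the address after the selector.
IP_RE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)")
# "fault virtual address = 0x1000" -> capture the faulting address.
FVA_RE = re.compile(r"fault virtual address\s*=\s*(0x[0-9a-f]+)")


def summarize(msgbuf_by_date):
    """Map each faulting instruction pointer to the sorted dates on which it was logged."""
    by_ip = defaultdict(set)
    for date, text in msgbuf_by_date.items():
        for ip in IP_RE.findall(text):
            by_ip[ip].add(date)
    return {ip: sorted(dates) for ip, dates in by_ip.items()}


# Shortened excerpts of the msgbuf.txt contents quoted in this post.
excerpts = {
    "2020-04-14": "fault virtual address = 0x1 fault code = supervisor read data, "
                  "page not present instruction pointer = 0x20:0xffffffff80f447a4",
    "2020-05-06": "fault virtual address = 0x0 fault code = supervisor write data, "
                  "page not present instruction pointer = 0x20:0xffffffff80e89c3b",
    "2020-05-26": "fault virtual address = 0x3800 instruction pointer = 0x20:0xffffffff80e89c3b "
                  "fault virtual address = 0x800 instruction pointer = 0x20:0xffffffff80e8127a",
}

for ip, dates in summarize(excerpts).items():
    print(ip, dates)
for date, text in excerpts.items():
    print(date, "fault addresses:", FVA_RE.findall(text))
```

Instruction pointers are only comparable between crashes on the same kernel build, so the 2.4.4-p3 and 2.4.5 dumps need to be read as separate groups.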
-
We are seeing exactly the same problem: panics/reboots every 5-7 days.
Setup: 2.4.4-p3, CARP, VLANs, site-to-site IPsec (aesni in use). The hardware was fully replaced to eliminate hardware-related problems.
May 16
db:0:kdb.enter.default> show pcpu
cpuid = 3
dynamic pcpu = 0xfffffe044c606380
curthread = 0xfffff8000c429000: pid 12 "irq351: ixl3:q3"
curpcb = 0xfffffe0451afa400
fpcurthread = none
idlethread = 0xfffff8000835b620: tid 100006 "idle: cpu3"
curpmap = 0xffffffff82b85998
tssp = 0xffffffff82bb6948
commontssp = 0xffffffff82bb6948
rsp0 = 0xfffffe0451afa400
gs32p = 0xffffffff82bbd1a0
ldt = 0xffffffff82bbd1e0
tss = 0xffffffff82bbd1d0
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100284 td 0xfffff8000c429000
pf_test() at pf_test+0x1d24/frame 0xfffffe0451af9880
pf_check_out() at pf_check_out+0x1d/frame 0xfffffe0451af98a0
pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe0451af9930
ip_output() at ip_output+0xb1d/frame 0xfffffe0451af9a60
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe0451af9ab0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451af9b10
aesni_process() at aesni_process+0x151/frame 0xfffffe0451af9bc0
crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe0451af9c00
esp_output() at esp_output+0x5cc/frame 0xfffffe0451af9ca0
ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe0451af9d40
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe0451af9d70
ip_forward() at ip_forward+0x221/frame 0xfffffe0451af9e10
ip_input() at ip_input+0x72a/frame 0xfffffe0451af9e70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af9ec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451af9ef0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451af9f50
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af9fa0
ether_input() at ether_input+0x26/frame 0xfffffe0451af9fc0
vlan_input() at vlan_input+0x215/frame 0xfffffe0451afa070
ether_demux() at ether_demux+0x15c/frame 0xfffffe0451afa0a0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451afa100
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451afa150
ether_input() at ether_input+0x26/frame 0xfffffe0451afa170
ixl_rxeof() at ixl_rxeof+0x47b/frame 0xfffffe0451afa210
ixl_msix_que() at ixl_msix_que+0x42/frame 0xfffffe0451afa260
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0451afa2a0
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0451afa2f0
fork_exit() at fork_exit+0x83/frame 0xfffffe0451afa330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0451afa330
May 21
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 4
dynamic pcpu = 0xfffffe044c610380
curthread = 0xfffff8000ccf6620: pid 12 "irq352: ixl3:q4"
curpcb = 0xfffffe0451aff400
fpcurthread = none
idlethread = 0xfffff8000835b000: tid 100007 "idle: cpu4"
curpmap = 0xffffffff82b85998
tssp = 0xffffffff82bb69b0
commontssp = 0xffffffff82bb69b0
rsp0 = 0xfffffe0451aff400
gs32p = 0xffffffff82bbd208
ldt = 0xffffffff82bbd248
tss = 0xffffffff82bbd238
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100285 td 0xfffff8000ccf6620
ip_input() at ip_input+0x60e/frame 0xfffffe0451afee70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451afeec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451afeef0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451afef50
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451afefa0
ether_input() at ether_input+0x26/frame 0xfffffe0451afefc0
vlan_input() at vlan_input+0x215/frame 0xfffffe0451aff070
ether_demux() at ether_demux+0x15c/frame 0xfffffe0451aff0a0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451aff100
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451aff150
ether_input() at ether_input+0x26/frame 0xfffffe0451aff170
ixl_rxeof() at ixl_rxeof+0x47b/frame 0xfffffe0451aff210
ixl_msix_que() at ixl_msix_que+0x42/frame 0xfffffe0451aff260
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0451aff2a0
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0451aff2f0
fork_exit() at fork_exit+0x83/frame 0xfffffe0451aff330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0451aff330
May 24
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x898380
curthread = 0xfffff8000c88f620: pid 12 "irq339: ixl2:q0"
curpcb = 0xfffffe0451a16400
fpcurthread = none
idlethread = 0xfffff8000834c000: tid 100003 "idle: cpu0"
curpmap = 0xffffffff82b85998
tssp = 0xffffffff82bb6810
commontssp = 0xffffffff82bb6810
rsp0 = 0xfffffe0451a16400
gs32p = 0xffffffff82bbd068
ldt = 0xffffffff82bbd0a8
tss = 0xffffffff82bbd098
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100263 td 0xfffff8000c88f620
ip_output() at ip_output+0x1418/frame 0xfffffe0451a15a60
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe0451a15ab0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451a15b10
aesni_process() at aesni_process+0x151/frame 0xfffffe0451a15bc0
crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe0451a15c00
esp_output() at esp_output+0x5cc/frame 0xfffffe0451a15ca0
ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe0451a15d40
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe0451a15d70
ip_forward() at ip_forward+0x221/frame 0xfffffe0451a15e10
ip_input() at ip_input+0x72a/frame 0xfffffe0451a15e70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451a15ec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451a15ef0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451a15f50
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451a15fa0
ether_input() at ether_input+0x26/frame 0xfffffe0451a15fc0
vlan_input() at vlan_input+0x215/frame 0xfffffe0451a16070
ether_demux() at ether_demux+0x15c/frame 0xfffffe0451a160a0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451a16100
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451a16150
ether_input() at ether_input+0x26/frame 0xfffffe0451a16170
ixl_rxeof() at ixl_rxeof+0x47b/frame 0xfffffe0451a16210
ixl_msix_que() at ixl_msix_que+0x42/frame 0xfffffe0451a16260
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0451a162a0
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0451a162f0
fork_exit() at fork_exit+0x83/frame 0xfffffe0451a16330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0451a16330
-
Hello mazafak,
What hardware are you using?
When you replaced the hardware, was it a like-for-like replacement or did you change the specification in any way?
Thanks
M
-
Probably also worth mentioning, we are using two Intel I350 Quad Port 1GbE network cards in each of our pfSense boxes.
-
First it was a Dell R430 with an Intel X520 10GbE card (ix).
Now it is a Supermicro with an Intel X710 10GbE card (ixl).
The configuration was moved from the first host to the next. The crashes and reboots persisted.
here is a new one from 5/28
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 2
dynamic pcpu = 0xfffffe044c5fc380
curthread = 0xfffff8000c67b000: pid 12 "irq350: ixl3:q2"
curpcb = 0xfffffe0451af5400
fpcurthread = none
idlethread = 0xfffff8000834b000: tid 100005 "idle: cpu2"
curpmap = 0xffffffff82b85998
tssp = 0xffffffff82bb68e0
commontssp = 0xffffffff82bb68e0
rsp0 = 0xfffffe0451af5400
gs32p = 0xffffffff82bbd138
ldt = 0xffffffff82bbd178
tss = 0xffffffff82bbd168
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100283 td 0xfffff8000c67b000
ip_output() at ip_output+0x1418/frame 0xfffffe0451af4a60
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe0451af4ab0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451af4b10
aesni_process() at aesni_process+0x151/frame 0xfffffe0451af4bc0
crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe0451af4c00
esp_output() at esp_output+0x5cc/frame 0xfffffe0451af4ca0
ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe0451af4d40
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe0451af4d70
ip_forward() at ip_forward+0x221/frame 0xfffffe0451af4e10
ip_input() at ip_input+0x72a/frame 0xfffffe0451af4e70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af4ec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451af4ef0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451af4f50
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af4fa0
ether_input() at ether_input+0x26/frame 0xfffffe0451af4fc0
vlan_input() at vlan_input+0x215/frame 0xfffffe0451af5070
ether_demux() at ether_demux+0x15c/frame 0xfffffe0451af50a0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451af5100
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af5150
ether_input() at ether_input+0x26/frame 0xfffffe0451af5170
ixl_rxeof() at ixl_rxeof+0x47b/frame 0xfffffe0451af5210
ixl_msix_que() at ixl_msix_que+0x42/frame 0xfffffe0451af5260
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0451af52a0
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0451af52f0
fork_exit() at fork_exit+0x83/frame 0xfffffe0451af5330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0451af5330
-
Hi Mazafak,
Any chance that you could let us know what model CPUs/chipsets you have reproduced the issue with?
Also the ciphers/hash algorithms used?
Thank you.
I've made a mistake in the initial post. We are using AES256-GCM, not AES256-CBC.
-
I took ipsec settings from:
https://docs.netgate.com/pfsense/en/latest/vpn/scaling.html (scroll down to "Optimal Encryption Settings")
the supermicro mb is X11SDV-8C-TP8F, that's the one having crashes similar to the dell R430.
what is interesting is that on the other side of the ipsec tunnel we have another pfsense running on Super Micro XG-1537. it doesn't have VLAN/LACP setup and it has been rock solid for over a year.
so, it could be the VLAN code that's crashing for me. I will attempt upgrading to 2.4.5-p1 once it is available to see if it is any better.
-
Thank you Mazafak.
We had another crash last night:
Tracing pid 12 tid 100065 td 0xfffff80004359000
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe003e1044a0
vpanic() at vpanic+0x19b/frame 0xfffffe003e104500
panic() at panic+0x43/frame 0xfffffe003e104560
trap_pfault() at trap_pfault/frame 0xfffffe003e1045b0
trap_pfault() at trap_pfault+0x49/frame 0xfffffe003e104610
trap() at trap+0x29d/frame 0xfffffe003e104720
calltrap() at calltrap+0x8/frame 0xfffffe003e104720
--- trap 0xc, rip = 0xffffffff80e8127a, rsp = 0xfffffe003e1047f0, rbp = 0xfffffe003e104870 ---
ip_input() at ip_input+0x5da/frame 0xfffffe003e104870
swi_net() at swi_net+0x143/frame 0xfffffe003e1048e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe003e104920
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe003e104970
fork_exit() at fork_exit+0x83/frame 0xfffffe003e1049b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe003e1049b0
-
We had very similar crashes on different pfSense 2.4.4 machines, along with
kernel: [zone: pf frag entries] PF frag entries limit reached
messages in the logs.
The crashes went away after
Enable MSS clamping on VPN traffic
was enabled in the IPsec advanced settings. Maybe it's your case too?
-
@astabing said in pfSense Active CARP Member Crashed: aesni_process -> crypto_dispatch ...:
We had very similar crashes on different pfSense 2.4.4 machines, along with
kernel: [zone: pf frag entries] PF frag entries limit reached
messages in the logs.
The crashes went away after
Enable MSS clamping on VPN traffic
was enabled in the IPsec advanced settings. Maybe it's your case too?
We've never seen
kernel: [zone: pf frag entries] PF frag entries limit reached
and we enabled MSS clamping last week to resolve an IPsec throughput issue. We had another crash within 48 hours of making that change, so it didn't resolve the problem.
We've raised a FreeBSD Bugzilla report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246951
Thanks