pfSense Active CARP Member Crashed: aesni_process -> crypto_dispatch ...
-
We are seeing exactly the same problem: panics and reboots every 5-7 days.
Setup: 2.4.4-p3, CARP, VLANs, site-to-site IPsec (AES-NI in use). The hardware was fully replaced to rule out hardware-related problems.
May 16
db:0:kdb.enter.default> show pcpu
cpuid = 3
dynamic pcpu = 0xfffffe044c606380
curthread = 0xfffff8000c429000: pid 12 "irq351: ixl3:q3"
curpcb = 0xfffffe0451afa400
fpcurthread = none
idlethread = 0xfffff8000835b620: tid 100006 "idle: cpu3"
curpmap = 0xffffffff82b85998
tssp = 0xffffffff82bb6948
commontssp = 0xffffffff82bb6948
rsp0 = 0xfffffe0451afa400
gs32p = 0xffffffff82bbd1a0
ldt = 0xffffffff82bbd1e0
tss = 0xffffffff82bbd1d0
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100284 td 0xfffff8000c429000
pf_test() at pf_test+0x1d24/frame 0xfffffe0451af9880
pf_check_out() at pf_check_out+0x1d/frame 0xfffffe0451af98a0
pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe0451af9930
ip_output() at ip_output+0xb1d/frame 0xfffffe0451af9a60
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe0451af9ab0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451af9b10
aesni_process() at aesni_process+0x151/frame 0xfffffe0451af9bc0
crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe0451af9c00
esp_output() at esp_output+0x5cc/frame 0xfffffe0451af9ca0
ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe0451af9d40
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe0451af9d70
ip_forward() at ip_forward+0x221/frame 0xfffffe0451af9e10
ip_input() at ip_input+0x72a/frame 0xfffffe0451af9e70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af9ec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451af9ef0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451af9f50
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af9fa0
ether_input() at ether_input+0x26/frame 0xfffffe0451af9fc0
vlan_input() at vlan_input+0x215/frame 0xfffffe0451afa070
ether_demux() at ether_demux+0x15c/frame 0xfffffe0451afa0a0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451afa100
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451afa150
ether_input() at ether_input+0x26/frame 0xfffffe0451afa170
ixl_rxeof() at ixl_rxeof+0x47b/frame 0xfffffe0451afa210
ixl_msix_que() at ixl_msix_que+0x42/frame 0xfffffe0451afa260
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0451afa2a0
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0451afa2f0
fork_exit() at fork_exit+0x83/frame 0xfffffe0451afa330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0451afa330
May 21
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 4
dynamic pcpu = 0xfffffe044c610380
curthread = 0xfffff8000ccf6620: pid 12 "irq352: ixl3:q4"
curpcb = 0xfffffe0451aff400
fpcurthread = none
idlethread = 0xfffff8000835b000: tid 100007 "idle: cpu4"
curpmap = 0xffffffff82b85998
tssp = 0xffffffff82bb69b0
commontssp = 0xffffffff82bb69b0
rsp0 = 0xfffffe0451aff400
gs32p = 0xffffffff82bbd208
ldt = 0xffffffff82bbd248
tss = 0xffffffff82bbd238
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100285 td 0xfffff8000ccf6620
ip_input() at ip_input+0x60e/frame 0xfffffe0451afee70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451afeec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451afeef0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451afef50
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451afefa0
ether_input() at ether_input+0x26/frame 0xfffffe0451afefc0
vlan_input() at vlan_input+0x215/frame 0xfffffe0451aff070
ether_demux() at ether_demux+0x15c/frame 0xfffffe0451aff0a0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451aff100
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451aff150
ether_input() at ether_input+0x26/frame 0xfffffe0451aff170
ixl_rxeof() at ixl_rxeof+0x47b/frame 0xfffffe0451aff210
ixl_msix_que() at ixl_msix_que+0x42/frame 0xfffffe0451aff260
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0451aff2a0
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0451aff2f0
fork_exit() at fork_exit+0x83/frame 0xfffffe0451aff330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0451aff330
May 24
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x898380
curthread = 0xfffff8000c88f620: pid 12 "irq339: ixl2:q0"
curpcb = 0xfffffe0451a16400
fpcurthread = none
idlethread = 0xfffff8000834c000: tid 100003 "idle: cpu0"
curpmap = 0xffffffff82b85998
tssp = 0xffffffff82bb6810
commontssp = 0xffffffff82bb6810
rsp0 = 0xfffffe0451a16400
gs32p = 0xffffffff82bbd068
ldt = 0xffffffff82bbd0a8
tss = 0xffffffff82bbd098
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100263 td 0xfffff8000c88f620
ip_output() at ip_output+0x1418/frame 0xfffffe0451a15a60
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe0451a15ab0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451a15b10
aesni_process() at aesni_process+0x151/frame 0xfffffe0451a15bc0
crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe0451a15c00
esp_output() at esp_output+0x5cc/frame 0xfffffe0451a15ca0
ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe0451a15d40
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe0451a15d70
ip_forward() at ip_forward+0x221/frame 0xfffffe0451a15e10
ip_input() at ip_input+0x72a/frame 0xfffffe0451a15e70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451a15ec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451a15ef0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451a15f50
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451a15fa0
ether_input() at ether_input+0x26/frame 0xfffffe0451a15fc0
vlan_input() at vlan_input+0x215/frame 0xfffffe0451a16070
ether_demux() at ether_demux+0x15c/frame 0xfffffe0451a160a0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451a16100
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451a16150
ether_input() at ether_input+0x26/frame 0xfffffe0451a16170
ixl_rxeof() at ixl_rxeof+0x47b/frame 0xfffffe0451a16210
ixl_msix_que() at ixl_msix_que+0x42/frame 0xfffffe0451a16260
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0451a162a0
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0451a162f0
fork_exit() at fork_exit+0x83/frame 0xfffffe0451a16330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0451a16330
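To compare dumps like these, a small script can pull the function names out of each bt listing and show which frames two crashes share. This is only a rough sketch in Python, not a tool that ships with pfSense or FreeBSD. The helper names `frames` and `common_frames` are made up for illustration, and the two sample strings are abbreviated from the May 16 and May 24 traces above.

```python
import re

# Matches one ddb frame such as
#   "aesni_process() at aesni_process+0x151/frame 0xfffffe0451af9bc0".
# The "+0x..." offset is optional because some frames are printed without one
# (for example "trap_pfault() at trap_pfault/frame ...").
FRAME_RE = re.compile(r"(\w+)\(\) at\s+\1(?:\+0x[0-9a-f]+)?/frame 0x[0-9a-f]+")


def frames(bt_text):
    """Return the function names found in a ddb 'bt' dump, innermost first."""
    return [m.group(1) for m in FRAME_RE.finditer(bt_text)]


def common_frames(traces):
    """Return names present in every trace, ordered as in the first trace."""
    if not traces:
        return []
    shared = set.intersection(*(set(frames(t)) for t in traces))
    ordered = []
    for name in frames(traces[0]):
        if name in shared and name not in ordered:
            ordered.append(name)
    return ordered


# Abbreviated samples (only a few frames each) taken from the dumps above.
may16 = (
    "pf_test() at pf_test+0x1d24/frame 0xfffffe0451af9880 "
    "ip_output() at ip_output+0xb1d/frame 0xfffffe0451af9a60 "
    "esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451af9b10 "
    "aesni_process() at aesni_process+0x151/frame 0xfffffe0451af9bc0 "
    "crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe0451af9c00"
)
may24 = (
    "ip_output() at ip_output+0x1418/frame 0xfffffe0451a15a60 "
    "esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451a15b10 "
    "aesni_process() at aesni_process+0x151/frame 0xfffffe0451a15bc0 "
    "crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe0451a15c00"
)

print(common_frames([may16, may24]))
# ['ip_output', 'esp_output_cb', 'aesni_process', 'crypto_dispatch']
```

The pattern tolerates whitespace or a line break between "at" and the symbol, so it should work on the output whether it is pasted as one long line or one frame per line.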
-
Hello mazafak,
What hardware are you using?
When you replaced the hardware, was it a like-for-like replacement or did you change the specification in any way?
Thanks
M
-
Probably also worth mentioning, we are using two Intel I350 Quad Port 1GbE network cards in each of our pfSense boxes.
-
First it was a Dell R430 with an Intel X520 10GbE card (ix).
Now it is a Supermicro with an Intel X710 10GbE card (ixl).
The configuration was moved from the first host to the next. The crashes and reboots stayed.
Here is a new one from 5/28:
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 2
dynamic pcpu = 0xfffffe044c5fc380
curthread = 0xfffff8000c67b000: pid 12 "irq350: ixl3:q2"
curpcb = 0xfffffe0451af5400
fpcurthread = none
idlethread = 0xfffff8000834b000: tid 100005 "idle: cpu2"
curpmap = 0xffffffff82b85998
tssp = 0xffffffff82bb68e0
commontssp = 0xffffffff82bb68e0
rsp0 = 0xfffffe0451af5400
gs32p = 0xffffffff82bbd138
ldt = 0xffffffff82bbd178
tss = 0xffffffff82bbd168
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100283 td 0xfffff8000c67b000
ip_output() at ip_output+0x1418/frame 0xfffffe0451af4a60
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe0451af4ab0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451af4b10
aesni_process() at aesni_process+0x151/frame 0xfffffe0451af4bc0
crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe0451af4c00
esp_output() at esp_output+0x5cc/frame 0xfffffe0451af4ca0
ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe0451af4d40
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe0451af4d70
ip_forward() at ip_forward+0x221/frame 0xfffffe0451af4e10
ip_input() at ip_input+0x72a/frame 0xfffffe0451af4e70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af4ec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451af4ef0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451af4f50
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af4fa0
ether_input() at ether_input+0x26/frame 0xfffffe0451af4fc0
vlan_input() at vlan_input+0x215/frame 0xfffffe0451af5070
ether_demux() at ether_demux+0x15c/frame 0xfffffe0451af50a0
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe0451af5100
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451af5150
ether_input() at ether_input+0x26/frame 0xfffffe0451af5170
ixl_rxeof() at ixl_rxeof+0x47b/frame 0xfffffe0451af5210
ixl_msix_que() at ixl_msix_que+0x42/frame 0xfffffe0451af5260
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0451af52a0
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0451af52f0
fork_exit() at fork_exit+0x83/frame 0xfffffe0451af5330
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0451af5330
-
Hi Mazafak,
Any chance that you could let us know what model CPUs/chipsets you have reproduced the issue with?
Also the ciphers/hash algorithms used?
Thank you.
I've made a mistake in the initial post. We are using AES256-GCM, not AES256-CBC.
-
I took the IPsec settings from:
https://docs.netgate.com/pfsense/en/latest/vpn/scaling.html (scroll down to "Optimal Encryption Settings")
The Supermicro motherboard is an X11SDV-8C-TP8F; that's the one having crashes similar to the Dell R430.
What is interesting is that on the other side of the IPsec tunnel we have another pfSense running on a Super Micro XG-1537. It doesn't have a VLAN/LACP setup and it has been rock solid for over a year.
So it could be the VLAN code that's crashing for me. I will attempt upgrading to 2.4.5-p1 once it is available to see if it is any better.
-
Thank you Mazafak.
We had another crash last night:
Tracing pid 12 tid 100065 td 0xfffff80004359000
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe003e1044a0
vpanic() at vpanic+0x19b/frame 0xfffffe003e104500
panic() at panic+0x43/frame 0xfffffe003e104560
trap_pfault() at trap_pfault/frame 0xfffffe003e1045b0
trap_pfault() at trap_pfault+0x49/frame 0xfffffe003e104610
trap() at trap+0x29d/frame 0xfffffe003e104720
calltrap() at calltrap+0x8/frame 0xfffffe003e104720
--- trap 0xc, rip = 0xffffffff80e8127a, rsp = 0xfffffe003e1047f0, rbp = 0xfffffe003e104870 ---
ip_input() at ip_input+0x5da/frame 0xfffffe003e104870
swi_net() at swi_net+0x143/frame 0xfffffe003e1048e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe003e104920
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe003e104970
fork_exit() at fork_exit+0x83/frame 0xfffffe003e1049b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe003e1049b0
-
We had very similar crashes on different pfSense 2.4.4 machines, along with
kernel: [zone: pf frag entries] PF frag entries limit reached
messages in the logs.
The crashes were gone after
Enable MSS clamping on VPN traffic
was enabled in the IPsec advanced settings. Maybe that's your case too?
-
@astabing said in pfSense Active CARP Member Crashed: aesni_process -> crypto_dispatch ...:
We had very similar crashes on different pfSense 2.4.4 machines, along with
kernel: [zone: pf frag entries] PF frag entries limit reached
messages in the logs.
The crashes were gone after
Enable MSS clamping on VPN traffic
was enabled in the IPsec advanced settings. Maybe that's your case too?
We've never seen
kernel: [zone: pf frag entries] PF frag entries limit reached
and we enabled MSS clamping last week to resolve an IPsec throughput issue. We had another crash within 48 hours of making that change, so it didn't resolve the crashes.
We've raised a FreeBSD Bugzilla report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246951.
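For anyone wondering why MSS clamping affects throughput and fragmentation on the tunnel, here is a back-of-the-envelope sketch in Python. It assumes IPv4 ESP in tunnel mode with AES-GCM (8-byte IV, 16-byte ICV, 4-byte alignment), no NAT-T, a 1500-byte MTU, and 40 bytes of inner IPv4 and TCP headers without options. Apart from the AES-GCM choice, none of these numbers come from this thread, so treat them as assumptions and adjust them for your own tunnel. The function names are made up for illustration.

```python
def esp_tunnel_packet_size(inner_len, iv_len=8, icv_len=16, align=4, outer_ip=20):
    """Approximate on-wire size of an IPv4 ESP tunnel-mode packet.

    inner_len is the size of the original (inner) IP packet in bytes.
    The defaults are assumptions for AES-GCM without NAT-T.
    """
    esp_header = 8   # SPI (4 bytes) + sequence number (4 bytes)
    trailer = 2      # pad length (1 byte) + next header (1 byte)
    # Inner packet plus trailer is padded up to the alignment boundary.
    padded = -(-(inner_len + trailer) // align) * align
    return outer_ip + esp_header + iv_len + padded + icv_len


def max_inner_mss(link_mtu=1500, tcp_ip_headers=40):
    """Largest TCP MSS whose encrypted packet still fits in link_mtu."""
    inner = link_mtu
    while esp_tunnel_packet_size(inner) > link_mtu:
        inner -= 1
    return inner - tcp_ip_headers


print(max_inner_mss())  # 1406 under the assumptions above
```

Anything larger than that has to be fragmented somewhere along the path, which is where MSS clamping helps. NAT-T or IPv6 would add more overhead and lower the result.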
Thanks
-
Looks like this may be fixed:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246951#c16
-
Opened https://redmine.pfsense.org/issues/10745
so that we know when this gets applied to pfSense and when we can go back to IPsec.