esp_input_cb() panic
-
Just adding to this: since moving my VPN setup from OpenVPN to WireGuard, I'm also experiencing intermittent crashes with no real indication or pattern as to the cause. The only commonality is that this also started after moving to WireGuard on 2.5.0.
There is a constant stream of UDP traffic in and out of the wg0 interface (about 100% of my total available bandwidth). wg1 is assigned to a guest VLAN and currently has no traffic going through it.
I did have Suricata monitoring the gateway interface for wg0 initially, but after the first crash I disabled it to rule that out.
Edit: I've added another dump from a crash that happened just now (textdump-2.tar).
-
I split your topic off since the crash backtraces don't point to WireGuard where you'd originally posted it.
Both of the crash dumps have the same backtrace and it looks more like IPsec to me:
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x993380
curthread = 0xfffff800042e6740: pid 5 tid 100059 "crypto returns 2"
curpcb = 0xfffff800042e6ce0
fpcurthread = none
idlethread = 0xfffff80004218000: tid 100003 "idle: cpu0"
curpmap = 0xffffffff8368d5a8
tssp = 0xffffffff83717620
commontssp = 0xffffffff83717620
rsp0 = 0xfffffe0000442cc0
kcr3 = 0x8000000003d06002
ucr3 = 0xffffffffffffffff
scr3 = 0x9138c811
gs32p = 0xffffffff8371de38
ldt = 0xffffffff8371de78
tss = 0xffffffff8371de68
tlb gen = 44131
curvnet = 0xfffff8000500ae40
db:0:kdb.enter.default> bt
Tracing pid 5 tid 100059 td 0xfffff800042e6740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0000442760
vpanic() at vpanic+0x197/frame 0xfffffe00004427b0
panic() at panic+0x43/frame 0xfffffe0000442810
trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000442870
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004428c0
trap() at trap+0x286/frame 0xfffffe00004429d0
calltrap() at calltrap+0x8/frame 0xfffffe00004429d0
--- trap 0xc, rip = 0xffffffff810872a4, rsp = 0xfffffe0000442aa0, rbp = 0xfffffe0000442b30 ---
esp_input_cb() at esp_input_cb+0xf4/frame 0xfffffe0000442b30
crypto_ret_proc() at crypto_ret_proc+0x1d9/frame 0xfffffe0000442bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0000442bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000442bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x3000
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff810872a4
stack pointer = 0x28:0xfffffe000043daa0
frame pointer = 0x28:0xfffffe000043db30
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 5 (crypto returns 2)
trap number = 12
panic: page fault
cpuid = 3
time = 1614410553
KDB: enter: panic
Can you give some more information about the way IPsec is configured and used on your firewall?
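For anyone comparing textdumps the same way, here is a minimal Python sketch of pulling the function names out of a ddb `bt` listing and picking the frame that actually faulted (the first one after the `--- trap` marker). The helper names `frames` and `faulting_frame` are made up for illustration and are not part of any pfSense or FreeBSD tool; the regular expressions only assume the frame format shown in the backtrace above.

```python
import re

# Matches ddb frame lines such as:
#   esp_input_cb() at esp_input_cb+0xf4/frame 0xfffffe0000442b30
FRAME_RE = re.compile(r"(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+")

# Matches the marker ddb prints where the trap interrupted normal execution.
TRAP_RE = re.compile(r"--- trap 0x[0-9a-f]+,[^-]*---")


def frames(ddb_text):
    """Return the function names in the order they appear in the backtrace."""
    return FRAME_RE.findall(ddb_text)


def faulting_frame(ddb_text):
    """Return the first function listed after the trap marker, or None."""
    marker = TRAP_RE.search(ddb_text)
    if marker is None:
        return None
    match = FRAME_RE.search(ddb_text, marker.end())
    return match.group(1) if match else None


# A few lines copied from the backtrace above.
sample = (
    "calltrap() at calltrap+0x8/frame 0xfffffe00004429d0\n"
    "--- trap 0xc, rip = 0xffffffff810872a4, rsp = 0xfffffe0000442aa0, "
    "rbp = 0xfffffe0000442b30 ---\n"
    "esp_input_cb() at esp_input_cb+0xf4/frame 0xfffffe0000442b30\n"
    "crypto_ret_proc() at crypto_ret_proc+0x1d9/frame 0xfffffe0000442bb0\n"
)

print(frames(sample))
print(faulting_frame(sample))
```

Running `frames()` over the `bt` output of each textdump and comparing the two lists is a quick way to confirm that two crashes share a backtrace.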
-
@jimp Sure, I have 3 site-to-site tunnels running with 2 phase-2 entries each. They've been in place for about 24 months without any issues. They're still running now and are used just to remotely administer 3 separate locations, with very minimal traffic between them.
There is also a tunnel set up for mobile clients to remotely access. This one has been in place around 4 years.
Interestingly though, since disabling WireGuard, removing its tunnel altogether, and moving back to OpenVPN, there have been no more crashes, which is a little strange if the crashes are more related to IPsec than WireGuard. Nothing else in the setup or configuration has changed.
-
None of the other 3 instances have had any unexpected crashes or failures (they're also all on 2.5.0; everything was upgraded at the same time). However, none of those have had WireGuard configured either.
-
Does your system happen to have AES-NI present and enabled?
I see you're using SHA256. There is a bug we're tracking at https://redmine.pfsense.org/issues/11524 but thus far we've not seen a panic from it.
-
@jimp Yes, AES-NI is enabled on all 4 instances, but only the one was crashing. They are also all virtualised.
-
Can you try putting WireGuard back and disabling AES-NI temporarily to see if the crashes still occur?