2.4.5_p1 Upgrade - Kernel panic on boot when 2nd WAN plugged in
-
Hi,
Long-time lurker, first-time poster. I recently upgraded to 2.4.5_p1 and imported my config.xml from my 2.4.5 installation.
On boot, I get a kernel panic, but I can work around it by unplugging the 2nd WAN interface and plugging it back in after the machine has booted. The closing lines of the crash dump output read as below.
Any ideas where to start diagnosing this one?
```
<118>Configuring WAN interface...
<5>igb3: link state changed to UP
<5>igb3.666: link state changed to UP
<5>igb0: link state changed to UP
<118>done.
<118>Configuring LAN interface...done.
<118>Configuring DMZ interface...done.
<118>Configuring IOT interface...done.
<118>Configuring WAN_PN interface...
<6>ng0: changing name to 'pppoe0'
<5>igb1: link state changed to UP
<6>pflog0: promiscuous mode enabled
<118>done.
<118>Configuring CARP settings...done.
<118>Syncing OpenVPN settings...done.
<118>route: writing to routing socket: Invalid argument
<118>route: writing to routing socket: Invalid argument
<118>Configuring firewall.....
<5>igb2: link state changed to UP
<118>.done.
<118>Starting PFLOG...done.
<118>Setting up gateway monitors...done.
<118>Setting up static routes...

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80f9ea8e
stack pointer = 0x28:0xfffffe0111d857d0
frame pointer = 0x28:0xfffffe0111d857e0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi1: pfsync)
trap number = 12
panic: page fault
cpuid = 2
KDB: enter: panic
panic.txt0600001213726374403 7142 ustarrootwheelpage fault
version.txt06000033013726374403 7620 ustarrootwheelFreeBSD 11.3-STABLE #243 abf8cba50ce(RELENG_2_4_5): Tue Jun 2 17:53:37 EDT 2020 root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-245/obj/amd64/YNx4Qq3j/build/ce-crossbuild-245/sources/FreeBSD-src/sys/pfSense
```
-
Can we see the backtrace from the crash report? Assuming you see one.
Everything between `> bt` and `> ps`.
Is the backtrace the same every time it crashes?
Which interface is your second WAN? Is it a different NIC type?
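If it helps, the section between those two prompts can be pulled out of a saved copy of the crash report with something like the sketch below. The function name `extract_bt` is made up for illustration, and it assumes the `bt` and `ps` commands each sit at the end of their own prompt line in the saved text; adjust the patterns if your capture looks different.

```shell
# Print the backtrace section of a saved DDB capture: everything from the
# line ending in "> bt" up to, but not including, the line ending in "> ps".
# extract_bt is a hypothetical helper name; $1 is the path to the saved text.
extract_bt() {
    awk '
        /> *ps *$/ { printing = 0 }   # the ps command ends the section
        /> *bt *$/ { printing = 1 }   # the bt command starts the section
        printing                      # print lines while inside the section
    ' "$1"
}
```

For example, `extract_bt crash-report.txt`, where `crash-report.txt` is whatever file you saved the report to.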
Steve
-
Hey Steve,
Hopefully this is what you're after. I've got the textdump.tar.0 as well if needed.
> bt
Tracing pid 12 tid 100139 td 0xfffff80006093620
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe0111d85480
vpanic() at vpanic+0x19b/frame 0xfffffe0111d854e0
panic() at panic+0x43/frame 0xfffffe0111d85540
trap_pfault() at trap_pfault/frame 0xfffffe0111d85590
trap_pfault() at trap_pfault+0x49/frame 0xfffffe0111d855f0
trap() at trap+0x29d/frame 0xfffffe0111d85700
calltrap() at calltrap+0x8/frame 0xfffffe0111d85700
--- trap 0xc, rip = 0xffffffff80f9ea8e, rsp = 0xfffffe0111d857d0, rbp = 0xfffffe0111d857e0 ---
pfsync_state_export() at pfsync_state_export+0x1e/frame 0xfffffe0111d857e0
pfsync_sendout() at pfsync_sendout+0x1cf/frame 0xfffffe0111d85890
pfsyncintr() at pfsyncintr+0xc6/frame 0xfffffe0111d858e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0111d85920
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0111d85970
fork_exit() at fork_exit+0x83/frame 0xfffffe0111d859b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0111d859b0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
-
Yes, same each time it crashes. That said, I think it recovered by itself once, but every reboot since has ended in the same crash.
It's perfectly stable once it's up, so I'm 99.9% sure it's something in the config or from the import rather than a hardware or NIC issue.
It's a 4-port Intel NIC, the same one that's been in there running 2.4.5 rock-solid for the last 120 days.
-
Hmm, not a crash I'm familiar with.
It's in pfsync, I assume this is an HA pair? Is the second WAN connected to both?
Steve
-
Hey Steve,
No nothing fancy. Single host, dual WAN with 2 LANs. 2nd LAN has two VLANs.
I was going to tear down the 2nd WAN connection/interface assignment tonight and see how it behaves, then rebuild it and see if that helps. I suspect it will still crater, but something to try.
-
Hmm, do you have any HA settings enabled there? State sync?
You will have a pfsync interface but I would not expect it to be doing anything on a single firewall.
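One rough way to check for leftover sync settings from the import is to search the config file itself for anything pfsync-related. This is only a sketch: `show_pfsync_settings` is a made-up helper name, and it simply assumes any state sync settings would show up in config.xml with "pfsync" somewhere in the tag name or value.

```shell
# List any lines in a config file that mention pfsync (case-insensitive).
# show_pfsync_settings is a hypothetical helper name; $1 is the config path.
# ASSUMPTION: state sync settings, if present, contain the string "pfsync".
show_pfsync_settings() {
    if ! grep -i 'pfsync' "$1"; then
        echo "no pfsync settings found in $1"
    fi
}
```

For example, `show_pfsync_settings /conf/config.xml`, or point it at the exported config.xml you imported from. No output other than the "no pfsync settings found" line would suggest nothing sync-related came across in the import.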
Steve
-
Hi @stephenw10
Not that I've configured, nothing under CARP - anywhere else I can check?
-
Does your sync interface have any config on it?
[2.4.5-RELEASE][admin@t70.stevew.lan]/root: ifconfig pfsync0
pfsync0: flags=0<> metric 0 mtu 1500
groups: pfsync
Steve
-
[2.4.5-RELEASE][admin@incognito.local]/root: ifconfig pfsync0
pfsync0: flags=0<> metric 0 mtu 1500
groups: pfsync
That's what I have, sir.
-
Hmm, odd. And you get the same crash every time you boot with WAN2 connected?
What if you boot with the NIC port linked up but not actually connected to the WAN2 modem? That might determine whether it's a hardware/driver issue or a network stack problem. I could see pfsync being either.
Steve