2.4.5_p1 Upgrade - Kernel panic on boot when 2nd WAN plugged in



  • Hi,

    Long time lurker, first time poster. I recently upgraded to 2.4.5_p1 and imported my config.xml from my 2.4.5 installation.

    On boot, I get a kernel panic but I can mitigate this by unplugging the 2nd WAN interface and plugging it in after the machine has booted. The closing lines of the crash dump output read as below.

    Any ideas where to start diagnosing this one?

    <118>Configuring WAN interface...
    <5>igb3: link state changed to UP
    <5>igb3.666: link state changed to UP
    <5>igb0: link state changed to UP
    <118>done.
    <118>Configuring LAN interface...done.
    <118>Configuring DMZ interface...done.
    <118>Configuring IOT interface...done.
    <118>Configuring WAN_PN interface...
    <6>ng0: changing name to 'pppoe0'
    <5>igb1: link state changed to UP
    <6>pflog0: promiscuous mode enabled
    <118>done.
    <118>Configuring CARP settings...done.
    <118>Syncing OpenVPN settings...done.
    <118>route: writing to routing socket: Invalid argument
    <118>route: writing to routing socket: Invalid argument
    <118>Configuring firewall.....
    <5>igb2: link state changed to UP
    <118>.done.
    <118>Starting PFLOG...done.
    <118>Setting up gateway monitors...done.
    <118>Setting up static routes...
    
    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 2; apic id = 04
    fault virtual address	= 0x0
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff80f9ea8e
    stack pointer	        = 0x28:0xfffffe0111d857d0
    frame pointer	        = 0x28:0xfffffe0111d857e0
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 12 (swi1: pfsync)
    trap number		= 12
    panic: page fault
    cpuid = 2
    KDB: enter: panic
    panic.txt: page fault
    version.txt: FreeBSD 11.3-STABLE #243 abf8cba50ce(RELENG_2_4_5): Tue Jun  2 17:53:37 EDT 2020
        root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-245/obj/amd64/YNx4Qq3j/build/ce-crossbuild-245/sources/FreeBSD-src/sys/pfSense

  • Netgate Administrator

    Can we see the backtrace from the crash report? Assuming you see one.
    Everything between > bt and > ps.

    Is the backtrace the same every time it crashes?
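For anyone who wants to script those two checks, here is a minimal Python sketch. It assumes the crash report is available as plain text and that the debugger prompts end in `> bt` and `> ps`. The function names `extract_backtrace` and `frame_signature` are made up for this illustration and are not part of pfSense.

```python
import re

def extract_backtrace(crash_text):
    """Return the lines between the '> bt' and '> ps' prompts, exclusive."""
    lines = crash_text.splitlines()
    start = end = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if start is None and re.search(r">\s*bt$", stripped):
            start = i + 1          # first line after the bt prompt
        elif start is not None and re.search(r">\s*ps$", stripped):
            end = i                # stop just before the ps prompt
            break
    if start is None:
        return []                  # no backtrace in this report
    return lines[start:end]

def frame_signature(bt_lines):
    """Reduce a backtrace to its function+offset sequence.

    Frame addresses, pids and tids are dropped because they can differ
    between boots even when the crash is the same.
    """
    sig = []
    for line in bt_lines:
        m = re.match(r"\s*(\w+)\(\) at (\w+(?:\+0x[0-9a-f]+)?)/frame", line)
        if m:
            sig.append(m.group(2))
    return sig
```

Two crash reports with equal `frame_signature(extract_backtrace(...))` results most likely hit the same code path.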

    Which interface is your second WAN? Is it a different NIC type?

    Steve



  • Hey Steve,

    Hopefully this is what you're after. I've got the textdump.tar.0 as well if needed.

    >  bt
    Tracing pid 12 tid 100139 td 0xfffff80006093620
    kdb_enter() at kdb_enter+0x3b/frame 0xfffffe0111d85480
    vpanic() at vpanic+0x19b/frame 0xfffffe0111d854e0
    panic() at panic+0x43/frame 0xfffffe0111d85540
    trap_pfault() at trap_pfault/frame 0xfffffe0111d85590
    trap_pfault() at trap_pfault+0x49/frame 0xfffffe0111d855f0
    trap() at trap+0x29d/frame 0xfffffe0111d85700
    calltrap() at calltrap+0x8/frame 0xfffffe0111d85700
    --- trap 0xc, rip = 0xffffffff80f9ea8e, rsp = 0xfffffe0111d857d0, rbp = 0xfffffe0111d857e0 ---
    pfsync_state_export() at pfsync_state_export+0x1e/frame 0xfffffe0111d857e0
    pfsync_sendout() at pfsync_sendout+0x1cf/frame 0xfffffe0111d85890
    pfsyncintr() at pfsyncintr+0xc6/frame 0xfffffe0111d858e0
    intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0111d85920
    ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0111d85970
    fork_exit() at fork_exit+0x83/frame 0xfffffe0111d859b0
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0111d859b0
    --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
    


  • Yes, same each time it crashes. That said, I think it recovered by itself once, but every reboot since has ended in the same panic.

    It's perfectly stable once it's up, so I'm 99.9% sure it's something in the config or from the import rather than a hardware or NIC issue.

    It's a 4 port Intel NIC, the same one that's been in there running 2.4.5 rock solid for the last 120 days.


  • Netgate Administrator

    Hmm, not a crash I'm familiar with.

    It's in pfsync, I assume this is an HA pair? Is the second WAN connected to both?
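The pfsync reading comes from the frame printed directly below the `--- trap 0xc` marker in the backtrace: the frames above it belong to the panic handling, and the first frame after it is the code that was running when the page fault (trap 12, 0xc) hit. A small Python sketch of that lookup follows; `faulting_frame` is a hypothetical helper name used only for illustration.

```python
import re

def faulting_frame(bt_lines):
    """Return the function on the first frame after a '--- trap 0xc' marker.

    Returns None if the backtrace contains no page-fault trap marker.
    """
    seen_trap = False
    for line in bt_lines:
        if line.strip().startswith("--- trap 0xc"):
            seen_trap = True       # the next frame is where the fault occurred
            continue
        if seen_trap:
            m = re.match(r"\s*(\w+)\(\) at", line)
            if m:
                return m.group(1)
    return None
```

Run against the backtrace posted above, this picks out `pfsync_state_export`, which was called from `pfsync_sendout` and `pfsyncintr`.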

    Steve



  • Hey Steve,

    No, nothing fancy. Single host, dual WAN with 2 LANs. 2nd LAN has two VLANs.

    I was going to tear down the 2nd WAN connection/interface assignment tonight and see how it behaves, then rebuild it and see if that helps. I suspect it will still crater, but something to try.


  • Netgate Administrator

    Hmm, do you have any HA settings enabled there? State sync?

    You will have a pfsync interface but I would not expect it to be doing anything on a single firewall.
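One place to look besides the GUI is the imported config.xml itself. The Python sketch below lists any non-empty settings under a top-level `<hasync>` element. Treat the element name, and child names such as `pfsyncenabled`, as assumptions to confirm against your own file; `hasync_settings` is a hypothetical helper, not a pfSense tool.

```python
import xml.etree.ElementTree as ET

def hasync_settings(config_xml_text):
    """Return non-empty children of <hasync> as a dict (empty dict if none).

    Assumes <hasync> sits directly under the document root.
    """
    root = ET.fromstring(config_xml_text)
    section = root.find("hasync")
    if section is None:
        return {}
    return {
        child.tag: child.text.strip()
        for child in section
        if child.text and child.text.strip()   # skip empty elements
    }
```

An empty result would suggest nothing sync-related came across in the import. Any populated entry is worth comparing against the High Availability Sync page.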

    Steve



  • Hi @stephenw10

    Not that I've configured, nothing under CARP - anywhere else I can check?


  • Netgate Administrator

    Does your sync interface have any config on it?

    [2.4.5-RELEASE][admin@t70.stevew.lan]/root: ifconfig pfsync0
    pfsync0: flags=0<> metric 0 mtu 1500
    	groups: pfsync
    

    Steve



  • [2.4.5-RELEASE][admin@incognito.local]/root: ifconfig pfsync0
    pfsync0: flags=0<> metric 0 mtu 1500
            groups: pfsync
    
    

    That's what I have, sir.


  • Netgate Administrator

    Hmm, odd. And it does that with the same crash when you boot with WAN2 connected?

    What if you boot with the NIC connected but not actually connected to the WAN2 modem? That might determine if it's a hardware/driver issue or a network stack problem. I could see pfsync being either.

    Steve

