Netgate Discussion Forum

Crashing on startup (23.09 beta)

Plus 23.09 Development Snapshots (Retired)
  • mikebenna
    last edited by Oct 16, 2023, 7:05 PM

    Hi, I've got a pair of Plus VMs in HA mode. My slave VM was happy running the beta over the weekend. This morning I upgraded my master VM and it crashed 3 times immediately after the upgrade. It was happy for 30 minutes or so, then I did another reboot to see what would happen and boom... crashing again during startup... once, twice, three, four times. I may simply revert. Oh wait, the fifth reboot succeeded! I'll leave it here for now. :)

    Looks like a page fault right around the time of bringing up a WireGuard tunnel.

    The slave is running a similar WireGuard config but has never crashed.

    How can I be of further assistance? Please reach out if you want the dump files.

    Version info:

    amd64
    14.0-CURRENT
    FreeBSD 14.0-CURRENT amd64 1400094 #1 plus-RELENG_23_09-n256151-106588946ac: Mon Oct 16 03:08:01 UTC 2023     root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_09-main/obj/amd64/nZsGV1oA/var/jenkins/workspace/pfSense-Plus-snapshots-23_09-main/
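
    For reference, a minimal sketch of how those dump files could be pulled off the box from a shell, assuming the default FreeBSD/pfSense dump directory of /var/crash and the usual textdump naming (the index 0 below is illustrative):

    ls -l /var/crash                      # one info.N / textdump.tar.N pair per panic
    tar -tvf /var/crash/textdump.tar.0    # list the contents of a given textdump
    cp /var/crash/textdump.tar.0 /root/   # stage a copy for scp / offline review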
    
    • marcosm Netgate
      last edited by Oct 16, 2023, 8:11 PM

      If available, share the panic/backtrace.

      • mikebenna @marcosm
        last edited by Oct 16, 2023, 8:38 PM

        Hi @marcosm, I suspect the entire tar file contains more info than I wish to post in a public forum, but here's a snippet of the crash in progress:

        ...
        <118>Starting syslog...done.
        <6>carp: 4@vtnet2: INIT -> BACKUP (initialization complete)
        <6>carp: demoted by -240 to 720 (interface up)
        <6>carp: 3@vtnet1: INIT -> BACKUP (initialization complete)
        <6>carp: demoted by -240 to 480 (interface up)
        <6>carp: 1@vtnet0.32: INIT -> BACKUP (initialization complete)
        <6>carp: demoted by -240 to 240 (interface up)
        <6>carp: 2@vtnet0: INIT -> BACKUP (initialization complete)
        <6>carp: demoted by -240 to 0 (interface up)
        <118>Starting CRON... done.
        <118>>>> Upgrading necessary packages... 
        <6>carp: 2@vtnet0: BACKUP -> MASTER (preempting a slower master)
        <6>carp: 1@vtnet0.32: BACKUP -> MASTER (preempting a slower master)
        <6>carp: 3@vtnet1: BACKUP -> MASTER (preempting a slower master)
        <6>carp: 4@vtnet2: BACKUP -> MASTER (preempting a slower master)
        <118>Checking for upgrades (2 candidates): .. done
        <118>Processing candidates (2 candidates): .. done
        <118>Checking integrity... done (0 conflicting)
        <118>The following 2 package(s) will be affected (of 0 checked):
        <118>
        <118>Installed packages to be UPGRADED:
        <118>	pfSense-pkg-WireGuard: 0.2.0_2 -> 0.2.1 [pfSense]
        <118>
        <118>Installed packages to be REINSTALLED:
        <118>	pfSense-pkg-node_exporter-0.18.1_3 [pfSense] (direct dependency changed: python311)
        <118>
        <118>Number of packages to be upgraded: 1
        <118>Number of packages to be reinstalled: 1
        <118>[1/2] Upgrading pfSense-pkg-WireGuard from 0.2.0_2 to 0.2.1...
        <118>[1/2] Extracting pfSense-pkg-WireGuard-0.2.1: .......... done
        <118>Removing WireGuard components...
        <118>Menu items... done.
        <118>Services... done.
        <118>Loading package instructions...
        <118>  Removing WireGuard early shell commands...done.
        <118>  Removing WireGuard interface group...done.
        <118>  Removing WireGuard temporary files...done.
        <118>  Keeping WireGuard configuration settings...done.
        <118>  Removing WireGuard Unbound access list...done.
        <118>  Destroying WireGuard tunnels...
        <6>tun_wg0: link state changed to DOWN
        <6>tun_wg1: link state changed to DOWN
        <118>done.
        <118>  Stopping and removing the WireGuard service...
        
        
        Fatal trap 12: page fault while in kernel mode
        cpuid = 2; apic id = 02
        fault virtual address	= 0x50
        fault code		= supervisor read data, page not present
        instruction pointer	= 0x20:0xffffffff80fb883c
        stack pointer	        = 0x28:0xfffffe00595c2810
        frame pointer	        = 0x28:0xfffffe00595c28d0
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 12 (irq25: virtio_pci0)
        rdi: fffff8008d1ee01a rsi: fffffe0005bb082d rdx: 0000000000000000
        rcx: fffffe0005bb0370  r8: fffffe00595c2930  r9: 0000000000000000
        rax: 0000000000000000 rbx: fffff800aa6248b8 rbp: fffffe00595c28d0
        r10: 0000000000000000 r11: fffffe0005bb0370 r12: fffffe0005bb0370
        r13: 0000000000000002 r14: fffff800bbd5a160 r15: fffffe00595c2b08
        trap number		= 12
        panic: page fault
        cpuid = 2
        time = 1697479957
        KDB: enter: panic
        

        and here's a bit of the trace:

        db:0:kdb.enter.default>  run pfs
        db:1:pfs> bt
        Tracing pid 12 tid 100070 td 0xfffffe0005d78020
        kdb_enter() at kdb_enter+0x32/frame 0xfffffe00595c24f0
        vpanic() at vpanic+0x163/frame 0xfffffe00595c2620
        panic() at panic+0x43/frame 0xfffffe00595c2680
        trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00595c26e0
        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00595c2740
        calltrap() at calltrap+0x8/frame 0xfffffe00595c2740
        --- trap 0xc, rip = 0xffffffff80fb883c, rsp = 0xfffffe00595c2810, rbp = 0xfffffe00595c28d0 ---
        pf_route() at pf_route+0x56c/frame 0xfffffe00595c28d0
        pf_test() at pf_test+0xd7b/frame 0xfffffe00595c2a70
        pf_check_out() at pf_check_out+0x22/frame 0xfffffe00595c2a90
        pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe00595c2ac0
        ip_tryforward() at ip_tryforward+0x29d/frame 0xfffffe00595c2b70
        ip_input() at ip_input+0x350/frame 0xfffffe00595c2bd0
        netisr_dispatch_src() at netisr_dispatch_src+0x22c/frame 0xfffffe00595c2c20
        ether_demux() at ether_demux+0x149/frame 0xfffffe00595c2c50
        ether_nh_input() at ether_nh_input+0x36e/frame 0xfffffe00595c2cb0
        netisr_dispatch_src() at netisr_dispatch_src+0xaf/frame 0xfffffe00595c2d00
        ether_input() at ether_input+0x69/frame 0xfffffe00595c2d60
        vtnet_rxq_eof() at vtnet_rxq_eof+0x6d2/frame 0xfffffe00595c2e20
        vtnet_rx_vq_process() at vtnet_rx_vq_process+0xbc/frame 0xfffffe00595c2e60
        ithread_loop() at ithread_loop+0x257/frame 0xfffffe00595c2ef0
        fork_exit() at fork_exit+0x7f/frame 0xfffffe00595c2f30
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00595c2f30
        --- trap 0x7c9ebd07, rip = 0x8000065320cca1bc, rsp = 0xffffffff00000007, rbp = 0xbaa558 ---
        db:1:pfs>  show registers
        cs                        0x20
        ds                        0x3b
        es                        0x3b
        fs                        0x13
        gs                        0x1b
        ss                        0x28
        rax                       0x12
        rcx         0xffffffff814589e2
        rdx                      0x3f8
        rbx                      0x100
        rsp         0xfffffe00595c24f0
        rbp         0xfffffe00595c24f0
        rsi                        0xa
        rdi         0xffffffff82d4e850  gdb_consdev
        r8                           0
        r9                           0
        r10                          0
        r11                          0
        r12                          0
        r13                          0
        r14         0xffffffff813dbf5b
        r15         0xfffffe0005d78020
        rip         0xffffffff80d388c2  kdb_enter+0x32
        rflags                    0x82
        kdb_enter+0x32: movq    $0,0x2344f43(%rip)
        db:1:pfs>  show pcpu
        cpuid        = 2
        dynamic pcpu = 0xfffffe0082c79f00
        curthread    = 0xfffffe0005d78020: pid 12 tid 100070 critnest 1 "irq25: virtio_pci0"
        curpcb       = 0xfffffe0005d78540
        fpcurthread  = none
        idlethread   = 0xfffffe0005c67560: tid 100005 "idle: cpu2"
        self         = 0xffffffff84012000
        curpmap      = 0xffffffff83021ab0
        tssp         = 0xffffffff84012384
        rsp0         = 0xfffffe00595c3000
        kcr3         = 0x80000000b7900007
        ucr3         = 0xffffffffffffffff
        scr3         = 0xa5b7a9e4
        gs32p        = 0xffffffff84012404
        ldt          = 0xffffffff84012444
        tss          = 0xffffffff84012434
        curvnet      = 0xfffff800011c7580
        db:1:pfs>  run lockinfo
        db:2:lockinfo> show locks
        No such command; use "help" to list available commands
        db:2:lockinfo>  show alllocks
        No such command; use "help" to list available commands
        db:2:lockinfo>  show lockedvnods
        Locked vnodes
        db:1:pfs>  acttrace
        
        Tracing command idle pid 11 tid 100003 td 0xfffffe0005c683a0 (CPU 0)
        cpustop_handler() at cpustop_handler+0x27/frame 0xffffffff83105680
        ipi_nmi_handler() at ipi_nmi_handler+0x3a/frame 0xffffffff83105690
        trap() at trap+0x3f/frame 0xffffffff831057b0
        nmi_calltrap() at nmi_calltrap+0x8/frame 0xffffffff831057b0
        --- trap 0x13, rip = 0xffffffff8125c526, rsp = 0xfffffe0005176d70, rbp = 0xfffffe0005176d70 ---
        acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe0005176d70
        acpi_cpu_idle() at acpi_cpu_idle+0x2fe/frame 0xfffffe0005176db0
        cpu_idle_acpi() at cpu_idle_acpi+0x46/frame 0xfffffe0005176dd0
        cpu_idle() at cpu_idle+0x9d/frame 0xfffffe0005176df0
        sched_idletd() at sched_idletd+0x576/frame 0xfffffe0005176ef0
        fork_exit() at fork_exit+0x7f/frame 0xfffffe0005176f30
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0005176f30
        --- trap 0x9824848b, rip = 0x5ede158d48564100, rsp = 0x894c78ec8348db31, rbp = 0x2464894cd23115eb ---
        

        This is totally not my specialty, but it looks like a virtio driver crash. FWIW, the host OS is TrueNAS SCALE Cobia RC.1, the adapter type is 'VirtIO', and it attaches to a software bridge on the Linux side.
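
        In case it helps narrow things down, a minimal sketch of confirming the virtio NIC and its driver from inside the guest, assuming only stock FreeBSD tools (vtnet0 is the interface named in the CARP and panic output above):

        pciconf -lv | grep -B3 -i network    # virtio network devices and the driver bound to them
        ifconfig vtnet0                      # link state and CARP status of the suspect interface
        sysctl dev.vtnet.0 | head            # vtnet driver description, counters and tunables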

        I can send the tar file directly if you wish.
        Mike

        P.S. It's been running solid for 2 hours now, so it's likely fairly stable once everything comes up. I haven't tried rebooting again.

        • stephenw10 Netgate Administrator
          last edited by Oct 16, 2023, 9:00 PM

          Looks like it's this: https://redmine.pfsense.org/issues/14804

          If you look at the comments there, it should be fixed in tomorrow's build.

          Steve

          • mikebenna @stephenw10
            last edited by Oct 16, 2023, 10:30 PM

            Hi @stephenw10,

            Based on that theory, I disabled CARP on the master to cause a failover to the slave, waited a minute, then re-enabled CARP on the master. Both failovers went perfectly smoothly and the master did not crash when taking over this time.

            Further anecdotal evidence: I did two more reboots (after the above test) and both succeeded. Most likely that just means I don't have a 100% reproducible test case.
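
            For anyone repeating that failover test from a shell rather than the GUI, here's a rough sketch using the carp(4) demotion sysctl on stock FreeBSD (an assumption on my part; the sysctl takes a delta, and pfSense's own CARP maintenance options may do something different under the hood):

            sysctl net.inet.carp.demotion=240    # raise demotion so the peer preempts and takes MASTER
            ifconfig vtnet0 | grep -i carp       # confirm the local VHIDs have dropped to BACKUP
            sysctl net.inet.carp.demotion=-240   # once the peer settles, remove the demotion to fail back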

            In any case, I'll install tonight's snapshot build once it's up.

            Mike

            • stephenw10 Netgate Administrator
              last edited by Oct 16, 2023, 10:38 PM

              The backtrace you saw is pretty much identical to the one in that report, so I'd be surprised if it wasn't that. 👍

              • mikebenna @stephenw10
                last edited by Oct 17, 2023, 5:49 PM

                I installed the latest beta snapshot this morning on both VMs and so far no crashes. If anything comes up I'll report back here, but so far it's looking good.
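
                As a quick sanity check (assuming the standard pfSense/FreeBSD version files), confirming what's actually running after the upgrade looks roughly like:

                cat /etc/version    # pfSense Plus version string for the installed snapshot
                uname -a            # kernel build string, comparable to the version info posted earlier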
