Netgate Discussion Forum

    Random pfSense crash after running for a week with no issues

    General pfSense Questions
    • MReprogle

      I have used pfSense for over 5 years and have never run into this issue, but I just bought a new Topton N6000 router and moved from bare metal to running ESXi on the Topton. I've virtualized pfSense in the past with no issues, but am now questioning whether to go back to bare metal. The main reasons I went with ESXi were the i226 NIC support and the ability to run backups to my Synology NAS.

      At first, I believed that the Synology Active Backup for Business service was killing my WAN, since it consistently died shortly after the backup finished. ESXi creates a snapshot of the VM, and I believe that when this snapshot was consolidated at the end of the backup, something went wrong with the WAN interface and it went down. I installed a cron job that runs every 5 minutes and restarts the interface if it cannot resolve an IP address for google.com and yahoo.com. Since implementing that, everything seemed fine. I know that the WAN went down at least once after I disabled the Synology backup service, but that seemed random rather than at a consistent time as before.
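For illustration, a watchdog of the kind described above could look roughly like the sketch below. This is not the actual script from the post: the interface name `em0`, the use of `ifconfig` to bounce the interface, and the function names are all assumptions made for the example.

```python
import socket
import subprocess


def can_resolve(host):
    """Return True if the host name resolves to an IPv4 address."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def watchdog(hosts, restart_cmds, resolver=can_resolve, run=subprocess.run):
    """Run restart_cmds only if none of the hosts can be resolved.

    Returns True if a restart was triggered, False otherwise.
    """
    if any(resolver(h) for h in hosts):
        return False
    for cmd in restart_cmds:
        # check=False: keep going even if one command exits non-zero
        run(cmd, check=False)
    return True


# Example call (assumed interface name "em0"; adjust to the real WAN NIC):
# watchdog(
#     ["google.com", "yahoo.com"],
#     [["ifconfig", "em0", "down"], ["ifconfig", "em0", "up"]],
# )
```

Scheduled from cron every 5 minutes, this would only touch the interface when both lookups fail. Note that it cannot help when the whole system has panicked, since cron is no longer running at that point.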

      Today, on the other hand, I was at work and got alerts that my entire network went down. I figured that the cron job would fix it, but it never did. When I finally rebooted, pfSense came up with a crash report. I have gone through it and cannot find anything that looks like a red flag to me.

      If anyone is better with these crash logs than me, please let me know if you see anything that I should address first. I'm kind of hoping I am just missing something blatantly obvious when it comes to virtualizing pfSense, since it has been a few years since I last did it.

      textdump.tar.0

      • stephenw10 (Netgate Administrator)

        There are three panics shown there in the message buffer but only one backtrace:

        db:1:pfs> bt
        Tracing pid 0 tid 100306 td 0xfffffe010b2611e0
        kdb_enter() at kdb_enter+0x32/frame 0xfffffe010aaaeb20
        vpanic() at vpanic+0x183/frame 0xfffffe010aaaeb70
        panic() at panic+0x43/frame 0xfffffe010aaaebd0
        trap_fatal() at trap_fatal+0x409/frame 0xfffffe010aaaec30
        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe010aaaec90
        calltrap() at calltrap+0x8/frame 0xfffffe010aaaec90
        --- trap 0xc, rip = 0xffffffff84138690, rsp = 0xfffffe010aaaed68, rbp = 0xfffffe010aaaee40 ---
        wg_send() at wg_send/frame 0xfffffe010aaaee40
        gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe010aaaeec0
        gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc3/frame 0xfffffe010aaaeef0
        fork_exit() at fork_exit+0x7d/frame 0xfffffe010aaaef30
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe010aaaef30
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        db:1:pfs>  show registers
        cs                        0x20
        ds                        0x3b
        es                        0x3b
        fs                        0x13
        gs                        0x1b
        ss                        0x28
        rax                       0x12
        rcx                        0x1
        rdx         0xfffffe010aaae740
        rbx                      0x100
        rsp         0xfffffe010aaaeb20
        rbp         0xfffffe010aaaeb20
        rsi                       0x19
        rdi         0xffffffff82d836d8  vt_conswindow+0x10
        r8                           0
        r9                      0x304f  _binary_elf_vdso_so_1_size+0x2a3f
        r10         0xffffffff82d83818  vt_consdev
        r11         0xcedfc2df9afff59c
        r12                          0
        r13         0xfffffe010aaaeca0
        r14         0xfffffe010aaaebb0
        r15         0xfffffe010b2611e0
        rip         0xffffffff80d48ff2  kdb_enter+0x32
        rflags                    0x82
        kdb_enter+0x32: movq    $0,0x2342e13(%rip)
        db:1:pfs>  show pcpu
        cpuid        = 2
        dynamic pcpu = 0xfffffe00981f6580
        curthread    = 0xfffffe010b2611e0: pid 0 tid 100306 critnest 1 "wg_tqg_2"
        curpcb       = 0xfffffe010b261700
        fpcurthread  = none
        idlethread   = 0xfffffe001b1b5560: tid 100005 "idle: cpu2"
        self         = 0xffffffff84012000
        curpmap      = 0xffffffff8303ff50
        tssp         = 0xffffffff84012384
        rsp0         = 0xfffffe010aaaf000
        kcr3         = 0xffffffffffffffff
        ucr3         = 0xffffffffffffffff
        scr3         = 0x0
        gs32p        = 0xffffffff84012404
        ldt          = 0xffffffff84012444
        tss          = 0xffffffff84012434
        curvnet      = 0
        

        Two of the panics appear to be in WireGuard, so you might try running without that enabled as a test.
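As a rough aid for reading a backtrace like the one above: frames are listed innermost first, so the frame printed directly after the first `--- trap ...` marker is the code that was running when the fault occurred (here `wg_send`, with trap 0xc being a page fault on amd64). The sketch below pulls that out of the `bt` text. The function name and regular expressions are assumptions written for this output format, not part of any pfSense or FreeBSD tool.

```python
import re

# Matches lines such as "wg_send() at wg_send/frame 0xfffffe010aaaee40"
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+(?:\+0x[0-9a-f]+)?/frame 0x[0-9a-f]+")
# Matches lines such as "--- trap 0xc, rip = 0xffffffff84138690, ..."
TRAP_RE = re.compile(r"^\s*--- trap (0x[0-9a-f]+|\d+), rip = ")


def faulting_frame(bt_text):
    """Return (trap_number, function_name) for the first trap marker.

    Returns None if the text contains no trap marker at all.
    """
    lines = bt_text.splitlines()
    for i, line in enumerate(lines):
        trap_match = TRAP_RE.match(line)
        if not trap_match:
            continue
        trap = int(trap_match.group(1), 0)  # base 0 accepts "0xc" and "0"
        # The next frame line after the marker is where the trap was taken.
        for following in lines[i + 1:]:
            frame_match = FRAME_RE.match(following)
            if frame_match:
                return trap, frame_match.group(1)
        return trap, None
    return None
```

Feeding it the backtrace from the crash report gives trap 12 in `wg_send`, which is what points at WireGuard.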

        However, it looks like you're running a Jasper Lake CPU, and they have known issues with virtualisation. Make sure you have the current BIOS/microcode running there.
        I'm not sure if it applies directly to ESXi but still should be checked: https://forums.servethehome.com/index.php?threads/jasper-lake-proxmox-kvm-qemu-vm-guest-stability.38824/

        Steve

          • MReprogle @stephenw10

          @stephenw10 said in Random pfSense crash after running for a week with no issues:

          https://forums.servethehome.com/index.php?threads/jasper-lake-proxmox-kvm-qemu-vm-guest-stability.38824/

          Thanks! That definitely gives me something to go off of, and would make sense. I have had days where I wake up and check my phone, and have no issues visiting one or two sites, then the WAN dies. Makes me think that the CPU is likely idle for hours, then finally tries to 'wake up', then crashes, or at least has issues bringing up my WAN NIC.

          You definitely take a risk buying one of these Topton boxes from China in terms of firmware / microcode, and it looks like I am going to have to go down the path of getting it up to date, which they don't make easy since they don't have a website for technical support.

            • nimrod

            I had extremely bad experiences with Topton, XCY and other cheap Chinese appliances. They randomly reboot/crash and have overheating issues. If you need a cheap unit, go with Qotom.

              • stephenw10 (Netgate Administrator)

              First try disabling CPU power saving modes in the BIOS and see if that changes anything.

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.