Netgate Discussion Forum

    SG-4860 crashing daily

    • homer2320776

      I'm not exactly sure when this started, but I hadn't logged into the UI for a bit and saw the crash reports and downloaded the files.

      I'm thinking it's related to WireGuard or Tailscale, but I'm not 100% sure.

      textdump.tar.0

      • stephenw10 Netgate Administrator

        Important parts are:

        db:0:kdb.enter.default>  bt
        Tracing pid 35301 tid 100294 td 0xfffff8017297e740
        kdb_enter() at kdb_enter+0x37/frame 0xfffffe004d3c97d0
        vpanic() at vpanic+0x194/frame 0xfffffe004d3c9820
        panic() at panic+0x43/frame 0xfffffe004d3c9880
        trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004d3c98e0
        calltrap() at calltrap+0x8/frame 0xfffffe004d3c98e0
        --- trap 0x9, rip = 0xffffffff8120594b, rsp = 0xfffffe004d3c99b0, rbp = 0xfffffe004d3c99c0 ---
        vm_radix_remove() at vm_radix_remove+0x1b/frame 0xfffffe004d3c99c0
        vm_page_free_prep() at vm_page_free_prep+0x55/frame 0xfffffe004d3c99e0
        vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfffffe004d3c9a10
        vm_object_page_remove() at vm_object_page_remove+0x61/frame 0xfffffe004d3c9a70
        vm_map_entry_delete() at vm_map_entry_delete+0xff/frame 0xfffffe004d3c9ac0
        vm_map_delete() at vm_map_delete+0x184/frame 0xfffffe004d3c9b20
        vm_map_remove() at vm_map_remove+0xab/frame 0xfffffe004d3c9b50
        vmspace_exit() at vmspace_exit+0xcb/frame 0xfffffe004d3c9b90
        exit1() at exit1+0x51c/frame 0xfffffe004d3c9bf0
        sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe004d3c9c00
        amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe004d3c9d30
        fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe004d3c9d30
        --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8003eb74a, rsp = 0x7fffffffeb38, rbp = 0x7fffffffeb50 ---
        
        Fatal trap 9: general protection fault while in kernel mode
        cpuid = 3; apic id = 06
        instruction pointer	= 0x20:0xffffffff8120594b
        stack pointer	        = 0x28:0xfffffe004d3c99b0
        frame pointer	        = 0x28:0xfffffe004d3c99c0
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 35301 (egrep)
        trap number		= 9
        panic: general protection fault
        cpuid = 3
        time = 1659503270
        KDB: enter: panic
        

        Neither of those points to anything specific, unfortunately.

        Are the crash reports always identical? Or close to it?
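
One way to answer that question quickly is to pull the same few fields out of every dump and compare them side by side. The following is only a minimal sketch: it assumes the panic text has already been extracted from the textdump archive into a string, and `summarize_panic` is a hypothetical helper name, not a pfSense or FreeBSD tool.

```python
import re

def summarize_panic(text):
    """Extract the trap, current process and panic string from panic output."""
    patterns = {
        "trap": r"^\s*Fatal trap (\d+): (.+)$",
        "process": r"^\s*current process\s*=\s*(\d+) \((.+)\)$",
        "panic": r"^\s*panic: (.+)$",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text, re.MULTILINE)
        # Store the captured groups, or None if the field is absent.
        summary[key] = match.groups() if match else None
    return summary

# Lines copied from the first panic quoted above.
sample = """\
Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 06
current process		= 35301 (egrep)
trap number		= 9
panic: general protection fault
"""

print(summarize_panic(sample))
```

Running this over each report gives one short record per crash, so it is easy to see whether the trap type and process repeat or change every time.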

        Steve

        • homer2320776 @stephenw10

          @stephenw10 Thanks for the quick reply. I'm attaching 2 older reports and will collect future ones.

          textdump.tar
          textdump1.tar

          • stephenw10 Netgate Administrator

            One of those appears to be the same file you posted earlier. The other one is different though:

            db:0:kdb.enter.default>  bt
            Tracing pid 89157 tid 100203 td 0xfffff8017e1a2000
            kdb_enter() at kdb_enter+0x37/frame 0xfffffe004cdc4ab0
            vpanic() at vpanic+0x194/frame 0xfffffe004cdc4b00
            panic() at panic+0x43/frame 0xfffffe004cdc4b60
            trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004cdc4bc0
            trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004cdc4c20
            trap() at trap+0x425/frame 0xfffffe004cdc4d30
            calltrap() at calltrap+0x8/frame 0xfffffe004cdc4d30
            --- trap 0xc, rip = 0x8004ed10c, rsp = 0x7fffdfffdd60, rbp = 0x7fffdfffddc0 ---
            
            Fatal trap 12: page fault while in user mode
            cpuid = 2; apic id = 04
            fault virtual address	= 0x800a008c8
            fault code		= user read data, reserved bits in PTE
            instruction pointer	= 0x43:0x8004ed10c
            stack pointer	        = 0x3b:0x7fffdfffdd60
            frame pointer	        = 0x3b:0x7fffdfffddc0
            code segment		= base 0x0, limit 0xfffff, type 0x1b
            			= DPL 3, pres 1, long 1, def32 0, gran 1
            processor eflags	= interrupt enabled, resume, IOPL = 0
            current process		= 89157 (charon)
            trap number		= 12
            panic: page fault
            cpuid = 2
            time = 1659406768
            KDB: enter: panic
            

            Very different crash reports like that start to look like a hardware issue.

            You think this started happening after installing WireGuard?

            Or after upgrading to 22.05 maybe?
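
When trying to line crashes up with an install or upgrade date, it helps to know that the `time =` field in each panic is a Unix epoch timestamp. A small sketch that converts the two values quoted so far to UTC (the labels are just descriptive names chosen here):

```python
from datetime import datetime, timezone

# `time =` values copied from the two panics quoted in this thread.
panic_times = {
    "egrep (trap 9)": 1659503270,
    "charon (trap 12)": 1659406768,
}

def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Print the panics in chronological order.
for label, epoch in sorted(panic_times.items(), key=lambda item: item[1]):
    print(f"{to_utc(epoch):%Y-%m-%d %H:%M:%S} UTC  {label}")
```

This puts the charon panic at 2022-08-02 02:19:28 UTC and the egrep panic at 2022-08-03 05:07:50 UTC, about 27 hours apart, which fits the "crashing daily" description.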

            Steve

            • homer2320776 @stephenw10

              @stephenw10 I found some more crash logs that I had sent to myself over Telegram. Hopefully these might shed some light.

              textdump0.tar
              textdump1.tar
              textdump2.tar

              • stephenw10 Netgate Administrator

                Mmm, those are all different. That is looking more like a memory fault unfortunately.

                Are you able to try a clean install of 22.05?

                Steve

                • homer2320776 @stephenw10

                  @stephenw10 This is currently the production firewall for this location. I purchased an XG-1537 last year and a stack of new switches to install, but I haven't scheduled a time to replace it all.

                  I'll try to reload the 4860 after everything stabilizes.

                  Last night's crash dump:
                  textdump.tar.0

                  • stephenw10 Netgate Administrator

                    Mmm, another similar crash but different panic. Again it doesn't point to any specific thing and looks increasingly like a hardware issue unfortunately.

                    Steve

                    • homer2320776 @stephenw10

                      @stephenw10 The device hadn't crashed in a few days, but this morning it has a PHP crash log as well.

                      [12-Aug-2022 00:42:00 UTC] PHP Warning:  Static function mbereg_search() cannot be abstract in Unknown on line 0
                      

                      textdump.tar.0

                      • stephenw10 Netgate Administrator

                        Hmm, that looks different, more like it just ran out of memory.

                        That also ties in with this:
                        <6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11

                        If you check the monitoring graphs in Status > Monitoring do you see memory usage increasing with time?
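
To see whether it is always the same process dying, the "exited on signal" lines can be tallied from the system log. This is a rough sketch, assuming the log lines have already been read into a list; only the unbound line is from the dump above, and the other sample lines are hypothetical, as is the `tally_signal_exits` name.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   pid 71216 (unbound), jid 0, uid 59: exited on signal 11
SIGNAL_LINE = re.compile(
    r"pid (\d+) \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)

def tally_signal_exits(lines):
    """Count signal exits per (process name, signal number)."""
    counts = Counter()
    for line in lines:
        match = SIGNAL_LINE.search(line)
        if match:
            _pid, name, signal = match.groups()
            counts[(name, int(signal))] += 1
    return counts

log_lines = [
    "<6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11",
    # Hypothetical lines, for illustration only:
    "<6>pid 80001 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)",
    "<6>pid 80002 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)",
]

for (name, signal), count in tally_signal_exits(log_lines).most_common():
    print(f"{name}: signal {signal} x{count}")
```

Many unrelated processes segfaulting would lean toward memory or hardware, while one process failing repeatedly would point more toward that package.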

                        • homer2320776 @stephenw10

                          @stephenw10 I checked the memory graph for a 2 day period with 5 min resolution and didn't see the free memory decrease except during the crashes.

                          [Attached image: memory graph, 2-day period, 5-minute resolution]

                          I'll keep a watch for anything new.

                          • stephenw10 Netgate Administrator

                            Mmm, I agree, it doesn't look like it's exhausting the memory directly.

                            • homer2320776 @stephenw10

                              @stephenw10 I believe I have narrowed the issue down to the tailscale package. I noticed when I came back from vacation that the firewall had been up over 8 days w/o a crash.

                              Checking the logs showed that either PHP or PHP-CGI was exiting on signal 11 with a core dump, and the services section showed that tailscale wasn't running either.

                              On a hunch I started the tailscale service yesterday morning to see if a crash would happen. Sure enough, last night it crashed again.

                              Attached is the latest dump. textdump.tar.0

                              • stephenw10 Netgate Administrator

                                So you had disabled tailscale while you were away? Or it had stopped by itself and then crashed after you restarted it?

                                Steve

                                • homer2320776 @stephenw10

                                  @stephenw10 Tailscale had apparently crashed, but the connections it had made were still running, so I didn't notice the service itself was down.

                                  I restarted the service yesterday morning to see if it was the cause of the crashes, then this morning when I logged in, I saw the crash report.

                                  • stephenw10 Netgate Administrator

                                    Mmm, that's not familiar to me. Let me see if anyone else has seen it...
