Netgate Discussion Forum

    SG-4860 crashing daily

    General pfSense Questions
    16 Posts 2 Posters 1.4k Views
      homer2320776

      I'm not exactly sure when this started, but I hadn't logged into the UI for a bit and saw the crash reports and downloaded the files.

      I'm thinking it's related to WireGuard or Tailscale, but I'm not 100% sure.

      textdump.tar.0

        stephenw10 Netgate Administrator

        Important parts are:

        db:0:kdb.enter.default>  bt
        Tracing pid 35301 tid 100294 td 0xfffff8017297e740
        kdb_enter() at kdb_enter+0x37/frame 0xfffffe004d3c97d0
        vpanic() at vpanic+0x194/frame 0xfffffe004d3c9820
        panic() at panic+0x43/frame 0xfffffe004d3c9880
        trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004d3c98e0
        calltrap() at calltrap+0x8/frame 0xfffffe004d3c98e0
        --- trap 0x9, rip = 0xffffffff8120594b, rsp = 0xfffffe004d3c99b0, rbp = 0xfffffe004d3c99c0 ---
        vm_radix_remove() at vm_radix_remove+0x1b/frame 0xfffffe004d3c99c0
        vm_page_free_prep() at vm_page_free_prep+0x55/frame 0xfffffe004d3c99e0
        vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfffffe004d3c9a10
        vm_object_page_remove() at vm_object_page_remove+0x61/frame 0xfffffe004d3c9a70
        vm_map_entry_delete() at vm_map_entry_delete+0xff/frame 0xfffffe004d3c9ac0
        vm_map_delete() at vm_map_delete+0x184/frame 0xfffffe004d3c9b20
        vm_map_remove() at vm_map_remove+0xab/frame 0xfffffe004d3c9b50
        vmspace_exit() at vmspace_exit+0xcb/frame 0xfffffe004d3c9b90
        exit1() at exit1+0x51c/frame 0xfffffe004d3c9bf0
        sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe004d3c9c00
        amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe004d3c9d30
        fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe004d3c9d30
        --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8003eb74a, rsp = 0x7fffffffeb38, rbp = 0x7fffffffeb50 ---
        
        Fatal trap 9: general protection fault while in kernel mode
        cpuid = 3; apic id = 06
        instruction pointer	= 0x20:0xffffffff8120594b
        stack pointer	        = 0x28:0xfffffe004d3c99b0
        frame pointer	        = 0x28:0xfffffe004d3c99c0
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 35301 (egrep)
        trap number		= 9
        panic: general protection fault
        cpuid = 3
        time = 1659503270
        KDB: enter: panic
        

        Neither of those points to anything specific, unfortunately.

        Are the crash reports always identical? Or close to it?
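
One way to answer that across many dumps is to pull the same few summary fields out of each report and compare them side by side. The following is only a minimal sketch: it assumes the ddb output has already been extracted from each textdump archive into a string, and the function name `summarize_crash` and the regular expressions are illustrative, written against the field layout quoted in this thread.

```python
import re
from datetime import datetime, timezone

# Patterns for the summary lines seen in the ddb output quoted in this thread.
# They are assumptions based on that layout, not an official format definition.
TRAP_RE = re.compile(r"^\s*Fatal trap (\d+): (.+)$", re.M)
PROC_RE = re.compile(r"^\s*current process\s*=\s*(\d+) \((.+?)\)\s*$", re.M)
PANIC_RE = re.compile(r"^\s*panic: (.+)$", re.M)
TIME_RE = re.compile(r"^\s*time = (\d+)\s*$", re.M)


def summarize_crash(ddb_text):
    """Return trap, process name, panic string and UTC time for one report."""
    trap = TRAP_RE.search(ddb_text)
    proc = PROC_RE.search(ddb_text)
    panic = PANIC_RE.search(ddb_text)
    when = TIME_RE.search(ddb_text)
    return {
        "trap": f"{trap.group(1)}: {trap.group(2).strip()}" if trap else None,
        "process": proc.group(2) if proc else None,
        "panic": panic.group(1).strip() if panic else None,
        # The "time =" field is seconds since the Unix epoch.
        "time_utc": (
            datetime.fromtimestamp(int(when.group(1)), tz=timezone.utc).isoformat()
            if when
            else None
        ),
    }
```

Run against the report above, this gives trap 9 in `egrep` at 2022-08-03T05:07:50+00:00. Reports that differ in trap type, process, and panic string each time are the pattern that tends to suggest hardware rather than one software bug.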

        Steve

          homer2320776 @stephenw10

          @stephenw10 Thanks for the quick reply. I'm attaching 2 older reports and will collect future ones.

          textdump.tar
          textdump1.tar

            stephenw10 Netgate Administrator

            One of those appears to be the same file you posted earlier. The other one is different though:

            db:0:kdb.enter.default>  bt
            Tracing pid 89157 tid 100203 td 0xfffff8017e1a2000
            kdb_enter() at kdb_enter+0x37/frame 0xfffffe004cdc4ab0
            vpanic() at vpanic+0x194/frame 0xfffffe004cdc4b00
            panic() at panic+0x43/frame 0xfffffe004cdc4b60
            trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004cdc4bc0
            trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004cdc4c20
            trap() at trap+0x425/frame 0xfffffe004cdc4d30
            calltrap() at calltrap+0x8/frame 0xfffffe004cdc4d30
            --- trap 0xc, rip = 0x8004ed10c, rsp = 0x7fffdfffdd60, rbp = 0x7fffdfffddc0 ---
            
            Fatal trap 12: page fault while in user mode
            cpuid = 2; apic id = 04
            fault virtual address	= 0x800a008c8
            fault code		= user read data, reserved bits in PTE
            instruction pointer	= 0x43:0x8004ed10c
            stack pointer	        = 0x3b:0x7fffdfffdd60
            frame pointer	        = 0x3b:0x7fffdfffddc0
            code segment		= base 0x0, limit 0xfffff, type 0x1b
            			= DPL 3, pres 1, long 1, def32 0, gran 1
            processor eflags	= interrupt enabled, resume, IOPL = 0
            current process		= 89157 (charon)
            trap number		= 12
            panic: page fault
            cpuid = 2
            time = 1659406768
            KDB: enter: panic
            

            Very different crash reports like that start to look like a hardware issue.

            You think this started happening after installing Wireguard?

            Or after upgrading to 22.05 maybe?

            Steve

              homer2320776 @stephenw10

              @stephenw10 I found some more crash logs that I had sent to myself over Telegram. Hopefully these might shed some light.

              textdump0.tar
              textdump1.tar
              textdump2.tar

                stephenw10 Netgate Administrator

                Mmm, those are all different. That is looking more like a memory fault unfortunately.

                Are you able to try a clean install of 22.05?

                Steve

                  homer2320776 @stephenw10

                  @stephenw10 This is currently the production firewall for this location. I purchased an XG-1537 last year and a stack of new switches to install, but haven't scheduled a time to replace it all.

                  I'll try to reload the 4860 after everything stabilizes.

                  Last night's crash dump:
                  textdump.tar.0

                    stephenw10 Netgate Administrator

                    Mmm, another similar crash but different panic. Again it doesn't point to any specific thing and looks increasingly like a hardware issue unfortunately.

                    Steve

                      homer2320776 @stephenw10

                      @stephenw10 The device hadn't crashed in a few days, but this morning it has a PHP crash log as well.

                      [12-Aug-2022 00:42:00 UTC] PHP Warning:  Static function mbereg_search() cannot be abstract in Unknown on line 0
                      

                      textdump.tar.0

                        stephenw10 Netgate Administrator

                        Hmm, that looks different, more like it just ran out of memory.

                        That also ties in with this:
                        <6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11

                        If you check the monitoring graphs in Status > Monitoring do you see memory usage increasing with time?
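
To see whether one process keeps dying this way or the failures are spread across several, the `exited on signal` lines can be tallied from the message buffer. This is a rough sketch that assumes the log text is already available as a string; the helper name `count_signal_exits` is hypothetical and the pattern is based only on the line quoted above.

```python
import re
from collections import Counter

# Matches kernel log lines shaped like the unbound line quoted above.
EXIT_RE = re.compile(
    r"pid (\d+) \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)


def count_signal_exits(log_text):
    """Count (process name, signal number) pairs in 'exited on signal' lines."""
    return Counter(
        (match.group(2), int(match.group(3)))
        for match in EXIT_RE.finditer(log_text)
    )
```

Signal 11 exits scattered across unrelated processes would fit the hardware theory, while repeated exits from a single process would point more towards that package.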

                          homer2320776 @stephenw10

                          @stephenw10 I checked the memory graph for a 2 day period with 5 min resolution and didn't see the free memory decrease except during the crashes.

                          bf256829-12a8-47da-ab13-85b2767805f6-image.png

                          I'll keep a watch for anything new.

                            stephenw10 Netgate Administrator

                            Mmm, I agree, it doesn't look like it's exhausting the memory directly.

                              homer2320776 @stephenw10

                              @stephenw10 I believe I have narrowed the issue down to the tailscale package. I noticed when I came back from vacation that the firewall had been up over 8 days w/o a crash.

                              Checking the logs showed that either PHP or PHP-CGI was exiting on signal 11 with a core dump, and the services section showed that tailscale wasn't running either.

                              On a hunch I started the tailscale service yesterday morning to see if a crash would happen. Sure enough, last night it crashed again.

                              Attached is the latest dump. textdump.tar.0

                                stephenw10 Netgate Administrator

                                So you had disabled tailscale while you were away? Or it had stopped by itself and then crashed after you restarted it?

                                Steve

                                  homer2320776 @stephenw10

                                  @stephenw10 Tailscale had apparently crashed, but the connections it made were still running, so I didn't notice the service itself was down.

                                  I restarted the service yesterday morning to see if it was the cause of the crashes, then this morning when I logged in, I saw the crash report.

                                    stephenw10 Netgate Administrator

                                    Mmm, not familiar to me. Let me see if anyone else has seen it....

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.