Netgate Discussion Forum

    SG-4860 crashing daily

    General pfSense Questions
    16 Posts 2 Posters 1.6k Views
    • H
      homer2320776 @stephenw10

      @stephenw10 Thanks for the quick reply. I'm attaching 2 older reports and will collect future ones.

      textdump.tar
      textdump1.tar

      • stephenw10S
        stephenw10 Netgate Administrator

        One of those appears to be the same file you posted earlier. The other one is different though:

        db:0:kdb.enter.default>  bt
        Tracing pid 89157 tid 100203 td 0xfffff8017e1a2000
        kdb_enter() at kdb_enter+0x37/frame 0xfffffe004cdc4ab0
        vpanic() at vpanic+0x194/frame 0xfffffe004cdc4b00
        panic() at panic+0x43/frame 0xfffffe004cdc4b60
        trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004cdc4bc0
        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004cdc4c20
        trap() at trap+0x425/frame 0xfffffe004cdc4d30
        calltrap() at calltrap+0x8/frame 0xfffffe004cdc4d30
        --- trap 0xc, rip = 0x8004ed10c, rsp = 0x7fffdfffdd60, rbp = 0x7fffdfffddc0 ---
        
        Fatal trap 12: page fault while in user mode
        cpuid = 2; apic id = 04
        fault virtual address	= 0x800a008c8
        fault code		= user read data, reserved bits in PTE
        instruction pointer	= 0x43:0x8004ed10c
        stack pointer	        = 0x3b:0x7fffdfffdd60
        frame pointer	        = 0x3b:0x7fffdfffddc0
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 3, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 89157 (charon)
        trap number		= 12
        panic: page fault
        cpuid = 2
        time = 1659406768
        KDB: enter: panic
        

        Very different crash reports like that start to look like a hardware issue.

        You think this started happening after installing Wireguard?

        Or after upgrading to 22.05 maybe?

        Steve
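For anyone collecting these reports, here is a minimal Python sketch of how the key fields in a panic message like the one quoted above could be pulled out, including converting the `time` value (seconds since the Unix epoch) to a readable UTC timestamp. The helper names `parse_panic` and `panic_time_utc` are made up for illustration; the sample text is an excerpt of the quoted message, not a full textdump.

```python
import re
from datetime import datetime, timezone

# Excerpt of the panic message quoted in the post above.
PANIC_TEXT = """\
Fatal trap 12: page fault while in user mode
cpuid = 2; apic id = 04
fault virtual address\t= 0x800a008c8
fault code\t\t= user read data, reserved bits in PTE
current process\t\t= 89157 (charon)
trap number\t\t= 12
panic: page fault
cpuid = 2
time = 1659406768
"""


def parse_panic(text):
    """Collect 'key = value' fields plus the panic string (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("panic:"):
            fields["panic"] = line.split(":", 1)[1].strip()
            continue
        match = re.match(r"^\s*([A-Za-z][A-Za-z ]*?)\s*=\s*(.+)$", line)
        if match:
            # First occurrence wins, so the repeated "cpuid" line is ignored.
            fields.setdefault(match.group(1), match.group(2).strip())
    return fields


def panic_time_utc(fields):
    """Interpret the 'time' field as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)


info = parse_panic(PANIC_TEXT)
print(info["current process"], "|", info["fault code"])
print(panic_time_utc(info).isoformat())
```

For the message above, the `time` field corresponds to 2022-08-02 02:19:28 UTC, which can be matched against the system logs around the crash.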

        • H
          homer2320776 @stephenw10

          @stephenw10 I found some more crash logs that I had sent to myself over Telegram. Hopefully these might shed some light.

          textdump0.tar
          textdump1.tar
          textdump2.tar

          • stephenw10S
            stephenw10 Netgate Administrator

            Mmm, those are all different. That is looking more like a memory fault unfortunately.

            Are you able to try a clean install of 22.05?

            Steve
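The reasoning here is that repeated crashes with the same panic in the same process tend to point at a software bug, while a different signature every time leans toward hardware such as memory. A rough Python sketch of that comparison follows. The function names are hypothetical, and apart from the charon page fault quoted earlier, the sample entries are invented placeholders rather than contents of the attached dumps.

```python
import re
from collections import defaultdict


def signature(text):
    """Return (panic string, process name) for one panic message.

    Either element is None when the corresponding line is missing.
    """
    panic = re.search(r"^panic:\s*(.+)$", text, re.MULTILINE)
    proc = re.search(
        r"^current process\s*=\s*\d+\s*\((.+?)\)\s*$", text, re.MULTILINE
    )
    return (
        panic.group(1).strip() if panic else None,
        proc.group(1) if proc else None,
    )


def group_by_signature(dumps):
    """Group dump names by their (panic, process) signature."""
    groups = defaultdict(list)
    for name, text in dumps.items():
        groups[signature(text)].append(name)
    return dict(groups)


# Placeholder inputs: dump_b is invented purely to show a differing signature.
SAMPLES = {
    "dump_a": "current process\t\t= 89157 (charon)\npanic: page fault\n",
    "dump_b": "current process\t\t= 11 (example_proc)\npanic: example panic string\n",
    "dump_c": "current process\t\t= 89157 (charon)\npanic: page fault\n",
}

groups = group_by_signature(SAMPLES)
for sig, names in groups.items():
    print(sig, names)
```

If every dump lands in its own group, that matches the "all different" pattern described above; one large group would suggest a repeatable software fault instead.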

            • H
              homer2320776 @stephenw10

              @stephenw10 This is currently the production firewall for this location. I purchased an XG-1537 last year, along with a stack of new switches to install, but haven't scheduled a time to replace it all.

              I'll try to reload the 4860 after everything stabilizes.

              Last night's crash dump:
              textdump.tar.0

              • stephenw10S
                stephenw10 Netgate Administrator

                Mmm, another similar crash but different panic. Again it doesn't point to any specific thing and looks increasingly like a hardware issue unfortunately.

                Steve

                • H
                  homer2320776 @stephenw10

                  @stephenw10 The device hadn't crashed in a few days, but this morning it has a PHP crash log as well.

                  [12-Aug-2022 00:42:00 UTC] PHP Warning:  Static function mbereg_search() cannot be abstract in Unknown on line 0
                  

                  textdump.tar.0

                  • stephenw10S
                    stephenw10 Netgate Administrator

                    Hmm, that looks different, more like it just ran out of memory.

                    That also ties in with this:
                    <6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11

                    If you check the monitoring graphs in Status > Monitoring do you see memory usage increasing with time?
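To make it easier to spot which processes are dying, a small Python sketch for picking apart kernel log lines like the unbound one above is shown here. The regular expression is written against that single quoted line, so treat the exact format as an assumption; `parse_exit` is a hypothetical helper. Signal 11 is SIGSEGV (segmentation violation) on FreeBSD and Linux.

```python
import re
import signal

# Pattern modelled on the one log line quoted above (format is an assumption).
EXIT_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+): exited on signal (?P<signo>\d+)"
)


def parse_exit(line):
    """Return a dict describing a 'exited on signal' line, or None if no match."""
    match = EXIT_RE.search(line)
    if match is None:
        return None
    signo = int(match.group("signo"))
    try:
        # Signal numbering is platform dependent; this uses the local table.
        signame = signal.Signals(signo).name
    except ValueError:
        signame = "unknown"
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "jid": int(match.group("jid")),
        "uid": int(match.group("uid")),
        "signo": signo,
        "signame": signame,
    }


LINE = "<6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11"
event = parse_exit(LINE)
print(event)
```

Counting these events per process name over a few days would show whether it is always the same daemon or a spread of unrelated ones.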

                    • H
                      homer2320776 @stephenw10

                      @stephenw10 I checked the memory graph for a 2-day period at 5-minute resolution and didn't see the free memory decrease except during the crashes.

                      bf256829-12a8-47da-ab13-85b2767805f6-image.png

                      I'll keep a watch for anything new.
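Rather than eyeballing the graph, the same check can be done numerically. The Python sketch below fits a least-squares line through (time, free memory) samples and reports the slope per hour; a clearly negative slope would suggest a leak. The sample values are invented placeholders at 5-minute spacing, not data from the graph above, and `slope_per_hour` is a hypothetical helper.

```python
def slope_per_hour(samples):
    """Least-squares slope of value against time, in value units per hour.

    `samples` is a list of (seconds, value) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den * 3600.0


# Placeholder series: one hour of samples every 300 seconds (percent free).
flat = [(i * 300, 60.0) for i in range(12)]
leaking = [(i * 300, 60.0 - 0.5 * i) for i in range(12)]

print("flat:", slope_per_hour(flat))
print("leaking:", slope_per_hour(leaking))
```

A flat series like the one described in this post gives a slope near zero, while the artificial "leaking" series loses 0.5 points per sample, which is 6 points per hour.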

                      • stephenw10S
                        stephenw10 Netgate Administrator

                        Mmm, I agree, it doesn't look like it's exhausting the memory directly.

                        • H
                          homer2320776 @stephenw10

                          @stephenw10 I believe I have narrowed the issue down to the tailscale package. I noticed when I came back from vacation that the firewall had been up for over 8 days without a crash.

                          Checking the logs showed that either PHP or PHP-CGI was exiting on signal 11 with a core dump, and the services section showed that tailscale wasn't running either.

                          On a hunch I started the tailscale service yesterday morning to see if a crash would happen. Sure enough, last night it crashed again.

                          Attached is the latest dump. textdump.tar.0

                          • stephenw10S
                            stephenw10 Netgate Administrator

                            So you had disabled tailscale while you were away? Or it had stopped by itself and then crashed after you restarted it?

                            Steve

                            • H
                              homer2320776 @stephenw10

                              @stephenw10 tailscale had apparently crashed, but the connections it had made were still running, so I didn't notice the service itself was down.

                              I restarted the service yesterday morning to see if it was the cause of the crashes, then this morning when I logged in, I saw the crash report.

                              • stephenw10S
                                stephenw10 Netgate Administrator

                                Mmm, not familiar to me. Let me see if anyone else has seen it...

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.