Netgate Discussion Forum

    SG-4860 crashing daily

    General pfSense Questions
      stephenw10 Netgate Administrator

      One of those appears to be the same file you posted earlier. The other one is different though:

      db:0:kdb.enter.default>  bt
      Tracing pid 89157 tid 100203 td 0xfffff8017e1a2000
      kdb_enter() at kdb_enter+0x37/frame 0xfffffe004cdc4ab0
      vpanic() at vpanic+0x194/frame 0xfffffe004cdc4b00
      panic() at panic+0x43/frame 0xfffffe004cdc4b60
      trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004cdc4bc0
      trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004cdc4c20
      trap() at trap+0x425/frame 0xfffffe004cdc4d30
      calltrap() at calltrap+0x8/frame 0xfffffe004cdc4d30
      --- trap 0xc, rip = 0x8004ed10c, rsp = 0x7fffdfffdd60, rbp = 0x7fffdfffddc0 ---
      
      Fatal trap 12: page fault while in user mode
      cpuid = 2; apic id = 04
      fault virtual address	= 0x800a008c8
      fault code		= user read data, reserved bits in PTE
      instruction pointer	= 0x43:0x8004ed10c
      stack pointer	        = 0x3b:0x7fffdfffdd60
      frame pointer	        = 0x3b:0x7fffdfffddc0
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 3, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 89157 (charon)
      trap number		= 12
      panic: page fault
      cpuid = 2
      time = 1659406768
      KDB: enter: panic
      

      Very different crash reports like that start to look like a hardware issue.
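
When several reports need comparing, it can help to pull the same few fields out of each one. The following is a minimal sketch, assuming the report is plain text in the `name = value` layout shown above; the function name `parse_panic_report` and the `SAMPLE_REPORT` excerpt are illustrative only and not part of any pfSense tooling.

```python
import re
from datetime import datetime, timezone

# Excerpt of the panic report quoted above (whitespace normalized).
SAMPLE_REPORT = """\
Fatal trap 12: page fault while in user mode
cpuid = 2; apic id = 04
fault virtual address   = 0x800a008c8
fault code              = user read data, reserved bits in PTE
current process         = 89157 (charon)
trap number             = 12
panic: page fault
cpuid = 2
time = 1659406768
"""

def parse_panic_report(text):
    """Extract a few key fields from a 'Fatal trap' style report."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first "=" only; values may contain "=" themselves.
        name, value = line.split("=", 1)
        name = name.strip()
        if name:  # skip continuation lines that start with "="
            fields[name] = value.strip()

    summary = {
        "trap_number": fields.get("trap number"),
        "fault_address": fields.get("fault virtual address"),
    }
    # "current process" looks like "89157 (charon)".
    match = re.match(r"(\d+) \((.+)\)", fields.get("current process", ""))
    if match:
        summary["pid"] = int(match.group(1))
        summary["process"] = match.group(2)
    # "time" is seconds since the Unix epoch; render it as UTC.
    if fields.get("time", "").isdigit():
        stamp = datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)
        summary["time_utc"] = stamp.isoformat()
    return summary

if __name__ == "__main__":
    print(parse_panic_report(SAMPLE_REPORT))
```

For the report above, this identifies `charon` (pid 89157) as the current process and converts the `time` field to 2022-08-02 02:19:28 UTC.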

      You think this started happening after installing Wireguard?

      Or after upgrading to 22.05 maybe?

      Steve

        homer2320776 @stephenw10

        @stephenw10 I found some more crash logs that I had sent to myself over Telegram. Hopefully these might shed some light.

        textdump0.tar
        textdump1.tar
        textdump2.tar

          stephenw10 Netgate Administrator

          Mmm, those are all different. That is looking more like a memory fault unfortunately.
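
One way to check quickly whether a set of dumps share a panic message is to read the panic text out of each archive and group the files by it. This is a rough sketch only, assuming each archive follows the usual FreeBSD textdump layout with a `panic.txt` member; the helper names `read_panic_text` and `group_by_panic` are made up for illustration.

```python
import os
import tarfile

def read_panic_text(tar_path):
    """Return the contents of panic.txt from one textdump archive, or None."""
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # Member names may or may not carry a leading "./".
            if os.path.basename(member.name) == "panic.txt":
                handle = tar.extractfile(member)
                if handle is not None:
                    return handle.read().decode("utf-8", errors="replace").strip()
    return None

def group_by_panic(tar_paths):
    """Map each distinct panic message to the archives that contain it."""
    groups = {}
    for path in tar_paths:
        groups.setdefault(read_panic_text(path), []).append(path)
    return groups
```

Calling `group_by_panic(["textdump0.tar", "textdump1.tar", "textdump2.tar"])` would return one key per distinct panic message. Several unrelated keys across a handful of dumps fit the "all different" pattern described here.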

          Are you able to try a clean install of 22.05?

          Steve

            homer2320776 @stephenw10

            @stephenw10 This is currently the production firewall for this location. I purchased an XG-1537 last year and a stack of new switches to install, but haven't scheduled a time to replace it all.

            I'll try to reload the 4860 after everything stabilizes.

            Last night's crash dump:
            textdump.tar.0

              stephenw10 Netgate Administrator

              Mmm, another similar crash but different panic. Again it doesn't point to any specific thing and looks increasingly like a hardware issue unfortunately.

              Steve

                homer2320776 @stephenw10

                @stephenw10 The device hadn't crashed in a few days, but this morning it has a PHP crash log as well.

                [12-Aug-2022 00:42:00 UTC] PHP Warning:  Static function mbereg_search() cannot be abstract in Unknown on line 0
                

                textdump.tar.0

                  stephenw10 Netgate Administrator

                  Hmm, that looks different, more like it just ran out of memory.

                  That also ties in with this:
                  <6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11

                  If you check the monitoring graphs in Status > Monitoring do you see memory usage increasing with time?
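
To see which processes are dying and how often, lines like the unbound one above can be tallied from the system log or the message buffer in a dump. A minimal sketch, assuming the log lines follow the same `pid N (name), jid N, uid N: exited on signal N` pattern as that line; `count_signal_exits` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "pid 71216 (unbound), jid 0, uid 59: exited on signal 11"
EXIT_RE = re.compile(
    r"pid (\d+) \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)

def count_signal_exits(log_lines):
    """Count (process name, signal number) pairs found in the log lines."""
    counts = Counter()
    for line in log_lines:
        match = EXIT_RE.search(line)
        if match:
            counts[(match.group(2), int(match.group(3)))] += 1
    return counts

if __name__ == "__main__":
    sample = ["<6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11"]
    for (name, signal), total in count_signal_exits(sample).items():
        print(f"{name}: signal {signal} x{total}")
```

If many unrelated processes show up with signal 11, that fits a memory or hardware problem better than a bug in one package.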

                    homer2320776 @stephenw10

                    @stephenw10 I checked the memory graph for a 2 day period with 5 min resolution and didn't see the free memory decrease except during the crashes.
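
Reading a trend off a graph by eye can be backed up with a simple slope estimate over the same samples. This is only a sketch: it assumes the free-memory values have already been exported as a plain list at a fixed interval (5 minutes here, matching the graph resolution), and `free_memory_slope` is an illustrative name, not a pfSense function.

```python
def free_memory_slope(samples, interval_minutes=5):
    """Least-squares slope of the samples, in sample units per hour.

    A clearly negative value over a long window would suggest a leak;
    a value near zero matches a flat graph.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Time of each sample in hours from the start of the window.
    hours = [i * interval_minutes / 60 for i in range(n)]
    mean_x = sum(hours) / n
    mean_y = sum(samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, samples))
    denominator = sum((x - mean_x) ** 2 for x in hours)
    return numerator / denominator
```

A two-day window at 5-minute resolution is 576 samples. A slope close to zero over that span would agree with what the graph shows.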

                    bf256829-12a8-47da-ab13-85b2767805f6-image.png

                    I'll keep a watch for anything new.

                      stephenw10 Netgate Administrator

                      Mmm, I agree, it doesn't look like it's exhausting the memory directly.

                        homer2320776 @stephenw10

                        @stephenw10 I believe I have narrowed the issue down to the tailscale package. When I came back from vacation, I noticed that the firewall had been up for over 8 days without a crash.

                        Checking the logs showed that either PHP or PHP-CGI was exiting on signal 11 with a core dump, and the services section showed that tailscale wasn't running either.

                        On a hunch I started the tailscale service yesterday morning to see if a crash would happen. Sure enough, last night it crashed again.

                        Attached is the latest dump. textdump.tar.0

                          stephenw10 Netgate Administrator

                          So you had disabled tailscale while you were away? Or it had stopped by itself and then crashed after you restarted it?

                          Steve

                            homer2320776 @stephenw10

                            @stephenw10 tailscale had apparently crashed, but the connections it made were still running, so I didn't notice the service itself was down.

                            I restarted the service yesterday morning to see if it was the cause of the crashes, then this morning when I logged in, I saw the crash report.

                              stephenw10 Netgate Administrator

                              Mmm, not familiar to me. Let me see if anyone else has seen it...

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.