Netgate Discussion Forum

    SG-4860 crashing daily

    General pfSense Questions
    16 Posts 2 Posters 1.6k Views
    • stephenw10 Netgate Administrator

      One of those appears to be the same file you posted earlier. The other one is different though:

      db:0:kdb.enter.default>  bt
      Tracing pid 89157 tid 100203 td 0xfffff8017e1a2000
      kdb_enter() at kdb_enter+0x37/frame 0xfffffe004cdc4ab0
      vpanic() at vpanic+0x194/frame 0xfffffe004cdc4b00
      panic() at panic+0x43/frame 0xfffffe004cdc4b60
      trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004cdc4bc0
      trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004cdc4c20
      trap() at trap+0x425/frame 0xfffffe004cdc4d30
      calltrap() at calltrap+0x8/frame 0xfffffe004cdc4d30
      --- trap 0xc, rip = 0x8004ed10c, rsp = 0x7fffdfffdd60, rbp = 0x7fffdfffddc0 ---
      
      Fatal trap 12: page fault while in user mode
      cpuid = 2; apic id = 04
      fault virtual address	= 0x800a008c8
      fault code		= user read data, reserved bits in PTE
      instruction pointer	= 0x43:0x8004ed10c
      stack pointer	        = 0x3b:0x7fffdfffdd60
      frame pointer	        = 0x3b:0x7fffdfffddc0
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 3, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 89157 (charon)
      trap number		= 12
      panic: page fault
      cpuid = 2
      time = 1659406768
      KDB: enter: panic
      

      Very different crash reports like that start to look like a hardware issue.
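To compare reports across several textdumps, it can help to reduce each one to a few identifying fields and check whether they repeat. The following is a minimal Python sketch, assuming the panic text has already been extracted from the textdump archive into a string. The helper name `panic_signature` and its field names are illustrative and not part of any pfSense or FreeBSD tool; the patterns are based only on the report format shown in this thread.

```python
import re
from datetime import datetime, timezone

# Patterns for fields printed in a "Fatal trap" report (illustrative helper,
# derived only from the report format quoted in this thread).
PATTERNS = {
    "trap": re.compile(r"^\s*Fatal trap (.+?)\s*$", re.M),
    "fault_code": re.compile(r"^\s*fault code\s*=\s*(.+?)\s*$", re.M),
    "process": re.compile(r"^\s*current process\s*=\s*\d+ \((.+?)\)\s*$", re.M),
    "panic": re.compile(r"^\s*panic: (.+?)\s*$", re.M),
    "time": re.compile(r"^\s*time = (\d+)\s*$", re.M),
}

# Lines copied from the report above, used as sample input.
SAMPLE = """\
Fatal trap 12: page fault while in user mode
fault code\t\t= user read data, reserved bits in PTE
current process\t\t= 89157 (charon)
panic: page fault
time = 1659406768
"""


def panic_signature(text):
    """Return the identifying fields found in one panic report as a dict."""
    sig = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            sig[name] = match.group(1)
    if "time" in sig:
        # The report prints the time as seconds since the Unix epoch.
        stamp = datetime.fromtimestamp(int(sig["time"]), tz=timezone.utc)
        sig["time_utc"] = stamp.isoformat()
    return sig
```

Calling `panic_signature(SAMPLE)` yields the trap description, fault code, process name (`charon` here), panic string, and the crash time in UTC. If the process and fault type differ from dump to dump, that fits the hardware hypothesis better than a single software bug would, though it does not prove it.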

      You think this started happening after installing WireGuard?

      Or after upgrading to 22.05 maybe?

      Steve

      • homer2320776 @stephenw10

        @stephenw10 I found some more crash logs that I had sent to myself over Telegram. Hopefully these might shed some light.

        textdump0.tar
        textdump1.tar
        textdump2.tar

        • stephenw10 Netgate Administrator

          Mmm, those are all different. That is looking more like a memory fault unfortunately.

          Are you able to try a clean install of 22.05?

          Steve

          • homer2320776 @stephenw10

            @stephenw10 This is currently the production firewall for this location. I purchased an XG-1537 last year and a stack of new switches to install but haven't scheduled a time to replace it all.

            I'll try to reload the 4860 after everything stabilizes.

            Last night's crash dump:
            textdump.tar.0

            • stephenw10 Netgate Administrator

              Mmm, another similar crash but different panic. Again it doesn't point to any specific thing and looks increasingly like a hardware issue unfortunately.

              Steve

              • homer2320776 @stephenw10

                @stephenw10 The device hadn't crashed in a few days, but this morning it has a PHP crash log as well.

                [12-Aug-2022 00:42:00 UTC] PHP Warning:  Static function mbereg_search() cannot be abstract in Unknown on line 0
                

                textdump.tar.0

                • stephenw10 Netgate Administrator

                  Hmm, that looks different, more like it just ran out of memory.

                  That also ties in with this:
                  <6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11

                  If you check the monitoring graphs in Status > Monitoring do you see memory usage increasing with time?

                  • homer2320776 @stephenw10

                    @stephenw10 I checked the memory graph for a 2 day period with 5 min resolution and didn't see the free memory decrease except during the crashes.

                    bf256829-12a8-47da-ab13-85b2767805f6-image.png

                    I'll keep a watch for anything new.

                    • stephenw10 Netgate Administrator

                      Mmm, I agree, it doesn't look like it's exhausting the memory directly.

                      • homer2320776 @stephenw10

                        @stephenw10 I believe I have narrowed the issue down to the tailscale package. I noticed when I came back from vacation that the firewall had been up over 8 days w/o a crash.

                        Checking the logs showed that either PHP or PHP-CGI was exiting on signal 11 with a core dump, and the services section showed that tailscale wasn't running either.
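A quick way to see which processes are dying on a signal is to tally the kernel's "exited on signal" messages from the system log. This is a minimal Python sketch, assuming the log lines are already available as strings; the helper name `count_signal_exits` is illustrative. The first sample line is the one quoted earlier in this thread, while the second (pid and uid included) is a made-up example of the same format for a PHP process.

```python
import re
from collections import Counter

# Matches kernel messages such as:
#   pid 71216 (unbound), jid 0, uid 59: exited on signal 11
EXIT_RE = re.compile(
    r"pid (\d+) \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)

SAMPLE_LINES = [
    # Quoted earlier in this thread.
    "<6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11",
    # Hypothetical line in the same format, for illustration only.
    "<6>pid 12345 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)",
]


def count_signal_exits(lines):
    """Count signal exits, keyed by (process name, signal number)."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[(match.group(2), int(match.group(3)))] += 1
    return counts
```

Running `count_signal_exits` over the log from before and after the tailscale service was restarted would show whether the signal 11 exits cluster around the periods when it was running.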

                        On a hunch I started the tailscale service yesterday morning to see if a crash would happen. Sure enough, last night it crashed again.

                        Attached is the latest dump. textdump.tar.0

                        • stephenw10 Netgate Administrator

                          So you had disabled tailscale while you were away? Or it had stopped by itself and then crashed after you restarted it?

                          Steve

                          • homer2320776 @stephenw10

                            @stephenw10 tailscale had crashed apparently, but the connections it made were still running, so I didn't notice the service itself was down.

                            I restarted the service yesterday morning to see if it was the cause of the crashes, then this morning when I logged in, I saw the crash report.

                            • stephenw10 Netgate Administrator

                              Mmm, not familiar to me. Let me see if anyone else has seen it....

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.