Netgate Discussion Forum

    SG-4860 crashing daily

    General pfSense Questions

      stephenw10 Netgate Administrator

      Important parts are:

      db:0:kdb.enter.default>  bt
      Tracing pid 35301 tid 100294 td 0xfffff8017297e740
      kdb_enter() at kdb_enter+0x37/frame 0xfffffe004d3c97d0
      vpanic() at vpanic+0x194/frame 0xfffffe004d3c9820
      panic() at panic+0x43/frame 0xfffffe004d3c9880
      trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004d3c98e0
      calltrap() at calltrap+0x8/frame 0xfffffe004d3c98e0
      --- trap 0x9, rip = 0xffffffff8120594b, rsp = 0xfffffe004d3c99b0, rbp = 0xfffffe004d3c99c0 ---
      vm_radix_remove() at vm_radix_remove+0x1b/frame 0xfffffe004d3c99c0
      vm_page_free_prep() at vm_page_free_prep+0x55/frame 0xfffffe004d3c99e0
      vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfffffe004d3c9a10
      vm_object_page_remove() at vm_object_page_remove+0x61/frame 0xfffffe004d3c9a70
      vm_map_entry_delete() at vm_map_entry_delete+0xff/frame 0xfffffe004d3c9ac0
      vm_map_delete() at vm_map_delete+0x184/frame 0xfffffe004d3c9b20
      vm_map_remove() at vm_map_remove+0xab/frame 0xfffffe004d3c9b50
      vmspace_exit() at vmspace_exit+0xcb/frame 0xfffffe004d3c9b90
      exit1() at exit1+0x51c/frame 0xfffffe004d3c9bf0
      sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe004d3c9c00
      amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe004d3c9d30
      fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe004d3c9d30
      --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8003eb74a, rsp = 0x7fffffffeb38, rbp = 0x7fffffffeb50 ---
      
      Fatal trap 9: general protection fault while in kernel mode
      cpuid = 3; apic id = 06
      instruction pointer	= 0x20:0xffffffff8120594b
      stack pointer	        = 0x28:0xfffffe004d3c99b0
      frame pointer	        = 0x28:0xfffffe004d3c99c0
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 35301 (egrep)
      trap number		= 9
      panic: general protection fault
      cpuid = 3
      time = 1659503270
      KDB: enter: panic
      

      Neither of these points to anything specific, unfortunately.

      Are the crash reports always identical? Or close to it?

      Steve

        homer2320776 @stephenw10

        @stephenw10 Thanks for the quick reply. I'm attaching two older reports and will collect future ones.

        textdump.tar
        textdump1.tar

          stephenw10 Netgate Administrator

          One of those appears to be the same file you posted earlier. The other one is different though:

          db:0:kdb.enter.default>  bt
          Tracing pid 89157 tid 100203 td 0xfffff8017e1a2000
          kdb_enter() at kdb_enter+0x37/frame 0xfffffe004cdc4ab0
          vpanic() at vpanic+0x194/frame 0xfffffe004cdc4b00
          panic() at panic+0x43/frame 0xfffffe004cdc4b60
          trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004cdc4bc0
          trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004cdc4c20
          trap() at trap+0x425/frame 0xfffffe004cdc4d30
          calltrap() at calltrap+0x8/frame 0xfffffe004cdc4d30
          --- trap 0xc, rip = 0x8004ed10c, rsp = 0x7fffdfffdd60, rbp = 0x7fffdfffddc0 ---
          
          Fatal trap 12: page fault while in user mode
          cpuid = 2; apic id = 04
          fault virtual address	= 0x800a008c8
          fault code		= user read data, reserved bits in PTE
          instruction pointer	= 0x43:0x8004ed10c
          stack pointer	        = 0x3b:0x7fffdfffdd60
          frame pointer	        = 0x3b:0x7fffdfffddc0
          code segment		= base 0x0, limit 0xfffff, type 0x1b
          			= DPL 3, pres 1, long 1, def32 0, gran 1
          processor eflags	= interrupt enabled, resume, IOPL = 0
          current process		= 89157 (charon)
          trap number		= 12
          panic: page fault
          cpuid = 2
          time = 1659406768
          KDB: enter: panic
          

          Crash reports that differ this much start to look like a hardware issue.
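To make "are the crash reports identical?" a quick check across many dumps, the deciding fields can be pulled out mechanically: the `panic:` message, the `trap number`, and the `current process`. Below is a minimal sketch, assuming each `textdump.tar` contains a `ddb.txt` member holding ddb output like the excerpts above. The member name and the helper names (`crash_signature`, `signature_from_textdump`) are assumptions for illustration, not part of pfSense.

```python
import re
import tarfile
from typing import NamedTuple, Optional


class CrashSignature(NamedTuple):
    """Fields that summarize one kernel panic."""
    panic: Optional[str]
    trap: Optional[str]
    process: Optional[str]


def crash_signature(report: str) -> CrashSignature:
    """Extract panic message, trap number and faulting process from ddb text."""
    def first(pattern: str) -> Optional[str]:
        match = re.search(pattern, report, re.MULTILINE)
        return match.group(1).strip() if match else None

    return CrashSignature(
        # "panic: general protection fault" (not the "panic() at ..." frame line)
        panic=first(r"^\s*panic:\s*(.+)$"),
        # "trap number\t\t= 9"
        trap=first(r"^\s*trap number\s*=\s*(\d+)"),
        # "current process\t\t= 35301 (egrep)" -> "egrep"
        process=first(r"^\s*current process\s*=\s*\d+\s*\(([^)]+)\)"),
    )


def signature_from_textdump(path: str) -> CrashSignature:
    """Read ddb.txt from a textdump tarball (member name is an assumption)."""
    with tarfile.open(path) as tar:
        for member in tar.getmembers():
            if member.name.endswith("ddb.txt"):
                data = tar.extractfile(member)
                if data is not None:
                    return crash_signature(data.read().decode("utf-8", "replace"))
    raise FileNotFoundError(f"no ddb.txt found in {path}")
```

Running `signature_from_textdump()` over each collected tarball gives one line per crash. Signatures that keep changing (for example a general protection fault in `egrep` versus a page fault in `charon`, as in the two reports above) are consistent with, though not proof of, a hardware or memory fault. A repeating signature would instead point at one specific code path.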

          Do you think this started happening after installing WireGuard?

          Or after upgrading to 22.05 maybe?

          Steve

            homer2320776 @stephenw10

            @stephenw10 I found some more crash logs that I had sent to myself over Telegram. Hopefully these might shed some light.

            textdump0.tar
            textdump1.tar
            textdump2.tar

              stephenw10 Netgate Administrator

              Mmm, those are all different. That looks more like a memory fault, unfortunately.

              Are you able to try a clean install of 22.05?

              Steve

                homer2320776 @stephenw10

                @stephenw10 This is currently the production firewall for this location. I purchased an XG-1537 last year, along with a stack of new switches, but haven't scheduled a time to replace it all.

                I'll try to reload the 4860 after everything stabilizes.

                Last night's crash dump:
                textdump.tar.0

                  stephenw10 Netgate Administrator

                  Mmm, another similar crash, but a different panic. Again it doesn't point to anything specific and looks increasingly like a hardware issue, unfortunately.

                  Steve

                    homer2320776 @stephenw10

                    @stephenw10 The device hadn't crashed in a few days, but this morning there is a PHP crash log as well.

                    [12-Aug-2022 00:42:00 UTC] PHP Warning:  Static function mbereg_search() cannot be abstract in Unknown on line 0
                    

                    textdump.tar.0

                      stephenw10 Netgate Administrator

                      Hmm, that looks different, more like it just ran out of memory.

                      That also ties in with this:
                      <6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11

                      If you check the monitoring graphs in Status > Monitoring do you see memory usage increasing with time?
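Judging "increasing with time" by eye on a graph can be ambiguous; fitting a straight line to free-memory samples gives a single number to look at. Below is a minimal sketch, assuming you already have the samples as `(unix_timestamp, free_bytes)` pairs, however they were obtained. The function name is hypothetical and nothing here is a pfSense API.

```python
from typing import Sequence, Tuple


def free_memory_slope(samples: Sequence[Tuple[float, float]]) -> float:
    """Ordinary least-squares slope of free memory over time, in bytes/second.

    A clearly negative slope sustained over days suggests something is
    slowly consuming memory; a slope near zero suggests it is not.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    # Covariance of (time, free memory) divided by the variance of time
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var
```

Samples taken around a crash and reboot should be excluded, since the drop and recovery there would otherwise dominate the fit.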

                        homer2320776 @stephenw10

                        @stephenw10 I checked the memory graph over a 2-day period at 5-minute resolution and didn't see free memory decrease except during the crashes.

                        [Screenshot: memory graph from Status > Monitoring, 2-day period]

                        I'll keep a watch for anything new.

                          stephenw10 Netgate Administrator

                          Mmm, I agree, it doesn't look like it's exhausting the memory directly.

                            homer2320776 @stephenw10

                            @stephenw10 I believe I have narrowed the issue down to the tailscale package. When I came back from vacation, I noticed that the firewall had been up for over 8 days without a crash.

                            Checking the logs showed that either PHP or PHP-CGI was exiting on signal 11 with a core dump, and the services section showed that tailscale wasn't running either.

                            On a hunch I started the tailscale service yesterday morning to see if a crash would happen. Sure enough, last night it crashed again.

                            Attached is the latest dump. textdump.tar.0

                              stephenw10 Netgate Administrator

                              So you had disabled tailscale while you were away? Or had it stopped by itself, and then the firewall crashed after you restarted it?

                              Steve

                                homer2320776 @stephenw10

                                @stephenw10 Apparently tailscale had crashed, but the connections it made were still running, so I didn't notice the service itself was down.

                                I restarted the service yesterday morning to see if it was the cause of the crashes, then this morning when I logged in, I saw the crash report.

                                  stephenw10 Netgate Administrator

                                  Mmm, that's not familiar to me. Let me see if anyone else has seen it...

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.