Netgate Discussion Forum

    SG-4860 crashing daily

      stephenw10 Netgate Administrator

      Important parts are:

      db:0:kdb.enter.default>  bt
      Tracing pid 35301 tid 100294 td 0xfffff8017297e740
      kdb_enter() at kdb_enter+0x37/frame 0xfffffe004d3c97d0
      vpanic() at vpanic+0x194/frame 0xfffffe004d3c9820
      panic() at panic+0x43/frame 0xfffffe004d3c9880
      trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004d3c98e0
      calltrap() at calltrap+0x8/frame 0xfffffe004d3c98e0
      --- trap 0x9, rip = 0xffffffff8120594b, rsp = 0xfffffe004d3c99b0, rbp = 0xfffffe004d3c99c0 ---
      vm_radix_remove() at vm_radix_remove+0x1b/frame 0xfffffe004d3c99c0
      vm_page_free_prep() at vm_page_free_prep+0x55/frame 0xfffffe004d3c99e0
      vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfffffe004d3c9a10
      vm_object_page_remove() at vm_object_page_remove+0x61/frame 0xfffffe004d3c9a70
      vm_map_entry_delete() at vm_map_entry_delete+0xff/frame 0xfffffe004d3c9ac0
      vm_map_delete() at vm_map_delete+0x184/frame 0xfffffe004d3c9b20
      vm_map_remove() at vm_map_remove+0xab/frame 0xfffffe004d3c9b50
      vmspace_exit() at vmspace_exit+0xcb/frame 0xfffffe004d3c9b90
      exit1() at exit1+0x51c/frame 0xfffffe004d3c9bf0
      sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe004d3c9c00
      amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe004d3c9d30
      fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe004d3c9d30
      --- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8003eb74a, rsp = 0x7fffffffeb38, rbp = 0x7fffffffeb50 ---
      
      Fatal trap 9: general protection fault while in kernel mode
      cpuid = 3; apic id = 06
      instruction pointer	= 0x20:0xffffffff8120594b
      stack pointer	        = 0x28:0xfffffe004d3c99b0
      frame pointer	        = 0x28:0xfffffe004d3c99c0
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 35301 (egrep)
      trap number		= 9
      panic: general protection fault
      cpuid = 3
      time = 1659503270
      KDB: enter: panic
      

      Neither of which points to anything specific, unfortunately.

      Are the crash reports always identical? Or close to it?

      Steve

        homer2320776 @stephenw10

        @stephenw10 Thanks for the quick reply. I'm attaching 2 older reports and will collect future ones.

        textdump.tar
        textdump1.tar

          stephenw10 Netgate Administrator

          One of those appears to be the same file you posted earlier. The other one is different though:

          db:0:kdb.enter.default>  bt
          Tracing pid 89157 tid 100203 td 0xfffff8017e1a2000
          kdb_enter() at kdb_enter+0x37/frame 0xfffffe004cdc4ab0
          vpanic() at vpanic+0x194/frame 0xfffffe004cdc4b00
          panic() at panic+0x43/frame 0xfffffe004cdc4b60
          trap_fatal() at trap_fatal+0x38f/frame 0xfffffe004cdc4bc0
          trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004cdc4c20
          trap() at trap+0x425/frame 0xfffffe004cdc4d30
          calltrap() at calltrap+0x8/frame 0xfffffe004cdc4d30
          --- trap 0xc, rip = 0x8004ed10c, rsp = 0x7fffdfffdd60, rbp = 0x7fffdfffddc0 ---
          
          Fatal trap 12: page fault while in user mode
          cpuid = 2; apic id = 04
          fault virtual address	= 0x800a008c8
          fault code		= user read data, reserved bits in PTE
          instruction pointer	= 0x43:0x8004ed10c
          stack pointer	        = 0x3b:0x7fffdfffdd60
          frame pointer	        = 0x3b:0x7fffdfffddc0
          code segment		= base 0x0, limit 0xfffff, type 0x1b
          			= DPL 3, pres 1, long 1, def32 0, gran 1
          processor eflags	= interrupt enabled, resume, IOPL = 0
          current process		= 89157 (charon)
          trap number		= 12
          panic: page fault
          cpuid = 2
          time = 1659406768
          KDB: enter: panic
          

          Very different crash reports like that start to look like a hardware issue.
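
The comparison between the two reports above can be sketched in a few lines of Python. This is only an illustration, not a Netgate tool: the regular expressions assume the field layout seen in the ddb output quoted in this thread, and the names `summarize`, `signature`, `FIRST` and `SECOND` are hypothetical. The sample strings are trimmed copies of the two panics posted here.

```python
import re
from datetime import datetime, timezone

# Patterns for fields as they appear in the ddb output quoted in this thread.
TRAP_RE = re.compile(r"Fatal trap (\d+): (.+?) while in (\w+) mode")
PROC_RE = re.compile(r"current process\s*=\s*(\d+) \(([^)]+)\)")
PANIC_RE = re.compile(r"^\s*panic: (.+)$", re.MULTILINE)
TIME_RE = re.compile(r"^\s*time = (\d+)$", re.MULTILINE)


def summarize(ddb_text):
    """Extract the key fields of one panic; fields not found are None."""
    trap = TRAP_RE.search(ddb_text)
    proc = PROC_RE.search(ddb_text)
    panic = PANIC_RE.search(ddb_text)
    when = TIME_RE.search(ddb_text)
    utc = None
    if when:
        # The "time =" value is seconds since the Unix epoch.
        stamp = datetime.fromtimestamp(int(when.group(1)), tz=timezone.utc)
        utc = stamp.strftime("%Y-%m-%d %H:%M:%S")
    return {
        "trap": int(trap.group(1)) if trap else None,
        "mode": trap.group(3) if trap else None,
        "process": proc.group(2) if proc else None,
        "panic": panic.group(1).strip() if panic else None,
        "utc": utc,
    }


def signature(summary):
    """Fields that should match if two panics share the same cause."""
    return (summary["trap"], summary["mode"], summary["panic"])


# Trimmed excerpts of the two reports quoted in this thread.
FIRST = """
Fatal trap 9: general protection fault while in kernel mode
current process = 35301 (egrep)
panic: general protection fault
time = 1659503270
"""

SECOND = """
Fatal trap 12: page fault while in user mode
current process = 89157 (charon)
panic: page fault
time = 1659406768
"""

if __name__ == "__main__":
    for report in (FIRST, SECOND):
        print(summarize(report))
    same = signature(summarize(FIRST)) == signature(summarize(SECOND))
    print("same signature:", same)
```

Run over every `ddb.txt` extracted from the textdump archives, a set of signatures that never repeats would be consistent with the hardware suspicion, while a single repeating signature would point more toward a software bug. Neither outcome is proof on its own.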

          You think this started happening after installing Wireguard?

          Or after upgrading to 22.05 maybe?

          Steve

            homer2320776 @stephenw10

            @stephenw10 I found some more crash logs that I had sent to myself over Telegram. Hopefully these might shed some light.

            textdump0.tar
            textdump1.tar
            textdump2.tar

              stephenw10 Netgate Administrator

              Mmm, those are all different. That is looking more like a memory fault unfortunately.

              Are you able to try a clean install of 22.05?

              Steve

                homer2320776 @stephenw10

                @stephenw10 This is currently the production firewall for this location. I purchased an XG-1537 last year, along with a stack of new switches to install, but haven't scheduled a time to replace it all.

                I'll try to reload the 4860 after everything stabilizes.

                Last night's crash dump.
                textdump.tar.0

                  stephenw10 Netgate Administrator

                  Mmm, another similar crash but different panic. Again it doesn't point to any specific thing and looks increasingly like a hardware issue unfortunately.

                  Steve

                    homer2320776 @stephenw10

                    @stephenw10 The device hadn't crashed in a few days, but this morning it has a PHP crash log as well.

                    [12-Aug-2022 00:42:00 UTC] PHP Warning:  Static function mbereg_search() cannot be abstract in Unknown on line 0
                    

                    textdump.tar.0

                      stephenw10 Netgate Administrator

                      Hmm, that looks different, more like it just ran out of memory.

                      That also ties in with this:
                      <6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11

                      If you check the monitoring graphs in Status > Monitoring do you see memory usage increasing with time?
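
To see which processes are dying this way and how often, the "exited on signal" lines can be tallied from the system log. The following is a minimal sketch: the pattern is modelled on the single kernel message quoted above, the optional `(core dumped)` suffix is an assumption about how core-dumping exits are logged, and `tally_signal_exits` is a hypothetical helper name.

```python
import re
from collections import Counter

# Modelled on the kernel message quoted in this thread; the optional
# "(core dumped)" suffix is an assumption, not taken from the posted logs.
SIGNAL_RE = re.compile(
    r"pid (\d+) \(([^)]+)\), jid (\d+), uid (\d+): "
    r"exited on signal (\d+)( \(core dumped\))?"
)


def tally_signal_exits(log_lines):
    """Count (process name, signal number) pairs across the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = SIGNAL_RE.search(line)
        if match:
            counts[(match.group(2), int(match.group(5)))] += 1
    return counts


if __name__ == "__main__":
    # The one line quoted in this thread; a real run would read the log file.
    sample = ["<6>pid 71216 (unbound), jid 0, uid 59: exited on signal 11"]
    for (name, signo), count in tally_signal_exits(sample).items():
        print(f"{name}: signal {signo} x{count}")
```

Signal 11 exits spread across many unrelated processes would fit the memory-fault theory, whereas exits confined to one program would suggest looking at that program or package first.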

                        homer2320776 @stephenw10

                        @stephenw10 I checked the memory graph for a 2 day period with 5 min resolution and didn't see the free memory decrease except during the crashes.

                        bf256829-12a8-47da-ab13-85b2767805f6-image.png

                        I'll keep a watch for anything new.

                          stephenw10 Netgate Administrator

                          Mmm, I agree, it doesn't look like it's exhausting the memory directly.

                            homer2320776 @stephenw10

                            @stephenw10 I believe I have narrowed the issue down to the tailscale package. I noticed when I came back from vacation that the firewall had been up over 8 days w/o a crash.

                            Checking the logs showed that either PHP or PHP-CGI was exiting on signal 11 with a core dump, and the services section showed that tailscale wasn't running either.

                            On a hunch I started the tailscale service yesterday morning to see if a crash would happen. Sure enough, last night it crashed again.

                            Attached is the latest dump. textdump.tar.0

                              stephenw10 Netgate Administrator

                              So you had disabled tailscale while you were away? Or it had stopped by itself and then crashed after you restarted it?

                              Steve

                                homer2320776 @stephenw10

                                @stephenw10 tailscale had crashed apparently, but the connections it made were still running, so I didn't notice the service itself was down.

                                I restarted the service yesterday morning to see if it was the cause of the crashes, then this morning when I logged in, I saw the crash report.

                                  stephenw10 Netgate Administrator

                                  Mmm, not familiar to me. Let me see if anyone else has seen it...

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.