Netgate Discussion Forum

    SG-1100: unexpected reboots and vm_fault in logs - how to diagnose?

    • stephenw10S
      stephenw10 Netgate Administrator
      last edited by stephenw10

      Ok well that's definitely a kernel panic.

      What you want is the start of the panic output: the initial panic string and the backtrace (the output of the bt command).

      So as an example:

      0:kdb.enter.default>  run pfs
      db:1:pfs> bt
      Tracing pid 0 tid 100007 td 0xfffffe00119bd720
      kdb_enter() at kdb_enter+0x32/frame 0xfffffe00101a86c0
      vpanic() at vpanic+0x182/frame 0xfffffe00101a8710
      panic() at panic+0x43/frame 0xfffffe00101a8770
      trap_fatal() at trap_fatal+0x409/frame 0xfffffe00101a87d0
      trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00101a8830
      calltrap() at calltrap+0x8/frame 0xfffffe00101a8830
      --- trap 0xc, rip = 0xffffffff80f66369, rsp = 0xfffffe00101a8900, rbp = 0xfffffe00101a8930 ---
      pppoe_findsession() at pppoe_findsession+0x79/frame 0xfffffe00101a8930
      ng_pppoe_rcvdata_ether() at ng_pppoe_rcvdata_ether+0x461/frame 0xfffffe00101a89b0
      ng_apply_item() at ng_apply_item+0x2bf/frame 0xfffffe00101a8a40
      ng_snd_item() at ng_snd_item+0x28e/frame 0xfffffe00101a8a80
      ether_demux() at ether_demux+0x212/frame 0xfffffe00101a8ab0
      ether_nh_input() at ether_nh_input+0x353/frame 0xfffffe00101a8b10
      netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00101a8b60
      ether_input() at ether_input+0x69/frame 0xfffffe00101a8bc0
      ether_demux() at ether_demux+0x9e/frame 0xfffffe00101a8bf0
      ether_nh_input() at ether_nh_input+0x353/frame 0xfffffe00101a8c50
      netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00101a8ca0
      ether_input() at ether_input+0x69/frame 0xfffffe00101a8d00
      iflib_rxeof() at iflib_rxeof+0xbdb/frame 0xfffffe00101a8e00
      _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00101a8e40
      gtaskqueue_run_locked() at gtaskqueue_run_locked+0x15d/frame 0xfffffe00101a8ec0
      gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc3/frame 0xfffffe00101a8ef0
      fork_exit() at fork_exit+0x7e/frame 0xfffffe00101a8f30
      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00101a8f30
      --- trap 0x107a772c, rip = 0x11d295625b1b31a1, rsp = 0xf02460003c98dfb3, rbp = 0x41abfa0065646f ---
      db:1:pfs>  show registers
      cs                        0x20
      ds                        0x3b
      es                        0x3b
      fs                        0x13
      gs                        0x1b
      ss                        0x28
      rax                       0x12
      rcx                        0x1
      rdx         0xfffffe00101a82e0
      rbx                      0x100
      rsp         0xfffffe00101a86c0
      rbp         0xfffffe00101a86c0
      rsi                          0
      rdi         0xffffffff83183f98  vt_conswindow+0x10
      r8                           0
      r9                  0x1c200001
      r10         0xffffffff83183f88  vt_conswindow
      r11                       0x20
      r12             0x2cdc1f807000
      r13         0xfffffe00101a8840
      r14         0xfffffe00101a8750
      r15         0xfffffe00119bd720
      rip         0xffffffff80dd82f2  kdb_enter+0x32
      rflags                    0x82
      kdb_enter+0x32: movq    $0,0x27bd313(%rip)
      db:1:pfs>  show pcpu
      cpuid        = 0
      dynamic pcpu = 0xbf6800
      curthread    = 0xfffffe00119bd720: pid 0 tid 100007 critnest 1 "if_io_tqg_0"
      curpcb       = 0xfffffe00119bdc40
      fpcurthread  = none
      idlethread   = 0xfffffe00119bf3a0: tid 100003 "idle: cpu0"
      self         = 0xffffffff84010000
      curpmap      = 0xffffffff83549750
      tssp         = 0xffffffff84010384
      rsp0         = 0xfffffe00101a9000
      kcr3         = 0x8000000081c3e002
      ucr3         = 0xffffffffffffffff
      scr3         = 0x1958cac7f
      gs32p        = 0xffffffff84010404
      ldt          = 0xffffffff84010444
      tss          = 0xffffffff84010434
      curvnet      = 0xfffff800011c7a00
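
For anyone who wants to pull just the backtrace out of a saved panic log, here is a minimal Python sketch. It assumes the log uses the `db:1:pfs>` style prompts shown in the example above; the function name `extract_backtrace` and the shortened `SAMPLE` text are made up for illustration and are not part of any Netgate or FreeBSD tool.

```python
import re

# Matches ddb script prompts such as "db:1:pfs> bt" or "db:1:pfs>  show registers".
PROMPT = re.compile(r"^\s*db:\d+:\w+>\s*(.*)$")


def extract_backtrace(dump: str) -> list[str]:
    """Return the lines printed by the first `bt` command in a ddb script log."""
    lines = []
    in_bt = False
    for line in dump.splitlines():
        m = PROMPT.match(line)
        if m:
            if in_bt:
                break  # the next prompt ends the backtrace section
            in_bt = m.group(1).strip() == "bt"
            continue
        if in_bt and line.strip():
            lines.append(line.strip())
    return lines


# Shortened excerpt of the example output above, used only to exercise the function.
SAMPLE = """0:kdb.enter.default>  run pfs
db:1:pfs> bt
Tracing pid 0 tid 100007 td 0xfffffe00119bd720
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00101a86c0
vpanic() at vpanic+0x182/frame 0xfffffe00101a8710
db:1:pfs>  show registers
cs                        0x20
"""

for bt_line in extract_backtrace(SAMPLE):
    print(bt_line)
```

The section stops at the next `db:` prompt, so the register and pcpu output that follows is left out.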
      
      • P
        Pizzamaka @stephenw10
        last edited by

        @stephenw10 thanks! Got it - what should I do with it?

        • stephenw10S
          stephenw10 Netgate Administrator
          last edited by

          You can upload it here and I'll check it:
          https://nc.netgate.com/nextcloud/s/sFoGNTrcoDypsx5

          • P
            Pizzamaka @stephenw10
            last edited by

            @stephenw10 done :-)

            • stephenw10S
              stephenw10 Netgate Administrator
              last edited by

              Ok. Backtrace:

              db:1:pfs> bt
              Tracing pid 0 tid 100131 td 0xffff00008ae04640
              db_trace_self() at db_trace_self
              db_stack_trace() at db_stack_trace+0x120
              db_command() at db_command+0x368
              db_script_exec() at db_script_exec+0x1ac
              db_command() at db_command+0x368
              db_script_exec() at db_script_exec+0x1ac
              db_script_kdbenter() at db_script_kdbenter+0x5c
              db_trap() at db_trap+0xfc
              kdb_trap() at kdb_trap+0x314
              handle_el1h_sync() at handle_el1h_sync+0x18
              --- exception, esr 0xf2000000
              kdb_enter() at kdb_enter+0x4c
              vpanic() at vpanic+0x1e0
              panic() at panic+0x48
              vm_fault() at vm_fault+0x1780
              vm_fault_trap() at vm_fault_trap+0xa0
              data_abort() at data_abort+0xc8
              handle_el1h_sync() at handle_el1h_sync+0x18
              --- exception, esr 0x8600000f
              $d.2() at $d.2+0xc29
              range_tree_add_impl() at range_tree_add_impl+0x8c
              metaslab_alloc_dva() at metaslab_alloc_dva+0xf48
              metaslab_alloc() at metaslab_alloc+0xcc
              zio_dva_allocate() at zio_dva_allocate+0xb8
              zio_execute() at zio_execute+0x58
              taskqueue_run_locked() at taskqueue_run_locked+0x194
              taskqueue_thread_loop() at taskqueue_thread_loop+0x134
              fork_exit() at fork_exit+0x8c
              fork_trampoline() at fork_trampoline+0x18
              

              Mmm, that looks like a drive or filesystem issue. Did you reinstall this clean? If not, I would probably try that to rule out any filesystem issue.
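
The reading above comes from the frames directly below the exception marker that follows panic(): those are the functions that were running when the fault hit. A rough Python sketch of that heuristic, assuming backtrace lines in the `name() at name+0x...` form shown above (the helper name `faulting_frames` and the shortened `ARM64_BT` excerpt are hypothetical, for illustration only):

```python
import re

# Matches backtrace frames such as "zio_execute() at zio_execute+0x58".
FRAME = re.compile(r"^\s*(\S+?)\(\) at ")


def faulting_frames(bt_lines, count=3):
    """Return up to `count` function names just below the first
    '---' exception/trap marker that appears after the panic() frame."""
    seen_panic = False
    after_marker = False
    frames = []
    for line in bt_lines:
        if line.lstrip().startswith("---"):
            if seen_panic:
                if after_marker:
                    break  # a second marker ends the interrupted context
                after_marker = True
            continue
        m = FRAME.match(line)
        if not m:
            continue
        name = m.group(1)
        if name == "panic":
            seen_panic = True
        elif after_marker:
            frames.append(name)
            if len(frames) == count:
                break
    return frames


# Shortened excerpt of the backtrace above.
ARM64_BT = """kdb_enter() at kdb_enter+0x4c
vpanic() at vpanic+0x1e0
panic() at panic+0x48
vm_fault() at vm_fault+0x1780
vm_fault_trap() at vm_fault_trap+0xa0
data_abort() at data_abort+0xc8
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x8600000f
$d.2() at $d.2+0xc29
range_tree_add_impl() at range_tree_add_impl+0x8c
metaslab_alloc_dva() at metaslab_alloc_dva+0xf48
metaslab_alloc() at metaslab_alloc+0xcc
"""

print(faulting_frames(ARM64_BT.splitlines()))
```

On this trace the frames it picks out include range_tree_add_impl and metaslab_alloc_dva, which are ZFS allocation routines, hence the suspicion of storage or the filesystem. This is only a heuristic for reading a trace, not a diagnosis.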

              • P
                Pizzamaka @stephenw10
                last edited by

                @stephenw10 I did install it clean some 2 months ago, but I can retry - will come back after that.
                I had already suspected the storage in the past, but couldn't find any sign that it is failing (I checked the usage level as described in the docs for the SG-1100, and that looks fine).

                • P
                  Pizzamaka
                  last edited by

                  Quick update to anyone stumbling on this:
                  The reboots kept coming at irregular intervals. What seemed to help was reducing the number of feeds for pfBlockerNG (even though memory did not seem to be the problem). At some point I installed 24.11 RC and then 24.11 final. That seemed to finally do the trick: I had an uptime of some 9 days.

                  For me the issue is closed, since I recently upgraded to an SG-2100 that I was able to get for a good price. Interestingly, even though memory never seemed to be the problem, I now see the CPU running at a lower load average as well (0.2 vs. 0.5 before).

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.