Netgate Discussion Forum

    Random crash on latest 23.09.1

    General pfSense Questions
    • Yathus @Yathus

      Reboot done, we'll see ;-)

      • Yathus @Yathus

        I had my first crash during a vMotion of the secondary pfSense:

        Fatal trap 12: page fault while in kernel mode
        cpuid = 3; apic id = 03
        fault virtual address	= 0x0
        fault code		= supervisor read data, page not present
        instruction pointer	= 0x20:0xffffffff80fb1c0a
        stack pointer	        = 0x0:0xfffffe000859f7d0
        frame pointer	        = 0x0:0xfffffe000859f920
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 0 (if_io_tqg_3)
        rdi: 0000000000000000 rsi: fffff800b4b9c07a rdx: 0000000000000000
        rcx: 0000000005966257  r8: 00000000a1990c31  r9: 0000000023e34fa7
        rax: 0000000000000002 rbx: fffff800b4b9c000 rbp: fffffe000859f920
        r10: 0000000000003354 r11: fffff800b4b9c000 r12: fffffe000859f980
        r13: 0000000000000000 r14: 0000000000000000 r15: fffff8000cce2608
        trap number		= 12
        panic: page fault
        cpuid = 3
        time = 1707492630
        KDB: enter: panic
        

        I vMotioned the primary and there was no crash...

        • stephenw10 Netgate Administrator

          We need to see the backtrace to know more there.

          • Yathus @stephenw10

            @stephenw10 I uploaded the files to your Nextcloud link.

            • stephenw10 Netgate Administrator

              Backtrace:

              db:1:pfs> bt
              Tracing pid 0 tid 100014 td 0xfffffe000932a740
              kdb_enter() at kdb_enter+0x32/frame 0xfffffe000859f4b0
              vpanic() at vpanic+0x163/frame 0xfffffe000859f5e0
              panic() at panic+0x43/frame 0xfffffe000859f640
              trap_fatal() at trap_fatal+0x40c/frame 0xfffffe000859f6a0
              trap_pfault() at trap_pfault+0x4f/frame 0xfffffe000859f700
              calltrap() at calltrap+0x8/frame 0xfffffe000859f700
              --- trap 0xc, rip = 0xffffffff80fb1c0a, rsp = 0xfffffe000859f7d0, rbp = 0xfffffe000859f920 ---
              pf_test_state_tcp() at pf_test_state_tcp+0x125a/frame 0xfffffe000859f920
              pf_test() at pf_test+0x1353/frame 0xfffffe000859fac0
              pf_check_in() at pf_check_in+0x27/frame 0xfffffe000859fae0
              pfil_mbuf_in() at pfil_mbuf_in+0x38/frame 0xfffffe000859fb10
              ip_input() at ip_input+0x3ae/frame 0xfffffe000859fb70
              netisr_dispatch_src() at netisr_dispatch_src+0x22c/frame 0xfffffe000859fbc0
              ether_demux() at ether_demux+0x149/frame 0xfffffe000859fbf0
              ether_nh_input() at ether_nh_input+0x36e/frame 0xfffffe000859fc50
              netisr_dispatch_src() at netisr_dispatch_src+0xaf/frame 0xfffffe000859fca0
              ether_input() at ether_input+0x69/frame 0xfffffe000859fd00
              iflib_rxeof() at iflib_rxeof+0xc46/frame 0xfffffe000859fe00
              _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe000859fe40
              gtaskqueue_run_locked() at gtaskqueue_run_locked+0x14e/frame 0xfffffe000859fec0
              gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe000859fef0
              fork_exit() at fork_exit+0x7f/frame 0xfffffe000859ff30
              fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000859ff30
              --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
              

              So not the same issue.

              Seems similar to a few other bugs but not identical.
              The message buffer shows it failing back and forth between the nodes a few times. Was that expected?

              • kprovost @stephenw10

                @stephenw10 said in Random crash on latest 23.09.1:

                That last backtrace decodes to /var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/sources/FreeBSD-src-plus-RELENG_23_09_1/sys/netpfil/pf/pf.c:5743, which is in pf_test_state_tcp(), where it applies NAT. It likely means that the state has a NULL key (pf_kstate->key[]).
                It's not clear to me how that'd happen. Speculatively, perhaps there's a race on state insertion, or there's something wrong in the pfsync state transfer. A full core dump might be helpful here, if this can be reproduced.

                • Yathus @kprovost

                  @kprovost How can I get a full core dump?

                  • stephenw10 Netgate Administrator

                    You can just set the ddb config file to dump rather than textdump, but you need enough swap space to dump to, and 1GB probably isn't enough.

                    So you can either reinstall with more swap space or add swap some other way. For example: https://forum.netgate.com/post/1127502
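
A rough way to check whether swap is large enough before relying on a full dump. This is only a sketch: it assumes the conservative rule of thumb that swap at least as large as RAM is always sufficient (a minidump is usually much smaller), and `swap_covers_ram` is a hypothetical helper name. `swapinfo`, `sysctl hw.physmem` and `dumpon -l` are standard FreeBSD tools; the guarded part only runs where `swapinfo` exists.

```shell
# Succeed when swap (bytes, $1) is at least as large as RAM (bytes, $2).
# Hypothetical helper; "swap >= RAM" is a conservative assumption.
swap_covers_ram() {
  [ "$1" -ge "$2" ]
}

# On a FreeBSD/pfSense shell the inputs could be gathered like this.
# Guarded so the snippet does nothing on systems without swapinfo.
if command -v swapinfo >/dev/null 2>&1; then
  ram_bytes=$(sysctl -n hw.physmem)
  # swapinfo -k reports sizes in 1 KiB blocks; sum column 2,
  # skipping the header line and the "Total" row.
  swap_kib=$(swapinfo -k | awk 'NR > 1 && $1 != "Total" { s += $2 } END { print s + 0 }')
  if swap_covers_ram $((swap_kib * 1024)) "$ram_bytes"; then
    echo "swap is at least as large as RAM"
  else
    echo "swap is smaller than RAM; a full dump may not fit"
  fi
  dumpon -l   # show the currently configured dump device
fi
```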

                    • Yathus @stephenw10

                      @stephenw10 I added a second disk to the VM and now have 12 GB of swap.

                      My config was:

                      #script kdb.enter.default=textdump set; capture on; run pfs ; capture off; textdump dump; reset
                      

                      Replaced by:

                      script kdb.enter.default=bt ; show registers ; dump ; reset
                      

                      I rebooted too.
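
To sanity-check which kind of dump a ddb script line asks for, something like the following could be used. It is a minimal sketch: `ddb_dump_mode` is a hypothetical helper, it only does string matching on one line, and it treats a leading `#` as "commented out". On the firewall itself, `ddb scripts` should list the scripts that are actually loaded.

```shell
# Classify one ddb config line by the kind of dump its script requests.
# Hypothetical helper for illustration; not part of pfSense or FreeBSD.
ddb_dump_mode() {
  case "$1" in
    \#*)                                 echo disabled ;;  # leading '#': treated as commented out
    *"textdump set"*|*"textdump dump"*)  echo textdump ;;  # text-only dump requested
    *dump*)                              echo coredump ;;  # plain "dump": kernel core dump
    *)                                   echo none ;;      # script does not dump at all
  esac
}

ddb_dump_mode 'script kdb.enter.default=bt ; show registers ; dump ; reset'
# prints: coredump
```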

                      • stephenw10 Netgate Administrator

                        Great. You can check that it's working as expected by forcing a panic and seeing if the kernel core dump is created.

                        Running: sysctl debug.kdb.panic=1 will panic the system immediately and should create the core dump.

                        Steve
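
After the forced panic and the reboot, the dump should show up in the crash directory. A small sketch for finding the newest one, assuming the FreeBSD default where savecore(8) writes `vmcore.N` files under /var/crash; `latest_vmcore` is a hypothetical helper name.

```shell
# Print the highest-numbered vmcore.N in a crash directory
# (default /var/crash); return non-zero if there is none.
latest_vmcore() {
  dir="${1:-/var/crash}"
  best=""
  for f in "$dir"/vmcore.[0-9]*; do
    [ -e "$f" ] || continue                  # glob matched nothing
    n="${f##*/vmcore.}"
    case "$n" in *[!0-9]*) continue ;; esac  # skip compressed or other non-numeric suffixes
    if [ -z "$best" ] || [ "$n" -gt "$best" ]; then
      best="$n"
    fi
  done
  [ -n "$best" ] && printf '%s/vmcore.%s\n' "$dir" "$best"
}

# Example: show size and timestamp of the newest dump, if any.
if core=$(latest_vmcore); then
  ls -lh "$core"
fi
```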

                        • Yathus @stephenw10

                          @stephenw10 I tested your command on my "backup" pfSense and it worked: I got a 1 GB vmcore.0 file.
                          So we just have to wait now...

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.