Netgate Discussion Forum

    pfSense crashes at a random time and won't fully reboot

    General pfSense Questions
    • S
      shaddow

      Hello All

      I have a problem, and it might have been answered somewhere else, so if so please point me in the right direction.

      One of the pfSense boxes crashes at random times, and when it reboots it does not finish booting.

      Everything is more or less up and running, but pfBlockerNG will not run an update because it reports that the system is still in boot mode.
      Also, OpenVPN will not allow users to connect; only peer-to-peer works.

      It doesn't get to the command line in the console, but if I press Ctrl+C it drops to sh.

      Of note, the unit runs on XCP-ng. It has been working great for a long time, but nowadays it comes up with this problem. The last run lasted 20 days before this happened, and it only affects this virtual machine, not the others.

      The XCP-ng host still has plenty of unused resources.

      I have attached the dump file textdump.tar

      When it does reboot, I see output up to the xenguest error, but after that nothing but "config_aqm".
      I now have a working image to revert to; this is the only quick way to get it back up and running for now.

      But I hope someone can help

      Shane.

      • stephenw10S
        stephenw10 Netgate Administrator

        The important parts of that crash are the backtrace:

        db:0:kdb.enter.default>  bt
        Tracing pid 12 tid 100122 td 0xfffff80008304000
        kdb_enter() at kdb_enter+0x37/frame 0xfffffe00020cf240
        vpanic() at vpanic+0x197/frame 0xfffffe00020cf290
        panic() at panic+0x43/frame 0xfffffe00020cf2f0
        trap_fatal() at trap_fatal+0x391/frame 0xfffffe00020cf350
        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00020cf3a0
        trap() at trap+0x286/frame 0xfffffe00020cf4b0
        calltrap() at calltrap+0x8/frame 0xfffffe00020cf4b0
        --- trap 0xc, rip = 0xffffffff8109c7fa, rsp = 0xfffffe00020cf580, rbp = 0xfffffe00020cf5f0 ---
        pf_test_state_udp() at pf_test_state_udp+0x2ba/frame 0xfffffe00020cf5f0
        pf_test() at pf_test+0x1db8/frame 0xfffffe00020cf830
        pf_check_in() at pf_check_in+0x1d/frame 0xfffffe00020cf850
        pfil_run_hooks() at pfil_run_hooks+0xa1/frame 0xfffffe00020cf8f0
        ip_tryforward() at ip_tryforward+0x193/frame 0xfffffe00020cf970
        ip_input() at ip_input+0x3fe/frame 0xfffffe00020cfa20
        netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00020cfa70
        ether_demux() at ether_demux+0x16a/frame 0xfffffe00020cfaa0
        ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe00020cfb00
        netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00020cfb50
        ether_input() at ether_input+0x4b/frame 0xfffffe00020cfb80
        xn_rxeof() at xn_rxeof+0x55d/frame 0xfffffe00020cfc50
        xn_intr() at xn_intr+0x58/frame 0xfffffe00020cfc90
        ithread_loop() at ithread_loop+0x23c/frame 0xfffffe00020cfcf0
        fork_exit() at fork_exit+0x7e/frame 0xfffffe00020cfd30
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00020cfd30
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
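In a DDB backtrace like this one, the frame printed directly below the first `--- trap ... ---` marker is the code that was running when the fault occurred. The following is a minimal Python sketch of how that frame could be pulled out of a saved backtrace. The helper name `faulting_frame` and the regular expressions are illustrative assumptions, not part of any pfSense or FreeBSD tool; the sample text is copied from the trace above.

```python
import re

# Matches DDB frame lines such as
# "pf_test() at pf_test+0x1db8/frame 0xfffffe00020cf830".
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at (\w+)\+(0x[0-9a-f]+)/frame")
# Matches the trap marker line, e.g. "--- trap 0xc, rip = ... ---".
TRAP_RE = re.compile(r"^\s*--- trap (0x[0-9a-f]+|\d+),")


def faulting_frame(backtrace):
    """Return (trap_number, function, offset) for the first frame
    that follows the first trap marker, or None if there is none.
    Hypothetical helper for illustration only."""
    trap = None
    for line in backtrace.splitlines():
        if trap is None:
            marker = TRAP_RE.match(line)
            if marker:
                trap = int(marker.group(1), 0)  # accepts "0xc" or "12"
            continue
        frame = FRAME_RE.match(line)
        if frame:
            return trap, frame.group(2), int(frame.group(3), 16)
    return None


sample = """\
trap() at trap+0x286/frame 0xfffffe00020cf4b0
calltrap() at calltrap+0x8/frame 0xfffffe00020cf4b0
--- trap 0xc, rip = 0xffffffff8109c7fa, rsp = 0xfffffe00020cf580, rbp = 0xfffffe00020cf5f0 ---
pf_test_state_udp() at pf_test_state_udp+0x2ba/frame 0xfffffe00020cf5f0
pf_test() at pf_test+0x1db8/frame 0xfffffe00020cf830
"""

print(faulting_frame(sample))
```

For this trace the result points at `pf_test_state_udp`, reached from the `xn` interrupt thread through `pf_test()`, with trap number 12 (0xc), which matches the page fault reported in the panic message.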
        

        And the panic:

        Fatal trap 12: page fault while in kernel mode
        cpuid = 0; apic id = 00
        fault virtual address	= 0x0
        fault code		= supervisor read data, page not present
        instruction pointer	= 0x20:0xffffffff8109c7fa
        stack pointer	        = 0x0:0xfffffe00020cf580
        frame pointer	        = 0x0:0xfffffe00020cf5f0
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 12 (irq2343: xn1)
        trap number		= 12
        panic: page fault
        cpuid = 0
        time = 1656263327
        KDB: enter: panic
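Two fields in that panic message are worth decoding. A fault virtual address of 0x0 in kernel mode means the kernel tried to read through a NULL pointer, and `time` is the crash time in Unix epoch seconds. A small Python sketch, assuming the panic text has been saved to a string; the `parse_panic` helper is hypothetical and only a few lines of the message above are reproduced:

```python
from datetime import datetime, timezone

# A few lines copied from the panic message above (tabs as in the dump).
panic_text = """\
Fatal trap 12: page fault while in kernel mode
fault virtual address\t= 0x0
current process\t\t= 12 (irq2343: xn1)
time = 1656263327
"""


def parse_panic(text):
    """Collect 'key = value' lines into a dict (hypothetical helper).
    Lines without a key before '=' are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


fields = parse_panic(panic_text)
fault_addr = int(fields["fault virtual address"], 16)
crash_time = datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)

print("NULL dereference:", fault_addr == 0)
print("crash time (UTC):", crash_time.isoformat())  # 2022-06-26T17:08:47+00:00
```

Converting the timestamp makes it easier to line the crash up with entries in the system log or on the hypervisor side.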
        

        But also we can see the logs show a lot of:

        config_aqm Unable to configure flowset, flowset busy!
        config_aqm Unable to configure flowset, flowset busy!
        

        I assume you're running Limiters. How are they configured?

        It looks like you're running 2.5.2 there. Is there any specific reason for that?

        If it is some bug you're hitting it will not be fixed in 2.5.2. You should upgrade to 2.6 or 22.05.

        Steve

        • S
          shaddow

          Sorry for the slow reply

          I am running 2.5 because, for some reason, 2.6 would not install over the existing 2.5, but I am looking at upgrading soon.
          The config_aqm messages are because of the limiters in use at the time.
          What I found on my side was that when the page fault happened, it did not fully reboot. I removed a few functions like SNMP and it became stable for a bit, but in the last two days it has happened again. At least it is booting fully now.

          I have added the dumps.

          11Aug
          textdump.tar.0
          info.0

          12Aug
          textdump.tar.0
          info.0

          • stephenw10S
            stephenw10 Netgate Administrator

            Hmm, well the good thing there is that those are all identical backtraces so it almost certainly is a software bug of some sort.
            The bad news is that it looks to be in the xn(4) driver which is a lot less common than it once was.
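One way to check that backtraces from separate dumps are identical is to compare only the `function+offset` part of each frame, since frame addresses can differ between crashes. A rough Python sketch under that assumption; the `signature` helper is hypothetical, the first trace is taken from the earlier post, and the frame addresses in the second trace are made up for illustration:

```python
import re

# Capture "function+offset" and ignore the frame address that follows.
FRAME_RE = re.compile(r"^\s*\w+\(\) at (\w+\+0x[0-9a-f]+)/frame")


def signature(backtrace):
    """Reduce a DDB backtrace to a tuple of 'function+offset' strings."""
    matches = (FRAME_RE.match(line) for line in backtrace.splitlines())
    return tuple(m.group(1) for m in matches if m)


# Frames copied from the first crash in this thread.
bt_first = """\
pf_test_state_udp() at pf_test_state_udp+0x2ba/frame 0xfffffe00020cf5f0
pf_test() at pf_test+0x1db8/frame 0xfffffe00020cf830
"""

# Same code path; these frame addresses are invented for the example.
bt_second = """\
pf_test_state_udp() at pf_test_state_udp+0x2ba/frame 0xfffffe00020d45f0
pf_test() at pf_test+0x1db8/frame 0xfffffe00020d4830
"""

print(signature(bt_first) == signature(bt_second))
```

Matching signatures across dumps suggest the same code path is failing each time, which points toward a software bug rather than random hardware faults.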

            However, since it also appears to be in UDP, the first thing to do here is make sure you disable all hardware off-loading in System > Advanced > Networking. There are known bugs there.

            Steve

            • S
              shaddow

              Steve

              First up, Thanks for the help.

              On the settings, what is checked is:
              Disable hardware checksum offload,
              Disable hardware TCP segmentation offload,
              Disable hardware large receive offload.

              Also ticked is "Enable the ALTQ support for hn NICs".

              I could also change the NIC from Intel e1000 to Realtek rtl8139, as at the moment this is on XCP-ng, but I'm not sure if that would work. What do you think?

              I am hoping to do a full install of 2.6 in a month or so, but this is a production machine, so I have to do this during a quiet time in the office.

              Shane

              • stephenw10S
                stephenw10 Netgate Administrator

                @shaddow said in pfSense crashes at a random time and won't fully reboot:

                Enable the ALTQ support for hn NICs

                That only does anything for hn(4) NICs so Hyper-V or Azure. It doesn't matter here.

                pfSense only sees the Xen NIC so changing it from Intel to Realtek would only make any difference if you enabled hardware pass through.

                Check the output of: ifconfig -vm xn0
                Make sure the hardware off-loading options are actually disabled.
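In `ifconfig` output, the `options=` line lists the capabilities currently enabled on the interface, so any checksum, TSO or LRO flag still showing there means off-loading is still active. A minimal Python sketch of that check; the `enabled_offloads` helper is hypothetical and the sample text is an invented illustration, not output from the machine in this thread:

```python
import re

# FreeBSD capability names related to hardware off-loading.
OFFLOAD_FLAGS = {
    "RXCSUM", "TXCSUM", "RXCSUM_IPV6", "TXCSUM_IPV6",
    "TSO4", "TSO6", "LRO",
}


def enabled_offloads(ifconfig_output):
    """Return the off-load flags found on the 'options=' line, sorted.
    The 'capabilities=' line (what the NIC supports) is ignored."""
    for line in ifconfig_output.splitlines():
        m = re.match(r"\s*options=[0-9a-f]+<([^>]*)>", line)
        if m:
            flags = {f for f in m.group(1).split(",") if f}
            return sorted(flags & OFFLOAD_FLAGS)
    return []


# Invented sample for illustration: off-loading still enabled.
sample = (
    "xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\toptions=503<RXCSUM,TXCSUM,TSO4,LRO>\n"
    "\tcapabilities=503<RXCSUM,TXCSUM,TSO4,LRO>\n"
)

print(enabled_offloads(sample))
```

On the firewall itself, the text passed to the function would be the saved output of `ifconfig -vm xn0`; an empty result would indicate that none of the listed off-load flags are enabled.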

                Steve

                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.