Netgate Discussion Forum

    Panic String: page fault after migrating to a baremetal install

    Hardware
      KingCobra021

      I had my pfSense installation on a Proxmox VM and recently moved it to a bare-metal installation on the same hardware, importing the XML configs from the VM. Lately I have been getting page fault errors and, for the life of me, I cannot figure out what is causing them. I was using a Realtek 2.5 G NIC that didn't have a native driver, so I had to download one. I have since removed that driver and replaced the card with an Intel 2.5 G NIC with native driver support, but I am still getting the page faults.

      info.0
      textdump.tar.0

        stephenw10 Netgate Administrator

        Backtrace:

        db:1:pfs> bt
        Tracing pid 7 tid 100234 td 0xfffff80102ee0000
        kdb_enter() at kdb_enter+0x33/frame 0xfffffe0103ee2c40
        panic() at panic+0x43/frame 0xfffffe0103ee2ca0
        trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0103ee2d00
        trap_pfault() at trap_pfault+0x46/frame 0xfffffe0103ee2d50
        calltrap() at calltrap+0x8/frame 0xfffffe0103ee2d50
        --- trap 0xc, rip = 0xffffffff80fcc88b, rsp = 0xfffffe0103ee2e20, rbp = 0xfffffe0103ee2e40 ---
        pf_state_expires() at pf_state_expires+0xb/frame 0xfffffe0103ee2e40
        pf_purge_expired_states() at pf_purge_expired_states+0xd8/frame 0xfffffe0103ee2e90
        pf_purge_thread() at pf_purge_thread+0x15b/frame 0xfffffe0103ee2ef0
        fork_exit() at fork_exit+0x7b/frame 0xfffffe0103ee2f30
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0103ee2f30
        --- trap 0x5668643d, rip = 0xde2f6d95d26f6d91, rsp = 0x86e709278aa70923, rbp = 0x3ad825083698250c ---
        

        Panic:

        Fatal trap 12: page fault while in kernel mode
        cpuid = 11; apic id = 0b
        fault virtual address	= 0x100000012
        fault code		= supervisor read data, page not present
        instruction pointer	= 0x20:0xffffffff80fcc88b
        stack pointer	        = 0x28:0xfffffe0103ee2e20
        frame pointer	        = 0x28:0xfffffe0103ee2e40
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 7 (pf purge)
        rdi: 0000000100000000 rsi: 000000000000000c rdx: 0000000000000580
        rcx: 0000000000383b50  r8: 000000000000b000  r9: 0000000000000fff
        rax: fffffe01145a0a80 rbx: 00000000000b3f10 rbp: fffffe0103ee2e40
        r10: 0000000000001388 r11: 00000000815eda0a r12: 0000000000000000
        r13: fffff80102ee0000 r14: 0000000100000000 r15: fffffe01145a0aa0
        trap number		= 12
        panic: page fault
        cpuid = 11
        time = 1751555338
        KDB: enter: panic
        

        Is that the same backtrace every time?

        It looks like this one, which was only seen once: https://redmine.pfsense.org/issues/13417
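
For reference, the fault address can be related to the register dump with a little arithmetic. The sketch below is an illustration only, not part of the original analysis: it parses lines copied from the panic text above and lists registers that sit within one page below the fault address. The helper names `parse_panic` and `likely_bases` are hypothetical. A base value with a single set bit is consistent with a corrupted NULL pointer, but on its own that does not prove a memory fault.

```python
import re

# Lines copied from the panic output quoted in this thread.
PANIC_TEXT = """\
fault virtual address   = 0x100000012
rdi: 0000000100000000 rsi: 000000000000000c rdx: 0000000000000580
rcx: 0000000000383b50  r8: 000000000000b000  r9: 0000000000000fff
rax: fffffe01145a0a80 rbx: 00000000000b3f10 rbp: fffffe0103ee2e40
r10: 0000000000001388 r11: 00000000815eda0a r12: 0000000000000000
r13: fffff80102ee0000 r14: 0000000100000000 r15: fffffe01145a0aa0
"""

PAGE = 0x1000  # only treat offsets smaller than one page as plausible


def parse_panic(text):
    """Return (fault_address, {register: value}) parsed from panic text."""
    fault = int(
        re.search(r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)", text).group(1),
        16,
    )
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"\b(r[a-z0-9]{1,2}):\s*([0-9a-fA-F]{16})", text)
    }
    return fault, regs


def likely_bases(fault, regs):
    """Registers whose value is at most one page below the fault address."""
    return {
        name: fault - value
        for name, value in regs.items()
        if 0 <= fault - value < PAGE
    }


fault, regs = parse_panic(PANIC_TEXT)
bases = likely_bases(fault, regs)
print({name: hex(offset) for name, offset in bases.items()})
# prints {'rdi': '0x12', 'r14': '0x12'}

# A value with exactly one set bit; here that is bit 32.
single_bit = {name for name in bases if bin(regs[name]).count("1") == 1}
print(single_bit == {"rdi", "r14"}, regs["rdi"].bit_length() - 1)
# prints True 32
```

So the faulting read was at offset 0x12 from a pointer value of 0x100000000, which is not a valid kernel address.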

        The message buffer is full of these ARP movements:

        <6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1
        <6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1
        <6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1
        <6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1
        

        Is that expected?
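
To judge how much flapping there is, the message buffer can be summarized per IP address. This is a minimal sketch that assumes the `arp: ... moved from ... to ... on ...` line format shown above; `summarize_arp_moves` is a hypothetical helper name, and the sample text is just the four lines quoted in this post.

```python
import re
from collections import Counter

# Matches kernel messages of the form quoted above.
ARP_MOVE = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<ifname>\S+)"
)

SAMPLE = """\
<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1
<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1
<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1
<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1
"""


def summarize_arp_moves(msgbuf):
    """Count moves and collect the MACs seen for each (ip, interface) pair."""
    moves = Counter()
    macs = {}
    for m in ARP_MOVE.finditer(msgbuf):
        key = (m["ip"], m["ifname"])
        moves[key] += 1
        macs.setdefault(key, set()).update((m["old"], m["new"]))
    return moves, macs


moves, macs = summarize_arp_moves(SAMPLE)
for (ip, ifname), count in moves.most_common():
    print(ip, ifname, count, sorted(macs[(ip, ifname)]))
```

An address that alternates between exactly two MACs, as here, could be a bonded or load-balanced device, or two hosts configured with the same IP.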

        There is a Realtek NIC that's failing to attach. It looks like the same device failing 4 times. You should remove or disable that if you're not using it.

        re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xf000-0xf0ff mem 0xfc704000-0xfc704fff,0xfc700000-0xfc703fff at device 0.0 on pci4
        re0: Using 1 MSI-X message
        re0: Chip rev. 0x54000000
        re0: MAC rev. 0x00100000
        re0: attaching PHYs failed
        device_attach: re0 attach returned 6
        
          KingCobra021

          Thanks for the reply.

          "Is that the same backtrace every time?"
          Yes.

          "Is that expected?"
          That looks like the NIC's IP, so maybe.

          re0 is not in use, and the Realtek drivers are disabled and deleted. re0 is the motherboard's NIC, so I can't remove it, but I can disable it. Also, since I removed the drivers, it doesn't even show up as an interface that could be enabled.

            stephenw10 Netgate Administrator

            Is that IP actually a device that should be moving between MAC values though? Like a bonded link or some load balancer?
            If it's not you might have an IP conflict.

            Yes, you should disable that re NIC. It's only using resources.

            However, neither should be causing that panic.

              KingCobra021 @stephenw10

              @stephenw10

              So the device is expected to have multiple MACs, and I will disable the NIC through the terminal. You said neither should cause the panic, though, so any idea where I can look for a possible cause?

                stephenw10 Netgate Administrator

                Not yet, still digging.

                Is there anything in the main system log just before this happens?

                Do you have anything 'exotic' configured in the rules? Scheduled rules maybe, or UPnP?

                  stephenw10 Netgate Administrator

                  Do you have more reports we can review?

                  How often does this happen?

                    KingCobra021 @stephenw10

                    @stephenw10
                    Hi, it used to be every hour, but now that I have disabled the Realtek NIC on the motherboard, it's less often. Here are the new dump files. This time it seems the panic didn't cause any outage, unlike before.
                    info.0
                    textdump.tar.0

                      KingCobra021 @KingCobra021

                      @KingCobra021
                      info.0 textdump.tar.0

                        stephenw10 Netgate Administrator

                        Mmm, OK, that last one is a completely different panic. You should run a memory test to rule out a RAM glitch, because that would certainly explain it.

                        Completely unrelated panics like that are almost always hardware.
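
One way to check whether several crash reports share the same panic is to compare the first few frames after the trap marker in each ddb backtrace. The sketch below assumes the `bt` output format shown earlier in this thread; `faulting_frames` and `group_by_signature` are hypothetical helper names, and the sample is an excerpt of the first backtrace only.

```python
import re

FRAME = re.compile(r"^\s*(\w+)\(\) at ", re.MULTILINE)
TRAP = re.compile(r"^\s*--- trap ", re.MULTILINE)

# Excerpt of the backtrace posted earlier in this thread.
FIRST_BT = """\
trap_pfault() at trap_pfault+0x46/frame 0xfffffe0103ee2d50
calltrap() at calltrap+0x8/frame 0xfffffe0103ee2d50
--- trap 0xc, rip = 0xffffffff80fcc88b, rsp = 0xfffffe0103ee2e20, rbp = 0xfffffe0103ee2e40 ---
pf_state_expires() at pf_state_expires+0xb/frame 0xfffffe0103ee2e40
pf_purge_expired_states() at pf_purge_expired_states+0xd8/frame 0xfffffe0103ee2e90
pf_purge_thread() at pf_purge_thread+0x15b/frame 0xfffffe0103ee2ef0
fork_exit() at fork_exit+0x7b/frame 0xfffffe0103ee2f30
"""


def faulting_frames(ddb_text, depth=3):
    """Names of the first `depth` frames after the first '--- trap' marker."""
    trap = TRAP.search(ddb_text)
    if trap is None:
        return ()
    # Start searching after the marker so the panic/trap handler frames are skipped.
    return tuple(FRAME.findall(ddb_text, trap.end())[:depth])


def group_by_signature(dumps):
    """Group report names by the frames in which the fault occurred."""
    groups = {}
    for name, text in dumps.items():
        groups.setdefault(faulting_frames(text), []).append(name)
    return groups


print(faulting_frames(FIRST_BT))
# prints ('pf_state_expires', 'pf_purge_expired_states', 'pf_purge_thread')
```

If `group_by_signature` puts the reports into several unrelated groups, that fits the pattern described here; a single group would point more towards one specific code path.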

                          KingCobra021 @stephenw10

                          @stephenw10
                          OK, I sure will do. Thanks.

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.