Panic String: page fault after migrating to a baremetal install
-
I had my pfSense installation on a Proxmox VM and recently moved it to a bare-metal installation on the same hardware, importing the XML configs from the VM. Lately I have been getting page fault panics, and for the life of me I cannot figure out what is causing them. I was using a Realtek 2.5G NIC that didn't have a native driver, so I had to download one. I have since removed that driver and replaced the card with an Intel 2.5G NIC that has native driver support, but I am still getting the page faults.
-
Backtrace:
db:1:pfs> bt
Tracing pid 7 tid 100234 td 0xfffff80102ee0000
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0103ee2c40
panic() at panic+0x43/frame 0xfffffe0103ee2ca0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0103ee2d00
trap_pfault() at trap_pfault+0x46/frame 0xfffffe0103ee2d50
calltrap() at calltrap+0x8/frame 0xfffffe0103ee2d50
--- trap 0xc, rip = 0xffffffff80fcc88b, rsp = 0xfffffe0103ee2e20, rbp = 0xfffffe0103ee2e40 ---
pf_state_expires() at pf_state_expires+0xb/frame 0xfffffe0103ee2e40
pf_purge_expired_states() at pf_purge_expired_states+0xd8/frame 0xfffffe0103ee2e90
pf_purge_thread() at pf_purge_thread+0x15b/frame 0xfffffe0103ee2ef0
fork_exit() at fork_exit+0x7b/frame 0xfffffe0103ee2f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0103ee2f30
--- trap 0x5668643d, rip = 0xde2f6d95d26f6d91, rsp = 0x86e709278aa70923, rbp = 0x3ad825083698250c ---
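When a box panics repeatedly, it helps to check whether every crash faults in the same function. The following is a minimal Python sketch, not a Netgate or FreeBSD tool, that pulls the call chain and the faulting function out of a ddb backtrace like the one above. The names `faulting_function` and `call_chain` are made up for this illustration, and the sample text is copied from this post.

```python
import re

# Backtrace copied from the post above, whitespace-flattened as pasted.
BACKTRACE = (
    "kdb_enter() at kdb_enter+0x33/frame 0xfffffe0103ee2c40 "
    "panic() at panic+0x43/frame 0xfffffe0103ee2ca0 "
    "trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0103ee2d00 "
    "trap_pfault() at trap_pfault+0x46/frame 0xfffffe0103ee2d50 "
    "calltrap() at calltrap+0x8/frame 0xfffffe0103ee2d50 "
    "--- trap 0xc, rip = 0xffffffff80fcc88b, rsp = 0xfffffe0103ee2e20, "
    "rbp = 0xfffffe0103ee2e40 --- "
    "pf_state_expires() at pf_state_expires+0xb/frame 0xfffffe0103ee2e40 "
    "pf_purge_expired_states() at pf_purge_expired_states+0xd8/frame 0xfffffe0103ee2e90 "
    "pf_purge_thread() at pf_purge_thread+0x15b/frame 0xfffffe0103ee2ef0 "
    "fork_exit() at fork_exit+0x7b/frame 0xfffffe0103ee2f30 "
    "fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0103ee2f30"
)

# One frame looks like: name() at name+0xOFF/frame 0xADDR
FRAME_RE = re.compile(r"(\w+)\(\) at \1\+0x[0-9a-fA-F]+/frame 0x[0-9a-fA-F]+")
# The trap marker separates the panic-handling frames from the code that faulted.
TRAP_RE = re.compile(r"--- trap 0x[0-9a-fA-F]+, rip = 0x[0-9a-fA-F]+")


def call_chain(backtrace):
    """Return every function name in the backtrace, innermost first."""
    return [m.group(1) for m in FRAME_RE.finditer(backtrace)]


def faulting_function(backtrace):
    """Return the first frame printed after the first '--- trap' marker."""
    trap = TRAP_RE.search(backtrace)
    if trap is None:
        return None
    frame = FRAME_RE.search(backtrace, trap.end())
    return frame.group(1) if frame else None


if __name__ == "__main__":
    print("faulted in:", faulting_function(BACKTRACE))
    print("called via:", " <- ".join(call_chain(BACKTRACE)[5:]))
```

Running the same function over the backtrace from each crash report gives a quick way to compare them: identical faulting functions point at one recurring problem, while different ones point elsewhere.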
Panic:
Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address = 0x100000012
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80fcc88b
stack pointer = 0x28:0xfffffe0103ee2e20
frame pointer = 0x28:0xfffffe0103ee2e40
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 7 (pf purge)
rdi: 0000000100000000 rsi: 000000000000000c rdx: 0000000000000580
rcx: 0000000000383b50 r8: 000000000000b000 r9: 0000000000000fff
rax: fffffe01145a0a80 rbx: 00000000000b3f10 rbp: fffffe0103ee2e40
r10: 0000000000001388 r11: 00000000815eda0a r12: 0000000000000000
r13: fffff80102ee0000 r14: 0000000100000000 r15: fffffe01145a0aa0
trap number = 12
panic: page fault
cpuid = 11
time = 1751555338
KDB: enter: panic
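For anyone reading the register dump, a small Python sketch can show which registers sit just below the fault address, which hints at the pointer that was being dereferenced. It also converts the `time =` field to UTC so the crash can be lined up with the system log. The helper names `parse_panic` and `nearby_registers` and the 4 KiB window are assumptions made for this example. The result is only a hint, not a diagnosis.

```python
import re
from datetime import datetime, timezone

# Relevant fields copied from the panic message above.
PANIC = (
    "fault virtual address = 0x100000012 "
    "rdi: 0000000100000000 rsi: 000000000000000c rdx: 0000000000000580 "
    "rcx: 0000000000383b50 r8: 000000000000b000 r9: 0000000000000fff "
    "rax: fffffe01145a0a80 rbx: 00000000000b3f10 rbp: fffffe0103ee2e40 "
    "r10: 0000000000001388 r11: 00000000815eda0a r12: 0000000000000000 "
    "r13: fffff80102ee0000 r14: 0000000100000000 r15: fffffe01145a0aa0 "
    "time = 1751555338"
)


def parse_panic(text):
    """Return (fault_address, {register: value}, crash_time_utc)."""
    fault = int(
        re.search(r"fault virtual address\s*=\s*0x([0-9a-f]+)", text).group(1), 16
    )
    regs = {
        name: int(value, 16)
        for name, value in re.findall(r"\b(r[a-z0-9]{1,2}):\s*([0-9a-f]{16})\b", text)
    }
    epoch = int(re.search(r"\btime\s*=\s*(\d+)", text).group(1))
    return fault, regs, datetime.fromtimestamp(epoch, tz=timezone.utc)


def nearby_registers(fault, regs, window=0x1000):
    """Map register name -> offset for registers within `window` bytes below the fault."""
    return {
        name: fault - value
        for name, value in regs.items()
        if 0 <= fault - value < window
    }


if __name__ == "__main__":
    fault, regs, when = parse_panic(PANIC)
    print("crash time (UTC):", when.isoformat())
    for name, offset in nearby_registers(fault, regs).items():
        print(f"fault address = {name} + {offset:#x}")
```

For this dump the fault address works out to rdi + 0x12 (r14 holds the same value). That is consistent with code reading a field at offset 0x12 through a pointer whose value was 0x100000000, which does not look like the kernel addresses elsewhere in the dump (those begin with 0xfffff8 or 0xfffffe). This alone does not say whether the bad pointer came from a software bug or from faulty hardware.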
Is that the same backtrace every time?
It looks like this issue, which has only been seen once: https://redmine.pfsense.org/issues/13417
The message buffer is full of these ARP movements:
<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1
<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1
<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1
<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1
Is that expected?
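To see how many addresses are flapping and between which MACs, the ARP messages can be summarized with a few lines of Python. This is a rough sketch that assumes the `arp: ... moved from ... to ... on ...` format shown above. The function name `summarize_arp_moves` is made up for this example.

```python
import re
from collections import defaultdict

# Log lines copied from the message buffer excerpt above.
LOG = (
    "<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1 "
    "<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1 "
    "<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1 "
    "<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1"
)

MOVE_RE = re.compile(
    r"arp: (\S+) moved from ([0-9a-f:]{17}) to ([0-9a-f:]{17}) on (\S+)"
)


def summarize_arp_moves(log):
    """Count ARP moves per (IP, interface) and collect the MACs involved."""
    summary = defaultdict(lambda: {"moves": 0, "macs": set()})
    for ip, old_mac, new_mac, ifname in MOVE_RE.findall(log):
        entry = summary[(ip, ifname)]
        entry["moves"] += 1
        entry["macs"].update((old_mac, new_mac))
    return dict(summary)


if __name__ == "__main__":
    for (ip, ifname), entry in summarize_arp_moves(LOG).items():
        macs = ", ".join(sorted(entry["macs"]))
        print(f"{ip} on {ifname}: {entry['moves']} moves between {macs}")
```

An address that bounces between exactly two MACs may be a bonded or multi-homed device, or two hosts claiming the same IP. The script cannot tell these apart; it only makes the pattern easier to see.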
There is a Realtek NIC that's failing to attach. It looks like the same device failing 4 times. You should remove or disable that if you're not using it.
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xf000-0xf0ff mem 0xfc704000-0xfc704fff,0xfc700000-0xfc703fff at device 0.0 on pci4
re0: Using 1 MSI-X message
re0: Chip rev. 0x54000000
re0: MAC rev. 0x00100000
re0: attaching PHYs failed
device_attach: re0 attach returned 6
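The `attach returned 6` value is an errno-style code, and on FreeBSD errno 6 is ENXIO. A short Python sketch can look the number up. Note that Python's `errno` module reflects the operating system it runs on, so this assumes the host uses the same numbering as FreeBSD for this code (Linux and the BSDs do for 6). The helper name `describe_errno` is made up for this example.

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME: message' for an errno value, using the host OS tables."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 6 is the value from "device_attach: re0 attach returned 6".
    print(describe_errno(6))
```

The message text differs between operating systems, so only the symbolic name should be relied on.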
-
Thanks for the reply.
Is that the same backtrace every time?
Yes
Is that expected?
That looks like the NIC's IP, so maybe
The re0 is not in use, and the Realtek drivers are disabled and deleted. However, re0 is the motherboard's NIC, so I can't remove it, but I can disable it. Also, since I removed the drivers, it's not even showing up as an interface that could be enabled.
Is that IP actually a device that should be moving between MAC values though? Like a bonded link or some load balancer?
If it's not, you might have an IP conflict.
Yes, you should disable that re NIC. It's only using resources.
However neither should be causing that panic....
-
So the device is expected to have multiple MACs, and I will disable the NIC through the terminal. You said neither should cause the panic, so do you have any idea where I can look for a possible cause?
-
Not yet, still digging.
Is there anything in the main system log just before this happens?
Do you have anything 'exotic' configured using rules? Scheduled rules maybe, or UPnP?
-
Do you have more reports we can review?
How often does this happen?
-
@stephenw10
Hi, it used to be every hour, but now that I have disabled the Realtek NIC on the motherboard, it happens less often. Here are the new dump files. This time it seems the panic didn't cause any outage, unlike before.
info.0
textdump.tar.0
-
Mmm, OK that last one is a completely different panic. You should run a memory test to rule out some RAM glitch because that would certainly explain it.
Completely unrelated panics like that are almost always hardware.
-
@stephenw10
OK, I sure will. Thanks.