Panic String: page fault after migrating to a baremetal install

KingCobra021

I had my pfSense installation on a Proxmox VM and recently moved it to a bare-metal installation on the same hardware, importing the XML configs from the VM. Lately I have been getting page fault panics, and for the life of me I cannot figure out what is causing them. I was using a Realtek 2.5G NIC that didn't have a native driver, so I had to download one. I have since removed that driver and replaced the card with an Intel 2.5G NIC that has native driver support, but I am still getting the page faults.

info.0
textdump.tar.0

stephenw10

Backtrace:

db:1:pfs> bt
Tracing pid 7 tid 100234 td 0xfffff80102ee0000
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0103ee2c40
panic() at panic+0x43/frame 0xfffffe0103ee2ca0
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe0103ee2d00
trap_pfault() at trap_pfault+0x46/frame 0xfffffe0103ee2d50
calltrap() at calltrap+0x8/frame 0xfffffe0103ee2d50
--- trap 0xc, rip = 0xffffffff80fcc88b, rsp = 0xfffffe0103ee2e20, rbp = 0xfffffe0103ee2e40 ---
pf_state_expires() at pf_state_expires+0xb/frame 0xfffffe0103ee2e40
pf_purge_expired_states() at pf_purge_expired_states+0xd8/frame 0xfffffe0103ee2e90
pf_purge_thread() at pf_purge_thread+0x15b/frame 0xfffffe0103ee2ef0
fork_exit() at fork_exit+0x7b/frame 0xfffffe0103ee2f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0103ee2f30
--- trap 0x5668643d, rip = 0xde2f6d95d26f6d91, rsp = 0x86e709278aa70923, rbp = 0x3ad825083698250c ---

Panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address	= 0x100000012
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80fcc88b
stack pointer	        = 0x28:0xfffffe0103ee2e20
frame pointer	        = 0x28:0xfffffe0103ee2e40
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 7 (pf purge)
rdi: 0000000100000000 rsi: 000000000000000c rdx: 0000000000000580
rcx: 0000000000383b50  r8: 000000000000b000  r9: 0000000000000fff
rax: fffffe01145a0a80 rbx: 00000000000b3f10 rbp: fffffe0103ee2e40
r10: 0000000000001388 r11: 00000000815eda0a r12: 0000000000000000
r13: fffff80102ee0000 r14: 0000000100000000 r15: fffffe01145a0aa0
trap number		= 12
panic: page fault
cpuid = 11
time = 1751555338
KDB: enter: panic
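
One detail in the register dump is worth spelling out: the fault address sits a small offset above the value held in rdi and r14. A minimal Python sketch of that arithmetic, using only the values printed above. Reading this as a dereference through a bad pointer is an assumption on my part, not something the dump confirms.

```python
# Values copied from the panic output above.
fault_addr = 0x100000012
rdi = 0x0000000100000000
r14 = 0x0000000100000000

# Offset of the faulting read relative to the value held in rdi.
offset = fault_addr - rdi
print(hex(offset))  # 0x12

# On amd64 the kernel lives in the canonical upper half of the address
# space, so a value this low is not a plausible kernel pointer.
KERNEL_BASE = 0xFFFF800000000000  # start of the canonical upper half
looks_like_kernel_pointer = rdi >= KERNEL_BASE
print(looks_like_kernel_pointer)  # False
```

Compare that with rax and r15 in the same dump, which both start with 0xfffffe01 and do fall in the kernel range.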

Is that the same backtrace every time?

It looks like this issue, which has only been seen once: https://redmine.pfsense.org/issues/13417

The message buffer is full of these ARP movements:

<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1
<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1
<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1
<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1

Is that expected?
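
If you want to see how often this happens across the whole message buffer, here is a rough Python sketch that counts the moves per IP and interface. The regular expression is based only on the four lines quoted above, and `summarize_arp_moves` is a name made up for this example, so adjust both if your log lines look different.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted above, e.g.
# <6>arp: 10.27.27.19 moved from 48:b4:... to f0:2f:... on igc1
ARP_MOVE = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<ifname>\S+)"
)

def summarize_arp_moves(lines):
    """Count moves per (ip, interface) and collect the MACs involved."""
    counts = Counter()
    macs = {}
    for line in lines:
        m = ARP_MOVE.search(line)
        if not m:
            continue  # not an ARP move line
        key = (m["ip"], m["ifname"])
        counts[key] += 1
        macs.setdefault(key, set()).update((m["old"], m["new"]))
    return counts, macs

sample = [
    "<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1",
    "<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1",
    "<6>arp: 10.27.27.19 moved from 48:b4:23:e1:a5:4b to f0:2f:74:7e:13:d0 on igc1",
    "<6>arp: 10.27.27.19 moved from f0:2f:74:7e:13:d0 to 48:b4:23:e1:a5:4b on igc1",
]
counts, macs = summarize_arp_moves(sample)
for key, n in counts.items():
    print(key, n, sorted(macs[key]))
```

A single IP bouncing between exactly two MACs, as here, is what either a bonded/failover device or an IP conflict would produce, so the count alone does not tell the two apart.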

There is a Realtek NIC that's failing to attach. It looks like the same device failing 4 times. You should remove or disable that if you're not using it.

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xf000-0xf0ff mem 0xfc704000-0xfc704fff,0xfc700000-0xfc703fff at device 0.0 on pci4
re0: Using 1 MSI-X message
re0: Chip rev. 0x54000000
re0: MAC rev. 0x00100000
re0: attaching PHYs failed
device_attach: re0 attach returned 6
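
For reference, the "returned 6" is an errno value. A small Python sketch that pulls failed attaches out of boot log lines and names the code; `find_attach_failures` is a hypothetical helper, and the errno names come from whatever platform Python runs on (6 is ENXIO on both FreeBSD and Linux).

```python
import errno
import re

# Matches lines such as "device_attach: re0 attach returned 6".
ATTACH_FAIL = re.compile(
    r"device_attach: (?P<dev>\w+) attach returned (?P<code>\d+)"
)

def find_attach_failures(lines):
    """Return (device, errno value, errno name) for each failed attach."""
    failures = []
    for line in lines:
        m = ATTACH_FAIL.search(line)
        if m:
            code = int(m["code"])
            name = errno.errorcode.get(code, "unknown")
            failures.append((m["dev"], code, name))
    return failures

boot_log = [
    "re0: attaching PHYs failed",
    "device_attach: re0 attach returned 6",
]
failures = find_attach_failures(boot_log)
print(failures)
```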

KingCobra021

Thanks for the reply.

Is that the same backtrace every time?
Yes
Is that expected?
That looks like the NIC's IP, so maybe
re0 is not in use, and the Realtek drivers are disabled and deleted. However, re0 is the motherboard's onboard NIC, so I can't remove it, though I can disable it. Also, since I removed the drivers, it no longer shows up as an interface that could be enabled.

stephenw10

Is that IP actually a device that should be moving between MAC values though? Like a bonded link or some load balancer?
If it's not, you might have an IP conflict.

Yes, you should disable that re NIC. It's only using resources.

However neither should be causing that panic....

KingCobra021

@stephenw10

The device is expected to have multiple MACs, and I will disable the NIC through the terminal. Since you said neither should cause the panic, any idea where I can look for a possible cause?

stephenw10

Not yet, still digging.

Is there anything in the main system log just before this happens?

Do you have anything 'exotic' configured in the firewall rules? Scheduled rules, maybe, or UPnP?

stephenw10

Do you have more reports we can review?

How often does this happen?

KingCobra021

@stephenw10
Hi, it used to be every hour, but now that I have disabled the Realtek NIC on the motherboard, it happens less often. Here are the new dump files. This time the panic doesn't seem to have caused an outage, unlike before.
info.0
textdump.tar.0

KingCobra021

@KingCobra021
info.0 textdump.tar.0

stephenw10

Mmm, OK that last one is a completely different panic. You should run a memory test to rule out some RAM glitch because that would certainly explain it.

Completely unrelated panics like that are almost always hardware.
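
A quick way to compare crash reports is to look at the first function listed after the trap marker in each backtrace. A minimal Python sketch, assuming the ddb backtrace format shown earlier in this thread; `faulting_function` is a made-up helper name, and the sample is an excerpt of the first backtrace only.

```python
import re

# A frame line looks like: "pf_state_expires() at pf_state_expires+0xb/frame 0x..."
FRAME = re.compile(r"^(?P<func>\w+)\(\) at ")

def faulting_function(backtrace):
    """Return the first function listed after the first '--- trap' marker."""
    seen_trap = False
    for line in backtrace.splitlines():
        line = line.strip()
        if line.startswith("--- trap"):
            seen_trap = True
            continue
        if seen_trap:
            m = FRAME.match(line)
            if m:
                return m["func"]
    return None  # no trap marker or no frame after it

first_dump = """\
calltrap() at calltrap+0x8/frame 0xfffffe0103ee2d50
--- trap 0xc, rip = 0xffffffff80fcc88b, rsp = 0xfffffe0103ee2e20, rbp = 0xfffffe0103ee2e40 ---
pf_state_expires() at pf_state_expires+0xb/frame 0xfffffe0103ee2e40
pf_purge_expired_states() at pf_purge_expired_states+0xd8/frame 0xfffffe0103ee2e90
"""
signature = faulting_function(first_dump)
print(signature)  # pf_state_expires
```

Running the same function over the backtrace from each textdump and comparing the results shows at a glance whether the panics land in the same place or in unrelated code.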

KingCobra021

@stephenw10
OK, I sure will, thanks.