pfSense Crash "Fatal trap 12: page fault while in kernel mode"

DrAg0n141

@stephenw10 I have the FRR package installed and am using BGP.
On the other hardware the crashes were daily; now on my test VM it crashes 2-3 times a week.

stephenw10

Ok let's see if a second crash is also in tailscaled then.

DrAg0n141

After a longer stretch I now have my next crash. It's again with tailscale.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xb8
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f44300
stack pointer	        = 0x28:0xfffffe008b818c80
frame pointer	        = 0x28:0xfffffe008b818d00
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 39102 (tailscaled)
rdi: ffffffff82d62a40 rsi: 000000000000c8cb rdx: 0000000000000000
rcx: 0000000000000000  r8: fffff8000745fb00  r9: 0000000000000000
rax: 0000000000000030 rbx: fffff80006b44540 rbp: fffffe008b818d00
r10: 0000000000000000 r11: fffffe0069391e20 r12: fffff800198e1360
r13: 000000000000c8cb r14: 0000000000000001 r15: fffff8000745fb00
trap number		= 12
panic: page fault
cpuid = 0
time = 1710249378
KDB: enter: panic
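
The `time =` field in the panic message is a Unix epoch timestamp, which helps when lining the crash up against other logs. A minimal sketch for converting it, assuming Python is available on an admin machine (the helper name `panic_time_utc` is made up for illustration):

```python
from datetime import datetime, timezone

def panic_time_utc(epoch_seconds):
    """Convert the 'time = ...' value from a panic message to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()

# Value copied from the panic message above.
print(panic_time_utc(1710249378))
```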

stephenw10

Do you have the backtrace? Full crash report? You can upload it to the same NextCloud link above.

DrAg0n141

@stephenw10 Files are uploaded.

stephenw10

Backtrace:

db:0:kdb.enter.default>  bt
Tracing pid 39102 tid 110696 td 0xfffffe0069391900
kdb_enter() at kdb_enter+0x32/frame 0xfffffe008b818960
vpanic() at vpanic+0x163/frame 0xfffffe008b818a90
panic() at panic+0x43/frame 0xfffffe008b818af0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe008b818b50
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008b818bb0
calltrap() at calltrap+0x8/frame 0xfffffe008b818bb0
--- trap 0xc, rip = 0xffffffff80f44300, rsp = 0xfffffe008b818c80, rbp = 0xfffffe008b818d00 ---
in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008b818d00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008b818d60
sobind() at sobind+0x32/frame 0xfffffe008b818d80
kern_bindat() at kern_bindat+0x96/frame 0xfffffe008b818dc0
sys_bind() at sys_bind+0x9b/frame 0xfffffe008b818e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe008b818f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe008b818f30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x482bff, rsp = 0x871d42a50, rbp = 0x871d42a50 ---

The message buffer is spammed with ARP movement logs. If that is expected from something you should consider disabling those logs. It may be hiding other useful entries:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/logs-arp-moved.html
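
Until that logging is disabled, the noise can be filtered out of a saved copy of the message buffer. A minimal sketch, assuming the entries follow the usual `arp: <ip> moved from <mac> to <mac> on <if>` wording; the sample lines, addresses and the helper name `drop_arp_moved` are made up for illustration:

```python
import re

# Assumed wording of the kernel's ARP movement notice.
ARP_MOVED = re.compile(r"arp: \S+ moved from \S+ to \S+ on \S+")

def drop_arp_moved(lines):
    """Return only the lines that are not ARP movement notices."""
    return [line for line in lines if not ARP_MOVED.search(line)]

# Hypothetical message buffer excerpt (documentation addresses, not from the real dump).
sample = [
    "arp: 192.0.2.10 moved from 00:00:5e:00:53:01 to 00:00:5e:00:53:02 on vtnet0",
    "vtnet0: link state changed to UP",
    "arp: 192.0.2.10 moved from 00:00:5e:00:53:02 to 00:00:5e:00:53:01 on vtnet0",
]

for line in drop_arp_moved(sample):
    print(line)
```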

That backtrace looks identical though so that looks like a software bug at this point.

What interfaces do you have? That error is something trying to listen on IPv6 and hitting something unexpected.
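
For reference, the syscall at the bottom of that backtrace is a plain bind() on a UDP socket in the IPv6 family. A minimal userland sketch of that kind of call is below. It is not tailscaled's code (tailscaled is written in Go), the function name `bind_udp6_wildcard` is made up for illustration, and it is not expected to reproduce the panic; it only shows which operation ends up in udp6_bind() and in6_pcbbind().

```python
import socket

def bind_udp6_wildcard(port=0):
    """Bind a UDP socket to the IPv6 wildcard address and return the bound port.

    Returns None when the host has no usable IPv6 stack.
    """
    if not socket.has_ipv6:
        return None
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            # Port 0 asks the kernel to pick a free port.
            s.bind(("::", port))
            return s.getsockname()[1]
    except OSError:
        return None

print(bind_udp6_wildcard())
```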

DrAg0n141

I had another crash an hour ago, with tailscale too.

These are my interfaces; none of them has IPv6 configured.

7952b964-6263-4c62-b005-e2f1c5fd5f8f-12.03.2024-370.png

PS: I have uploaded the logs again if you need them, but I think they are identical.

stephenw10

Yup same crash but at least here it happened soon enough the message buffer still has useful data in it.

So I can see that you're running virtualised with vtnet NICs, a bunch of VLANs, and a PPPoE interface.

One of those things probably has something unusual about the v6 link-local address. Can you send me the output of: ifconfig -vma to the NextCloud folder?

DrAg0n141

@stephenw10
Done.

stephenw10

Ok two things:
Your tailscale interface has a valid IPv6 address which is probably why the error is happening there.

But more likely you somehow have a lagg interface that doesn't have any member interfaces. I'm not sure how you might have that. Did you have a lagg configured previously? Is there any lagg config left over?

DrAg0n141

@stephenw10

The LAGG is from my pfSense box; I am using the VM at the moment to check whether I have a hardware problem.

Disabling IPv6 on Tailscale is not possible. Should I enable IPv6 on the pfSense again then?

stephenw10

Nope there should be no problem having IPv6 only on tailscale.

More likely it's trying to listen on all interfaces including lagg0 but lagg0 is invalid. Remove the lagg entirely.

DrAg0n141

@stephenw10

Ok, the LAGG interface is removed. Now I will wait and see whether I get another crash.

DrAg0n141

Hi,

I changed back to my primary hardware last week. Now I got my first crash.


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address	= 0xb8
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f44300
stack pointer	        = 0x28:0xffffffff83796c80
frame pointer	        = 0x28:0xffffffff83796d00
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 6121 (tailscaled)
rdi: ffffffff82d62a40 rsi: 0000000000005ce5 rdx: 0000000000000000
rcx: 0000000000000000  r8: fffff8001dd2f700  r9: 0000000000000000
rax: 0000000000000030 rbx: fffff8001da28380 rbp: ffffffff83796d00
r10: 0000000000000000 r11: fffffe006b33a8c0 r12: fffff80123e9bb80
r13: 0000000000005ce5 r14: 0000000000000001 r15: fffff8001dd2f700
trap number		= 12
panic: page fault
cpuid = 0
time = 1711432782
KDB: enter: panic

PS: Have uploaded the dump.

stephenw10

Pretty much identical backtrace:

db:0:kdb.enter.default>  bt
Tracing pid 6121 tid 101274 td 0xfffffe006b33a3a0
kdb_enter() at kdb_enter+0x32/frame 0xffffffff83796960
vpanic() at vpanic+0x163/frame 0xffffffff83796a90
panic() at panic+0x43/frame 0xffffffff83796af0
trap_fatal() at trap_fatal+0x40c/frame 0xffffffff83796b50
trap_pfault() at trap_pfault+0x4f/frame 0xffffffff83796bb0
calltrap() at calltrap+0x8/frame 0xffffffff83796bb0
--- trap 0xc, rip = 0xffffffff80f44300, rsp = 0xffffffff83796c80, rbp = 0xffffffff83796d00 ---
in6_pcbbind() at in6_pcbbind+0x440/frame 0xffffffff83796d00
udp6_bind() at udp6_bind+0x13c/frame 0xffffffff83796d60
sobind() at sobind+0x32/frame 0xffffffff83796d80
kern_bindat() at kern_bindat+0x96/frame 0xffffffff83796dc0
sys_bind() at sys_bind+0x9b/frame 0xffffffff83796e00
amd64_syscall() at amd64_syscall+0x109/frame 0xffffffff83796f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xffffffff83796f30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x482bff, rsp = 0x87058fa50, rbp = 0x87058fa50 ---
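
One way to check "pretty much identical" is to compare only the function names and offsets and ignore the frame addresses, which differ between the VM and the hardware. A minimal sketch, assuming the ddb `bt` line layout shown above; the helper name `signature` is made up, and only three frames from each trace are copied in to keep it short:

```python
import re

# Matches lines such as "sobind() at sobind+0x32/frame 0xfffffe008b818d80".
FRAME = re.compile(r"^(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame 0x[0-9a-f]+$")

def signature(bt_text):
    """Return the list of (function, offset) pairs, dropping frame addresses."""
    sig = []
    for line in bt_text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            sig.append((m.group(1), m.group(2)))
    return sig

# Frames copied from the VM crash earlier in the thread.
vm_bt = """in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008b818d00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008b818d60
sobind() at sobind+0x32/frame 0xfffffe008b818d80"""

# The same frames from the hardware crash.
hw_bt = """in6_pcbbind() at in6_pcbbind+0x440/frame 0xffffffff83796d00
udp6_bind() at udp6_bind+0x13c/frame 0xffffffff83796d60
sobind() at sobind+0x32/frame 0xffffffff83796d80"""

print(signature(vm_bt) == signature(hw_bt))
```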

Message buffer is still spammed by arp movement logs hiding anything that might be useful. You should really think about just disabling that logging if those MACs are known:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/logs-arp-moved.html

Can you upload the ifconfig output from that hardware?

DrAg0n141

@stephenw10 Files are uploaded.
I don't understand exactly where I can disable the setting for that, or why I have those messages.

EDIT: Got it, I hadn't read the last line at that URL.

stephenw10

Hmm pretty much as before then. The only anomaly there is that one link in the lagg is not participating/active:

        laggport: igc0 flags=8<COLLECTING> state=1f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,COLLECTING>
                [(8000,7C-2B-E1-13-62-5B,01E6,8000,0001),
                 (FFFF,74-4D-28-07-F0-08,0007,00FF,0004)]
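
To spell out what that state value says: a minimal sketch that decodes the hex value into flag names, assuming the standard LACP actor state bit layout (ACTIVITY is the lowest bit, EXPIRED the highest). The helper name `decode_lacp_state` is made up for illustration. For 0x1f the DISTRIBUTING bit (0x20) is absent, which matches the port not passing traffic as part of the lagg.

```python
# Assumed LACP state bits, lowest bit first.
LACP_STATE_BITS = [
    (0x01, "ACTIVITY"),
    (0x02, "TIMEOUT"),
    (0x04, "AGGREGATION"),
    (0x08, "SYNC"),
    (0x10, "COLLECTING"),
    (0x20, "DISTRIBUTING"),
    (0x40, "DEFAULTED"),
    (0x80, "EXPIRED"),
]

def decode_lacp_state(value):
    """Return the names of the state bits set in value."""
    return [name for bit, name in LACP_STATE_BITS if value & bit]

state = 0x1F  # from "state=1f<...>" in the ifconfig output above
print(decode_lacp_state(state))
print("DISTRIBUTING" in decode_lacp_state(state))
```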

DrAg0n141

@stephenw10 I know that; I think the cable is broken. I have now switched from SpeedShift to PowerD and have had no more crashes since. Before changing the setting, starting with the first crash in the morning, I had a crash every hour.

stephenw10

Huh, well that's..... unexpected! There was some speculation that it could be a race condition between multiple processes accessing the same socket. Changing the CPU frequency could affect that.

DrAg0n141

Good morning,

I had another crash this morning, now with powerd, so that's not the fix for the crashes. I uploaded the logs, but I think the crash report is the same.