pfSense Crash "Fatal trap 12: page fault while in kernel mode"

DrAg0n141

@stephenw10 I don't know if it shows tailscaled in both. I missed saving the logs.

From the latest crash I have info.0 and textdump.tar.0. What do you need?

stephenw10

Please upload them here: https://nc.netgate.com/nextcloud/s/bCLGN7bCwC7Rszf

DrAg0n141

@stephenw10 Done :)

stephenw10

Backtrace:

db:0:kdb.enter.default>  bt
Tracing pid 92224 tid 103104 td 0xfffffe006938b3a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe008a9e8960
vpanic() at vpanic+0x163/frame 0xfffffe008a9e8a90
panic() at panic+0x43/frame 0xfffffe008a9e8af0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe008a9e8b50
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008a9e8bb0
calltrap() at calltrap+0x8/frame 0xfffffe008a9e8bb0
--- trap 0xc, rip = 0xffffffff80f44300, rsp = 0xfffffe008a9e8c80, rbp = 0xfffffe008a9e8d00 ---
in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008a9e8d00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008a9e8d60
sobind() at sobind+0x32/frame 0xfffffe008a9e8d80
kern_bindat() at kern_bindat+0x96/frame 0xfffffe008a9e8dc0
sys_bind() at sys_bind+0x9b/frame 0xfffffe008a9e8e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe008a9e8f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe008a9e8f30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x482bff, rsp = 0x86cd09a50, rbp = 0x86cd09a50 ---

That does look very similar to a crash we saw recently where the Bind package was trying to bind to linklocal IPv6 addresses and for some reason failing.

I don't see any IPv6 in your logs. Do you have tailscale set to use any interface/address?
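
For reference, the userland side of that backtrace is nothing exotic. The trace ends in `sys_bind` -> `udp6_bind` -> `in6_pcbbind`, which is the path taken when a process binds a UDP socket over IPv6. Below is a minimal Python sketch of that kind of call. It is not tailscaled's actual code (tailscaled is written in Go), and the helper name `try_udp6_bind` is made up for illustration. On a healthy kernel this either succeeds or returns an error to the caller; it should never be able to panic the box.

```python
import socket


def try_udp6_bind(addr="::", port=0):
    """Bind a UDP socket over IPv6; return the local port, or None on failure."""
    if not socket.has_ipv6:
        # Python was built without IPv6 support.
        return None
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            # bind() on an AF_INET6 datagram socket is the syscall that the
            # backtrace shows entering sys_bind -> udp6_bind -> in6_pcbbind.
            s.bind((addr, port))
            return s.getsockname()[1]
    except OSError:
        # No IPv6 stack, address not available, permission denied, etc.
        return None


if __name__ == "__main__":
    print(try_udp6_bind())
```

Port 0 asks the kernel to pick a free port, so the sketch does not collide with anything already listening.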

DrAg0n141

@stephenw10 No, I haven't assigned the tailscale interface, and I don't have IPv6 enabled on my system.

stephenw10

Hmm, maybe FRR then?

You shouldn't assign the tailscale interface so that's correct.

Does it crash often? Comparing that with another report would confirm it.

DrAg0n141

@stephenw10 I have the FRR package installed and am using BGP.
On the other hardware the crashes were daily; now on my test VM it crashes 2-3 times a week.

stephenw10

Ok let's see if a second crash is also in tailscaled then.

DrAg0n141

After a longer gap I now have my next crash. It's again with tailscale.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xb8
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f44300
stack pointer	        = 0x28:0xfffffe008b818c80
frame pointer	        = 0x28:0xfffffe008b818d00
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 39102 (tailscaled)
rdi: ffffffff82d62a40 rsi: 000000000000c8cb rdx: 0000000000000000
rcx: 0000000000000000  r8: fffff8000745fb00  r9: 0000000000000000
rax: 0000000000000030 rbx: fffff80006b44540 rbp: fffffe008b818d00
r10: 0000000000000000 r11: fffffe0069391e20 r12: fffff800198e1360
r13: 000000000000c8cb r14: 0000000000000001 r15: fffff8000745fb00
trap number		= 12
panic: page fault
cpuid = 0
time = 1710249378
KDB: enter: panic

stephenw10

Do you have the backtrace? The full crash report? You can upload it to the same NextCloud link above.

DrAg0n141

@stephenw10 Files are uploaded.

stephenw10

Backtrace:

db:0:kdb.enter.default>  bt
Tracing pid 39102 tid 110696 td 0xfffffe0069391900
kdb_enter() at kdb_enter+0x32/frame 0xfffffe008b818960
vpanic() at vpanic+0x163/frame 0xfffffe008b818a90
panic() at panic+0x43/frame 0xfffffe008b818af0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe008b818b50
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008b818bb0
calltrap() at calltrap+0x8/frame 0xfffffe008b818bb0
--- trap 0xc, rip = 0xffffffff80f44300, rsp = 0xfffffe008b818c80, rbp = 0xfffffe008b818d00 ---
in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008b818d00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008b818d60
sobind() at sobind+0x32/frame 0xfffffe008b818d80
kern_bindat() at kern_bindat+0x96/frame 0xfffffe008b818dc0
sys_bind() at sys_bind+0x9b/frame 0xfffffe008b818e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe008b818f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe008b818f30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x482bff, rsp = 0x871d42a50, rbp = 0x871d42a50 ---

The message buffer is spammed with ARP movement logs. If those are expected, you should consider disabling them, since they may be hiding other useful entries:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/logs-arp-moved.html

That backtrace looks identical though so that looks like a software bug at this point.

What interfaces do you have? That error is something trying to listen on IPv6 and hitting something unexpected.
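
To make the "identical" comparison concrete: the frame addresses differ between the two crashes, but the function names and offsets do not. A rough Python sketch of that comparison follows, using three frames copied from each backtrace above. The helper name `frames` and the regular expression are my own and assume the ddb `bt` line format shown in this thread.

```python
import re

# Matches ddb lines such as:
#   in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008a9e8d00
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame 0x[0-9a-f]+$")


def frames(bt_text):
    """Return [(function, offset), ...], ignoring the per-crash frame addresses."""
    out = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            out.append((m.group(1), m.group(2)))
    return out


# Frames copied from the first crash (pid 92224).
first = """\
in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008a9e8d00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008a9e8d60
sobind() at sobind+0x32/frame 0xfffffe008a9e8d80
"""

# Frames copied from the second crash (pid 39102).
second = """\
in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008b818d00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008b818d60
sobind() at sobind+0x32/frame 0xfffffe008b818d80
"""

print(frames(first) == frames(second))  # True: same functions, same offsets
```

The same faulting instruction (`in6_pcbbind+0x440`) on both runs is what points at a software bug rather than random hardware corruption.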

DrAg0n141

I had another crash an hour ago, again with tailscale.

These are my interfaces; none of them has IPv6 configured.

7952b964-6263-4c62-b005-e2f1c5fd5f8f-12.03.2024-370.png

PS: I have uploaded the logs again in case you need them, but I think they are identical.

stephenw10

Yup, same crash, but at least here it happened soon enough that the message buffer still has useful data in it.

So I can see that you're running virtualised with vtnet NICs, a bunch of VLANs, and a PPPoE interface.

One of those things probably has something unusual about the v6 linklocal address. Can you send me the output of: ifconfig -vma to the nc folder?

DrAg0n141

@stephenw10
Done.

stephenw10

Ok two things:
Your tailscale interface has a valid IPv6 address, which may be why the error is happening there.

But more likely you somehow have a lagg interface that doesn't have any member interfaces. I'm not sure how you might have that. Did you have a lagg configured previously? Is there any lagg config left over?

DrAg0n141

@stephenw10

The LAGG is from my pfSense box. I'm using the VM at the moment to check whether I have a hardware problem.

Disabling IPv6 on Tailscale is not possible. Should I enable IPv6 on pfSense again then?

stephenw10

Nope there should be no problem having IPv6 only on tailscale.

More likely it's trying to listen on all interfaces including lagg0 but lagg0 is invalid. Remove the lagg entirely.
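
If you want to check for this on another box, a lagg with no members shows up in `ifconfig` output as a `laggN` block with no `laggport:` lines under it. Here is a rough Python sketch that scans saved `ifconfig` text for that. The function name and the sample text are made up for illustration (the sample is not taken from the uploaded output), and it assumes the usual FreeBSD layout where member ports are listed as indented `laggport: <ifname>` lines.

```python
import re


def laggs_without_members(ifconfig_text):
    """Return the names of lagg interfaces that list no laggport members."""
    members = {}
    current = None
    for line in ifconfig_text.splitlines():
        # Interface header lines start in column 0: "lagg0: flags=..."
        m = re.match(r"^(\S+?): flags=", line)
        if m:
            name = m.group(1)
            current = name if re.fullmatch(r"lagg\d+", name) else None
            if current:
                members[current] = []
            continue
        # Member ports are indented: "\tlaggport: igb0 flags=..."
        m = re.match(r"^\s+laggport: (\S+)", line)
        if m and current:
            members[current].append(m.group(1))
    return [name for name, ports in members.items() if not ports]


# Hypothetical sample text, for illustration only.
SAMPLE = (
    "vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\tether 00:00:5e:00:53:01\n"
    "lagg0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\tlaggproto lacp lagghash l2,l3,l4\n"
    "\tgroups: lagg\n"
    "lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\tlaggproto lacp lagghash l2,l3,l4\n"
    "\tlaggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>\n"
)

print(laggs_without_members(SAMPLE))  # ['lagg0']
```

To use it on a real system, save the output of `ifconfig -vma` to a file and pass the file contents to the function.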

DrAg0n141

@stephenw10

OK, the LAGG interface is removed. Now I'll wait and see whether I get another crash.

DrAg0n141

Hi,

I changed back to my primary hardware last week. Now I got my first crash.


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address	= 0xb8
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80f44300
stack pointer	        = 0x28:0xffffffff83796c80
frame pointer	        = 0x28:0xffffffff83796d00
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 6121 (tailscaled)
rdi: ffffffff82d62a40 rsi: 0000000000005ce5 rdx: 0000000000000000
rcx: 0000000000000000  r8: fffff8001dd2f700  r9: 0000000000000000
rax: 0000000000000030 rbx: fffff8001da28380 rbp: ffffffff83796d00
r10: 0000000000000000 r11: fffffe006b33a8c0 r12: fffff80123e9bb80
r13: 0000000000005ce5 r14: 0000000000000001 r15: fffff8001dd2f700
trap number		= 12
panic: page fault
cpuid = 0
time = 1711432782
KDB: enter: panic

PS: Have uploaded the dump.