pfSense Crash "Fatal trap 12: page fault while in kernel mode"
-
Hi,
Since installing 2.7.2 I have had random reboots, each followed by a crash report. I have now tested on two different hardware devices and both crash.
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0xb8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80f44300
stack pointer = 0x28:0xfffffe008a9e8c80
frame pointer = 0x28:0xfffffe008a9e8d00
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 92224 (tailscaled)
rdi: ffffffff82d62a40 rsi: 000000000000b8ef rdx: 0000000000000000
rcx: 0000000000000000 r8: fffff800055f5800 r9: 0000000000000000
rax: 0000000000000030 rbx: fffff8004da9ea80 rbp: fffffe008a9e8d00
r10: 0000000000000000 r11: fffffe006938b8c0 r12: fffff8004d6491a0
r13: 000000000000b8ef r14: 0000000000000001 r15: fffff800055f5800
trap number = 12
panic: page fault
cpuid = 3
time = 1709129280
KDB: enter: panic
-
Both in tailscaled?
Do you have the backtrace(s) from the crash report?
-
@stephenw10 I don't know whether it showed tailscaled in both; I didn't save the logs.
From the latest crash I have info.0 and textdump.tar.0. What do you need?
-
Please upload them here: https://nc.netgate.com/nextcloud/s/bCLGN7bCwC7Rszf
-
@stephenw10 Done :)
-
Backtrace:
db:0:kdb.enter.default> bt
Tracing pid 92224 tid 103104 td 0xfffffe006938b3a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe008a9e8960
vpanic() at vpanic+0x163/frame 0xfffffe008a9e8a90
panic() at panic+0x43/frame 0xfffffe008a9e8af0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe008a9e8b50
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008a9e8bb0
calltrap() at calltrap+0x8/frame 0xfffffe008a9e8bb0
--- trap 0xc, rip = 0xffffffff80f44300, rsp = 0xfffffe008a9e8c80, rbp = 0xfffffe008a9e8d00 ---
in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008a9e8d00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008a9e8d60
sobind() at sobind+0x32/frame 0xfffffe008a9e8d80
kern_bindat() at kern_bindat+0x96/frame 0xfffffe008a9e8dc0
sys_bind() at sys_bind+0x9b/frame 0xfffffe008a9e8e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe008a9e8f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe008a9e8f30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x482bff, rsp = 0x86cd09a50, rbp = 0x86cd09a50 ---
That does look very similar to a crash we saw recently where the Bind package was trying to bind to linklocal IPv6 addresses and for some reason failing.
I don't see any IPv6 in your logs. Do you have tailscale set to use any interface/address?
-
@stephenw10 No, I haven't assigned the tailscale interface, and I don't have IPv6 enabled on my system.
-
Hmm, maybe FRR then?
You shouldn't assign the tailscale interface so that's correct.
Does it crash often? Comparing that with another report would confirm it.
-
@stephenw10 I have the FRR package installed and am using BGP.
On the other hardware the crashes were daily; now on my test VM it crashes 2-3 times a week.
-
Ok let's see if a second crash is also in tailscaled then.
-
After a longer stretch I have now had my next crash. It's with tailscale again.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xb8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80f44300
stack pointer = 0x28:0xfffffe008b818c80
frame pointer = 0x28:0xfffffe008b818d00
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 39102 (tailscaled)
rdi: ffffffff82d62a40 rsi: 000000000000c8cb rdx: 0000000000000000
rcx: 0000000000000000 r8: fffff8000745fb00 r9: 0000000000000000
rax: 0000000000000030 rbx: fffff80006b44540 rbp: fffffe008b818d00
r10: 0000000000000000 r11: fffffe0069391e20 r12: fffff800198e1360
r13: 000000000000c8cb r14: 0000000000000001 r15: fffff8000745fb00
trap number = 12
panic: page fault
cpuid = 0
time = 1710249378
KDB: enter: panic
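For anyone lining up two panic messages like the ones in this thread, here is a minimal Python sketch that pulls out the fields worth comparing (fault address, instruction pointer, process, time). The function name `parse_trap` and the trimmed sample strings are my own for illustration; the values are copied from the two reports above. A fault address as small as 0xb8 often points at a read through a NULL structure pointer at some field offset, but that is an inference, not something confirmed here.

```python
import re
from datetime import datetime, timezone

def parse_trap(text):
    """Extract a few comparable fields from a FreeBSD 'Fatal trap' message.

    Works on both the one-line and the multi-line form of the message.
    """
    fields = {}
    m = re.search(r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)", text)
    if m:
        fields["fault_addr"] = int(m.group(1), 16)
    # The instruction pointer is printed as <segment>:<address>; keep the address.
    m = re.search(r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)", text)
    if m:
        fields["rip"] = int(m.group(1), 16)
    m = re.search(r"current process\s*=\s*(\d+)\s*\(([^)]+)\)", text)
    if m:
        fields["pid"] = int(m.group(1))
        fields["process"] = m.group(2)
    # 'time' is seconds since the Unix epoch; convert to UTC.
    m = re.search(r"\btime\s*=\s*(\d+)", text)
    if m:
        fields["time"] = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
    return fields

# Trimmed excerpts of the two panic messages posted in this thread.
FIRST = (
    "fault virtual address = 0xb8 "
    "instruction pointer = 0x20:0xffffffff80f44300 "
    "current process = 92224 (tailscaled) "
    "panic: page fault cpuid = 3 time = 1709129280"
)
SECOND = (
    "fault virtual address = 0xb8 "
    "instruction pointer = 0x20:0xffffffff80f44300 "
    "current process = 39102 (tailscaled) "
    "panic: page fault cpuid = 0 time = 1710249378"
)

a, b = parse_trap(FIRST), parse_trap(SECOND)
same_site = a["rip"] == b["rip"] and a["fault_addr"] == b["fault_addr"]
print("same faulting instruction and address:", same_site)
print("processes:", a["process"], b["process"])
```

If the instruction pointer and fault address match across reports, the crashes are very likely the same code path, which is what the backtraces are then used to confirm.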
-
Do you have the backtrace? The full crash report? You can upload it to the same NextCloud link above.
-
@stephenw10 Files are uploaded.
-
Backtrace:
db:0:kdb.enter.default> bt
Tracing pid 39102 tid 110696 td 0xfffffe0069391900
kdb_enter() at kdb_enter+0x32/frame 0xfffffe008b818960
vpanic() at vpanic+0x163/frame 0xfffffe008b818a90
panic() at panic+0x43/frame 0xfffffe008b818af0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe008b818b50
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008b818bb0
calltrap() at calltrap+0x8/frame 0xfffffe008b818bb0
--- trap 0xc, rip = 0xffffffff80f44300, rsp = 0xfffffe008b818c80, rbp = 0xfffffe008b818d00 ---
in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008b818d00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008b818d60
sobind() at sobind+0x32/frame 0xfffffe008b818d80
kern_bindat() at kern_bindat+0x96/frame 0xfffffe008b818dc0
sys_bind() at sys_bind+0x9b/frame 0xfffffe008b818e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe008b818f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe008b818f30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x482bff, rsp = 0x871d42a50, rbp = 0x871d42a50 ---
The message buffer is spammed with ARP movement logs. If those are expected, you should consider disabling them, since they may be hiding other useful entries:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/logs-arp-moved.html
That backtrace looks identical though, so this looks like a software bug at this point.
What interfaces do you have? That error is something trying to listen on IPv6 and hitting something unexpected.
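As an aside for anyone comparing backtraces like these by hand: the frame addresses differ between crashes, so a plain text diff is noisy. A minimal Python sketch of one way to compare them is to keep only the function name and offset of each frame. The helper name `frames` is hypothetical, and the sample strings are shortened excerpts of the two backtraces in this thread.

```python
import re

# Matches e.g. "udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008a9e8d60"
# and captures the function name and the offset, ignoring the frame address.
FRAME_RE = re.compile(r"(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame 0x[0-9a-f]+")

def frames(backtrace):
    """Return a list of (function, offset) pairs in call order."""
    return FRAME_RE.findall(backtrace)

# Shortened excerpts from the first and second crash reports.
BT_A = (
    "in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008a9e8d00 "
    "udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008a9e8d60 "
    "sobind() at sobind+0x32/frame 0xfffffe008a9e8d80"
)
BT_B = (
    "in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe008b818d00 "
    "udp6_bind() at udp6_bind+0x13c/frame 0xfffffe008b818d60 "
    "sobind() at sobind+0x32/frame 0xfffffe008b818d80"
)

identical = frames(BT_A) == frames(BT_B)
print("same call path:", identical)
```

Matching function names and offsets in both reports mean the panic is hit at the same instruction through the same call chain, even though the stack addresses differ.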
-
I had another crash an hour ago, again with tailscale.
These are my interfaces; none of them has IPv6 configured.
PS: I have uploaded the logs again if you need them, but I think they are identical.
-
Yup same crash but at least here it happened soon enough the message buffer still has useful data in it.
So I can see that you're running virtualised with vtnet NICs, a bunch of VLANs, and a PPPoE interface.
One of those things probably has something unusual about the v6 linklocal address. Can you send me the output of:
ifconfig -vma
to the nc folder?
-
@stephenw10
Done.
-
Ok two things:
Your tailscale interface has a valid IPv6 address which is probably why the error is happening there.
But more likely you somehow have a lagg interface that doesn't have any member interfaces. I'm not sure how you might have that. Did you have a lagg configured previously? Is there any lagg config left over?
-
The LAGG is from my pfSense box; I'm using the VM at the moment to check whether I have a hardware problem.
Disabling IPv6 on Tailscale is not possible, so should I enable IPv6 on pfSense again?
-
Nope there should be no problem having IPv6 only on tailscale.
More likely it's trying to listen on all interfaces including lagg0 but lagg0 is invalid. Remove the lagg entirely.
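To spot a memberless lagg without reading the whole dump, a minimal Python sketch along these lines could scan saved `ifconfig` output. It assumes the usual FreeBSD layout, where each interface block starts with `name: flags=`, a lagg shows a `laggproto` line, and each member shows up as a `laggport:` line. The function name and the sample text are made up for illustration and are not taken from the uploaded output.

```python
import re

def laggs_without_members(ifconfig_output):
    """Return names of lagg interfaces that list no laggport members."""
    blocks = {}
    current = None
    for line in ifconfig_output.splitlines():
        # A new interface block starts at column 0 with "<name>: flags=".
        m = re.match(r"^(\S+): flags=", line)
        if m:
            current = m.group(1)
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line.strip())

    empty = []
    for name, lines in blocks.items():
        is_lagg = any(l.startswith("laggproto") for l in lines)
        has_port = any(l.startswith("laggport:") for l in lines)
        if is_lagg and not has_port:
            empty.append(name)
    return empty

# Illustrative sample only (hypothetical interfaces and addresses).
SAMPLE = """\
vtnet0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:00:5e:00:53:01
lagg0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:00:00:00:00:00
    laggproto lacp lagghash l2,l3,l4
lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    laggproto lacp lagghash l2,l3,l4
    laggport: vtnet1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
"""

print(laggs_without_members(SAMPLE))
```

Any interface this reports is a candidate for the leftover lagg described above and can then be removed from the configuration.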