pfSense Crash "Fatal trap 12: page fault while in kernel mode"
-
Hmm pretty much as before then. The only anomaly there is that one link in the lagg is not participating/active:
laggport: igc0 flags=8<COLLECTING> state=1f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,COLLECTING> [(8000,7C-2B-E1-13-62-5B,01E6,8000,0001), (FFFF,74-4D-28-07-F0-08,0007,00FF,0004)]
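For comparison, a member that is fully participating normally shows flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>. A minimal sketch for spotting members that are not, assuming the output of ifconfig for the lagg is piped in (the interface name lagg0 and the helper name inactive_laggports are only illustrative):

```shell
# Hypothetical helper: reads `ifconfig <lagg>` output on stdin and prints
# the names of laggports whose flags field does not include ACTIVE.
inactive_laggports() {
    awk '$1 == "laggport:" && $3 !~ /[<,]ACTIVE[,>]/ { print $2 }'
}

# Usage on the firewall (lagg0 is an assumed interface name):
# ifconfig lagg0 | inactive_laggports
```

Only the flags field is tested, so the ACTIVITY bit in the state field does not count as a match.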
-
@stephenw10 I know that; I think the cable is broken. I have now switched from SpeedShift to PowerD and have had no more crashes since. Before changing the setting, I had a crash every hour after the first crash in the morning.
-
Huh, well that's..... unexpected! There was some speculation that it could be a race condition between multiple processes accessing the same socket. Changing the CPU frequency could affect that.
-
Good morning,
I had another crash this morning, now with PowerD, so that is not the fix for the crashes. I uploaded the logs, but I think the crash report is the same.
-
Yes identical crash.
What's connected to igc0? It's flapping a lot:
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
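To put a number on the flapping, the DOWN transitions can be counted per interface. A minimal sketch, assuming the kernel messages are piped in one per line, for example from dmesg (the helper name count_link_downs is only illustrative):

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints
# "<interface> <number of DOWN transitions>" for each interface seen.
count_link_downs() {
    awk '/link state changed to DOWN/ {
        name = $1
        sub(/^<[0-9]+>/, "", name)   # drop a priority prefix such as <6>
        sub(/:$/, "", name)          # drop the trailing colon
        downs[name]++
    }
    END { for (n in downs) print n, downs[n] }'
}

# Usage on the firewall:
# dmesg | count_link_downs
```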
-
That's one of the LAGG ports. I have disabled the port for the moment.
-
Hmm, I can find no way of disabling IPv6 as a source address in tailscale.
One thing you could try is disabling IPv6 link-local addresses on the interface. Of course that breaks IPv6 if you need it. It also doesn't disable it on localhost so tailscale can still try to bind to that.
-
I got another crash today, again with tailscaled.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xb8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80f44300
stack pointer = 0x28:0xffffffff8377fc80
frame pointer = 0x28:0xffffffff8377fd00
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 90406 (tailscaled)
rdi: ffffffff82d62a40 rsi: 00000000000040f9 rdx: 0000000000000000
rcx: 0000000000000000 r8: fffff80020114900 r9: 0000000000000000
rax: 0000000000000030 rbx: fffff80109d95700 rbp: ffffffff8377fd00
r10: 0000000000000000 r11: fffffe007abb98c0 r12: fffff8000b5e3a40
r13: 00000000000040f9 r14: 0000000000000001 r15: fffff80020114900
trap number = 12
panic: page fault
cpuid = 0
time = 1712133936
KDB: enter: panic
-
Same backtrace?
Are you able to test disabling link-local IPv6 addresses?
-
That's the backtrace. I can't find where I can disable the link-local IPv6 addresses.
db:0:kdb.enter.default> bt
Tracing pid 90406 tid 101352 td 0xfffffe007abb93a0
kdb_enter() at kdb_enter+0x32/frame 0xffffffff8377f960
vpanic() at vpanic+0x163/frame 0xffffffff8377fa90
panic() at panic+0x43/frame 0xffffffff8377faf0
trap_fatal() at trap_fatal+0x40c/frame 0xffffffff8377fb50
trap_pfault() at trap_pfault+0x4f/frame 0xffffffff8377fbb0
calltrap() at calltrap+0x8/frame 0xffffffff8377fbb0
--- trap 0xc, rip = 0xffffffff80f44300, rsp = 0xffffffff8377fc80, rbp = 0xffffffff8377fd00 ---
in6_pcbbind() at in6_pcbbind+0x440/frame 0xffffffff8377fd00
udp6_bind() at udp6_bind+0x13c/frame 0xffffffff8377fd60
sobind() at sobind+0x32/frame 0xffffffff8377fd80
kern_bindat() at kern_bindat+0x96/frame 0xffffffff8377fdc0
sys_bind() at sys_bind+0x9b/frame 0xffffffff8377fe00
amd64_syscall() at amd64_syscall+0x109/frame 0xffffffff8377ff30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xffffffff8377ff30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x482bff, rsp = 0x86cadaa50, rbp = 0x86cadaa50 ---
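To check whether two crashes share the same call chain, the per-crash frame addresses can be stripped so that only the function+offset entries remain. A minimal sketch, assuming each backtrace has been saved to a text file (the helper name bt_signature and the file names are only illustrative):

```shell
# Hypothetical helper: reads a ddb backtrace on stdin and prints one
# "function+offset" entry per frame, without the frame addresses.
bt_signature() {
    grep -oE '[A-Za-z0-9_]+\(\) at [A-Za-z0-9_]+\+0x[0-9a-f]+' |
        sed -E 's/^.*\(\) at //'
}

# Usage (crash1.txt and crash2.txt are assumed file names):
# bt_signature < crash1.txt > crash1.sig
# bt_signature < crash2.txt > crash2.sig
# diff crash1.sig crash2.sig && echo "same call chain"
```

Because grep -o prints each match on its own line, this works whether the backtrace is pasted as one long line or one frame per line.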
-
So, yes, it's identical.
You have to add a loader value so run:
echo net.inet6.ip6.auto_linklocal=0 >> /boot/loader.conf.local
Then reboot.
Then check the output from ifconfig again. You should find no link-local IPv6 addresses. Only the tailscale interface itself should have any IPv6 address.
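The check after the reboot can be sketched roughly like this. It assumes ifconfig output is piped in and treats any inet6 address in fe80::/10 as link-local; the helper name linklocal_addrs is only illustrative:

```shell
# Hypothetical helper: reads `ifconfig` output on stdin and prints
# "<interface> <address>" for every IPv6 link-local address (fe80::/10).
linklocal_addrs() {
    awk '
        /^[^ \t]/ { ifname = $1; sub(/:$/, "", ifname) }   # interface header line
        $1 == "inet6" && tolower($2) ~ /^fe[89ab]/ { print ifname, $2 }
    '
}

# Usage on the firewall:
# ifconfig | linklocal_addrs
```

Any interface it still lists has kept a link-local address.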
-
Hey guys, I know this is quite an old topic, but I may have experienced the same crash due to a tailscaled process on my pfSense. Was there any resolution to this issue? I can share a backtrace of the crash if that would help, but from a brief comparison with the others here, it seems to be the same issue.
-
Identical backtrace?
https://redmine.pfsense.org/issues/15503
Do you have something listening on IPv6 that doesn't have to?
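One way to check is to list the listening IPv6 sockets with sockstat and look for wildcard binds. A minimal sketch, assuming the usual sockstat column layout (USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS); the helper name wildcard_v6_listeners is only illustrative:

```shell
# Hypothetical helper: reads `sockstat -6 -l` output on stdin and prints
# "<command> <proto> <local address>" for sockets bound to the wildcard address.
wildcard_v6_listeners() {
    awk '$5 ~ /6$/ && $6 ~ /^\*:/ { print $2, $5, $6 }' | sort -u
}

# Usage on the firewall:
# sockstat -6 -l | wildcard_v6_listeners
```

Anything listed with a local address of *:port is bound to all addresses, which includes link-local and localhost.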
-
@stephenw10 Hi, yeah, I would say so.
This was at the end of the dump file after I had to restart the pfsense router manually.
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0xb8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80f44300
stack pointer = 0x28:0xfffffe00c9ce5c80
frame pointer = 0x28:0xfffffe00c9ce5d00
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 32674 (tailscaled)
rdi: ffffffff82d62a40 rsi: 0000000000008ac8 rdx: 0000000000000000
rcx: 0000000000000000 r8: fffff8024daa1900 r9: 0000000000000000
rax: 0000000000000030 rbx: fffff8018b861540 rbp: fffffe00c9ce5d00
r10: 0000000000000000 r11: fffffe00c6b30520 r12: fffff8012a9e54c0
r13: 0000000000008ac8 r14: 0000000000000001 r15: fffff8024daa1900
trap number = 12
panic: page fault
cpuid = 2
time = 1730973890
KDB: enter: panic
and backtrace here:
Tracing pid 32674 tid 876607 td 0xfffffe00c6b30000
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c9ce5960
vpanic() at vpanic+0x163/frame 0xfffffe00c9ce5a90
panic() at panic+0x43/frame 0xfffffe00c9ce5af0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00c9ce5b50
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c9ce5bb0
calltrap() at calltrap+0x8/frame 0xfffffe00c9ce5bb0
--- trap 0xc, rip = 0xffffffff80f44300, rsp = 0xfffffe00c9ce5c80, rbp = 0xfffffe00c9ce5d00 ---
in6_pcbbind() at in6_pcbbind+0x440/frame 0xfffffe00c9ce5d00
udp6_bind() at udp6_bind+0x13c/frame 0xfffffe00c9ce5d60
sobind() at sobind+0x32/frame 0xfffffe00c9ce5d80
kern_bindat() at kern_bindat+0x96/frame 0xfffffe00c9ce5dc0
sys_bind() at sys_bind+0x9b/frame 0xfffffe00c9ce5e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00c9ce5f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c9ce5f30
--- syscall (104, FreeBSD ELF64, bind), rip = 0x482bff, rsp = 0x86dac5a50, rbp = 0x86dac5a50 ---
I don't believe we have anything listening on IPv6 that doesn't have to. We have IPv6 enabled on the WAN interface and also on the VLAN that office users are connected to. And to clarify, by 'listening on IPv6' do you mean that some interfaces on pfSense have IPv6 enabled, or that some other service connected to pfSense is listening on IPv6?
-
I mean some service that's listening on IPv6 addresses when it doesn't need to.
So, for example, when we saw this before there were services set to simply listen on 'all' interfaces and all IP addresses, specifically Bind. And that included IPv6 link-local and localhost addresses where it would never actually see any connections.
However, if that's tailscale I don't think there are any interface binding options. Yet.
-
@stephenw10 I don't think so. While I do have Bind running, it is not set to listen on all interfaces and is in IPv4-only mode. I'm not aware of anything else that might fit the description. Isn't tailscale listening on all interfaces by default?
-
@dovh said in pfSense Crash "Fatal trap 12: page fault while in kernel mode":
Isn't tailscale listening on all interfaces by default?
Yes it is. And that's a problem because I don't believe there's any way to limit what it listens on.
-
How often are you seeing this? Can you replicate it on demand?
We don't yet have a way to replicate it locally which makes debugging difficult.
-
Are you actually using IPv6? Otherwise you can try disabling IPv6 link-local addresses as I outlined above.
-
@stephenw10 I have had it happen once so far. I have a hunch that it happened when a new user joined our Tailnet from a local LAN network behind pfSense and tried accessing a route advertised by the Tailscale package on pfSense. I will try replicating it, but I think it's rather random, as it only happened once and the system had been up and running for some months before that.