Periodic Panic on CE 2.8.0 - DHCP6 Client (I Think)
-
Hi All
First time posting in the Netgate forums, so LMK if I've put this in the wrong place. For as long as I can remember, my pfSense firewall has been panicking periodically. It runs on a Lenovo M46KT27A (somewhat overkill, sure), in which I've installed a dual-port SFP+ Intel X520 with the following modules plugged in:
E.C.I. NETWORKS PN: ENXGSFPPOMACV2 - SFP/SFP+/SFP28 10G Base-LR (SC)
Ubiquiti Inc. PN: DAC-SFP10-0.5M SN: BA22093023861 DATE: 2022-09-26 - SFP/SFP+/SFP28 1X Copper Passive (No separable connector)

The starting point appears to be the following, which looks DHCPv6-related. Beyond that, I'm not familiar with debugging these things. It has been happening fairly regularly (at least once a week) for as long as I can remember, and I've only now decided to dig into it.
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 52781 tid 100414 td 0xfffff800126df740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe00d3de67f0
panic() at panic+0x43/frame 0xfffffe00d3de6850
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00d3de68b0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00d3de6900
calltrap() at calltrap+0x8/frame 0xfffffe00d3de6900
--- trap 0xc, rip = 0xffffffff80f5b213, rsp = 0xfffffe00d3de69d0, rbp = 0xfffffe00d3de6a20 ---
in6_unlink_ifa() at in6_unlink_ifa+0x53/frame 0xfffffe00d3de6a20
in6_purgeaddr() at in6_purgeaddr+0x366/frame 0xfffffe00d3de6b40
in6_purgeifaddr() at in6_purgeifaddr+0x13/frame 0xfffffe00d3de6b60
in6_control_ioctl() at in6_control_ioctl+0x5e1/frame 0xfffffe00d3de6bd0
ifioctl() at ifioctl+0x8b0/frame 0xfffffe00d3de6cd0
kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe00d3de6d40
sys_ioctl() at sys_ioctl+0x117/frame 0xfffffe00d3de6e00
amd64_syscall() at amd64_syscall+0x115/frame 0xfffffe00d3de6f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d3de6f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x822a2bcca, rsp = 0x820280e58, rbp = 0x820280f50 ---
db:1:pfs> show registers
cs 0x20
ds 0x3b
es 0x3b
fs 0x13
gs 0x1b
ss 0x28
rax 0x12
rcx 0xb671471b9956e201
rdx 0xfffffe00d3de6310
rbx 0x100
rsp 0xfffffe00d3de66c8
rbp 0xfffffe00d3de67f0
rsi 0xfffffe00d3de6580
rdi 0xffffffff82740878 vt_conswindow+0x10
r8 0x3c
r9 0x3c
r10 0
r11 0
r12 0
r13 0
r14 0xffffffff8145d99f
r15 0xfffff800126df740
rip 0xffffffff80d457b3 kdb_enter+0x33
rflags 0x82
kdb_enter+0x33: movq $0,0x1d76cd2(%rip)
db:1:pfs> show pcpu
cpuid = 6
dynamic pcpu = 0xfffffe009b4325c0
curthread = 0xfffff800126df740: pid 52781 tid 100414 critnest 1 "dhcp6c"
curpcb = 0xfffff800126dfc60
fpcurthread = 0xfffff800126df740: pid 52781 "dhcp6c"
idlethread = 0xfffff800027e5740: tid 100009 "idle: cpu6"
self = 0xffffffff83a16000
curpmap = 0xfffff800126f0358
tssp = 0xffffffff83a16384
rsp0 = 0xfffffe00d3de7000
kcr3 = 0xffffffffffffffff
ucr3 = 0xffffffffffffffff
scr3 = 0x0
gs32p = 0xffffffff83a16404
ldt = 0xffffffff83a16444
tss = 0xffffffff83a16434
curvnet = 0xfffff80001288840
db:1:pfs> run lockinfo
db:2:lockinfo> show locks
No such command; use "help" to list available commands
db:2:lockinfo> show alllocks
No such command; use "help" to list available commands
db:2:lockinfo> show lockedvnods
Locked vnodes
-
As a follow-up: I've noticed that the gateway monitor trips fairly regularly on my AT&T Fiber IPv6 gateway, which is probably what causes the DHCPv6 client to kick into action and occasionally leads to this panic. I've found similar reports against older releases where there was a race between interface reconfiguration and disablement.
I've set the IPv6 gateway monitor so that it no longer takes action (it still logs), so I'll see whether that eliminates the panics. Still, the fact that this can happen at all is concerning.
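For anyone else who isn't used to reading ddb output, the bt listing above can be read mechanically: the frame printed right after the `--- trap` line is the function that faulted, and the `--- syscall` line at the bottom shows how the thread entered the kernel (together with `show pcpu`, that thread is dhcp6c). The Python sketch below does just that on an excerpt copied from the post. The two lookup tables only contain the codes needed here (FreeBSD amd64 trap 12 is a page fault, syscall 54 is ioctl); it reads the listing, it does not explain why the fault happened.

```python
import re

# Excerpt of the ddb "bt" output from the first post.
BT = """\
kdb_enter() at kdb_enter+0x33/frame 0xfffffe00d3de67f0
panic() at panic+0x43/frame 0xfffffe00d3de6850
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00d3de68b0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe00d3de6900
calltrap() at calltrap+0x8/frame 0xfffffe00d3de6900
--- trap 0xc, rip = 0xffffffff80f5b213, rsp = 0xfffffe00d3de69d0, rbp = 0xfffffe00d3de6a20 ---
in6_unlink_ifa() at in6_unlink_ifa+0x53/frame 0xfffffe00d3de6a20
in6_purgeaddr() at in6_purgeaddr+0x366/frame 0xfffffe00d3de6b40
in6_purgeifaddr() at in6_purgeifaddr+0x13/frame 0xfffffe00d3de6b60
in6_control_ioctl() at in6_control_ioctl+0x5e1/frame 0xfffffe00d3de6bd0
ifioctl() at ifioctl+0x8b0/frame 0xfffffe00d3de6cd0
kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe00d3de6d40
sys_ioctl() at sys_ioctl+0x117/frame 0xfffffe00d3de6e00
amd64_syscall() at amd64_syscall+0x115/frame 0xfffffe00d3de6f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d3de6f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x822a2bcca, rsp = 0x820280e58, rbp = 0x820280f50 ---
"""

# Only the codes needed for this trace (FreeBSD amd64: T_PAGEFLT = 12, SYS_ioctl = 54).
TRAP_NAMES = {0xC: "page fault"}
SYSCALL_NAMES = {54: "ioctl"}

FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+(0x[0-9a-f]+)")
TRAP_RE = re.compile(r"^--- trap (0x[0-9a-f]+)")
SYSCALL_RE = re.compile(r"^--- syscall \((\d+),")


def summarize(bt: str) -> dict:
    """Return the trap type, the faulting frame, and the syscall that entered the kernel."""
    summary = {"trap": None, "faulting_frame": None, "syscall": None}
    after_trap = False
    for line in bt.splitlines():
        if m := TRAP_RE.match(line):
            code = int(m.group(1), 16)
            summary["trap"] = TRAP_NAMES.get(code, hex(code))
            after_trap = True
        elif m := SYSCALL_RE.match(line):
            num = int(m.group(1))
            summary["syscall"] = SYSCALL_NAMES.get(num, str(num))
        elif (m := FRAME_RE.match(line)) and after_trap and summary["faulting_frame"] is None:
            # The first frame listed after the trap marker is where the fault happened.
            summary["faulting_frame"] = f"{m.group(1)}+{m.group(2)}"
    return summary


print(summarize(BT))
# {'trap': 'page fault', 'faulting_frame': 'in6_unlink_ifa+0x53', 'syscall': 'ioctl'}
```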
-
@davefinster said in Periodic Panic on CE 2.8.0 - DHCP6 Client (I Think):
in6_unlink_ifa
Hmm, that looks like this bug, but it should be resolved in 2.8.0: https://redmine.pfsense.org/issues/14164
In both crashes, the log is spammed by something trying to use a link-local IPv6 address for public routing, which is not allowed.
I would guess it's an issue with the Tailscale interface, though, since that's the only other thing showing much activity. Tailscale has been shown to trigger the related bug: https://redmine.pfsense.org/issues/14431
I was never able to replicate that locally, but it could be a timing issue that only a fast WAN connection hits. I see you're using ixl NICs, what speed is your WAN that tailscale is using?
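As a simplified illustration of why that traffic is rejected: a link-local address (fe80::/10) is only meaningful on its own link, and routers must not forward packets with a link-local source to other links. The Python sketch below expresses that rule with the standard `ipaddress` module. It is not pfSense's or FreeBSD's actual check, it only considers unicast destinations (link-scoped multicast such as ff02::1 is ignored), and the addresses are examples.

```python
import ipaddress


def usable_source(src: str, dst: str) -> bool:
    """True if `src` may be used as the source when sending to unicast `dst`.

    Simplified sketch: multicast destinations are not handled.
    """
    s = ipaddress.IPv6Address(src)
    d = ipaddress.IPv6Address(dst)
    # A link-local source can only reach destinations on the same link,
    # so it is unusable for anything that has to be routed further.
    if s.is_link_local and not d.is_link_local:
        return False
    return True


print(usable_source("fe80::1", "fe80::2"))               # True: both on-link
print(usable_source("fe80::1", "2606:4700:4700::1111"))  # False: off-link destination
```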
-
I see you're using ixl NICs, what speed is your WAN that tailscale is using?
I've got 5Gbps/5Gbps AT&T Fiber, using a WAS-110 in one of the SFP ports as the GPON endpoint. That SFP handles all the network/GPON-specific bits, so pfSense just runs DHCP (and DHCPv6) over the interface. That's my WAN side; on the LAN side it's just a 10Gbps twinax cable into an aggregation switch.
To at least prevent the issue from happening, I've studied the prefix delegation expectations of the AT&T service a bit more, and I've now set the DHCPv6 client on the WAN interface to ask only for a prefix delegation, not for an address for itself. When it did ask for an address, the /128 that AT&T provided wasn't routable anyway. Requesting it also seemed to cause significant IPv6 instability: gateway pinging and IPv6 routing in general would periodically break, which presumably opened up an opportunity for this race. Without the /128, the IPv6 gateway pinger uses only the link-local address.
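On the wire, "prefix only, no address" comes down to which identity-association options the client puts in its Solicit: IA_NA (option 3) requests an address for the WAN interface itself, while IA_PD (option 25) requests a delegated prefix. The Python sketch below builds a minimal RFC 8415 Solicit that carries only IA_PD. It illustrates the message format and is not what pfSense's dhcp6c actually sends (a real client usually adds further options, such as an Option Request option); the MAC address, IAID, and transaction ID are hypothetical.

```python
import os
import struct

# DHCPv6 constants from RFC 8415.
SOLICIT = 1
OPTION_CLIENTID = 1
OPTION_IA_NA = 3       # address for the interface itself: deliberately NOT sent here
OPTION_ELAPSED_TIME = 8
OPTION_IA_PD = 25      # delegated prefix


def option(code: int, payload: bytes) -> bytes:
    # Every DHCPv6 option is a 16-bit code, a 16-bit length, then the payload.
    return struct.pack("!HH", code, len(payload)) + payload


def duid_ll(mac: bytes) -> bytes:
    # DUID-LL (type 3), hardware type 1 (Ethernet), followed by the MAC address.
    return struct.pack("!HH", 3, 1) + mac


def build_pd_only_solicit(mac: bytes, iaid: int, xid: bytes) -> bytes:
    """Build a Solicit asking for a delegated prefix (IA_PD) and no address (no IA_NA)."""
    header = bytes([SOLICIT]) + xid           # msg-type (1 byte) + transaction-id (3 bytes)
    ia_pd = struct.pack("!III", iaid, 0, 0)   # IAID, T1, T2 (0 = no preference)
    return (
        header
        + option(OPTION_CLIENTID, duid_ll(mac))
        + option(OPTION_ELAPSED_TIME, struct.pack("!H", 0))
        + option(OPTION_IA_PD, ia_pd)
    )


def option_codes(msg: bytes) -> list[int]:
    """Walk the options after the 4-byte header and return their codes in order."""
    codes, i = [], 4
    while i < len(msg):
        code, length = struct.unpack_from("!HH", msg, i)
        codes.append(code)
        i += 4 + length
    return codes


# Hypothetical MAC and IAID, purely for illustration.
msg = build_pd_only_solicit(bytes.fromhex("020000000001"), iaid=0, xid=os.urandom(3))
print(len(msg), option_codes(msg))  # 40 [1, 8, 25]
```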
The end result is that the WAN interface ends up with only its link-local address, and everything IPv6-related that originates from the router (e.g. Tailscale) now uses the router's address from the delegated prefix on the LAN interface, which, unlike the /128 that AT&T provides, is routable. I haven't had any issues in the 2 days since making these changes.
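To make the addressing concrete: with a PD-only setup, the router's usable global address comes from the delegated prefix rather than from a WAN /128. The Python sketch below uses the standard `ipaddress` module and assumes a hypothetical /60 delegation from the documentation range and a hypothetical prefix ID of 1; the real prefix length and numbering depend on what AT&T delegates and how the LAN is configured.

```python
import ipaddress

# Hypothetical /60 delegation from the IPv6 documentation range (2001:db8::/32).
delegated = ipaddress.IPv6Network("2001:db8:0:10::/60")

# Hypothetical prefix ID: which of the 16 /64s inside the /60 goes to the LAN.
prefix_id = 1

lan_net = list(delegated.subnets(new_prefix=64))[prefix_id]
# Give the router the first host address (::1) on that LAN /64.
router_addr = lan_net.network_address + 1

print(lan_net)                   # 2001:db8:0:11::/64
print(router_addr)               # 2001:db8:0:11::1
print(router_addr in delegated)  # True: it sits inside the delegated prefix
```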
-
Ah, interesting. Yup, AT&T expects to see its own router at the end of the GPON/XPON link, and pfSense could well be doing something that doesn't play well with that. Obviously, it still shouldn't panic like that.
The panic appears to be caused by a race condition during the removal of an IPv6 address. If the WAN was repeatedly renewing its lease, hitting that race seems likely.
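To illustrate the general shape of this bug class with a toy Python model (not the FreeBSD `in6_ifaddr` code): if two paths can both decide that an address still needs removing and then both act, the second one operates on an entry that is already gone. In the kernel that kind of mistake typically surfaces as a dereference of freed or unlinked memory, which is consistent with, but not proof of, the page fault in in6_unlink_ifa. A common fix is to make the check and the removal one atomic step under a lock, so repeated removals become harmless.

```python
import threading


class AddressTable:
    """Toy model of an interface's address list; NOT the FreeBSD kernel code."""

    def __init__(self, addrs):
        self.addrs = set(addrs)
        self.lock = threading.Lock()

    def remove_racy(self, addr):
        # Check-then-act without a lock: two callers can both pass the check,
        # and the second one then operates on an entry that is already gone.
        if addr in self.addrs:
            self.addrs.remove(addr)  # may raise KeyError if another thread won
            return True
        return False

    def remove_safe(self, addr):
        # Check and removal happen atomically, so exactly one caller "wins"
        # and every other caller sees the address as already removed.
        with self.lock:
            if addr in self.addrs:
                self.addrs.remove(addr)
                return True
            return False


table = AddressTable({"2001:db8::1"})
results = []
threads = [
    threading.Thread(target=lambda: results.append(table.remove_safe("2001:db8::1")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True), results.count(False))  # 1 7
```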