Netgate 6100 Crash On Interface Change - Not Resolved (IPv6 + PPPoE)
-
@stephenw10 said in Netgate 6100 Crash On Interface Change:
Ok, we replicated it here. Digging now....
Fine work, fine work indeed.
-
A fix for this has now gone in upstream: https://redmine.pfsense.org/issues/14164
That's not something that can be patched at run time though.
Steve
-
@stephenw10 - Thanks Steve, from your comment I guess this will percolate down for a version update window at an unknown date?
-
Yes it should be in the next version, 23.05.
In fact it's in our repo now. It should be available for testing in today's snapshots:
https://github.com/pfsense/FreeBSD-src/commit/f5a365e51feea75d1e5ebc86c53808d8cae7b6d7
Steve
-
@stephenw10
Thanks again and I am slightly embarrassed to find a bug so soon into my Netgate journey. I'll keep my head down for a bit!
-
Don't be. If everyone reported bugs as soon as they found them, with the details you did, there would be far fewer to find!
Steve
-
Would this bug affect a 4100 also?
It appears that I stumbled across this issue the other day when I was messing with traffic shaping on an interface that had IPv6 enabled. The moment I clicked on 'save', the blue LED on the 4100 stopped blinking, the UI became unresponsive and all clients lost internet access.
I had to cycle power on the 4100 to restore normal operation.
-
@azdeltawye
Sounds very similar, to say the least. Hopefully the fix will solve all.
-
Yeah, in fact I don't think it's system specific, or even NIC specific. But even if it is, the 4100 is similar enough that I'd expect it's possible to hit it there too.
Steve
-
@stephenw10 said in Netgate 6100 Crash On Interface Change:
Yes it should be in the next version, 23.05.
In fact it's in our repo now. It should be available for testing in today's snapshots:
https://github.com/pfsense/FreeBSD-src/commit/f5a365e51feea75d1e5ebc86c53808d8cae7b6d7
Steve
Hi Steve,
Unfortunately the issue persists in 23.05. A change in interface state can still trigger a crash.
I ran 7 simple and repeatable tests today: bringing the WAN interface down and back up again via the Disconnect button on Status/Interfaces (this is enough to trigger the fault).
Of the 7 tests, 4 failed with hard crashes and 3 did not trigger a crash.
All 4 failures produced a full crash report, info dump and textdump.tar. All available on request.
The backtraces for the 4 crashes are as follows.
First:
Filename: /var/crash/info.0
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 231424
Blocksize: 512
Compression: none
Dumptime: 2023-05-28 14:48:50 +0100
Hostname: Router-8.*******.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
Panic String: page fault
Dump Parity: 4132315394
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 93402 tid 103857 td 0xfffffe00cf7cac80
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cf8a0800
vpanic() at vpanic+0x183/frame 0xfffffe00cf8a0850
panic() at panic+0x43/frame 0xfffffe00cf8a08b0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cf8a0910
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cf8a0970
calltrap() at calltrap+0x8/frame 0xfffffe00cf8a0970
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cf8a0a40, rbp = 0xfffffe00cf8a0a70 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cf8a0a70
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cf8a0c60
tcp_output() at tcp_output+0x14/frame 0xfffffe00cf8a0c80
tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cf8a0d10
soconnectat() at soconnectat+0x9e/frame 0xfffffe00cf8a0d60
kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cf8a0dc0
sys_connect() at sys_connect+0x75/frame 0xfffffe00cf8a0e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cf8a0f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cf8a0f30
--- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdf5f8c98, rbp = 0x7fffdf5f8cd0 ---
db:1:pfs>
Second:
Filename: /var/crash/info.0
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 226304
Blocksize: 512
Compression: none
Dumptime: 2023-05-28 14:51:49 +0100
Hostname: Router-8.*******.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
Panic String: page fault
Dump Parity: 1095311618
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 68614 tid 100330 td 0xfffffe00cf325720
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c7d955f0
vpanic() at vpanic+0x183/frame 0xfffffe00c7d95640
panic() at panic+0x43/frame 0xfffffe00c7d956a0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00c7d95700
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c7d95760
calltrap() at calltrap+0x8/frame 0xfffffe00c7d95760
--- trap 0xc, rip = 0xffffffff80f63aa4, rsp = 0xfffffe00c7d95830, rbp = 0xfffffe00c7d95a50 ---
ip6_output() at ip6_output+0xb74/frame 0xfffffe00c7d95a50
udp6_send() at udp6_send+0x78e/frame 0xfffffe00c7d95c10
sosend_dgram() at sosend_dgram+0x357/frame 0xfffffe00c7d95c70
sousrsend() at sousrsend+0x5f/frame 0xfffffe00c7d95cd0
kern_sendit() at kern_sendit+0x132/frame 0xfffffe00c7d95d60
sendit() at sendit+0xb7/frame 0xfffffe00c7d95db0
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe00c7d95e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00c7d95f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c7d95f30
--- syscall (133, FreeBSD ELF64, sendto), rip = 0x823f95f2a, rsp = 0x8202cea88, rbp = 0x8202cead0 ---
db:1:pfs>
Third:
Crash report details: No PHP errors found.
Filename: /var/crash/info.0
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 229888
Blocksize: 512
Compression: none
Dumptime: 2023-05-28 15:11:48 +0100
Hostname: Router-8.*******.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
Panic String: page fault
Dump Parity: 276046082
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 3281 tid 100913 td 0xfffffe00cfe3e3a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cfdc4800
vpanic() at vpanic+0x183/frame 0xfffffe00cfdc4850
panic() at panic+0x43/frame 0xfffffe00cfdc48b0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cfdc4910
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cfdc4970
calltrap() at calltrap+0x8/frame 0xfffffe00cfdc4970
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cfdc4a40, rbp = 0xfffffe00cfdc4a70 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cfdc4a70
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cfdc4c60
tcp_output() at tcp_output+0x14/frame 0xfffffe00cfdc4c80
tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cfdc4d10
soconnectat() at soconnectat+0x9e/frame 0xfffffe00cfdc4d60
kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cfdc4dc0
sys_connect() at sys_connect+0x75/frame 0xfffffe00cfdc4e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cfdc4f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cfdc4f30
--- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdfbfbc98, rbp = 0x7fffdfbfbcd0 ---
db:1:pfs>
Fourth:
Crash report details: No PHP errors found.
Filename: /var/crash/info.0
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 230400
Blocksize: 512
Compression: none
Dumptime: 2023-05-28 15:17:27 +0100
Hostname: Router-8.*******.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
Panic String: page fault
Dump Parity: 1131880706
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 2 tid 100041 td 0xfffffe0085264560
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00850ad910
vpanic() at vpanic+0x183/frame 0xfffffe00850ad960
panic() at panic+0x43/frame 0xfffffe00850ad9c0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00850ada20
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00850ada80
calltrap() at calltrap+0x8/frame 0xfffffe00850ada80
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00850adb50, rbp = 0xfffffe00850adb80 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00850adb80
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00850add70
tcp_timer_rexmt() at tcp_timer_rexmt+0x514/frame 0xfffffe00850addd0
tcp_timer_enter() at tcp_timer_enter+0x102/frame 0xfffffe00850ade10
softclock_call_cc() at softclock_call_cc+0x13c/frame 0xfffffe00850adec0
softclock_thread() at softclock_thread+0xe9/frame 0xfffffe00850adef0
fork_exit() at fork_exit+0x7d/frame 0xfffffe00850adf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00850adf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:1:pfs>
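For anyone comparing the traces, the frame that took the page fault is the first one listed after the "--- trap 0xc" marker. The following is a minimal sketch for pulling that frame and its caller out of a backtrace. It assumes ddb.txt has been extracted from the textdump with one frame per line (as ddb prints it), and the function name faulting_frame is made up for illustration.

```sh
# Print the function that was executing when the page fault (trap 0xc)
# fired, plus its immediate caller. Reads the files given as arguments,
# or standard input when none are given.
# Assumption: one backtrace frame per line, as ddb prints them.
faulting_frame() {
    awk '/^--- trap 0xc/ {
        getline; fault = $1     # first frame after the trap marker
        getline; caller = $1    # the frame that called it
        print fault, "<-", caller
        exit
    }' "$@"
}

# Possible usage, assuming the tar member is named ddb.txt as the
# dumps above suggest:
#   tar -xOf /var/crash/textdump.tar.0 ddb.txt | faulting_frame
```

Applied to the four traces above, this would separate the three in6_selecthlim() faults from the ip6_output() one. The code path that led there (connect versus the TCP retransmit timer) still has to be read from the full trace.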
The 'solved' status on Redmine may need revising:
https://redmine.pfsense.org/issues/14164
Sorry to be the bearer of this news. It is an awkward fault to have, as even a small interruption from my ISP can trigger the router to crash.
-
Urgh, that's disappointing. That looks like two slightly different crashes though. Do you see anything other than the two backtraces shown above?
I reopened it.
Steve
-
@stephenw10
There should be 4 backtraces shown above, but I have full captures for all 4 events that I can send to you, if that would help.
I hope you remember that it's a Bank Holiday, Steve, so feel free to enjoy it instead.
Rob
-
We think we have found the cause here and have a test kernel to confirm it. It is only a test at this point though.
Are you able to test that? So far we have not been able to replicate this locally.
-
@stephenw10 said in Netgate 6100 Crash On Interface Change - Not Resolved (IPv6 + PPPoE):
Are you able to test that?
Yes of course. If you tell me how to load it, at what logging verbosity and which log extracts you would like to receive I will test it as soon as the network traffic allows.
Do you have an overview of the kernel changes?
-
The changes here are to the interface handling in the kernel, so mostly to sys/net/if.c and sys/netinet6/in6.c.
They are a test only at this point and would need streamlining before including upstream. But running this should prove if this is what you're hitting.
Test Kernel:
Removed after finding a new bug.
Move /boot/kernel to a backup:
mv /boot/kernel /boot/kernel.old2
Upload the tgz to /boot and then extract it there:
tar -xzf kernel-amd64-inet6-panic.tgz
Then reboot into that. Confirm you're running that after booting:
[23.05-RELEASE][root@6100-3.stevew.lan]/root: uname -a
FreeBSD 6100-3.stevew.lan 14.0-CURRENT FreeBSD 14.0-CURRENT inet6_backport-n256104-f5556386d38 pfsense-NODEBUG amd64
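The swap and a way back out of it can be sketched as two small shell functions. This is only an illustration of the steps above, not a supported procedure. The function names and the BOOT_DIR variable are made up, the rollback step is not in the post, and the sketch assumes the archive unpacks to a top-level kernel directory and has already been uploaded to /boot.

```sh
# Directory holding the kernel; /boot as in the steps above.
BOOT_DIR="${BOOT_DIR:-/boot}"

# Back up the running kernel directory and unpack the test kernel.
# The body runs in a subshell so the caller's working directory is untouched.
install_test_kernel() (
    archive="$1"    # e.g. kernel-amd64-inet6-panic.tgz, located in $BOOT_DIR
    cd "$BOOT_DIR" || exit 1
    if [ -e kernel.old2 ]; then
        echo "kernel.old2 already exists; not overwriting the backup" >&2
        exit 1
    fi
    mv kernel kernel.old2 || exit 1
    tar -xzf "$archive"
)

# Hypothetical rollback: set the test kernel aside and restore the backup.
restore_old_kernel() (
    cd "$BOOT_DIR" || exit 1
    mv kernel kernel.test && mv kernel.old2 kernel
)
```

Nothing runs until a function is called, for example install_test_kernel kernel-amd64-inet6-panic.tgz followed by a reboot. Check uname -a afterwards to confirm which kernel actually booted.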
I've been running it here with no problems, but I couldn't hit the issue beforehand, so have a recovery solution in place!
If you can test that and it no longer hits that issue, we can get a long-term solution upstreamed.
Steve
-
We found a bug in that test code and are working on something new.
I don't recommend running that yet.
-
@stephenw10 said in Netgate 6100 Crash On Interface Change - Not Resolved (IPv6 + PPPoE):
I don't recommend running that yet.
Ok, understood.
-
Hi Steve,
Any progress with this issue? I am close to removing the Netgate 6100 from production use.
The recent changes do not appear to have changed anything of note, albeit the more random reboots no longer produce a crash log. There has not been much encouraging news on the associated Redmine tracker either.
I'm not looking to dispose of or return the 6100 when I change to a different vendor, so it will be available for testing. I'm hoping to remain with Netgate, but I am sure you appreciate that, as a UK user, the issue with PPPoE / IPv6 cannot be sustained.
-
Were you able to test a 23.09 snapshot?
Indeed, I agree that here in the UK that's a combo many, many people are running, including me.