Netgate 6100 Crash On Interface Change - Not Resolved (IPv6 + PPPoE)
-
I've split this out from the previous topic, ONT Interface Negotiation, as it proved to be unrelated.
@stephenw10 said in Netgate 6100 and ONT Link Negotiation:
We'd need to see the actual crash report not just the info file to know more there. It should be linked in the same place. Obviously it shouldn't happen though.
Steve
The crash today was precipitated by an unplanned interface state change (prime network switch had a firmware update) but the effect was the same.
I did capture fully-populated crash reports. Hopefully I have attached them to this post (!).
-
ok, adding a file didn't work
-
same here…
From the report I noticed an error in my FQ_CoDel settings. I have addressed that now; it is probably not relevant to the crash, but I mention it for completeness.
Crash reports, now as spoilers:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 0c
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80fd9293
stack pointer = 0x28:0xfffffe00cd68c9d0
frame pointer = 0x28:0xfffffe00cd68ca20
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 4585 (ifconfig)
rdi: fffff80148fd0180 rsi: fffff80001bca800 rdx: 0
rcx: 0 r8: fffffe00cd4ba700 r9: fffffe00cd68d000
rax: 0 rbx: fffff80148fd0180 rbp: fffffe00cd68ca20
r10: 20 r11: 10 r12: fffff80148fd0180
r13: ffffffffbffffff8 r14: fffffe00cd4ba1e0 r15: fffff80001bca800
trap number = 12
panic: page fault
cpuid = 1
time = 1679565575
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 14.0-CURRENT #0 plus-RELENG_23_01-n256037-6e914874a5e: Fri Feb 10 20:30:29 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/obj/amd64/VDZvZksF/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/sources/FreeBSD-src-plus-RELENG_23_01/amd64.amd64/sys/pfSense
Crash report begins. Anonymous machine information:
amd64
14.0-CURRENT
FreeBSD 14.0-CURRENT #0 plus-RELENG_23_01-n256037-6e914874a5e: Fri Feb 10 20:30:29 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/obj/amd64/VDZvZksF/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/sources/FreeBS
Crash report details:
No PHP errors found.
Filename: /var/crash/info.0
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 254976
Blocksize: 512
Compression: none
Dumptime: 2023-03-23 09:59:35 +0000
Hostname: Router-8.redacted.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #0 plus-RELENG_23_01-n256037-6e914874a5e: Fri Feb 10 20:30:29 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/obj/amd64/VDZvZksF/var/j
Panic String: page fault
Dump Parity: 594112824
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
[snip]
<6>ix1: link state changed to UP
<6>ix1.1003: link state changed to UP
<6>ix1: link state changed to DOWN
<6>ix1.1003: link state changed to DOWN
<6>ix1: link state changed to UP
<6>ix1.1003: link state changed to UP
<3>nd6_dad_timer: called with non-tentative address fe80:d::1:1(ix1.1003)
<6>ix1: link state changed to DOWN
<6>ix1.1003: link state changed to DOWN
<6>ix1: link state changed to UP
<6>ix1.1003: link state changed to UP
<6>ix1: link state changed to DOWN
<6>ix1.1003: link state changed to DOWN
config_aqm Unable to configure flowset, flowset busy!
config_aqm Unable to configure flowset, flowset busy!
<6>ix1: link state changed to UP
<6>ix1.1003: link state changed to UP
<6>ix1: link state changed to DOWN
<6>ix1.1003: link state changed to DOWN
config_aqm Unable to configure flowset, flowset busy!
config_aqm Unable to configure flowset, flowset busy!
<6>ix1: link state changed to UP
<6>ix1.1003: link state changed to UP
<6>ix1: link state changed to DOWN
<6>ix1.1003: link state changed to DOWN
<6>ix1: link state changed to UP
<6>ix1.1003: link state changed to UP
<3>nd6_dad_timer: called with non-tentative address fe80:d::1:1(ix1.1003)
<6>ix1: link state changed to DOWN
<6>ix1.1003: link state changed to DOWN
<6>ix1: link state changed to UP
<6>ix1.1003: link state changed to UP
[snip]
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 0c
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80fd9293
stack pointer = 0x28:0xfffffe00cd68c9d0
frame pointer = 0x28:0xfffffe00cd68ca20
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 4585 (ifconfig)
rdi: fffff80148fd0180 rsi: fffff80001bca800 rdx: 0
rcx: 0 r8: fffffe00cd4ba700 r9: fffffe00cd68d000
rax: 0 rbx: fffff80148fd0180 rbp: fffffe00cd68ca20
r10: 20 r11: 10 r12: fffff80148fd0180
r13: ffffffffbffffff8 r14: fffffe00cd4ba1e0 r15: fffff80001bca800
trap number = 12
panic: page fault
cpuid = 1
time = 1679565575
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 14.0-CURRENT #0 plus-RELENG_23_01-n256037-6e914874a5e: Fri Feb 10 20:30:29 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/obj/amd64/VDZvZksF/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/sources/FreeBSD-src-plus-RELENG_23_01/amd64.amd64/sys/pfSense
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 254976
Blocksize: 512
Compression: none
Dumptime: 2023-03-23 09:59:35 +0000
Hostname: Router-8.redacted.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #0 plus-RELENG_23_01-n256037-6e914874a5e: Fri Feb 10 20:30:29 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/obj/amd64/VDZvZksF/var/j
Panic String: page fault
Dump Parity: 594112824
Bounds: 0
Dump Status: good
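A quick cross-check on the report above: the `time = 1679565575` field in the panic message is a Unix epoch timestamp, so it should line up with the `Dumptime` in the dump header. A minimal Python sketch of that check follows; the variable names are my own and not part of the crash report.

```python
from datetime import datetime, timezone

# Values copied from the crash report above; the names are illustrative only.
panic_epoch = 1679565575                  # "time =" field in the panic message
dumptime = "2023-03-23 09:59:35 +0000"    # "Dumptime:" field in the dump header

# Interpret the epoch as UTC and parse the header timestamp with its offset.
panic_utc = datetime.fromtimestamp(panic_epoch, tz=timezone.utc)
dump_utc = datetime.strptime(dumptime, "%Y-%m-%d %H:%M:%S %z")

print(panic_utc.isoformat())   # 2023-03-23T09:59:35+00:00
print(panic_utc == dump_utc)   # True
```

The two agree to the second, so the text dump belongs to this panic rather than being left over from an earlier crash.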
It's weird that a planned (or unplanned) interface change can prompt a crash, but I am still learning pfSense so user error is a clear possibility.
I think I may need a prod on how to properly add crash reports to the forum too...
-
-
@robbiett said in Netgate 6100 Crash On Interface Change:
db:1:pfs> bt
[snip]
The backtrace is usually the most telling part. It appears to have been snipped, do you have it?
-
@stephenw10
I have it all, I just ran into a forum size limit - crude cuts were made.
db:1:pfs> bt
Tracing pid 4585 tid 100445 td 0xfffffe00cd4ba1e0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cd68c790
vpanic() at vpanic+0x182/frame 0xfffffe00cd68c7e0
panic() at panic+0x43/frame 0xfffffe00cd68c840
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cd68c8a0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cd68c900
calltrap() at calltrap+0x8/frame 0xfffffe00cd68c900
--- trap 0xc, rip = 0xffffffff80fd9293, rsp = 0xfffffe00cd68c9d0, rbp = 0xfffffe00cd68ca20 ---
in6_unlink_ifa() at in6_unlink_ifa+0x63/frame 0xfffffe00cd68ca20
in6_purgeaddr() at in6_purgeaddr+0x367/frame 0xfffffe00cd68cb40
in6_purgeifaddr() at in6_purgeifaddr+0x13/frame 0xfffffe00cd68cb60
in6_control() at in6_control+0x532/frame 0xfffffe00cd68cbc0
ifioctl() at ifioctl+0x7bc/frame 0xfffffe00cd68ccc0
kern_ioctl() at kern_ioctl+0x26d/frame 0xfffffe00cd68cd30
sys_ioctl() at sys_ioctl+0x101/frame 0xfffffe00cd68ce00
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00cd68cf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cd68cf30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x18f47cd96e4a, rsp = 0x18f478021f28, rbp = 0x18f478021f70 ---
Is the above what you are looking for?
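For anyone reading along, the frame that matters in a ddb backtrace like this is the first one after the `--- trap` marker, since that is where the page fault was taken. A small Python sketch that pulls it out is below. It assumes one frame per line, and the helper name `faulting_frame` is hypothetical; it is not a pfSense or FreeBSD tool.

```python
import re

# Excerpt of the ddb backtrace from this post (frames around the trap marker).
backtrace = """\
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cd68c900
calltrap() at calltrap+0x8/frame 0xfffffe00cd68c900
--- trap 0xc, rip = 0xffffffff80fd9293, rsp = 0xfffffe00cd68c9d0, rbp = 0xfffffe00cd68ca20 ---
in6_unlink_ifa() at in6_unlink_ifa+0x63/frame 0xfffffe00cd68ca20
in6_purgeaddr() at in6_purgeaddr+0x367/frame 0xfffffe00cd68cb40
in6_purgeifaddr() at in6_purgeifaddr+0x13/frame 0xfffffe00cd68cb60
"""

# Matches lines of the form "name() at name+0xNN/frame ...".
FRAME = re.compile(r"^(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame")


def faulting_frame(text):
    """Return (function, offset) for the first frame after the '--- trap' marker."""
    seen_trap = False
    for line in text.splitlines():
        if line.startswith("--- trap"):
            seen_trap = True
            continue
        match = FRAME.match(line)
        if seen_trap and match:
            return match.group(1), match.group(2)
    return None


print(faulting_frame(backtrace))
```

On this excerpt it reports `in6_unlink_ifa` at offset `0x63`, reached from `ifconfig` through the `in6_purgeaddr` path.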
-
Yes, checking....
-
Hmm, so you're losing link on ix1 there when the switch rebooted?
And you have IPv6 addresses on ix1 directly, or just on VLAN 1003 on that?
-
Are you able to replicate that manually?
That is a bug, it should not happen. We probably need to replicate it locally to solve though.
-
@stephenw10
Yes, lost link on firmware update (thanks UniFi) on this occasion, but it can also happen when changing interface settings on pfSense.
On ix1 I have 1 x LAN and 1 x VLAN; these are my main networks. They both have IPv4 & IPv6.
I have one other LAN interface on igc0 that acts as a management interface but it is not normally connected.
-
@stephenw10 said in Netgate 6100 Crash On Interface Change:
Are you able to replicate that manually?
That is a bug, it should not happen. We probably need to replicate it locally to solve though.
It is repeatable and happens roughly 60% of the time if settings are adjusted while the interface is enabled. If the interface is disabled first, then settings changed, then re-enabled, it is much less prevalent: under 20% of the time at a guess.
The example given in this thread is clearly just a single event, albeit with the same undesirable result.
-
Ok, we replicated it here. Digging now....
-
@stephenw10 said in Netgate 6100 Crash On Interface Change:
Ok, we replicated it here. Digging now....
Fine work, fine work indeed.
-
A fix for this has now gone in upstream: https://redmine.pfsense.org/issues/14164
That's not something that can be patched at run time though.
Steve
-
@stephenw10 - Thanks Steve. From your comment, I guess this will percolate down in a version update at some as-yet-unknown date?
-
Yes it should be in the next version, 23.05.
In fact it's in our repo now. It should be available for testing in today's snapshots:
https://github.com/pfsense/FreeBSD-src/commit/f5a365e51feea75d1e5ebc86c53808d8cae7b6d7
Steve
-
@stephenw10
Thanks again, and I am slightly embarrassed to find a bug so soon into my Netgate journey. I'll keep my head down for a bit!
-
Don't be. If everyone reported bugs as soon as they found them with the details you did there would be far fewer to find!
Steve
-
Would this bug affect a 4100 also?
It appears that I stumbled across this issue the other day when I was messing with traffic shaping on an interface that had IPv6 enabled. The moment I clicked on 'save', the blue LED on the 4100 stopped blinking, the UI became unresponsive and all clients lost internet access.
I had to cycle power on the 4100 to restore normal operation.
-
@azdeltawye
Sounds very similar, to say the least. Hopefully the fix will solve all.
-
Yeah, in fact I don't think it's system specific, or even NIC specific. But even if it is the 4100 is similar enough I'd expect it's possible to hit it there too.
Steve
-
@stephenw10 said in Netgate 6100 Crash On Interface Change:
Yes it should be in the next version, 23.05.
In fact it's in our repo now. It should be available for testing in today's snapshots:
https://github.com/pfsense/FreeBSD-src/commit/f5a365e51feea75d1e5ebc86c53808d8cae7b6d7
Steve
Hi Steve,
Unfortunately the issue persists in 23.05. A change in interface state can still trigger a crash.
I ran 7 simple and repeatable tests today, bringing the WAN interface down and back up again (this is enough to trigger the fault) via the disconnect button on Status/Interfaces.
7 tests: 4 failures with hard crashes, 3 did not trigger a crash.
All 4 failures produced a full crash report, info dump and textdump.tar. All available on request.
The back-traces for the 4 crashes are as follows.
First:
Filename: /var/crash/info.0
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 231424
Blocksize: 512
Compression: none
Dumptime: 2023-05-28 14:48:50 +0100
Hostname: Router-8.*******.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
Panic String: page fault
Dump Parity: 4132315394
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 93402 tid 103857 td 0xfffffe00cf7cac80
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cf8a0800
vpanic() at vpanic+0x183/frame 0xfffffe00cf8a0850
panic() at panic+0x43/frame 0xfffffe00cf8a08b0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cf8a0910
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cf8a0970
calltrap() at calltrap+0x8/frame 0xfffffe00cf8a0970
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cf8a0a40, rbp = 0xfffffe00cf8a0a70 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cf8a0a70
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cf8a0c60
tcp_output() at tcp_output+0x14/frame 0xfffffe00cf8a0c80
tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cf8a0d10
soconnectat() at soconnectat+0x9e/frame 0xfffffe00cf8a0d60
kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cf8a0dc0
sys_connect() at sys_connect+0x75/frame 0xfffffe00cf8a0e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cf8a0f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cf8a0f30
--- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdf5f8c98, rbp = 0x7fffdf5f8cd0 ---
db:1:pfs>
Second:
Filename: /var/crash/info.0
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 226304
Blocksize: 512
Compression: none
Dumptime: 2023-05-28 14:51:49 +0100
Hostname: Router-8.*******.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
Panic String: page fault
Dump Parity: 1095311618
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 68614 tid 100330 td 0xfffffe00cf325720
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c7d955f0
vpanic() at vpanic+0x183/frame 0xfffffe00c7d95640
panic() at panic+0x43/frame 0xfffffe00c7d956a0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00c7d95700
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c7d95760
calltrap() at calltrap+0x8/frame 0xfffffe00c7d95760
--- trap 0xc, rip = 0xffffffff80f63aa4, rsp = 0xfffffe00c7d95830, rbp = 0xfffffe00c7d95a50 ---
ip6_output() at ip6_output+0xb74/frame 0xfffffe00c7d95a50
udp6_send() at udp6_send+0x78e/frame 0xfffffe00c7d95c10
sosend_dgram() at sosend_dgram+0x357/frame 0xfffffe00c7d95c70
sousrsend() at sousrsend+0x5f/frame 0xfffffe00c7d95cd0
kern_sendit() at kern_sendit+0x132/frame 0xfffffe00c7d95d60
sendit() at sendit+0xb7/frame 0xfffffe00c7d95db0
sys_sendto() at sys_sendto+0x4d/frame 0xfffffe00c7d95e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00c7d95f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c7d95f30
--- syscall (133, FreeBSD ELF64, sendto), rip = 0x823f95f2a, rsp = 0x8202cea88, rbp = 0x8202cead0 ---
db:1:pfs>
Third:
Crash report details:
No PHP errors found.
Filename: /var/crash/info.0
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 229888
Blocksize: 512
Compression: none
Dumptime: 2023-05-28 15:11:48 +0100
Hostname: Router-8.*******.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
Panic String: page fault
Dump Parity: 276046082
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 3281 tid 100913 td 0xfffffe00cfe3e3a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cfdc4800
vpanic() at vpanic+0x183/frame 0xfffffe00cfdc4850
panic() at panic+0x43/frame 0xfffffe00cfdc48b0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cfdc4910
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cfdc4970
calltrap() at calltrap+0x8/frame 0xfffffe00cfdc4970
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cfdc4a40, rbp = 0xfffffe00cfdc4a70 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cfdc4a70
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cfdc4c60
tcp_output() at tcp_output+0x14/frame 0xfffffe00cfdc4c80
tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cfdc4d10
soconnectat() at soconnectat+0x9e/frame 0xfffffe00cfdc4d60
kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cfdc4dc0
sys_connect() at sys_connect+0x75/frame 0xfffffe00cfdc4e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cfdc4f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cfdc4f30
--- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdfbfbc98, rbp = 0x7fffdfbfbcd0 ---
db:1:pfs>
Fourth:
Crash report details:
No PHP errors found.
Filename: /var/crash/info.0
Dump header from device: /dev/nvd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 230400
Blocksize: 512
Compression: none
Dumptime: 2023-05-28 15:17:27 +0100
Hostname: Router-8.*******.me
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
Panic String: page fault
Dump Parity: 1131880706
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 2 tid 100041 td 0xfffffe0085264560
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00850ad910
vpanic() at vpanic+0x183/frame 0xfffffe00850ad960
panic() at panic+0x43/frame 0xfffffe00850ad9c0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe00850ada20
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00850ada80
calltrap() at calltrap+0x8/frame 0xfffffe00850ada80
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00850adb50, rbp = 0xfffffe00850adb80 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00850adb80
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00850add70
tcp_timer_rexmt() at tcp_timer_rexmt+0x514/frame 0xfffffe00850addd0
tcp_timer_enter() at tcp_timer_enter+0x102/frame 0xfffffe00850ade10
softclock_call_cc() at softclock_call_cc+0x13c/frame 0xfffffe00850adec0
softclock_thread() at softclock_thread+0xe9/frame 0xfffffe00850adef0
fork_exit() at fork_exit+0x7d/frame 0xfffffe00850adf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00850adf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:1:pfs>
The 'solved' status on Redmine may need revising:
https://redmine.pfsense.org/issues/14164
Sorry to be the bearer of this news. It is an awkward fault to have, as even a small interruption from my ISP can trigger the router to crash.
-
Urgh, that's disappointing. That looks like two slightly different crashes though. Do you see anything other than the two backtraces shown above?
I reopened it.
Steve
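The "two slightly different crashes" reading can be checked by tallying the frame that follows the `--- trap 0xc` marker in each of the four 23.05 back-traces. A rough Python sketch is below; the dictionary is filled in by hand from the reports in this thread and the variable names are only illustrative.

```python
from collections import Counter

# Frame immediately after the "--- trap 0xc" marker in each 23.05 back-trace,
# copied by hand from the four reports posted in this thread.
faulting = {
    "first": "in6_selecthlim+0x96",
    "second": "ip6_output+0xb74",
    "third": "in6_selecthlim+0x96",
    "fourth": "in6_selecthlim+0x96",
}

# Group by function name, ignoring the offset within the function.
by_function = Counter(frame.split("+")[0] for frame in faulting.values())

for function, count in by_function.most_common():
    print(f"{function}: {count}")
```

That gives three faults in `in6_selecthlim` and one in `ip6_output`. The fourth report reaches `in6_selecthlim` from the TCP retransmit timer rather than from a `connect` call, and none of the four is in `in6_unlink_ifa`, where the original 23.01 crash occurred.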