Netgate Discussion Forum

    Netgate 6100 Crash On Interface Change - Not Resolved (IPv6 + PPPoE)

    Official Netgate® Hardware
    42 Posts 3 Posters 5.6k Views
      stephenw10 Netgate Administrator

      Yeah, in fact I don't think it's system specific, or even NIC specific. But even if it is, the 4100 is similar enough that I'd expect it's possible to hit it there too.

      Steve

        RobbieTT @stephenw10

        @stephenw10 said in Netgate 6100 Crash On Interface Change:

        Yes it should be in the next version, 23.05.

        In fact it's in our repo now. It should be available for testing in today's snapshots:
        https://github.com/pfsense/FreeBSD-src/commit/f5a365e51feea75d1e5ebc86c53808d8cae7b6d7

        Steve

        Hi Steve,
        Unfortunately the issue persists in 23.05. A change in interface state can still trigger a crash.

        I ran 7 simple and repeatable tests today - bringing the WAN interface down and back up again (this is enough to trigger the fault) via the disconnect button on Status/Interfaces.

        7 tests - 4 failures with hard crashes, 3 did not trigger a crash.

        All 4 failures produced a full crash report, info dump and textdump.tar. All available on request.

        The back-traces for the 4 crashes are as follows.

        First:

        Filename: /var/crash/info.0
        Dump header from device: /dev/nvd0p3
          Architecture: amd64
          Architecture Version: 4
          Dump Length: 231424
          Blocksize: 512
          Compression: none
          Dumptime: 2023-05-28 14:48:50 +0100
          Hostname: Router-8.*******.me
          Magic: FreeBSD Text Dump
          Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023
            root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
          Panic String: page fault
          Dump Parity: 4132315394
          Bounds: 0
          Dump Status: good
        
        Filename: /var/crash/textdump.tar.0
        ddb.txt
        db:0:kdb.enter.default>  run pfs
        db:1:pfs> bt
        Tracing pid 93402 tid 103857 td 0xfffffe00cf7cac80
        kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cf8a0800
        vpanic() at vpanic+0x183/frame 0xfffffe00cf8a0850
        panic() at panic+0x43/frame 0xfffffe00cf8a08b0
        trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cf8a0910
        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cf8a0970
        calltrap() at calltrap+0x8/frame 0xfffffe00cf8a0970
        --- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cf8a0a40, rbp = 0xfffffe00cf8a0a70 ---
        in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cf8a0a70
        tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cf8a0c60
        tcp_output() at tcp_output+0x14/frame 0xfffffe00cf8a0c80
        tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cf8a0d10
        soconnectat() at soconnectat+0x9e/frame 0xfffffe00cf8a0d60
        kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cf8a0dc0
        sys_connect() at sys_connect+0x75/frame 0xfffffe00cf8a0e00
        amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cf8a0f30
        fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cf8a0f30
        --- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdf5f8c98, rbp = 0x7fffdf5f8cd0 ---
        db:1:pfs> 
        

        Second:

        Filename: /var/crash/info.0
        Dump header from device: /dev/nvd0p3
          Architecture: amd64
          Architecture Version: 4
          Dump Length: 226304
          Blocksize: 512
          Compression: none
          Dumptime: 2023-05-28 14:51:49 +0100
          Hostname: Router-8.*******.me
          Magic: FreeBSD Text Dump
          Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023
            root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
          Panic String: page fault
          Dump Parity: 1095311618
          Bounds: 0
          Dump Status: good
        
        Filename: /var/crash/textdump.tar.0
        ddb.txt
        db:0:kdb.enter.default>  run pfs
        db:1:pfs> bt
        Tracing pid 68614 tid 100330 td 0xfffffe00cf325720
        kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c7d955f0
        vpanic() at vpanic+0x183/frame 0xfffffe00c7d95640
        panic() at panic+0x43/frame 0xfffffe00c7d956a0
        trap_fatal() at trap_fatal+0x409/frame 0xfffffe00c7d95700
        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c7d95760
        calltrap() at calltrap+0x8/frame 0xfffffe00c7d95760
        --- trap 0xc, rip = 0xffffffff80f63aa4, rsp = 0xfffffe00c7d95830, rbp = 0xfffffe00c7d95a50 ---
        ip6_output() at ip6_output+0xb74/frame 0xfffffe00c7d95a50
        udp6_send() at udp6_send+0x78e/frame 0xfffffe00c7d95c10
        sosend_dgram() at sosend_dgram+0x357/frame 0xfffffe00c7d95c70
        sousrsend() at sousrsend+0x5f/frame 0xfffffe00c7d95cd0
        kern_sendit() at kern_sendit+0x132/frame 0xfffffe00c7d95d60
        sendit() at sendit+0xb7/frame 0xfffffe00c7d95db0
        sys_sendto() at sys_sendto+0x4d/frame 0xfffffe00c7d95e00
        amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00c7d95f30
        fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c7d95f30
        --- syscall (133, FreeBSD ELF64, sendto), rip = 0x823f95f2a, rsp = 0x8202cea88, rbp = 0x8202cead0 ---
        db:1:pfs>
        

        Third:

        Crash report details:
        
        No PHP errors found.
        
        Filename: /var/crash/info.0
        Dump header from device: /dev/nvd0p3
          Architecture: amd64
          Architecture Version: 4
          Dump Length: 229888
          Blocksize: 512
          Compression: none
          Dumptime: 2023-05-28 15:11:48 +0100
          Hostname: Router-8.*******.me
          Magic: FreeBSD Text Dump
          Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023
            root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
          Panic String: page fault
          Dump Parity: 276046082
          Bounds: 0
          Dump Status: good
        
        Filename: /var/crash/textdump.tar.0
        ddb.txt
        db:0:kdb.enter.default>  run pfs
        db:1:pfs> bt
        Tracing pid 3281 tid 100913 td 0xfffffe00cfe3e3a0
        kdb_enter() at kdb_enter+0x32/frame 0xfffffe00cfdc4800
        vpanic() at vpanic+0x183/frame 0xfffffe00cfdc4850
        panic() at panic+0x43/frame 0xfffffe00cfdc48b0
        trap_fatal() at trap_fatal+0x409/frame 0xfffffe00cfdc4910
        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00cfdc4970
        calltrap() at calltrap+0x8/frame 0xfffffe00cfdc4970
        --- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cfdc4a40, rbp = 0xfffffe00cfdc4a70 ---
        in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cfdc4a70
        tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cfdc4c60
        tcp_output() at tcp_output+0x14/frame 0xfffffe00cfdc4c80
        tcp6_usr_connect() at tcp6_usr_connect+0x2f4/frame 0xfffffe00cfdc4d10
        soconnectat() at soconnectat+0x9e/frame 0xfffffe00cfdc4d60
        kern_connectat() at kern_connectat+0xc9/frame 0xfffffe00cfdc4dc0
        sys_connect() at sys_connect+0x75/frame 0xfffffe00cfdc4e00
        amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00cfdc4f30
        fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00cfdc4f30
        --- syscall (98, FreeBSD ELF64, connect), rip = 0x800fddc8a, rsp = 0x7fffdfbfbc98, rbp = 0x7fffdfbfbcd0 ---
        db:1:pfs>
        

        Fourth:

        Crash report details:
        
        No PHP errors found.
        
        Filename: /var/crash/info.0
        Dump header from device: /dev/nvd0p3
          Architecture: amd64
          Architecture Version: 4
          Dump Length: 230400
          Blocksize: 512
          Compression: none
          Dumptime: 2023-05-28 15:17:27 +0100
          Hostname: Router-8.*******.me
          Magic: FreeBSD Text Dump
          Version String: FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: Mon May 22 15:33:52 UTC 2023
            root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_05-main/obj/amd64/LkEyii3W/var/j
          Panic String: page fault
          Dump Parity: 1131880706
          Bounds: 0
          Dump Status: good
        
        Filename: /var/crash/textdump.tar.0
        ddb.txt
        db:0:kdb.enter.default>  run pfs
        db:1:pfs> bt
        Tracing pid 2 tid 100041 td 0xfffffe0085264560
        kdb_enter() at kdb_enter+0x32/frame 0xfffffe00850ad910
        vpanic() at vpanic+0x183/frame 0xfffffe00850ad960
        panic() at panic+0x43/frame 0xfffffe00850ad9c0
        trap_fatal() at trap_fatal+0x409/frame 0xfffffe00850ada20
        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00850ada80
        calltrap() at calltrap+0x8/frame 0xfffffe00850ada80
        --- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00850adb50, rbp = 0xfffffe00850adb80 ---
        in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00850adb80
        tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00850add70
        tcp_timer_rexmt() at tcp_timer_rexmt+0x514/frame 0xfffffe00850addd0
        tcp_timer_enter() at tcp_timer_enter+0x102/frame 0xfffffe00850ade10
        softclock_call_cc() at softclock_call_cc+0x13c/frame 0xfffffe00850adec0
        softclock_thread() at softclock_thread+0xe9/frame 0xfffffe00850adef0
        fork_exit() at fork_exit+0x7d/frame 0xfffffe00850adf30
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00850adf30
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        db:1:pfs>
        

        The 'solved' status on Redmine may need revising:
        https://redmine.pfsense.org/issues/14164

        Sorry to be the bearer of this news. It is an awkward fault to have, as even a brief interruption from my ISP can cause the router to crash.

          stephenw10 Netgate Administrator

          Urgh, that's disappointing. That looks like two slightly different crashes though. Do you see anything other than the two backtraces shown above?

          I reopened it.

          Steve
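One way to check how many distinct crash sites the back-traces above contain is to pull out the frame that immediately follows the `--- trap 0xc` line in each one. The sketch below is illustrative only: the helper names `faulting_frame` and `group_by_fault` are hypothetical, and the two sample strings are shortened excerpts of the first and second traces posted above.

```python
import re
from collections import Counter

# Matches a ddb frame line such as
# "in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cf8a0a70"
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame")


def faulting_frame(backtrace):
    """Return (function, offset) of the first frame after the page-fault
    trap line ("--- trap 0xc"), or None if no such frame is present."""
    lines = backtrace.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("--- trap 0xc"):
            for candidate in lines[i + 1:]:
                match = FRAME_RE.match(candidate)
                if match:
                    return match.group(1), match.group(2)
            break
    return None


def group_by_fault(backtraces):
    """Count how many back-traces share each faulting frame."""
    return Counter(faulting_frame(bt) for bt in backtraces)


# Shortened excerpts from the first and second traces in this thread.
SAMPLE_TCP = """\
calltrap() at calltrap+0x8/frame 0xfffffe00cf8a0970
--- trap 0xc, rip = 0xffffffff80f5a036, rsp = 0xfffffe00cf8a0a40, rbp = 0xfffffe00cf8a0a70 ---
in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00cf8a0a70
tcp_default_output() at tcp_default_output+0x1ded/frame 0xfffffe00cf8a0c60
"""

SAMPLE_UDP = """\
calltrap() at calltrap+0x8/frame 0xfffffe00c7d95760
--- trap 0xc, rip = 0xffffffff80f63aa4, rsp = 0xfffffe00c7d95830, rbp = 0xfffffe00c7d95a50 ---
ip6_output() at ip6_output+0xb74/frame 0xfffffe00c7d95a50
udp6_send() at udp6_send+0x78e/frame 0xfffffe00c7d95c10
"""

if __name__ == "__main__":
    for frame, count in group_by_fault([SAMPLE_TCP, SAMPLE_UDP]).items():
        print(frame, count)
```

Applied to the four traces as posted, the faulting frame is `in6_selecthlim+0x96` in the first, third and fourth, and `ip6_output+0xb74` in the second. The fourth reaches `in6_selecthlim` from `tcp_timer_rexmt` rather than from a `connect` syscall, so the callers differ even where the faulting frame matches.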

            RobbieTT @stephenw10

            @stephenw10
            There should be 4 back-traces shown above but I have full captures for all 4 events that I can send to you, if that would help.

            I hope you remember that it's a Bank Holiday Steve, so feel free to enjoy it instead. 👍

            Rob

              RobbieTT

              New tracking ID on Redmine:

              https://redmine.pfsense.org/issues/14431

              ☕️

                stephenw10 Netgate Administrator

                We think we have found the cause here and have a test kernel to confirm it. It is only a test at this point though.

                Are you able to test that? So far we have not been able to replicate this locally.

                  RobbieTT @stephenw10

                  @stephenw10 said in Netgate 6100 Crash On Interface Change - Not Resolved (IPv6 + PPPoE):

                  Are you able to test that?

                  Yes of course. If you tell me how to load it, at what logging verbosity and which log extracts you would like to receive I will test it as soon as the network traffic allows.

                  Do you have an overview of the kernel changes?

                  ☕️

                    stephenw10 Netgate Administrator

                    The changes here are to the interface handling in the kernel, so mostly to sys/net/if.c and sys/netinet6/in6.c.

                    They are a test only at this point and would need streamlining before including upstream. But running this should prove if this is what you're hitting.

                    Test Kernel:
                    Removed after finding a new bug.

                    Move /boot/kernel to a backup:

                    mv /boot/kernel /boot/kernel.old2
                    

                    Upload the tgz to /boot and then extract it there:

                    tar -xzf kernel-amd64-inet6-panic.tgz
                    

                    Then reboot into that. Confirm you're running that after booting:

                    [23.05-RELEASE][root@6100-3.stevew.lan]/root: uname -a
                    FreeBSD 6100-3.stevew.lan 14.0-CURRENT FreeBSD 14.0-CURRENT inet6_backport-n256104-f5556386d38 pfsense-NODEBUG amd64
                    

                    I've been running it here with no problems, but I couldn't hit the issue beforehand, so I have a recovery solution in place! 😉

                    If you can test that and it no longer hits the issue, we can get a long-term solution upstreamed.

                    Steve
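The "confirm you're running that" step can also be done mechanically. This is a minimal sketch, assuming the test build is identifiable by the `inet6_backport` prefix seen in the `uname -a` output quoted above; the function name `is_test_kernel` is hypothetical, and the stock string is the Version String from the crash headers earlier in the thread. Note that the test kernel itself was withdrawn (see "Removed after finding a new bug" above), so this only illustrates the check.

```python
def is_test_kernel(uname_output, marker="inet6_backport"):
    """Return True if any whitespace-separated field of the `uname -a`
    output starts with the given build marker."""
    return any(field.startswith(marker) for field in uname_output.split())


# `uname -a` output quoted in the post above (test kernel).
TEST_KERNEL = (
    "FreeBSD 6100-3.stevew.lan 14.0-CURRENT FreeBSD 14.0-CURRENT "
    "inet6_backport-n256104-f5556386d38 pfsense-NODEBUG amd64"
)

# Version String from the 23.05 crash dump headers (stock kernel).
STOCK_VERSION = (
    "FreeBSD 14.0-CURRENT #1 plus-RELENG_23_05-n256102-7cd3d043045: "
    "Mon May 22 15:33:52 UTC 2023"
)

if __name__ == "__main__":
    print("test kernel string:", is_test_kernel(TEST_KERNEL))
    print("stock version string:", is_test_kernel(STOCK_VERSION))
```

Matching on a field prefix rather than the full string keeps the check independent of the hostname and of the commit hash that follows the branch name.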

                      stephenw10 Netgate Administrator

                      We found a bug in that test code and are working on something new.

                      I don't recommend running that yet.

                        RobbieTT @stephenw10

                        @stephenw10 said in Netgate 6100 Crash On Interface Change - Not Resolved (IPv6 + PPPoE):

                        I don't recommend running that yet.

                        Ok, understood. 👍

                        ☕️

                          RobbieTT @stephenw10

                          @stephenw10

                          Hi Steve,
                          Any progress with this issue? I am close to removing the Netgate 6100 from production use.

                          The recent changes do not appear to have changed anything of note, albeit the more random reboots no longer produce a crash log. There has not been much encouraging news on the associated redmine tracker either.

                          I'm not looking to dispose of or return the 6100 when I change to a different vendor, so it will be available for testing. I'm hoping to remain with Netgate, but I am sure you appreciate that, as a UK user, the issue with PPPoE / IPv6 cannot be sustained.

                          ☕️

                            stephenw10 Netgate Administrator

                            Were you able to test a 23.09 snapshot?

                            Indeed, I agree that here in the UK that's a combo many, many people are running, including me.

                              RobbieTT @stephenw10

                              @stephenw10 said in Netgate 6100 Crash On Interface Change - Not Resolved (IPv6 + PPPoE):

                              Were you able to test a 23.09 snapshot?

                              No, I didn't, for a couple of reasons. The first is a general reluctance to run snapshots on a live production network; the second is that no changes listed in v23.09 suggest a possible fix.

                              A month ago Kristof Provost posted that he was unable to reproduce the fault, which was a concern.

                              That said, if you think a particular snapshot might help or at least add useful data please drop me a link and I will give it a go.

                              ☕️

                                stephenw10 Netgate Administrator

                                Mainly because if you can replicate it in 23.09 then we can test with that to try and replicate it here. Otherwise we need to keep trying in 23.05.1, and that makes it more difficult because all the development is in 23.09 at the moment. And there's a chance it's already fixed in 23.09, though I agree nothing specific has gone in.

                                  RobbieTT @stephenw10

                                  @stephenw10

                                  Ok, that all makes sense.

                                  A bit of a pain to achieve though. It took countless refresh attempts to get beyond this stage, until pfSense 'found' the things it needed:

                                  [Screenshot: 2023-08-31 at 16.31.49.png]

                                  Still, it eventually 'just worked' and all loaded just fine:

                                  [Screenshot: 2023-08-31 at 16.51.51.png]

                                  It didn't crash on first interface change, so at least there is that. 🙃

                                  ☕️

                                    stephenw10 Netgate Administrator

                                    Ok, cool. There's a routing issue with v6 at the pkg server our guys are working on. And we know you're using v6! Should be resolved shortly.

                                    Let us know if you hit it in 23.09. Thanks 👍

                                      RobbieTT @stephenw10

                                      @stephenw10 said in Netgate 6100 Crash On Interface Change - Not Resolved (IPv6 + PPPoE):

                                      Let us know if you hit it in 23.09. Thanks 👍

                                      Will do.

                                      It survived 5 interface changes with no issue, before I ran out of time / was actively complained at.

                                      ☕️

                                        RobbieTT @stephenw10

                                        @stephenw10

                                        The issue persists with the latest dev snapshot. This crash was triggered by taking the WAN interface down & up again:

                                        db:1:pfs> bt
                                        Tracing pid 2 tid 100041 td 0xfffffe0085272560
                                        kdb_enter() at kdb_enter+0x32/frame 0xfffffe00850c5840
                                        vpanic() at vpanic+0x163/frame 0xfffffe00850c5970
                                        panic() at panic+0x43/frame 0xfffffe00850c59d0
                                        trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00850c5a30
                                        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00850c5a90
                                        calltrap() at calltrap+0x8/frame 0xfffffe00850c5a90
                                        --- trap 0xc, rip = 0xffffffff80f4d9e6, rsp = 0xfffffe00850c5b60, rbp = 0xfffffe00850c5b90 ---
                                        in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00850c5b90
                                        tcp_default_output() at tcp_default_output+0x1d97/frame 0xfffffe00850c5d70
                                        tcp_timer_rexmt() at tcp_timer_rexmt+0x52f/frame 0xfffffe00850c5dd0
                                        tcp_timer_enter() at tcp_timer_enter+0x101/frame 0xfffffe00850c5e10
                                        softclock_call_cc() at softclock_call_cc+0x134/frame 0xfffffe00850c5ec0
                                        softclock_thread() at softclock_thread+0xe9/frame 0xfffffe00850c5ef0
                                        fork_exit() at fork_exit+0x7f/frame 0xfffffe00850c5f30
                                        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00850c5f30
                                        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
                                        

                                        All hopes of an accidental fix were suddenly dashed...

                                        I have full logs, should you need them.

                                        ☕️

                                          stephenw10 Netgate Administrator

                                          Urgh. Ok, thanks. Let me see if our guys would like to review....

                                            RobbieTT @stephenw10

                                            @stephenw10

                                            Today, on 23.09.a.20230921.1219:

                                            db:1:pfs> bt
                                            Tracing pid 2 tid 100041 td 0xfffffe0085274560
                                            kdb_enter() at kdb_enter+0x32/frame 0xfffffe00850f9840
                                            vpanic() at vpanic+0x163/frame 0xfffffe00850f9970
                                            panic() at panic+0x43/frame 0xfffffe00850f99d0
                                            trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00850f9a30
                                            trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00850f9a90
                                            calltrap() at calltrap+0x8/frame 0xfffffe00850f9a90
                                            --- trap 0xc, rip = 0xffffffff80f4e066, rsp = 0xfffffe00850f9b60, rbp = 0xfffffe00850f9b90 ---
                                            in6_selecthlim() at in6_selecthlim+0x96/frame 0xfffffe00850f9b90
                                            tcp_default_output() at tcp_default_output+0x1d97/frame 0xfffffe00850f9d70
                                            tcp_timer_rexmt() at tcp_timer_rexmt+0x52f/frame 0xfffffe00850f9dd0
                                            tcp_timer_enter() at tcp_timer_enter+0x101/frame 0xfffffe00850f9e10
                                            softclock_call_cc() at softclock_call_cc+0x134/frame 0xfffffe00850f9ec0
                                            softclock_thread() at softclock_thread+0xe9/frame 0xfffffe00850f9ef0
                                            fork_exit() at fork_exit+0x7f/frame 0xfffffe00850f9f30
                                            fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00850f9f30
                                            --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
                                            

                                            Humbug. On to the next one...

                                            ☕️

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.