Netgate Discussion Forum
    Crash when switching interface OFF and ON again

    2.5 Development Snapshots (Retired)
    27 Posts 6 Posters 1.1k Views
    This topic has been deleted. Only users with topic management privileges can see it.
    • louis2

      Hello,

      Here is a crash report related to switching one of my VLAN interfaces OFF and ON again. I can reproduce this fairly easily and have already reported it at least twice before. I hope Netgate takes it seriously this time 😊.

      Note that I had PIMD installed at the time of this crash, but I doubt that it is related.

      Louis
      Note that PIMD is not yet working (which is a known issue).

      info(4).0 textdump.tar(4).0

      • viktor_g (Netgate), replying to @louis2

        @louis2 I see that you are using the 2.5-devel branch.
        Can you reproduce it on the latest stable version, 2.4.5-p1?
        Where can I see the previous reports?

        • louis2

          For example, the bug report below describes comparable issues:

          https://redmine.pfsense.org/issues/10702#change-46965

          At first I thought there was a direct relation with the current PIMD issues, which I have been testing, but I am no longer so sure about that.
          The reason I had that idea is that I have been switching interfaces OFF and ON a lot to force the OS to restart them internally. That was necessary because of the faulty PIMD startup (PIMD should not start at that point, and certainly not twice in parallel; "the OS cannot handle that").

          As said, at this moment I think it is a separate problem.

          I have had this crash many times, always related to at least switching an interface OFF and ON; pfSense freezes the moment I reactivate the interface, or shortly after.

          In most cases the system recovered with a crash dump.

          Louis

          • louis2

            Sorry,

            I forgot to answer your first question "can you reproduce it on the latest stable 2.4.5-p1 version?".

            The answer is that I cannot remember seeing it on 2.4.5; however, I switched to 2.5.0-dev weeks ago and have done almost all of my testing on 2.5.

            Note that IMHO it is quite possible that this is a 2.5-only issue, since it seems to be "close to the OS".

            Louis

            • louis2

              I did some more testing using the latest snapshot.

              My current conclusion is that the problem is, as originally assumed, related to the known PIMD issue(s).

              Note that, apart from crashes, I also see boot loops caused by etc/inc/config.lib.inc line 383 using a scalar as an array (already reported earlier).

              So I will remove PIMD for now, hoping that this will "solve" the issue.

              Louis

              • w0w

                I think this may all be related to the same RADIX_MPATH problem, since this feature was enabled in pfSense:
                https://redmine.pfsense.org/issues/9544
                It seems this feature was implemented in 2011 on FreeBSD 8, but it still does not look stable in 2020; there are FreeBSD tickets about it that were opened years ago. Maybe the Netgate guys can push some fixes, I don't know.
                @jimp said in pf_test: kif == NULL, if_xname on multi-WAN and "Reset all states if WAN IP Address changes":

                It's possible. There are other problems with RADIX_MPATH as well but I'm not sure if we're going to look into fixing them or back that out.

                • louis2

                  Yep,

                  Could be. My feeling is that there may be more problems below the surface as well: IGMP proxy, and now PIMD, have not worked properly for years(!).

                  But I hope that will change soon. The really good news is that Viktor made a patch today!
                  As soon as it is in the snapshot, I will test it.

                  Speaking in general terms, though, the problem I have with Netgate is this: when I discover an issue and report that something is not working, often including some findings
                  like:

                  • a) not ok
                  • b) not ok
                  • c) not ok

                  Please have a look!
                  They shoot me down ☺

                  Netgate wants an exact bug report for a), and so on.

                  And after that is fixed, it is quite possible that it still does not work because of "b) and c)", and perhaps other things too.

                  What I would prefer is that they treat this as a package and fix it all, including communicating with the FreeBSD developers if necessary. I'm not saying that never happens, but it is not the default attitude.

                  Louis

                  • w0w

                    There are a lot of other problems, priorities and clients they are focused on, so be patient :)
                    Anyway, if those crashes are related to RADIX_MPATH, then solutions already exist: if no other way is found, they can recompile the kernel without this option. If Netgate really wants to use it in some installations, they can provide a custom kernel, a separate snapshot, or a GUI option that boots a RADIX_MPATH-enabled kernel. That's three solutions off the top of my head, and they can certainly come up with their own.

                    • louis2

                      Viktor,

                      A question and a remark.

                      Starting with the question: I noticed that your patch is in the pimd package and not in some generic startup file, which is what I (without the relevant knowledge) had expected. So it will be a package update (to 0.3) rather than a system update. Could you briefly describe what that implies in terms of process steps?

                      Now the remark, related to a finding today. I installed pimd on a system that had not been booted with pimd, so the startup problems you solved were not present (I suppose). Nevertheless, it unexpectedly did not work correctly (the vif issue is still there).

                      I did some forced tests (after the FreeBSD fix) two weeks ago, using a procedure like this:

                      • pimd -q (stop)
                      • stopping one of my interfaces
                      • starting the interface again (which forces some OS-internal reset, I think, but which also often led to a crash)
                      • pimd -c /var/etc/pimd/pimd.conf -d -f -N (start)
                      • pimd -r (show routes)

                      At that time, that (forced test) worked!

                      So I am really curious and hoping that the patch fixes "all issues".
                      If not, at least it will fix some of them ☺
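The forced-test procedure listed above could be scripted roughly as follows, which makes repeated runs easier to compare. This is a minimal sketch, not a tool from the thread: the pimd invocations and the config path are copied from the list, while using `ifconfig <iface> down` / `up` as a stand-in for toggling the interface in the pfSense GUI is an assumption (the GUI toggle also reconfigures the interface, so it may not exercise the same code paths). The interface name `ix1.116` is borrowed from a pimd log line elsewhere in this thread purely as an example, and the 5-second pause is arbitrary. By default the script only prints the steps.

```python
import shlex
import subprocess
import time

PIMD_CONF = "/var/etc/pimd/pimd.conf"  # config path used in the post


def toggle_test_steps(iface: str) -> list[list[str]]:
    """Return the forced-test sequence as argv lists, in order."""
    return [
        ["pimd", "-q"],                               # stop pimd
        ["ifconfig", iface, "down"],                  # stop the interface (assumed GUI equivalent)
        ["ifconfig", iface, "up"],                    # start the interface again
        ["pimd", "-c", PIMD_CONF, "-d", "-f", "-N"],  # start pimd (flags as given in the post)
        ["pimd", "-r"],                               # show routes
    ]


def run(iface: str, dry_run: bool = True, settle_s: float = 5.0) -> None:
    """Print each step; execute it only when dry_run is False."""
    for argv in toggle_test_steps(iface):
        print("+", shlex.join(argv))
        if dry_run:
            continue
        if argv[0] == "pimd" and "-f" in argv:
            # -f is assumed to keep pimd in the foreground, so launch it
            # without waiting for it to exit
            subprocess.Popen(argv)
            time.sleep(settle_s)  # arbitrary pause before querying routes
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run("ix1.116")  # dry run: prints the commands only
```

Keeping the steps as plain argv lists means the same sequence can be logged, diffed between runs, or executed, without shell quoting issues.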

                      Louis

                      • louis2

                        I just successfully repeated my 23/6 experiment:

                        • pimd not working correctly, seeing only 3 out of many VLANs
                        • testing more or less as described above: disabling two of the recognized interfaces, switching them on again one by one, and watching
                        • pimd does recognize all of them
                        • no crash

                        I repeated the test because back then I assumed the problem was correlated with the (probably now repaired) boot problem, whereas today's tests, with pimd installed after boot, again suggest that another fix is still needed.

                        To be sure, we / I have to test the actual patch first.

                        Louis

                        • JeGr (LAYER 8 Moderator)

                          I see a similar/related problem. For the last few 2.5 snapshots (around 2-3 weeks), every time I modify my WAN interface (especially the IPv6 settings), the whole box crashes and reboots immediately.

                          The last crash happened a few seconds ago, after updating to the latest snapshot and trying to modify WAN by disabling a DHCP6 option (the "don't release prefix" option). It panicked, crashed, and rebooted.

                          crashdump.zip

                          The box showed no problems running 2.4.x stable. The hardware is a network appliance (Lanner FW7525) with an Atom C2558, so nothing out of the ordinary, exotic, or unsupported.

                          \jens

                          Don't forget to upvote 👍 those who kindly offered their time and brainpower to help you!

                          If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

                          • jimp (Rebel Alliance Developer, Netgate)

                            The first panic on this thread appears to be related to PIMD traffic:

                            Fatal trap 12: page fault while in kernel mode
                            cpuid = 3; apic id = 03
                            fault virtual address	= 0x3000
                            fault code		= supervisor write data, page not present
                            instruction pointer	= 0x20:0xffffffff80e934f5
                            stack pointer	        = 0x28:0xfffffe00004de7f0
                            frame pointer	        = 0x28:0xfffffe00004de7f0
                            code segment		= base 0x0, limit 0xfffff, type 0x1b
                            			= DPL 0, pres 1, long 1, def32 0, gran 1
                            processor eflags	= interrupt enabled, resume, IOPL = 0
                            current process		= 12 (swi1: netisr 2)
                            trap number		= 12
                            panic: page fault
                            cpuid = 3
                            time = 1593767353
                            KDB: enter: panic
                            
                            db:0:kdb.enter.default>  bt
                            Tracing pid 12 tid 100040 td 0xfffff800043a7000
                            kdb_enter() at kdb_enter+0x37/frame 0xfffffe00004de4b0
                            vpanic() at vpanic+0x197/frame 0xfffffe00004de500
                            panic() at panic+0x43/frame 0xfffffe00004de560
                            trap_fatal() at trap_fatal+0x391/frame 0xfffffe00004de5c0
                            trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004de610
                            trap() at trap+0x286/frame 0xfffffe00004de720
                            calltrap() at calltrap+0x8/frame 0xfffffe00004de720
                            --- trap 0xc, rip = 0xffffffff80e934f5, rsp = 0xfffffe00004de7f0, rbp = 0xfffffe00004de7f0 ---
                            if_inc_counter() at if_inc_counter+0x15/frame 0xfffffe00004de7f0
                            if_simloop() at if_simloop+0xd1/frame 0xfffffe00004de830
                            pim_input() at pim_input+0x409/frame 0xfffffe00004de890
                            encap_input() at encap_input+0xd1/frame 0xfffffe00004de900
                            encap4_input() at encap4_input+0x28/frame 0xfffffe00004de930
                            ip_input() at ip_input+0x168/frame 0xfffffe00004de9e0
                            swi_net() at swi_net+0x12b/frame 0xfffffe00004dea50
                            ithread_loop() at ithread_loop+0x23c/frame 0xfffffe00004deab0
                            fork_exit() at fork_exit+0x7e/frame 0xfffffe00004deaf0
                            fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004deaf0
                            --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
                            

                            The second appears to be a crash while attempting to send an IPv6 packet:

                            Fatal trap 12: page fault while in kernel mode
                            cpuid = 0; apic id = 00
                            fault virtual address	= 0x0
                            fault code		= supervisor read data, page not present
                            instruction pointer	= 0x20:0xffffffff8102e56a
                            stack pointer	        = 0x28:0xfffffe004ba27550
                            frame pointer	        = 0x28:0xfffffe004ba277b0
                            code segment		= base 0x0, limit 0xfffff, type 0x1b
                            			= DPL 0, pres 1, long 1, def32 0, gran 1
                            processor eflags	= interrupt enabled, resume, IOPL = 0
                            current process		= 6861 (dpinger)
                            trap number		= 12
                            panic: page fault
                            cpuid = 2
                            time = 1594204579
                            KDB: enter: panic
                            
                            db:0:kdb.enter.default>  bt
                            Tracing pid 6861 tid 100357 td 0xfffff8011dd1f000
                            kdb_enter() at kdb_enter+0x37/frame 0xfffffe004ba27210
                            vpanic() at vpanic+0x197/frame 0xfffffe004ba27260
                            panic() at panic+0x43/frame 0xfffffe004ba272c0
                            trap_fatal() at trap_fatal+0x391/frame 0xfffffe004ba27320
                            trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004ba27370
                            trap() at trap+0x286/frame 0xfffffe004ba27480
                            calltrap() at calltrap+0x8/frame 0xfffffe004ba27480
                            --- trap 0xc, rip = 0xffffffff8102e56a, rsp = 0xfffffe004ba27550, rbp = 0xfffffe004ba277b0 ---
                            ip6_output() at ip6_output+0xd3a/frame 0xfffffe004ba277b0
                            rip6_output() at rip6_output+0x507/frame 0xfffffe004ba27980
                            rip6_send() at rip6_send+0x10d/frame 0xfffffe004ba279e0
                            sosend_generic() at sosend_generic+0x4ca/frame 0xfffffe004ba27a90
                            sosend() at sosend+0x50/frame 0xfffffe004ba27ac0
                            kern_sendit() at kern_sendit+0x19d/frame 0xfffffe004ba27b60
                            sendit() at sendit+0x19c/frame 0xfffffe004ba27bb0
                            sys_sendto() at sys_sendto+0x4d/frame 0xfffffe004ba27c00
                            amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe004ba27d30
                            fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe004ba27d30
                            --- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x8003cde0a, rsp = 0x7fffdfdfcf48, rbp = 0x7fffdfdfcf90 ---
                            

                            I suppose it's possible they are both related to routing in some way but there isn't any solid evidence pointing to that specifically.
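Both backtraces above use the same ddb frame format (`func() at func+0xOFF/frame 0x...`), so the faulting call chain can be extracted mechanically, which helps when comparing the many dumps posted in this thread. This is a minimal sketch that assumes only that textual format: the set of panic/trap frames to skip is my own choice, and `panic_time` assumes the `time =` line holds seconds since the Unix epoch.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# One ddb frame, e.g. "pim_input() at pim_input+0x409/frame 0xfffffe00004de890"
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \1\+0x([0-9a-f]+)/frame 0x[0-9a-f]+")
TIME_RE = re.compile(r"^\s*time = (\d+)\s*$", re.MULTILINE)

# Frames belonging to the panic/trap machinery rather than the faulting code
TRAP_FRAMES = {"kdb_enter", "vpanic", "panic", "trap_fatal",
               "trap_pfault", "trap", "calltrap"}


def call_chain(dump: str) -> list[tuple[str, int]]:
    """Return (function, offset) pairs, innermost frame first."""
    chain = []
    for line in dump.splitlines():
        m = FRAME_RE.match(line)
        if m:
            chain.append((m.group(1), int(m.group(2), 16)))
    return chain


def fault_site(dump: str) -> Optional[tuple[str, int]]:
    """First frame that is not part of the trap/panic handling."""
    for name, offset in call_chain(dump):
        if name not in TRAP_FRAMES:
            return name, offset
    return None


def panic_time(dump: str) -> Optional[datetime]:
    """Read the 'time = N' line as a Unix timestamp (assumed), in UTC."""
    m = TIME_RE.search(dump)
    return datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc) if m else None
```

Applied to the two dumps above, `fault_site` returns `("if_inc_counter", 0x15)` for the first (reached via `if_simloop` and `pim_input`) and `("ip6_output", 0xd3a)` for the second.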

                            Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                            Need help fast? Netgate Global Support!

                            Do not Chat/PM for help!

                            • JeGr (LAYER 8 Moderator)

                              @jimp said in Crash when switching interface OFF and ON again:

                              The second appears to be a crash when attempting to send an IPv6 packet

                              That would've been mine. And yes, that always (or very often) happens when I edit any IPv6-related settings on WAN or other VLANs. It only started after I upgraded to 2.5-dev to test those UPnP changes and help with testing; with 2.4.5 I had no such problems.

                              • JeGr (LAYER 8 Moderator)

                                @jimp I had another 3-4 crashes this week, more than in the whole of last year with 2.4.x stable. Something is definitely off, or causing trouble driver-wise. And it's always something interface-related (maybe IPv6), as it happens every time my ISP does some "work" (meh) and the internet goes down.

                                • louis2

                                  @JeGr, @jimp,

                                  In the past month my system crashed like hell as soon as I even looked at the interface "off and on" switch. For the past couple of days it seems a bit better, but it is not OK yet.

                                  I can still easily provoke a crash, but now mostly in specific situations, whereas over the past 2 months switching an interface meant an almost certain crash.

                                  Note that:

                                  • I have native IPv6, so that could be related!
                                  • there are also multicast-related issues, which are causing interface crashes

                                  I desperately tried to get PIMD to work, but it is simply impossible on my system/configuration; that also has to do with interfaces and vifs:

                                  • switching off and on the corresponding vip ==> crash
                                  • interfaces not being processed correctly: "Jul 22 19:19:51 pfSense pimd[35276]: /var/etc/pimd/pimd.conf:7 - Invalid phyint address 'ix1.116'"

                                  These issues are probably not caused by the pfSense software itself, but they definitely affect it. There are probably things in the FreeBSD kernel that are not 100% OK.

                                  I did file a FreeBSD bug report (248103). I can see the wrong behavior; however, I do not have the testing facilities, debug kernels, etc. to tell exactly what is technically wrong (which module, which function call).

                                  So what I am hoping for is that Netgate, with far better testing facilities and skilled engineers, is willing to pinpoint the "deeper down" root cause more exactly, so that the FreeBSD engineers can fix it!

                                  I'm quite sure there is at least one severe bug related to interface processing down there in the kernel (interrupt handling, iflib, multicast routing, or similar).

                                  Below is a small part of a crash report I often see.

                                  Louis
                                  PS: Note that I am using an amd64 system, and the interfaces I am switching off and on are all Intel igb- or ix-based, all using iflib-based drivers.

                                  Fatal trap 12: page fault while in kernel mode
                                  cpuid = 1; apic id = 01
                                  fault virtual address = 0x1000
                                  fault code = supervisor write data, page not present
                                  instruction pointer = 0x20:0xffffffff80e934f5
                                  stack pointer = 0x28:0xfffffe00004de7f0
                                  frame pointer = 0x28:0xfffffe00004de7f0
                                  code segment = base 0x0, limit 0xfffff, type 0x1b
                                  = DPL 0, pres 1, long 1, def32 0, gran 1
                                  processor eflags = interrupt enabled, resume, IOPL = 0
                                  current process = 12 (swi1: netisr 2)
                                  trap number = 12
                                  panic: page fault
                                  cpuid = 1
                                  time = 1594920087
                                  KDB: enter: panic

                                    • w0w

                                      I don't think it's related to IPv6, but to the RADIX_MPATH option.
                                      https://forum.netgate.com/topic/150986/pf_test-kif-null-if_xname-on-multi-wan-and-reset-all-states-if-wan-ip-address-changes/33
                                      I've tested different variations and also disabled IPv6; nothing changed.
                                      My current crashes are:

                                      Filename: /var/crash/info.0
                                      Dump header from device: /dev/ada1p3
                                        Architecture: amd64
                                        Architecture Version: 4
                                        Dump Length: 74752
                                        Blocksize: 512
                                        Compression: none
                                        Dumptime: Fri Jul 10 22:17:47 2020
                                        Hostname: -
                                        Magic: FreeBSD Text Dump
                                        Version String: FreeBSD 12.1-STABLE df4360fdf61(devel-12) pfSense
                                        Panic String: sleeping thread
                                        Dump Parity: 4138356775
                                        Bounds: 0
                                        Dump Status: good
                                      
                                      Filename: /var/crash/info.1
                                      Dump header from device: /dev/ada1p3
                                        Architecture: amd64
                                        Architecture Version: 4
                                        Dump Length: 157696
                                        Blocksize: 512
                                        Compression: none
                                        Dumptime: Fri Jul 10 22:28:12 2020
                                        Hostname: -
                                        Magic: FreeBSD Text Dump
                                        Version String: FreeBSD 12.1-STABLE df4360fdf61(devel-12) pfSense
                                        Panic String: page fault
                                        Dump Parity: 30683997
                                        Bounds: 1
                                        Dump Status: good
                                      
                                      Filename: /var/crash/info.2
                                      Dump header from device: /dev/ada1p3
                                        Architecture: amd64
                                        Architecture Version: 4
                                        Dump Length: 157696
                                        Blocksize: 512
                                        Compression: none
                                        Dumptime: Tue Jul 21 13:41:55 2020
                                        Hostname: -
                                        Magic: FreeBSD Text Dump
                                        Version String: FreeBSD 12.1-STABLE fcbc0f88231(devel-12) pfSense
                                        Panic String: general protection fault
                                        Dump Parity: 1753744959
                                        Bounds: 2
                                        Dump Status: good
                                      
                                      Filename: /var/crash/info.3
                                      Dump header from device: /dev/ada1p3
                                        Architecture: amd64
                                        Architecture Version: 4
                                        Dump Length: 157696
                                        Blocksize: 512
                                        Compression: none
                                        Dumptime: Tue Jul 21 13:58:07 2020
                                        Hostname: -
                                        Magic: FreeBSD Text Dump
                                        Version String: FreeBSD 12.1-STABLE cf48cd75cf5(devel-12) pfSense
                                        Panic String: page fault
                                        Dump Parity: 680143114
                                        Bounds: 3
                                        Dump Status: good
                                      
                                      Filename: /var/crash/info.4
                                      Dump header from device: /dev/ada1p3
                                        Architecture: amd64
                                        Architecture Version: 4
                                        Dump Length: 75776
                                        Blocksize: 512
                                        Compression: none
                                        Dumptime: Wed Jul 22 12:01:14 2020
                                        Hostname: -
                                        Magic: FreeBSD Text Dump
                                        Version String: FreeBSD 12.1-STABLE cf48cd75cf5(devel-12) pfSense
                                        Panic String: general protection fault
                                        Dump Parity: 387144248
                                        Bounds: 4
                                        Dump Status: good
                                      

                                      I also have an ix card. I played with drivers some time ago; nothing changed. Since I have a backup firewall, I moved the backup to 2.5 and the primary to 2.4. The fatal errors still occur only on the 2.5 version, even on completely different hardware.
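The five `info.N` headers above share a simple `Key: Value` layout, so they can be tallied per build (the short hash in `Version String`) and per panic string, which shows whether a given panic type persists across snapshots. This is a minimal sketch assuming only that layout; the `/var/crash` path comes from the `Filename` lines above, while the build-hash regex and skipping `info.last` via the `info.[0-9]*` pattern are my own assumptions.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path


def parse_info(text: str) -> dict[str, str]:
    """Parse the 'Key: Value' lines of one /var/crash/info.N file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value
    return fields


def build_of(version: str) -> str:
    """Short hash from e.g. 'FreeBSD 12.1-STABLE cf48cd75cf5(devel-12) pfSense'."""
    m = re.search(r"\b([0-9a-f]{7,40})\(", version)
    return m.group(1) if m else "unknown"


def summarize(texts: list[str]) -> dict[str, Counter]:
    """Map each build hash to a Counter of the panic strings seen on it."""
    by_build: dict[str, Counter] = defaultdict(Counter)
    for text in texts:
        info = parse_info(text)
        build = build_of(info.get("Version String", ""))
        by_build[build][info.get("Panic String", "unknown")] += 1
    return dict(by_build)


if __name__ == "__main__":
    crash_dir = Path("/var/crash")
    if crash_dir.is_dir():
        # numbered files only, so an info.last entry is not counted twice
        texts = [p.read_text() for p in sorted(crash_dir.glob("info.[0-9]*"))]
        for build, panics in summarize(texts).items():
            print(build, dict(panics))
```

Fed the five headers above, this would group the dumps under three builds (df4360fdf61, fcbc0f88231, cf48cd75cf5), with "page fault" and "general protection fault" each appearing on two of them.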

                                      • JeGr (LAYER 8 Moderator)

                                        Then it may be a different problem for you, @w0w, as my appliance uses igb NICs and it crash-dumps immediately if something triggers the NIC, like an "internet down" or "interface down" event on the WAN. For the last few weeks, whenever my ISP has problems and reboots my front router (an AVM box), I can expect the "beep" of my pfSense box shortly afterwards as it crash-dumps and reboots.

                                        • w0w, replying to @JeGr

                                          @JeGr
                                          No, my backup firewall uses igb, and moreover I have tested an em card: same crashes. I do not think it is hardware-related; I am pretty sure it is not.

                                          • louis2

                                            @w0w said in Crash when switching interface OFF and ON again:

                                            @JeGr
                                            No, my backup firewall uses igb and moreover, I have tested em card, same crashes. I do not think it is hardware related, I am pretty sure it is not.

                                            We simply do not know what is causing the crashes. It could be one bug very deep down, or multiple bugs.

                                            However, be aware that FreeBSD is redesigning its driver structure in FreeBSD 12.

                                            Where each interface type used to have its own complete driver, they are moving toward "a mini driver per device type", with the upper-layer driver work done in a new library called "iflib".

                                            I know that on my intel64 system all three NIC drivers (em, igb, and ix) use this principle, i.e. iflib.

                                            Louis

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.