Netgate Discussion Forum

    Crash when switching interface OFF and ON again

    2.5 Development Snapshots (Retired)
    • jimp Rebel Alliance Developer Netgate

      The first panic on this thread appears to be related to PIMD traffic:

      Fatal trap 12: page fault while in kernel mode
      cpuid = 3; apic id = 03
      fault virtual address	= 0x3000
      fault code		= supervisor write data, page not present
      instruction pointer	= 0x20:0xffffffff80e934f5
      stack pointer	        = 0x28:0xfffffe00004de7f0
      frame pointer	        = 0x28:0xfffffe00004de7f0
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 12 (swi1: netisr 2)
      trap number		= 12
      panic: page fault
      cpuid = 3
      time = 1593767353
      KDB: enter: panic
      
      db:0:kdb.enter.default>  bt
      Tracing pid 12 tid 100040 td 0xfffff800043a7000
      kdb_enter() at kdb_enter+0x37/frame 0xfffffe00004de4b0
      vpanic() at vpanic+0x197/frame 0xfffffe00004de500
      panic() at panic+0x43/frame 0xfffffe00004de560
      trap_fatal() at trap_fatal+0x391/frame 0xfffffe00004de5c0
      trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004de610
      trap() at trap+0x286/frame 0xfffffe00004de720
      calltrap() at calltrap+0x8/frame 0xfffffe00004de720
      --- trap 0xc, rip = 0xffffffff80e934f5, rsp = 0xfffffe00004de7f0, rbp = 0xfffffe00004de7f0 ---
      if_inc_counter() at if_inc_counter+0x15/frame 0xfffffe00004de7f0
      if_simloop() at if_simloop+0xd1/frame 0xfffffe00004de830
      pim_input() at pim_input+0x409/frame 0xfffffe00004de890
      encap_input() at encap_input+0xd1/frame 0xfffffe00004de900
      encap4_input() at encap4_input+0x28/frame 0xfffffe00004de930
      ip_input() at ip_input+0x168/frame 0xfffffe00004de9e0
      swi_net() at swi_net+0x12b/frame 0xfffffe00004dea50
      ithread_loop() at ithread_loop+0x23c/frame 0xfffffe00004deab0
      fork_exit() at fork_exit+0x7e/frame 0xfffffe00004deaf0
      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004deaf0
      --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
      

      The second appears to be a crash when attempting to send an IPv6 packet:

      Fatal trap 12: page fault while in kernel mode
      cpuid = 0; apic id = 00
      fault virtual address	= 0x0
      fault code		= supervisor read data, page not present
      instruction pointer	= 0x20:0xffffffff8102e56a
      stack pointer	        = 0x28:0xfffffe004ba27550
      frame pointer	        = 0x28:0xfffffe004ba277b0
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 6861 (dpinger)
      trap number		= 12
      panic: page fault
      cpuid = 2
      time = 1594204579
      KDB: enter: panic
      
      db:0:kdb.enter.default>  bt
      Tracing pid 6861 tid 100357 td 0xfffff8011dd1f000
      kdb_enter() at kdb_enter+0x37/frame 0xfffffe004ba27210
      vpanic() at vpanic+0x197/frame 0xfffffe004ba27260
      panic() at panic+0x43/frame 0xfffffe004ba272c0
      trap_fatal() at trap_fatal+0x391/frame 0xfffffe004ba27320
      trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004ba27370
      trap() at trap+0x286/frame 0xfffffe004ba27480
      calltrap() at calltrap+0x8/frame 0xfffffe004ba27480
      --- trap 0xc, rip = 0xffffffff8102e56a, rsp = 0xfffffe004ba27550, rbp = 0xfffffe004ba277b0 ---
      ip6_output() at ip6_output+0xd3a/frame 0xfffffe004ba277b0
      rip6_output() at rip6_output+0x507/frame 0xfffffe004ba27980
      rip6_send() at rip6_send+0x10d/frame 0xfffffe004ba279e0
      sosend_generic() at sosend_generic+0x4ca/frame 0xfffffe004ba27a90
      sosend() at sosend+0x50/frame 0xfffffe004ba27ac0
      kern_sendit() at kern_sendit+0x19d/frame 0xfffffe004ba27b60
      sendit() at sendit+0x19c/frame 0xfffffe004ba27bb0
      sys_sendto() at sys_sendto+0x4d/frame 0xfffffe004ba27c00
      amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe004ba27d30
      fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe004ba27d30
      --- syscall (133, FreeBSD ELF64, sys_sendto), rip = 0x8003cde0a, rsp = 0x7fffdfdfcf48, rbp = 0x7fffdfdfcf90 ---
      

      I suppose it's possible they are both related to routing in some way but there isn't any solid evidence pointing to that specifically.
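One way to check whether the two panics share any code path is to compare the frames below the `--- trap` marker in each backtrace. The following is only a minimal sketch under assumptions: the function name `frames_after_trap` and both regular expressions are illustrative, written against the backtrace lines quoted above, and the inputs are shortened excerpts of those two traces.

```python
import re

# Matches DDB backtrace lines such as
# "if_inc_counter() at if_inc_counter+0x15/frame 0xfffffe00004de7f0"
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+\s*$")
# Matches the marker that separates the trap handler frames from the faulting code
TRAP_RE = re.compile(r"^\s*--- trap 0x[0-9a-f]+,")


def frames_after_trap(backtrace):
    """Return the function names below the first '--- trap 0x...' marker.

    These are the frames that were running when the fault happened,
    innermost first; the panic/trap handler frames above the marker are skipped.
    """
    names = []
    seen_trap = False
    for line in backtrace.splitlines():
        if not seen_trap:
            seen_trap = bool(TRAP_RE.match(line))
            continue
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


# Shortened excerpts of the two backtraces quoted in this post.
FIRST_PANIC = """\
trap() at trap+0x286/frame 0xfffffe00004de720
calltrap() at calltrap+0x8/frame 0xfffffe00004de720
--- trap 0xc, rip = 0xffffffff80e934f5, rsp = 0xfffffe00004de7f0, rbp = 0xfffffe00004de7f0 ---
if_inc_counter() at if_inc_counter+0x15/frame 0xfffffe00004de7f0
if_simloop() at if_simloop+0xd1/frame 0xfffffe00004de830
pim_input() at pim_input+0x409/frame 0xfffffe00004de890
encap_input() at encap_input+0xd1/frame 0xfffffe00004de900
"""

SECOND_PANIC = """\
calltrap() at calltrap+0x8/frame 0xfffffe004ba27480
--- trap 0xc, rip = 0xffffffff8102e56a, rsp = 0xfffffe004ba27550, rbp = 0xfffffe004ba277b0 ---
ip6_output() at ip6_output+0xd3a/frame 0xfffffe004ba277b0
rip6_output() at rip6_output+0x507/frame 0xfffffe004ba27980
rip6_send() at rip6_send+0x10d/frame 0xfffffe004ba279e0
"""

first = frames_after_trap(FIRST_PANIC)
second = frames_after_trap(SECOND_PANIC)
print(first)
print(second)
print(sorted(set(first) & set(second)))  # frames common to both panics
```

For these excerpts the intersection is empty, which matches the observation that nothing in the traces themselves ties the two panics together.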

      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

      • JeGr LAYER 8 Moderator

        @jimp said in Crash when switching interface OFF and ON again:

        The second appears to be a crash when attempting to send an IPv6 packet

        That would've been mine. And yes, that happens always, or at least very often, when I edit any IPv6-related settings on WAN or other VLANs. It only started after upgrading to 2.5-dev to help test those UPnP changes; with 2.4.5 I had no such problems.

        Don't forget to upvote 👍 those who kindly offered their time and brainpower to help you!

        If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

        • JeGr LAYER 8 Moderator

          @jimp I also had 3-4 crashes in a week again. That is more than in the whole last year with 2.4.x stable. Something is definitely off or making trouble driver-wise. And it's always something interface-related (maybe v6), as it happens every time my ISP does some "work" (meh) and the internet is gone.

          • louis2

            @JeGr, @jimp,

            In the past month my system was crashing like hell as soon as I even looked at the interface "off and on" switch. For the last couple of days that seems a bit better, but it is not OK yet.

            I can still easily provoke a crash, but it now takes more specific situations, whereas switching an interface was an almost certain crash in the past two months.

            Note that:

            • I have native IPv6, so that could be related!
            • there are also multicast related issues, which are causing interface crashes

            I desperately tried to get PIMD to work, simply impossible on my system/configuration, but that also has to do with interfaces and vifs:

            • switching off and on the corresponding vip ==> crash
            • interfaces not being processed correctly: "Jul 22 19:19:51 pfSense pimd[35276]: /var/etc/pimd/pimd.conf:7 - Invalid phyint address 'ix1.116'"

            These issues are probably not caused by the pfSense software itself, but they definitely affect it. There are probably things in the FreeBSD kernel that are not 100% OK.

            I did file a FreeBSD bug report (248103). I can see the wrong behavior, but I do not have testing facilities, debug kernels, etc. to tell exactly what is technically wrong (which module, which function call).

            So what I am hoping for is that Netgate, having far better testing facilities and skilled engineers, is willing to point more exactly to the "deeper down problem cause", so that the FreeBSD engineers can fix it!

            I am quite sure that there is at least one severe bug related to interface processing down there in the kernel (interrupt handling, iflib, multicast routing or so).

            Below a small part of a crash report I often see.

            Louis
            PS: note that I am using an amd64 system, and the interfaces I am switching off and on are all Intel igb or ix based, and all use the iflib driver type.

            Fatal trap 12: page fault while in kernel mode
            cpuid = 1; apic id = 01
            fault virtual address = 0x1000
            fault code = supervisor write data, page not present
            instruction pointer = 0x20:0xffffffff80e934f5
            stack pointer = 0x28:0xfffffe00004de7f0
            frame pointer = 0x28:0xfffffe00004de7f0
            code segment = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
            processor eflags = interrupt enabled, resume, IOPL = 0
            current process = 12 (swi1: netisr 2)
            trap number = 12
            panic: page fault
            cpuid = 1
            time = 1594920087
            KDB: enter: panic
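To compare a trap header like this one with the others in the thread, the `key = value` lines can be pulled into a dictionary. This is a rough sketch under assumptions: `parse_trap_header` is an illustrative helper, not a FreeBSD or pfSense tool, and the input is a shortened copy of the report above. The comparison address is the instruction pointer from the first panic quoted earlier, which that backtrace resolved to `if_inc_counter+0x15`.

```python
from datetime import datetime, timezone


def parse_trap_header(text):
    """Parse the 'key = value' lines of a trap report into a dict.

    Lines without '=' and continuation lines that start with '=' are skipped.
    A repeated key (such as 'cpuid') keeps its last value.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key:
            fields[key] = value.strip()
    return fields


# Shortened copy of the report quoted above.
REPORT = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x1000
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80e934f5
current process = 12 (swi1: netisr 2)
trap number = 12
panic: page fault
cpuid = 1
time = 1594920087
"""

# Instruction pointer of the first panic quoted earlier in the thread.
FIRST_PANIC_RIP = "0xffffffff80e934f5"

fields = parse_trap_header(REPORT)
rip = fields["instruction pointer"].split(":", 1)[1]
when = datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)

print(rip == FIRST_PANIC_RIP)  # same faulting address as the first panic?
print(when.isoformat())        # panic time as a UTC timestamp
```

The addresses match here, which would suggest the same faulting instruction, but only if both reports come from the same kernel build; the thread does not confirm that.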

              • w0w

                I don't think it's related to IPv6, but to the RADIX_MPATH option.
                https://forum.netgate.com/topic/150986/pf_test-kif-null-if_xname-on-multi-wan-and-reset-all-states-if-wan-ip-address-changes/33
                I've tested different variations and also disabled IPv6; nothing helped.
                My current crashes are:

                Filename: /var/crash/info.0
                Dump header from device: /dev/ada1p3
                  Architecture: amd64
                  Architecture Version: 4
                  Dump Length: 74752
                  Blocksize: 512
                  Compression: none
                  Dumptime: Fri Jul 10 22:17:47 2020
                  Hostname: -
                  Magic: FreeBSD Text Dump
                  Version String: FreeBSD 12.1-STABLE df4360fdf61(devel-12) pfSense
                  Panic String: sleeping thread
                  Dump Parity: 4138356775
                  Bounds: 0
                  Dump Status: good
                
                Filename: /var/crash/info.1
                Dump header from device: /dev/ada1p3
                  Architecture: amd64
                  Architecture Version: 4
                  Dump Length: 157696
                  Blocksize: 512
                  Compression: none
                  Dumptime: Fri Jul 10 22:28:12 2020
                  Hostname: -
                  Magic: FreeBSD Text Dump
                  Version String: FreeBSD 12.1-STABLE df4360fdf61(devel-12) pfSense
                  Panic String: page fault
                  Dump Parity: 30683997
                  Bounds: 1
                  Dump Status: good
                
                Filename: /var/crash/info.2
                Dump header from device: /dev/ada1p3
                  Architecture: amd64
                  Architecture Version: 4
                  Dump Length: 157696
                  Blocksize: 512
                  Compression: none
                  Dumptime: Tue Jul 21 13:41:55 2020
                  Hostname: -
                  Magic: FreeBSD Text Dump
                  Version String: FreeBSD 12.1-STABLE fcbc0f88231(devel-12) pfSense
                  Panic String: general protection fault
                  Dump Parity: 1753744959
                  Bounds: 2
                  Dump Status: good
                
                Filename: /var/crash/info.3
                Dump header from device: /dev/ada1p3
                  Architecture: amd64
                  Architecture Version: 4
                  Dump Length: 157696
                  Blocksize: 512
                  Compression: none
                  Dumptime: Tue Jul 21 13:58:07 2020
                  Hostname: -
                  Magic: FreeBSD Text Dump
                  Version String: FreeBSD 12.1-STABLE cf48cd75cf5(devel-12) pfSense
                  Panic String: page fault
                  Dump Parity: 680143114
                  Bounds: 3
                  Dump Status: good
                
                Filename: /var/crash/info.4
                Dump header from device: /dev/ada1p3
                  Architecture: amd64
                  Architecture Version: 4
                  Dump Length: 75776
                  Blocksize: 512
                  Compression: none
                  Dumptime: Wed Jul 22 12:01:14 2020
                  Hostname: -
                  Magic: FreeBSD Text Dump
                  Version String: FreeBSD 12.1-STABLE cf48cd75cf5(devel-12) pfSense
                  Panic String: general protection fault
                  Dump Parity: 387144248
                  Bounds: 4
                  Dump Status: good
                

                I have an ix card also. I played with drivers some time ago, with no effect. Since I have a backup firewall, I moved the backup to 2.5 and the primary to 2.4. The fatal errors still occur only on the 2.5 version, even on completely different hardware.
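The five dump headers above can be summarised by panic string and kernel build. This is only a sketch under assumptions: `parse_info_blocks` is an illustrative helper, and the input keeps just three fields per dump from the listing above to stay short.

```python
from collections import Counter


def parse_info_blocks(text):
    """Split a concatenated /var/crash/info.N listing into one dict per dump.

    A new dict starts at every 'Filename:' line; each later 'Key: value'
    line is stored in the most recent dict.
    """
    dumps = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Filename":
            dumps.append({})
        if dumps:
            dumps[-1][key] = value
    return dumps


# Abbreviated from the listing above: three fields per dump.
LISTING = """\
Filename: /var/crash/info.0
  Version String: FreeBSD 12.1-STABLE df4360fdf61(devel-12) pfSense
  Panic String: sleeping thread
Filename: /var/crash/info.1
  Version String: FreeBSD 12.1-STABLE df4360fdf61(devel-12) pfSense
  Panic String: page fault
Filename: /var/crash/info.2
  Version String: FreeBSD 12.1-STABLE fcbc0f88231(devel-12) pfSense
  Panic String: general protection fault
Filename: /var/crash/info.3
  Version String: FreeBSD 12.1-STABLE cf48cd75cf5(devel-12) pfSense
  Panic String: page fault
Filename: /var/crash/info.4
  Version String: FreeBSD 12.1-STABLE cf48cd75cf5(devel-12) pfSense
  Panic String: general protection fault
"""

dumps = parse_info_blocks(LISTING)
panics = Counter(d["Panic String"] for d in dumps)
builds = Counter(d["Version String"].split()[2] for d in dumps)

for panic, count in panics.most_common():
    print(count, panic)
for build, count in builds.items():
    print(count, build)
```

The tally shows three different panic strings across three snapshot builds, so these dumps are not limited to a single build or a single failure type.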

                • JeGr LAYER 8 Moderator

                  Then it may be a different problem for you, @w0w, as my appliance uses igb NICs and it crash-dumps immediately if something triggers the NIC, like an "internet down" or "interface down" on the WAN. For the last few weeks, whenever my ISP has problems and reboots my front router (AVM box), I can wait for the "beep" of my pfSense box shortly after as it crash-dumps and reboots.

                  • w0w @JeGr

                    @JeGr
                    No, my backup firewall uses igb and moreover, I have tested em card, same crashes. I do not think it is hardware related, I am pretty sure it is not.

                    • louis2

                      @w0w said in Crash when switching interface OFF and ON again:

                      @JeGr
                      No, my backup firewall uses igb and moreover, I have tested em card, same crashes. I do not think it is hardware related, I am pretty sure it is not.

                      We simply do not know what is causing the crashes. It can be one bug very deep down or multiple bugs.

                      However, be aware that FreeBSD is redesigning its driver structure in FreeBSD 12.

                      Where each interface type used to have its own driver, they are moving towards "a mini driver per device type", with the upper-layer driver work done in a new library called "iflib".

                      I know that on my intel64 system all three I/O port types, em, igb and ix, use this principle, i.e. "iflib".

                      Louis

                      • w0w

                        @louis2 Yep. But I had the same or similar crashes to the ones you have now when 2.5 was based on 12.0-RELEASE using the standard driver. The situation was exactly the same: any change such as up/down or unplugging and replugging an interface caused the crash, not always, and sometimes immediately, sometimes not. After some trial and error, googling, digging and so on, I now think it is the radix_mpath option. Maybe this feature is not buggy by itself, but it exposes some system errors that otherwise never appear. The only way I see to test this is to compile the kernel without the radix_mpath option enabled.

                        • louis2

                          @w0w said in Crash when switching interface OFF and ON again:

                          radix_mpath

                          radix_mpath is related to multipath routing. I did not dig into that, but it sounds like route redundancy, something I am not using (which does not necessarily imply that I am not affected).

                          I also noticed that there is already a pfSense feature request noting that there are known issues inside radix_mpath (Feature #9544).
                          However, my personal feeling is that in my case the problem is more likely in some multicast module or in some generic interface handler.

                          If only I had an "in-circuit emulator" attached to my system or your system, plus all the sources, we would know within hours, since most issues are reproducible. But of course we do not have that kind of very advanced tooling.

                          If someone would compile a pfSense version with a lot of extra debug statements, that would probably help as well.

                          Louis

                          • w0w

                            Maybe @rschell can help, at least with a pfSense version compiled without radix_mpath. 😊
                            Unfortunately, I do not know enough to build the kernel myself.

                            • louis2

                              @w0w, @rschell

                              I opened a bug at FreeBSD Bugzilla: Bug 248243
                              https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248243

                              It would help if you supported me there by adding information about the issues you noticed (and perhaps your verdict), of course backed by facts as far as possible.

                              Louis

                              • A Former User @louis2

                                @louis2 I have a 2.5 development version without the RADIX_MPATH option enabled (no additional debug statements) that I could share with you. Do you have a preference: ISO or memstick?

                                • louis2

                                  @rschell, @w0w,

                                  I am not sure RADIX_MPATH is causing the trouble for me, at least not if I interpret the purpose of RADIX_MPATH correctly (I do not use multiple paths).

                                  However, to be sure, I could try a memstick version, if it is based on a very recent generic 2.5 build.

                                  What I would really love to have is a pfSense debug version that tracks all communication between pimd and the FreeBSD kernel and saves it with timestamps in a file that is not lost in case of a crash.
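The "not lost in case of a crash" part of that wish can be illustrated in isolation: append each record with a timestamp and force it to disk before continuing. This is a minimal sketch and not an existing pimd or pfSense facility; the class name `DurableLog` is hypothetical, and it shows only the logging side, not how pimd's kernel calls would be captured.

```python
import os
import time


class DurableLog:
    """Append-only text log that syncs every record to disk before returning."""

    def __init__(self, path):
        # O_APPEND keeps every write at the end of the file;
        # mode 0o600 restricts a newly created file to its owner.
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)

    def write(self, message):
        # UTC timestamp followed by the message, one record per line.
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime())
        os.write(self._fd, f"{stamp}Z {message}\n".encode())
        # Ask the kernel to push the data to stable storage now, so a panic
        # right after this call should not lose the record.
        os.fsync(self._fd)

    def close(self):
        os.close(self._fd)
```

Syncing on every record is slow, which is the price for keeping the last lines before a panic. A record that was never written before the crash is still lost, and whether synced data really reaches the disk also depends on the storage hardware honouring the flush.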

                                  Louis

                                  • w0w @A Former User

                                    @rschell
                                    Memstick would be great. Thanks!

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.