Netgate Discussion Forum

    New Kernel Panic ... on Boot. pimd and/or interface-change related?

    • MrPeteM
      MrPete
      last edited by MrPete

      Here's a slightly strange one.

      I've been running pfSense stably for months now in a "new" context: as a VM in Proxmox.

      I'm using CARP, and my Backup box is working fine.

      My primary box is suddenly crashing just about on boot (not quite: most packages etc. get configured, and the crash generally comes when it's about to "go live"). Or it runs for 1-2 hours, then crashes. Never more than that over the last 18 hours.

      Of course I'd like any ideas on diagnosing.
      Here's my question and thought:

      • What changed: I was trying to get SR-IOV virtual NICs to work, and failed, so I undid that config in Proxmox (no changes to pfSense).
      • The result of that: the PCI bus for my third (CARP/HA) interface got shifted from 00:02... to 00:03...

      QUESTION: Is it possible that pfSense has some internal...something, that doesn't like a changed PCI bus number for an interface?

      All thoughts MOST welcome.
      Pete

      Starting package acme...done.
      <6>ng0: changing name to 'pppoe0'
      <6>stf0: changing name to 'wan_stf'
      
      Fatal trap 12: page fault while in kernel mode
      cpuid = 2; apic id = 02
      fault virtual address	= 0x2000
      fault code		= supervisor write data, page not present
      instruction pointer	= 0x20:0xffffffff80eafeb5
      stack pointer	        = 0x28:0xfffffe00020995a0
      frame pointer	        = 0x28:0xfffffe00020995a0
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 12 (irq263: virtio_pci3)
      trap number		= 12
      panic: page fault
      cpuid = 2
      time = 1645732301
      KDB: enter: panic
      

      And a bit of the panic dump:

      Tracing pid 12 tid 100117 td 0xfffff8003004d000
      kdb_enter() at kdb_enter+0x37/frame 0xfffffe0002099260
      vpanic() at vpanic+0x197/frame 0xfffffe00020992b0
      panic() at panic+0x43/frame 0xfffffe0002099310
      trap_fatal() at trap_fatal+0x391/frame 0xfffffe0002099370
      trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00020993c0
      trap() at trap+0x286/frame 0xfffffe00020994d0
      calltrap() at calltrap+0x8/frame 0xfffffe00020994d0
      --- trap 0xc, rip = 0xffffffff80eafeb5, rsp = 0xfffffe00020995a0, rbp = 0xfffffe00020995a0 ---
      if_inc_counter() at if_inc_counter+0x15/frame 0xfffffe00020995a0
      if_simloop() at if_simloop+0xd1/frame 0xfffffe00020995e0
      pim_input() at pim_input+0x409/frame 0xfffffe0002099640
      encap_input() at encap_input+0xd1/frame 0xfffffe00020996b0
      encap4_input() at encap4_input+0x28/frame 0xfffffe00020996e0
      ip_input() at ip_input+0x168/frame 0xfffffe0002099790
      netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00020997e0
      ether_demux() at ether_demux+0x16a/frame 0xfffffe0002099810
      ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe0002099870
      netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe00020998c0
      ether_input() at ether_input+0x4b/frame 0xfffffe00020998f0
      vlan_input() at vlan_input+0x1f3/frame 0xfffffe0002099940
      ether_demux() at ether_demux+0x153/frame 0xfffffe0002099970
      ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe00020999d0
      netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe0002099a20
      ether_input() at ether_input+0x4b/frame 0xfffffe0002099a50
      vtnet_rxq_eof() at vtnet_rxq_eof+0x7a5/frame 0xfffffe0002099b10
      vtnet_rx_vq_process() at vtnet_rx_vq_process+0xb7/frame 0xfffffe0002099b50
      ithread_loop() at ithread_loop+0x23c/frame 0xfffffe0002099bb0
      fork_exit() at fork_exit+0x7e/frame 0xfffffe0002099bf0
      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0002099bf0
      --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
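For anyone reading the trace above, the frames between the first `--- trap` marker and the next one are the code path that was running when the fault hit. A minimal Python sketch for pulling that path out of a pasted DDB backtrace follows. The function name `faulting_path` and the regex are illustrative assumptions, and the sample is a shortened excerpt of the trace in this post.

```python
import re

# Matches DDB backtrace lines such as:
#   pim_input() at pim_input+0x409/frame 0xfffffe0002099640
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-fA-F]+/frame 0x[0-9a-fA-F]+")


def faulting_path(trace: str) -> list[str]:
    """Return function names between the first '--- trap' marker and the
    next one, innermost frame first."""
    names = []
    in_fault = False
    for line in trace.splitlines():
        if line.strip().startswith("--- trap"):
            if in_fault:
                break  # second marker: end of the interrupted code path
            in_fault = True
            continue
        match = FRAME_RE.match(line)
        if in_fault and match:
            names.append(match.group(1))
    return names


# Shortened excerpt of the backtrace posted above.
SAMPLE = """\
trap() at trap+0x286/frame 0xfffffe00020994d0
calltrap() at calltrap+0x8/frame 0xfffffe00020994d0
--- trap 0xc, rip = 0xffffffff80eafeb5, rsp = 0xfffffe00020995a0, rbp = 0xfffffe00020995a0 ---
if_inc_counter() at if_inc_counter+0x15/frame 0xfffffe00020995a0
if_simloop() at if_simloop+0xd1/frame 0xfffffe00020995e0
pim_input() at pim_input+0x409/frame 0xfffffe0002099640
encap_input() at encap_input+0xd1/frame 0xfffffe00020996b0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
"""

if __name__ == "__main__":
    print(" <- ".join(faulting_path(SAMPLE)))
```

Read this way, the fault occurs in `if_inc_counter()`, reached from `pim_input()` via `if_simloop()`, while handling a packet received on a vtnet interface.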
      
      • jimpJ
        jimp Rebel Alliance Developer Netgate
        last edited by

        pfSense doesn't care about bus locations, only the driver names. Even if that were a problem, it wouldn't cause a panic; it would make the interfaces mismatch.

        That panic doesn't look familiar.

        Did you have a snapshot of the VM before you started making changes? Rolling back to an old snapshot would undo any changes made to the VM config as well, I believe.

        A change in the VM settings could still be at fault, though I've not seen anything in my Proxmox VMs crash like that with any combination of settings I've messed with in the past.

        That, or re-create the VM config from scratch (including the MAC addresses), import the config, and see if the panic still happens in a fresh VM.

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        • jimpJ jimp moved this topic from General pfSense Questions on
        • MrPeteM
          MrPete @jimp
          last edited by

          Thanks, @jimp ...
          I did a full VM restore from before the issues, and yes that guarantees a restoration of per-VM config. The panic above was from that restored VM.

          However, the changed PCI bus is still in place, since that's in "virtual" hardware.

          I just realized there's a Linux host boot-time setting I can remove (pci=assign-busses) that enables the PCI bus shift. I'll try again with that removed to see if it makes any difference.
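A quick way to confirm whether the host actually booted with that setting is to look at the running kernel command line. This is a minimal Python sketch, assuming a Linux host that exposes `/proc/cmdline`. The helper names are made up for illustration and the sample command line is invented.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` appears as a whole word on the kernel command line."""
    return param in cmdline.split()


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running Linux kernel was booted with."""
    return Path(path).read_text().strip()


# Invented sample; on the real host, pass running_cmdline() instead.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro pci=assign-busses quiet"

if __name__ == "__main__":
    print(has_kernel_param(SAMPLE_CMDLINE, "pci=assign-busses"))
```

On the host itself, `has_kernel_param(running_cmdline(), "pci=assign-busses")` should return `False` once the setting has been removed and the machine rebooted.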

          • MrPeteM
            MrPete @MrPete
            last edited by MrPete

            @jimp --

            1. I have eliminated a few distractions
            • (Intense memory test just in case ;) )

            • Restored PCI bus number

              It still crashes

            2. I then looked at the panic info, and also the logs on my (nicely functioning) backup CARP box...

              I am running pfSense 2.5.2 stable, with pimd 0.0.3_4

            Observations and Questions:

            • 0.0.3_5 is now out. Apparently this is about ensuring there's no rc start file if PIMD is disabled. I've got it enabled...

            • The log says pimd is starting twice at boot. Clearly that can cause some trouble. I thought this was fixed long before 0.0.3_4?

            • Perhaps more telling: I have not touched my PIMD configuration in quite some time. I notice (in my running backup CARP, and in config.xml) that two VLAN interfaces I deleted a year ago are still listed in the pimd config...

            • ...in fact, looking at a config.xml, those two nonexistent VLAN interfaces are still present in a number of places:

              • active firewall rules
              • several package configs
              • and more

            I notice that this panic seems to involve kernel pim_sm code... which makes me think perhaps that such invalid config info can be a bit more dangerous than one might imagine?
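One way to list every place a deleted interface is still referenced is to walk config.xml and report each element whose text names it. This is a rough Python sketch, not a pfSense tool. The function name is hypothetical, and the sample XML (including its tag names and the `opt7` interface) is invented for illustration rather than copied from a real config.

```python
import re
import xml.etree.ElementTree as ET


def find_stale_references(xml_text, stale_names):
    """Return (path, text) pairs for elements whose text names a stale interface."""
    stale = set(stale_names)
    hits = []

    def walk(node, parent_path):
        path = f"{parent_path}/{node.tag}"
        text = (node.text or "").strip()
        # Split on commas and whitespace so comma-separated lists are checked too.
        tokens = set(re.split(r"[,\s]+", text)) if text else set()
        if tokens & stale:
            hits.append((path, text))
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return hits


# Invented sample; tag names are assumptions, not the real config.xml schema.
SAMPLE_CONFIG = """
<pfsense>
  <vlans>
    <vlan><vlanif>vtnet2.10</vlanif></vlan>
  </vlans>
  <filter>
    <rule><interface>opt7</interface></rule>
    <rule><interface>lan</interface></rule>
  </filter>
  <installedpackages>
    <pimd><config><ifname>opt7</ifname></config></pimd>
  </installedpackages>
</pfsense>
"""

if __name__ == "__main__":
    for path, text in find_stale_references(SAMPLE_CONFIG, {"opt7"}):
        print(path, "->", text)
```

Run against a real config.xml backup with the names of the deleted interfaces, this would give a list of leftover references to attach to a bug report.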

            Happy to do some testing on this.... (Will keep the WAN/LAN port on this box disconnected from everything if possible. I don't want to kill our live connection... Hmm: can I manually from single-user-mode ensure this box doesn't become Primary CARP on boot?)

            (If it would help, I can privately get a config.xml to you, with notes on the two missing interfaces. If deleting a VLAN interface should be "clean", this is a gold mine of MISbSINuG FEgATUsRES 🤣)

            • MrPeteM
              MrPete @MrPete
              last edited by

              @jimp Any thoughts on:

              • Whether having invalid VLAN interfaces referenced in a config could cause various trouble, even a kernel panic?
              • Whether it is known/expected that removal of VLAN interfaces in the GUI does not remove them from the various configuration sections?

              I just want to understand how to move forward from here.

              • P
                potata_netgato
                last edited by potata_netgato

                Hello,
                I had a similar experience while prototyping pimd with IPsec and WireGuard. I didn't add any VLANs but had added and removed some interfaces. Attaching the config file (nothing confidential) and a copy-paste of the error log window in pfSense. Let me know if there's something else I can provide. pfsense_crash_pimd.7z

                Fatal trap 12: page fault while in kernel mode
                cpuid = 0; apic id = 00
                fault virtual address = 0x0
                fault code = supervisor write data, page not present
                instruction pointer = 0x20:0xffffffff80ea0fd5
                stack pointer = 0x0:0xfffffe00004d3960
                frame pointer = 0x0:0xfffffe00004d3960
                code segment = base 0x0, limit 0xfffff, type 0x1b
                = DPL 0, pres 1, long 1, def32 0, gran 1
                processor eflags = interrupt enabled, resume, IOPL = 0
                current process = 0 (wg_tqg_0)
                trap number = 12
                panic: page fault
                cpuid = 0
                time = 1646999897
                KDB: enter: panic

                • MrPeteM
                  MrPete @potata_netgato
                  last edited by

                  @potata_netgato @jimp
                  I have discussed this with the pimd author.

                  He's convinced that, since pimd is a userland app, there ought to be no way his app can cause a kernel panic.

                  The panic is in the "pim" area of the kernel...

                  What this indicates is the kernel is not fully vetting parameters for some kind of system call or setting. I'll try to come up with a bug report for BSD.

                  • jimpJ
                    jimp Rebel Alliance Developer Netgate
                    last edited by

                    While it is true that a userland program should not be able to trigger a kernel panic, pimd should still ignore interfaces that don't exist. One doesn't absolve the other of having a bug in this case; they are both doing something they shouldn't.
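To illustrate the userland half of that point: a daemon can compare its configured interface list with what the OS reports and skip anything that is missing. The sketch below is Python for readability and is not pimd's code. The helper names are hypothetical and the interface names in the demo are invented.

```python
import socket


def existing_interfaces() -> set[str]:
    """Names of the network interfaces the OS currently reports."""
    return {name for _index, name in socket.if_nameindex()}


def split_configured(configured, existing):
    """Split configured interface names into (usable, missing), keeping order."""
    usable = [name for name in configured if name in existing]
    missing = [name for name in configured if name not in existing]
    return usable, missing


if __name__ == "__main__":
    # Invented names; a real daemon would pass existing_interfaces() instead.
    usable, missing = split_configured(
        ["vtnet0", "vtnet2.10", "vtnet2.20"], {"vtnet0", "vtnet1"}
    )
    print("enable on:", usable)
    print("skip (not present):", missing)
```

Logging the skipped names instead of failing silently would also make stale config entries like the ones described earlier in this thread easier to spot.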

                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.