Netgate Discussion Forum

    Kernel Panic on Temporarily Disable CARP with ixgbe driver

    2.3-RC Snapshot Feedback and Issues - ARCHIVED
    12 Posts 2 Posters 3.2k Views
    • fastpilot

      Hi,

      I've set up a couple of new pfSense boxes and am hitting a kernel panic every time I click the Temporarily Disable CARP button on the primary while it's under load. I'm running iperf3 on two machines (one on the LAN side, the other on the WAN side) with a flow of approximately 1.8 Gbps in each direction. The panic doesn't occur when I'm not load testing. The panic seems to be related to the ix driver.

      The hardware is an HP DL360 Gen9 with an HP-561T (2-port Intel X540-T2) 10 Gbps PCI NIC and an HP 331i (4-port Broadcom 5719) onboard 1 Gbps NIC. The Intel ix interfaces are configured on the WAN side in a dual-port LACP LAGG, and the Broadcom bge interfaces on the LAN side in a dual-port LACP LAGG. Both boxes run the latest BIOS and NIC firmware from HP.

      The BIOS is configured for legacy BIOS mode with x2APIC disabled, and the power profile and regulator are set to OS Control Mode with powerd running. (It crashes with or without powerd running, and with or without the BIOS-managed power profile.)

      The only sysctls I've changed are:

      • Set kern.ipc.nmbclusters to 131072

      • Set kern.ipc.nmbjumbop to 524288
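For a sense of scale, here is a small C sketch of the memory ceiling those two settings allow. It assumes the standard FreeBSD/amd64 cluster sizes (2048-byte regular clusters for kern.ipc.nmbclusters, and page-sized 4096-byte jumbo clusters for kern.ipc.nmbjumbop), and it assumes, as on stock FreeBSD, that both values cap the mbuf zones rather than reserve memory up front, so the results are upper bounds, not actual usage. The helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed FreeBSD/amd64 sizes: MCLBYTES is 2048 and the "page" jumbo
 * cluster used by kern.ipc.nmbjumbop is PAGE_SIZE, i.e. 4096 bytes. */
#define CLUSTER_BYTES 2048ULL
#define JUMBOP_BYTES  4096ULL

/* Upper bound on memory a zone may consume: item limit times item size. */
static uint64_t zone_ceiling_bytes(uint64_t limit, uint64_t item_bytes)
{
    assert(item_bytes > 0);
    return limit * item_bytes;
}

/* Print the ceiling for a named tunable in MiB. */
static void print_ceiling(const char *name, uint64_t limit, uint64_t item_bytes)
{
    uint64_t bytes = zone_ceiling_bytes(limit, item_bytes);

    printf("%s=%llu -> up to %llu MiB\n", name,
           (unsigned long long)limit,
           (unsigned long long)(bytes / (1024ULL * 1024ULL)));
}
```

With the values above, kern.ipc.nmbclusters=131072 works out to at most 256 MiB and kern.ipc.nmbjumbop=524288 to at most 2048 MiB (2 GiB).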

      I submitted a crash report today at approx. 12:30 EST (UTC-5) from WAN IP 216.220.x.x (it's running in my lab, so it may appear as 207.34.x.x).

      Update: I tried the latest 2.3-BETA from today and hit the same problem. I also found this thread, which describes a similar error: https://forum.pfsense.org/index.php?topic=55433.0

      Can anyone help me fix this? Thanks!

      Fatal trap 12: page fault while in kernel mode
      cpuid = 2; Fatal trap 12: page fault while in kernel mode
      apic id = 04
      cpuid = 4; apic id = 08
      fault virtual address	= 0x378
      fault virtual address	= 0x378
      fault code		= supervisor read data, page not present
      fault code		= supervisor read data, page not present
      instruction pointer	= 0x20:0xffffffff80abf3d9
      stack pointer	        = 0x28:0xfffffe00003cb740
      instruction pointer	= 0x20:0xffffffff80abf3d9
      stack pointer	        = 0x28:0xfffffe00003df740
      frame pointer	        = 0x28:0xfffffe00003cb7d0
      frame pointer	        = 0x28:0xfffffe00003df7d0
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 12 (irq267: ix0:que 2)
      c[ thread pid 12 tid 100047 ]
      Stopped at      __rw_rlock+0x1c9:       movl    0x378(%r14),%eax
      
      db:0:kdb.enter.default>  show pcpu
      cpuid        = 3
      dynamic pcpu = 0xfffffe00d7127780
      curthread    = 0xfffff80003649000: pid 12 "irq275: ix1:que 3"
      curpcb       = 0xfffffe0061266b80
      fpcurthread  = none
      idlethread   = 0xfffff80003388000: tid 100006 "idle: cpu3"
      curpmap      = 0xffffffff82182058
      tssp         = 0xffffffff8219d148
      commontssp   = 0xffffffff8219d148
      rsp0         = 0xfffffe0061266b80
      gs32p        = 0xffffffff8219eba0
      ldt          = 0xffffffff8219ebe0
      tss          = 0xffffffff8219ebd0
      db:0:kdb.enter.default>  bt
      Tracing pid 12 tid 100063 td 0xfffff80003649000
      __rw_rlock() at __rw_rlock+0x1c9/frame 0xfffffe00612667d0
      carp_forus() at carp_forus+0x49/frame 0xfffffe0061266800
      ether_nh_input() at ether_nh_input+0x2cc/frame 0xfffffe0061266860
      netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe00612668d0
      ixgbe_rxeof() at ixgbe_rxeof+0x618/frame 0xfffffe0061266990
      ixgbe_msix_que() at ixgbe_msix_que+0xbe/frame 0xfffffe00612669e0
      intr_event_execute_handlers() at intr_event_execute_handlers+0xab/frame 0xfffffe0061266a20
      ithread_loop() at ithread_loop+0x96/frame 0xfffffe0061266a70
      fork_exit() at fork_exit+0x9a/frame 0xfffffe0061266ab0
      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0061266ab0
      --- trap 0, rip = 0, rsp = 0xfffffe0061266b70, rbp = 0 ---
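One detail in the dump is worth spelling out: the faulting instruction is movl 0x378(%r14),%eax, and the reported fault virtual address is also 0x378. For a base+displacement operand with no index register, the CPU reads from base + displacement, so %r14 must have held 0 at that point: the code dereferenced a NULL (or zeroed) pointer at offset 0x378, not a random wild address. The C sketch below just does that subtraction; the helper names and the one-page threshold are illustrative choices, not anything from FreeBSD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: recover the base register from a fault on an
 * operand of the form disp(%reg) with no index register, where the CPU
 * accessed reg + disp. Assumes a non-negative displacement and no wrap. */
static uint64_t base_from_fault(uint64_t fault_va, uint64_t disp)
{
    assert(fault_va >= disp);
    return fault_va - disp;
}

/* Illustrative heuristic: a base inside the first 4 KiB page points to a
 * NULL (or zeroed) pointer being dereferenced. */
static int looks_like_null_deref(uint64_t fault_va, uint64_t disp)
{
    return base_from_fault(fault_va, disp) < 4096;
}
```

With the values from the panic above (fault address 0x378, displacement 0x378), base_from_fault() returns 0.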
      
      • Perforado (Rebel Alliance)

        …
        fault code = supervisor read data, page not present
        ...

        Could be a use-after-free.

        pfSense 2.2.6 is based on FreeBSD 10.1-RELEASE-p25, which has mainly security fixes from the 10 branch but is missing some other fixes.

        https://svnweb.freebsd.org/base?view=revision&revision=277625 should fix it.

        Could one of the pfSense folks check whether that's in their tree?

        • fastpilot

          @Perforado:

          Could be a use-after-free.

          pfSense 2.2.6 is based on FreeBSD 10.1-RELEASE-p25, which has mainly security fixes from the 10 branch but is missing some other fixes.

          https://svnweb.freebsd.org/base?view=revision&revision=277625 should fix it.

          Yes, that's what I thought, although that patch appears to already be in the devel branch: https://github.com/pfsense/FreeBSD-src/commit/f72184af7f1b19f99893f951a64a22f22ec344ba. I tried a 2.3 beta build last week and had the same problem. Are the beta snapshots built from the devel branch?

          • Perforado (Rebel Alliance)

            I guess it is.

            Can you SSH into the pfSense box and run "uname -a" in the shell?

            • fastpilot

              @Perforado:

              Can you SSH into the pfSense box and run "uname -a" in the shell?

              FreeBSD <redacted> 10.2-STABLE FreeBSD 10.2-STABLE #317 58b7eab(devel): Fri Jan 15 04:28:46 CST 2016     root@pfs23-amd64-builder:/usr/home/pfsense/pfsense/tmp/obj/usr/home/pfsense/pfsense/tmp/FreeBSD-src/sys/pfSense  amd64
              
              • Perforado (Rebel Alliance)

                https://github.com/pfsense/FreeBSD-src/commit/f72184af7f1b19f99893f951a64a22f22ec344ba#diff-2a75ab8f3cf1e4838de5abd9c14a1870 seems to be in there, if that's the tree the beta is built from.

                • fastpilot

                  Yeah, it looks like it is. The commit hash in uname (58b7eab) is from the devel branch.
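For anyone else checking which tree a snapshot came from: the kernel ident in the uname output above carries the build number, a short commit hash and the branch name in the form "#317 58b7eab(devel)". Here's a small C sketch that pulls those three fields out of that string with sscanf(). The format is inferred from this single example, and parse_kernel_ident() and is_devel_build() are hypothetical helpers, so other builds may not match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse "<prefix>#<build> <hash>(<branch>)..." as seen
 * in the uname output above. hash needs room for 16 bytes and branch for
 * 32. Returns 1 when all three fields were found, 0 otherwise. */
static int parse_kernel_ident(const char *version, unsigned *build,
                              char *hash, char *branch)
{
    assert(version != NULL);
    /* Skip everything up to '#', read the build number, then a hex hash,
     * then the branch name between parentheses. */
    return sscanf(version, "%*[^#]#%u %15[0123456789abcdef](%31[^)])",
                  build, hash, branch) == 3;
}

/* True when the parsed branch name is exactly "devel". */
static int is_devel_build(const char *branch)
{
    return strcmp(branch, "devel") == 0;
}
```

Feeding it the line above yields build 317, hash 58b7eab and branch devel, which is the hash to compare against the devel branch history on GitHub.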

                  As I mentioned earlier, this sounds similar to https://forum.pfsense.org/index.php?topic=55433.0, which was never actually solved, just worked around by using a different NIC. I've traced the code back from carp_forus(), which attempts to grab the lock, but going back into the ixgbe driver it gets too complicated for me, and I haven't managed to find what might be freeing the ifp pointer: https://github.com/pfsense/FreeBSD-src/blob/945ed01c4bae06169f63978e43029c04d4abd731/sys/netinet/ip_carp.c#L1126

                  • fastpilot

                    I should add that ether_input() does check that ifp isn't a NULL pointer, but maybe there's a race condition where something else clears it after that check: https://github.com/pfsense/FreeBSD-src/blob/5aba7ffcfb97d9b6f4ce464de77b02ad4d7b8ad3/sys/net/if_ethersubr.c#L628
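To make the suspected race concrete, here's a deterministic, single-threaded C model of the check-then-use pattern. Everything in it is hypothetical: fake_ifnet and carp_state are stand-ins, not the real struct ifnet or CARP structures, and the interleave callback simulates another CPU tearing CARP down between the NULL check and the use. lookup_racy() loads the pointer again after the check and sees NULL (had teardown freed the object without clearing the pointer, that second load would instead be a use-after-free). lookup_refcounted() takes one snapshot and holds a reference, so teardown can unlink the state but not free it mid-use. The model assumes the snapshot and the reference bump are atomic with respect to teardown, which in a real kernel needs a lock or an epoch/RCU-style mechanism; it illustrates the general pattern, not what ip_carp.c actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT FreeBSD's struct ifnet or CARP structures. */
struct carp_state {
    int vhid;   /* 1..255 in real CARP; 0 is used below for "none" */
    int refs;   /* references: one for the interface link, plus readers */
    int freed;  /* set instead of calling free() so tests can observe it */
};

struct fake_ifnet {
    struct carp_state *carp;
};

/* Simulates another CPU running at the worst moment for the reader. */
typedef void (*interleave_fn)(struct fake_ifnet *);

static void carp_release(struct carp_state *cs)
{
    if (--cs->refs == 0)
        cs->freed = 1;              /* real code would free memory here */
}

/* Teardown path: unlink the state from the interface, drop the link's ref. */
static void carp_detach(struct fake_ifnet *ifp)
{
    struct carp_state *cs = ifp->carp;

    if (cs == NULL)
        return;
    ifp->carp = NULL;
    carp_release(cs);
}

/* A no-op "other CPU", for the case where nothing races. */
static void no_interleave(struct fake_ifnet *ifp)
{
    (void)ifp;
}

/* Racy reader: checks the pointer, then loads it again for the use.
 * Returns the vhid, 0 if no CARP, or -1 where kernel code would go on to
 * dereference NULL. */
static int lookup_racy(struct fake_ifnet *ifp, interleave_fn other_cpu)
{
    struct carp_state *cs;

    if (ifp->carp == NULL)          /* the check */
        return 0;
    other_cpu(ifp);                 /* teardown may run here */
    cs = ifp->carp;                 /* second load, after the check */
    return cs != NULL ? cs->vhid : -1;
}

/* Safer reader: one snapshot, with a reference held across the use.
 * ASSUMPTION: the snapshot and refs++ are atomic w.r.t. teardown; a real
 * kernel needs a lock or epoch/RCU-style protection for that step. */
static int lookup_refcounted(struct fake_ifnet *ifp, interleave_fn other_cpu)
{
    struct carp_state *cs = ifp->carp;
    int vhid;

    if (cs == NULL)
        return 0;
    cs->refs++;
    other_cpu(ifp);                 /* teardown can unlink, not free */
    assert(!cs->freed);
    vhid = cs->vhid;
    carp_release(cs);               /* may be the final release */
    return vhid;
}
```

Passing carp_detach as the interleave makes lookup_racy() return -1, while lookup_refcounted() still returns the vhid and the state is only marked freed after the reader drops its reference.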

                    • Perforado (Rebel Alliance)

                      Did you try

                      hw.pci.enable_msix=0

                      btw?

                      • fastpilot

                        @Perforado:

                        Did you try

                        hw.pci.enable_msix=0

                        Yep, that worked! What's the impact of disabling MSI-X, though? It would be nice not to have to disable MSI-X, and to get to the bottom of the bug instead.

                        • Perforado (Rebel Alliance)

                          MSI-X is an extension to MSI which, AFAIK, uses a separate capability structure and offers more vectors (up to 2048 per device function instead of MSI's 32):

                          https://en.wikipedia.org/wiki/Message_Signaled_Interrupts
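On the question of impact: with MSI-X a multiqueue NIC can give each RX/TX queue its own vector (that's what the "irq267: ix0:que 2" thread in the panic is), so receive processing is spread across CPUs. With hw.pci.enable_msix=0, which applies to every PCI device in the box and not just ix, the driver has to fall back to MSI or a legacy interrupt and typically runs a single queue. The C sketch below models that decision; plan_interrupts() and its parameters are hypothetical and only approximate what drivers like ixgbe do, so the driver source is the authority on the real logic.

```c
#include <assert.h>

enum irq_mode { IRQ_MSIX, IRQ_MSI, IRQ_LEGACY };

struct irq_plan {
    enum irq_mode mode;
    int queues;   /* RX/TX queue pairs the driver will run */
    int vectors;  /* interrupt vectors requested from the PCI layer */
};

/* Hypothetical model of a multiqueue NIC's interrupt setup:
 *  - MSI-X: one vector per queue plus one for link/admin events, with
 *    queues capped by CPU count, hardware limit and vectors offered;
 *  - otherwise MSI, then legacy INTx: one shared vector, one queue. */
static struct irq_plan plan_interrupts(int msix_enabled, int msix_avail,
                                       int msi_enabled, int ncpus,
                                       int max_queues)
{
    struct irq_plan p;
    int want;

    assert(ncpus > 0 && max_queues > 0);
    want = ncpus < max_queues ? ncpus : max_queues;

    if (msix_enabled && msix_avail >= 2) {
        /* Reserve one vector for link events, the rest for queues. */
        int q = (msix_avail - 1) < want ? (msix_avail - 1) : want;
        p.mode = IRQ_MSIX;
        p.queues = q;
        p.vectors = q + 1;
    } else if (msi_enabled) {
        p.mode = IRQ_MSI;
        p.queues = 1;
        p.vectors = 1;
    } else {
        p.mode = IRQ_LEGACY;
        p.queues = 1;
        p.vectors = 1;
    }
    return p;
}
```

For example, on a hypothetical 8-CPU box with MSI-X enabled the model gives 8 queues on 9 vectors; with MSI-X disabled it collapses to 1 queue on 1 vector, so in this model a single interrupt thread handles all of that port's receive traffic.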

                          Now that it works without MSI-X, you could try a different slot for the ixgbe card (HP should have a best-practice document for that).

                          You could also try updating the server's BIOS. Maybe the MSI-X setup is somewhat borked.

                          • fastpilot

                            @Perforado:

                            Now that it works without MSI-X, you could try a different slot for the ixgbe card (HP should have a best-practice document for that).

                            You could also try updating the server's BIOS. Maybe the MSI-X setup is somewhat borked.

                            The server BIOS is up to date; it's running the latest release from HP, which came out last month. I'll see if I can try a different slot. On a somewhat related note, I had to disable x2APIC in the BIOS for the machine to boot. I'm not sure whether that's a BIOS or a FreeBSD issue.

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.