Netgate Discussion Forum

    Proper procedure for adding a NIC kernel module? (qlnxe)

      GeorgePatches

      I have a server with a Qlogic FastLinQ 41000. For some reason it was not detected out of the box when I initially installed CE 2.7.0, so I set up our network using a LAGG of 2 Intel I350 instead and got things running. Fast forward a couple of months to now, on CE 2.7.2, and I'm looking into why the Qlogic card isn't being detected. I checked pciconf and it's there at the bottom as a "none" device. Some googling turned up the qlnxe driver (https://man.freebsd.org/cgi/man.cgi?query=qlnxe&apropos=0&sektion=4&manpath=FreeBSD+14.0-RELEASE&arch=default&format=html), so it looked like I just needed a kernel module to get this thing going. I ran "kldload if_qlnxe" to see if the card would get detected, but instead it crashed with a page fault. More googling found this page on the forum (https://forum.netgate.com/post/1037980), which says I need to create a loader.conf.local after install, before reboot.

      My question boils down to this: did I cause the crash by using kldload after boot, or is there a greater issue with my system and the qlnxe module? Is there a specific procedure for adding a module like this?
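The pciconf check mentioned in this post can be sketched in a few lines. This is a minimal sketch, assuming the `name@selector: key=value ...` line layout that `pciconf -l` prints on recent FreeBSD releases; the sample text, the selectors, and the helper name `unattached_devices` are illustrative and not taken from the poster's machine.

```python
import re

# Illustrative `pciconf -l` style output. The selectors and IDs are sample
# values for this sketch only, not output from the machine in this thread.
SAMPLE = """\
igb0@pci0:3:0:0:\tclass=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
igb1@pci0:3:0:1:\tclass=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0001
none0@pci0:10:0:0:\tclass=0x020000 rev=0x02 hdr=0x00 vendor=0x1077 device=0x8070 subvendor=0x1077 subdevice=0x0002
"""

# A device with no driver attached is listed under the placeholder name "noneN".
NONE_RE = re.compile(r"^none\d+@(\S+?):\s+(.*)$")


def unattached_devices(text):
    """Return (selector, class, vendor, device) for every driverless device."""
    found = []
    for line in text.splitlines():
        match = NONE_RE.match(line)
        if not match:
            continue
        # The remainder of the line is a whitespace-separated key=value list.
        fields = dict(kv.split("=", 1) for kv in match.group(2).split())
        found.append(
            (match.group(1), fields.get("class"), fields.get("vendor"), fields.get("device"))
        )
    return found


for entry in unattached_devices(SAMPLE):
    print(entry)
```

On a real system the text would come from running `pciconf -l` rather than from the sample string. A class of `0x020000` marks an Ethernet controller, which is a quick way to tell a driverless NIC apart from other unclaimed devices.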

        stephenw10 Netgate Administrator

        Did it fail to load the module when you ran it from the CLI?

        If you run kldstat you should see it loaded.

        What was the crash after adding the loader value? Do you have a crash report?

        Any reason you're not running 2.7.2?

        Steve

          GeorgePatches @stephenw10

          @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

          Did it fail to load the module when you ran it from the CLI?

          If you run kldstat you should see it loaded.

          I think it started to load it and crashed. It locked up such that it wouldn't accept any keyboard input to check kldstat. We power cycled the machine, which cleared it out, and everything came back up as before.

          @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

          What was the crash after adding the loader value? Do you have a crash report?

          I have the crash report, just wanted to know if "that should have worked" before I bothered people with it.

          @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

          Any reason you're not running 2.7.2?

          We are, but we started at 2.7.0 and have upgraded a few times. Just want to include that in case the upgrade path did something weird.

            stephenw10 Netgate Administrator

            Ah OK, well, I'd check the backtrace in the crash report first. It may be a known bug in the driver.

              GeorgePatches @stephenw10

              @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

              Do you have a crash report?

              info.0 textdump.tar.0

                GeorgePatches @stephenw10

                @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

                Ah OK, well, I'd check the backtrace in the crash report first. It may be a known bug in the driver.

                I think the problem is at the ??(). That seems like a weird function name to me.

                  stephenw10 Netgate Administrator

                  Yup so that definitely crashed trying to attach the driver:

                  db:0:kdb.enter.default>  bt
                  Tracing pid 55563 tid 116689 td 0xfffffe0382ce93a0
                  kdb_enter() at kdb_enter+0x32/frame 0xfffffe03c0298300
                  vpanic() at vpanic+0x163/frame 0xfffffe03c0298430
                  panic() at panic+0x43/frame 0xfffffe03c0298490
                  trap_fatal() at trap_fatal+0x40c/frame 0xfffffe03c02984f0
                  trap_pfault() at trap_pfault+0x4f/frame 0xfffffe03c0298550
                  calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
                  --- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
                  ??() at 0/frame 0xfffffe03c0298650
                  dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
                  rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
                  if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
                  ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
                  qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
                  qlnx_pci_attach() at qlnx_pci_attach+0x7d9/frame 0xfffffe03c0298900
                  device_attach() at device_attach+0x3be/frame 0xfffffe03c0298950
                  device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe03c0298980
                  pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe03c02989c0
                  devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe03c0298a00
                  devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe03c0298a40
                  module_register_init() at module_register_init+0x85/frame 0xfffffe03c0298a70
                  linker_load_module() at linker_load_module+0xbd5/frame 0xfffffe03c0298d70
                  kern_kldload() at kern_kldload+0x16a/frame 0xfffffe03c0298dd0
                  sys_kldload() at sys_kldload+0x5c/frame 0xfffffe03c0298e00
                  amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe03c0298f30
                  fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03c0298f30
                  --- syscall (304, FreeBSD ELF64, kldload), rip = 0x183cac2d58aa, rsp = 0x183caa53f3e8, rbp = 0x183caa53f960 ---
                  
                  ql0: <Qlogic 10GbE/25GbE/40GbE PCI CNA (AH) Adapter-Ethernet Function v2.0.112> mem 0xfb820000-0xfb83ffff,0xfb000000-0xfb7fffff,0xfb850000-0xfb85ffff at device 0.0 numa-domain 1 on pci10
                  ql0: qlnx_set_personality: ETH_IWARP
                  ql0: setting parameters required by iWARP dev
                  
                  
                  Fatal trap 12: page fault while in kernel mode
                  cpuid = 23; apic id = 34
                  fault virtual address	= 0x0
                  fault code		= supervisor read instruction, page not present
                  instruction pointer	= 0x20:0x0
                  stack pointer	        = 0x0:0xfffffe03c0298628
                  frame pointer	        = 0x0:0xfffffe03c0298650
                  code segment		= base 0x0, limit 0xfffff, type 0x1b
                  			= DPL 0, pres 1, long 1, def32 0, gran 1
                  processor eflags	= interrupt enabled, resume, IOPL = 0
                  current process		= 55563 (kldload)
                  rdi: fffff815bd17b800 rsi: fffffe03c02986a0 rdx: 00000000c0306938
                  rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000010
                  rax: 0000000000000000 rbx: fffffe03c02986a0 rbp: fffffe03c0298650
                  r10: 0000000000000000 r11: fffffe00e6ce8000 r12: 0000000000008802
                  r13: fffff81081a15810 r14: fffffe03b48fcf90 r15: 0000000000000016
                  trap number		= 12
                  panic: page fault
                  cpuid = 23
                  time = 1717620079
                  KDB: enter: panic
                  

                  That doesn't appear to be a known bug: https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=qlnxe
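For reading a trace like the one above: fault address 0x0, instruction pointer 0x20:0x0, and the fault code "supervisor read instruction" together say the CPU tried to fetch an instruction from address 0. That is typically what a call through a NULL function pointer looks like, which is why the top frame shows as `??()` rather than a named function, and the first named frame beneath it is the caller. A minimal sketch that pulls this out of a DDB backtrace follows; the excerpt is copied from the trace above, while the helper name `summarize` and the regular expressions are assumptions of this sketch.

```python
import re

# Excerpt of the DDB backtrace posted above.
BACKTRACE = """\
calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
--- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
??() at 0/frame 0xfffffe03c0298650
dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
"""

TRAP_RE = re.compile(r"^--- trap (0x[0-9a-f]+), rip = (0x[0-9a-f]+|\d+),")
FRAME_RE = re.compile(r"^(\S+)\(\) at (\S+?)/frame ")


def summarize(backtrace):
    """Find the trap marker and the first named frame beneath it."""
    lines = backtrace.splitlines()
    for index, line in enumerate(lines):
        trap = TRAP_RE.match(line)
        if not trap:
            continue
        rip = int(trap.group(2), 0)
        # Frames printed after the trap marker are the code that was running.
        matches = (FRAME_RE.match(entry) for entry in lines[index + 1:])
        frames = [(m.group(1), m.group(2)) for m in matches if m]
        # "??" means the debugger could not resolve the address to a symbol.
        caller = next((frame for frame in frames if frame[0] != "??"), None)
        return {
            "trap": int(trap.group(1), 16),
            "rip": rip,
            "null_call": rip == 0,
            "caller": caller,
        }
    return None


print(summarize(BACKTRACE))
```

For this trace the caller comes out as dump_iface(), reached from rtnl_handle_ifevent() during if_attach_internal(), so the fault is in the netlink interface-event path that runs while the driver attaches, rather than in qlnxe code directly.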

                    GeorgePatches @stephenw10

                    @stephenw10 🙃 I swear I'm an edge case magnet.

                      stephenw10 Netgate Administrator

                      What is that NIC exactly?

                        GeorgePatches @stephenw10

                        @stephenw10 Exactly? I'm not sure. It's a Qlogic FastLinQ 41000 series 2 port SFP card: a QL41132HLCU, QL41212HLCU, or QL41262HLCU going by the Qlogic datasheet. I'm betting on the QL41132HLCU, as we wanted 10G cards and the other 2 models are 10G/25G cards. I'll need to dig in the firmware or the purchase orders to figure it out exactly. I will get back to you.

                        Sounds like this is a FreeBSD issue and not something weird I did, at least. Any idea why this wasn't detected on the initial install?

                          GeorgePatches @stephenw10

                          @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

                          What is that NIC exactly?

                          My speculation was correct: it is a Qlogic FastLinQ QL41132HLCU exactly.

                            kprovost @stephenw10

                            @stephenw10 I've not done any detailed digging, but there was at least one bug fix in dump_iface() not too long ago that fixed similar crashes:

                            commit 7d48224073ce14f0dd3db2d4e96876ac928b52f2
                            Author: Bjoern A. Zeeb <bz@FreeBSD.org>
                            Date:   Sat Sep 30 15:11:57 2023 +0000
                            
                                netlink: fix accessing freed memory
                            
                                The check for if_addrlen in dump_iface() is not sufficient to determine
                                if we still have a valid if_addr.  Rather than directly accessing if_addr
                                check the STAILQ (for the first entry).
                                This avoids panics when destroying cloned interfaces as experienced with
                                net80211 wlan ones.
                            
                                Sponsored by:   The FreeBSD Foundation
                                MFC after:      3 days
                                Reviewed by:    jhibbits (earlier version), kp
                                Differential Revision: https://reviews.freebsd.org/D42027
                            

                            It's certainly worth testing a 2.8 snapshot before we dig deeper.

                              GeorgePatches @kprovost

                              @kprovost said in Proper procedure for adding a NIC kernel module? (qlnxe):

                              It's certainly worth testing a 2.8 snapshot before we dig deeper.

                              Would that fix be in the latest PF+? This is a production machine with lots of work happening, but I'm poking my management chain about paying for support.

                                kprovost @GeorgePatches

                                @GeorgePatches That particular patch is in 24.03, yes.

                                  stephenw10 Netgate Administrator

                                  Hmm, I wonder if we can do something to avoid that bug as a test. 🤔

                                    GeorgePatches @stephenw10

                                    @stephenw10 Hmmmmm, one thought is that it blew up in the dummynet code. I can try ripping the limiters out and see if it still blows up.

                                      GeorgePatches @GeorgePatches

                                      This thought was wrong: it blew up exactly the same with the limiters removed and the dummynet modules not loaded. 🤣

                                        stephenw10 Netgate Administrator

                                        Well, one thing ruled out I guess!

                                          GeorgePatches

                                          There's no easy way to try a 2.8 snap and then roll back to 2.7.2, right? You can do that with PF+, if I understand the bootloader thing correctly?

                                          I ask because management has approved our initial request for a support contract. We're currently waiting on a quote, and then actual approval and purchasing. I'm OK putting a pin in this until it's easier to test a snap and roll back. This card is a nice-to-have; we're currently "doing fine" with our LAGG'd gigabit links.

                                            stephenw10 Netgate Administrator

                                            You can manually create ZFS snapshots at the CLI in CE, assuming you are running ZFS. However, there are no public 2.8-dev snapshots yet.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.