Netgate Discussion Forum

Proper procedure for adding a NIC kernel module? (qlnxe)

Hardware
20 Posts 3 Posters 1.0k Views
  • G
    GeorgePatches
    last edited by Jun 6, 2024, 3:22 PM

    I have a server with a Qlogic FastLinQ 41000. For some reason it was not detected out of the box when I initially installed CE 2.7.0. I set up our network using a LAGG of 2 Intel I350 ports instead and got things running. Fast forward a couple of months to now, on CE 2.7.2, and I'm looking into why the Qlogic card isn't being detected. I checked pciconf and it's there at the bottom as a "none" device. After some googling I found the qlnxe driver (https://man.freebsd.org/cgi/man.cgi?query=qlnxe&apropos=0&sektion=4&manpath=FreeBSD+14.0-RELEASE&arch=default&format=html). It looks like I just need a kernel module to get this thing going. I ran "kldload if_qlnxe" to see if the card would get detected, but instead it crashed with a page fault. More googling turned up this page on the forum (https://forum.netgate.com/post/1037980), which says I need to create a loader.conf.local after install, before reboot.

    My question boils down to this: did I cause the crash by using kldload after boot, or is there a greater issue with my system and the qlnxe module? Is there a specific procedure for adding a module like this?
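For reference, the boot-time route mentioned above comes down to one line in /boot/loader.conf.local; the qlnxe(4) man page gives that line as `if_qlnxe_load="YES"`. Below is a minimal sketch of adding it idempotently, so re-running the step does not duplicate the entry. The helper name `ensure_module_load_line` is hypothetical, and the sketch only transforms text; writing the result to the file is left out. If the driver panics on attach, loading it at boot may well panic at boot too, so this is worth trying only with console access.

```python
def ensure_module_load_line(conf_text: str, module: str = "if_qlnxe") -> str:
    """Return conf_text with '<module>_load="YES"' present as an active line.

    Hypothetical helper: it only edits text and never touches the filesystem.
    """
    wanted = f'{module}_load="YES"'
    lines = conf_text.splitlines()
    for line in lines:
        # Ignore anything after '#', so a commented-out entry does not count.
        active = line.split("#", 1)[0].strip()
        if active == wanted:
            return conf_text  # already present, leave the text unchanged
    lines.append(wanted)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Starting from an empty loader.conf.local
    print(ensure_module_load_line(""), end="")
```

The output would then be saved as /boot/loader.conf.local before rebooting.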

    • S
      stephenw10 Netgate Administrator
      last edited by Jun 6, 2024, 6:15 PM

      Did it fail to load the module when you ran it from the CLI?

      If you run kldstat you should see it loaded.

      What was the crash after adding the loader value? Do you have a crash report?

      Any reason you're not running 2.7.2?

      Steve

      • G
        GeorgePatches @stephenw10
        last edited by Jun 6, 2024, 6:22 PM

        @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

        Did it fail to load the module when you ran it from the CLI?

        If you run kldstat you should see it loaded.

        I think it started to load it and crashed. Locked up such that it wouldn't accept any keyboard input to check kldstat. We power cycled the machine which cleared it out and everything came back up as before.

        @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

        What was the crash after adding the loader value? Do you have a crash report?

        I have the crash report, just wanted to know if "that should have worked" before I bothered people with it.

        @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

        Any reason you're not running 2.7.2?

        We are, but we started at 2.7.0 and have upgraded a few times. Just want to include that in case the upgrade path did something weird.

        • S
          stephenw10 Netgate Administrator
          last edited by Jun 6, 2024, 6:24 PM

          Ah OK, well, I'd check the backtrace in the crash report first. It may be a known bug in the driver.

          • G
            GeorgePatches @stephenw10
            last edited by Jun 6, 2024, 6:25 PM

            @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

            Do you have a crash report?

            info.0 textdump.tar.0

            • G
              GeorgePatches @stephenw10
              last edited by Jun 6, 2024, 6:43 PM

              @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

              Ah OK, well, I'd check the backtrace in the crash report first. It may be a known bug in the driver.

              I think the problem is at the ??(). That seems like a weird function name to me.

              • S
                stephenw10 Netgate Administrator
                last edited by Jun 6, 2024, 6:48 PM

                Yup so that definitely crashed trying to attach the driver:

                db:0:kdb.enter.default>  bt
                Tracing pid 55563 tid 116689 td 0xfffffe0382ce93a0
                kdb_enter() at kdb_enter+0x32/frame 0xfffffe03c0298300
                vpanic() at vpanic+0x163/frame 0xfffffe03c0298430
                panic() at panic+0x43/frame 0xfffffe03c0298490
                trap_fatal() at trap_fatal+0x40c/frame 0xfffffe03c02984f0
                trap_pfault() at trap_pfault+0x4f/frame 0xfffffe03c0298550
                calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
                --- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
                ??() at 0/frame 0xfffffe03c0298650
                dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
                rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
                if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
                ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
                qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
                qlnx_pci_attach() at qlnx_pci_attach+0x7d9/frame 0xfffffe03c0298900
                device_attach() at device_attach+0x3be/frame 0xfffffe03c0298950
                device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe03c0298980
                pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe03c02989c0
                devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe03c0298a00
                devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe03c0298a40
                module_register_init() at module_register_init+0x85/frame 0xfffffe03c0298a70
                linker_load_module() at linker_load_module+0xbd5/frame 0xfffffe03c0298d70
                kern_kldload() at kern_kldload+0x16a/frame 0xfffffe03c0298dd0
                sys_kldload() at sys_kldload+0x5c/frame 0xfffffe03c0298e00
                amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe03c0298f30
                fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03c0298f30
                --- syscall (304, FreeBSD ELF64, kldload), rip = 0x183cac2d58aa, rsp = 0x183caa53f3e8, rbp = 0x183caa53f960 ---
                
                ql0: <Qlogic 10GbE/25GbE/40GbE PCI CNA (AH) Adapter-Ethernet Function v2.0.112> mem 0xfb820000-0xfb83ffff,0xfb000000-0xfb7fffff,0xfb850000-0xfb85ffff at device 0.0 numa-domain 1 on pci10
                ql0: qlnx_set_personality: ETH_IWARP
                ql0: setting parameters required by iWARP dev
                
                
                Fatal trap 12: page fault while in kernel mode
                cpuid = 23; apic id = 34
                fault virtual address	= 0x0
                fault code		= supervisor read instruction, page not present
                instruction pointer	= 0x20:0x0
                stack pointer	        = 0x0:0xfffffe03c0298628
                frame pointer	        = 0x0:0xfffffe03c0298650
                code segment		= base 0x0, limit 0xfffff, type 0x1b
                			= DPL 0, pres 1, long 1, def32 0, gran 1
                processor eflags	= interrupt enabled, resume, IOPL = 0
                current process		= 55563 (kldload)
                rdi: fffff815bd17b800 rsi: fffffe03c02986a0 rdx: 00000000c0306938
                rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000010
                rax: 0000000000000000 rbx: fffffe03c02986a0 rbp: fffffe03c0298650
                r10: 0000000000000000 r11: fffffe00e6ce8000 r12: 0000000000008802
                r13: fffff81081a15810 r14: fffffe03b48fcf90 r15: 0000000000000016
                trap number		= 12
                panic: page fault
                cpuid = 23
                time = 1717620079
                KDB: enter: panic
                

                That doesn't appear to be a known bug: https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=qlnxe
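One way to read the trace above: trap 0xc is trap 12 (page fault), and `rip = 0` together with "supervisor read instruction" at fault address 0x0 suggests the CPU tried to fetch an instruction from address 0, which would fit a call through a NULL function pointer made from dump_iface(). The `??()` frame is just the debugger failing to name address 0. Below is a rough sketch of pulling that information out of a textdump backtrace. The function name `summarize_backtrace` and the regular expressions are my own assumptions about the line format, based only on the lines posted here.

```python
import re

# Patterns inferred from the backtrace lines posted in this thread (an assumption,
# not an official description of the DDB output format).
FRAME_RE = re.compile(r"^\s*(?P<func>\S+)\(\) at (?P<loc>\S+?)/frame (?P<frame>0x[0-9a-f]+)\s*$")
TRAP_RE = re.compile(r"^\s*--- trap (?P<num>0x[0-9a-f]+), rip = (?P<rip>\S+),")

# Excerpt copied from the trace above.
SAMPLE = """\
calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
--- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
??() at 0/frame 0xfffffe03c0298650
dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
qlnx_pci_attach() at qlnx_pci_attach+0x7d9/frame 0xfffffe03c0298900
"""


def summarize_backtrace(text):
    """Return trap number, faulting rip, and the frames below the trap marker."""
    frames = []
    trap = None
    trap_index = None
    for line in text.splitlines():
        m = TRAP_RE.match(line)
        if m:
            trap = {"num": int(m["num"], 16), "rip": int(m["rip"], 0)}
            trap_index = len(frames)  # frames after this line led up to the fault
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append(m["func"])
    if trap is None or len(frames) - trap_index < 2:
        return None
    faulting = frames[trap_index:]
    return {
        "trap": trap["num"],
        "rip": trap["rip"],
        "faulting_frame": faulting[0],
        "caller": faulting[1],
        # Outermost caller first, ending at the frame that faulted.
        "call_path": list(reversed(faulting)),
    }


if __name__ == "__main__":
    info = summarize_backtrace(SAMPLE)
    print(" -> ".join(info["call_path"]))
```

On the excerpt, the call path runs from qlnx_pci_attach() down through ether_ifattach() and rtnl_handle_ifevent() into dump_iface(), which is why the crash looks tied to interface attach rather than to kldload itself.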

                • G
                  GeorgePatches @stephenw10
                  last edited by Jun 6, 2024, 7:00 PM

                  @stephenw10 🙃 I swear I'm an edge case magnet.

                  • S
                    stephenw10 Netgate Administrator
                    last edited by Jun 6, 2024, 7:02 PM

                    What is that NIC exactly?

                    • G
                      GeorgePatches @stephenw10
                      last edited by Jun 6, 2024, 7:26 PM

                      @stephenw10 Exactly? I'm not sure. It's a Qlogic FastLinQ 41000 series 2-port SFP card: a QL41132HLCU, QL41212HLCU, or QL41262HLCU going by the Qlogic datasheet. I'm betting on the QL41132HLCU, as we wanted 10G cards and the other 2 models are 10G/25G cards. I'll need to dig in the firmware or the purchase orders to figure it out exactly. I will get back to you.

                      Sounds like this is a FreeBSD issue and nothing weird I did at least. Any idea why this wasn't detected on the initial install?

                      • G
                        GeorgePatches @stephenw10
                        last edited by Jun 6, 2024, 7:30 PM

                        @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

                        What is that NIC exactly?

                        My speculation was correct: it is a Qlogic FastLinQ QL41132HLCU exactly.

                        • K
                          kprovost @stephenw10
                          last edited by Jun 6, 2024, 7:58 PM

                          @stephenw10 I've not done any detailed digging, but there's been at least one bug fix in dump_iface() not too long ago to fix similar crashes:

                          commit 7d48224073ce14f0dd3db2d4e96876ac928b52f2
                          Author: Bjoern A. Zeeb <bz@FreeBSD.org>
                          Date:   Sat Sep 30 15:11:57 2023 +0000
                          
                              netlink: fix accessing freed memory
                          
                              The check for if_addrlen in dump_iface() is not sufficient to determine
                              if we still have a valid if_addr.  Rather than directly accessing if_addr
                              check the STAILQ (for the first entry).
                              This avoids panics when destroying cloned interfaces as experienced with
                              net80211 wlan ones.
                          
                              Sponsored by:   The FreeBSD Foundation
                              MFC after:      3 days
                              Reviewed by:    jhibbits (earlier version), kp
                              Differential Revision: https://reviews.freebsd.org/D42027
                          

                          It's certainly worth testing a 2.8 snapshot before we dig deeper.

                          • G
                            GeorgePatches @kprovost
                            last edited by Jun 6, 2024, 8:08 PM

                            @kprovost said in Proper procedure for adding a NIC kernel module? (qlnxe):

                            It's certainly worth testing a 2.8 snapshot before we dig deeper.

                            Would that fix be in the latest PF+? This is a production machine with lots of work happening, but I'm poking my management chain about paying for support.

                            • K
                              kprovost @GeorgePatches
                              last edited by Jun 6, 2024, 8:25 PM

                              @GeorgePatches That particular patch is in 24.03, yes.

                              • S
                                stephenw10 Netgate Administrator
                                last edited by stephenw10 Jun 6, 2024, 9:38 PM Jun 6, 2024, 9:38 PM

                                Hmm, I wonder if we can do something to avoid that bug as a test. 🤔

                                • G
                                  GeorgePatches @stephenw10
                                  last edited by Jun 7, 2024, 7:56 PM

                                  @stephenw10 Hmmmmm, one thought is that it blew up in the dummynet code. I can try ripping the limiters out and see if it still blows up.

                                  • G
                                    GeorgePatches @GeorgePatches
                                    last edited by Jun 7, 2024, 8:40 PM

                                    This thought was wrong: it blew up exactly the same with the limiters removed and the dummynet modules not loaded. 🤣

                                    • S
                                      stephenw10 Netgate Administrator
                                      last edited by Jun 7, 2024, 8:52 PM

                                      Well one thing ruled out I guess!

                                      • G
                                        GeorgePatches
                                        last edited by GeorgePatches Jun 10, 2024, 4:00 PM Jun 10, 2024, 3:52 PM

                                        There's no easy way to try a 2.8 snap and then roll back to 2.7.2, right? You can do that with PF+, if I understand the bootloader thing correctly?

                                        I ask because management has approved our initial request for a support contract. We're currently waiting on a quote and then actual approval and purchasing. I'm ok putting a pin in this until it's easier to test a snap and roll back. This card is a nice to have, we're currently "doing fine" with our LAGG'd gigabit links.

                                        • S
                                          stephenw10 Netgate Administrator
                                          last edited by Jun 10, 2024, 4:53 PM

                                          You can manually create ZFS snapshots at the CLI in CE, assuming you are running ZFS. However there are no public 2.8-dev snapshots yet.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.