Netgate Discussion Forum

    Proper procedure for adding a NIC kernel module? (qlnxe)

    Hardware
    20 Posts 3 Posters 1.5k Views 3 Watching
      GeorgePatches @stephenw10

      @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

      Did it fail to load the module when you ran it from the CLI?

      If you run kldstat you should see it loaded.

      I think it started to load and then crashed. It locked up so hard that it wouldn't accept any keyboard input, so I couldn't check kldstat. We power cycled the machine, which cleared it, and everything came back up as before.

      @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

      What was the crash after adding the loader value? Do you have a crash report?

      I have the crash report; I just wanted to know whether "that should have worked" before I bothered people with it.

      @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

      Any reason you're not running 2.7.2?

      We are, but we started at 2.7.0 and have upgraded a few times. Just want to include that in case the upgrade path did something weird.

        stephenw10 Netgate Administrator

        Ah OK, well I'd check the backtrace in the crash report first. It may be a known bug in the driver.

          GeorgePatches @stephenw10

          @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

          Do you have a crash report?

          info.0 textdump.tar.0

            GeorgePatches @stephenw10

            @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

            Ah OK, well I'd check the backtrace in the crash report first. It may be a known bug in the driver.

            I think the problem is at the ??() frame. That seems like a weird function name to me.

              stephenw10 Netgate Administrator

              Yup so that definitely crashed trying to attach the driver:

              db:0:kdb.enter.default>  bt
              Tracing pid 55563 tid 116689 td 0xfffffe0382ce93a0
              kdb_enter() at kdb_enter+0x32/frame 0xfffffe03c0298300
              vpanic() at vpanic+0x163/frame 0xfffffe03c0298430
              panic() at panic+0x43/frame 0xfffffe03c0298490
              trap_fatal() at trap_fatal+0x40c/frame 0xfffffe03c02984f0
              trap_pfault() at trap_pfault+0x4f/frame 0xfffffe03c0298550
              calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
              --- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
              ??() at 0/frame 0xfffffe03c0298650
              dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
              rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
              if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
              ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
              qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
              qlnx_pci_attach() at qlnx_pci_attach+0x7d9/frame 0xfffffe03c0298900
              device_attach() at device_attach+0x3be/frame 0xfffffe03c0298950
              device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe03c0298980
              pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe03c02989c0
              devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe03c0298a00
              devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe03c0298a40
              module_register_init() at module_register_init+0x85/frame 0xfffffe03c0298a70
              linker_load_module() at linker_load_module+0xbd5/frame 0xfffffe03c0298d70
              kern_kldload() at kern_kldload+0x16a/frame 0xfffffe03c0298dd0
              sys_kldload() at sys_kldload+0x5c/frame 0xfffffe03c0298e00
              amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe03c0298f30
              fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03c0298f30
              --- syscall (304, FreeBSD ELF64, kldload), rip = 0x183cac2d58aa, rsp = 0x183caa53f3e8, rbp = 0x183caa53f960 ---
              
              ql0: <Qlogic 10GbE/25GbE/40GbE PCI CNA (AH) Adapter-Ethernet Function v2.0.112> mem 0xfb820000-0xfb83ffff,0xfb000000-0xfb7fffff,0xfb850000-0xfb85ffff at device 0.0 numa-domain 1 on pci10
              ql0: qlnx_set_personality: ETH_IWARP
              ql0: setting parameters required by iWARP dev
              
              
              Fatal trap 12: page fault while in kernel mode
              cpuid = 23; apic id = 34
              fault virtual address	= 0x0
              fault code		= supervisor read instruction, page not present
              instruction pointer	= 0x20:0x0
              stack pointer	        = 0x0:0xfffffe03c0298628
              frame pointer	        = 0x0:0xfffffe03c0298650
              code segment		= base 0x0, limit 0xfffff, type 0x1b
              			= DPL 0, pres 1, long 1, def32 0, gran 1
              processor eflags	= interrupt enabled, resume, IOPL = 0
              current process		= 55563 (kldload)
              rdi: fffff815bd17b800 rsi: fffffe03c02986a0 rdx: 00000000c0306938
              rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000010
              rax: 0000000000000000 rbx: fffffe03c02986a0 rbp: fffffe03c0298650
              r10: 0000000000000000 r11: fffffe00e6ce8000 r12: 0000000000008802
              r13: fffff81081a15810 r14: fffffe03b48fcf90 r15: 0000000000000016
              trap number		= 12
              panic: page fault
              cpuid = 23
              time = 1717620079
              KDB: enter: panic
              

              That doesn't appear to be a known bug: https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=qlnxe
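One way to read the trace above: the trap line reports rip = 0 and the fault is an instruction read at virtual address 0x0, which is what a call through a NULL function pointer typically looks like. On that reading, ??() is not a real function, and the frame directly below it (dump_iface) is the likely call site. The following is a minimal Python sketch, not part of the original post, that pulls that caller out of a DDB backtrace. The names parse_frames and null_call_site are hypothetical helpers, and SAMPLE is a subset of the frames quoted above.

```python
import re

# Matches DDB frame lines such as
# "dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700".
FRAME_RE = re.compile(r"^\s*(\S+)\(\) at (\S+)/frame (0x[0-9a-f]+)\s*$")
# Matches the trap separator line and captures the faulting instruction pointer.
TRAP_RE = re.compile(r"^\s*--- trap (0x[0-9a-f]+), rip = (\S+),")


def parse_frames(backtrace):
    """Return (frames, trap_rip, trap_index) for a DDB 'bt' listing.

    frames is a list of (function, location) tuples in the order printed.
    trap_index is the index of the first frame printed after the trap line.
    """
    frames = []
    trap_rip = None
    trap_index = None
    for line in backtrace.splitlines():
        trap = TRAP_RE.match(line)
        if trap:
            trap_rip = int(trap.group(2), 0)  # accepts "0" as well as "0x..."
            trap_index = len(frames)
            continue
        frame = FRAME_RE.match(line)
        if frame:
            frames.append((frame.group(1), frame.group(2)))
    return frames, trap_rip, trap_index


def null_call_site(backtrace):
    """If the trap happened at rip == 0, return the frame that made the call."""
    frames, trap_rip, trap_index = parse_frames(backtrace)
    if trap_rip != 0 or trap_index is None:
        return None
    # frames[trap_index] is the bogus "??" frame; the caller comes right after it.
    caller_index = trap_index + 1
    if caller_index >= len(frames):
        return None
    return frames[caller_index]


SAMPLE = """\
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe03c0298550
calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
--- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
??() at 0/frame 0xfffffe03c0298650
dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
"""

if __name__ == "__main__":
    print(null_call_site(SAMPLE))
```

For the sample, the helper reports dump_iface+0x145 as the call site, reached from qlnx_init_ifnet via ether_ifattach. This only narrows down where to look; which pointer was NULL cannot be determined from the backtrace alone.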

                GeorgePatches @stephenw10

                @stephenw10 🙃 I swear I'm an edge case magnet.

                  stephenw10 Netgate Administrator

                  What is that NIC exactly?

                    GeorgePatches @stephenw10

                    @stephenw10 Exactly? I'm not sure. It's a Qlogic FastLinQ 41000 series 2-port SFP card: a QL41132HLCU, QL41212HLCU, or QL41262HLCU going by the Qlogic datasheet. I'm betting on the QL41132HLCU, as we wanted 10G cards and the other two models are 10G/25G cards. I'll need to dig into the firmware or the purchase orders to figure it out exactly. I will get back to you.

                    Sounds like this is a FreeBSD issue and not something weird I did, at least. Any idea why this card wasn't detected on the initial install?

                      GeorgePatches @stephenw10

                      @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

                      What is that NIC exactly?

                      My speculation was correct: it is exactly a Qlogic FastLinQ QL41132HLCU.

                        kprovost @stephenw10

                        @stephenw10 I've not done any detailed digging, but there's been at least one bug fix in dump_iface() not too long ago to fix similar crashes:

                        commit 7d48224073ce14f0dd3db2d4e96876ac928b52f2
                        Author: Bjoern A. Zeeb <bz@FreeBSD.org>
                        Date:   Sat Sep 30 15:11:57 2023 +0000
                        
                            netlink: fix accessing freed memory
                        
                            The check for if_addrlen in dump_iface() is not sufficient to determine
                            if we still have a valid if_addr.  Rather than directly accessing if_addr
                            check the STAILQ (for the first entry).
                            This avoids panics when destroying cloned interfaces as experienced with
                            net80211 wlan ones.
                        
                            Sponsored by:   The FreeBSD Foundation
                            MFC after:      3 days
                            Reviewed by:    jhibbits (earlier version), kp
                            Differential Revision: https://reviews.freebsd.org/D42027
                        

                        It's certainly worth testing a 2.8 snapshot before we dig deeper.

                          GeorgePatches @kprovost

                          @kprovost said in Proper procedure for adding a NIC kernel module? (qlnxe):

                          It's certainly worth testing a 2.8 snapshot before we dig deeper.

                          Would that fix be in the latest pfSense Plus (PF+)? This is a production machine with lots of work happening, but I'm poking my management chain about paying for support.

                            kprovost @GeorgePatches

                            @GeorgePatches That particular patch is in 24.03, yes.

                              stephenw10 Netgate Administrator

                              Hmm, I wonder if we can do something to avoid that bug as a test. 🤔

                                GeorgePatches @stephenw10

                                @stephenw10 Hmmmmm, one thought is that it blew up in the dummynet code. I can try ripping the limiters out and see if it still blows up.

                                  GeorgePatches @GeorgePatches

                                  This thought was wrong: it blew up exactly the same way with the limiters removed and the dummynet modules not loaded. 🤣

                                    stephenw10 Netgate Administrator

                                    Well one thing ruled out I guess!

                                      GeorgePatches

                                      There's no easy way to try a 2.8 snapshot and then roll back to 2.7.2, right? You can do that with PF+, if I understand the bootloader thing correctly?

                                      I ask because management has approved our initial request for a support contract. We're currently waiting on a quote and then actual approval and purchasing. I'm ok putting a pin in this until it's easier to test a snap and roll back. This card is a nice to have, we're currently "doing fine" with our LAGG'd gigabit links.

                                        stephenw10 Netgate Administrator

                                        You can manually create ZFS snapshots at the CLI in CE, assuming you are running ZFS. However, there are no public 2.8-dev snapshots yet.

                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.