Netgate Discussion Forum

    Proper procedure for adding a NIC kernel module? (qlnxe)

      stephenw10 Netgate Administrator

      Ah OK, well I'd check the backtrace in the crash report first. It may be a known bug in the driver.

        GeorgePatches @stephenw10

        @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

        Do you have a crash report?

        info.0 textdump.tar.0

          GeorgePatches @stephenw10

          @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

          Ah OK, well I'd check the backtrace in the crash report first. It may be a known bug in the driver.

          I think the problem is at the ??(). That seems like a weird function name to me.

            stephenw10 Netgate Administrator

            Yup so that definitely crashed trying to attach the driver:

            db:0:kdb.enter.default>  bt
            Tracing pid 55563 tid 116689 td 0xfffffe0382ce93a0
            kdb_enter() at kdb_enter+0x32/frame 0xfffffe03c0298300
            vpanic() at vpanic+0x163/frame 0xfffffe03c0298430
            panic() at panic+0x43/frame 0xfffffe03c0298490
            trap_fatal() at trap_fatal+0x40c/frame 0xfffffe03c02984f0
            trap_pfault() at trap_pfault+0x4f/frame 0xfffffe03c0298550
            calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
            --- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
            ??() at 0/frame 0xfffffe03c0298650
            dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
            rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
            if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
            ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
            qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
            qlnx_pci_attach() at qlnx_pci_attach+0x7d9/frame 0xfffffe03c0298900
            device_attach() at device_attach+0x3be/frame 0xfffffe03c0298950
            device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe03c0298980
            pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe03c02989c0
            devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe03c0298a00
            devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe03c0298a40
            module_register_init() at module_register_init+0x85/frame 0xfffffe03c0298a70
            linker_load_module() at linker_load_module+0xbd5/frame 0xfffffe03c0298d70
            kern_kldload() at kern_kldload+0x16a/frame 0xfffffe03c0298dd0
            sys_kldload() at sys_kldload+0x5c/frame 0xfffffe03c0298e00
            amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe03c0298f30
            fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03c0298f30
            --- syscall (304, FreeBSD ELF64, kldload), rip = 0x183cac2d58aa, rsp = 0x183caa53f3e8, rbp = 0x183caa53f960 ---
            
            ql0: <Qlogic 10GbE/25GbE/40GbE PCI CNA (AH) Adapter-Ethernet Function v2.0.112> mem 0xfb820000-0xfb83ffff,0xfb000000-0xfb7fffff,0xfb850000-0xfb85ffff at device 0.0 numa-domain 1 on pci10
            ql0: qlnx_set_personality: ETH_IWARP
            ql0: setting parameters required by iWARP dev
            
            
            Fatal trap 12: page fault while in kernel mode
            cpuid = 23; apic id = 34
            fault virtual address	= 0x0
            fault code		= supervisor read instruction, page not present
            instruction pointer	= 0x20:0x0
            stack pointer	        = 0x0:0xfffffe03c0298628
            frame pointer	        = 0x0:0xfffffe03c0298650
            code segment		= base 0x0, limit 0xfffff, type 0x1b
            			= DPL 0, pres 1, long 1, def32 0, gran 1
            processor eflags	= interrupt enabled, resume, IOPL = 0
            current process		= 55563 (kldload)
            rdi: fffff815bd17b800 rsi: fffffe03c02986a0 rdx: 00000000c0306938
            rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000010
            rax: 0000000000000000 rbx: fffffe03c02986a0 rbp: fffffe03c0298650
            r10: 0000000000000000 r11: fffffe00e6ce8000 r12: 0000000000008802
            r13: fffff81081a15810 r14: fffffe03b48fcf90 r15: 0000000000000016
            trap number		= 12
            panic: page fault
            cpuid = 23
            time = 1717620079
            KDB: enter: panic
            

            That doesn't appear to be a known bug: https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=qlnxe
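
On the `??()` frame: the trap line reports `rip = 0` and the fault code is a supervisor instruction read at virtual address 0x0, which most likely means the kernel jumped through a NULL function pointer. In that case the frame printed right after `??()` is the caller, here `dump_iface()`. The following is a minimal Python sketch of that reading, not a definitive tool; the function name `null_call_site` and the regular expressions are assumptions made for illustration and only cover the DDB `bt` layout shown above.

```python
import re

# Matches DDB frames such as "dump_iface() at dump_iface+0x145/frame 0x..."
FRAME_RE = re.compile(r"^\s*(?P<func>\S+)\(\) at (?P<loc>\S+?)/frame 0x[0-9a-f]+\s*$")
# Matches the trap separator, e.g. "--- trap 0xc, rip = 0, rsp = ..."
TRAP_RE = re.compile(r"^\s*--- trap (?P<num>0x[0-9a-f]+), rip = (?P<rip>\S+),")

# Excerpt copied from the backtrace in this thread.
SAMPLE = """\
calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
--- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
??() at 0/frame 0xfffffe03c0298650
dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
"""


def null_call_site(backtrace):
    """Return (function, location) of the frame that jumped to address 0.

    Returns None when the backtrace has no trap with rip == 0 followed by
    an unresolved '??' frame.
    """
    lines = backtrace.splitlines()
    for i, line in enumerate(lines):
        trap = TRAP_RE.match(line)
        if not trap or int(trap.group("rip"), 0) != 0:
            continue
        # Frames listed after the trap line: faulting frame first, then callers.
        frames = [m for m in map(FRAME_RE.match, lines[i + 1:]) if m]
        if len(frames) >= 2 and frames[0].group("func") == "??":
            return frames[1].group("func"), frames[1].group("loc")
    return None


if __name__ == "__main__":
    print(null_call_site(SAMPLE))
```

This only identifies which function made the bad call. Which pointer inside `dump_iface()` was NULL still has to come from the source at that offset.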

              GeorgePatches @stephenw10

              @stephenw10 🙃 I swear I'm an edge case magnet.

                stephenw10 Netgate Administrator

                What is that NIC exactly?

                  GeorgePatches @stephenw10

                  @stephenw10 Exactly? I'm not sure. It's a Qlogic FastLinQ 41000 series 2-port SFP card: a QL41132HLCU, QL41212HLCU, or QL41262HLCU going by the Qlogic datasheet. I'm betting on the QL41132HLCU, as we wanted 10G cards and the other two models are 10G/25G cards. I'll need to dig in the firmware or the purchase orders to figure it out exactly. I will get back to you.

                  Sounds like this is a FreeBSD issue and nothing weird I did at least. Any idea why this wasn't detected on the initial install?

                    GeorgePatches @stephenw10

                    @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

                    What is that NIC exactly?

                    My speculation was correct: it is a Qlogic FastLinQ QL41132HLCU exactly.

                      kprovost @stephenw10

                      @stephenw10 I've not done any detailed digging, but there's been at least one bug fix in dump_iface() not too long ago to fix similar crashes:

                      commit 7d48224073ce14f0dd3db2d4e96876ac928b52f2
                      Author: Bjoern A. Zeeb <bz@FreeBSD.org>
                      Date:   Sat Sep 30 15:11:57 2023 +0000
                      
                          netlink: fix accessing freed memory
                      
                          The check for if_addrlen in dump_iface() is not sufficient to determine
                          if we still have a valid if_addr.  Rather than directly accessing if_addr
                          check the STAILQ (for the first entry).
                          This avoids panics when destroying cloned interfaces as experienced with
                          net80211 wlan ones.
                      
                          Sponsored by:   The FreeBSD Foundation
                          MFC after:      3 days
                          Reviewed by:    jhibbits (earlier version), kp
                          Differential Revision: https://reviews.freebsd.org/D42027
                      

                      It's certainly worth testing a 2.8 snapshot before we dig deeper.

                        GeorgePatches @kprovost

                        @kprovost said in Proper procedure for adding a NIC kernel module? (qlnxe):

                        It's certainly worth testing a 2.8 snapshot before we dig deeper.

                        Would that fix be in the latest pfSense Plus? This is a production machine with lots of work happening, but I'm poking my management chain about paying for support.

                          kprovost @GeorgePatches

                          @GeorgePatches That particular patch is in 24.03, yes.

                            stephenw10 Netgate Administrator

                            Hmm, I wonder if we can do something to avoid that bug as a test. 🤔

                              GeorgePatches @stephenw10

                              @stephenw10 Hmmmmm, a thought is that it blew up on the dummynet code. I can try ripping the limiters out and see if it still blows up.

                                GeorgePatches @GeorgePatches

                                This thought was wrong: it blew up exactly the same with the limiters removed and the dummynet modules not loaded. 🤣

                                  stephenw10 Netgate Administrator

                                  Well one thing ruled out I guess!

                                    GeorgePatches

                                    There's no easy way to try a 2.8 snap and then roll back to 2.7.2, right? You can do that with pfSense Plus, if I understand the bootloader thing correctly?

                                    I ask because management has approved our initial request for a support contract. We're currently waiting on a quote and then actual approval and purchasing. I'm ok putting a pin in this until it's easier to test a snap and roll back. This card is a nice to have, we're currently "doing fine" with our LAGG'd gigabit links.

                                      stephenw10 Netgate Administrator

                                      You can manually create ZFS snapshots at the CLI in CE, assuming you are running ZFS. However there are no public 2.8-dev snapshots yet.
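
As a rough illustration of the manual ZFS snapshot approach, here is a minimal Python sketch that only builds the command lines and does not run them. The pool name, dataset name and snapshot tag in the usage section are placeholders, not values taken from this system; check `zfs list` for the real names first. The helper names `snapshot_commands` and `rollback_command` are hypothetical.

```python
import shlex


def snapshot_commands(pool, tag):
    """Build commands to snapshot every dataset in `pool` and list the result."""
    return [
        # -r takes the snapshot recursively on all descendant datasets.
        ["zfs", "snapshot", "-r", f"{pool}@{tag}"],
        # List the snapshots under the pool to confirm they were created.
        ["zfs", "list", "-t", "snapshot", "-r", pool],
    ]


def rollback_command(dataset, tag):
    """Build the command to roll one dataset back to `tag`.

    `zfs rollback` works on a single dataset, so it has to be repeated for
    each dataset that should be reverted.
    """
    return ["zfs", "rollback", f"{dataset}@{tag}"]


if __name__ == "__main__":
    # Placeholder names for illustration only.
    for cmd in snapshot_commands("tank", "pre-test"):
        print(shlex.join(cmd))
    print(shlex.join(rollback_command("tank/ROOT/default", "pre-test")))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere. By default `zfs rollback` only reverts to the most recent snapshot of a dataset, so take the snapshot immediately before the test.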

                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.