Netgate Discussion Forum

    Proper procedure for adding a NIC kernel module? (qlnxe)

    Hardware
    20 Posts 3 Posters 1.0k Views
    • G
      GeorgePatches @stephenw10

      @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

      Do you have a crash report?

      info.0 textdump.tar.0

      • G
        GeorgePatches @stephenw10

        @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

        Ah OK, well, I'd check the backtrace in the crash report first. It may be a known bug in the driver.

        I think the problem is at the ??() frame. That seems like a weird function name to me.

        • stephenw10S
          stephenw10 Netgate Administrator

          Yup so that definitely crashed trying to attach the driver:

          db:0:kdb.enter.default>  bt
          Tracing pid 55563 tid 116689 td 0xfffffe0382ce93a0
          kdb_enter() at kdb_enter+0x32/frame 0xfffffe03c0298300
          vpanic() at vpanic+0x163/frame 0xfffffe03c0298430
          panic() at panic+0x43/frame 0xfffffe03c0298490
          trap_fatal() at trap_fatal+0x40c/frame 0xfffffe03c02984f0
          trap_pfault() at trap_pfault+0x4f/frame 0xfffffe03c0298550
          calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
          --- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
          ??() at 0/frame 0xfffffe03c0298650
          dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
          rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
          if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
          ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
          qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
          qlnx_pci_attach() at qlnx_pci_attach+0x7d9/frame 0xfffffe03c0298900
          device_attach() at device_attach+0x3be/frame 0xfffffe03c0298950
          device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe03c0298980
          pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe03c02989c0
          devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe03c0298a00
          devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe03c0298a40
          module_register_init() at module_register_init+0x85/frame 0xfffffe03c0298a70
          linker_load_module() at linker_load_module+0xbd5/frame 0xfffffe03c0298d70
          kern_kldload() at kern_kldload+0x16a/frame 0xfffffe03c0298dd0
          sys_kldload() at sys_kldload+0x5c/frame 0xfffffe03c0298e00
          amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe03c0298f30
          fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03c0298f30
          --- syscall (304, FreeBSD ELF64, kldload), rip = 0x183cac2d58aa, rsp = 0x183caa53f3e8, rbp = 0x183caa53f960 ---
          
          ql0: <Qlogic 10GbE/25GbE/40GbE PCI CNA (AH) Adapter-Ethernet Function v2.0.112> mem 0xfb820000-0xfb83ffff,0xfb000000-0xfb7fffff,0xfb850000-0xfb85ffff at device 0.0 numa-domain 1 on pci10
          ql0: qlnx_set_personality: ETH_IWARP
          ql0: setting parameters required by iWARP dev
          
          
          Fatal trap 12: page fault while in kernel mode
          cpuid = 23; apic id = 34
          fault virtual address	= 0x0
          fault code		= supervisor read instruction, page not present
          instruction pointer	= 0x20:0x0
          stack pointer	        = 0x0:0xfffffe03c0298628
          frame pointer	        = 0x0:0xfffffe03c0298650
          code segment		= base 0x0, limit 0xfffff, type 0x1b
          			= DPL 0, pres 1, long 1, def32 0, gran 1
          processor eflags	= interrupt enabled, resume, IOPL = 0
          current process		= 55563 (kldload)
          rdi: fffff815bd17b800 rsi: fffffe03c02986a0 rdx: 00000000c0306938
          rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000010
          rax: 0000000000000000 rbx: fffffe03c02986a0 rbp: fffffe03c0298650
          r10: 0000000000000000 r11: fffffe00e6ce8000 r12: 0000000000008802
          r13: fffff81081a15810 r14: fffffe03b48fcf90 r15: 0000000000000016
          trap number		= 12
          panic: page fault
          cpuid = 23
          time = 1717620079
          KDB: enter: panic
          

          That doesn't appear to be a known bug: https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=qlnxe
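
On the "??()" frame asked about earlier: it appears to be what the debugger prints when it cannot map an address to a symbol. Here the address is 0, which matches "fault virtual address = 0x0" and "supervisor read instruction" in the trap report. In other words, the CPU tried to fetch code from address 0, which usually points to a call through a NULL function pointer, and dump_iface+0x145 is the likely call site. The sketch below is only an illustration in Python (not something that runs on the firewall). It pulls the trap line and the two frames after it out of a pasted backtrace. The function name and regular expressions are my own and assume the DDB output format shown above.

```python
import re

# Matches a DDB frame line such as
# "dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700".
FRAME_RE = re.compile(
    r"^\s*(?P<func>\S+)\(\) at (?P<loc>\S+)/frame 0x[0-9a-f]+\s*$"
)
# Matches the trap marker, e.g. "--- trap 0xc, rip = 0, rsp = ... ---".
TRAP_RE = re.compile(
    r"^\s*--- trap (?P<trap>0x[0-9a-f]+), rip = (?P<rip>(?:0x)?[0-9a-f]+),"
)


def find_fault(backtrace):
    """Return (trap_number, rip, faulting_location, caller_location).

    Returns None if no trap marker followed by two frames is found.
    (Hypothetical helper, written for this thread's backtrace format.)
    """
    lines = backtrace.splitlines()
    for index, line in enumerate(lines):
        trap_match = TRAP_RE.match(line)
        if trap_match is None:
            continue
        # Collect the first two frames printed after the trap marker:
        # the faulting frame and the function that was running before it.
        locations = []
        for later in lines[index + 1:]:
            frame_match = FRAME_RE.match(later)
            if frame_match is not None:
                locations.append(frame_match.group("loc"))
            if len(locations) == 2:
                break
        if len(locations) < 2:
            return None
        trap_number = int(trap_match.group("trap"), 16)
        rip = int(trap_match.group("rip"), 16)
        return trap_number, rip, locations[0], locations[1]
    return None


SAMPLE = """\
calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
--- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
??() at 0/frame 0xfffffe03c0298650
dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
"""

if __name__ == "__main__":
    print(find_fault(SAMPLE))
```

An instruction pointer of 0 together with a named caller is the pattern to look for. It suggests the bug is in whatever function pointer dump_iface() tried to call, not in a function literally named "??".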

          • G
            GeorgePatches @stephenw10

            @stephenw10 🙃 I swear I'm an edge case magnet.

            • stephenw10S
              stephenw10 Netgate Administrator

              What is that NIC exactly?

              • G
                GeorgePatches @stephenw10

                @stephenw10 Exactly? I'm not sure. It's a Qlogic FastLinQ 41000 series 2-port SFP card. Going by the Qlogic datasheet, it's a QL41132HLCU, QL41212HLCU, or QL41262HLCU. I'm betting on the QL41132HLCU, as we wanted 10G cards and the other two models are 10G/25G cards. I'll need to dig through the firmware or the purchase orders to figure it out exactly. I will get back to you.

                Sounds like this is a FreeBSD issue and nothing weird I did at least. Any idea why this wasn't detected on the initial install?

                • G
                  GeorgePatches @stephenw10

                  @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

                  What is that NIC exactly?

                  My speculation was correct: it is a Qlogic FastLinQ QL41132HLCU.

                  • K
                    kprovost @stephenw10

                    @stephenw10 I've not done any detailed digging, but there's been at least one bug fix in dump_iface() not too long ago to fix similar crashes:

                    commit 7d48224073ce14f0dd3db2d4e96876ac928b52f2
                    Author: Bjoern A. Zeeb <bz@FreeBSD.org>
                    Date:   Sat Sep 30 15:11:57 2023 +0000
                    
                        netlink: fix accessing freed memory
                    
                        The check for if_addrlen in dump_iface() is not sufficient to determine
                        if we still have a valid if_addr.  Rather than directly accessing if_addr
                        check the STAILQ (for the first entry).
                        This avoids panics when destroying cloned interfaces as experienced with
                        net80211 wlan ones.
                    
                        Sponsored by:   The FreeBSD Foundation
                        MFC after:      3 days
                        Reviewed by:    jhibbits (earlier version), kp
                        Differential Revision: https://reviews.freebsd.org/D42027
                    

                    It's certainly worth testing a 2.8 snapshot before we dig deeper.

                    • G
                      GeorgePatches @kprovost

                      @kprovost said in Proper procedure for adding a NIC kernel module? (qlnxe):

                      It's certainly worth testing a 2.8 snapshot before we dig deeper.

                      Would that fix be in the latest PF+? This is a production machine with lots of work happening, but I'm poking my management chain about paying for support.

                      • K
                        kprovost @GeorgePatches

                        @GeorgePatches That particular patch is in 24.03, yes.

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by stephenw10

                          Hmm, I wonder if we can do something to avoid that bug as a test. 🤔

                          • G
                            GeorgePatches @stephenw10

                            @stephenw10 Hmmmmm, one thought is that it blew up in the dummynet code. I can try ripping the limiters out and see if it still blows up.

                            • G
                              GeorgePatches @GeorgePatches

                              This thought was wrong. It blew up exactly the same way with the limiters removed and the dummynet modules not loaded. 🤣

                              • stephenw10S
                                stephenw10 Netgate Administrator

                                Well one thing ruled out I guess!

                                • G
                                  GeorgePatches
                                  last edited by GeorgePatches

                                  There's no easy way to try a 2.8 snapshot and then roll back to 2.7.2, right? You can do that with PF+, if I understand the bootloader thing correctly?

                                  I ask because management has approved our initial request for a support contract. We're currently waiting on a quote, and then on actual approval and purchasing. I'm OK putting a pin in this until it's easier to test a snapshot and roll back. This card is a nice-to-have; we're currently "doing fine" with our LAGG'd gigabit links.

                                  • stephenw10S
                                    stephenw10 Netgate Administrator

                                    You can manually create ZFS snapshots at the CLI in CE, assuming you are running ZFS. However, there are no public 2.8-dev snapshots yet.
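
For anyone following along, here is a rough sketch of what "create a ZFS snapshot at the CLI" could look like, written as a small Python helper that only builds and prints the commands and does not run them. The dataset name `pfSense/ROOT/default` and the snapshot label are assumptions for illustration; check `zfs list` on your own system for the real dataset names. Rolling back is a separate, more disruptive step and is not covered here.

```python
import shlex


def snapshot_commands(dataset, label):
    """Build argv lists for taking and then listing a recursive ZFS snapshot.

    Hypothetical helper: it only constructs the commands, it never runs them.
    """
    if not dataset or not label:
        raise ValueError("dataset and label must be non-empty")
    if "@" in dataset or "@" in label or "/" in label:
        raise ValueError("label must not contain '@' or '/', dataset no '@'")
    snapshot = f"{dataset}@{label}"
    return [
        # Take a recursive snapshot of the dataset and its children.
        ["zfs", "snapshot", "-r", snapshot],
        # List snapshots under the dataset to confirm it exists.
        ["zfs", "list", "-t", "snapshot", "-r", dataset],
    ]


if __name__ == "__main__":
    # Assumed dataset name and label; adjust to match `zfs list` output.
    for argv in snapshot_commands("pfSense/ROOT/default", "pre-2.8-test"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before pasting anything into a root shell on a production firewall.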

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.