Netgate Discussion Forum

Proper procedure for adding a NIC kernel module? (qlnxe)

Hardware
20 Posts 3 Posters 1.1k Views
  • G
    GeorgePatches @stephenw10
    last edited by Jun 6, 2024, 6:25 PM

    @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

    Do you have a crash report?

    info.0 textdump.tar.0

    • G
      GeorgePatches @stephenw10
      last edited by Jun 6, 2024, 6:43 PM

      @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

      Ah OK, well I'd check the backtrace in the crash report first. It may be a known bug in the driver.

      I think the problem is at the ??() frame. That seems like a weird function name to me.

      • S
        stephenw10 Netgate Administrator
        last edited by Jun 6, 2024, 6:48 PM

        Yup so that definitely crashed trying to attach the driver:

        db:0:kdb.enter.default>  bt
        Tracing pid 55563 tid 116689 td 0xfffffe0382ce93a0
        kdb_enter() at kdb_enter+0x32/frame 0xfffffe03c0298300
        vpanic() at vpanic+0x163/frame 0xfffffe03c0298430
        panic() at panic+0x43/frame 0xfffffe03c0298490
        trap_fatal() at trap_fatal+0x40c/frame 0xfffffe03c02984f0
        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe03c0298550
        calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
        --- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
        ??() at 0/frame 0xfffffe03c0298650
        dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
        rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
        if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
        ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
        qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
        qlnx_pci_attach() at qlnx_pci_attach+0x7d9/frame 0xfffffe03c0298900
        device_attach() at device_attach+0x3be/frame 0xfffffe03c0298950
        device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe03c0298980
        pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe03c02989c0
        devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe03c0298a00
        devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe03c0298a40
        module_register_init() at module_register_init+0x85/frame 0xfffffe03c0298a70
        linker_load_module() at linker_load_module+0xbd5/frame 0xfffffe03c0298d70
        kern_kldload() at kern_kldload+0x16a/frame 0xfffffe03c0298dd0
        sys_kldload() at sys_kldload+0x5c/frame 0xfffffe03c0298e00
        amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe03c0298f30
        fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03c0298f30
        --- syscall (304, FreeBSD ELF64, kldload), rip = 0x183cac2d58aa, rsp = 0x183caa53f3e8, rbp = 0x183caa53f960 ---
        
        ql0: <Qlogic 10GbE/25GbE/40GbE PCI CNA (AH) Adapter-Ethernet Function v2.0.112> mem 0xfb820000-0xfb83ffff,0xfb000000-0xfb7fffff,0xfb850000-0xfb85ffff at device 0.0 numa-domain 1 on pci10
        ql0: qlnx_set_personality: ETH_IWARP
        ql0: setting parameters required by iWARP dev
        
        
        Fatal trap 12: page fault while in kernel mode
        cpuid = 23; apic id = 34
        fault virtual address	= 0x0
        fault code		= supervisor read instruction, page not present
        instruction pointer	= 0x20:0x0
        stack pointer	        = 0x0:0xfffffe03c0298628
        frame pointer	        = 0x0:0xfffffe03c0298650
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 55563 (kldload)
        rdi: fffff815bd17b800 rsi: fffffe03c02986a0 rdx: 00000000c0306938
        rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000010
        rax: 0000000000000000 rbx: fffffe03c02986a0 rbp: fffffe03c0298650
        r10: 0000000000000000 r11: fffffe00e6ce8000 r12: 0000000000008802
        r13: fffff81081a15810 r14: fffffe03b48fcf90 r15: 0000000000000016
        trap number		= 12
        panic: page fault
        cpuid = 23
        time = 1717620079
        KDB: enter: panic
        

        That doesn't appear to be a known bug: https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=qlnxe
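On the ??() frame: the trap line shows rip = 0 and the fault code is "supervisor read instruction, page not present", so the kernel tried to execute code at address 0. That pattern usually points to a call through a NULL function pointer, made from the frame just below it (dump_iface+0x145 here). As a rough sketch of how to pull that out of a ddb backtrace, the Python below parses the text format shown above; the helper name find_fault and the regular expressions are illustrative assumptions, not part of any FreeBSD tool.

```python
import re

# One ddb frame line, e.g.
# "dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700"
FRAME_RE = re.compile(
    r"^\s*(?P<func>\S+)\(\) at (?P<loc>\S+)/frame (?P<frame>0x[0-9a-fA-F]+)\s*$"
)
# Trap marker, e.g. "--- trap 0xc, rip = 0, rsp = ..., rbp = ... ---"
TRAP_RE = re.compile(r"^\s*--- trap (?P<trap>\w+), rip = (?P<rip>\w+),")

# Excerpt copied from the backtrace in this thread.
SAMPLE = """\
calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
--- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
??() at 0/frame 0xfffffe03c0298650
dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
"""


def find_fault(backtrace):
    """Hypothetical helper: describe the first trap found in a ddb backtrace.

    Returns a dict with the trap number, the faulting rip, the name of the
    faulting frame and the location of its caller, or None if no trap marker
    followed by at least two frames is present.
    """
    lines = backtrace.splitlines()
    for i, line in enumerate(lines):
        trap = TRAP_RE.match(line)
        if trap is None:
            continue
        # Frames printed after the trap marker are the interrupted context.
        frames = [m for m in map(FRAME_RE.match, lines[i + 1:]) if m]
        if len(frames) < 2:
            return None
        return {
            "trap": int(trap.group("trap"), 0),
            "rip": int(trap.group("rip"), 0),
            "faulting": frames[0].group("func"),
            "caller": frames[1].group("loc"),
        }
    return None


if __name__ == "__main__":
    fault = find_fault(SAMPLE)
    if fault and fault["rip"] == 0:
        print("Jump to address 0 from", fault["caller"],
              "(likely a NULL function pointer call)")
```

This only identifies where the bad call came from; which pointer was NULL still has to be worked out from the dump_iface() source for the running kernel.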

        • G
          GeorgePatches @stephenw10
          last edited by Jun 6, 2024, 7:00 PM

          @stephenw10 🙃 I swear I'm an edge case magnet.

          • S
            stephenw10 Netgate Administrator
            last edited by Jun 6, 2024, 7:02 PM

            What is that NIC exactly?

            • G
              GeorgePatches @stephenw10
              last edited by Jun 6, 2024, 7:26 PM

              @stephenw10 Exactly? I'm not sure. It's a Qlogic FastLinQ 41000 series 2-port SFP card. Going by the Qlogic datasheet, it's a QL41132HLCU, QL41212HLCU, or QL41262HLCU. I'm betting on the QL41132HLCU, as we wanted 10G cards and the other two models are 10G/25G cards. I'll need to dig in the firmware or the purchase orders to figure it out exactly. I will get back to you.

              Sounds like this is a FreeBSD issue and not something weird I did, at least. Any idea why this wasn't detected on the initial install?

              • G
                GeorgePatches @stephenw10
                last edited by Jun 6, 2024, 7:30 PM

                @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

                What is that NIC exactly?

                My guess was correct: it is exactly a Qlogic FastLinQ QL41132HLCU.

                • K
                  kprovost @stephenw10
                  last edited by Jun 6, 2024, 7:58 PM

                  @stephenw10 I've not done any detailed digging, but there's been at least one bug fix in dump_iface() not too long ago to fix similar crashes:

                  commit 7d48224073ce14f0dd3db2d4e96876ac928b52f2
                  Author: Bjoern A. Zeeb <bz@FreeBSD.org>
                  Date:   Sat Sep 30 15:11:57 2023 +0000
                  
                      netlink: fix accessing freed memory
                  
                      The check for if_addrlen in dump_iface() is not sufficient to determine
                      if we still have a valid if_addr.  Rather than directly accessing if_addr
                      check the STAILQ (for the first entry).
                      This avoids panics when destroying cloned interfaces as experienced with
                      net80211 wlan ones.
                  
                      Sponsored by:   The FreeBSD Foundation
                      MFC after:      3 days
                      Reviewed by:    jhibbits (earlier version), kp
                      Differential Revision: https://reviews.freebsd.org/D42027
                  

                  It's certainly worth testing a 2.8 snapshot before we dig deeper.

                  • G
                    GeorgePatches @kprovost
                    last edited by Jun 6, 2024, 8:08 PM

                    @kprovost said in Proper procedure for adding a NIC kernel module? (qlnxe):

                    It's certainly worth testing a 2.8 snapshot before we dig deeper.

                    Would that fix be in the latest PF+? This is a production machine with lots of work happening, but I'm poking my management chain about paying for support.

                    • K
                      kprovost @GeorgePatches
                      last edited by Jun 6, 2024, 8:25 PM

                      @GeorgePatches That particular patch is in 24.03, yes.

                      • S
                        stephenw10 Netgate Administrator
                        last edited by stephenw10 Jun 6, 2024, 9:38 PM

                        Hmm, I wonder if we can do something to avoid that bug as a test. 🤔

                        • G
                          GeorgePatches @stephenw10
                          last edited by Jun 7, 2024, 7:56 PM

                          @stephenw10 Hmmmmm, one thought is that it blew up in the dummynet code. I can try ripping the limiters out and see if it still blows up.

                          • G
                            GeorgePatches @GeorgePatches
                            last edited by Jun 7, 2024, 8:40 PM

                            This thought was wrong: it blew up exactly the same way with the limiters removed and the dummynet modules not loaded. 🤣

                            • S
                              stephenw10 Netgate Administrator
                              last edited by Jun 7, 2024, 8:52 PM

                              Well one thing ruled out I guess!

                              • G
                                GeorgePatches
                                posted Jun 10, 2024, 3:52 PM, last edited by GeorgePatches Jun 10, 2024, 4:00 PM

                                There's no easy way to try a 2.8 snap and then roll back to 2.7.2, right? You can do that with PF+, if I understand the bootloader thing correctly?

                                I ask because management has approved our initial request for a support contract. We're currently waiting on a quote, then actual approval and purchasing. I'm OK putting a pin in this until it's easier to test a snap and roll back. This card is a nice-to-have; we're currently "doing fine" with our LAGG'd gigabit links.

                                • S
                                  stephenw10 Netgate Administrator
                                  last edited by Jun 10, 2024, 4:53 PM

                                  You can manually create ZFS snapshots at the CLI in CE, assuming you are running ZFS. However, there are no public 2.8-dev snapshots yet.
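For the manual ZFS snapshot route, a minimal dry-run sketch in Python follows. It only builds and prints the zfs snapshot, zfs list and zfs rollback command lines and does not run them. The dataset name pfSense/ROOT/default and the label pre-2.8-test are assumptions for illustration; check zfs list on the actual box for the real dataset. Note that zfs rollback without extra flags only returns to the most recent snapshot, and whether rolling back a live root dataset is safe on a given install is a separate question this sketch does not answer.

```python
import shlex


def zfs_snapshot_plan(dataset, label):
    """Hypothetical helper: build (but do not run) zfs commands for a
    snapshot-then-rollback test cycle on a single dataset."""
    if "@" in dataset or "@" in label:
        raise ValueError("dataset and label must not contain '@'")
    snap = f"{dataset}@{label}"
    return [
        # Take the snapshot before upgrading.
        ["zfs", "snapshot", snap],
        # Confirm it exists.
        ["zfs", "list", "-r", "-t", "snapshot", dataset],
        # Return to it later (only works if it is the newest snapshot).
        ["zfs", "rollback", snap],
    ]


if __name__ == "__main__":
    # Assumed dataset name and label; adjust to the real system.
    for argv in zfs_snapshot_plan("pfSense/ROOT/default", "pre-2.8-test"):
        print(shlex.join(argv))
```

Printing the commands first keeps this safe to try on a production firewall; the actual commands would be run by hand once the dataset name is confirmed.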
