Proper procedure for adding a NIC kernel module? (qlnxe)
-
Ah OK, well I'd check the backtrace in the crash report first. It may be a known bug in the driver.
-
@stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):
Do you have a crash report?
-
@stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):
Ah OK, well I'd check the backtrace in the crash report first. It may be a known bug in the driver.
I think the problem is at the ??() frame. That seems like a weird function name to me.
-
Yup so that definitely crashed trying to attach the driver:
db:0:kdb.enter.default> bt
Tracing pid 55563 tid 116689 td 0xfffffe0382ce93a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe03c0298300
vpanic() at vpanic+0x163/frame 0xfffffe03c0298430
panic() at panic+0x43/frame 0xfffffe03c0298490
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe03c02984f0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe03c0298550
calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
--- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
??() at 0/frame 0xfffffe03c0298650
dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
qlnx_pci_attach() at qlnx_pci_attach+0x7d9/frame 0xfffffe03c0298900
device_attach() at device_attach+0x3be/frame 0xfffffe03c0298950
device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe03c0298980
pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe03c02989c0
devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe03c0298a00
devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe03c0298a40
module_register_init() at module_register_init+0x85/frame 0xfffffe03c0298a70
linker_load_module() at linker_load_module+0xbd5/frame 0xfffffe03c0298d70
kern_kldload() at kern_kldload+0x16a/frame 0xfffffe03c0298dd0
sys_kldload() at sys_kldload+0x5c/frame 0xfffffe03c0298e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe03c0298f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03c0298f30
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x183cac2d58aa, rsp = 0x183caa53f3e8, rbp = 0x183caa53f960 ---
ql0: <Qlogic 10GbE/25GbE/40GbE PCI CNA (AH) Adapter-Ethernet Function v2.0.112> mem 0xfb820000-0xfb83ffff,0xfb000000-0xfb7fffff,0xfb850000-0xfb85ffff at device 0.0 numa-domain 1 on pci10
ql0: qlnx_set_personality: ETH_IWARP
ql0: setting parameters required by iWARP dev
Fatal trap 12: page fault while in kernel mode
cpuid = 23; apic id = 34
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x0:0xfffffe03c0298628
frame pointer = 0x0:0xfffffe03c0298650
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 55563 (kldload)
rdi: fffff815bd17b800 rsi: fffffe03c02986a0 rdx: 00000000c0306938
rcx: 00000000c0306938 r8: 0000000000000000 r9: 0000000000000010
rax: 0000000000000000 rbx: fffffe03c02986a0 rbp: fffffe03c0298650
r10: 0000000000000000 r11: fffffe00e6ce8000 r12: 0000000000008802
r13: fffff81081a15810 r14: fffffe03b48fcf90 r15: 0000000000000016
trap number = 12
panic: page fault
cpuid = 23
time = 1717620079
KDB: enter: panic
That doesn't appear to be a known bug: https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=qlnxe
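For anyone reading the trace: the `??() at 0` frame together with `rip = 0` and "supervisor read instruction, page not present" suggests the kernel tried to call through a NULL function pointer, and the frame listed right after it (dump_iface) would be the caller. A rough sketch of pulling that caller out of a saved backtrace, assuming the trace is stored one frame per line (the function name `caller_of_null_frame` and the sample lines fed to it are only for illustration):

```shell
#!/bin/sh
# Print the function name from the frame that follows "??() at 0" in a
# DDB backtrace read from stdin, i.e. the likely caller of the NULL pointer.
# Assumes one frame per line, as DDB prints it.
caller_of_null_frame() {
    awk '
        found { sub(/[(][)].*/, "", $1); print $1; exit }
        /^[?][?][(][)] at 0/ { found = 1 }
    '
}

# Demo using a few frames copied from the trace in this thread.
caller_of_null_frame <<'EOF'
calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
--- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
??() at 0/frame 0xfffffe03c0298650
dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
EOF
```

With a real crash report you would feed the saved backtrace file in on stdin instead of the inline sample.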
-
@stephenw10 I swear I'm an edge case magnet.
-
What is that NIC exactly?
-
@stephenw10 Exactly? I'm not sure. It's a Qlogic FastLinQ 41000 series 2-port SFP card: a QL41132HLCU, QL41212HLCU, or QL41262HLCU going by the Qlogic datasheet. I'm betting on the QL41132HLCU, as we wanted 10G cards and the other two models are 10G/25G cards. I'll need to dig in the firmware or the purchase orders to figure it out exactly. I will get back to you.
Sounds like this is a FreeBSD issue and not anything weird I did, at least. Any idea why this wasn't detected on the initial install?
-
@stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):
What is that NIC exactly?
My speculation was correct: it is a Qlogic FastLinQ QL41132HLCU exactly.
-
@stephenw10 I've not done any detailed digging, but there's been at least one bug fix in dump_iface() not too long ago to fix similar crashes:
commit 7d48224073ce14f0dd3db2d4e96876ac928b52f2
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
Date: Sat Sep 30 15:11:57 2023 +0000
    netlink: fix accessing freed memory
    The check for if_addrlen in dump_iface() is not sufficient to determine if we still have a valid if_addr. Rather than directly accessing if_addr check the STAILQ (for the first entry). This avoids panics when destroying cloned interfaces as experienced with net80211 wlan ones.
    Sponsored by: The FreeBSD Foundation
    MFC after: 3 days
    Reviewed by: jhibbits (earlier version), kp
    Differential Revision: https://reviews.freebsd.org/D42027
It's certainly worth testing a 2.8 snapshot before we dig deeper.
-
@kprovost said in Proper procedure for adding a NIC kernel module? (qlnxe):
It's certainly worth testing a 2.8 snapshot before we dig deeper.
Would that fix be in the latest PF+? This is a production machine with lots of work happening, but I'm poking my management chain about paying for support.
-
@GeorgePatches That particular patch is in 24.03, yes.
-
Hmm, I wonder if we can do something to avoid that bug as a test.
-
@stephenw10 Hmmmmm, a thought is that it blew up on the dummynet code. I can try ripping the limiters out and see if it still blows up.
-
This thought was wrong: it blew up exactly the same with the limiters removed and the dummynet modules not loaded.
-
Well one thing ruled out I guess!
-
There's no easy way to try a 2.8 snap and then roll back to 2.7.2, right? You can do that with PF+, if I understand the bootloader thing correctly?
I ask because management has approved our initial request for a support contract. We're currently waiting on a quote and then actual approval and purchasing. I'm ok putting a pin in this until it's easier to test a snap and roll back. This card is a nice to have, we're currently "doing fine" with our LAGG'd gigabit links.
-
You can manually create ZFS snapshots at the CLI in CE, assuming you are running ZFS. However, there are no public 2.8-dev snapshots yet.
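As a rough sketch of what taking such a snapshot might look like: the pool name `pfSense` and the snapshot tag below are assumptions, so check the real pool name with `zpool list` first. The script defaults to a dry run that only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch: take a recursive ZFS snapshot before a risky change.
# POOL and TAG are assumptions; confirm the pool name with `zpool list`.
POOL="${POOL:-pfSense}"
TAG="${TAG:-pre-qlnxe-test}"

# Build a snapshot name of the form pool@tag.
snapshot_name() {
    printf '%s@%s\n' "$1" "$2"
}

# With DRY_RUN=1 (the default) only print the command instead of running it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

SNAP="$(snapshot_name "$POOL" "$TAG")"
run zfs snapshot -r "$SNAP"   # -r also snapshots every child dataset
run zfs list -t snapshot      # list snapshots to confirm it was created
```

Rolling back is done per dataset with `zfs rollback` and discards everything written after the snapshot, so treat this as a starting point rather than a tested recovery procedure.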