Netgate Discussion Forum

    Proper procedure for adding a NIC kernel module? (qlnxe)

    Hardware
    20 Posts 3 Posters 1.4k Views
    • stephenw10 Netgate Administrator

      Yup, so that definitely crashed trying to attach the driver:

      db:0:kdb.enter.default>  bt
      Tracing pid 55563 tid 116689 td 0xfffffe0382ce93a0
      kdb_enter() at kdb_enter+0x32/frame 0xfffffe03c0298300
      vpanic() at vpanic+0x163/frame 0xfffffe03c0298430
      panic() at panic+0x43/frame 0xfffffe03c0298490
      trap_fatal() at trap_fatal+0x40c/frame 0xfffffe03c02984f0
      trap_pfault() at trap_pfault+0x4f/frame 0xfffffe03c0298550
      calltrap() at calltrap+0x8/frame 0xfffffe03c0298550
      --- trap 0xc, rip = 0, rsp = 0xfffffe03c0298628, rbp = 0xfffffe03c0298650 ---
      ??() at 0/frame 0xfffffe03c0298650
      dump_iface() at dump_iface+0x145/frame 0xfffffe03c0298700
      rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe03c0298780
      if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe03c02987d0
      ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe03c0298810
      qlnx_init_ifnet() at qlnx_init_ifnet+0x2c6/frame 0xfffffe03c0298860
      qlnx_pci_attach() at qlnx_pci_attach+0x7d9/frame 0xfffffe03c0298900
      device_attach() at device_attach+0x3be/frame 0xfffffe03c0298950
      device_probe_and_attach() at device_probe_and_attach+0x41/frame 0xfffffe03c0298980
      pci_driver_added() at pci_driver_added+0xf2/frame 0xfffffe03c02989c0
      devclass_driver_added() at devclass_driver_added+0x39/frame 0xfffffe03c0298a00
      devclass_add_driver() at devclass_add_driver+0x11e/frame 0xfffffe03c0298a40
      module_register_init() at module_register_init+0x85/frame 0xfffffe03c0298a70
      linker_load_module() at linker_load_module+0xbd5/frame 0xfffffe03c0298d70
      kern_kldload() at kern_kldload+0x16a/frame 0xfffffe03c0298dd0
      sys_kldload() at sys_kldload+0x5c/frame 0xfffffe03c0298e00
      amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe03c0298f30
      fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe03c0298f30
      --- syscall (304, FreeBSD ELF64, kldload), rip = 0x183cac2d58aa, rsp = 0x183caa53f3e8, rbp = 0x183caa53f960 ---
      
      ql0: <Qlogic 10GbE/25GbE/40GbE PCI CNA (AH) Adapter-Ethernet Function v2.0.112> mem 0xfb820000-0xfb83ffff,0xfb000000-0xfb7fffff,0xfb850000-0xfb85ffff at device 0.0 numa-domain 1 on pci10
      ql0: qlnx_set_personality: ETH_IWARP
      ql0: setting parameters required by iWARP dev
      
      
      Fatal trap 12: page fault while in kernel mode
      cpuid = 23; apic id = 34
      fault virtual address	= 0x0
      fault code		= supervisor read instruction, page not present
      instruction pointer	= 0x20:0x0
      stack pointer	        = 0x0:0xfffffe03c0298628
      frame pointer	        = 0x0:0xfffffe03c0298650
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 55563 (kldload)
      rdi: fffff815bd17b800 rsi: fffffe03c02986a0 rdx: 00000000c0306938
      rcx: 00000000c0306938  r8: 0000000000000000  r9: 0000000000000010
      rax: 0000000000000000 rbx: fffffe03c02986a0 rbp: fffffe03c0298650
      r10: 0000000000000000 r11: fffffe00e6ce8000 r12: 0000000000008802
      r13: fffff81081a15810 r14: fffffe03b48fcf90 r15: 0000000000000016
      trap number		= 12
      panic: page fault
      cpuid = 23
      time = 1717620079
      KDB: enter: panic
      

      That doesn't appear to be a known bug: https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=qlnxe
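In the register dump above, rdx and rcx both hold 0x00000000c0306938. On amd64 the third integer argument of a call is passed in rdx, so one hedged reading is that the call through the NULL pointer was an ioctl-style call with this value as the command. The sketch below decodes the value, assuming the bit layout documented in FreeBSD's sys/ioccom.h (direction in the top three bits, parameter length in bits 16 to 28, group in bits 8 to 15, number in bits 0 to 7). The helper name decode_ioctl is hypothetical.

```python
# Bit layout as described in FreeBSD's sys/ioccom.h (assumed here, verify
# against the source tree for the release in question).
IOCPARM_SHIFT = 13
IOCPARM_MASK = (1 << IOCPARM_SHIFT) - 1
IOC_VOID = 0x20000000
IOC_OUT = 0x40000000
IOC_IN = 0x80000000


def decode_ioctl(cmd):
    """Split a 32-bit ioctl command word into its encoded fields."""
    direction = []
    if cmd & IOC_IN:
        direction.append("in")
    if cmd & IOC_OUT:
        direction.append("out")
    if cmd & IOC_VOID:
        direction.append("void")
    return {
        "direction": "|".join(direction),
        "length": (cmd >> 16) & IOCPARM_MASK,   # size of the argument struct
        "group": chr((cmd >> 8) & 0xFF),        # subsystem letter
        "number": cmd & 0xFF,                   # command number in the group
    }


# Value taken from the rdx/rcx registers in the panic output.
fields = decode_ioctl(0xC0306938)
print(fields)
# {'direction': 'in|out', 'length': 48, 'group': 'i', 'number': 56}
```

An in/out command in group 'i', number 56, with a 48-byte argument looks like SIOCGIFMEDIA from sys/sockio.h. That mapping, and the idea that rdx still held the call argument at the time of the fault, are inferences from the dump and were not confirmed by anyone in this thread.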

      • GeorgePatches @stephenw10

        @stephenw10 🙃 I swear I'm an edge case magnet.

        • stephenw10 Netgate Administrator

          What is that NIC exactly?

          • GeorgePatches @stephenw10

            @stephenw10 Exactly? I'm not sure. It's a Qlogic FastLinQ 41000 series 2-port SFP card: a QL41132HLCU, QL41212HLCU, or QL41262HLCU, going by the Qlogic datasheet. I'm betting on the QL41132HLCU, as we wanted 10G cards and the other two models are 10G/25G cards. I'll need to dig into the firmware or the purchase orders to figure it out exactly. I will get back to you.

            Sounds like this is a FreeBSD issue and not something weird I did, at least. Any idea why this wasn't detected on the initial install?

            • GeorgePatches @stephenw10

              @stephenw10 said in Proper procedure for adding a NIC kernel module? (qlnxe):

              What is that NIC exactly?

              My speculation was correct: it is exactly a Qlogic FastLinQ QL41132HLCU.

              • kprovost @stephenw10

                @stephenw10 I've not done any detailed digging, but there's been at least one bug fix in dump_iface() not too long ago to fix similar crashes:

                commit 7d48224073ce14f0dd3db2d4e96876ac928b52f2
                Author: Bjoern A. Zeeb <bz@FreeBSD.org>
                Date:   Sat Sep 30 15:11:57 2023 +0000
                
                    netlink: fix accessing freed memory
                
                    The check for if_addrlen in dump_iface() is not sufficient to determine
                    if we still have a valid if_addr.  Rather than directly accessing if_addr
                    check the STAILQ (for the first entry).
                    This avoids panics when destroying cloned interfaces as experienced with
                    net80211 wlan ones.
                
                    Sponsored by:   The FreeBSD Foundation
                    MFC after:      3 days
                    Reviewed by:    jhibbits (earlier version), kp
                    Differential Revision: https://reviews.freebsd.org/D42027
                

                It's certainly worth testing a 2.8 snapshot before we dig deeper.

                • GeorgePatches @kprovost

                  @kprovost said in Proper procedure for adding a NIC kernel module? (qlnxe):

                  It's certainly worth testing a 2.8 snapshot before we dig deeper.

                  Would that fix be in the latest pfSense Plus? This is a production machine with lots of work happening, but I'm poking my management chain about paying for support.

                  • kprovost @GeorgePatches

                    @GeorgePatches That particular patch is in 24.03, yes.

                    • stephenw10 Netgate Administrator

                      Hmm, I wonder if we can do something to avoid that bug as a test. 🤔

                      • GeorgePatches @stephenw10

                        @stephenw10 Hmmmmm, one thought is that it blew up in the dummynet code. I can try ripping the limiters out and see if it still blows up.

                        • GeorgePatches @GeorgePatches

                          This thought was wrong: it blew up in exactly the same way with the limiters removed and the dummynet modules not loaded. 🤣

                          • stephenw10 Netgate Administrator

                            Well one thing ruled out I guess!

                            • GeorgePatches

                              There's no easy way to try a 2.8 snapshot and then roll back to 2.7.2, right? You can do that with pfSense Plus, if I understand the bootloader thing correctly?

                              I ask because management has approved our initial request for a support contract. We're currently waiting on a quote, and then actual approval and purchasing. I'm OK with putting a pin in this until it's easier to test a snapshot and roll back. This card is a nice-to-have; we're currently "doing fine" with our LAGG'd gigabit links.

                              • stephenw10 Netgate Administrator

                                You can manually create ZFS snapshots at the CLI in CE, assuming you are running ZFS. However, there are no public 2.8-dev snapshots yet.
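A minimal sketch of the manual ZFS snapshot route, assuming the install is on ZFS. The dataset name pfSense/ROOT/default and the label pre-2.8-test are assumptions for illustration only; check the real dataset names with `zfs list` before using anything like this. The helper names are hypothetical, and the script only prints the `zfs` commands (dry run) unless told otherwise.

```python
import shlex
import subprocess

# Assumptions: adjust both to match the output of `zfs list` on your system.
DATASET = "pfSense/ROOT/default"
LABEL = "pre-2.8-test"


def build_commands(dataset, label):
    """Return the zfs command lines for creating, listing and rolling back."""
    snap = f"{dataset}@{label}"
    return {
        "create": ["zfs", "snapshot", snap],
        "list": ["zfs", "list", "-t", "snapshot", "-r", dataset],
        "rollback": ["zfs", "rollback", snap],
    }


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    line = shlex.join(argv)
    print(line)
    if not dry_run:
        subprocess.run(argv, check=True)
    return line


if __name__ == "__main__":
    cmds = build_commands(DATASET, LABEL)
    # Take the snapshot before the upgrade test, then confirm it exists.
    run(cmds["create"], dry_run=True)
    run(cmds["list"], dry_run=True)
```

The rollback command as built here only returns a single dataset to its most recent snapshot, and rolling back a live root filesystem is worth rehearsing away from a production box first.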

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.