Netgate Discussion Forum

    Kernel Panic - bxe Driver - Broadcom 10Gb/s NIC

    General pfSense Questions
    6 Posts 2 Posters 713 Views
      MarkFarr

      Hello, I recently installed a dual port Broadcom chip based PCI-X network card in my hardware firewall. This was approximately 1 week ago.

      Initially I was using only one port, connected to my WAN, and the machine ran stable for over one week. I was also running a custom-compiled kernel module for the network card. Two days ago I configured the second port to connect to my LAN, and it ran stable for one day. Today I have experienced three kernel panics so far. After the first kernel panic I removed the line that loads the module from /boot/loader.conf.local and used kldstat to confirm that the module had been loaded before and was unloaded afterwards. The second and third kernel panics occurred with the default kernel driver for this Broadcom chipset.

      This is part of the msgbuf.txt from the dump files.

      Sleeping thread (tid 100120, pid 18361) owns a non-sleepable lock
      KDB: stack backtrace of thread 100120:
      sched_switch() at sched_switch+0x8ad/frame 0xfffffe04617932e0
      mi_switch() at mi_switch+0xe6/frame 0xfffffe0461793310
      sleepq_wait() at sleepq_wait+0x2c/frame 0xfffffe0461793340
      _sx_xlock_hard() at _sx_xlock_hard+0x306/frame 0xfffffe04617933f0
      bxe_ioctl() at bxe_ioctl+0x689/frame 0xfffffe0461793440
      if_delmulti() at if_delmulti+0x125/frame 0xfffffe0461793480
      vlan_setmulti() at vlan_setmulti+0x43/frame 0xfffffe04617934c0
      vlan_ioctl() at vlan_ioctl+0x8c/frame 0xfffffe0461793540
      inp_setmoptions() at inp_setmoptions+0x1711/frame 0xfffffe0461793710
      ip_ctloutput() at ip_ctloutput+0x11d/frame 0xfffffe0461793760
      rip_ctloutput() at rip_ctloutput+0x133/frame 0xfffffe0461793790
      sosetopt() at sosetopt+0xb2/frame 0xfffffe04617937f0
      kern_setsockopt() at kern_setsockopt+0xca/frame 0xfffffe0461793860
      sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe0461793880
      amd64_syscall() at amd64_syscall+0xa38/frame 0xfffffe04617939b0
      fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe04617939b0
      --- syscall (105, FreeBSD ELF64, sys_setsockopt), rip = 0x80093195a, rsp = 0x7fffffffea28, rbp = 0x7fffffffea70 ---
      panic: sleeping thread
      cpuid = 2
      KDB: enter: panic
      

      From my limited understanding of the log, it seems I am experiencing the same issues in these threads from four years ago.

      https://redmine.pfsense.org/issues/4685

      https://forum.netgate.com/topic/87506/pfsense-2-2-x-panics-with-sleeping-thread-owns-a-non-sleepable-lock

      As far as I can tell I am not running an ARP Proxy, and the bug was resolved in the 2.2.x branch of pfSense.

      Can anyone provide any insight into what may have caused this?

      Attached are the two sets of dump files: one with the custom kernel module (0) and one with the default kernel driver (2).

      textdump.0.tar
      textdump.2.tar

      Thank you in advance for any help provided.
      Mark.

        stephenw10 Netgate Administrator

        Hmm, identical backtraces, definitely looks like a software issue:

        db:0:kdb.enter.default>  show pcpu
        cpuid        = 2
        dynamic pcpu = 0xfffffe045c2a8380
        curthread    = 0xfffff80007465620: pid 12 "swi1: netisr 4"
        curpcb       = 0xfffffe03db1c3a80
        fpcurthread  = none
        idlethread   = 0xfffff800073ac000: tid 100005 "idle: cpu2"
        curpmap      = 0xffffffff82b85998
        tssp         = 0xffffffff82bb68e0
        commontssp   = 0xffffffff82bb68e0
        rsp0         = 0xfffffe03db1c3a80
        gs32p        = 0xffffffff82bbd138
        ldt          = 0xffffffff82bbd178
        tss          = 0xffffffff82bbd168
        db:0:kdb.enter.default>  bt
        Tracing pid 12 tid 100032 td 0xfffff80007465620
        kdb_enter() at kdb_enter+0x3b/frame 0xfffffe03db1c3510
        vpanic() at vpanic+0x194/frame 0xfffffe03db1c3570
        panic() at panic+0x43/frame 0xfffffe03db1c35d0
        propagate_priority() at propagate_priority+0x2b2/frame 0xfffffe03db1c3600
        turnstile_wait() at turnstile_wait+0x319/frame 0xfffffe03db1c3650
        __rw_rlock_hard() at __rw_rlock_hard+0x292/frame 0xfffffe03db1c36e0
        rip_input() at rip_input+0x2bb/frame 0xfffffe03db1c3750
        igmp_input() at igmp_input+0x173/frame 0xfffffe03db1c3810
        ip_input() at ip_input+0x139/frame 0xfffffe03db1c3870
        swi_net() at swi_net+0x143/frame 0xfffffe03db1c38e0
        intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe03db1c3920
        ithread_loop() at ithread_loop+0xe7/frame 0xfffffe03db1c3970
        fork_exit() at fork_exit+0x83/frame 0xfffffe03db1c39b0
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe03db1c39b0
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        db:0:kdb.enter.default>  ps
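
One way to check whether two dumps really share the same call chain is to reduce each backtrace to its function names and diff the results. The following is only a minimal sketch: `bt_funcs` is a hypothetical helper name, and it assumes the input uses the `function() at function+0x.../frame ...` layout shown in this thread.

```shell
# Print only the function names from KDB/DDB backtrace lines read on stdin.
# bt_funcs is a hypothetical helper, not part of pfSense or FreeBSD.
bt_funcs() {
    # Match lines such as "bxe_ioctl() at bxe_ioctl+0x689/frame 0x..."
    # and keep the leading function name; every other line is dropped.
    sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)() at .*/\1/p'
}

# Example use, assuming each textdump was extracted into its own directory
# (dump0/ and dump2/ are placeholder directory names):
#   bt_funcs < dump0/msgbuf.txt > bt0.txt
#   bt_funcs < dump2/msgbuf.txt > bt2.txt
#   diff bt0.txt bt2.txt && echo "same call chain"
```

Comparing names only ignores the frame addresses, which can differ between boots even when the code path is the same.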
        

        And you only see that when both ports are assigned and in use?

        It only links at 2.5G with the custom driver I assume? What was the second port being used for?

        Steve

          MarkFarr

          Hi Stephen, thank you for replying.

          > And you only see that when both ports are assigned and in use?

          Yes. Today was the first time I have ever experienced a kernel panic with pfSense, and I have run the distribution for about 5 years.

          > It only links at 2.5G with the custom driver I assume? What was the second port being used for?

          Yes, the custom driver was made to squeeze even more speed out of our 1.5 Gbit/s fiber-to-the-home lines. I have since removed the custom kernel module.

          The second port on the card was not being used initially, because I was not able to figure out why it was not connecting to VLAN 1 by default. I had to explicitly create VLAN 1 and assign it to the second port (bxe1.1).

          My pfSense server is connected to the internet as follows: the Bell-provided GPON module is inserted into Port 1 of a Ubiquiti ES-16-XG switch, where it negotiates a speed of 2.5 Gbps. An SFP+ DAC then runs from port 2 on the switch to the Broadcom card in my pfSense server, which negotiates a speed of 10 Gbps. I think that with my current setup I am not reaping the benefits of the custom driver.

          Therefore I have Internet on VLAN 35 on Ports 1, 2 and 13 of the switch, and I have VLAN 1 on the same switch on ports 11, 12, 15 and 16 for LAN access. Both ports on the pfSense server are connected to the same switch but on explicit VLANs. These VLANs are not trunked together.

          I will try posting images here of the switch's VLANs.

          [Image: LAN]

          [Image: WAN]

          One other thing I wanted to add is that I was running tcpdump on both bxe0 (WAN) and bxe1 (LAN) over the weekend, while also trying to figure out why my IPTV service was not behaving correctly.

          I hope this information helps.

            stephenw10 Netgate Administrator

            Hmm, well I would definitely not use VLAN1. Better to not ever use it as a tagged VLAN. It's hard to imagine the card would balk at it, but it will not have been tested. If one of the ports was using it and you still have VLAN hardware tagging off-loading enabled, I could just about imagine that as an issue.

            Yes, in that setup you would not be taking advantage of the driver. Though if the switch port can negotiate at 2.5Gb you're not losing anything either. The intention though is to have the Bell module directly in the Broadcom card I believe. I have no way to test that. I can only dream of those speeds! 😉

            Steve

              MarkFarr @stephenw10

              @stephenw10

              > VLAN hardware tagging off-loading enabled

              I am unfamiliar with this option. I don't see it in the System -> Advanced -> Networking section, nor in the System Tunables. Is this a driver-specific option?

              I don't see anything mentioned that is similar in the man page for the driver.
              https://man.openbsd.org/FreeBSD-11.1/bxe.4

              Thank you again for your ongoing help.

                stephenw10 Netgate Administrator

                Check the ifconfig output for the bxe NICs for things like VLAN_HWTAGGING,VLAN_HWCSUM,VLAN_HWFILTER.
                There's no GUI knob for that, but you can disable it if required. I'm not aware of any issue with it, but no one uses VLAN1, so...
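
A rough sketch of that check from a shell on the firewall might look like the following. `vlan_hw_caps` is a hypothetical helper name. The `-vlanhwtag`, `-vlanhwfilter` and `-vlanhwcsum` flags come from FreeBSD's ifconfig(8), but whether the bxe driver honours each one is an assumption here, so treat this as a starting point only.

```shell
# List the VLAN hardware-offload capabilities found in ifconfig-style text
# read on stdin. vlan_hw_caps is a hypothetical helper, not a pfSense tool.
vlan_hw_caps() {
    # Pull out only the VLAN_HW* capability names, one per line, de-duplicated.
    grep -Eo 'VLAN_HW(TAGGING|CSUM|FILTER|TSO)' | sort -u
}

# Usage on the firewall (interface names taken from this thread):
#   ifconfig bxe0 | vlan_hw_caps
#   ifconfig bxe1 | vlan_hw_caps
#
# To switch the offloads off until the next reboot or interface reconfigure:
#   ifconfig bxe1 -vlanhwtag -vlanhwfilter -vlanhwcsum
```

If the panics stop with the offloads disabled, the setting would still need to be applied again at boot to stick.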

                Steve

                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.