Netgate Discussion Forum

    Found a bugfix, how to get it added to the wiki?

    General pfSense Questions
      kieranc

      I was having daily kernel panics on a box with a 4-port igb NIC, and I found a setting that fixed it:

      hw.igb.num_queues=1
      

      It was on the "Tuning and Troubleshooting Network Cards" page of the wiki until it was removed in 2015, but apparently it's still useful. How can I get it put back on there?
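
For readers who want to try the same workaround: `hw.igb.num_queues` is a boot-time loader tunable, so (as far as I know) on pfSense it belongs in `/boot/loader.conf.local` and only takes effect after a reboot. Below is a minimal Python sketch of setting such a line exactly once in loader.conf-style text. The helper name `set_loader_tunable` is hypothetical, and the function only transforms text; reading and writing the actual file is left out on purpose.

```python
def set_loader_tunable(conf_text: str, name: str, value: str) -> str:
    """Return conf_text with a single `name=value` line (hypothetical helper).

    An existing assignment of `name` is replaced in place, later duplicates
    are dropped, and commented-out lines are left untouched.
    """
    new_line = f"{name}={value}"
    out = []
    replaced = False
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == name:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # skip duplicate assignments of the same tunable
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    existing = "kern.ipc.nmbclusters=1000000\nhw.igb.num_queues=4\n"
    print(set_loader_tunable(existing, "hw.igb.num_queues", "1"), end="")
```

Replacing in place rather than appending keeps the file free of conflicting assignments, which makes it easier to revert the workaround later.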

        mwp821

        It's still in the pfSense book.

        I'm having a similar daily kernel panic that I can't solve; maybe I'll try this.

          kieranc

          My panics looked like this:

          
          Fatal trap 12: page fault while in kernel mode
          cpuid = 1; apic id = 01
          fault virtual address  = 0xb0
          fault code    = supervisor read data, page not present
          instruction pointer  = 0x20:0xffffffff80d0e74d
          stack pointer          = 0x28:0xfffffe00f0fd2624
          frame pointer          = 0x28:0xfffffe00f0fd2650
          code segment    = base 0x0, limit 0xfffff, type 0x1b
                = DPL 0, pres 1, long 1, def32 0, gran 1
          processor eflags  = interrupt enabled, resume, IOPL = 0
          current process    = 12 (irq273: igb2:que 1)
          
          
          
          Fatal trap 9: general protection fault while in kernel mode
          cpuid = 2; apic id = 02
          instruction pointer  = 0x20:0xffffffff80d0e762
          stack pointer          = 0x28:0xfffffe00f0f88624
          frame pointer          = 0x28:0xfffffe00f0f88650
          code segment    = base 0x0, limit 0xfffff, type 0x1b
                = DPL 0, pres 1, long 1, def32 0, gran 1
          processor eflags  = interrupt enabled, resume, IOPL = 0
          current process    = 0 (igb2 taskq)
          
          

          @mwp821:

          It's still in the pfSense book.

          This is entirely possible, but it's not on the wiki and that's why you didn't find it yet :-)
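
The `current process` lines in the two panics above are what tie the crashes to `igb2`: one names an interrupt thread for queue 1, the other the driver's task queue. As a rough illustration, the Python sketch below pulls those fields out of such a line. The function name and the two thread-name patterns are assumptions based only on the two samples in this thread, not a complete description of FreeBSD thread naming.

```python
import re

# Matches e.g. "current process    = 12 (irq273: igb2:que 1)"
PROC_RE = re.compile(r"current process\s*=\s*(\d+)\s*\((.*)\)")
# Thread-name shapes seen in this thread only (assumed, not exhaustive).
QUEUE_RE = re.compile(r"^irq(\d+):\s*([a-z]+\d+):que (\d+)$")
TASKQ_RE = re.compile(r"^([a-z]+\d+) taskq$")


def parse_current_process(line):
    """Return pid, thread name and, where recognisable, irq/nic/queue."""
    m = PROC_RE.search(line)
    if m is None:
        return None
    info = {"pid": int(m.group(1)), "thread": m.group(2).strip(),
            "irq": None, "nic": None, "queue": None}
    q = QUEUE_RE.match(info["thread"])
    t = TASKQ_RE.match(info["thread"])
    if q:
        info.update(irq=int(q.group(1)), nic=q.group(2), queue=int(q.group(3)))
    elif t:
        info["nic"] = t.group(1)
    return info


if __name__ == "__main__":
    for sample in ("current process    = 12 (irq273: igb2:que 1)",
                   "current process    = 0 (igb2 taskq)"):
        print(parse_current_process(sample))
```

Seeing the same NIC in every panic is only a hint about where to look; it does not by itself show that the driver is at fault.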

            johnpoz LAYER 8 Global Moderator

            If you can explain how/why it fixes the panic, or point me to some sources that give such details, I'd be happy to add it to the wiki.

            Looking for the reference in the book now. You're saying it's in the current version of the book that is available to gold members? Or in an old copy and not in the current one?

            edit:
            So I found it in the book. Nice little explanation, etc. But I'm not going to take the passage from the book and paste it into the wiki. Are you sure it's not in the wiki?

            OK, looking at the edits, Chris removed those comments for some reason? Hmmm, I don't know enough about it to know whether there was a specific reason it was removed or whether it was an oversight. I wouldn't really feel comfortable putting it directly back in… But I could add a reference to this thread, I would think.

            An intelligent man is sometimes forced to be drunk to spend time with his fools
            If you get confused: Listen to the Music Play
            Please don't Chat/PM me for help, unless mod related
            SG-4860 24.11 | Lab VMs 2.7.2, 24.11

              kieranc

              As you please, I don't see the harm of it being in the wiki as "something to try if it's being weird", and I don't know why it was removed.
              My box has now been up for over 2 weeks, after rebooting daily before I applied the tweak. So while I don't know how it works, I can confirm that it resolved my issue, and I think it has value.

                jimp Rebel Alliance Developer Netgate

                It was on the wiki page as a crutch to help with mbuf issues. Those have been fixed. It wasn't really a fix, just a workaround. The instances linked in the book as being related to queues have been fixed as well.

                If you still have a crash with igb that is helped by reducing the queues, it's better to find out why than to just reduce the queue count. Dig deeper in the crash dumps/back trace and see where the crash is happening. It may not even really be from igb, but reducing the queue counts may hide the problem.

                We could add that back into the wiki but without any specific guidance as to when/why someone might want to try it, I'm hesitant to do so. The fact that it panics isn't enough, we need to know more about what is actually causing that panic.

                Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                Need help fast? Netgate Global Support!

                Do not Chat/PM for help!

                  johnpoz LAYER 8 Global Moderator

                  Great response - thanks Jimp

                    kieranc

                    @jimp:

                    It was on the wiki page as a crutch to help with mbuf issues. Those have been fixed. It wasn't really a fix, just a workaround. The instances linked in the book as being related to queues have been fixed as well.

                    Maybe the bug has reappeared, or the fixes aren't working any more? A regression?

                    @jimp:

                    If you still have a crash with igb that is helped by reducing the queues, it's better to find out why than to just reduce the queue count. Dig deeper in the crash dumps/back trace and see where the crash is happening. It may not even really be from igb, but reducing the queue counts may hide the problem.

                    We could add that back into the wiki but without any specific guidance as to when/why someone might want to try it, I'm hesitant to do so. The fact that it panics isn't enough, we need to know more about what is actually causing that panic.

                    I don't really have the knowledge, time or inclination to dig deeper; I just want the box to work. It's a cheap Chinese Qotom machine which came with pfSense preinstalled (yes, I've wiped it), so it could easily be something to do with their implementation.

                    It was rebooting every day, I tried some stuff, looked on the wiki, didn't find anything useful, tried a bunch more stuff, found an old cached version of the wiki that included this information, bang, no more reboots.

                    I appreciate you both taking the time to respond, but 'we don't know why this fixes it' feels like a poor reason not to include it on a troubleshooting page.

                      jimp Rebel Alliance Developer Netgate

                      The bug didn't reappear, but it's possible your hardware has a different bug or problem.

                      Putting "try this, it might fix it but we don't know why" on the wiki is definitely a bad thing. We need to know why the hardware is crashing with more than one queue.

                      In most cases it's as simple as posting the full crash report that shows up after the panic and reboot. The backtrace will likely have better information, and the message buffer may have some clues as well.

                        johnpoz LAYER 8 Global Moderator

                        "Putting "try this, it might fix it but we don't know why""

                        hehehehee - haahahahah… Why is that, Jim?? ROFL... Best remark I have seen all day, anywhere!!

                          kieranc

                          @jimp:

                          The bug didn't reappear, but it's possible your hardware has a different bug or problem.

                          Putting "try this, it might fix it but we don't know why" on the wiki is definitely a bad thing. We need to know why the hardware is crashing with more than one queue.

                          In most cases it's as simple as posting the full crash report that shows up after the panic and reboot. The backtrace will likely have better information, and the message buffer may have some clues as well.

                          Well, I sent the full crash logs via the GUI, so they're available somewhere. Sadly they seem to have been deleted automatically from my box.

                          What I'm failing to understand is the difference between this and all the other tweaks on the troubleshooting page. Are they all better understood than this one?

                            jimp Rebel Alliance Developer Netgate

                            Each of the crashes had an identical backtrace:

                            db:0:kdb.enter.default>  bt
                            Tracing pid 12 tid 100057 td 0xfffff80003d15560
                            rn_match() at rn_match+0x11d/frame 0xfffffe00f0fd2650
                            fib4_lookup_nh_basic() at fib4_lookup_nh_basic+0x84/frame 0xfffffe00f0fd26b0
                            ip_findroute() at ip_findroute+0x31/frame 0xfffffe00f0fd26e0
                            ip_tryforward() at ip_tryforward+0x1f7/frame 0xfffffe00f0fd2750
                            ip_input() at ip_input+0x3c5/frame 0xfffffe00f0fd27b0
                            netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe00f0fd2800
                            ether_demux() at ether_demux+0x16d/frame 0xfffffe00f0fd2830
                            ether_nh_input() at ether_nh_input+0x310/frame 0xfffffe00f0fd2890
                            netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe00f0fd28e0
                            ether_input() at ether_input+0x26/frame 0xfffffe00f0fd2900
                            igb_rxeof() at igb_rxeof+0x6f4/frame 0xfffffe00f0fd2990
                            igb_msix_que() at igb_msix_que+0x109/frame 0xfffffe00f0fd29e0
                            intr_event_execute_handlers() at intr_event_execute_handlers+0xec/frame 0xfffffe00f0fd2a20
                            ithread_loop() at ithread_loop+0xd6/frame 0xfffffe00f0fd2a70
                            fork_exit() at fork_exit+0x85/frame 0xfffffe00f0fd2ab0
                            fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00f0fd2ab0
                            
                            

                            That code path is fairly deep in routing and packet processing, but not in an area we typically see issues. Doesn't look like mbuf exhaustion, no trace of ALTQ, and it doesn't match any of the previous queue-related panics that we have seen.

                            It's possible there is some new FreeBSD issue that is only affected by the specific combination of hardware you have, or it could be that hardware just can't handle the load of multiple queues. Given what you said the hardware is, I am more inclined to blame the hardware.
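
To make the "identical backtrace" comparison above concrete, here is a small Python sketch that reduces a DDB `bt` listing to its ordered function names and compares two crashes on that basis. The function names `backtrace_functions` and `same_code_path` are hypothetical, and the frame format is assumed from the trace quoted in this post. Frame addresses are deliberately ignored, since the stack location can differ between crashes that follow the same code path.

```python
import re

# Matches DDB frame lines such as:
#   rn_match() at rn_match+0x11d/frame 0xfffffe00f0fd2650
FRAME_RE = re.compile(
    r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+\s*$"
)


def backtrace_functions(text):
    """Return the function names from a DDB backtrace, innermost first."""
    names = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            names.append(m.group(1))
    return names


def same_code_path(trace_a, trace_b):
    """True if both traces contain the same functions in the same order."""
    return backtrace_functions(trace_a) == backtrace_functions(trace_b)


if __name__ == "__main__":
    sample = """db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100057 td 0xfffff80003d15560
rn_match() at rn_match+0x11d/frame 0xfffffe00f0fd2650
fib4_lookup_nh_basic() at fib4_lookup_nh_basic+0x84/frame 0xfffffe00f0fd26b0
ip_findroute() at ip_findroute+0x31/frame 0xfffffe00f0fd26e0
"""
    print(backtrace_functions(sample))
```

The innermost frame (`rn_match` in the trace above) is where the fault occurred, which is why the discussion centres on route lookup rather than on the igb receive path further down the stack.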

                              kieranc

                              Fair enough, so if we accept that it's a hardware issue and there's a workaround that mitigates it…. Does it qualify for the wiki? I'd just like other people having the same problem to be able to find the workaround. I wasted more than a few hours searching for it.

                                jimp Rebel Alliance Developer Netgate

                                I added a note about it a few hours ago, but I'm still not terribly happy about it being there. It's a kludge and the actual problem underneath it needs to be addressed, but if it's specific to your hardware and FreeBSD in general, you'll need to work with them on it. It's also possible it's the nature of that hardware and can't be fixed.

                                  kieranc

                                  Thanks, I appreciate it. If I get time to dig into it further, I'll do so.

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.