Netgate Discussion Forum

    Kernel crash - nmbufs?

    General pfSense Questions
    16 Posts 5 Posters 3.2k Views
    • jasperdillon

      We're periodically (reasonably regularly) seeing kernel panics on a pfSense 2.2.3 box set up as a transparent bridge, with Broadcom NICs using the bge driver (Dell PowerEdge server).

      Can post the full crash message, but it ends with

      ….
      <118>Bootup complete
      [zone: mbuf] kern.ipc.nmbufs limit reached
      [zone: mbuf] kern.ipc.nmbufs limit reached
      [zone: mbuf] kern.ipc.nmbufs limit reached
      [zone: mbuf] kern.ipc.nmbufs limit reached

      Fatal trap 12: page fault while in kernel mode
      cpuid = 4; apic id = 04
      fault virtual address              = 0x1d
      fault code                              = supervisor read data, page not present
      instruction pointer = 0x20:0xffffffff80b90647
      stack pointer                  = 0x28:0xfffffe001e1b56f0
      frame pointer                = 0x28:0xfffffe001e1b5770
      code segment                        = base 0x0, limit 0xfffff, type 0x1b
                                                      = DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags  = interrupt enabled, resume, IOPL = 0
      current process                    = 12 (irq16: bge0 bge2+)
      version.txt: FreeBSD 10.1-RELEASE-p13 #0 c77d1b2(releng/10.1)-dirty: Tue Jun 23 17:00:47 CDT 2015
          root@pfs22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10

      The NICs stop passing traffic while it recovers  (which it almost always does).

      We've made the config changes as per https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Broadcom_bge.284.29_Cards
      but it still seems to be occurring.

      From what I read there, it looks like it's bge0 and bge2 that are failing (bge2 isn't even wired up or configured, and bge0 isn't a member of the bridge, so it handles very little traffic).

      Any further thoughts beyond the ones in the Tuning article? Even if it's 'replace the NIC with Intel model XYZ', we're open to suggestions.

      • Guest

        What have you changed the mbuf sizes to?

        • jasperdillon

          It's set to 1,000,000 now, and we're still experiencing the issue. (The unit has 16GB RAM in it, so it should be able to handle that.)

          The dashboard panel and the RRD graphs for MBUF usage show it sitting idle at 1% usage, so unless it's an instantaneous spike, it doesn't look like we're actually reaching that cap, and the message is a red herring to some degree.

          Can anybody clarify what the bge2+ part means? We're not actually using interface bge2 (we use bge0, bge4, and bge5), so seeing 2+ seems odd.
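
One rough way to test the spike theory, offered as a sketch only: the dashboard and RRD graphs sample periodically, so a short burst could be missed, whereas `netstat -m` on FreeBSD keeps a running count of denied mbuf requests alongside current usage. The helper names below (`mbuf_pct`, `mbufs_current`, `mbufs_denied`) are made up for illustration, and the parsing assumes the usual `netstat -m` line layout quoted in the comments.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from pfSense or FreeBSD.

# Print mbuf usage as a whole-number percentage of the limit.
# $1 = mbufs currently in use, $2 = limit (e.g. kern.ipc.nmbufs)
mbuf_pct() {
    echo $(( $1 * 100 / $2 ))
}

# Read `netstat -m` output on stdin and print the "current" mbuf count.
# Assumes a line like: "1026/1539/2565 mbufs in use (current/cache/total)"
mbufs_current() {
    awk '/mbufs in use/ { split($1, f, "/"); print f[1]; exit }'
}

# Read `netstat -m` output on stdin and print the denied-mbuf counter.
# Assumes a line like:
# "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)"
mbufs_denied() {
    awk '/requests for mbufs denied/ { split($1, f, "/"); print f[1]; exit }'
}

# On the firewall itself this would be used roughly as:
#   netstat -m | mbufs_denied
#   mbuf_pct "$(netstat -m | mbufs_current)" "$(sysctl -n kern.ipc.nmbufs)"
```

If the denied counter stays at 0 across a panic-free period but climbs just before a crash, the limit really is being hit; if it never moves, the "limit reached" lines are more likely a symptom than the cause.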

          • jasperdillon

            Crash log attached…

            [pfsense crash.txt](/public/imported_attachments/1/pfsense crash.txt)

            • Guest

              on a pfSense 2.2.3 setup as a transparent bridge,

              Can you briefly explain what sits in front of and behind the pfSense box?
              For example:
              Internet --- ISP --- modem --- Cisco Router --- pfSense --- LAN Switch --- LAN

              • jasperdillon

                Internet -- ISP link (colo'd kit) -- pfSense as bridge -- LAN switch -- LAN

                There are 2 interfaces making up the bridge, and an extra interface on a management network.

                • Guest

                  pfSense as bridge

                  Is bridging the ports together a must for you, or could you also try routing,
                  to get closer to establishing whether or not the bridge is the cause of the problem?

                  • tim.mcmanus

                    Can you replace the hardware or the physical NICs?

                    If the kernel is panicking, something really bad is happening. My quick guess is failing hardware, and I would recommend testing on new or replacement hardware.

                    • jasperdillon

                      Bridge setup is a definite requirement. We've got very similar hardware doing NAT / routing as well, and that's toddling along quite happily by itself.

                      Can replace the NICs without a problem. Do any users have strong recommendations? This is production grade, requiring 1Gb RJ45 connectivity…
                      Looking through the tuning stuff, seems like a lot of Broadcom and Intel cards may have similar probs with nmbufs.

                      Looks like it might be bge0 or bge2+ which is failing (though I still don't get the 2+ bit). There's a PCI card in there as well as the onboard (i.e. daughter card), so trying to ID which one is causing the issue could be fun!
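
On the "2+" question, a hedged note: as far as FreeBSD's interrupt naming goes, `irq16: bge0 bge2+` is the name of the interrupt thread for IRQ 16, built from the devices sharing that line, and a trailing `+` marks a sharer whose name did not fit. So the panic happened in the handler thread for a shared interrupt, which does not by itself say bge2 was at fault. The function below is a made-up helper (`irq_sharers` is not a system command) that just splits such a name apart.

```shell
#!/bin/sh
# Hypothetical helper: split an interrupt thread name such as
# "irq16: bge0 bge2+" into the listed devices and a truncation flag.
# Assumption: a trailing "+" means more handlers share the IRQ than
# the name had room to list.
irq_sharers() {
    name=$1
    devs=${name#*: }              # drop the "irq16: " prefix
    case $devs in
        *+) truncated=yes; devs=${devs%+} ;;
        *)  truncated=no ;;
    esac
    echo "devices: $devs; more sharers not shown: $truncated"
}

# Example with the name from the crash report:
#   irq_sharers "irq16: bge0 bge2+"
# On the box itself, `vmstat -i` lists interrupt sources and their counts.
```

Comparing the `vmstat -i` listing against which ports sit on the PCI card and which are onboard may help narrow down the physical NIC involved.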

                      • Guest

                        Looking through the tuning stuff,

                        It is not a must, more a could-be-done thing. With each CPU core, one queue is opened
                        per LAN port! So an 8-core CPU opens 8 queues for just one LAN port, and this can get tricky
                        if there isn't enough space, so raising the mbuf size will be a real gain for many of us.

                        seems like a lot of Broadcom

                        This is all driver-dependent. The better the driver support, the better
                        pfSense will work with the LAN ports. At the moment you will do really
                        well with Intel cards! An Intel dual- or quad-port server adapter, i210, i350 or i354 would be
                        the best among the older and newer ones.

                        and Intel cards may have similar probs with nmbufs.

                        Once again, this is a problem with the FreeBSD kernel space size, which has grown historically
                        up to today. To free up space in this kernel space, we can now
                        raise the mbuf size, and this can be done easily by adding some RAM to the pfSense
                        box, as well as via the other tuning items listed on the page under your link above.

                        • cmb

                          What is kern.ipc.nmbufs set to on your system? Run:

                          sysctl kern.ipc.nmbufs
                          

                          to see.
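
A small sketch along the same lines, for checking that the live value matches what was put in the tunables. `check_tunable` is a hypothetical helper written for this example, and the expected value of 1000000 is simply the figure mentioned earlier in the thread; `sysctl -n` prints the bare value without the name.

```shell
#!/bin/sh
# Hypothetical helper: compare a live sysctl value with the value that
# was expected to be configured.
# $1 = tunable name, $2 = live value, $3 = expected value
check_tunable() {
    if [ "$2" -eq "$3" ]; then
        echo "$1 ok ($2)"
    else
        echo "$1 mismatch: live=$2 expected=$3"
    fi
}

# On the firewall this would be called roughly as:
#   check_tunable kern.ipc.nmbclusters "$(sysctl -n kern.ipc.nmbclusters)" 1000000
#   check_tunable kern.ipc.nmbufs      "$(sysctl -n kern.ipc.nmbufs)"      1000000
```

A mismatch is not necessarily an error, since the kernel may derive one limit from the others, but it shows at a glance which value is actually in effect after a reboot.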

                          • jasperdillon

                            kern.ipc.nmbufs: 1,019,445
                            (for a little while, pre-reboot, it was set to >1mill in the tunables.)

                            We haven't actually had it panic in > 30 hrs now, which is the longest it's gone without any interruption in about 2 weeks…

                            1 Reply Last reply Reply Quote 0
                            • ? This user is from outside of this forum
                              Guest
                              last edited by

                              @jasperdillon:

                              kern.ipc.nmbufs: 1,019,445
                              (for a little while, pre-reboot, it was set to >1mill in the tunables.)

                              We haven't actually had it panic in > 30 hrs now, which is the longest it's gone without any interruption in about 2 weeks…

                              Perhaps you should tell us some hardware specs of the pfSense box itself, like CPU,
                              cores and SSD/HDD. That might help bring more stability to the whole box.

                              • cmb

                                @jasperdillon:

                                kern.ipc.nmbufs: 1,019,445
                                (for a little while, pre-reboot, it was set to >1mill in the tunables.)

                                OK, that's fine; maybe those logs were from before that change was applied. Just wanted to make sure, since nmbclusters is usually what gets set, that it didn't somehow get set differently.

                                • jasperdillon

                                  Just to put some closure on this - looks like the problem has just 'gone away'.
                                  Changing it to 1mill (but not over) certainly helped, but didn't resolve it completely.

                                  Nothing has changed since in the pfSense config, but it's just not occurring anymore…

                                  • divsys

                                    Probably well worthwhile to update to 2.2.5.

                                    In your case there may be a small "risk" in that you don't really know what "fixed" your issue, but the stability of 2.2.5 over older releases is worth it in my mind.

                                    -jfp

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.