Kernel crash - nmbufs?
-
What have you changed the mbuf sizes to?
-
It's set to 1,000,000 now, and we're still experiencing the issue (the unit has 16GB RAM in it, so it should be able to handle that).
The dashboard panel, and the RRD graphs for MBUF usage show it sitting idle at 1% usage - so unless it's an instantaneous spike - it doesn't look like we're actually reaching that cap and it's a red herring to some degree.
Can anybody clarify what the bge2+ section means? We're not actually using interface bge2 - instead bge0, bge4, and bge5… so seeing 2+ seems odd?
-
Crash log attached…
[pfsense crash.txt](/public/imported_attachments/1/pfsense crash.txt)
-
on a pfSense 2.2.3 setup as a transparent bridge,
Can you briefly explain what sits in front of the pfSense box and what sits behind it?
As an example:
Internet --- ISP --- modem --- Cisco Router --- pfSense --- LAN Switch --- LAN
-
Internet -- ISP link (colo'd kit) -- pfSense as bridge -- LAN switch -- LAN
There's 2 interfaces making the bridge, and an extra interface on a management network.
-
pfSense as bridge
Is bridging the ports together a hard requirement for you, or could you also try routing, to find out whether the problem is tied to the bridge?
-
Can you replace the hardware or the physical NICs?
If the kernel is panicking, something really bad is happening. My quick guess is hardware failing and would recommend testing on new or replacement hardware.
-
Bridge setup is a definite requirement. We've got very similar hardware doing NAT / routing as well, and that's toddling along quite happily by itself.
Can replace the NICs without a prob - any users have strong recommendations? This is production grade, requiring 1Gb RJ45 connectivity…
Looking through the tuning stuff, seems like a lot of Broadcom and Intel cards may have similar probs with nmbufs. Looks like it might be bge0 or bge2+ which is failing (though I still don't get the 2+ bit). There's a PCI card in there as well as the onboard (ie. daughter card), so trying to ID which one is causing the issue could be fun!
-
Looking through the tuning stuff,
Tuning is not a must; it is more of an optional extra. One queue is opened per CPU core for each LAN port, so an 8-core CPU opens 8 queues for a single LAN port. That can get tricky if there is not enough mbuf space, so raising the mbuf size is a real gain for many of us.
seems like a lot of Broadcom
This all depends on the driver. The better the driver support, the better your pfSense will work with the LAN ports. At the moment you will do really well with Intel cards: an Intel dual- or quad-port server adapter, i210, i350 or i354 would be the best of the older and newer ones.
and Intel cards may have similar probs with nmbufs.
Once again, this is a problem with the FreeBSD kernel space size, which has grown historically. To free up room there we can now raise the mbuf size, which is easily done by adding some RAM to the pfSense box, along with the other tuning options listed on the page behind your link above.
-
What is kern.ipc.nmbufs set to on your system? Run:
sysctl kern.ipc.nmbufs
to see.
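Alongside the sysctl value, it can help to compare current cluster usage against the limit from the shell, since a short spike may not show up on the dashboard or RRD graphs. A rough sketch, assuming the FreeBSD `netstat -m` line has the shape `current/cache/total/max mbuf clusters in use`; the helper name `mbuf_cluster_pct` is made up for illustration and is not part of pfSense:

```shell
# Hypothetical helper: reads `netstat -m` output on stdin and prints
# mbuf cluster usage as a percentage of the configured maximum.
# Assumes a line shaped like:
#   1022/1008/2030/1000000 mbuf clusters in use (current/cache/total/max)
mbuf_cluster_pct() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")            # f[1] = current, f[4] = max
        if (f[4] > 0) printf "%.1f\n", 100 * f[1] / f[4]
    }'
}

# On the firewall itself (FreeBSD) this would be used as:
#   netstat -m | mbuf_cluster_pct
```

Running it in a loop with a short `sleep` is one way to look for brief peaks, though it will still miss anything shorter than the sampling interval.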
-
kern.ipc.nmbufs: 1,019,445
(for a little while, pre-reboot, it was set to >1mill in the tunables.)
We haven't actually had it panic in > 30 hrs now, which is the longest it's gone without any interruption in about 2 weeks…
-
kern.ipc.nmbufs: 1,019,445
(for a little while, pre-reboot, it was set to >1mill in the tunables.)
We haven't actually had it panic in > 30 hrs now, which is the longest it's gone without any interruption in about 2 weeks…
Perhaps you could tell us the hardware specs of the pfSense box itself, such as CPU, core count and SSD/HDD? That might help bring more stability to the whole box.
-
kern.ipc.nmbufs: 1,019,445
(for a little while, pre-reboot, it was set to >1mill in the tunables.)
Ok that's fine, maybe those logs were from before that change was applied. Just wanted to make sure since nmbclusters is usually what gets set, that it didn't somehow get set differently.
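If you want to double-check which of the two actually got set, something like the following could help. It is only a sketch: it assumes the values were written to a loader.conf-style file such as `/boot/loader.conf.local` (values entered through the GUI tunables page may live elsewhere), and `list_mbuf_tunables` is a made-up helper name:

```shell
# Hypothetical helper: print any uncommented nmbufs/nmbclusters
# assignments found in a loader.conf-style file given as $1.
list_mbuf_tunables() {
    grep -E '^[[:space:]]*kern\.ipc\.(nmbufs|nmbclusters)=' "$1"
}

# Assumed location on the firewall; adjust if your setup differs:
#   list_mbuf_tunables /boot/loader.conf.local
```

No output means neither tunable is set in that file, in which case the running values come from somewhere else or are the defaults.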
-
Just to put some closure on this - looks like the problem has just 'gone away'.
Changing it to 1mill (but not over) certainly helped, but didn't resolve it completely. Nothing has changed since in the pfSense config, but it's just not occurring anymore…
-
Probably well worthwhile to update to 2.2.5.
In your case there may be a small "risk" in that you don't really know what "fixed" your issue, but the stability of 2.2.5 over older releases is worth it in my mind.