MBUF usage at 87%



  • Sorry for the simple question folks, but I couldn't figure out what to do after searching.  I have a Supermicro A1SAM-2550F pfSense machine running version 2.2.1.  Since my motherboard has 4 Intel NICs I thought this might be the issue.  I have 8GB of Kingston ECC RAM.  For a while the MBUF usage was running at 33% but today I noticed it is quite high.  Is this a problem?



  • Perhaps this would help you out to tune your NICs a little bit.
    Tuning and Troubleshooting Network Cards


  • Netgate Administrator

    What actual MBUF values (used/max) are you seeing?

    Steve



  • I will post the exact used and max values tonight.  The used value was 87% but I don't remember out of how much.  I did reboot the computer last night and the values dropped back down to about 33%.  I did find the statement about 4 NICs in the link posted above and found the line kern.ipc.nmbclusters="1000000", but I honestly wasn't sure how to implement it.  I don't know how to get a command prompt within pfSense to make the change in the referenced file.  Thanks folks.



  • You don't need 1M mbufs.  Gonzopancho smacked me around for posting that suggestion in a separate thread about RCC-VE hardware, but also included some educational material.

    I have production firewalls with 40+ users that run with 25k mbufs, only actively using roughly 1800 or so.

    See:
    @gonzopancho:

    @almabes:

    Another tweak…

    Certain intel igb cards, especially multi-port cards, can very easily exhaust mbufs and cause kernel panics, especially on amd64. The following tweak will prevent this from being an issue:
    In /boot/loader.conf.local - Add the following (or create the file if it does not exist):
    kern.ipc.nmbclusters="1000000"
    That will increase the amount of network memory buffers, allowing the driver enough headroom for its optimal operation.

    see:  https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Intel_igb.284.29_and_em.284.29_Cards

    The kernel doesn't panic when you exhaust mbufs; it panics when you set this limit too high (and your number is too high), because the system runs out of memory.

    For each mbuf cluster, an “mbuf” structure is needed.  These each consume 256 bytes, and are used to organize mbuf clusters in chains. An mbuf cluster takes another 2048 bytes (or more, for jumbo frames).  It is possible to store roughly 100 bytes of additional data in the mbuf itself, but this is not always used.

    When there are no free mbuf clusters available, FreeBSD enters the zonelimit state and stops answering network requests. You can see it as the zoneli state in the output of the top command.  It doesn't panic, it appears to 'freeze' for network activity.

    If your box has 1GB of RAM or more, 25K mbuf clusters will be created by default.  Occasionally this is not enough.  If that is the case, then perhaps doubling that value, and maybe doubling again, are in order.  But 1M mbuf clusters?  Are you serious?

    You just advised people to consume 1,000,000 mbuf clusters (at 2K each).  Let me know if I need to explain how much RAM you needlessly advised people to allocate for no good purpose.

    I am well-aware that someone wrote something completely uninformed here: https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#mbuf_.2F_nmbclusters
    so please don't quote it back to me.
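To put numbers on the sizes quoted above, here is a minimal sketch of the arithmetic. It assumes the 256-byte mbuf plus 2048-byte cluster figures from the post (standard clusters, no jumbo frames) and treats the result as the ceiling the limit permits, not as memory that is necessarily in use. The function names are made up for illustration.

```shell
#!/bin/sh
# Upper bound on memory that N mbuf clusters can claim, using the sizes
# quoted in the post: one 256-byte mbuf plus one 2048-byte cluster each.
mbuf_cluster_bytes() {
    clusters=$1
    echo $(( clusters * (256 + 2048) ))
}

# Same figure in whole MiB (integer division, rounds down).
mbuf_cluster_mib() {
    echo $(( $(mbuf_cluster_bytes "$1") / 1048576 ))
}

mbuf_cluster_mib 25600      # a "25K" limit      -> prints 56
mbuf_cluster_mib 1000000    # the 1M wiki value  -> prints 2197
```

So a 1M limit allows roughly 2.2 GB to be tied up in network buffers, against about 56 MB for a 25K limit.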



  • My MBUF usage readings are below.
    37% (9876/26584).
    Is this acceptable?  As I said over about a period of a month or so I noticed the value was at 87%.
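For anyone comparing readings, the percentage is just used divided by max. A small sketch of that calculation follows; the function name is hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the figures posted in this thread.

```shell
#!/bin/sh
# Percentage of the mbuf cluster limit in use, rounded to the nearest
# whole percent. Arguments: used clusters, maximum clusters.
mbuf_usage_pct() {
    used=$1
    max=$2
    echo $(( (used * 100 + max / 2) / max ))
}

mbuf_usage_pct 9876 26584    # the reading above -> prints 37
```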


  • Netgate Administrator

    If it was up to 87% and still climbing then that's an issue because you don't want to run out. Try doubling it at first and keep an eye on the mbuf RRD graphs.
    To do that you want to add the line shown to the file /boot/loader.conf.local. You can do that from the GUI by executing the following in the Diagnostics > Command prompt box:

    echo 'kern.ipc.nmbclusters="50000"' >> /boot/loader.conf.local
    

    That will create the file. If you need to change it again you can do so via Diagnostics > Edit file. It only takes effect at boot though.

    Steve
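One caveat with the echo approach is that `>>` appends, so running it again with a new value leaves two nmbclusters lines in the file. A hedged sketch of a helper that replaces any existing setting instead is below. The function name `set_nmbclusters` is made up for illustration, and it only assumes standard `grep`, `printf`, `cat`, and `rm`.

```shell
#!/bin/sh
# Set kern.ipc.nmbclusters in a loader.conf-style file, replacing any
# earlier assignment rather than appending a duplicate line.
# Arguments: path to the file, new value.
set_nmbclusters() {
    file=$1
    value=$2
    tmp="${file}.tmp.$$"
    if [ -f "$file" ]; then
        # Keep every line except an existing nmbclusters assignment.
        # grep exits non-zero when nothing is left; that is fine here.
        grep -v '^kern\.ipc\.nmbclusters=' "$file" > "$tmp" || true
    else
        : > "$tmp"
    fi
    printf 'kern.ipc.nmbclusters="%s"\n' "$value" >> "$tmp"
    # Write back through cat so the original file's mode is kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as root on the firewall, then reboot):
#   set_nmbclusters /boot/loader.conf.local 50000
```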



  • My MBUF usage was sitting at 200006/~256000, or about 78% with no traffic going through the box. Granted it didn't go up much when I had small amounts of testing traffic but sitting at over 75% of capacity all the time really made me nervous.

    I added kern.ipc.nmbclusters="1000000" to the /boot/loader.conf.local file and now my MBUF usage is comfortably at 2%. Memory usage is also comfortably at 5% vs 3% previously. A small price to pay I think to ensure that my firewall won't stop passing traffic if things get busy.

    I suppose I could use a lower number. But the default just seemed off.


  • Netgate Administrator

    What CPU and NICs are you using?



  • I ran the command suggested by stephenw10 and the values are now 18% (9120/50000) after a reboot.  Thanks for the help everyone.  You guys were awesome.



  • I am using a Supermicro A1SRi-2758F.


  • Netgate Administrator

    Ah, OK. You will see a lot then. Cores X NICs X mbuf allocation = big.  :)

    Steve
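A rough way to read the "Cores X NICs X mbuf allocation" rule of thumb is as receive queues times NICs times buffers per queue. The sketch below is only that back-of-envelope model. One queue per core and 1024 receive descriptors per queue are illustrative assumptions, not values confirmed in this thread, and the function name is hypothetical.

```shell
#!/bin/sh
# Back-of-envelope estimate of clusters held by receive rings.
# Arguments: queues per NIC, number of NICs, receive descriptors per queue.
# All three inputs are assumptions to check against your own driver settings.
rx_ring_clusters() {
    queues_per_nic=$1
    nics=$2
    rx_descriptors=$3
    echo $(( queues_per_nic * nics * rx_descriptors ))
}

# 8 cores (one queue each, assumed), 4 NICs, 1024 descriptors (assumed)
rx_ring_clusters 8 4 1024    # prints 32768
```

Under those assumptions the estimate already exceeds a 26584 limit, which is at least consistent with the high idle readings reported on 8-core boards here. Treat it as a sanity check, not a prediction.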



  • Can someone knowledgeable post some guidelines for mbuf configuration?  There's incomplete and conflicting information out there which is confusing folks.  Some information as to what might cause mbuf utilization to climb would be useful too.

    Thanks



  • I'm running a Supermicro MBD-A1SRM-LN7F-2758 with 16GB memory with pfSense 2.2.2. My initial MBUF was 73% (19496/26584), memory usage 4%. I edited /boot/loader.conf.local using Diagnostics > Edit file and added kern.ipc.nmbclusters="1000000"

    Now, after reboot, my MBUF is 2% (19750/1000000). Memory usage is 2%.



  • I have the Supermicro A1SRi-2758F with 16GB RAM.  I ran into the same MBUF problem initially.  I had to up kern.ipc.nmbclusters to 1000000 and now all is good.



  • @stephenw10:

    Ah, OK. You will see a lot then. Cores X NICs X mbuf allocation = big.  :)

    Steve

    I've got a system with Atom D525 (4 cores - 1 package(s) x 2 core(s) x 2 HTT threads) and 5 Intel Gigabit NICs, all use the em driver, MBUF is at 2%, no tweak.
    I've also got a new A1SRi-2758F with Atom C2758 (8 cores - 1 package(s) x 2 core(s) x 2 HTT threads) and 4 Intel Gigabit NICs, all use the igb driver, MBUF is at 14%, no tweak.

    Both in the same place using exactly the same config (the 5th NIC on the D525 not connected to keep machines interchangeable).

    I don't see why the MBUFs are so much higher on the C2758. According to your math it shouldn't be more than 5%…
    Couldn't we somehow force the Intel NICs of the A1SRi-2758F to use the em driver instead of the igb driver?


  • Netgate Administrator

    I guess there are more variables in play than I'm aware of. Most likely the usage scales with traffic throughput. Though I'm guessing now….  ::)
    There's no way to use the em driver with newer Intel NICs as far as I know.

    Steve



  • There could be some automatic detection at boot which would pre-set the correct value based on the specific hardware, using some math like you suggested.



  • I own a new SG-4860 and only did some basic configuration and testing so far. However, the usage of MBUF is causing issues:

    MBUF Usage:  81% (21516/26584)  <- just booted

    MBUF Usage: 100% (26584/26584)  <- climbing without anything really happening on the box
    ...
    kernel: [zone: mbuf_cluster] kern.ipc.nmbclusters limit reached
    Uptime ~18h

    As suggested by Steve before, I am now starting the game of doubling nmbclusters until the box is not freezing anymore. But is it just my box, or is that a general issue with the SG-*? Aren't they already tuned?
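The doubling game can be written down as a small loop: keep doubling the limit until the highest usage you have observed fits under some share of it. This is a sketch only. The 50% default target is an assumption, not a recommendation made in this thread, and the function name is hypothetical.

```shell
#!/bin/sh
# Double a starting nmbclusters limit until the observed peak usage is at
# or below target_pct percent of the limit.
# Arguments: current limit, observed peak, optional target percent (default 50).
next_nmbclusters() {
    limit=$1
    peak=$2
    target_pct=${3:-50}
    # Guard against a zero or negative limit, which would never converge.
    [ "$limit" -gt 0 ] || return 1
    while [ $(( peak * 100 )) -gt $(( limit * target_pct )) ]; do
        limit=$(( limit * 2 ))
    done
    echo "$limit"
}

next_nmbclusters 26584 26584    # limit reached at the default -> prints 53168
```

The value it prints still has to be checked against real traffic, since the peak seen so far may not be the peak under load.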


  • Netgate Administrator

    We were discussing that internally just recently. In testing the limit was not reached apparently but as always the real world can be different to the test bench. You're not the first person to query that setting.
    It's likely that value will be set higher by default in future releases for the SG series. If you run real world tests and come to a conclusion about a suitable value we'd love to hear it.

    Steve

    Edit: managed to leave out an entire word there!



  • I'm still in playing and testing mode. Currently, it's 28090 and rising step by step with my changes. But I'll tell you once the value has settled.



  • From what I have learned working with supermule on DDoSing pfSense, there are major architectural differences between the two Intel gigE drivers, em and igb.

    The long and the short of it is that Intel cards that use the igb driver set up more queues, which require more mbufs than the older em driver.  I have also read somewhere that there is talk of Intel rewriting the em driver to make it more scalable, and more like the igb driver.


  • Rebel Alliance Developer Netgate

    For 2.2.3 we bumped up the default mbuf allocation on the 4860 and its relatives.

    You can adjust the nmbclusters value as described at https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards – though one change: We have found that it's also able to be adjusted "live" by adding it as a system tunable. Used to be it only worked as a loader.conf(.local) value but not any more.



  • @jimp:

    though one change: We have found that it's also able to be adjusted "live" by adding it as a system tunable. Used to be it only worked as a loader.conf(.local) value but not any more.

    Hi,

    I couldn't find it (kern.ipc.nmbclusters) in the System > Advanced > System Tunables list, using 2.2.3


  • Rebel Alliance Developer Netgate

    It's not there by default, that list isn't limited though. Click + to add it in and set whatever value you want.



  • @jimp:

    It's not there by default, that list isn't limited though. Click + to add it in and set whatever value you want.

    Oh, I see. Thanks.



  • For SG-4860, you bumped nmbclusters to 26584 in 2.2.3? That's still too close to the max values I witnessed while playing on my box:

    max current = 26.60K
    max total = 28.09K

    I go with 32K to have a stable box.
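The same current/total/max figures can also be read from the shell instead of the graphs. The sketch below pulls them out of the "mbuf clusters in use" line that `netstat -m` prints on FreeBSD. It assumes that line has the form `current/cache/total/max mbuf clusters in use (...)`. The function name is hypothetical and the numbers in the sample line are illustrative, not taken from a real box.

```shell
#!/bin/sh
# Read `netstat -m` output on stdin and print "current total max" for
# mbuf clusters. Assumes a line of the form:
#   current/cache/total/max mbuf clusters in use (current/cache/total/max)
parse_mbuf_clusters() {
    awk '/mbuf clusters in use/ {
        split($1, a, "/")          # a[1]=current a[2]=cache a[3]=total a[4]=max
        print a[1], a[3], a[4]
        exit
    }'
}

# On the firewall:  netstat -m | parse_mbuf_clusters
# Illustrative sample line:
printf '%s\n' '9876/16708/26584/26584 mbuf clusters in use (current/cache/total/max)' \
    | parse_mbuf_clusters        # prints: 9876 26584 26584
```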


  • Rebel Alliance Developer Netgate

    It's not corrected on upgrade, new install only. The 4860 units should probably have 2-4x that value, give or take, minimum.



  • So, what is the value that is being chosen on new installs for the SG-4860?

    [I've just upgraded my SG-2440 to an SG-4860, but I am using the same drive & essentially the same config that was in the SG-2440]

    Thanks.

    @jimp:

    It's not corrected on upgrade, new install only. The 4860 units should probably have 2-4x that value, give or take, minimum.



  • @jimp:

    It's not corrected on upgrade, new install only.

    What is the recommended upgrade method to get such tuning parameters after an install? Re-installing an SG-* box for each new version of pfSense is not the way, is it?


  • Netgate Administrator

    Create the file /boot/loader.conf.local
    Add to it the line:

    kern.ipc.nmbclusters="131072"
    

    Or whatever value you want. As described here: https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#mbuf_.2F_nmbclusters
    Reboot to see the new value loaded.

    Steve



  • I remember reading somewhere that it's not needed to mess around with /boot/loader.conf.local anymore.
    Creating a new system tunable also works. If it does, that would be better, because it's also stored in the config file, which means it's preserved for the future.


  • Netgate Administrator

    Good point. Jim wrote that in this thread: https://forum.pfsense.org/index.php?topic=92253.msg532431#msg532431
    I guess I'm too used to using loader.conf.local.  ::)

    Steve



  • This seems to be the latest post on MBUF, so just to note that in 2.3.2 you can add the kern.ipc.nmbclusters variable in System Tunables and it takes effect immediately without needing a reboot. Worked well for me.


  • Rebel Alliance Developer Netgate

    There is still an advantage to putting the value in loader.conf.local, however. If the hardware requires more mbufs to properly initialize at boot time, it may not be able to do so if you have only set the value as a tunable. If the problem is that the usage increases with load after boot time, then it is OK to use a tunable.