Recurring Error: kern.ipc.nmbufs limit reached

tmgordon

I have read multiple threads on here regarding this error and have not been able to diagnose it. I have a few questions that I could use some help with.

I will start with my setup.

Host
Hyper-V
AMD 8320 Eight-Core Processor
16GB RAM
Samsung 850 PRO SSD
1 x Qualcomm Atheros Network Adapter (Killer e2200 Gigabit Ethernet Controller NDIS 6.30)
2 x ASIX AX88179 Gigabit Ethernet Controller

Virtual Firewall
PFSense 2.3.2-RELEASE-p1 (amd64)
28GB Virtual HDD on SSD
1 CPU
768MB RAM

4 Network Adapters (using standard, not legacy)
WAN: Virtual switch on Killer e2200 Gigabit Ethernet Controller
WLAN/LAN: Virtual switch on ASIX AX88179
SECURE VPN: Virtual switch (internal only)
DMZ: Virtual switch (internal only)

4 VPN Interfaces
4 OpenVPN connections to various locations (used in Gateway group as the only outbound gateway for SECURE VPN network)

Background Info

I started getting these kern.ipc.nmbufs errors after upgrading PFSense up to a year ago. At that time I decided to try out a few other firewalls anyway, and I don't remember what version I was on. I first set up Untangle and realized I couldn't do what I wanted with the VPN. I then set up OPNsense and ran it for a while. It ended up crashing at some point and the OS wouldn't even load anymore. I have since gone back to the latest version of PFSense and these errors have returned.

Troubleshooting

I have tried increasing mbuf clusters (nmbclusters) as per the Tuning and Troubleshooting Network Cards guide at https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#mbuf_.2F_nmbclusters.

I increased the value first to 100,000, then 200,000, then 1,000,000 (the problem remained at every level).

As you can see, the mbuf clusters in use (which is what the PFSense graphs show) are always low. The mbufs in use have gone over 1 million, and the denied mbuf requests are at 255 million. This is the output during a kern.ipc.nmbufs limit reached failure.

I don't have any of the network adapters listed in the troubleshooting guide; however, I would be willing to try additional settings. Is this guide referring to physical connections to PFSense or to the card on the other end of a virtual switch (Hyper-V or VMware)? I imagine PFSense can't even see what the network card on the other end is, since it's just using virtual hardware.

Should I try going higher than 1 million? What could be causing it to go so high? Should I cut the VPN gateway to a primary and backup? Purchase new network adapters?

Here is the usage during normal operation.

heper

you appear to be hitting the mbufs limit (kern.ipc.nmbufs), but what you adjusted is mbuf clusters (kern.ipc.nmbclusters) (= not the same)
There might be an issue with the virtual drivers of the network adapters

you could bump the mbufs by specifying "a value" for kern.ipc.nmbufs in /boot/loader.conf.local, but it's possible you'll just keep hitting the limit, whatever you set it to
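For reference, a minimal sketch of what that entry could look like. The number below is only a placeholder, not a recommendation; check the current limit with `sysctl kern.ipc.nmbufs` and the actual usage with `netstat -m` before picking a value, and keep the VM's RAM in mind.

```
# /boot/loader.conf.local (read at boot, so a reboot is needed)
# Placeholder value, chosen only for illustration.
kern.ipc.nmbufs="1500000"
```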

i've never run pfSense on hyper-v, but i'm sure there are some guru-posts about it somewhere

Guest

768MB RAM

If you can give the VM more RAM, depending on how much you actually have free, you might be better
off raising the mbuf limit from its current value to 250000, 500000 or 1000000. Please be careful with that, though:
with too little RAM you may end up in a boot loop. After that you could also reduce, or explicitly set, the num_queues value
for all NICs. Let us say 2 for all NICs, or 4 as a maximum. For each NIC you need to know how many queues it
opens by default, and then you should be able to reduce that number.
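To put rough numbers on the RAM warning above, here is a back-of-the-envelope sketch. It assumes the stock FreeBSD sizes of 256 bytes per mbuf (MSIZE) and 2048 bytes per standard mbuf cluster (MCLBYTES). The limits are caps rather than preallocations, so this is the worst case if a pool fills up completely. The function name is made up for illustration.

```python
# Assumed stock FreeBSD sizes (amd64): bytes per mbuf and per standard cluster.
MSIZE = 256
MCLBYTES = 2048


def worst_case_mib(limit, unit_bytes):
    """Memory in MiB used if `limit` objects of `unit_bytes` each are all in use."""
    return limit * unit_bytes / (1024 * 1024)


# Cluster limits tried earlier in this thread.
for clusters in (100_000, 200_000, 1_000_000):
    mib = worst_case_mib(clusters, MCLBYTES)
    print(f"{clusters:>9} clusters -> up to {mib:7.1f} MiB")

# mbufs in use reported at the time of the failure.
print(f"{1_033_130:>9} mbufs    -> about {worst_case_mib(1_033_130, MSIZE):7.1f} MiB")
```

With 1,000,000 clusters the cap alone comes to nearly 2 GB, well beyond a VM with 768 MB or 1024 MB, which is why the boot loop warning is worth taking seriously.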

tmgordon

I realize mbufs and mbuf clusters are two different statistics. However, if they are completely separate from each other, can anybody explain why, when the kern.ipc.nmbufs limit reached error occurs, the mbufs in use are always just over the limit set for mbuf clusters? For instance, when I had mbuf clusters set to 1,000,000 the error showed 1,033,130 mbufs in use. When I had it set to 200,000, it showed just over 200,000 mbufs in use when the issue occurred. Are you certain these two are not related? It seems to me that increasing the clusters increases the number of mbufs that can be used (which would make sense to me, but I am not an expert in this field).

As of right now I have increased memory from 768MB to 1024MB and the firewall has gone several days without an issue. Previously the issue would typically occur several times a day. I am hopeful that this is a suitable solution, or at the very least a way to extend the time between occurrences. If it occurs again I will increase to 1280MB or 1536MB.