mbuf cluster rising after upgrade from 2.5.2
-
I have the mbuf issue that a couple of other people have reported. The mbuf usage just continues to go up until it crashes the firewall. Any assistance in resolving it would be greatly appreciated. If you need any additional information, please let me know.
On 12/20 I upgraded from 2.5.2 to 2.6.0 then to 2.7.0 then to 2.7.2.
The process seemed to go well (the only thing of note is that I had a bridge set up that didn't remain after the upgrade). However, after two days the system stopped responding because the mbufs were fully consumed.
After much troubleshooting I found that this was a problem when using DCO with OpenVPN. In my case, however, DCO was not enabled, though I was using OpenVPN.
I removed my OpenVPN configuration, enabled WireGuard, raised the mbuf cache and everything seemed good for a few days. After 3 weeks it happened again. I rebooted and it came up, but this is only a temporary fix. What can I do to fully resolve it? Here you can see what the mbuf usage was like for the last 3 months.
I know people often ask about how the traffic has changed so here is that report for the last 3 months.
The interfaces are igb0-4.
I am mostly just using igb0 and igb1.
These are the packages I have installed:
acme
apcupsd
arpwatch
bandwidthd
lldpd
mailreport
nmap
pfBlockerNG-devel
suricata
WireGuard
Here is my current netstat -m output (I think 5 days after the last reboot):
-
Hmm, that graph never shows it reaching the max value but it could be averaging it out if there was a spike?
You say it was OK for a while after disabling your OpenVPN config but then started rising again? Anything logged when that happened?
Steve
-
@stephenw10 That is a good and interesting point. You don't see it hitting the max at any point in time. That may be because of the resolution of the graph, but I don't know that this is the case. The only logs I saw anywhere were on the monitor I hooked up to see why the system was offline. That is where I saw the error message about the mbuf clusters being exceeded.
I'll dig through some more logs and see if I can find those log entries. But for now here is the 1 hour resolution for the last week. It looks like the highest logged is 1.2M. It is clear, though, when you compare it to before the update, that something is definitely functioning differently after the update. Before, it never got above 1%, and now it's almost constantly rising.
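To illustrate the point about resolution, here is a minimal sketch of how an averaged graph can hide a short spike. It assumes the graph stores the mean of each hour, which may not be how this graph is actually consolidated, and the sample values are made up for the example rather than taken from this firewall.

```python
def downsample_mean(samples, bucket_size):
    """Average consecutive samples into buckets of bucket_size, as a coarse graph might."""
    buckets = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        buckets.append(sum(bucket) / len(bucket))
    return buckets


# Hypothetical one-minute samples of mbuf cluster usage, in percent of max:
# 59 minutes at 10%, then a single minute at 100% just before a hang.
minute_samples = [10] * 59 + [100]

hourly = downsample_mean(minute_samples, 60)
print("true peak:", max(minute_samples))  # the spike is visible in the raw samples
print("hourly average:", hourly[0])       # the averaged point sits far below the peak
```

So a graph that tops out well under the limit does not by itself rule out a brief exhaustion between data points.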
-
Here is a picture I took. This one is from the first time it happened before I disabled openvpn.
-
Also it looks like before the system went offline I was able to forward the errors to my syslog server:
-
Hmm, well that seems pretty conclusive. I wonder if it stopped allowing further mbufs for some other reason than it was actually exhausted....
Were you able to access the console at that point? Querying netstat -m whilst in that state would be telling.
-
@stephenw10 I think I did run it but I didn't take a picture of it.
-
@captain118 It's been a while since I did it the first time. I don't remember if the console was responsive or not. I may not have run it until after a reboot. Either way, I don't have a picture of it.
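So that the numbers are on disk the next time this happens, the cluster counters from netstat -m could be logged periodically. Below is a minimal sketch, assuming a Python 3 interpreter is available on the firewall and that the output contains the usual FreeBSD "mbuf clusters in use (current/cache/total/max)" line. The function names, the log path, and the sample text are hypothetical and only there for illustration.

```python
import re
import subprocess
import time

# Matches the FreeBSD line "<current>/<cache>/<total>/<max> mbuf clusters in use (...)".
CLUSTER_RE = re.compile(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use")


def parse_clusters(netstat_output):
    """Return (current, cache, total, max) from `netstat -m` text, or None if absent."""
    match = CLUSTER_RE.search(netstat_output)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def log_once(path="/root/mbuf.log"):
    """Run `netstat -m` and append a timestamped line with the counters (path is hypothetical)."""
    output = subprocess.run(
        ["netstat", "-m"], capture_output=True, text=True, check=True
    ).stdout
    counts = parse_clusters(output)
    with open(path, "a") as log_file:
        log_file.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {counts}\n")
    return counts


# Illustrative input only; these are not values from the firewall in this thread.
SAMPLE = """\
4099/3596/7695 mbufs in use (current/cache/total)
4097/1675/5772/1000000 mbuf clusters in use (current/cache/total/max)
"""

current, cache, total, maximum = parse_clusters(SAMPLE)
print(f"clusters in use: {current} of {maximum}")
```

Calling log_once() from a cron job every minute or so would leave a trail showing how close the counters were to the max right before a hang, at a finer resolution than the hourly graph.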
-
Are you able to test a clean 2.7.2 install with your config restored into it?
Nothing you're running seems particularly exotic so it could be something that didn't upgrade as expected. Though I'd expect more failures than that in that situation.
I assume you don't have anything custom installed?
-
@stephenw10 Nothing custom installed. I did wonder if there was a possibility that the bridge that didn't survive the update could have been part of what caused the issues I'm seeing. I can do a fresh install of 2.7.2 in a couple of weeks. I wanted to make sure there wasn't anything I should do before resorting to a full rebuild.
I see you work at Netgate. Is there anything that would be useful for you to have before I do the rebuild? I wish I had taken some time between the 2.6 and 2.7 upgrades so I could pinpoint where the problem first arose, but alas that was not the case.
-
Hmm, I missed the bridge. What did you have bridged? What do you have in place of that now?
Grabbing a status_output file is always useful whenever any debugging is required:
https://docs.netgate.com/pfsense/en/latest/recipes/diagnostic-data.html
-
@stephenw10 I had bridged my main inside interface with another interface to function as a span interface for capturing traffic to be analyzed by Security Onion. I haven't set it back up yet.
-
Hmm, but none of that remained in the config? No bridge device shown in ifconfig?
-
@stephenw10 It doesn't show in either the UI or the ifconfig output.
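For anyone wanting to repeat the check, a minimal sketch of it: FreeBSD's ifconfig -l prints the interface names on one line, and any leftover bridge would show up there as bridge0, bridge1 and so on. The helper name and the sample interface lists are hypothetical, not output from this system.

```python
def bridge_interfaces(ifconfig_l_output):
    """Return the interface names from `ifconfig -l` output that look like bridge devices."""
    return [name for name in ifconfig_l_output.split() if name.startswith("bridge")]


# Illustrative `ifconfig -l` output only (not taken from this firewall).
without_bridge = "igb0 igb1 igb2 igb3 lo0 enc0 pflog0 pfsync0"
with_bridge = "igb0 igb1 bridge0 lo0"

print(bridge_interfaces(without_bridge))  # empty list: no bridge device present
print(bridge_interfaces(with_bridge))     # lists the leftover bridge device
```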
-
Hmm, well I would be trying a clean 2.7.2 install at this point. I'm not aware of any known mbuf leak so we need to determine if it's the config or something broken in the install.