CE 2.8.1 bsnmpd Memory Leak
-
Hi, after an upgrade from 2.7.2 to 2.8.1 I noticed some unusual behavior on a few firewalls in my home network. IPsec sessions don't rekey as usual, BGP sessions flap or get stuck in Active for no obvious reason, and so on. Looking more closely at the affected instances, I noticed high memory usage by the bsnmpd process on all devices. By high, I mean 2 to 5 GB of reserved memory. Some devices reach 100% swap usage.
I'm using the Prometheus SNMP exporter on these firewalls to query the BEGEMOT-PF-MIB (OID 1.3.6.1.4.1.12325.1.200) to monitor firewall rule usage. The bsnmpd process has always consumed a lot of CPU time because all firewall rules with stats are queried every 60 seconds, but memory usage was never an issue until the latest release. The monitoring was implemented years ago against pfSense 2.4.x or 2.5.x.
For now I'm limiting the memory consumption with a cron job that restarts bsnmpd every 24h. Has anybody observed similar problems? I haven't found a Redmine issue on this yet and would like to hear from the community whether there are similar observations before opening a report.
-
Here is a bsnmpd process after ~18 hours of uptime on 2.8.1:
ps aux | grep bsnmpd
root 9909 0.0 16.9 2990476 1412004 - Ss 03:00 5:13.17 /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid
On a 2.7.2 machine the process barely reaches 400MB after a week of uptime:
ps aux | grep bsnmpd
root 73322 20.2 1.0 379016 340036 - Rs 21Sep25 1129:39.85 /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid
The configuration is identical on both firewalls, except the redacted <variables>
location := "<location>"
contact := ""
read := "<ro-community>"
system := 1 # pfSense
%snmpd
sysDescr = "pfSense <hostname> 2.8.1-RELEASE FreeBSD 15.0-CURRENT amd64"
begemotSnmpdDebugDumpPdus = 2
begemotSnmpdDebugSyslogPri = 7
begemotSnmpdCommunityString.0.1 = $(read)
begemotSnmpdCommunityDisable = 1
begemotSnmpdPortStatus.<ip-1>.161 = 1
begemotSnmpdPortStatus.<ip-2>.161 = 1
begemotSnmpdLocalPortStatus."/var/run/snmpd.sock" = 1
begemotSnmpdLocalPortType."/var/run/snmpd.sock" = 4
# These are bsnmp macros not php vars.
sysContact = $(contact)
sysLocation = $(location)
sysObjectId = 1.3.6.1.4.1.12325.1.1.2.1.$(system)
snmpEnableAuthenTraps = 2
begemotSnmpdModulePath."mibII" = "/usr/lib/snmp_mibII.so"
begemotSnmpdModulePath."netgraph" = "/usr/lib/snmp_netgraph.so"
%netgraph
begemotNgControlNodeName = "snmpd"
begemotSnmpdModulePath."pf" = "/usr/lib/snmp_pf.so"
begemotSnmpdModulePath."hostres" = "/usr/lib/snmp_hostres.so"
begemotSnmpdModulePath."ucd" = "/usr/local/lib/snmp_ucd.so"
begemotSnmpdModulePath."regex" = "/usr/local/lib/snmp_regex.so"
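To put the two ps samples above side by side, here is a minimal Python sketch that extracts VSZ and RSS and converts them to MiB. It assumes the FreeBSD `ps aux` column order (USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND) with VSZ and RSS reported in KiB. The function name `parse_ps_aux_line` and the sample labels are made up for illustration.

```python
# Sketch: pull VSZ and RSS out of a FreeBSD `ps aux` line and convert to MiB.
# Assumption: columns are USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND,
# with VSZ and RSS in KiB.

def parse_ps_aux_line(line: str) -> dict:
    # Split into the 10 fixed columns; the remainder is the full command line.
    fields = line.split(None, 10)
    return {
        "pid": int(fields[1]),
        "mem_pct": float(fields[3]),
        "vsz_mib": int(fields[4]) / 1024,
        "rss_mib": int(fields[5]) / 1024,
        "command": fields[10],
    }

# The two lines posted above, keyed by a hypothetical label.
samples = {
    "2.8.1 (~18 h)": "root 9909 0.0 16.9 2990476 1412004 - Ss 03:00 5:13.17 "
                     "/usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid",
    "2.7.2 (~1 week)": "root 73322 20.2 1.0 379016 340036 - Rs 21Sep25 1129:39.85 "
                       "/usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid",
}

for label, line in samples.items():
    info = parse_ps_aux_line(line)
    print(f"{label}: RSS {info['rss_mib']:.0f} MiB, VSZ {info['vsz_mib']:.0f} MiB")
```

With the numbers posted, this works out to roughly 1379 MiB resident on 2.8.1 after 18 hours versus roughly 332 MiB on 2.7.2 after a week.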
-
Hmm, I haven't seen that. But I'm also not querying that fast or all the rules like that.
In 2.7.2 I assume the total memory use doesn't continue to climb?
And in 2.8.1 it eventually exhausts the available RAM and causes services to fail?
-
The firewall and its services don't fail completely, but they start acting unusually due to the memory exhaustion. A few times FRR got stuck and needed to be restarted. With 2.7.2 and all previous releases down to 2.4, the memory usage of the process stayed constant at less than 500MB.
This is the memory usage in 2.7.2:
This is the same firewall in 2.8.1:
On the 27th of September, I noticed the issue and configured a cron job to restart the service every 24h.
The next chart shows the swap usage (orange). The problems start when it hits 100%.
-
Hmm, OK well that seems pretty conclusive. Let me see if I can replicate it....
-
Did you open a bug report for this yet? (not seeing one)
-
Not yet. As I wrote, I wanted to check first whether someone else is running into this, too. I'll take care of it as soon as I figure out my Redmine credentials. I haven't been there for a while.
-
Mmm, yeah we haven't managed to replicate it here yet. Still trying some variations....
-
Redmine created: https://redmine.pfsense.org/issues/16456
I've checked the complete SNMP monitoring of the affected devices and identified these OIDs of MIBs polled at a 60-second interval:
- 1.3.6.1.4.1.2021.4
- 1.3.6.1.4.1.2021.11
- 1.3.6.1.2.1.25.3.3.1
- 1.3.6.1.2.1.25.4.2.1
- 1.3.6.1.2.1.2.2.1
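
For anyone trying to replicate this, the polling pattern above could be approximated with something like the following Python sketch. It assumes net-snmp's `snmpbulkwalk` is installed on the polling host and is an acceptable stand-in for the exporter's walks. The labels are the standard MIB names for these subtrees. `walk_command`, `poll_once`, the host `192.0.2.1` and the community `public` are placeholders, not taken from the setup described here.

```python
import subprocess

# OID subtrees from the list above, labelled with their standard MIB names.
POLLED_OIDS = {
    "1.3.6.1.4.1.2021.4": "UCD-SNMP-MIB::memory",
    "1.3.6.1.4.1.2021.11": "UCD-SNMP-MIB::systemStats",
    "1.3.6.1.2.1.25.3.3.1": "HOST-RESOURCES-MIB::hrProcessorEntry",
    "1.3.6.1.2.1.25.4.2.1": "HOST-RESOURCES-MIB::hrSWRunEntry",
    "1.3.6.1.2.1.2.2.1": "IF-MIB::ifEntry",
}


def walk_command(host: str, community: str, oid: str) -> list:
    # SNMPv2c bulk walk with numeric OID output (-On).
    return ["snmpbulkwalk", "-v2c", "-c", community, "-On", host, oid]


def poll_once(host: str, community: str) -> dict:
    # Walk every subtree once and return the number of lines received per OID.
    # Not called here; needs snmpbulkwalk and a reachable agent.
    counts = {}
    for oid in POLLED_OIDS:
        result = subprocess.run(
            walk_command(host, community, oid),
            capture_output=True, text=True, check=False,
        )
        counts[oid] = len(result.stdout.splitlines())
    return counts


# Print the commands one polling cycle would run (placeholder host/community).
for oid, name in POLLED_OIDS.items():
    print(name, "->", " ".join(walk_command("192.0.2.1", "public", oid)))
```

Calling `poll_once` every 60 seconds while watching the RSS of bsnmpd on the firewall should show whether these walks alone are enough to make the process grow.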
-
@stephenw10 said in CE 2.8.1 bsnmpd Memory Leak:
Let me see if I can replicate it....
Hi @stephenw10, did you have a chance to replicate this behavior?
-
Nope not yet. We did find and fix a different memory leak. Devs are still reviewing.