bsnmpd crashes regularly

cameloid

Hello,

I am experiencing constant crashes of the bsnmpd service on a Netgate 4200 router running pfSense 24.03. The SNMP service settings are default; it is polled by an external monitoring application (Prometheus/snmp_exporter) every 5 seconds. The following entries appear in the log:

[24.03-RELEASE][admin@pfSense.home.arpa]/var/log: cat system.log | grep snmp
Nov  8 03:58:33 pfSense kernel: pid 578 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory
Nov  8 11:46:11 pfSense snmpd[57864]: disk_OS_get_disks: adding device 'da0' to device list
Nov  8 23:50:22 pfSense kernel: pid 57864 (bsnmpd), jid 0, uid 0, was killed: a thread waited too long to allocate a page
Nov 10 11:01:04 pfSense snmpd[57763]: disk_OS_get_disks: adding device 'da0' to device list

Any solution/workaround?

BeBoxer

I'd try a longer polling interval first. 5 seconds is pretty aggressive by SNMP standards. Do you still have problems if you poll every 300 seconds (5 minutes)?

keyser

@cameloid It’s a known issue in 24.03 when you query bsnmpd for pffilter status. A longer polling interval only postpones the point at which it runs out of memory for open files (it leaves thousands of files open). Alternatively, you can restart the bsnmpd service to reclaim the memory and close the files.

Netgate opted not to fix the issue in 24.03, but it is fixed in 24.11, so I can only recommend that you upgrade.
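
If you want to watch the leak build up before it bites, something like the following might help. It is only a sketch: it assumes Python 3 is available on the firewall (or that you copy the output elsewhere), that FreeBSD's `procstat -f <pid>` is run as root, and that its third column is the FD column. The function names are made up for illustration.

```python
import subprocess


def count_numeric_fds(procstat_output: str) -> int:
    """Count rows of `procstat -f` output whose FD column is a number.

    Assumption: columns are PID, COMM, FD, ... and the command name has
    no spaces. The header row and entries such as text/cwd/root have a
    non-numeric FD field and are skipped.
    """
    count = 0
    for line in procstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].isdigit():
            count += 1
    return count


def open_fds_for_pid(pid: int) -> int:
    """Run `procstat -f <pid>` and count the numeric descriptors."""
    result = subprocess.run(
        ["procstat", "-f", str(pid)],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_numeric_fds(result.stdout)
```

Calling `open_fds_for_pid()` with the current bsnmpd PID every few minutes and logging the result should show whether the count keeps climbing between restarts.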

keyser

@cameloid Here’s the original post I made when I discovered the bug:

https://forum.netgate.com/topic/188050/24-03-causes-sustained-rise-in-processes-count-and-memory-usage

joekislo

We suffered from this issue for a long time on 24.03, but we had been running free and clear since our upgrade to 24.11, or so we thought. Unfortunately, we just got struck by it, or a similar issue, again:

Apr 7 18:00:26 fw1 kernel: pid 84967 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory

Unfortunately, we don't have visibility into whether bsnmpd was leaking file descriptors or whether some other issue caused it to crash. Uptime is 34 days, which isn't much longer than the interval at which we were forcing bsnmpd restarts before.

We'll keep a closer eye on it now that we know it's not fully fixed for us. Like others, we have Zabbix monitoring the firewall.

keyser

@joekislo I haven’t seen the issue after upgrading to 24.11

joekislo

We lost bsnmpd on two firewalls 5 minutes apart, and we're starting to narrow this down.

This doesn't look like the same issue as before, which was an FD leak. This looks like a straight-up memory leak. Both bsnmpd processes crashed in the middle of a Tenable scan. Looking at another pair of our firewalls, bsnmpd is taking up a boatload of memory:
10545 root 1 20 0 5446M 3015M select 4 14:12 0.03% bsnmpd
and
34784 root 1 20 0 5382M 2626M select 6 16:50 0.00% bsnmpd

So roughly 3 GB and 2.6 GB of RAM resident. These two firewalls crashed maybe 15 days ago (when I last posted). We're also seeing the firewalls run out of swap space before the kill event, although a day or more before:

<2>1 2025-04-18T14:01:06.988988-04:00 fw kernel - - - swap_pager: out of swap space
<2>1 2025-04-18T14:01:06.989098-04:00 - - - swp_pager_getswapspace(2): failed
<2>1 2025-04-19T04:00:28.371252-04:00 kernel - - - swp_pager_getswapspace(30): failed

Before the final kill:
<3>1 2025-04-20T04:55:09.682953-04:00 kernel - - - pid 65139 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory
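
To put a number on the gap between the swap failures and the kill, the timestamps can be subtracted directly. A rough sketch, assuming the timestamp is always the second whitespace-separated field as in the lines above (the helper name is made up):

```python
from datetime import datetime, timedelta


def event_time(line: str) -> datetime:
    """Return the timestamp of a log line shaped like the ones above.

    Assumption: the line starts with "<PRI>VERSION" followed by an
    ISO 8601 timestamp with a UTC offset.
    """
    return datetime.fromisoformat(line.split()[1])


# Two lines copied from the log excerpt above.
swap_failure = (
    "<2>1 2025-04-18T14:01:06.989098-04:00 - - - "
    "swp_pager_getswapspace(2): failed"
)
final_kill = (
    "<3>1 2025-04-20T04:55:09.682953-04:00 kernel - - - "
    "pid 65139 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory"
)

gap: timedelta = event_time(final_kill) - event_time(swap_failure)
print(f"swap failure to kill: {gap}")
```

For these two lines the gap works out to a bit over a day and a half.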

While not fully conclusive, my guess is that bsnmpd leaks memory during a Tenable scan. I suppose we could prove this by kicking off a scan and comparing memory before and after. However, I'm going to capture the status.php output and open a support case.
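
For the before/after comparison, something along these lines could work. It is a sketch that assumes the top column layout shown in the two lines earlier in this post (PID, USERNAME, THR, PRI, NICE, SIZE, RES, ...) and sizes suffixed with K, M, or G; the function names are made up.

```python
# Multipliers to convert a top size field to MiB.
_UNIT_TO_MIB = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0}


def to_mib(field: str) -> float:
    """Convert a size field such as '3015M' to MiB."""
    unit = field[-1].upper()
    if unit not in _UNIT_TO_MIB:
        raise ValueError(f"unrecognised size field: {field!r}")
    return float(field[:-1]) * _UNIT_TO_MIB[unit]


def parse_top_line(line: str) -> tuple[int, float, float]:
    """Return (pid, size_mib, res_mib) from one top process line.

    Assumption: SIZE is the 6th field and RES the 7th, as in the
    output quoted in this post.
    """
    fields = line.split()
    return int(fields[0]), to_mib(fields[5]), to_mib(fields[6])


def res_growth_mib(before_line: str, after_line: str) -> float:
    """Resident memory growth in MiB between two top samples."""
    _, _, res_before = parse_top_line(before_line)
    _, _, res_after = parse_top_line(after_line)
    return res_after - res_before
```

Grab one top line for bsnmpd right before the scan starts and one right after it finishes, then pass both to `res_growth_mib()`. A large positive number that never comes back down would support the leak theory.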