Netgate Discussion Forum

    bsnmpd crashes regularly

    SNMP
    • C
      cameloid
      last edited by cameloid

      Hello,

      I am experiencing constant crashes of the bsnmpd service on a Netgate 4200 router running pfSense 24.03. The SNMP service settings are default; it is polled by an external monitoring application (Prometheus/snmp_exporter) every 5 seconds. The following entries appear in the log:

      [24.03-RELEASE][admin@pfSense.home.arpa]/var/log: cat system.log | grep snmp
      Nov  8 03:58:33 pfSense kernel: pid 578 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory
      Nov  8 11:46:11 pfSense snmpd[57864]: disk_OS_get_disks: adding device 'da0' to device list
      Nov  8 23:50:22 pfSense kernel: pid 57864 (bsnmpd), jid 0, uid 0, was killed: a thread waited too long to allocate a page
      Nov 10 11:01:04 pfSense snmpd[57763]: disk_OS_get_disks: adding device 'da0' to device list
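
To see how often the kernel is killing bsnmpd and for what reason, the kill messages can be pulled out of system.log with a small script. This is only a sketch: the regular expression is modeled on the two kill lines quoted above, and `find_kills` is a made-up helper name, not part of pfSense.

```python
import re

# Pattern modeled on the kernel kill messages quoted in this post.
KILL_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), jid \d+, uid \d+, "
    r"was killed: (?P<reason>.+)$"
)


def find_kills(lines, process="bsnmpd"):
    """Return (pid, reason) for every kernel kill message naming `process`."""
    kills = []
    for line in lines:
        match = KILL_RE.search(line)
        if match and match.group("name") == process:
            kills.append((int(match.group("pid")), match.group("reason").strip()))
    return kills


# The four log lines from the post above.
sample = [
    "Nov  8 03:58:33 pfSense kernel: pid 578 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory",
    "Nov  8 11:46:11 pfSense snmpd[57864]: disk_OS_get_disks: adding device 'da0' to device list",
    "Nov  8 23:50:22 pfSense kernel: pid 57864 (bsnmpd), jid 0, uid 0, was killed: a thread waited too long to allocate a page",
    "Nov 10 11:01:04 pfSense snmpd[57763]: disk_OS_get_disks: adding device 'da0' to device list",
]

kills = find_kills(sample)
for pid, reason in kills:
    print(pid, reason)
```

Both reasons are the kernel's out-of-memory handling, so the lines that matter are the kernel ones, not the snmpd start-up messages.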
      

      Any solution/workaround?

      • B
        BeBoxer
        last edited by

        I'd try a longer polling interval first. 5 seconds is pretty aggressive by SNMP standards. Do you still have problems if you poll every 300 seconds (5 minutes)?

        • keyserK
          keyser Rebel Alliance @cameloid
          last edited by

          @cameloid It’s a known issue in 24.03 when you query BSNMP for pffilter status: the daemon leaves thousands of files open and eventually runs out of memory. A “long” polling interval only delays how long that takes. Alternatively you can restart the BSNMP service to recoup the memory and close the files.

          Netgate opted to not fix the issue in 24.03, but it is fixed in 24.11, so I can only recommend you upgrade to that.
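
If you stay on 24.03 for a while, one way to decide when a restart is due is to count the descriptors bsnmpd holds open. The sketch below assumes the text comes from `fstat -p <pid>` on the firewall; the sample rows are made up for illustration (not captured from a real box), and the threshold of 1000 is an arbitrary placeholder rather than a Netgate recommendation.

```python
def count_open_files(fstat_output):
    """Count rows whose FD column is numeric in `fstat -p <pid>` output.

    Rows such as 'text', 'wd' and 'root' are skipped, as is the header line.
    Sockets appear with a trailing '*' (for example '3*') and are counted.
    """
    count = 0
    for line in fstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        fd = fields[3].rstrip("*")
        if fd.isdigit():
            count += 1
    return count


def needs_restart(fd_count, threshold=1000):
    """Hypothetical policy: flag the daemon once it holds `threshold` descriptors."""
    return fd_count >= threshold


# Illustrative sample only; the column layout follows FreeBSD fstat.
sample = """USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
root     bsnmpd     57864 text /        12345 -r-xr-xr-x  123456  r
root     bsnmpd     57864   wd /            4 drwxr-xr-x      27  r
root     bsnmpd     57864    0 /dev        42 crw-rw-rw-    null rw
root     bsnmpd     57864    1 /dev        42 crw-rw-rw-    null rw
root     bsnmpd     57864    3* internet dgram udp fffff80012345678
"""

fd_count = count_open_files(sample)
print(fd_count, needs_restart(fd_count))
```

Run from cron, a check like this could trigger the service restart instead of waiting for the kernel to kill the process.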

          Love the no fuss of using the official appliances :-)

          • keyserK
            keyser Rebel Alliance @cameloid
            last edited by

            @cameloid Here’s the original post I made when I discovered the bug:

            https://forum.netgate.com/topic/188050/24-03-causes-sustained-rise-in-processes-count-and-memory-usage

            • J
              joekislo
              last edited by

              We suffered from this issue for a long time on 24.03, but we had been running free and clear since our upgrade to 24.11. Or we thought we were. Unfortunately we just got struck by it or a similar issue again:

              Apr 7 18:00:26 fw1 kernel: pid 84967 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory

              Unfortunately we don't have visibility into whether bsnmpd was leaking file descriptors or whether some other issue caused it to crash. Uptime is 34 days, which isn't much longer than the interval at which we were forcing bsnmpd restarts before.

              We'll keep a closer eye on it now that we know it's not fully fixed for us. Similar to others, we have Zabbix monitoring the firewall.

              • keyserK
                keyser Rebel Alliance @joekislo
                last edited by

                @joekislo I haven’t seen the issue after upgrading to 24.11

                • J
                  joekislo @keyser
                  last edited by

                  We lost bsnmpd on two firewalls 5 minutes apart, and we're starting to narrow this down.

                  This doesn't look like the same issue as before, which was an FD leak. This looks like a straight-up memory leak. Both bsnmpd processes crashed in the middle of a Tenable scan. Looking at another pair of our firewalls, bsnmpd is taking up a boatload of memory:
                  10545 root 1 20 0 5446M 3015M select 4 14:12 0.03% bsnmpd
                  and
                  34784 root 1 20 0 5382M 2626M select 6 16:50 0.00% bsnmpd

                  So that's about 3 GB and 2.6 GB of RAM resident. These two firewalls crashed maybe 15 days ago (when I last posted). We're also seeing the firewalls run out of swap space before the kill event, although a day or a few days before:

                  <2>1 2025-04-18T14:01:06.988988-04:00 fw kernel - - - swap_pager: out of swap space
                  <2>1 2025-04-18T14:01:06.989098-04:00 - - - swp_pager_getswapspace(2): failed
                  <2>1 2025-04-19T04:00:28.371252-04:00 kernel - - - swp_pager_getswapspace(30): failed

                  Before the final kill:
                  <3>1 2025-04-20T04:55:09.682953-04:00 kernel - - - pid 65139 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory
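
To put a number on the gap between the first out-of-swap message and the kill, the timestamps in these lines can be subtracted directly. A small sketch, assuming the timestamp is always the second space-separated field as in the lines quoted here; `parse_ts` is just an illustrative helper name.

```python
from datetime import datetime


def parse_ts(line):
    """Return the ISO 8601 timestamp in the second field as a datetime."""
    return datetime.fromisoformat(line.split()[1])


# First out-of-swap message and the final kill, as quoted in this post.
first_swap = "<2>1 2025-04-18T14:01:06.988988-04:00 fw kernel - - - swap_pager: out of swap space"
final_kill = (
    "<3>1 2025-04-20T04:55:09.682953-04:00 kernel - - - "
    "pid 65139 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory"
)

gap = parse_ts(final_kill) - parse_ts(first_swap)
print("time from first swap exhaustion to kill:", gap)
```

So the box had been short of swap for well over a day before the kernel finally killed bsnmpd.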

                  While not fully conclusive, my guess is that bsnmpd leaks memory during a Tenable scan. I suppose we could prove this by kicking off a scan and comparing bsnmpd's memory before and after. However, I'm going to capture the status.php output and open a support case.
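
The before/after comparison could be scripted roughly like this. It is a sketch under assumptions: it expects a process line in the FreeBSD `top` layout shown above (PID, USERNAME, THR, PRI, NICE, SIZE, RES, ...), and `parse_top_line` and `growth_mib` are hypothetical helper names. The two lines parsed here are the ones quoted above from two different firewalls, so they only demonstrate the parsing; a real test would sample the same process before and after a scan.

```python
# Multipliers to convert top's size suffixes to MiB.
UNITS = {"B": 1 / (1024 * 1024), "K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def to_mib(field):
    """Convert a top size field such as '3015M' or '12K' to MiB."""
    unit = field[-1].upper()
    if unit not in UNITS:
        raise ValueError(f"unrecognized size field: {field!r}")
    return float(field[:-1]) * UNITS[unit]


def parse_top_line(line):
    """Extract pid, virtual size and resident size from one top process line."""
    fields = line.split()
    return {
        "pid": int(fields[0]),
        "size_mib": to_mib(fields[5]),
        "res_mib": to_mib(fields[6]),
        "command": fields[-1],
    }


def growth_mib(before, after):
    """Resident-memory growth between two samples of the same process."""
    return after["res_mib"] - before["res_mib"]


fw_a = parse_top_line("10545 root 1 20 0 5446M 3015M select 4 14:12 0.03% bsnmpd")
fw_b = parse_top_line("34784 root 1 20 0 5382M 2626M select 6 16:50 0.00% bsnmpd")
print(fw_a["res_mib"], fw_b["res_mib"])
```

If resident memory jumps across a scan and never comes back down, that would support the leak theory.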

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.