Netgate Discussion Forum

    CE 2.8.1 bsnmpd Memory Leak

    General pfSense Questions
    • Averlon

      Here is a bsnmpd process after ~18 hours of uptime on 2.8.1:

      ps aux | grep bsnmpd
      root     9909   0.0 16.9 2990476 1412004  -  Ss   03:00       5:13.17 /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid
      

      On a 2.7.2 machine the process barely reaches 400 MB after a week of uptime:

      ps aux | grep bsnmpd
      root    73322   20.2  1.0  379016  340036  -  Rs   21Sep25   1129:39.85 /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid
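
To compare growth between releases more systematically than with one-off `ps` snapshots, the resident set size could be sampled at intervals. This is only a minimal sketch: it assumes the PID file path from the command line above (`/var/run/snmpd.pid`) and that `ps` reports RSS in KiB, as it does on FreeBSD. The helper names `kb_to_mb` and `log_rss` and the output format are hypothetical.

```shell
#!/bin/sh
# Sketch only: log the RSS of bsnmpd so growth over time can be compared.
# PIDFILE matches the -p argument shown in the ps output; the function
# names and the output format are assumptions made for illustration.
PIDFILE="${PIDFILE:-/var/run/snmpd.pid}"

# Convert a size in KiB (as reported by ps) to whole MiB.
kb_to_mb() {
    echo $(( $1 / 1024 ))
}

# Print "<epoch seconds> <rss MiB>" for the given PID.
# Returns non-zero when the process no longer exists.
log_rss() {
    rss_kb=$(ps -o rss= -p "$1" | tr -d ' ')
    [ -n "$rss_kb" ] || return 1
    echo "$(date +%s) $(kb_to_mb "$rss_kb")"
}

# Example: take one sample per minute until the process exits.
# while log_rss "$(cat "$PIDFILE")"; do sleep 60; done
```

Applied to the two snapshots in this post, `kb_to_mb 1412004` gives 1378 (2.8.1, about 18 hours) and `kb_to_mb 340036` gives 332 (2.7.2, about a week).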
      

      The configuration is identical on both firewalls, except for the redacted <variables>:

      location := "<location>"
      contact := ""
      read := "<ro-community>"
      system := 1     # pfSense
      %snmpd
      sysDescr			= "pfSense <hostname> 2.8.1-RELEASE FreeBSD 15.0-CURRENT amd64"
      begemotSnmpdDebugDumpPdus       = 2
      begemotSnmpdDebugSyslogPri      = 7
      begemotSnmpdCommunityString.0.1 = $(read)
      begemotSnmpdCommunityDisable    = 1
      begemotSnmpdPortStatus.<ip-1>.161 = 1
      begemotSnmpdPortStatus.<ip-2>.161 = 1
      begemotSnmpdLocalPortStatus."/var/run/snmpd.sock" = 1
      begemotSnmpdLocalPortType."/var/run/snmpd.sock" = 4
      
      # These are bsnmp macros not php vars.
      sysContact      = $(contact)
      sysLocation     = $(location)
      sysObjectId     = 1.3.6.1.4.1.12325.1.1.2.1.$(system)
      
      snmpEnableAuthenTraps = 2
      begemotSnmpdModulePath."mibII"  = "/usr/lib/snmp_mibII.so"
      begemotSnmpdModulePath."netgraph" = "/usr/lib/snmp_netgraph.so"
      %netgraph
      begemotNgControlNodeName = "snmpd"
      begemotSnmpdModulePath."pf"     = "/usr/lib/snmp_pf.so"
      begemotSnmpdModulePath."hostres"     = "/usr/lib/snmp_hostres.so"
      begemotSnmpdModulePath."ucd"     = "/usr/local/lib/snmp_ucd.so"
      begemotSnmpdModulePath."regex"     = "/usr/local/lib/snmp_regex.so"
      
      • stephenw10 Netgate Administrator

        Hmm, I haven't seen that. But I'm also not querying that fast or all the rules like that.

        In 2.7.2 I assume the total memory use doesn't continue to climb?

        And in 2.8.1 it eventually exhausts the available RAM and causes services to fail?

        • Averlon @stephenw10

          The firewall and its services don't fail completely, but they start acting unusually due to the memory exhaustion. A few times FRR got stuck and needed to be restarted. With 2.7.2 and all previous releases down to 2.4, the memory usage of the process stayed constant at less than 500 MB.

          This is the memory usage in 2.7.2:

          9a418a9b-a74e-48c2-ba46-973a2c2dc73d-image.png

          This is the same firewall on 2.8.1:

          f2e8a2f2-435a-4552-924b-acb1e9851690-image.png

          On the 27th of September I noticed the issue and configured a cron job to restart the server every 24h.

          The next chart shows the swap usage (orange). Once it hits 100%, the problems start.

          5809452e-06b9-488e-a2ce-378f07857478-image.png

          • stephenw10 Netgate Administrator

            Hmm, OK well that seems pretty conclusive. Let me see if I can replicate it....

            • stephenw10 Netgate Administrator

              Did you open a bug report for this yet? (not seeing one)

              • Averlon

                Not yet. As I wrote, I wanted to check first whether anyone else is running into this too. I'll take care of it as soon as I figure out my Redmine credentials; I haven't been there for a while.

                • stephenw10 Netgate Administrator

                  Mmm, yeah we haven't managed to replicate it here yet. Still trying some variations....

                  • Averlon

                    Redmine issue created: https://redmine.pfsense.org/issues/16456

                    I've checked the complete SNMP monitoring of the affected devices and identified these OIDs as the ones being polled at a 60-second interval:

                    • 1.3.6.1.4.1.2021.4
                    • 1.3.6.1.4.1.2021.11
                    • 1.3.6.1.2.1.25.3.3.1
                    • 1.3.6.1.2.1.25.4.2.1
                    • 1.3.6.1.2.1.2.2.1
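
For anyone trying to reproduce this, the polling pattern described above could be approximated with a loop like the following. It is a rough sketch, not the actual monitoring setup: it assumes Net-SNMP's `snmpwalk` is installed on the polling host, and `TARGET`, `COMMUNITY` and the function name `poll_once` are placeholders.

```shell
#!/bin/sh
# Sketch only: walk the OIDs listed above once per call.
# TARGET and COMMUNITY are placeholders; SNMPWALK can be overridden
# (for example with "echo") to do a dry run without touching the network.
SNMPWALK="${SNMPWALK:-snmpwalk}"
TARGET="${TARGET:-192.0.2.1}"
COMMUNITY="${COMMUNITY:-public}"

OIDS="1.3.6.1.4.1.2021.4
1.3.6.1.4.1.2021.11
1.3.6.1.2.1.25.3.3.1
1.3.6.1.2.1.25.4.2.1
1.3.6.1.2.1.2.2.1"

# Walk every OID in the list once using SNMP v2c.
poll_once() {
    for oid in $OIDS; do
        $SNMPWALK -v2c -c "$COMMUNITY" "$TARGET" "$oid"
    done
}

# Example: poll every 60 seconds and discard the results.
# while :; do poll_once >/dev/null 2>&1; sleep 60; done
```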
                    • Averlon @stephenw10

                      @stephenw10 said in CE 2.8.1 bsnmpd Memory Leak:

                      Let me see if I can replicate it....

                      Hi @stephenw10, did you have a chance to replicate this behavior?

                      • stephenw10 Netgate Administrator

                        Nope not yet. We did find and fix a different memory leak. Devs are still reviewing.

                        • psp

                          Just adding one vote to confirm the memory leak. We read values with the standard Zabbix template, and every week the snmpd service needs a restart because swap space fills up.
                          As a temporary workaround, I added a daily cron job: pfSsh.php playback svc restart bsnmpd

                          • kprovost @psp

                            @psp As Steve said, I've been unable to reproduce this.

                            It'd be helpful if you could narrow down the OID to (repeatedly) query to reproduce the leak.

                            • Averlon @kprovost

                              @kprovost

                              I've posted all the OIDs my monitoring walks. Maybe @psp can share the OIDs he's using, but I'm pretty sure they are almost the same. Maybe it's a combination of services running on the firewall. @psp, do you also have Telegraf running?

                              • psp @Averlon

                                @Averlon No telegraf here. Standard Zabbix "pfSense by SNMP" template uses BEGEMOT-PF-MIB and HOST-RESOURCES-MIB to walk interfaces and get OID values.

                                • Averlon @psp

                                  @kprovost BEGEMOT-PF-MIB and HOST-RESOURCES-MIB it is.

                                  • kprovost @Averlon

                                    @Averlon I still can't reproduce this leak. That's why I asked if you could identify the specific OID that's causing the leak. That way I could inspect the code and recent related changes to see if I could identify the leak.
                                    Naturally I cannot do that for the entire trees under those two nodes.

                                    • Averlon @kprovost

                                      @kprovost I can narrow BEGEMOT-PF-MIB down to these OIDs, which are walked regularly:

                                      • 1.3.6.1.4.1.12325.1.200.1.1
                                      • 1.3.6.1.4.1.12325.1.200.1.11
                                      • 1.3.6.1.4.1.12325.1.200.1.2
                                      • 1.3.6.1.4.1.12325.1.200.1.3
                                      • 1.3.6.1.4.1.12325.1.200.1.5
                                      • 1.3.6.1.4.1.12325.1.200.1.8

                                      And HOST-RESOURCES-MIB to these OIDs:

                                      • 1.3.6.1.2.1.25.3.3.1
                                      • 1.3.6.1.2.1.25.4.2.1

                                      After 24h the bsnmpd process is at approx. 500 MB when the monitoring queries the device at a 60-second interval. I'm not sure why this doesn't occur when you try to reproduce it. It may require a certain number of firewall rules to trigger the condition that causes the leak. The firewalls on which I observed this have roughly 400 to 600 rules.

                                      • kprovost @Averlon

                                        @Averlon Does each of those leak individually, or do you need to query all of them to provoke the leak?

                                        You've listed most of BEGEMOT-PF-MIB here.

                                        My test setup has the standard CE ruleset, which already has 100 rules in it, and there's no point at which the relevant code does different things for more rules.

                                        All of the nodes you mentioned are populated in my test setup, and I've been polling as quickly as the test device would respond, not every 60 seconds. I'd expect that to result in an even faster leak than you described, but there's nothing.
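
One way to answer the question of whether each subtree leaks on its own might be to walk a single OID repeatedly and compare the daemon's RSS before and after. The following is a hedged sketch, not a tested procedure: it assumes it runs on the firewall itself (so `ps` can see bsnmpd), that a walk client such as Net-SNMP's `snmpwalk` is available there, and that the PID file is `/var/run/snmpd.pid` as shown earlier in the thread. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: walk one OID repeatedly and report how much bsnmpd's RSS grew.
# SNMPWALK is assumed to be Net-SNMP's snmpwalk; override it if another
# client is used.
SNMPWALK="${SNMPWALK:-snmpwalk}"

# Current RSS of bsnmpd in KiB, located via its PID file.
bsnmpd_rss_kb() {
    ps -o rss= -p "$(cat /var/run/snmpd.pid)" | tr -d ' '
}

# Difference between two RSS samples in KiB: rss_delta BEFORE AFTER
rss_delta() {
    echo $(( $2 - $1 ))
}

# measure_oid TARGET COMMUNITY OID COUNT
# Walks OID COUNT times and prints "<oid> <rss growth in KiB>".
measure_oid() {
    before=$(bsnmpd_rss_kb)
    i=0
    while [ "$i" -lt "$4" ]; do
        $SNMPWALK -v2c -c "$2" "$1" "$3" >/dev/null 2>&1
        i=$((i + 1))
    done
    after=$(bsnmpd_rss_kb)
    echo "$3 $(rss_delta "$before" "$after")"
}

# Example: measure_oid 192.0.2.1 public 1.3.6.1.4.1.12325.1.200.1.8 1000
```

Running it once per OID from the lists above, with regular monitoring paused, would show whether any single subtree accounts for the growth or whether only the combination does.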

                                        • Averlon @kprovost

                                          @kprovost I'm at the point where I doubt the leak can be triggered simply by running queries against specific MIBs. SNMP monitoring is still very common and this topic hasn't received much attention, which suggests that not every configuration is affected. In addition, your tests have shown that reproducing the issue isn't as straightforward as running snmpwalks as fast as possible.

                                          I have three different pfSense installations which are affected by leaks of the bsnmp process. All of these have this in common:

                                          • These are VMs
                                          • SNMP v2 is in use (YES - Security Not My Problem) 😬
                                          • Bridge interfaces for transparent filtering are in use (physical interfaces only, no bridge interface)
                                          • IPsec with VTI interfaces is used
                                          • The SNMP monitoring runs queries via the VTI interfaces to the LAN IP; these may flap.
                                          • BGP from the FRR package is running and peering via VTIs
                                          • There is a pretty high load of dropped / logged IP communication from the firewall
                                          • The table sizes are pretty large due to pfBlockerNG rules
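
To make ruleset and table sizes comparable between affected and unaffected systems, the numbers could be collected with something like the sketch below. It assumes root access on the firewall and uses `pfctl -sr`, `pfctl -sT` and `pfctl -t <table> -T show`; tables inside anchors are not covered, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: summarise pf ruleset and table sizes for comparison.

# Count the non-empty lines on stdin (prints 0 for empty input).
count_lines() {
    grep -c . || true
}

# Number of loaded filter rules (pfctl prints one rule per line).
rule_count() {
    pfctl -sr | count_lines
}

# Print "<table> <entries>" for every table in the main ruleset.
table_sizes() {
    pfctl -sT | while read -r table; do
        echo "$table $(pfctl -t "$table" -T show | count_lines)"
    done
}

# Example:
# echo "rules: $(rule_count)"; table_sizes
```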

                                          I'm currently on a business trip and cannot do further tests in my environment. I may find some time next weekend to investigate further.

                                          Meanwhile maybe @psp can share some details about his environment to find the common ground for this issue.

                                          • Averlon

                                            Just got home today and my wife told me about the plans for the weekend. Unfortunately they don't include troubleshooting sessions for this issue. I'll have to postpone contributions until next week, sorry.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.