Netgate Discussion Forum

    CE 2.8.1 bsnmpd Memory Leak

    General pfSense Questions

      JD 0 @Wolfsbane2k

      @Wolfsbane2k I've spent the last two-plus weeks "dumbing down" the agent to the point where there isn't much left to remove. I'm fairly convinced the cause is the number of OIDs and the polling cadence, and that bsnmpd itself leaks memory like a sieve through some fundamental coding issue. My next step is to step outside of Zabbix entirely: I'm going to disable SNMP monitoring through Zabbix and create my own SNMP monitor so I can control the traffic more precisely. Now I just need to find the time to do it. Time, Time, Time. See what's become of me...

        Averlon

        When I tried to nail this down, I built a lab and started with my standard SNMP template for Prometheus, but couldn't trigger any leak at all for weeks. That lab was a freshly installed pfSense with a LAN interface and a Prometheus server running the SNMP queries against the LAN interface.

        Looking in real time at one of my firewalls, which is queried every 60 seconds for various OIDs, the bsnmpd process stays at the same level of memory usage for at least 30 minutes.
        I looked at my historical performance data for some firewalls in 24-hour intervals between restarts of bsnmpd and noticed that devices with higher pps rates drain memory slightly faster than those with lower traffic volumes. I don't have a per-process view of that, only total used/free memory. My conclusion is that even if querying a certain OID or set of OIDs is the cause of the leak, the traffic volume also comes into play.

          stephenw10 Netgate Administrator

          Hmm, so no single OID or set of OIDs can trigger this unless the rate of queries or data volume is high enough.... 🤔

            Averlon

            Well, that's not exactly what I meant. On my production firewall it took only 5 days to max out about 8GB of RAM. The lab installation with only 2GB of RAM lasted for over 2 weeks, and the SNMP process stayed under 500MB of memory usage. On both firewalls the same monitoring template was used, with the same query interval of 60 seconds.
            The lab installation had a pretty basic configuration: a LAN and a WAN interface plus an SNMP server config. If a specific OID or set of OIDs were simply the cause of the leak, I would expect the lab firewall to run into memory problems after a day or two.
            In the evening I compared the memory stats of the bsnmpd process on one production firewall 30 minutes apart, and they hadn't changed or gone up significantly. Something more needs to happen to make the SNMP process eat up memory. Interface flaps, counter rollover - your guess is as good as mine.
            Did anyone take a look at the changes to the SNMP implementation in pfSense between 2.7 and 2.8?

              tinfoilmatt LAYER 8 @Averlon

              @Averlon said in CE 2.8.1 bsnmpd Memory Leak:

              Did anyone take a look at the changes to the SNMP implementation in pfSense between 2.7 and 2.8?

              Go ahead. (Make an account to see the code repository itself.) It's not called the Community Edition for nothin'!

                stephenw10 Netgate Administrator @Averlon

                @Averlon Ah, so it still leaked, just not at a rate that hits a limit quickly. Hmm.

                  Averlon @tinfoilmatt

                  @tinfoilmatt said in CE 2.8.1 bsnmpd Memory Leak:

                  Go ahead. (Make an account to see the code repository itself.) It's not called the Community Edition for nothin'!

                  If you had read the forum posts, you would know that I already have a Redmine account. So if you'd like to contribute something useful to the topic, you are very welcome!

                    Averlon

                    https://cgit.freebsd.org/src/commit/?id=f1612e7087d7c3df766ff0bf58c48d02fb0e2f6d

                    That's pretty much the only commit I found. It comes from https://redmine.pfsense.org/issues/15481

                      JD 0

                      I've turned off bsnmp on my 4200. It's just leaking too badly -- it nearly ran out of swap overnight, and I also don't want to continue chewing through the lifespan on the flash. This was with Zabbix monitoring dialed back as far as I could reasonably set it and still retain any useful data.

                      I've since stood up a FreeBSD host in my Proxmox environment and plan to test a little further there. Though I noticed, interestingly enough, that the bsnmp package is listed as "no maintainer". Perhaps bsnmp isn't the best package to use for this service -- though I understand the intent was for it to be "lightweight".

                        keyser Rebel Alliance @JD 0

                        I have the exact same issue with my 6100s, which are monitored by the same pfSense SNMP template from Zabbix. I initially created another thread about the issue because I first noticed the behaviour as some massive memory jumps in the monitoring graphs, but that was a cosmetic thing. Upon investigation, the issue seems to be exactly the same as in this thread.

                        Here’s my thread, which I have closed: https://forum.netgate.com/topic/200635/bsnmp-causing-massive-memory-use-spikes-since-26.03-update/5

                        Love the no fuss of using the official appliances :-)

                          keyser Rebel Alliance @stephenw10

                          @stephenw10 Do you have any insights into how we might diagnose and identify the leak here?

                          I initially tried disabling all queries related to pf filter stats because that has previously been a culprit in leaving open files (24.03), but that made no difference.

                          There is no doubt it's a plain memory leak, and as far as I can tell it's related to the number of queries you make (I query more stats on one particular box, and the leak grows slightly faster on that one).

                            kprovost @keyser

                            @keyser As I said in https://forum.netgate.com/post/1229469, we have been unable to reproduce a leak.
                            Try to narrow down the specific OID that triggers the leak. Probably the easiest way to do that is to disable the queries for half of them and see if the leak remains. If it does not, switch the disabled halves and try again.
                            Once you identify a leaking half, split that one in half and repeat the exercise until you have it narrowed down, ideally to a single OID, but even just a handful would already be a useful step.
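The halving procedure described above can be sketched in Python. This is only an illustration under stated assumptions: `leaks` is a hypothetical callback standing in for the manual step (poll only the given subset for long enough to judge, restarting bsnmpd between rounds), the OID labels are placeholders, and the sketch assumes a single OID is responsible.

```python
from typing import Callable, List, Sequence


def bisect_leaking_oid(oids: Sequence[str],
                       leaks: Callable[[List[str]], bool]) -> str:
    """Narrow a list of polled OIDs down to the one that triggers the leak.

    `leaks(subset)` must return True if memory still grows while only
    `subset` is being polled. Assumes exactly one OID is the culprit.
    """
    if not oids:
        raise ValueError("need at least one OID to test")
    candidates = list(oids)
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Keep whichever half still leaks when polled on its own.
        candidates = first if leaks(first) else second
    return candidates[0]


# Simulated run: eight placeholder OIDs, with "oid-5" leaking.
all_oids = [f"oid-{n}" for n in range(8)]
culprit = bisect_leaking_oid(all_oids, lambda subset: "oid-5" in subset)
print(culprit)
```

With N polled items this needs roughly log2(N) rounds, which is why halving is far cheaper than disabling items one at a time when each round takes hours.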

                              keyser Rebel Alliance @kprovost

                              @kprovost Just to let the thread know: 26.03.1 continues (as expected) to exhibit the memory leak issue in BSNMP.
                              I have been working diligently on identifying the issue and I’m getting pretty close. But it takes a sh**load of time to do divide and conquer on this issue, as the leak is so slow that it takes around 12 hours to reliably tell whether it is still leaking once some SNMP keys have been disabled in Zabbix.

                              I will report back when I have some more hard evidence.

                                keyser Rebel Alliance @kprovost

                                @kprovost @stephenw10 Okay, I have some important initial findings to share on this issue now. I have been following a very structured divide-and-conquer strategy to isolate the issue in BSNMPD on my two "identical" SG6100s, and here are the details:

                                When the Zabbix template item called "PFsense: Firewall Rules Count" is queried, the memory leak occurs. I have inserted the specs for this item here:
                                0e30f873-8f91-4b71-a381-333e91b812b1-image.png

                                The OID is: .1.3.6.1.4.1.12325.1.200.1.11.1.0

                                The size of the memory leak is correlated with how often this OID is queried. By default it is queried every minute, which leads to a slow memory leak of < 100MB in a couple of hours. I have shortened the query interval to 5 seconds in order to see the effect, and the leak goes to about 300MB in a couple of hours. NOTE: These are not precise numbers, just graph readings.
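A minimal sketch of a standalone poller for reproducing this outside Zabbix. It assumes the net-snmp `snmpget` command-line tool is installed on the monitoring host and that the firewall answers SNMP v2c; the address `192.0.2.1` and the community `public` mentioned below are placeholders, not values from this thread.

```python
import subprocess
import time
from typing import List

# OID quoted in the post above.
RULES_COUNT_OID = ".1.3.6.1.4.1.12325.1.200.1.11.1.0"


def snmpget_command(host: str, community: str, oid: str) -> List[str]:
    """Build a net-snmp `snmpget` command line (SNMP v2c is an assumption)."""
    # -Oqv prints only the value, without the OID or type prefix.
    return ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid]


def poll(host: str, community: str, oid: str,
         interval_s: float, count: int) -> None:
    """Query one OID `count` times, sleeping `interval_s` between queries."""
    cmd = snmpget_command(host, community, oid)
    for _ in range(count):
        result = subprocess.run(cmd, capture_output=True, text=True)
        reply = result.stdout.strip() or result.stderr.strip()
        print(time.strftime("%H:%M:%S"), reply)
        time.sleep(interval_s)
```

Calling `poll("192.0.2.1", "public", RULES_COUNT_OID, interval_s=5, count=1440)` would query every 5 seconds for about two hours, matching the faster interval used in the test above, while the bsnmpd process size is watched on the firewall itself.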

                                On the identical 6100 there is no longer an obvious memory leak once I disable that particular template item and restart BSNMPD. I have tried reversing the test on the boxes and as expected the opposite happens.

                                However, two things suggest the leak is not directly tied to every query made against the OID:
                                1: Leaked memory does not seem to scale linearly with queries/time. Querying every 5s is 12 times more often than once a minute, but the size of the leak has not risen 12-fold.
                                2: Once the OID has been queried, the memory leak continues even after I disable the item query. The leak seems to slowly taper off until it almost flatlines, but it does keep going for at least an hour or more. This could, however, be a misreading on my part, as I only have the graphs as a baseline for now. I have not had enough time, or a precise way, to measure the memory consumption of the BSNMPD process specifically.

                                But one thing is clear: I need to restart BSNMPD and not query the OID at all to stop the memory leak altogether.

                                I have not had enough time to determine if there are other items that very slowly leak memory. I will continue to investigate that, but it seems stable when that item is disabled.

                                NB: The memory consumption of BSNMPD is quite noticeable: about 300MB out of the gate (with this OID disabled). Is that normal?

                                I hope this can help you find the bug in BSNMPD and create a patch :-)

                                  tinfoilmatt LAYER 8 @keyser

                                  Nice work doing the needful.

                                    stephenw10 Netgate Administrator

                                    Yup nice work. 👍

                                      cjrnz

                                      Testing here; I have re-enabled polling with pfsense.rules.count disabled. Cheers!

                                        keyser Rebel Alliance @cjrnz

                                        @cjrnz There is no doubt here that the mentioned ITEM is the culprit. Here is a graph spanning several days that shows the difference between two “identical” SG6100s, one of which has the ITEM disabled:
                                        66e2648e-8fb8-4c36-a5ee-094c247f49f2-image.png

                                        Since yesterday both have been running with the ITEM disabled, and the BSNMPD process memory curve is now completely flat. Inspecting the process with top shows that one has grown 1MB in size and the other 2MB since their restart 16 hours ago. Too early to say whether that indicates a VERY VERY slow memory leak or just normal behaviour after a restart.

                                        EDIT: I’m no longer sure that it does not scale linearly with the number of queries. As you can see from the graph above, the overall firewall memory leak seems to be in the range of 40 - 50MB / 2 hours when doing the standard query every minute. That is roughly 1/6th of the 300MB / 2 hours when querying every 5 seconds.
                                        I have also confirmed that the firewall cached memory graph shows somewhat different behavior than the actual BSNMPD process (inspected from the CLI). So the process might stop leaking as soon as I stop querying the ITEM.

                                        Right now I’m just testing long-term stability with the ITEM disabled, so it will be a week or so before I can confirm the linearity with queries that does indeed seem to be there.
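As a quick sanity check on the linearity question, the figures quoted in this post can be compared directly. This is only a sketch using the approximate graph readings given above, and the variable names are illustrative. One detail worth keeping in mind: 60 s divided by 5 s gives a 12-fold polling rate, so the comparison is between a 12-fold increase in queries and whatever increase in leaked memory the readings show.

```python
def rate_ratio(slow_interval_s: float, fast_interval_s: float) -> float:
    """How many times more often the fast poller queries than the slow one."""
    return slow_interval_s / fast_interval_s


# Figures quoted in the post (graph readings, not precise measurements).
poll_ratio = rate_ratio(60, 5)     # once a minute vs. every 5 seconds
leak_fast_mb = 300                 # MB per 2 hours when polling every 5 s
leak_slow_mb = (40, 50)            # MB per 2 hours when polling every 60 s

leak_ratio_low = leak_fast_mb / leak_slow_mb[1]
leak_ratio_high = leak_fast_mb / leak_slow_mb[0]

print(f"polling ratio: {poll_ratio:.0f}x")
print(f"leak ratio:    {leak_ratio_low:.1f}x to {leak_ratio_high:.1f}x")
```

If the readings hold, the leak grows by less than the polling rate does, so the planned longer-term measurement of the process itself is the better basis for deciding whether growth is proportional to queries.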

                                          cjrnz @keyser

                                          @keyser Ahh, I do like pretty graphs. I've been watching mine on the CLI with this csh-style loop:

                                          while ( 1 )
                                          echo `date -u` bsnmpd VSZ:`ps -Haxuww | grep 'bsnmpd' | grep -v grep | awk '{print $5}'` RSS:`ps -Haxuww | grep 'bsnmpd' | grep -v grep | awk '{print $6}'`
                                          sleep 60
                                          end
                                          

                                          I started at
                                          Wed Jun 3 20:26:36 UTC 2026 bsnmpd VSZ:292236 RSS:264532
                                          and now it's at
                                          Thu Jun 4 06:28:07 UTC 2026 bsnmpd VSZ:333196 RSS:272456
                                          So, as we know, it grows a little over time anyway, but only by about 8MB of RSS over ten hours (ps reports these sizes in kilobytes), compared with so much more when the rule count is included.
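The two samples above can be turned into a growth rate with a few lines of Python. This is a sketch that assumes the log lines have exactly the format printed by the loop in this post and that `ps` reports VSZ and RSS in kilobytes, as it does on FreeBSD; the function names are illustrative.

```python
import re
from datetime import datetime
from typing import Tuple

SAMPLE_RE = re.compile(r"^(?P<ts>.+?) bsnmpd VSZ:(?P<vsz>\d+) RSS:(?P<rss>\d+)$")


def parse_sample(line: str) -> Tuple[datetime, int, int]:
    """Split one logged line into (timestamp, VSZ in KB, RSS in KB)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised sample line: {line!r}")
    ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S UTC %Y")
    return ts, int(m["vsz"]), int(m["rss"])


def growth_kb_per_hour(first: str, last: str) -> Tuple[float, float]:
    """Average VSZ and RSS growth in KB per hour between two samples."""
    t0, vsz0, rss0 = parse_sample(first)
    t1, vsz1, rss1 = parse_sample(last)
    hours = (t1 - t0).total_seconds() / 3600
    return (vsz1 - vsz0) / hours, (rss1 - rss0) / hours


first = "Wed Jun 3 20:26:36 UTC 2026 bsnmpd VSZ:292236 RSS:264532"
last = "Thu Jun 4 06:28:07 UTC 2026 bsnmpd VSZ:333196 RSS:272456"
vsz_rate, rss_rate = growth_kb_per_hour(first, last)
print(f"VSZ: {vsz_rate:.0f} KB/h, RSS: {rss_rate:.0f} KB/h")
```

Feeding the first and latest lines of a longer log into `growth_kb_per_hour` gives a single number per configuration, which makes runs with and without the rule count item easier to compare than eyeballing graphs.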

                                            keyser Rebel Alliance @cjrnz

                                            @cjrnz Thanks

                                            It’s still a bit early to tell, but there seems to be another minor memory leak occurring, at about 4-5 MB/24h.
                                            It's nowhere near the problem with the rules.count ITEM, but nonetheless something that looks like a small leak. I will monitor this for another week to properly confirm.

                                            Regarding the identified issue in BSNMPD with the rules.count OID: are you Netgate guys on this, or have you created a Redmine issue for the problem?
                                            It is after all not insignificant and WILL cause issues in production where the number of firewall rules is monitored via SNMP from e.g. Zabbix, Prometheus or the like.

                                            Copyright 2026 Rubicon Communications LLC (Netgate). All rights reserved.