Netgate Discussion Forum

    Snmpd keeps crashing (1.2.3-RELEASE)

    23 Posts 7 Posters 19.5k Views
    • jimp (Rebel Alliance Developer, Netgate)

      I'm monitoring various bits of my pfSense boxes via snmp using Cacti and I have never seen it crash.

      Is it possible that it's just reacting badly to a malformed query from whatever is polling SNMP?

      You might run a tcpdump on SNMP traffic with the verbosity WAY up, e.g.

      tcpdump -i <int> -vvvv -X -s 8192 udp and port 161
      

      See if you can tell which query is being processed when it dies; it might lead somewhere.

      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

      • Briantist

        Thanks jimp. I was able to catch this right as it crashed (somewhat sanitized):

        
        15:53:09.186274 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 70) 10.11.12.242.48646 > 10.11.12.249.161: [udp sum ok]  { SNMPv2c { GetRequest(27) R=35358  .1.3.6.1.2.1.1.1.0 } }
                0x0000:  4500 0046 0000 4000 4011 10a9 0a0a 0af2  E..F..@.@.......
                0x0010:  0a0a 0af9 be06 00a1 0032 538b 3028 0201  .........2S.0(..
                0x0020:  0104 0670 7562 6c69 63a0 1b02 0300 8a1e  ...public.......
                0x0030:  0201 0002 0100 300e 300c 0608 2b06 0102  ......0.0...+...
                0x0040:  0101 0100 0500                           ......
        15:53:09.187897 IP (tos 0x0, ttl 64, id 61385, offset 0, flags [none], proto UDP (17), length 122) 10.11.12.249.161 > 10.11.12.242.48646: [udp sum ok]  { SNMPv2c { GetResponse(79) R=35358  .1.3.6.1.2.1.1.1.0="gateway1.bti.local 2285352088 FreeBSD 7.2-RELEASE-p5" } }
                0x0000:  4500 007a efc9 0000 4011 60ab 0a0a 0af9  E..z....@.`.....
                0x0010:  0a0a 0af2 00a1 be06 0066 5ae0 305c 0201  .........fZ.0\..
                0x0020:  0104 0670 7562 6c69 63a2 4f02 0300 8a1e  ...public.O.....
                0x0030:  0201 0002 0100 3042 3040 0608 2b06 0102  ......0B0@..+...
                0x0040:  0101 0100 0434 6761 7465 7761 7931 2e62  .....4gateway1.b
                0x0050:  7469 2e6c 6f63 616c 2032 3238 3533 3532  ti.local.2285352
                0x0060:  3038 3820 4672 6565 4253 4420 372e 322d  088.FreeBSD.7.2-
                0x0070:  5245 4c45 4153 452d 7035                 RELEASE-p5
        15:53:09.347659 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 72) 10.11.12.242.48646 > 10.11.12.249.161: [udp sum ok]  { SNMPv2c { GetBulk(29) R=35359  N=0 M=1115 .1.3.6.1.2.1.2.2.1.1 } }
                0x0000:  4500 0048 0000 4000 4011 10a7 0a0a 0af2  E..H..@.@.......
                0x0010:  0a0a 0af9 be06 00a1 0034 5910 302a 0201  .........4Y.0*..
                0x0020:  0104 0670 7562 6c69 63a5 1d02 0300 8a1f  ...public.......
                0x0030:  0201 0002 0204 5b30 0f30 0d06 092b 0601  ......[0.0...+..
                0x0040:  0201 0202 0101 0500                      ........
        
        It looks like it successfully received and answered the first request (the GetRequest) and then died on the second (the GetBulk). Unfortunately, I don't really know what that means. The only things I'm checking at the moment are two interfaces: vlan0 and vlan1.
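
For anyone trying to read the capture: the GetBulk payload can be decoded by hand to see exactly what the poller asked for. Below is a minimal sketch in Python, not a full SNMP parser. The function names (`read_tlv`, `decode_oid`, `decode_getbulk`) are made up for illustration, and it assumes the 20-byte IP header and 8-byte UDP header that this particular packet has.

```python
# Third packet from the capture above (the GetBulk), as printed by tcpdump -X.
PACKET_HEX = """
4500 0048 0000 4000 4011 10a7 0a0a 0af2
0a0a 0af9 be06 00a1 0034 5910 302a 0201
0104 0670 7562 6c69 63a5 1d02 0300 8a1f
0201 0002 0204 5b30 0f30 0d06 092b 0601
0201 0202 0101 0500
"""
PACKET = bytes.fromhex("".join(PACKET_HEX.split()))
SNMP_PAYLOAD = PACKET[28:]  # assumption: 20-byte IP header + 8-byte UDP header


def read_tlv(buf, pos):
    """Read one BER tag-length-value at pos; return (tag, value, next_pos)."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, buf[pos:pos + length], pos + length


def to_int(raw):
    """BER INTEGER bodies are big-endian two's complement."""
    return int.from_bytes(raw, "big", signed=True)


def decode_oid(raw):
    """Decode an OBJECT IDENTIFIER body (simplified: first arc must be 0 or 1)."""
    arcs = [raw[0] // 40, raw[0] % 40]
    value = 0
    for byte in raw[1:]:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear marks the last byte of an arc
            arcs.append(value)
            value = 0
    return ".".join(str(arc) for arc in arcs)


def decode_getbulk(snmp):
    """Pull the interesting fields out of an SNMPv2c GetBulk message."""
    _, message, _ = read_tlv(snmp, 0)            # outer SEQUENCE
    _, version, pos = read_tlv(message, 0)
    _, community, pos = read_tlv(message, pos)
    pdu_tag, pdu, _ = read_tlv(message, pos)     # 0xA5 is GetBulkRequest
    _, request_id, pos = read_tlv(pdu, 0)
    _, non_repeaters, pos = read_tlv(pdu, pos)
    _, max_repetitions, pos = read_tlv(pdu, pos)
    _, varbinds, _ = read_tlv(pdu, pos)
    _, first_varbind, _ = read_tlv(varbinds, 0)
    _, oid, _ = read_tlv(first_varbind, 0)
    return {
        "version": to_int(version),              # 1 on the wire means v2c
        "community": community.decode("ascii"),
        "pdu_tag": pdu_tag,
        "request_id": to_int(request_id),
        "non_repeaters": to_int(non_repeaters),
        "max_repetitions": to_int(max_repetitions),
        "oid": decode_oid(oid),
    }


if __name__ == "__main__":
    for field, value in decode_getbulk(SNMP_PAYLOAD).items():
        print(f"{field}: {value}")
```

Run against this packet, it reports non-repeaters 0 and max-repetitions 1115 for OID 1.3.6.1.2.1.2.2.1.1, which matches the `N=0 M=1115` that tcpdump prints on the summary line. In other words, the poller is asking for up to 1115 rows of the interface table in a single response.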
        
        • jimp (Rebel Alliance Developer, Netgate)

          Can you repeat that a couple more times and see if it's the same request killing it every time?

          • Briantist

            I tried it several more times, and I'm almost certain it's the check that sends the "GetBulk" request. I am trying to reproduce it by manually running some Nagios plugins, but I can't figure out how to send a request that shows up in the packet capture as a GetBulk. Everything I try comes back successful and does not crash it. If it helps, here is the usage for the check_snmp command in Nagios:

            
            Usage: check_snmp -H <ip_address> -o <oid> [-w warn_range] [-c crit_range]
            [-C community] [-s string] [-r regex] [-R regexi] [-t timeout] [-e retries]
            [-l label] [-u units] [-p port-number] [-d delimiter] [-D output-delimiter]
            [-m miblist] [-P snmp version] [-L seclevel] [-U secname] [-a authproto]
            [-A authpasswd] [-x privproto] [-X privpasswd]
            

            Doing a simple:

            ./check_snmp -H 10.11.12.249 -C public -o .1.3.6.1.2.1.1.1.0

            returns successfully just like it does in the packet capture, and does not crash the daemon. I have tried a couple of things with check_snmp_interfaces and check_snmp_ifstatus but still no crash and still no GetBulk in the packet capture. For example:

            ./check_snmp_ifstatus -H 10.11.12.249 -C public -v 2c -i vlan0

            returns successfully (Status is OK - vlan0 (Layer 2 Virtual LAN using 802.1Q) - Speed: 10 Mbps, MTU: 1500, Last change: 0.00 seconds, STATS:(in errors: 0, out errors: 2, queue length: 0)|queue=0) and doesn't crash the daemon.

            If you can give me some parameters to put into the check_ plugin that will reproduce the GetBulk we were seeing I think we could get it to a point where the error is reproducible easily.

            Thanks for your help!
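
One way to reproduce the request without depending on a particular plugin is to build the GetBulk bytes directly and send them over UDP. The following is a rough Python sketch under stated assumptions: the helper names (`tlv`, `ber_int`, `ber_oid`, `build_getbulk`, `send_getbulk`) are hypothetical, only non-negative integers are handled, and only SNMPv2c with a single varbind is covered. Fed the values from the capture, it produces the same 44 payload bytes that tcpdump showed.

```python
import socket


def ber_length(n):
    """BER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(raw)]) + raw


def tlv(tag, body):
    """Wrap body in a BER tag-length-value."""
    return bytes([tag]) + ber_length(len(body)) + body


def ber_int(value):
    """INTEGER for non-negative values; the size leaves the sign bit clear."""
    size = value.bit_length() // 8 + 1
    return tlv(0x02, value.to_bytes(size, "big"))


def ber_oid(dotted):
    """OBJECT IDENTIFIER from dotted notation, e.g. '1.3.6.1.2.1.2.2.1.1'."""
    arcs = [int(part) for part in dotted.strip(".").split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]             # last byte has the high bit clear
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))
            arc >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def build_getbulk(community, request_id, non_repeaters, max_repetitions, oid):
    """SNMPv2c GetBulkRequest with one varbind (OID bound to NULL)."""
    varbind = tlv(0x30, ber_oid(oid) + tlv(0x05, b""))
    pdu = tlv(
        0xA5,  # GetBulkRequest-PDU
        ber_int(request_id)
        + ber_int(non_repeaters)
        + ber_int(max_repetitions)
        + tlv(0x30, varbind),
    )
    # Version field 1 means SNMPv2c.
    return tlv(0x30, ber_int(1) + tlv(0x04, community.encode("ascii")) + pdu)


def send_getbulk(host, payload, port=161, timeout=2.0):
    """Send the request; return the raw reply, or None if nothing comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            reply, _ = sock.recvfrom(65535)
        except socket.timeout:
            return None
        return reply
```

A call such as `send_getbulk("10.11.12.249", build_getbulk("public", 35359, 0, 1115, "1.3.6.1.2.1.2.2.1.1"))` would replay the request from the capture. Only point it at a box you are prepared to see the SNMP daemon die on.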

            • Briantist

              Jim, just wondering if you saw my post above, and what your thoughts are. Do you need any other information from me? Thanks.

              • jimp (Rebel Alliance Developer, Netgate)

                I saw it but I haven't had any time to look into this particular issue further. I'm not sure what, offhand, might cause a GetBulk request and why that seems to make it keel over.

                • jimp (Rebel Alliance Developer, Netgate)

                  I haven't seen anything else with bsnmpd crashing, but I did find that if you have net-snmp installed you should also have two programs that may help diagnose this: snmpbulkget and snmpbulkwalk.

                  • Briantist

                    To clarify, does that mean I should have those installed on the pfSense box or on the machine I'm making the requests from?

                    • jimp (Rebel Alliance Developer, Netgate)

                      The snmp client machine, from which the requests originate.

                      • rkelleyrtp

                        Not sure if this will make a difference, but I have had to use SNMP v1 to properly connect to my pfSense boxes.  When using version 2 (or 2c), Cacti could not read data properly from my pfSense boxes.

                        Can you tell Nagios to use "v1" instead of "v2" when communicating with your pfSense box?

                        • Takaratiki

                          After some cursory probing with snmpbulkget and snmpbulkwalk from the server, I have no issues running the commands. Bsnmpd responds promptly with data. Working within the context of the Nagios implementation, I fired off a walk request that produced this:

                          SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.6 = Counter64: 0
                          SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.7 = Counter64: 0
                          SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.8 = Counter64: 0
                          SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.9 = Counter64: 0
                          SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.10 = Counter64: 0
                          SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.11 = Counter64: 0
                          SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.12 = Counter64: 0
                          Error in packet.
                          Connection terminated by remote host

                          After this message, no further requests for data were possible from Nagios, even though I can still snmpbulkwalk from the command line successfully. Any attempt to query the interfaces from Nagios fails and brings down the daemon, with this error in the logs:

                          kernel: pid 58616 (bsnmpd), uid 0: exited on signal 11 (core dumped)

                          • sseidel

                            Hi,

                            I've also had this problem, and I found that bsnmpd crashes when the "max-repetitions field in the GETBULK PDUs" (man snmpbulkwalk) value is greater than 100 on the "if" subtree.
                            Test this (on a Linux system):

                            snmpbulkwalk -Cr100 -v 2c -c public 192.168.154.1 if
                            

                            (should work) against this:

                            snmpbulkwalk -Cr101 -v 2c -c public 192.168.154.1 if
                            

                            (should crash).
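
If the cutoff needs to be re-checked on another box or subtree, the two commands above can be wrapped in a small bisection. This is only a sketch, in Python, with hypothetical helper names (`bulkwalk_argv`, `walk_succeeds`, `first_failing`). It assumes that net-snmp's `snmpbulkwalk` is on the PATH, that a failed walk shows up as a non-zero exit status or a timeout, and that every value above the cutoff fails. It also does not restart bsnmpd, so the daemon has to be brought back up by other means after each failing probe.

```python
import subprocess


def bulkwalk_argv(host, max_repetitions, community="public", subtree="if"):
    """Same command line as above, with the -Cr value filled in."""
    return [
        "snmpbulkwalk", f"-Cr{max_repetitions}",
        "-v", "2c", "-c", community, host, subtree,
    ]


def walk_succeeds(host, max_repetitions):
    """True if the walk exits cleanly (assumption: a crash gives non-zero or a hang)."""
    try:
        done = subprocess.run(
            bulkwalk_argv(host, max_repetitions),
            capture_output=True,
            timeout=60,
        )
    except subprocess.TimeoutExpired:
        return False
    return done.returncode == 0


def first_failing(probe, known_good, known_bad):
    """Binary search for the smallest value where probe() returns False.

    Requires probe(known_good) to be True and probe(known_bad) to be False.
    """
    lo, hi = known_good, known_bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return hi
```

In real use the probe would be something like `lambda n: walk_succeeds("192.168.154.1", n)`, wrapped so that it waits for the daemon to come back after a failure. Starting from a known-good value of 1 and a known-bad value of 1115, the search needs about ten probes.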

                            Our (provider's) Nagios sent 340 in this field, and I see from the logs that Briantist's even sent 1115 (M=1115). Can this be fixed for 1.2.3 or at least double-checked for 2.0?

                            Thanks!

                            Stefan

                            • jimp (Rebel Alliance Developer, Netgate)

                              Looks like it's still a problem with bsnmpd on 2.0. Not sure there is much we can do about that, since the program comes from upstream. We have a couple of patches to it, but it's mostly stock.

                              snmpbulkwalk -Cr101 -v 2c -c public 192.168.1.1 if 
                              

                              …

                              Jan 17 19:49:02 pfsense snmpd[34209]: stack overflow detected; terminated
                              Jan 17 19:49:03 pfsense kernel: pid 34209 (bsnmpd), uid 0: exited on signal 6 (core dumped)
                              

                              • eri--

                                Can you please attach the core file here zipped.

                                • Briantist

                                  @ermal:

                                  Can you please attach the core file here zipped.

                                  Where do I find the core file?

                                  • jimp (Rebel Alliance Developer, Netgate)

                                    It's probably in / (the root directory).

                                    Ermal has a core from me, and I believe he made it crash himself as well (from talking to him on IRC). He said he saw the bad code but hadn't had a chance to fix it yet.

                                    • Briantist

                                      Okay, you or he can let me know if you guys need anything else. Thanks!

                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.