Snmpd keeps crashing (1.2.3-RELEASE)
-
Can you repeat that a couple more times and see if it's the same request killing it every time?
-
I tried it several more times, and I'm almost certain it's the one that's sending the "GetBulk" request. I am trying to reproduce it by manually running some nagios plugins, but I can't figure out how to send a request that shows up in the packet capture with GetBulk. Everything I'm trying comes back successful and does not crash it. If it helps, here is the usage for the check_snmp command in nagios:
Usage: check_snmp -H <ip_address> -o <oid> [-w warn_range] [-c crit_range] [-C community] [-s string] [-r regex] [-R regexi] [-t timeout] [-e retries] [-l label] [-u units] [-p port-number] [-d delimiter] [-D output-delimiter] [-m miblist] [-P snmp version] [-L seclevel] [-U secname] [-a authproto] [-A authpasswd] [-x privproto] [-X privpasswd]
Doing a simple:
./check_snmp -H 10.11.12.249 -C public -o .1.3.6.1.2.1.1.1.0
returns successfully just like it does in the packet capture, and does not crash the daemon. I have tried a couple of things with check_snmp_interfaces and check_snmp_ifstatus but still no crash and still no GetBulk in the packet capture. For example:
./check_snmp_ifstatus -H 10.11.12.249 -C public -v 2c -i vlan0
returns successfully (Status is OK - vlan0 (Layer 2 Virtual LAN using 802.1Q) - Speed: 10 Mbps, MTU: 1500, Last change: 0.00 seconds, STATS:(in errors: 0, out errors: 2, queue length: 0)|queue=0) and doesn't crash the daemon.
If you can give me some parameters to put into the check_ plugin that will reproduce the GetBulk we were seeing I think we could get it to a point where the error is reproducible easily.
Thanks for your help!
-
Jim, just wondering if you saw my post above, and what your thoughts are. Do you need any other information from me? Thanks.
-
I saw it but I haven't had any time to look into this particular issue further. I'm not sure what, offhand, might cause a GetBulk request and why that seems to make it keel over.
-
I haven't seen any other reports of bsnmpd crashing, but I did find that if you have net-snmp installed you should also have two programs that may help diagnose this: snmpbulkget and snmpbulkwalk.
-
To clarify, does that mean I should have those installed on the pfSense box or on the machine I'm making the requests from?
-
The snmp client machine, from which the requests originate.
-
Not sure if this will make a difference, but I have had to use SNMP v1 to properly connect to my pfSense boxes. When using version 2 (or 2c), Cacti could not read data properly from my pfSense boxes.
Can you tell Nagios to use "v1" instead of "v2" when communicating with your pfSense box?
-
After some cursory probing with snmpbulkget and snmpbulkwalk from the server, I have no issues running the commands. Bsnmpd responds promptly with data. Working within the context of the Nagios implementation, I fired off a walk request that produced this:
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.6 = Counter64: 0
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.7 = Counter64: 0
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.8 = Counter64: 0
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.9 = Counter64: 0
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.10 = Counter64: 0
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.11 = Counter64: 0
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.12 = Counter64: 0
Error in packet.
Connection terminated by remote host
After this message, no further attempts to request data from Nagios succeeded, even though I can still snmpbulkwalk from the command line successfully. Any attempt to query the interfaces from Nagios fails and brings down the daemon with this error in the logs:
kernel: pid 58616 (bsnmpd), uid 0: exited on signal 11 (core dumped)
-
Hi,
I've also had this problem, and I found that bsnmpd crashes when the "max-repetitions field in the GETBULK PDUs" (man snmpbulkwalk) value is greater than 100 on the "if" subtree.
Test this (on a Linux system):
snmpbulkwalk -Cr100 -v 2c -c public 192.168.154.1 if
(should work) against this:
snmpbulkwalk -Cr101 -v 2c -c public 192.168.154.1 if
(should crash).
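If anyone wants to script the comparison, here is a rough sketch, assuming net-snmp's snmpbulkwalk is installed on the client machine. The function names bulkwalk_cmd and find_crash_threshold are made up for illustration, and the host and community are whatever you pass in; nothing here comes from pfSense or bsnmpd itself.

```shell
#!/bin/sh
# Hypothetical helper: print the snmpbulkwalk command line for a given
# max-repetitions value. $1 = max-repetitions, $2 = host,
# $3 = community (assumed to default to "public").
bulkwalk_cmd() {
    printf 'snmpbulkwalk -Cr%s -v 2c -c %s %s if\n' "$1" "${3:-public}" "$2"
}

# Hypothetical helper: walk the "if" subtree once per max-repetitions value
# and stop at the first value for which the walk fails.
# $1 = host, remaining arguments = values to try, in order.
find_crash_threshold() {
    host=$1
    shift
    for reps in "$@"; do
        # Unquoted on purpose: the printed command is split into words and run.
        if $(bulkwalk_cmd "$reps" "$host") >/dev/null 2>&1; then
            echo "max-repetitions=$reps: ok"
        else
            echo "max-repetitions=$reps: walk failed"
            return 1
        fi
    done
}
```

Something like "find_crash_threshold 192.168.154.1 50 100 101" would then try the three values in order. A failed walk only means the request did not complete, so check the system log on the firewall to confirm bsnmpd actually died, and restart the service before testing again.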
Our provider's Nagios sent 340 in this field, and I see from the logs that Briantists even sent 1115 (M=1115). Can this be fixed for 1.2.3, or at least double-checked for 2.0?
Thanks!
Stefan
-
Looks like it's still a problem with bsnmpd on 2.0. Not sure there is much we can do about that, since the program comes from upstream. We have a couple of patches to it, but it's mostly stock.
snmpbulkwalk -Cr101 -v 2c -c public 192.168.1.1 if
…
Jan 17 19:49:02 pfsense snmpd[34209]: stack overflow detected; terminated
Jan 17 19:49:03 pfsense kernel: pid 34209 (bsnmpd), uid 0: exited on signal 6 (core dumped)
-
Can you please attach the core file here, zipped?
-
@ermal:
Can you please attach the core file here, zipped?
Where do I find the core file?
-
It's probably in / (the root directory)
Ermal has a core from me, and I believe he made it crash himself as well (From talking to him on IRC). He said he saw the bad code but hadn't had a chance to fix it yet.
-
Okay, you or he can let me know if you guys need anything else. Thanks!