SNMP Service Stopping

  • We have a few hundred PFSense instances deployed. Most run on Dell workstations, but we also have some ALiX devices out there. We monitor WAN and LAN status via Nagios over SNMP. Over the past 2 weeks we've seen a huge uptick (as many as 10 per day) in the SNMP service stopping on random PFSense endpoints. We've had this issue on devices before, but it has typically been few and far between, maybe once or twice a month in total.

    Logging in and starting the SNMP service fixes the issue, but we're concerned about why this is happening so pervasively. We have everything from 1.2.3 to 2.3.4 deployed, and we're seeing this across all types of devices and installed PFSense versions.

    There is no commonality (ISP, physical location, etc.). These are literally spread across the world. The only common factor is where they're being monitored from.

    Here is a system log entry from a device that had this issue this morning. There are no entries for over a day before it happened, and the only entries after it are from me logging in and restarting the service. This device is a Dell OptiPlex 750:

    Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
    2 CPUs: 1 package(s) x 2 core(s)

    CPU utilization is ~1%

    Version is 2.3.2

    Oct 21 06:55:25 kernel pid 29732 (bsnmpd), uid 0: exited on signal 11 (core dumped)
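    Signal 11 is SIGSEGV, i.e. bsnmpd crashed with a segmentation fault rather than being stopped cleanly. When the same crash is suspected across hundreds of boxes, it can help to scan collected system logs for this exact kernel message. The sketch below assumes the logs have already been pulled into a list of strings and that the message format matches the line above; `find_crashes` and `CRASH_RE` are hypothetical names, not part of pfSense.

```python
import re

# Assumed format, taken from the kernel message above:
#   "... kernel pid 29732 (bsnmpd), uid 0: exited on signal 11 (core dumped)"
CRASH_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<proc>[^)]+)\), uid (?P<uid>\d+): "
    r"exited on signal (?P<signal>\d+)(?P<core> \(core dumped\))?"
)


def find_crashes(lines, process="bsnmpd"):
    """Return (pid, signal, core_dumped) tuples for each crash of `process`."""
    crashes = []
    for line in lines:
        match = CRASH_RE.search(line)
        # Skip unrelated log lines and crashes of other daemons.
        if match and match.group("proc") == process:
            crashes.append((
                int(match.group("pid")),
                int(match.group("signal")),
                match.group("core") is not None,
            ))
    return crashes


if __name__ == "__main__":
    log = [
        "Oct 21 06:55:25 kernel pid 29732 (bsnmpd), uid 0: exited on signal 11 (core dumped)",
        "Oct 21 07:10:02 sshd[123]: Accepted keyboard-interactive/pam for admin",
    ]
    print(find_crashes(log))  # [(29732, 11, True)]
```

    Counting these per device and per day would show whether the crashes line up with the polling schedule of the shared Nagios server.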

    Any suggestions are greatly appreciated.

  • Rebel Alliance Developer Netgate

    bsnmpd is not known for its reliability, rather for its small footprint.

    Sometimes it will crash on a specific query. It used to be that a certain size of snmpwalk would trigger it, but that depended on the hardware. Most likely there is some specific query performed by your NMS that triggers it.

    You have two possible choices:
    1. Set up the Service Watchdog to automatically restart bsnmpd.
    2. Upgrade to 2.4 and switch to the NET-SNMP package, which is much more robust and configurable but also uses a bit more RAM/CPU.
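    Option 1 boils down to a polling loop: check whether the daemon is alive and restart it if it is not. The sketch below only illustrates that idea and is not how the Service Watchdog package is implemented. `watchdog`, `is_running`, and `restart` are hypothetical; on a real box the two callables would wrap whatever process check and service restart mechanism you trust.

```python
import time
from typing import Callable


def watchdog(
    is_running: Callable[[], bool],
    restart: Callable[[], None],
    checks: int,
    interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `is_running` `checks` times and call `restart` whenever the
    service is found down. Returns how many restarts were issued."""
    restarts = 0
    for i in range(checks):
        if not is_running():
            restart()
            restarts += 1
        # Wait between checks, but not after the last one.
        if i < checks - 1:
            sleep(interval)
    return restarts


if __name__ == "__main__":
    # Simulated service: down on the 2nd and 4th checks.
    states = iter([True, False, True, False])
    issued = watchdog(
        is_running=lambda: next(states),
        restart=lambda: print("restarting bsnmpd (simulated)"),
        checks=4,
        sleep=lambda seconds: None,  # skip real waiting in the demo
    )
    print(issued)  # 2
```

    A watchdog only hides the crash; if the crashes follow a particular NMS query, restarts will keep happening on every poll cycle that issues it.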

  • What kind of devices are the PFSense instances deployed on?

