net-snmpd stops responding after a certain amount of time



  • Hi,

    After a certain time (1-2 weeks), my snmpd daemon (0.1.5_2 net-snmp-5.7.3_18) is no longer reachable. In the GUI it appears to be running, but it no longer responds to requests.

    After restarting the daemon via the GUI, the service is still unavailable. On the CLI, the daemon appears to be fully loading one CPU core:

    [2.4.4-RELEASE][admin@X-FW01A]/usr/local: ps auwwx | grep snmp
    root   20340 100.0  0.2  31404  26640  -  R    09:18       1:06.54 /usr/local/sbin/snmpd -LF 0-4 d -p /var/run/net_snmpd.pid -M /usr/share/snmp/mibs/:/usr/local/share/snmp/mibs -C -c /var/etc/netsnmpd.conf,/var/etc/netsnmpd-users.conf
    

    After rebooting the whole pfSense (2.4.4-RELEASE-p2) instance, the daemon starts correctly, is reachable, and the CPU load looks fine:

    [2.4.4-RELEASE][admin@X-FW01A]/root: ps auwwx | grep snmp
    root   59265   0.0  0.1  21164  14184  -  S    09:32   0:00.03 /usr/local/sbin/snmpd -LF 0-4 d -p /var/run/net_snmpd.pid -M /usr/share/snmp/mibs/:/usr/local/share/snmp/mibs -C -c /var/etc/netsnmpd.conf,/var/etc/netsnmpd-users.conf
    root    3263   0.0  0.0   6564   2460  0  S+   09:32   0:00.00 grep snmp
    [2.4.4-RELEASE][admin@X-FW01A]/root:
    

    When I try to debug the cause, the program doesn't really seem to start. I suspect I am not invoking it correctly:

    [2.4.4-RELEASE][admin@X-FW01A]/usr/local: /usr/local/sbin/snmpd -f -D all -Le -M /usr/share/snmp/mibs/:/usr/local/share/snmp/mibs -C -c /var/etc/netsnmpd.conf
    trace: main(): snmpd.c, 862:
    snmpd/main: optind 3, argc 10
    trace: netsnmp_ds_set_string(): default_store.c, 294:
    netsnmp_ds_set_string: Setting APP:2 = "all"
    trace: netsnmp_ds_set_string(): default_store.c, 294:
    netsnmp_ds_set_string: Setting APP:2 = "all,-Le"
    trace: netsnmp_ds_set_string(): default_store.c, 294:
    netsnmp_ds_set_string: Setting APP:2 = "all,-Le,-M"
    trace: netsnmp_ds_set_string(): default_store.c, 294:
    netsnmp_ds_set_string: Setting APP:2 = "all,-Le,-M,/usr/share/snmp/mibs/:/usr/local/share/snmp/mibs"
    trace: netsnmp_ds_set_string(): default_store.c, 294:
    netsnmp_ds_set_string: Setting APP:2 = "all,-Le,-M,/usr/share/snmp/mibs/:/usr/local/share/snmp/mibs,-C"
    trace: netsnmp_ds_set_string(): default_store.c, 294:
    netsnmp_ds_set_string: Setting APP:2 = "all,-Le,-M,/usr/share/snmp/mibs/:/usr/local/share/snmp/mibs,-C,-c"
    trace: netsnmp_ds_set_string(): default_store.c, 294:
    netsnmp_ds_set_string: Setting APP:2 = "all,-Le,-M,/usr/share/snmp/mibs/:/usr/local/share/snmp/mibs,-C,-c,/var/etc/netsnmpd.conf"
    trace: main(): snmpd.c, 883:
    snmpd/main: port spec: all,-Le,-M,/usr/share/snmp/mibs/:/usr/local/share/snmp/mibs,-C,-c,/var/etc/netsnmpd.conf
    logging:register: registering log type 3 with pri 7
    Log handling defined - disabling stderr
    [2.4.4-RELEASE][admin@X-FW01A]/usr/local:
    

    The config files:

    [2.4.4-RELEASE][admin@X-FW01A]/root: cat /var/etc/netsnmpd.conf
    agentaddress udp:x.x.x.x:
    engineIDType 1
    [snmp] tsmUseTransportPrefix no
    sysLocation xxx
    sysContact hostmaster@xxx
    sysName x-fw01a
    sysDescr pfSense x-fw01a
    interface_fadeout 300
    interface_replace_old no
    ignoreDisk /dev
    ignoreDisk /var/dhcpd/dev
    includeAllDisks 20%
    rwuser -s usm "xxx" noauth
    rocommunity xxx
    iquerySecName "xxx"
    agentSecName "xxx"
    master agentx
    [2.4.4-RELEASE][admin@X-FW01A]/root: cat /var/etc/netsnmpd-users.conf
    createUser "xxx" SHA "xxx" AES "xxx"
    [2.4.4-RELEASE][admin@X-FW01A]/root:
    
    

    Does anyone have suggestions for further debugging?

    Cheers,
    Helge
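
A note on the debug run above: the trace itself hints at what went wrong with the invocation. `snmpd/main: optind 3, argc 10` says option parsing stopped at the third argument, and the `port spec` line shows that `all`, `-Le`, `-M` and everything after them were then taken as listening addresses instead of options. The sketch below reproduces only that scanning behavior with the shell's `getopts`. The function name `scan_like_snmpd` and the option string `"fD"` are made up for illustration, and treating `-D` as a flag with no separate argument is an assumption inferred from the trace, not taken from the snmpd source.

```shell
# Model of the option scan only; snmpd itself is never run here.
# Assumption: -D does not consume a separate following word, which is what
# "optind 3, argc 10" in the trace suggests.
scan_like_snmpd() {
    OPTIND=1
    # Consume leading options; getopts stops at the first non-option word.
    while getopts "fD" opt "$@" 2>/dev/null; do :; done
    stop=$OPTIND
    # Drop the words that were parsed as options, keep the rest.
    shift $((stop - 1))
    echo "optind=$stop leftover=$*"
}
```

Called as `scan_like_snmpd -f -D all -Le -M /usr/share/snmp/mibs`, it reports the scan stopping at index 3 with `all -Le -M ...` left over, which matches the trace. If that reading is right, writing the debug token attached to the flag (the snmpd man page gives the form as `-D[TOKEN[,TOKEN]...]`, for example `-DALL`) should let `-Le`, `-M`, `-C` and `-c` be parsed as intended.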


  • Rebel Alliance Developer Netgate

    In that first bit of output, snmpd is using 100% CPU. So something has it really stuck chewing through CPU time.

    You might be able to use truss to attach to the process and see what it's doing. For example, in that output the PID is 20340 so you'd run truss -fp 20340 and capture some of that output. Since it's stuck in a loop it will probably dump a lot of output very quickly. You can usually press ctrl-c to break out of that.
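
Since a stuck process can flood the terminal, one option is to write the trace to a file for a few seconds and then count which system calls dominate. A minimal sketch, assuming FreeBSD's `truss -o` and `timeout(1)` are available on the firewall and that each captured line looks like `PID: syscall(args) = result`. The helper name `summarize_truss` and the file path are made up for illustration.

```shell
# Count how often each syscall name appears in a saved truss capture.
# $1 is the path to the capture file.
summarize_truss() {
    # Strip an optional leading "PID: " prefix (present with truss -f),
    # take the text before the first "(" as the syscall name,
    # then print "count name" pairs, most frequent first.
    sed -E 's/^[[:space:]]*[0-9]+:[[:space:]]*//' "$1" |
        awk -F'[(]' 'NF > 1 { count[$1]++ }
                     END { for (name in count) print count[name], name }' |
        sort -rn
}

# Possible usage on the firewall (20340 is the PID from the ps output above):
#   timeout 10 truss -f -o /tmp/truss-snmpd.txt -p 20340
#   summarize_truss /tmp/truss-snmpd.txt
```

A capture dominated by one or two calls points at the loop. A nearly empty capture from a process at 100% CPU would instead suggest the time is being spent in userland between system calls.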



  • Nice. I will give it a try the next time it happens.



  • Hi @jimp, you were right: it's some kind of loop. This output is generated roughly once per second, with different memory addresses each time:
    truss-snmpd-loop.txt

    The behavior is the same directly after restarting snmpd. After a reboot, however, it looks like this:
    truss-snmpd-after-reboot.txt
    and it keeps repeating while no snmpget is active.

    Unfortunately, I can't make much of it :-(

    Cheers
    Helge


  • Rebel Alliance Developer Netgate

    Hmm, nothing really noteworthy in there. Is there a client polling it every second? That might explain why it's repeating that often.

    I'd expect things to be repeating a heck of a lot more than once per second if it's consuming 100% CPU.



  • @jimp Actually, there is absolutely no SNMP traffic while truss -fp shows this repeating output. I checked with tcpdump -i vmx4 port 161, where vmx4 is the management interface.

    Strange ☹



  • I went back to bsnmpd, although it has no IPv6 support.

