Telegraf memory stats broken on 2.4.3



  • I was running 2.4.1 and had no issues with memory info from Telegraf. As soon as I upgraded to 2.4.3, all memory data stopped coming in. The telegraf.log is not very informative:

    [2.4.3-RELEASE][admin@redacted]/root: cat /var/log/telegraf.log
    daemon: process already running, pid: 43910
    
    

    Running Telegraf in test mode:

    [2.4.3-RELEASE][admin@redacted]/root: /usr/local/bin/telegraf -config=/usr/local/etc/telegraf.conf --test
    * Plugin: inputs.system, Collection 1
    > system,host=redacted n_users=2i,n_cpus=4i,load1=0.22998046875,load5=0.13134765625,load15=0.09228515625 1522517091000000000
    > system,host=redacted uptime_format="17621 days, 17:24",uptime=1522517090i 1522517091000000000
    * Plugin: inputs.cpu, Collection 1
    * Plugin: inputs.cpu, Collection 2
    > cpu,host=redacted,cpu=cpu0 usage_softirq=0,usage_steal=0,usage_user=1.5151515151515151,usage_system=1.5151515151515151,usage_iowait=0,usage_irq=0,usage_guest=0,usage_guest_nice=0,usage_idle=96.96969696969697,usage_nice=0 1522517091000000000
    > cpu,cpu=cpu1,host=redacted usage_nice=0,usage_irq=0,usage_softirq=0,usage_guest_nice=0,usage_steal=0,usage_guest=0,usage_user=0,usage_system=1.5151515151515151,usage_idle=98.48484848484848,usage_iowait=0 1522517091000000000
    > cpu,cpu=cpu2,host=redacted usage_user=3.0303030303030303,usage_iowait=0,usage_guest_nice=0,usage_steal=0,usage_guest=0,usage_system=0,usage_idle=96.96969696969697,usage_nice=0,usage_irq=0,usage_softirq=0 1522517091000000000
    > cpu,cpu=cpu3,host=redacted usage_iowait=0,usage_softirq=0,usage_steal=0,usage_user=0,usage_system=1.5151515151515151,usage_idle=98.48484848484848,usage_nice=0,usage_irq=0,usage_guest=0,usage_guest_nice=0 1522517091000000000
    > cpu,cpu=cpu-total,host=redacted usage_softirq=0,usage_steal=0,usage_guest=0,usage_guest_nice=0,usage_system=1.1363636363636365,usage_idle=97.72727272727273,usage_nice=0,usage_iowait=0,usage_irq=0,usage_user=1.1363636363636365 1522517091000000000
    * Plugin: inputs.net, Collection 1
    > net,interface=hn0,host=redacted packets_recv=2756850i,err_in=0i,drop_out=0i,bytes_recv=1641622349i,packets_sent=3029794i,err_out=0i,drop_in=0i,bytes_sent=1900121830i 1522517091000000000
    > net,interface=hn1,host=redacted err_in=0i,err_out=0i,drop_out=0i,bytes_sent=1668827008i,packets_sent=2714028i,packets_recv=3446748i,drop_in=0i,bytes_recv=2002283568i 1522517091000000000
    > net,host=redacted,interface=hn2 bytes_sent=122285326i,packets_sent=270127i,drop_out=0i,bytes_recv=21829754i,packets_recv=105449i,err_in=0i,err_out=0i,drop_in=0i 1522517091000000000
    > net,interface=bridge0,host=redacted bytes_recv=1693805054i,drop_in=0i,drop_out=0i,err_out=0i,bytes_sent=3959773046i,packets_sent=3299921i,packets_recv=2862299i,err_in=0i 1522517091000000000
    > net,interface=ovpns3,host=redacted bytes_recv=0i,packets_sent=5i,packets_recv=0i,err_out=0i,drop_in=0i,drop_out=0i,bytes_sent=436i,err_in=0i 1522517091000000000
    > net,interface=ovpnc1,host=redacted packets_recv=667371i,err_in=0i,packets_sent=416280i,err_out=0i,drop_in=0i,drop_out=29i,bytes_sent=47497767i,bytes_recv=716109917i 1522517091000000000
    * Plugin: inputs.swap, Collection 1
    > swap,host=redacted used_percent=0,total=2097152i,used=0i,free=2097152i 1522517091000000000
    > swap,host=redacted in=0i,out=0i 1522517091000000000
    * Plugin: inputs.mem, Collection 1
    2018-03-31T17:24:51Z E! error getting virtual memory info: cannot allocate memory
    
    

    I attached the output from truss /usr/local/bin/telegraf -config=/usr/local/etc/telegraf.conf --test. It looks like the sysctl call to read vfs.bufspace is failing.
    truss_telegraf.txt



  • I'm seeing the same error after upgrading to 2.4.3.

    FWIW, I modified the [agent] section of telegraf.inc to enable debug logging to a file:

    [agent]
            interval = "{$telegraf_conf['interval']}s"
            round_interval = true
            debug = true
            quiet = false
            logfile = "/var/log/telegraf/telegraf.log"
    

    A snippet of that log:

    2018-04-02T00:08:50Z E! Error in plugin [inputs.mem]: error getting virtual memory info: cannot allocate memory
    2018-04-02T00:08:50Z D! Output [influxdb] buffer fullness: 27 / 10000 metrics.
    2018-04-02T00:08:50Z D! Output [influxdb] wrote batch of 27 metrics in 5.119488ms
    
    


  • This is an upstream bug, and it has been logged in Redmine for the pfSense Telegraf package maintainer.

    I worked with an InfluxDB developer and we were able to solve the issue; I have it working on three pfSense instances. I surmise, though, that the package maintainer needs to decide on the course of action:

    https://redmine.pfsense.org/issues/8425#change-36362



  • Just in case anyone happens across this thread, this issue is resolved for me in 2.4.4.
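A note on the truss finding in the first report: FreeBSD's sysctl(3) documents ENOMEM ("Cannot allocate memory") as the error returned when the length passed in oldlenp is too short to hold the requested value. One plausible reading of the Telegraf error is therefore a size mismatch between the buffer the mem plugin supplies for vfs.bufspace and the value the 2.4.3 kernel returns. This thread does not confirm that, and the fix referenced in Redmine may differ. The Python sketch below shows the general pattern of asking the kernel for a value's size first and then decoding either a 4-byte or an 8-byte result. The helper names read_sysctl_uint and decode_sysctl_uint are hypothetical, and the ctypes call to sysctlbyname(3) only works on FreeBSD-based systems such as pfSense.

```python
import ctypes
import ctypes.util
import os
import sys


def decode_sysctl_uint(raw: bytes) -> int:
    """Decode an unsigned integer sysctl value that is 4 or 8 bytes wide.

    Hypothetical helper: the kernel returns the value in host byte order,
    so sys.byteorder is correct when this runs on the same machine.
    """
    if len(raw) not in (4, 8):
        raise ValueError(f"unexpected sysctl value width: {len(raw)} bytes")
    return int.from_bytes(raw, sys.byteorder)


def read_sysctl_uint(name: str) -> int:
    """Read an integer sysctl with sysctlbyname(3), sizing the buffer first.

    Hypothetical helper, FreeBSD only: glibc on Linux has no sysctlbyname,
    so looking up the symbol raises AttributeError there.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    sysctlbyname = libc.sysctlbyname
    sysctlbyname.argtypes = [
        ctypes.c_char_p,                  # name
        ctypes.c_void_p,                  # oldp
        ctypes.POINTER(ctypes.c_size_t),  # oldlenp
        ctypes.c_void_p,                  # newp
        ctypes.c_size_t,                  # newlen
    ]
    sysctlbyname.restype = ctypes.c_int

    size = ctypes.c_size_t(0)
    # With oldp == NULL the kernel only reports the required length in size.
    if sysctlbyname(name.encode(), None, ctypes.byref(size), None, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), name)

    buf = ctypes.create_string_buffer(size.value)
    # Second call: the buffer now matches the width this kernel uses, so a
    # change in the value's width between kernel versions does not cause ENOMEM.
    if sysctlbyname(name.encode(), buf, ctypes.byref(size), None, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), name)
    return decode_sysctl_uint(buf.raw[: size.value])


if __name__ == "__main__" and sys.platform.startswith("freebsd"):
    print("vfs.bufspace =", read_sysctl_uint("vfs.bufspace"))
```

Comparing the byte count reported by the first call with the width a caller assumes is one way to test the size-mismatch theory on an affected box.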