Telegraf memory stats broken on 2.4.3
-
Was running on 2.4.1 and had no issues with memory info from Telegraf. As soon as I upgraded to 2.4.3 all memory data stopped coming in. The telegraf.log is not very informative:
[2.4.3-RELEASE][admin@redacted]/root: cat /var/log/telegraf.log
daemon: process already running, pid: 43910
Running telegraf in test mode:
[2.4.3-RELEASE][admin@redacted]/root: /usr/local/bin/telegraf -config=/usr/local/etc/telegraf.conf --test
* Plugin: inputs.system, Collection 1
> system,host=redacted n_users=2i,n_cpus=4i,load1=0.22998046875,load5=0.13134765625,load15=0.09228515625 1522517091000000000
> system,host=redacted uptime_format="17621 days, 17:24",uptime=1522517090i 1522517091000000000
* Plugin: inputs.cpu, Collection 1
* Plugin: inputs.cpu, Collection 2
> cpu,host=redacted,cpu=cpu0 usage_softirq=0,usage_steal=0,usage_user=1.5151515151515151,usage_system=1.5151515151515151,usage_iowait=0,usage_irq=0,usage_guest=0,usage_guest_nice=0,usage_idle=96.96969696969697,usage_nice=0 1522517091000000000
> cpu,cpu=cpu1,host=redacted usage_nice=0,usage_irq=0,usage_softirq=0,usage_guest_nice=0,usage_steal=0,usage_guest=0,usage_user=0,usage_system=1.5151515151515151,usage_idle=98.48484848484848,usage_iowait=0 1522517091000000000
> cpu,cpu=cpu2,host=redacted usage_user=3.0303030303030303,usage_iowait=0,usage_guest_nice=0,usage_steal=0,usage_guest=0,usage_system=0,usage_idle=96.96969696969697,usage_nice=0,usage_irq=0,usage_softirq=0 1522517091000000000
> cpu,cpu=cpu3,host=redacted usage_iowait=0,usage_softirq=0,usage_steal=0,usage_user=0,usage_system=1.5151515151515151,usage_idle=98.48484848484848,usage_nice=0,usage_irq=0,usage_guest=0,usage_guest_nice=0 1522517091000000000
> cpu,cpu=cpu-total,host=redacted usage_softirq=0,usage_steal=0,usage_guest=0,usage_guest_nice=0,usage_system=1.1363636363636365,usage_idle=97.72727272727273,usage_nice=0,usage_iowait=0,usage_irq=0,usage_user=1.1363636363636365 1522517091000000000
* Plugin: inputs.net, Collection 1
> net,interface=hn0,host=redacted packets_recv=2756850i,err_in=0i,drop_out=0i,bytes_recv=1641622349i,packets_sent=3029794i,err_out=0i,drop_in=0i,bytes_sent=1900121830i 1522517091000000000
> net,interface=hn1,host=redacted err_in=0i,err_out=0i,drop_out=0i,bytes_sent=1668827008i,packets_sent=2714028i,packets_recv=3446748i,drop_in=0i,bytes_recv=2002283568i 1522517091000000000
> net,host=redacted,interface=hn2 bytes_sent=122285326i,packets_sent=270127i,drop_out=0i,bytes_recv=21829754i,packets_recv=105449i,err_in=0i,err_out=0i,drop_in=0i 1522517091000000000
> net,interface=bridge0,host=redacted bytes_recv=1693805054i,drop_in=0i,drop_out=0i,err_out=0i,bytes_sent=3959773046i,packets_sent=3299921i,packets_recv=2862299i,err_in=0i 1522517091000000000
> net,interface=ovpns3,host=redacted bytes_recv=0i,packets_sent=5i,packets_recv=0i,err_out=0i,drop_in=0i,drop_out=0i,bytes_sent=436i,err_in=0i 1522517091000000000
> net,interface=ovpnc1,host=redacted packets_recv=667371i,err_in=0i,packets_sent=416280i,err_out=0i,drop_in=0i,drop_out=29i,bytes_sent=47497767i,bytes_recv=716109917i 1522517091000000000
* Plugin: inputs.swap, Collection 1
> swap,host=redacted used_percent=0,total=2097152i,used=0i,free=2097152i 1522517091000000000
> swap,host=redacted in=0i,out=0i 1522517091000000000
* Plugin: inputs.mem, Collection 1
2018-03-31T17:24:51Z E! error getting virtual memory info: cannot allocate memory
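To narrow a failure like this down to a single plugin, test mode can be combined with Telegraf's `--input-filter` flag so only the suspect input runs. The paths below are the ones used elsewhere in this thread; adjust to your install:

```shell
# Run a single collection from the mem input only and print it to stdout.
# --test performs one collection without writing to any outputs;
# --input-filter restricts collection to the named plugin(s).
/usr/local/bin/telegraf -config=/usr/local/etc/telegraf.conf \
  --input-filter mem --test
```

With the filter in place, the `error getting virtual memory info` line shows up immediately instead of being buried under the system, cpu, net, and swap output.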
I attached the output from truss /usr/local/bin/telegraf -config=/usr/local/etc/telegraf.conf --test. It looks like the sysctl call to read vfs.bufspace is failing.
truss_telegraf.txt
-
I'm seeing the same error after upgrading to 2.4.3.
FWIW, I modified the [agent] section of telegraf.inc to:
[agent]
  interval = "{$telegraf_conf['interval']}s"
  round_interval = true
  debug = true
  quiet = false
  logfile = "/var/log/telegraf/telegraf.log"
A snippet of that log:
2018-04-02T00:08:50Z E! Error in plugin [inputs.mem]: error getting virtual memory info: cannot allocate memory
2018-04-02T00:08:50Z D! Output [influxdb] buffer fullness: 27 / 10000 metrics.
2018-04-02T00:08:50Z D! Output [influxdb] wrote batch of 27 metrics in 5.119488ms
-
This is an upstream bug, and it has been logged in Redmine for the pfSense Telegraf package maintainer.
I worked with an InfluxDB developer and we were able to solve the issue; I have it working on three pfSense instances. The package maintainer, I surmise, still needs to determine the course of action:
https://redmine.pfsense.org/issues/8425#change-36362
-
Just in case anyone happens across this thread, this issue is resolved for me in 2.4.4.
-
That's great information. Thank you.