Telegraf service not starting after a settings change
-
I installed the Telegraf package and noticed that every time I change a setting (Services / Telegraf), I have to start Telegraf manually (Status / Services / Telegraf / Start).
Starting with Telegraf running, changing a setting and saving leaves Telegraf stopped. Starting Telegraf manually gets it running again.
Why doesn't Telegraf restart automatically after a settings change? As far as I have noticed, there are no errors in the logs.
Note: I have not used Telegraf in earlier releases; I only started using it in the current beta (25.03.b.20250515.1415).
-
Further manual testing shows that the Telegraf service is not always restarted: it always stops properly, but sometimes it is not started again.
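To reproduce the intermittent behaviour without going through the GUI each time, a loop like the one below can run the restart several times and check after each attempt whether the PID recorded in the pid file is alive. This is a rough sketch: `check_restarts` and `SETTLE_SECONDS` are made-up names for this example, the 3-second default delay is a guess, and it assumes the pid file is removed once the service has stopped (the `cat: /var/run/telegraf.pid: No such file or directory` message in the logs suggests it is).

```shell
#!/bin/sh
# Hypothetical helper: repeat a restart command and report, after each run,
# whether the PID stored in PIDFILE belongs to a live process.
# Usage: check_restarts PIDFILE COUNT RESTART_CMD [ARGS...]
#   e.g. check_restarts /var/run/telegraf.pid 5 service telegraf.sh restart
# Returns 0 only if the service was running after every restart.
check_restarts() {
    _pidfile=$1
    _count=$2
    shift 2
    _i=1
    _failed=0
    while [ "$_i" -le "$_count" ]; do
        "$@" >/dev/null 2>&1
        # Give the new instance time to start and write its pid file.
        sleep "${SETTLE_SECONDS:-3}"
        # kill -0 sends no signal; it only tests whether the PID exists.
        if [ -f "$_pidfile" ] && kill -0 "$(cat "$_pidfile" 2>/dev/null)" 2>/dev/null; then
            echo "restart $_i: running"
        else
            echo "restart $_i: NOT running"
            _failed=$((_failed + 1))
        fi
        _i=$((_i + 1))
    done
    [ "$_failed" -eq 0 ]
}
```

With the behaviour described here, one would expect the output to alternate between "running" and "NOT running" lines.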
Start:
[25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh start
[25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:02:28Z I! Loading config: /usr/local/etc/telegraf.conf
2025-06-03T14:02:28Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
2025-06-03T14:02:28Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
2025-06-03T14:02:28Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
2025-06-03T14:02:28Z I! Loaded aggregators:
2025-06-03T14:02:28Z I! Loaded processors:
2025-06-03T14:02:28Z I! Loaded secretstores:
2025-06-03T14:02:28Z I! Loaded outputs: influxdb
2025-06-03T14:02:28Z I! Tags enabled: host=pfsense.local.lan
2025-06-03T14:02:28Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
2025-06-03T14:02:28Z W! [inputs.kernel] Current platform is not supported
2025-06-03T14:02:30Z E! [inputs.swap] Error in plugin: error getting swap memory info: no swap devices found
Restart, where the service does not come back up:
[25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
2025-06-03T14:02:38Z I! [agent] Hang on, flushing any cached metrics before shutdown
[25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:02:38Z I! [agent] Stopping running outputs
Next restart, where the service does start:
[25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
cat: /var/run/telegraf.pid: No such file or directory
usage: kill [-s signal_name] pid ...
       kill -l [exit_status]
       kill -signal_name pid ...
       kill -signal_number pid ...
[25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:02:47Z I! Loading config: /usr/local/etc/telegraf.conf
2025-06-03T14:02:47Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
2025-06-03T14:02:47Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
2025-06-03T14:02:47Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
2025-06-03T14:02:47Z I! Loaded aggregators:
2025-06-03T14:02:47Z I! Loaded processors:
2025-06-03T14:02:47Z I! Loaded secretstores:
2025-06-03T14:02:47Z I! Loaded outputs: influxdb
2025-06-03T14:02:47Z I! Tags enabled: host=pfsense.local.lan
2025-06-03T14:02:47Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
2025-06-03T14:02:47Z W! [inputs.kernel] Current platform is not supported
Next restart, where the service fails to start again:
[25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
2025-06-03T14:02:52Z I! [agent] Hang on, flushing any cached metrics before shutdown
[25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:02:52Z I! [agent] Stopping running outputs
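The problem seems to be related to the new service handling in pfSense Plus: the generated start/stop/restart code does not wait for the process to exit after killing it. Telegraf takes a moment to flush its cached metrics on shutdown, so the new instance is launched while the old one is still running, and it never comes up. That would also explain why the following restart works: by then the old instance has exited and its pid file is gone (hence the cat and kill errors), so the start succeeds. Compare this with the old way of controlling services (/usr/local/etc/rc.d/telegraf, for example), which had a much more controlled restart flow: kill, wait, start.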
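A minimal sketch of that kill, wait, start ordering in plain POSIX sh, independent of the pfSense service handler. The function names `wait_for_exit` and `restart_pid` are hypothetical. `kill -0` sends no signal and only reports whether a PID still exists, so the loop polls once per second until the old process is gone or the timeout expires, and the start command runs only after that.

```shell
#!/bin/sh
# Hypothetical helpers illustrating "kill + wait + start"; POSIX sh only.

# wait_for_exit PID TIMEOUT_SECONDS
# Returns 0 once PID no longer exists, 1 if it is still alive at the timeout.
wait_for_exit() {
    _pid=$1
    _left=$2
    while kill -0 "$_pid" 2>/dev/null; do
        if [ "$_left" -le 0 ]; then
            return 1
        fi
        sleep 1
        _left=$((_left - 1))
    done
    return 0
}

# restart_pid PID TIMEOUT_SECONDS START_CMD [ARGS...]
# Ask the old process to stop, wait for it to exit, then start the new one.
restart_pid() {
    _old=$1
    _timeout=$2
    shift 2
    # SIGTERM the old instance; a failure here just means it is already gone.
    kill "$_old" 2>/dev/null
    if ! wait_for_exit "$_old" "$_timeout"; then
        echo "PID $_old still running after ${_timeout}s; not starting" >&2
        return 1
    fi
    # The old instance has exited, so it is now safe to start a new one.
    "$@"
}
```

On FreeBSD, `pwait -t` performs the waiting step in a single command, which is what the fix below relies on; the polling loop is only a portable stand-in.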
I added a pwait to rc_stop(), and now the restart always succeeds:
[25.03-BETA][admin@pfsense.local.lan]/root: vim /usr/local/etc/rc.d/telegraf.sh
#!/bin/sh
# This file was automatically generated
# by the Netgate pfSense Plus service handler.

rc_start() {
    /usr/sbin/daemon -crP /var/run/telegraf.pid /usr/local/bin/telegraf -config=/usr/local/etc/telegraf.conf 2> /var/log/telegraf.log
}

rc_stop() {
    pid=`/bin/cat /var/run/telegraf.pid`
    /bin/kill `/bin/cat /var/run/telegraf.pid`
    pwait ${pid}
}

rc_restart() {
    rc_stop
    rc_start

}

case $1 in
    start)
        rc_start
        ;;
    stop)
        rc_stop
        ;;
    restart)
        rc_restart
        ;;
esac

"/usr/local/etc/rc.d/telegraf.sh" 32L, 485B written
[25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
cat: /var/run/telegraf.pid: No such file or directory
cat: /var/run/telegraf.pid: No such file or directory
usage: kill [-s signal_name] pid ...
       kill -l [exit_status]
       kill -signal_name pid ...
       kill -signal_number pid ...
usage: pwait [-t timeout] [-ov] pid ...
[25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:12:45Z I! Loading config: /usr/local/etc/telegraf.conf
2025-06-03T14:12:45Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
2025-06-03T14:12:45Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
2025-06-03T14:12:45Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
2025-06-03T14:12:45Z I! Loaded aggregators:
2025-06-03T14:12:45Z I! Loaded processors:
2025-06-03T14:12:45Z I! Loaded secretstores:
2025-06-03T14:12:45Z I! Loaded outputs: influxdb
2025-06-03T14:12:45Z I! Tags enabled: host=pfsense.local.lan
2025-06-03T14:12:45Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
2025-06-03T14:12:45Z W! [inputs.kernel] Current platform is not supported
[25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
2025-06-03T14:12:48Z I! [agent] Hang on, flushing any cached metrics before shutdown
2025-06-03T14:12:48Z I! [agent] Stopping running outputs
[25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:12:49Z I! Loading config: /usr/local/etc/telegraf.conf
2025-06-03T14:12:49Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
2025-06-03T14:12:49Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
2025-06-03T14:12:49Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
2025-06-03T14:12:49Z I! Loaded aggregators:
2025-06-03T14:12:49Z I! Loaded processors:
2025-06-03T14:12:49Z I! Loaded secretstores:
2025-06-03T14:12:49Z I! Loaded outputs: influxdb
2025-06-03T14:12:49Z I! Tags enabled: host=pfsense.local.lan
2025-06-03T14:12:49Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
2025-06-03T14:12:49Z W! [inputs.kernel] Current platform is not supported
2025-06-03T14:12:50Z E! [inputs.swap] Error in plugin: error getting swap memory info: no swap devices found
[25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
2025-06-03T14:12:52Z I! [agent] Hang on, flushing any cached metrics before shutdown
2025-06-03T14:12:52Z I! [agent] Stopping running outputs
[25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:12:53Z I! Loading config: /usr/local/etc/telegraf.conf
2025-06-03T14:12:53Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
2025-06-03T14:12:53Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
2025-06-03T14:12:53Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
2025-06-03T14:12:53Z I! Loaded aggregators:
2025-06-03T14:12:53Z I! Loaded processors:
2025-06-03T14:12:53Z I! Loaded secretstores:
2025-06-03T14:12:53Z I! Loaded outputs: influxdb
2025-06-03T14:12:53Z I! Tags enabled: host=pfsense.local.lan
2025-06-03T14:12:53Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
2025-06-03T14:12:53Z W! [inputs.kernel] Current platform is not supported
[25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
2025-06-03T14:12:56Z I! [agent] Hang on, flushing any cached metrics before shutdown
2025-06-03T14:12:56Z I! [agent] Stopping running outputs
[25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:12:57Z I! Loading config: /usr/local/etc/telegraf.conf
2025-06-03T14:12:57Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
2025-06-03T14:12:57Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
2025-06-03T14:12:57Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
2025-06-03T14:12:57Z I! Loaded aggregators:
2025-06-03T14:12:57Z I! Loaded processors:
2025-06-03T14:12:57Z I! Loaded secretstores:
2025-06-03T14:12:57Z I! Loaded outputs: influxdb
2025-06-03T14:12:57Z I! Tags enabled: host=pfsense.local.lan
2025-06-03T14:12:57Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
2025-06-03T14:12:57Z W! [inputs.kernel] Current platform is not supported
[25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:13:00Z E! [inputs.swap] Error in plugin: error getting swap memory info: no swap devices found
[25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh stop
2025-06-03T14:13:06Z I! [agent] Hang on, flushing any cached metrics before shutdown
2025-06-03T14:13:06Z I! [agent] Stopping running outputs
[25.03-BETA][admin@pfsense.local.lan]/root:
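The edited rc_stop() fixes the restart, but the first restart in that session also shows that when /var/run/telegraf.pid does not exist, cat, kill and pwait each print an error. A slightly more defensive rc_stop() could read the pid file once and return early when there is nothing to stop. This is an untested sketch, not the generated code: it relies on FreeBSD pwait(1) and its `-t` option (listed in the usage message above), and the `PIDFILE` variable and the 30-second timeout are assumptions made for this example.

```shell
#!/bin/sh
# Sketch of a more defensive rc_stop() for telegraf.sh (not the generated code).
# Assumes FreeBSD pwait(1); PIDFILE and the 30 s timeout are illustrative choices.
PIDFILE=/var/run/telegraf.pid

rc_stop() {
    # No pid file: daemon(8) removes it on exit, so there is nothing to stop.
    if [ ! -f "$PIDFILE" ]; then
        return 0
    fi
    # Read the PID once so that kill and pwait act on the same process.
    pid=$(/bin/cat "$PIDFILE")
    # SIGTERM the supervising daemon; if the PID is already gone, we are done.
    /bin/kill "$pid" 2>/dev/null || return 0
    # Block until it has exited (releasing the pid file), at most 30 seconds.
    pwait -t 30 "$pid"
}
```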
I'll raise a Redmine issue with the following proposed patch, which fixes the problem:
--- /usr/local/pkg/telegraf.inc 2025-05-15 17:55:23.000000000 +0200
+++ /usr/local/pkg/telegraf.inc.new 2025-06-03 16:28:00.675486000 +0200
@@ -200,7 +200,7 @@
     write_rcfile(array(
         "file" => "telegraf.sh",
         "start" => "/usr/sbin/daemon -crP {$pidfile} /usr/local/bin/telegraf -config={$conffile} 2> {$logfile}",
-        "stop" => "/bin/kill `/bin/cat {$pidfile}`"
+        "stop" => "pid=`/bin/cat {$pidfile}`; /bin/kill \$pid; pwait \$pid"
         )
     );
-
https://redmine.pfsense.org/issues/16225
-
I've struggled with this same issue, and this solution sounds promising.