Netgate Discussion Forum

    Telegraf service not starting after change of setting

    Plus 25.03 Development Snapshots
    4 Posts 2 Posters 276 Views
      pst

      I installed the Telegraf package and noticed that every time I change a setting (Services / Telegraf), I have to start Telegraf manually (Status / Services / Telegraf / Start).

      Start with Telegraf running:

      [Screenshot: Telegraf running]

      Changing a setting and saving leaves Telegraf stopped:

      [Screenshot: Telegraf stopped]

      Starting Telegraf manually gets it running again:

      [Screenshot: Telegraf running again]

      Why doesn't Telegraf automatically restart after changing settings? There are no error indications in the logs as far as I have noticed.

      Note: I had not used Telegraf in earlier releases; I only started using it in the current beta (25.03.b.20250515.1415).

        pst @pst

        Further manual testing shows that the Telegraf service is not always restarted. It always stops properly but sometimes it is not started again.

        Start

        [25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh start
        [25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:02:28Z I! Loading config: /usr/local/etc/telegraf.conf
        2025-06-03T14:02:28Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
        2025-06-03T14:02:28Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
        2025-06-03T14:02:28Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
        2025-06-03T14:02:28Z I! Loaded aggregators:
        2025-06-03T14:02:28Z I! Loaded processors:
        2025-06-03T14:02:28Z I! Loaded secretstores:
        2025-06-03T14:02:28Z I! Loaded outputs: influxdb
        2025-06-03T14:02:28Z I! Tags enabled: host=pfsense.local.lan
        2025-06-03T14:02:28Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
        2025-06-03T14:02:28Z W! [inputs.kernel] Current platform is not supported
        2025-06-03T14:02:30Z E! [inputs.swap] Error in plugin: error getting swap memory info: no swap devices found
        

        Restart, where the service does not come back up

        [25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
        2025-06-03T14:02:38Z I! [agent] Hang on, flushing any cached metrics before shutdown
        [25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:02:38Z I! [agent] Stopping running outputs
        

        Next restart, where the service does start

        [25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
        cat: /var/run/telegraf.pid: No such file or directory
        usage: kill [-s signal_name] pid ...
               kill -l [exit_status]
               kill -signal_name pid ...
               kill -signal_number pid ...
        [25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:02:47Z I! Loading config: /usr/local/etc/telegraf.conf
        2025-06-03T14:02:47Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
        2025-06-03T14:02:47Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
        2025-06-03T14:02:47Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
        2025-06-03T14:02:47Z I! Loaded aggregators:
        2025-06-03T14:02:47Z I! Loaded processors:
        2025-06-03T14:02:47Z I! Loaded secretstores:
        2025-06-03T14:02:47Z I! Loaded outputs: influxdb
        2025-06-03T14:02:47Z I! Tags enabled: host=pfsense.local.lan
        2025-06-03T14:02:47Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
        2025-06-03T14:02:47Z W! [inputs.kernel] Current platform is not supported
        

        Next restart, where the service again fails to start

        [25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
        
        2025-06-03T14:02:52Z I! [agent] Hang on, flushing any cached metrics before shutdown
        [25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:02:52Z I! [agent] Stopping running outputs
        
        

        The problem seems to be related to the new service handling in pfSense Plus: the generated start/stop/restart code does not wait for the old process to exit after killing it. Compare this with the old way of controlling services (for example /usr/local/etc/rc.d/telegraf), which had a much more controlled restart flow: kill + wait + start.
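        For comparison, the kill + wait part of that flow can be sketched in portable POSIX sh. This is a hypothetical helper, not the pfSense or rc.d code: the name stop_and_wait and its 10-second default timeout are assumptions, and it polls kill -0 (which only checks whether the pid still exists) instead of calling FreeBSD's pwait(1), so it also runs on systems that lack pwait.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): send SIGTERM to a pid, then
# block until that process has actually exited, i.e. kill + wait.
# Note: kill -0 also succeeds on a zombie, so the caller should not be the
# target's parent (an rc script is not the parent of a daemon(8) process).
stop_and_wait() {
    pid=$1
    timeout=${2:-10}    # seconds; 10 is an arbitrary default

    # Nothing to do if the process is already gone
    kill -0 "$pid" 2>/dev/null || return 0

    kill "$pid" 2>/dev/null

    waited=0
    # kill -0 sends no signal; it only tests whether the pid still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "stop_and_wait: pid $pid still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

        A restart built on it would run stop_and_wait "$(cat /var/run/telegraf.pid)" && rc_start, so the new instance is only launched after the old one has exited, or not at all if the wait times out.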

        I added a pwait in the rc_stop() and now the restart is always successful

        [25.03-BETA][admin@pfsense.local.lan]/root: vim /usr/local/etc/rc.d/telegraf.sh
          #!/bin/sh
          # This file was automatically generated
          # by the Netgate pfSense Plus service handler.

          rc_start() {
                  /usr/sbin/daemon -crP /var/run/telegraf.pid /usr/local/bin/telegraf -config=/usr/local/etc/telegraf.conf 2> /var/log/telegraf.log
          }

          rc_stop() {
                  pid=`/bin/cat /var/run/telegraf.pid`
                  /bin/kill `/bin/cat /var/run/telegraf.pid`
                  pwait ${pid}
          }

          rc_restart() {
                  rc_stop
                  rc_start

          }

          case $1 in
                  start)
                          rc_start
                          ;;
                  stop)
                          rc_stop
                          ;;
                  restart)
                          rc_restart
                          ;;
          esac

        "/usr/local/etc/rc.d/telegraf.sh" 32L, 485B written
        [25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
        cat: /var/run/telegraf.pid: No such file or directory
        cat: /var/run/telegraf.pid: No such file or directory
        usage: kill [-s signal_name] pid ...
               kill -l [exit_status]
               kill -signal_name pid ...
               kill -signal_number pid ...
        usage: pwait [-t timeout] [-ov] pid ...
        [25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:12:45Z I! Loading config: /usr/local/etc/telegraf.conf
        2025-06-03T14:12:45Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
        2025-06-03T14:12:45Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
        2025-06-03T14:12:45Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
        2025-06-03T14:12:45Z I! Loaded aggregators:
        2025-06-03T14:12:45Z I! Loaded processors:
        2025-06-03T14:12:45Z I! Loaded secretstores:
        2025-06-03T14:12:45Z I! Loaded outputs: influxdb
        2025-06-03T14:12:45Z I! Tags enabled: host=pfsense.local.lan
        2025-06-03T14:12:45Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
        2025-06-03T14:12:45Z W! [inputs.kernel] Current platform is not supported
        
        [25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
        2025-06-03T14:12:48Z I! [agent] Hang on, flushing any cached metrics before shutdown
        2025-06-03T14:12:48Z I! [agent] Stopping running outputs
        [25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:12:49Z I! Loading config: /usr/local/etc/telegraf.conf
        2025-06-03T14:12:49Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
        2025-06-03T14:12:49Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
        2025-06-03T14:12:49Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
        2025-06-03T14:12:49Z I! Loaded aggregators:
        2025-06-03T14:12:49Z I! Loaded processors:
        2025-06-03T14:12:49Z I! Loaded secretstores:
        2025-06-03T14:12:49Z I! Loaded outputs: influxdb
        2025-06-03T14:12:49Z I! Tags enabled: host=pfsense.local.lan
        2025-06-03T14:12:49Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
        2025-06-03T14:12:49Z W! [inputs.kernel] Current platform is not supported
        2025-06-03T14:12:50Z E! [inputs.swap] Error in plugin: error getting swap memory info: no swap devices found
        
        [25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
        2025-06-03T14:12:52Z I! [agent] Hang on, flushing any cached metrics before shutdown
        2025-06-03T14:12:52Z I! [agent] Stopping running outputs
        [25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:12:53Z I! Loading config: /usr/local/etc/telegraf.conf
        2025-06-03T14:12:53Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
        2025-06-03T14:12:53Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
        2025-06-03T14:12:53Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
        2025-06-03T14:12:53Z I! Loaded aggregators:
        2025-06-03T14:12:53Z I! Loaded processors:
        2025-06-03T14:12:53Z I! Loaded secretstores:
        2025-06-03T14:12:53Z I! Loaded outputs: influxdb
        2025-06-03T14:12:53Z I! Tags enabled: host=pfsense.local.lan
        2025-06-03T14:12:53Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
        2025-06-03T14:12:53Z W! [inputs.kernel] Current platform is not supported
        
        [25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh restart
        2025-06-03T14:12:56Z I! [agent] Hang on, flushing any cached metrics before shutdown
        2025-06-03T14:12:56Z I! [agent] Stopping running outputs
        [25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:12:57Z I! Loading config: /usr/local/etc/telegraf.conf
        2025-06-03T14:12:57Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
        2025-06-03T14:12:57Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 4 secret-stores
        2025-06-03T14:12:57Z I! Loaded inputs: cpu disk diskio kernel mem net pf processes swap system
        2025-06-03T14:12:57Z I! Loaded aggregators:
        2025-06-03T14:12:57Z I! Loaded processors:
        2025-06-03T14:12:57Z I! Loaded secretstores:
        2025-06-03T14:12:57Z I! Loaded outputs: influxdb
        2025-06-03T14:12:57Z I! Tags enabled: host=pfsense.local.lan
        2025-06-03T14:12:57Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"pfsense.local.lan", Flush Interval:10s
        2025-06-03T14:12:57Z W! [inputs.kernel] Current platform is not supported
        
        [25.03-BETA][admin@pfsense.local.lan]/root: 2025-06-03T14:13:00Z E! [inputs.swap] Error in plugin: error getting swap memory info: no swap devices found
        
        [25.03-BETA][admin@pfsense.local.lan]/root: service telegraf.sh stop
        2025-06-03T14:13:06Z I! [agent] Hang on, flushing any cached metrics before shutdown
        2025-06-03T14:13:06Z I! [agent] Stopping running outputs
        [25.03-BETA][admin@pfsense.local.lan]/root:
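        The first restart after that edit still printed cat, kill and pwait usage errors, because /var/run/telegraf.pid did not exist at that point. A slightly more defensive rc_stop() could skip the kill and the wait whenever there is no live process. This is a sketch under assumptions, not the generated pfSense code: the 30-second timeout is an arbitrary choice, and it relies on pwait accepting -t, as the usage message above shows.

```shell
#!/bin/sh
# Sketch of a more defensive rc_stop() for /usr/local/etc/rc.d/telegraf.sh.
# pidfile is a plain global so it can be pointed at a scratch file for testing.
pidfile=/var/run/telegraf.pid

rc_stop() {
    # Missing or empty pid file: the service is not running, so there is
    # nothing to kill and nothing to wait for (avoids the cat/kill/pwait
    # usage errors).
    [ -s "$pidfile" ] || return 0

    pid=$(cat "$pidfile")

    # Stale pid file: kill fails because the process no longer exists.
    kill "$pid" 2>/dev/null || return 0

    # Block until the old process has exited so rc_start cannot race it.
    # pwait(1) is FreeBSD-specific; -t 30 gives up after 30 seconds.
    pwait -t 30 "$pid"
}
```

        Note that the generated rc_restart() calls rc_start regardless of rc_stop's exit status, so if pwait ever hit its timeout the race could still occur.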
        

        I'll raise a Redmine issue with the following proposed patch, which fixes the problem:

        --- /usr/local/pkg/telegraf.inc 2025-05-15 17:55:23.000000000 +0200
        +++ /usr/local/pkg/telegraf.inc.new     2025-06-03 16:28:00.675486000 +0200
        @@ -200,7 +200,7 @@
                write_rcfile(array(
                        "file" => "telegraf.sh",
                        "start" => "/usr/sbin/daemon -crP {$pidfile} /usr/local/bin/telegraf -config={$conffile} 2> {$logfile}",
        -               "stop" => "/bin/kill `/bin/cat {$pidfile}`"
        +               "stop" => "pid=`/bin/cat {$pidfile}`; /bin/kill \$pid; pwait \$pid"
                        )
                );
        
        
          pst @pst

          https://redmine.pfsense.org/issues/16225

            andrew_cb

            I've struggled with this same issue, and this solution sounds promising.

            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.