Telegraf: no connection from pfSense via FRR (can I choose the interface?)


  • Hi,

    I have a problem with Telegraf. I think I know the cause, but I don't know how to solve it.

    I have several networks connected to each other via FRR, all working fine. Each of these has its own pfSense firewall. Let's call them:
    172.xx.xx.1
    10.10.xx.1
    10.20.xx.1

    I set up an InfluxDB server behind 172.xx.xx.1, it has IP 172.xx.xx.111. On the pfSense box at 172.xx.xx.1, I added the Telegraf package and set the IP - all works perfectly.

    On 10.10.xx.1 and 10.20.xx.1, the same setup does not work.
    From servers behind 10.10.xx.1 and 10.20.xx.1 I can reach the InfluxDB server without problems.

    When I run telegraf from the command line on one of the pfSense firewalls that cannot connect, I get this:

    2020-02-10T16:19:25Z E! [outputs.influxdb]: when writing to [http://172.xx.xx.111:8086]: Post http://172.xx.xx.111:8086/write?db=pfsense: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    2020-02-10T16:19:25Z E! Error writing to output [influxdb]: could not write any address
    

    OK, so I tried curl against the address to see whether it can be reached at all:

    /root: curl -v http://172.xx.xx.111
    
    * Expire in 0 ms for 6 (transfer 0x803a94000)
    *   Trying 172.xx.xx.111...
    * TCP_NODELAY set
    * Expire in 200 ms for 4 (transfer 0x803a94000)
    * connect to 172.xx.xx.111 port 80 failed: Operation timed out
    * Failed to connect to 172.xx.xx.111 port 80: Operation timed out
    * Closing connection 0
    curl: (7) Failed to connect to 172.xx.xx.111 port 80: Operation timed out
    

    If I tell curl to use the LAN interface instead:

    /root: curl -v --interface ix0  http://172.xx.xx.111
    * Expire in 0 ms for 6 (transfer 0x803a94000)
    *   Trying 172.xx.xx.111...
    * TCP_NODELAY set
    * Local Interface ix0 is ip 10.20.xx.1 using address family 2
    * Local port: 0
    * Expire in 200 ms for 4 (transfer 0x803a94000)
    * Connected to 172.xx.xx.111 (172.xx.xx.111) port 80 (#0)
    > GET / HTTP/1.1
    > Host: 172.xx.xx.111
    > User-Agent: curl/7.64.0
    > Accept: */*
    > 
    < HTTP/1.1 200 OK
    < Server: nginx/1.14.0 (Ubuntu)
    etc etc...
    

    So that works. The problem seems to be that Telegraf is trying to go out over the wrong interface, probably because of FRR (a packet capture shows that the source IP is the VTI interface IP). Is there any way to change the interface Telegraf communicates on?
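
    The curl comparison above can be reproduced as a small script. This is only a sketch, assuming Python is available on a machine with the same addressing; the function name can_connect is made up for illustration. It binds the socket to a chosen source address before connecting, which is roughly what curl --interface does, and it does not change the route the packets take.

```python
import socket


def can_connect(host, port, source_ip=None, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds.

    If source_ip is given, the socket is bound to that local address
    before connecting (local port 0 lets the OS pick a free port).
    Without it, the OS chooses the source address itself.
    """
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers timeouts, refused connections and unusable source addresses.
        return False
```

    Calling can_connect("172.xx.xx.111", 8086) and then can_connect("172.xx.xx.111", 8086, source_ip="10.20.xx.1") with the real addresses filled in should show the same difference as the two curl runs, if the source address really is the cause.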


  • Any suggestions? I've returned to this problem and also looked for Telegraf options, but I can't find a simple solution (unlike the ping command, for example, where ping -S 172.xx.xx.1 solves the problem).


  • Did you try adding a route for the IP?


  • Hi daffodil,

    Thanks for your input. I have not added a static route, if that's what you are suggesting.

    There is a route to this network created by FRR, it looks something like this:

    172.xx.xx.0/24	link#2	U	322721536	1500	vtnet1
    

    What route should I add so that telegraf chooses the right interface?


  • Sorry for the late reply. The 10.* addresses are not included in the route above. Try setting up a static route for each interface under System > Routing > Static Routes.


  • Hey,

    As I understand it, static routes should not be necessary when using FRR.

    The route I posted earlier wasn't quite right; it came from the wrong firewall. This is what it looks like on 10.10.xx.1:

    172.xx.xx.0/24	10.88.88.17	UG1	41097543	1500	ipsec3000
    10.88.88.17	link#15	UH	15196154	1500	ipsec3000
    

    There are many more routes, but I won't post them all because that would be incredibly confusing (this is a large network of several interconnected firewalls). There are also entries for all the local interfaces, for example:

    10.xx.xx.0/24	link#3	U	4056229286	1500	ix0	
    

    10.88.88.17 is the VTI interface IP on the local firewall. In general, the routing works. Only connections originating on the firewall itself pick the wrong interface for outgoing traffic. Anything originating behind the firewall takes the right route.


  • I have solved my problem. For reference, and for anyone who runs into the same issue:

    On the pfSense gateway 172.xx.xx.1 (the one in front of the InfluxDB server, which was working), add the following to the Telegraf configuration:

    [[inputs.influxdb_listener]]
      service_address = ":8086"
    

    This can be done at the bottom of the Telegraf configuration page. It makes Telegraf accept InfluxDB writes on port 8086 and pass them on through its existing InfluxDB output, so it acts as a forwarder.

    On each of the other pfSense machines, I find the VTI interface IP, i.e. 10.88.88.xx, and set the InfluxDB server to http://10.88.88.xx:8086.
    Now the data is sent via Telegraf to the InfluxDB server and arrives as it should.

    There are probably other solutions involving proxies, but this seems the simplest.
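
    A forwarder set up as described above could be smoke-tested by posting a single point to it by hand. The following is only a sketch, not part of the original setup: the function names and the measurement name forwarder_test are made up, the database name pfsense is taken from the error message earlier in the thread, and the tag value is assumed to contain no spaces or commas (line protocol would require escaping those).

```python
import time
import urllib.parse
import urllib.request


def build_write_url(base_url, database):
    """Build an InfluxDB 1.x style write URL, e.g. http://host:8086/write?db=pfsense."""
    query = urllib.parse.urlencode({"db": database})
    return f"{base_url.rstrip('/')}/write?{query}"


def make_point(measurement, host, value, timestamp_ns=None):
    """Format one point in line protocol: measurement,tag=... field=... timestamp."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    return f"{measurement},host={host} value={value} {timestamp_ns}"


def send_point(base_url, database, line, timeout=5.0):
    """POST one line-protocol point and return the HTTP status code."""
    request = urllib.request.Request(
        build_write_url(base_url, database),
        data=line.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

    For example, send_point("http://10.88.88.xx:8086", "pfsense", make_point("forwarder_test", "fw2", 1)) with the real VTI IP filled in should normally return 204 if the listener accepted the write. A timeout or other error raises an exception instead.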