Apinger gets lost and RTD is usally not correct



  • Hello,

    Have been running PFSense for about 5 years and when 2.1 was rolled out the consistency of my gateway reporting went down the tubes, both from an up time and RTD perspective, in other words just plain broke. Since this was cosmetic at that time I let it go and disabled gateway monitoring. I have setup a separate system that just pings my local interfaces 24/7 on my work VPN (.20 and .21) gateway. and both IPV6 and IPV4 gateways to the internet. Out of the million + pings to these devices I have about .0001% loss for the gateways and zero to the CVG's (Interfaces for my company hardware VPN solution) which are defined on a VM that is running on the same network as my PFSense. Also the reported ping time in apinger do not match what is being displayed on my dashboard. As I was again researching this via google I saw where someone mentioned to turn on more verbose debugging to see if I could get some indication what is going on. Once I did this and restarted apinger all seemed ok for a while. I think I may have found the trigger. Please see the log captures below. What I did to recreate these errors was just clicking on the dashboard and then returning to the gateway log. After a while it corrected itself.

    The system log showing that the issue is happening:

    Sep 8 08:19:52 php-fpm[89182]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use CVG2.
    Sep 8 08:19:52 php-fpm[89182]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use CVG1.
    Sep 8 08:19:22 check_reload_status: Reloading filter
    Sep 8 08:19:21 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Sep 8 08:19:21 check_reload_status: Restarting ipsec tunnels
    Sep 8 08:19:21 check_reload_status: updating dyndns CVG2
    Sep 8 08:19:21 check_reload_status: Reloading filter
    Sep 8 08:19:21 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Sep 8 08:19:21 check_reload_status: Restarting ipsec tunnels
    Sep 8 08:19:21 check_reload_status: updating dyndns CVG1
    Sep 8 08:19:20 php-fpm[44968]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use CVG2.
    Sep 8 08:19:20 php-fpm[44968]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use CVG1.

    My CVG interfaces have/can not change and are never down.

    My Gateway logs after going to the dashboard and back, checked the time and 08:19:20 time is when I went to the dashboard:

    Sep 8 08:19:45 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:44 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:43 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:42 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:41 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:40 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:39 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:38 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:37 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:36 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:35 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:34 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:33 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:32 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:31 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:30 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:29 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:28 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:27 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:26 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:25 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:24 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:23 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:22 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:21 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:21 apinger: alarm canceled: CVG2(192.168.170.21) *** down ***
    Sep 8 08:19:21 apinger: alarm canceled: CVG1(192.168.170.20) *** down ***
    Sep 8 08:19:20 apinger: alarm canceled: WAN_DHCP6(fe80::201:5cff:fe78:a046%igb0) *** down ***
    Sep 8 08:19:20 apinger: Alien echo-reply received from xxx. Expected 46489, received 28233
    Sep 8 08:19:20 apinger: alarm canceled: WAN_DHCP(73.222.192.1) *** down ***
    Sep 8 08:19:20 apinger: Polling, timeout: 0.148s
    Sep 8 08:19:20 apinger: Alien echo-reply received from 73.222.192.1. Expected 46489, received 28233
    Sep 8 08:19:20 apinger: Polling, timeout: 0.155s
    Sep 8 08:19:20 apinger: Polling, timeout: 0.156s
    Sep 8 08:19:20 apinger: Alien echo-reply received from 192.168.170.21. Expected 46489, received 28233
    Sep 8 08:19:20 apinger: Alien echo-reply received from 192.168.170.20. Expected 46489, received 28233
    Sep 8 08:19:20 apinger: Alien echo-reply received from 192.168.170.21. Expected 46489, received 28233
    Sep 8 08:19:20 apinger: Alien echo-reply received from 192.168.170.20. Expected 46489, received 28233
    Sep 8 08:19:20 apinger: ALARM: WAN_DHCP6(fe80::201:5cff:fe78:a046%igb0) *** down ***
    Sep 8 08:19:20 apinger: ALARM: WAN_DHCP(73.222.192.1) *** down ***
    Sep 8 08:19:20 apinger: ALARM: CVG2(192.168.170.21) *** down ***
    Sep 8 08:19:20 apinger: ALARM: CVG1(192.168.170.20) *** down ***
    Sep 8 08:19:20 apinger: Polling, timeout: 0.156s
    Sep 8 08:19:19 apinger: Polling, timeout: 0.987s
    Sep 8 08:19:19 apinger: (avg. loss: 0.0%)
    Sep 8 08:19:19 apinger: (avg: 13.966ms)
    Sep 8 08:19:19 apinger: #2706 from WAN_DHCP6(fe80::201:5cff:fe78:a046%igb0) delay: 12.735ms/13.981ms/139.658ms received = 31
    Sep 8 08:19:19 apinger: Polling, timeout: 0.992s
    Sep 8 08:19:19 apinger: (avg. loss: 0.0%)
    Sep 8 08:19:19 apinger: (avg: 8.893ms)
    Sep 8 08:19:19 apinger: #2706 from WAN_DHCP(73.222.192.1) delay: 7.736ms/8.255ms/88.925ms received = 31
    Sep 8 08:19:19 apinger: Polling, timeout: 0.999s
    Sep 8 08:19:19 apinger: (avg. loss: 0.0%)
    Sep 8 08:19:19 apinger: (avg: 0.205ms)
    Sep 8 08:19:19 apinger: #2706 from CVG2(192.168.170.21) delay: 0.245ms/0.217ms/2.048ms received = 61
    Sep 8 08:19:19 apinger: (avg. loss: 0.0%)
    Sep 8 08:19:19 apinger: (avg: 0.202ms)
    Sep 8 08:19:19 apinger: #2706 from CVG2(192.168.170.21) delay: 0.245ms/0.157ms/2.020ms received = 60

    Strange just by moving to the dashboard my system goes from no issue to gateways down and then these Alien responses. I have been able to recreate this around 50% of the time and it will repair itself after a while. I hope this helps narrows down to what is causing the issue.

    Here is the gateway widget from the dashboard, notice the borked up time for the RTD, does show correctly in the log before it goes offline. My normal IPV4 RTD is around 8ms and IPV6 is 12ms. SO the number in the logs do not match up to what is being displayed.

    CVG1             192.168.170.20 0.2ms 0.0% Online
    CVG2             192.168.170.21 0.2ms 0.0% Online
    WAN_DHCP 73.222.xxx.1 1.5ms 0.0% Online
    WAN_DHCP6 fe80::201:5cff:xxxx:a046 2.8ms         0.0% Online

    Here is the display showing correct RTD, this was just after the issue resolves it self. I have also included my configuration.

    CVG1             192.168.170.20 0.2ms 0.0% Online
    CVG2             192.168.170.21 0.2ms 0.0% Online
    WAN_DHCP 73.222.192.1 8.6ms 0.0% Online
    WAN_DHCP6 fe80::201:5cff:fe78:a046 12.9ms 0.0% Online

    If the devs need access to my system all you have to do is just ask.

    Best Regards,
    MT

    pfsense.zip