High RTT and RTTsd in dashboard but ping from the firewall is normal



  • I have no idea why this is happening and it's really bugging me.

    pfSense 2.4.2_P1 in an HA setup.
    It's in a data center, so our gateways are pretty much in the same room.
    It's a 1 Gb LAN connection.

    The dashboard shows RTT between 2 and 12 ms, and RTTsd can be over 20 ms.
    But if I ping the gateway from the firewall, it's less than 1 ms. Same from the backup firewall.

    If I ping the same gateway from a server on the LAN side, I see sub-ms times too.
    So pinging appears to be normal, but the dashboard is showing different numbers.

    I am sure the data center is using VSF, since the gateways can't be pinged if we are not connected.

    I took a simple pfSense box and plugged it into the same port as the primary, and it shows the same high RTT/RTTsd.
    I loaded 2.3.5 on the box, thinking it could be a bug or something, but nothing changed.

    I moved that same firewall to another connection in the same data center (onsite tech workstation area) and it shows normal sub-ms RTT and normal RTTsd.

    Any idea why this is happening?
    No idea why this is bugging me so much, but I might need mental help to get over this :)

    H.

    ![DF gateway.JPG](/public/imported_attachments/1/DF gateway.JPG)
    ![df primary gateway ping.JPG](/public/imported_attachments/1/df primary gateway ping.JPG)



  • @Heimire:

    The dashboard shows RTT to between 2 and 12ms and RTTsd can be over 20ms.
    But if I ping the gateway from the firewall its less than 1ms.  Same from backup firewall.

    Is it possible that you have a monitor address set that is different than the gateway address?



  • @dennypage:

    Yes, but it still shows the same numbers.
    Changed monitor address to one hop up.

    I really need to figure this out, it's driving me nuts.



  • Can you log into the pfSense box and post the output from the following command please?

    ps -axuwww | grep dpinger
    

    Also, the exact command and output from the same box for the following please?

    @Heimire:

    But if I ping the gateway from the firewall its less than 1ms.



  • root    26113    0.0  0.0  13084  2784  -  S    16:05      0:00.00 sh -c ps -axuwww | grep dpinger 2>&1
    root    26667    0.0  0.0  14728  2436  -  S    16:05      0:00.00 grep dpinger
    root    28745    0.0  0.0  10980  2436  -  Is  Fri10      0:07.45 /usr/local/bin/dpinger -S -r 0 -i WAN2GW -B 64.9.133.27 -p /var/run/dpinger_WAN2GW~64.9.133.27~64.9.133.25.pid -u /var/run/dpinger_WAN2GW~64.9.133.27~64.9.133.25.sock -C /etc/rc.gateway_alarm -d 0 -s 500 -l 2000 -t 60000 -A 1000 -D 500 -L 20 64.9.133.25
    root    29321    0.0  0.0  10980  2436  -  Is  Fri10      0:07.59 /usr/local/bin/dpinger -S -r 0 -i WANGW -B 64.9.133.19 -p /var/run/dpinger_WANGW~64.9.133.19~64.9.133.17.pid -u /var/run/dpinger_WANGW~64.9.133.19~64.9.133.17.sock -C /etc/rc.gateway_alarm -d 0 -s 500 -l 2000 -t 60000 -A 1000 -D 500 -L 20 64.9.133.17
    Execute Shell Command

    ps -axuwww | grep dpinger

    The ping was done from the Diagnostics > Ping menu option.
    I entered the gateway, 64.9.133.17,
    selected WAN for the source address,
    then ran the ping.

    Just did it and noticed I got a 29ms response time on one of the pings.
    First time I see that.

    Ran it again and this time I see a 234ms ping.

    PING 64.9.133.17 (64.9.133.17) from 64.9.133.19: 56 data bytes
    64 bytes from 64.9.133.17: icmp_seq=0 ttl=255 time=0.278 ms
    64 bytes from 64.9.133.17: icmp_seq=1 ttl=255 time=29.385 ms
    64 bytes from 64.9.133.17: icmp_seq=2 ttl=255 time=0.207 ms
    64 bytes from 64.9.133.17: icmp_seq=3 ttl=255 time=0.239 ms
    64 bytes from 64.9.133.17: icmp_seq=4 ttl=255 time=0.216 ms
    64 bytes from 64.9.133.17: icmp_seq=5 ttl=255 time=0.196 ms
    64 bytes from 64.9.133.17: icmp_seq=6 ttl=255 time=0.253 ms
    64 bytes from 64.9.133.17: icmp_seq=7 ttl=255 time=0.202 ms
    64 bytes from 64.9.133.17: icmp_seq=8 ttl=255 time=0.188 ms
    64 bytes from 64.9.133.17: icmp_seq=9 ttl=255 time=0.198 ms

    --- 64.9.133.17 ping statistics ---
    10 packets transmitted, 10 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.188/3.136/29.385/8.750 ms

    Results
    PING 64.9.133.17 (64.9.133.17) from 64.9.133.19: 56 data bytes
    64 bytes from 64.9.133.17: icmp_seq=0 ttl=255 time=0.253 ms
    64 bytes from 64.9.133.17: icmp_seq=1 ttl=255 time=0.191 ms
    64 bytes from 64.9.133.17: icmp_seq=2 ttl=255 time=0.198 ms
    64 bytes from 64.9.133.17: icmp_seq=3 ttl=255 time=0.194 ms
    64 bytes from 64.9.133.17: icmp_seq=4 ttl=255 time=234.569 ms
    64 bytes from 64.9.133.17: icmp_seq=5 ttl=255 time=0.210 ms
    64 bytes from 64.9.133.17: icmp_seq=6 ttl=255 time=0.190 ms
    64 bytes from 64.9.133.17: icmp_seq=7 ttl=255 time=0.195 ms
    64 bytes from 64.9.133.17: icmp_seq=8 ttl=255 time=0.188 ms
    64 bytes from 64.9.133.17: icmp_seq=9 ttl=255 time=0.235 ms

    --- 64.9.133.17 ping statistics ---
    10 packets transmitted, 10 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.188/23.642/234.569/70.309 ms
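
The summary line of the second run can be reproduced from its ten replies, which makes it easier to see how much a single slow reply contributes. This is only a sketch: the sample values are copied from the output above, and the use of a population standard deviation is an assumption that happens to agree with the figures ping printed here. The median is added for comparison and is not something ping reports.

```python
from statistics import mean, median, pstdev

# RTT samples in ms, copied from the second ping run posted above.
samples = [0.253, 0.191, 0.198, 0.194, 234.569,
           0.210, 0.190, 0.195, 0.188, 0.235]

avg = mean(samples)
sd = pstdev(samples)   # population standard deviation (assumed to be what ping prints)
med = median(samples)  # not reported by ping; shown because it ignores the outlier

print(f"min={min(samples):.3f} avg={avg:.3f} max={max(samples):.3f} stddev={sd:.3f} ms")
print(f"median={med:.4f} ms")
```

Nine of the ten replies are around 0.2 ms, so the average and the standard deviation are almost entirely determined by the one 234 ms reply, while the median stays below 1 ms.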



  • @Heimire:

    Just did it and noticed I got a 29ms response time on one of the pings.
    First time I see that.

    Ran it again and this time I see a 234ms ping.

    --- 64.9.133.17 ping statistics ---
    10 packets transmitted, 10 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.188/23.642/234.569/70.309 ms

    Well, that would certainly explain things. This could arise from a few things, but the most likely guess is the target device handles ICMP as a very low priority. You can confirm this by using a monitor address that is a little further out into the world.
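
To illustrate why occasional slow replies are enough to explain the dashboard numbers, here is a rough sketch. It assumes the monitor simply takes the mean and standard deviation of every probe in its averaging window, which may not be exactly what dpinger does internally. The 500 ms send interval and 60000 ms time period are read from the `-s 500` and `-t 60000` options in the posted command line; the 0.2 ms baseline and 234.569 ms spike come from the posted ping runs. The function name `window_stats` is made up for this example.

```python
from statistics import mean, pstdev

SEND_INTERVAL_MS = 500   # from "-s 500" in the posted dpinger command line
TIME_PERIOD_MS = 60000   # from "-t 60000" in the posted dpinger command line
BASELINE_MS = 0.2        # typical reply time in the posted ping runs
SPIKE_MS = 234.569       # slowest reply in the posted ping runs


def window_stats(spikes):
    """Mean and population stddev of one window containing `spikes` slow replies.

    Hypothetical helper: assumes a plain average over all probes in the window.
    """
    probes = TIME_PERIOD_MS // SEND_INTERVAL_MS  # 120 probes per window
    rtts = [SPIKE_MS] * spikes + [BASELINE_MS] * (probes - spikes)
    return mean(rtts), pstdev(rtts)


for spikes in (0, 1, 3, 6):
    avg, sd = window_stats(spikes)
    print(f"{spikes} slow replies per window: RTT ~ {avg:.2f} ms, RTTsd ~ {sd:.2f} ms")
```

Under these assumptions, a single slow reply in a window of 120 probes already gives an average of about 2 ms and a standard deviation above 20 ms, which is the same range the dashboard was reporting.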

    As a general rule you want to use a monitor address that is physically on the other side of your WAN link. Some people use public addresses such as Google's DNS servers. For my monitoring, I use one of my ISP's regional concentrators.

    You can use the mtr package to help you choose a suitable target. Run mtr with a target of 8.8.8.8 and look at the hops along the way.



  • @dennypage:

    I think you hit the nail on the head.
    This is still being set up and we have no live traffic there yet.
    We are moving in there and have just seen weird things we did not expect.

    I will find some points to monitor outside the data center.

    Thank you so much for your input.
    Very helpful, and I also realize I jumped to a conclusion.
    I should have done more than 3 pings when I tested, but they came back perfect every time.
    When I set the ping count to 10 earlier and ran it several times, I think I saw high numbers probably 60-70% of the time.
    I should have dug a bit deeper before posting.

    H.
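
As a back-of-the-envelope check on why 3-ping tests looked clean while 10-ping tests often did not, here is a small sketch. It assumes each reply is slow independently with a fixed probability; the 10% figure is only a guess based on the two posted runs, each of which had one slow reply out of ten. The function name `chance_all_clean` is made up for this example.

```python
def chance_all_clean(p_slow, n_pings):
    """Probability that none of n_pings replies is slow.

    Assumes each reply is slow independently with probability p_slow.
    """
    return (1.0 - p_slow) ** n_pings


P_SLOW = 0.1  # assumption: roughly 1 slow reply in 10, as in the two posted runs

for n in (3, 10):
    clean = chance_all_clean(P_SLOW, n)
    print(f"{n} pings: {clean:.1%} chance of seeing no slow reply, "
          f"{1 - clean:.1%} chance of seeing at least one")
```

With these assumed numbers, a 3-ping test comes back clean about 73% of the time, while a 10-ping test shows at least one slow reply about 65% of the time, which is in line with the 60-70% mentioned above.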

