Netgate Discussion Forum

    dpinger not reliable - ping request/replies

    Routing and Multi WAN
    13 Posts 5 Posters 4.2k Views 4 Watching
    • siegmarb

      Dear Users,

      The running dpinger process seems to stall for no apparent reason and reports the gateway as down, although it is not:

      root 62188 0.0 0.0 17736 2808 - Is Sat13 0:27.94 /usr/local/bin/dpinger -S -r 0 -i GW_KD -B 10.8.0.2 -p /var/run/dpinger_GW_KD_DH~10.8.0.2~1.1.1.1.pid -u /var/run/dpinger_GW_KD_DH~10.8.0.2~1.1.1.1.sock -C /etc/rc.gateway_alarm -d 1 -s 500 -l 2000
      
      # tcpdump -ni vtnet2 host 1.1.1.1
      07:31:08.014179 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 37365, seq 53675, length 9
      07:31:08.523369 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 37365, seq 53676, length 9
      07:31:09.033452 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 37365, seq 53677, length 9
      07:31:09.544022 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 37365, seq 53678, length 9
      07:31:10.053855 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 37365, seq 53679, length 9
      07:31:10.563363 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 37365, seq 53680, length 9
      

      Saving the gateway configuration without any changes "restarts" dpinger, and the pings work again:

      07:31:10.826086 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23910, seq 0, length 9
      07:31:10.841466 IP 1.1.1.1 > 10.8.0.2: ICMP echo reply, id 23910, seq 0, length 9
      07:31:11.333371 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23910, seq 1, length 9
      07:31:11.347465 IP 1.1.1.1 > 10.8.0.2: ICMP echo reply, id 23910, seq 1, length 9
      07:31:11.843354 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23910, seq 2, length 9
      07:31:11.902847 IP 1.1.1.1 > 10.8.0.2: ICMP echo reply, id 23910, seq 2, length 9
      07:31:12.353358 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23910, seq 3, length 9
      07:31:12.369040 IP 1.1.1.1 > 10.8.0.2: ICMP echo reply, id 23910, seq 3, length 9
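A quick way to check captures like the two above for unanswered probes is to pair each echo request with a reply carrying the same id and seq. This is a hypothetical sketch (the function name and regular expression are illustrative, not part of pfSense or dpinger) and it assumes plain `tcpdump -n` output in exactly the format shown in this thread.

```python
import re

# Matches tcpdump -n lines such as:
# "07:31:08.014179 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 37365, seq 53675, length 9"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"ICMP echo (?P<kind>request|reply), id (?P<id>\d+), seq (?P<seq>\d+)"
)

def unanswered_requests(lines):
    """Return (id, seq) pairs of echo requests that have no matching reply."""
    requests, replies = [], set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not an ICMP echo line
        key = (int(m["id"]), int(m["seq"]))
        if m["kind"] == "request":
            requests.append(key)
        else:
            replies.add(key)
    return [key for key in requests if key not in replies]

capture = """\
07:31:10.053855 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 37365, seq 53679, length 9
07:31:10.563363 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 37365, seq 53680, length 9
07:31:10.826086 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23910, seq 0, length 9
07:31:10.841466 IP 1.1.1.1 > 10.8.0.2: ICMP echo reply, id 23910, seq 0, length 9
"""
print(unanswered_requests(capture.splitlines()))
# [(37365, 53679), (37365, 53680)]
```

Only the probes of the old process (id 37365) are left unanswered; the first probe of the restarted process (id 23910) gets its reply.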
      

      Checking the firewall states before saving the gateway settings again shows:

      [Screenshot: firewall states before saving the gateway settings]

      After saving the gateway settings again:

      [Screenshot: firewall states after saving the gateway settings]

      Somehow, the existing firewall state is stale.

      Any help is greatly appreciated.

      • Gertjan @siegmarb

        @siegmarb

        Is 10.8.0.2 your pfSense WAN interface?
        Static setup or DHCP?

        Once you've saved and everything is fine, when does it start to fail?
        What was going on at, or just before, that moment? (system logs)

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • siegmarb @Gertjan

          @Gertjan

          Correct, 10.8.0.2 is our pfSense WAN interface. Static setup.

          It starts to fail at random. This is the log, right after I hit 'Save' again:

          Apr 7 09:31:10	dpinger	37780	exiting on signal 15
          Apr 7 09:31:10	dpinger	89446	send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% alarm_hold 10000ms dest_addr 1.1.1.1 bind_addr 10.8.0.2 identifier "GW_KD_DH "
          Apr 8 07:07:12	dpinger	89446	GW_KD_DH 1.1.1.1: Alarm latency 27627us stddev 14386us loss 21%
          Apr 9 11:07:14	dpinger	89446	GW_KD_DH 1.1.1.1: Clear latency 22638us stddev 53358us loss 5%
          Apr 15 01:00:42	dpinger	89446	GW_KD_DH 1.1.1.1: Alarm latency 21509us stddev 5210us loss 22%
          Apr 15 01:06:52	dpinger	89446	GW_KD_DH 1.1.1.1: Alarm latency 1293341us stddev 881246us loss 95%
          Apr 15 01:07:07	dpinger	89446	GW_KD_DH 1.1.1.1: Alarm latency 243671us stddev 600909us loss 70%
          Apr 15 01:07:48	dpinger	89446	GW_KD_DH 1.1.1.1: Clear latency 93734us stddev 349188us loss 5%
          Apr 15 06:30:21	dpinger	89446	GW_KD_DH 1.1.1.1: Alarm latency 28365us stddev 7087us loss 21%
          Apr 15 06:34:35	dpinger	89446	GW_KD_DH 1.1.1.1: Clear latency 28547us stddev 86447us loss 5%
          Apr 16 10:44:07	dpinger	89446	GW_KD_DH 1.1.1.1: Alarm latency 28632us stddev 35032us loss 22%
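The startup line above lists the thresholds in effect (latency_alarm 500ms, loss_alarm 20%), so each Alarm line can be checked against them. A small sketch, assuming the log format shown in this thread; it is not part of dpinger or pfSense, and using a strict ">" comparison is an assumption rather than dpinger's documented behaviour.

```python
import re

# Thresholds taken from the dpinger startup line in this log.
LATENCY_ALARM_US = 500 * 1000  # latency_alarm 500ms, in microseconds
LOSS_ALARM_PCT = 20            # loss_alarm 20%

EVENT_RE = re.compile(
    r"(?P<event>Alarm|Clear) latency (?P<lat>\d+)us "
    r"stddev (?P<sd>\d+)us loss (?P<loss>\d+)%"
)

def thresholds_exceeded(line):
    """Return (event, latency in ms, loss %, exceeded thresholds) or None.

    Comparing with '>' is an assumption, not taken from dpinger's source.
    """
    m = EVENT_RE.search(line)
    if m is None:
        return None
    latency_us, loss_pct = int(m["lat"]), int(m["loss"])
    exceeded = []
    if latency_us > LATENCY_ALARM_US:
        exceeded.append("latency")
    if loss_pct > LOSS_ALARM_PCT:
        exceeded.append("loss")
    return m["event"], latency_us / 1000, loss_pct, exceeded

for line in [
    "GW_KD_DH 1.1.1.1: Alarm latency 27627us stddev 14386us loss 21%",
    "GW_KD_DH 1.1.1.1: Alarm latency 1293341us stddev 881246us loss 95%",
    "GW_KD_DH 1.1.1.1: Clear latency 22638us stddev 53358us loss 5%",
]:
    print(thresholds_exceeded(line))
# ('Alarm', 27.627, 21, ['loss'])
# ('Alarm', 1293.341, 95, ['latency', 'loss'])
# ('Clear', 22.638, 5, [])
```

Read this way, the Apr 8 alarm is triggered by loss alone (21% at about 27.6 ms); latency only crosses its threshold during the short Apr 15 01:06 episode, so the alarms in this log are driven by packet loss.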
          

          I see nothing else special in the logs. It's our primary firewall, and aside from the dpinger issue it behaves "normally":

          Uptime 118 Days 04 Hours 27 Minutes 09 Seconds

          • Gertjan @siegmarb

            @siegmarb said in dpinger not reliable - ping request/replies:

            Apr 8 07:07:12 dpinger 89446 GW_KD_DH 1.1.1.1: Alarm latency 27627us stddev 14386us loss 21%
            Apr 9 11:07:14 dpinger 89446 GW_KD_DH 1.1.1.1: Clear latency 22638us stddev 53358us loss 5%
            Apr 15 01:00:42 dpinger 89446 GW_KD_DH 1.1.1.1: Alarm latency 21509us stddev 5210us loss 22%
            Apr 15 01:06:52 dpinger 89446 GW_KD_DH 1.1.1.1: Alarm latency 1293341us stddev 881246us loss 95%
            Apr 15 01:07:07 dpinger 89446 GW_KD_DH 1.1.1.1: Alarm latency 243671us stddev 600909us loss 70%
            Apr 15 01:07:48 dpinger 89446 GW_KD_DH 1.1.1.1: Clear latency 93734us stddev 349188us loss 5%
            Apr 15 06:30:21 dpinger 89446 GW_KD_DH 1.1.1.1: Alarm latency 28365us stddev 7087us loss 21%
            Apr 15 06:34:35 dpinger 89446 GW_KD_DH 1.1.1.1: Clear latency 28547us stddev 86447us loss 5%
            Apr 16 10:44:07 dpinger 89446 GW_KD_DH 1.1.1.1: Alarm latency 28632us stddev 35032us loss 22%

            dpinger not reliable - ping request/replies

            You can remove the word 'not'. 😊
            Test it for yourself: go to Diagnostics > Packet Capture,
            select your WAN interface (Capture Options),
            set "View Options" to High,
            set PROTOCOL to PING and ETHERTYPE to IPv4,
            and hit the green Start.

            From now on, you'll see "ICMP echo requests" being sent. It's the dpinger process that pings ^^
            These "ICMP echo requests" are sent to an upstream address (1.1.1.1 in your case; you'll also see the IP in the capture) and, if all goes well, an answer ("ICMP echo reply") comes back.
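The "length 9" in the captures above matches data_len 1 in the dpinger startup line: an ICMP echo request is an 8-byte header (type, code, checksum, identifier, sequence number) followed by the data, here a single byte. The sketch below builds such a packet with the RFC 1071 checksum; it only constructs bytes and sends nothing, and the zero payload byte is an assumption, since the thread does not show what dpinger puts in its data.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

def echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, seq, payload."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum zeroed first
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

pkt = echo_request(37365, 53675, b"\x00")  # id/seq from the first capture
print(len(pkt))  # 9: 8-byte ICMP header + 1 data byte
```

A receiver validates the packet by running the same checksum over the whole message, checksum field included, and expecting 0.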
            The time between sending a packet and receiving the answer is the round-trip time (RTT):

            [Screenshot]

            You'll see the average time it took, and the variation.
            The simple fact that packets do come back is enough to mark the interface as "Online".
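As a worked example of "the average time and the variation", here are the four request/reply pairs from the capture taken right after the restart. The population standard deviation is used as the measure of variation; that is an assumption, since the thread does not show how dpinger computes its stddev, and dpinger averages over its whole time_period rather than four samples.

```python
from statistics import mean, pstdev

# (request, reply) timestamps in seconds after 07:31:00, from the capture
# taken right after the restart (id 23910, seq 0-3).
pairs = [
    (10.826086, 10.841466),
    (11.333371, 11.347465),
    (11.843354, 11.902847),
    (12.353358, 12.369040),
]

# Round-trip time of each probe, in milliseconds.
rtts_ms = [(reply - request) * 1000 for request, reply in pairs]

print([round(r, 3) for r in rtts_ms])  # [15.38, 14.094, 59.493, 15.682]
print(f"average {mean(rtts_ms):.2f} ms, stddev {pstdev(rtts_ms):.2f} ms")
# average 26.16 ms, stddev 19.25 ms
```

The one slow reply (seq 2, about 59 ms) pulls the average up and dominates the variation, which is why both numbers are worth looking at.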

            So now you know dpinger is reliable ^^
            What is probably less reliable is your connection, as you've shown yourself: ICMP packets are (always) sent, but not all of them come back. That said, a couple of alarms over several days ... that's not that bad.
            Or maybe the host the packets were sent to was very busy and missed a packet, so it didn't reply.

            Be aware that ICMP packets often get lower priority than TCP or UDP packets, so if an ICMP packet gets discarded, that's not the end of the world. It can happen.
            If your connection is saturated, it's normal to see that an ICMP packet didn't make it back.

            • siegmarb @Gertjan

              @Gertjan

              Thank you for your answer. Further debugging shows that dpinger does not recover correctly:

              I restarted dpinger, and the replies are back:

              10:00:07.359483 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 28786, seq 35848, length 9
              10:00:07.836358 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 8356, seq 0, length 9
              10:00:07.886237 IP 1.1.1.1 > 10.8.0.2: ICMP echo reply, id 8356, seq 0, length 9
              

              After ~ 10 hours:

              07:56:44.384429 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23931, seq 29764, length 9
              07:56:44.404683 IP 1.1.1.1 > 10.8.0.2: ICMP echo reply, id 23931, seq 29764, length 9 
              07:56:44.894107 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23931, seq 29765, length 9
              07:56:44.916906 IP 1.1.1.1 > 10.8.0.2: ICMP echo reply, id 23931, seq 29765, length 9
              07:56:45.433620 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23931, seq 29766, length 9
              07:56:45.942312 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23931, seq 29767, length 9
              07:56:46.454289 IP 10.8.0.2 > 1.1.1.1: ICMP echo request, id 23931, seq 29768, length 9
              

              dpinger detects higher latency and loss, but does not recover:

              Apr 24 07:56:59	dpinger	23931	GW_KD_DH 1.1.1.1: Alarm latency 23815us stddev 7223us loss 21%
              

              Pinging 1.1.1.1 manually from pfSense at the same time shows that it is reachable:

              /root: ping 1.1.1.1
              PING 1.1.1.1 (1.1.1.1): 56 data bytes
              64 bytes from 1.1.1.1: icmp_seq=0 ttl=56 time=21.769 ms
              64 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=12.738 ms
              64 bytes from 1.1.1.1: icmp_seq=2 ttl=56 time=27.216 ms
              64 bytes from 1.1.1.1: icmp_seq=3 ttl=56 time=12.617 ms
              64 bytes from 1.1.1.1: icmp_seq=4 ttl=56 time=13.614 ms
              64 bytes from 1.1.1.1: icmp_seq=5 ttl=56 time=22.943 ms

              Still looks like a dpinger issue to me.

              • patient0 @siegmarb

                @siegmarb what pfSense version are you working with?

                What surprises me a bit is that the source and destination ICMP IDs are the same. Nothing wrong with that, but it's not standard. Did you set it on purpose?

                10.8.0.2:23910 -> 1.1.1.1:23910
                ...
                10.8.0.2:37365 -> 1.1.1.1:37365
                

                For me the source ID/port is random:

                WAN 	icmp 	<WAN IP>:12790 -> <monitoring IP>:8 	0:0 	211.625K / 211.625K 	5.85 MiB / 5.85 MiB
                
                • siegmarb @patient0

                  @patient0

                  2.7.2-RELEASE (amd64)
                  built on Fri Dec 8 21:55:00 CET 2023
                  FreeBSD 14.0-CURRENT

                  No, I did not set the ID manually.

                  • Gertjan @siegmarb

                    @siegmarb

                    Right now, tens (hundreds) of thousands of pfSense installs run 2.7.2. That doesn't prove it's 'perfect', but if the WAN were flaky on every pfSense, pfSense wouldn't exist anymore by the end of this year.
                    The good news is: it's your setup ^^

                    What about this: 2.8.0 is out. True, it's a beta, but it has been out there for nearly a month now, and there are no big issues. So: go 2.8.0.

                    And again: you can disable the dpinger action, so it won't touch your WAN connection anymore. If the interface still goes down, it wasn't dpinger doing so. dpinger will still ping, but only so that statistics get generated and "online" is shown on the dashboard.

                    • patient0 @siegmarb

                      @siegmarb said in dpinger not reliable - ping request/replies:

                      no, i did not set the id manually

                      OK, I'm seeing the same on 2.7.2 (I'm on 25.03-BETA in prod), so that's normal then.

                      • reberhar @patient0

                        @patient0 I think you are right. There is something not quite right, or not quite understood, about dpinger. I use dpinger to check my ISPs, as is typical. Today, for example, my main connection was marked as down. I simply changed the ping target from 1.1.1.1 to 4.2.2.1, and everything came back. There was no outage as far as I could tell. The problem is not limited to 1.1.1.1 either; it can happen with the ubiquitous 8.8.8.8 as well. I have also seen some modems treat the continuous ping as a DoS threat, but that seems unlikely in this case.

                        This happens only occasionally, and it can happen on any of my 9 servers. In this case it seems unlikely that 1.1.1.1 was down, although I did not think to ping it manually.

                        I have noticed in the past that dpinger seems to have problems recovering. For a while I would restart it with Service Watchdog when it was down. That was a few versions back; with the newer versions, that has not been necessary.

                        Although it is very infrequent for me, it can be irritating: today it happened in the middle of a Zoom meeting. Miraculously, no one complained.

                        I recognize that problems like these can be hard to find and fix.

                        Roy

                        • Gertjan @reberhar

                          @reberhar

                          dpinger is a small daemon (written in C) that loops around.
                          At every interval set here:

                          [Screenshot]

                          it sends a ping, gets the reply back, writes the time to a file,
                          and loops again.

                          If ping replies stop coming back, these events are logged.
                          If the losses become too big (you can see and set the thresholds under System > Routing > Gateways > Edit), dpinger raises an alarm. This triggers the 'action':

                          [Screenshot]

                          The action is: resetting the connection, i.e. taking it down and rebuilding it.
                          For a WAN, this means the DHCP client gets re-executed, a PPPoE session is re-created, etc.
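To make the loop-and-threshold description concrete, here is a toy model of a loss alarm over a sliding window. It is hypothetical, not dpinger's code: the class name is illustrative, it uses the send_interval (500 ms), time_period (60 s) and loss_alarm (20%) values from the startup line in this thread, and it deliberately leaves out loss_interval, latency, alert_interval and alarm_hold.

```python
from collections import deque

class WindowMonitor:
    """Toy dpinger-style loss monitor (hypothetical, not dpinger's code).

    One probe per send interval; only the last time_period worth of probes
    is kept, and loss is the share of those probes that got no reply.
    """

    def __init__(self, send_interval_ms=500, time_period_ms=60000, loss_alarm_pct=20):
        self.window = deque(maxlen=time_period_ms // send_interval_ms)
        self.loss_alarm_pct = loss_alarm_pct
        self.alarm = False

    def record(self, rtt_ms):
        """Record one probe; rtt_ms is None when no reply came back."""
        self.window.append(rtt_ms)  # oldest probe drops out when full
        lost = sum(1 for r in self.window if r is None)
        loss_pct = 100 * lost / len(self.window)
        self.alarm = loss_pct > self.loss_alarm_pct
        return loss_pct

mon = WindowMonitor()
for _ in range(120):
    mon.record(20.0)   # one healthy minute of 20 ms replies
lost = 0
while not mon.alarm:
    mon.record(None)   # replies stop coming back
    lost += 1
print(lost, lost * 0.5)  # 25 12.5 -> 25 lost probes, 12.5 s of silence
```

In this model a gap of a few seconds never raises the alarm on its own; only a sustained run of lost probes does.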

                          Using Service Watchdog to 'handle' dpinger makes no sense to me, and at best will create a connection mess.

                          If, for some reason, pings get filtered out upstream (ICMP is, after all, a low-priority protocol), or the destination stops replying because it has other things to do, then dpinger will conclude that the line (uplink) is bad. That can be the case, but it may simply not be true.
                          dpinger signalling the loss of ICMP replies just means the ICMP replies didn't come back.
                          Is there a better method to test the uplink? I wish there was ...

                          If you suspect that someone or something is interfering with your ICMP packets, consider disabling the gateway action or even the monitoring. Of course, this can also have consequences.
                          Or change the settings to make it more 'patient' before it pulls the plug.
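To put rough numbers on "more patient": a back-of-the-envelope estimate of how long a total outage lasts before the loss alarm fires. The function is hypothetical and the formula is an approximation under stated assumptions (a window full of answered probes beforehand, loss measured over time_period / send_interval probes, a probe counted as lost loss_interval after it was sent, alert_interval and alarm_hold ignored). Parameter names follow the dpinger startup line above, not the pfSense GUI labels.

```python
import math

def outage_seconds_before_loss_alarm(send_interval_ms, time_period_ms,
                                     loss_alarm_pct, loss_interval_ms):
    """Rough time from the first unanswered probe until loss exceeds the alarm.

    Hypothetical approximation, not dpinger's actual algorithm.
    """
    probes_in_window = time_period_ms // send_interval_ms
    # Smallest number of lost probes whose share is strictly above the threshold.
    lost_needed = math.floor(probes_in_window * loss_alarm_pct / 100) + 1
    # The last of those probes is sent (lost_needed - 1) intervals after the
    # first one, and only counts as lost loss_interval later.
    return ((lost_needed - 1) * send_interval_ms + loss_interval_ms) / 1000

print(outage_seconds_before_loss_alarm(500, 60000, 20, 2000))   # 14.0
print(outage_seconds_before_loss_alarm(500, 120000, 50, 2000))  # 62.0
```

Doubling time_period and raising the loss threshold to 50% stretches the tolerated total outage from about 14 s to about a minute, at the cost of reacting that much later to a real failure.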

                            • pwood999

                            +1, it does sound like issues with the internet connection. I'm using 2.8.0 and not seeing any issues with this, nor in previous release builds (I don't use betas).

                            I did have dpinger restarts back on Aug 14th, but that was my Virgin Media connection being down for 3 hours. Nothing in the logs since.

                            Maybe your internet provider is routing the ICMP to a location that is not the nearest, causing ping issues?

                            Try changing the gateway Monitor IP to something local to you?

                              • reberhar @pwood999

                              @pwood999 Hi pwood999 and Gertjan

                              This happens with various service providers, and I have changed ping targets. It also happens on various installs in different cities: I have installs in 5 different locations on 9 servers.

                              I also know about the tweaks and the other things you mentioned, Gertjan, and used them heavily with marginal DSL connections.

                              It happens very infrequently, so it is difficult to know how to handle something that works 99% of the time.

                              By the way, 8 of my WAN connections are static. This is something to think about. I was about to make the 9th static as well, but maybe I will wait. Static addresses are especially useful with HA. The current DHCP unit is the only one that is not HA.

                              I will be watching 2.8.1.

                              Thanks so much for your suggestions.

                              Roy

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.