Netgate Discussion Forum

    Dpinger controls

    2.3-RC Snapshot Feedback and Issues - ARCHIVED
      phil.davis

      1. The GUI still allows input of losslow and losshigh percentages for the acceptable packet loss. Only losshigh is passed on to dpinger, which uses it to alarm if the packet loss goes above losshigh%.
        But in get_dpinger_status() the loss percentage is returned from dpinger and then is compared to losslow to determine if the gateway state should be marked "loss".
        For example, with the default low/high 10/20, if the loss goes above 20% then there will be a real alarm that results in gateway failover. When the loss average drops below 20% dpinger will "unalarm" and gateway failback should happen. However the gateways status/widget will show gateway "loss" if the current loss is above 10%.

      Is that intentional?
      Or should losslow be removed, and everything depend just on losshigh now?

      2. Same for latencylow and latencyhigh msec. latencyhigh is being passed through to dpinger as the alarm point. latencylow is being used in get_dpinger_status() to determine gateway status "delay".

      3. Relationship between the new set of parameters Probe Interval, Loss Interval and Time Period. The text at Additional Information on system_gateways_edit.php needs to be rewritten to explain these. As I understand it,

      a) Probe Interval - how often an echo request (ping) is sent out - default 250ms means there will be 4 pings every second.

      b) Loss Interval - how long to wait for an echo reply before considering that an echo request has been lost. Default 500ms means that if echo replies are taking a while to come back, there might be 2 echo requests outstanding at one time. If you set it to something high, e.g. 2000ms, then there might be up to 8 echo requests outstanding at one time. This parameter can be somewhat unrelated to Probe Interval. For example, if the monitor IP is something very local and should always respond quickly (e.g. well inside 50ms) then you could set Loss Interval to 50ms and nothing bad would happen. Every Probe Interval (250ms) an echo request would be sent out. 99.9% of the time the echo reply will be received in only a few msecs. If the echo reply is not received within 50ms it will be counted as loss. dpinger will be effectively "idle" for another 200ms until it sends out the next echo request.
      Implication: the validation does not need to check that Loss Interval > Probe Interval.

      c) Time Period - the time over which a rolling average of the observed RTT and loss% is calculated. Default 25000ms means, at the default Probe Interval of 250ms, that it will be a rolling average of the results of the last 100 probes.
      Implication: validation should check that (Time Period > Probe Interval) - and it would probably be odd to have (Time Period < 2 * Probe Interval) - that would make the gateway alarm on a single packet delay or loss, and unalarm again on a single packet success.
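
The arithmetic in a), b) and c) can be checked with a small sketch. This is only an illustration of the figures quoted above; the function name `probe_arithmetic` and its return keys are made up for this example and are not part of pfSense or dpinger.

```python
import math


def probe_arithmetic(probe_interval_ms, loss_interval_ms, time_period_ms):
    """Derive the figures discussed above from the three timing parameters.

    Hypothetical helper for illustration only; not pfSense or dpinger code.
    """
    return {
        # How many echo requests are sent each second.
        "probes_per_second": 1000 / probe_interval_ms,
        # How many requests can be awaiting a reply before the oldest
        # one is old enough to be counted as lost.
        "max_outstanding": max(1, math.ceil(loss_interval_ms / probe_interval_ms)),
        # How many probes feed the rolling average.
        "samples_in_average": time_period_ms // probe_interval_ms,
    }


# Defaults quoted in the post: 250ms / 500ms / 25000ms.
print(probe_arithmetic(250, 500, 25000))
# A high Loss Interval, as in the 2000ms example.
print(probe_arithmetic(250, 2000, 25000))
```

With the defaults this gives 4 probes per second, up to 2 outstanding requests and 100 samples in the average; raising Loss Interval to 2000ms gives up to 8 outstanding requests.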

      Comments please, do I understand this correctly? What is the proper explanation to go on the GUI? What is the proper validation to put in the code?

      As the Greek philosopher Isosceles used to say, "There are 3 sides to every triangle."
      If I helped you, then help someone else - buy someone a gift from the INF catalog http://secure.inf.org/gifts/usd/

        rbgarga Developer Netgate Administrator

        @phil.davis:

        #1 - losslow is used to show a warning on gateway widget / gateway status. If you have losslow = 10 and losshigh = 20:

        loss < 10 - status = green (Online)
        loss >= 10 and loss < 20 - status = yellow (loss)
        loss >= 20 - status = red (down)

        #2 - Same as #1. dpinger only has a binary alarm (on/off), but the PHP code handles the warning state, which starts when the alarm is off but latency or loss is higher than the low value.
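
A rough sketch of the status mapping described in #1 and #2, written in Python rather than the PHP that pfSense actually uses. The function `gateway_status` is hypothetical and is not get_dpinger_status(). It approximates "down" with the high thresholds, whereas the real state comes from dpinger's alarm. The latency thresholds of 200/500, the use of `>=` for latency, and checking loss before delay are assumptions made by analogy with the loss table.

```python
def gateway_status(loss_pct, latency_ms,
                   loss_low=10, loss_high=20,
                   latency_low=200, latency_high=500):
    """Map averaged loss and latency to a gateway status string.

    Illustrative only: thresholds and ordering are assumptions.
    """
    # At or above a high threshold: dpinger would raise its binary alarm.
    if loss_pct >= loss_high or latency_ms >= latency_high:
        return "down"
    # Alarm is off, but a value is above its low threshold: warning state.
    if loss_pct >= loss_low:
        return "loss"
    if latency_ms >= latency_low:
        return "delay"
    return "online"


for loss in (5, 10, 15, 20):
    print(loss, gateway_status(loss, latency_ms=25))
```

With the 10/20 loss defaults this reproduces the three bands listed in #1: online below 10, loss from 10 up to 20, down at 20 and above.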

        #3

        a) Correct

        b) Yeah, you are right. Will you submit a PR or should I change it?

        c) Your idea is good, I'm in favor of denying (Time Period < 2 * Probe Interval). Same question: will you do it or should I? :)
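
The validation agreed on in b) and c) could look roughly like the sketch below. It is written in Python for illustration; the function name `validate_gateway_timing` and the error wording are invented here and do not reflect the actual pfSense code or what the eventual pull request does.

```python
def validate_gateway_timing(probe_interval_ms, loss_interval_ms, time_period_ms):
    """Return a list of error strings; an empty list means the input is accepted.

    Hypothetical sketch, not the pfSense implementation.
    """
    errors = []
    fields = (
        ("Probe Interval", probe_interval_ms),
        ("Loss Interval", loss_interval_ms),
        ("Time Period", time_period_ms),
    )
    for label, value in fields:
        if value <= 0:
            errors.append(f"{label} must be a positive number of milliseconds.")
    # Per c): deny Time Period < 2 * Probe Interval, otherwise a single
    # probe could flip the alarm on and off.
    if time_period_ms < 2 * probe_interval_ms:
        errors.append("Time Period must be at least twice the Probe Interval.")
    # Per b): deliberately no check that Loss Interval > Probe Interval.
    return errors


print(validate_gateway_timing(250, 50, 25000))   # short Loss Interval is accepted
print(validate_gateway_timing(250, 500, 400))    # Time Period too short
```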

        And please submit any text changes you judge necessary, your English is much better than mine.

        Thank you!

        Renato Botelho

          grandrivers

          Getting a latency warning in gateways. It's set at 200/500 and ping is coming back at 25.xxx ms. Tried moving to 2000/5000, still shows up.

          Dec 11 09:17:17 dpinger send_interval 1000ms report_interval 1000ms loss_interval 10000ms time_period 25000ms alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.4.4 bind_addr 192.168.254.1 alert_cmd "/etc/rc.gateway_alarm DSL_DHCP"

          Also the known issue of no RRD data yet.

          pfsense plus 25.03 super micro A1SRM-2558F
          C2558 32gig ECC  60gig SSD

            phil.davis

            Pull request https://github.com/pfsense/pfsense/pull/2207

            The remaining issue I see is the setting of "loss interval" and "latencyhigh". Currently the default loss interval is 500 and latencyhigh is also 500. This makes no sense to me. If a probe comes back in > 500ms then the thread that is waiting for the reply will have given up (loss interval has expired). So any packets with an RTT > 500ms will be considered lost. Therefore there will be no packets recorded with an RTT > 500ms. Therefore the average latency can never exceed 500ms = latencyhigh.
            It seems to me that "loss interval" needs to be reasonably higher than "latencyhigh" in any sensible configuration.
            Thoughts?

              phil.davis

              Looking at the dpinger source code, the relationship between "loss interval" and "latencyhigh" is not as silly as I first thought. The recv_thread is waiting for all incoming echo replies, so they do not actually time out after "loss interval". It is just that when the alarm or report calculation is done, any entries in the array of packets sent that do not have a reply and are older than "loss interval" are counted as lost.

              On a high-latency link, some of those "lost" reply packets might actually show up some time later, and at the next calculation they will be included in the average latency calculation and no longer be included in the "packet loss percentage" calculation.

              But still it seems weird if these 2 parameters are set the same (or nearly the same) - it results in packets being counted as "lost" at first, then later they just turn out to be "high latency".
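
The accounting described above can be sketched as follows. This is a simplified Python model, not dpinger's C code: the `Probe` class and `report` function are hypothetical, and the loss formula (lost divided by lost plus replied) is an assumption. It shows a probe being counted as lost in one report, then turning into a high-latency sample once its reply arrives late.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Probe:
    sent_ms: int
    rtt_ms: Optional[int] = None  # None until a reply has been received


def report(probes: List[Probe], now_ms: int,
           loss_interval_ms: int) -> Tuple[Optional[float], float]:
    """Return (average RTT in ms, loss percentage) over the given probes."""
    replied = [p.rtt_ms for p in probes if p.rtt_ms is not None]
    # Unanswered probes only count as lost once older than the loss interval;
    # younger unanswered probes are still pending and are ignored.
    lost = sum(1 for p in probes
               if p.rtt_ms is None and now_ms - p.sent_ms > loss_interval_ms)
    counted = len(replied) + lost
    avg = sum(replied) / len(replied) if replied else None
    loss_pct = 100 * lost / counted if counted else 0.0
    return avg, loss_pct


probes = [Probe(0, 20), Probe(250), Probe(500, 30), Probe(750)]
print(report(probes, now_ms=900, loss_interval_ms=500))   # second probe counts as lost

probes[1].rtt_ms = 700  # its reply finally arrives, 700ms after it was sent
print(report(probes, now_ms=1000, loss_interval_ms=500))  # now a high-latency sample
```

In the first report the average latency is 25ms with one of three counted probes lost. In the second report nothing is lost and the average jumps to 250ms, which is the "lost at first, then just high latency" effect.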

                rbgarga Developer Netgate Administrator

                @grandrivers:

                I pushed a fix for the bad latency and loss math. Thanks for reporting.

                Also, RRD is now working.

                Renato Botelho

                  grandrivers

                  looking good now thanks
                  will keep an eye out for issues

                    luckman212 LAYER 8

                    How do we get these latest fixes? Just do a gitsync?

                      cmb

                      gitsync only if you're on a new enough snapshot that you have the dpinger binary. The latest snapshot now available should be new enough to have caught it all.
