Dpinger - Probe & Alert Interval Relationship

  • https://pfsense.home/system_gateways_edit.php

    Advanced - Additional information

    "The alert interval must be greater than or equal to the probe interval. There is no point checking for alerts more often than probes are done."

    Perhaps if the two are coordinated events.  But if they are not …


    Probe interval 10 seconds
    Alert interval 10 seconds

    t0 minus 1 alert check
    t0 probe fail
    t9 alert check
    t10 probe
    t19 alert check

    Wouldn't it be better to receive the alert sooner than the probe interval?
    Maybe like this.

    Probe interval 10 seconds
    Alert interval 5 seconds

    t0 minus 1 alert check
    t0 probe fail
    t4 alert check
    t9 alert check
    t10 probe
    t14 alert check
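
    The two timelines above can be reproduced with a few lines of Python. This is only a sketch of the simple model assumed in this post (independent, fixed-interval probe and alert timers); `first_alert_after` is a hypothetical helper, not anything from dpinger or pfSense.

```python
def first_alert_after(failure_time, alert_interval, first_check=-1):
    """Return the time of the first alert check at or after failure_time.

    Assumes alert checks run at first_check, first_check + alert_interval, ...
    (a hypothetical model used only to mirror the timelines in this post).
    """
    t = first_check
    while t < failure_time:
        t += alert_interval
    return t


# Probe fails at t0; the previous alert check ran at t0 minus 1.
print(first_alert_after(0, alert_interval=10))  # 9
print(first_alert_after(0, alert_interval=5))   # 4
```

    Under this model the shorter alert interval notices the failed probe at t4 instead of t9.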

  • The information section could possibly offer some additional guidance. While it's generally true that there is no point in checking for alerts more often than echo requests are sent, it's actually a bit more complicated than that.

    Dpinger works quite differently than apinger or smokeping. Rather than a "probe" that fires off and processes a handful of echo request/replies all at once, dpinger maintains a rolling array of echo requests spaced on the send interval. In other words, instead of waking up every second and sending 4 echo requests at once, dpinger sends an echo request every 250 milliseconds. When dpinger receives an echo reply, the time difference between the request packet and reply packet (latency) is recorded. But there is nothing that records a reply/request as permanently lost.

    When the alert check is made, or a report is generated, dpinger goes through the array and examines each echo request. If a reply has been received, it is used as part of the overall latency calculation. If a reply has not yet been received, the amount of time since the request is compared against the loss interval. If it is greater than the loss interval, the request/reply is counted as lost in the current report. But this is not a permanent decision. In subsequent reports, if the missing reply has been received, its latency will be used instead of being counted as lost.
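
    The evaluation described above can be sketched roughly as follows. This is an illustrative Python model, not dpinger's actual C code: the `Slot` type and `report` function are hypothetical names, and computing the loss percentage over only the received-plus-lost entries is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slot:
    sent_at: float                   # time the echo request was sent (ms)
    latency: Optional[float] = None  # set when (and if) a reply arrives


def report(slots: List[Slot], now: float, loss_interval: float) -> Tuple[float, float]:
    """Return (average_latency_ms, loss_percent) for the array as it stands now."""
    latencies = []
    lost = 0
    for slot in slots:
        if slot.latency is not None:
            latencies.append(slot.latency)       # reply received: use its latency
        elif now - slot.sent_at > loss_interval:
            lost += 1                            # overdue: lost in THIS report only
        # otherwise the request is still pending and is not counted either way
    counted = len(latencies) + lost
    avg = sum(latencies) / len(latencies) if latencies else 0.0
    loss = 100.0 * lost / counted if counted else 0.0
    return avg, loss


# Hypothetical array: one overdue request, one still pending.
slots = [Slot(0, 10.0), Slot(250), Slot(500, 12.0), Slot(2900)]
avg_before, loss_before = report(slots, now=3000, loss_interval=1250)

# A very late reply arrives; the next report uses its latency instead of a loss.
slots[1].latency = 2800.0
avg_after, loss_after = report(slots, now=3250, loss_interval=1250)
```

    In the first report one of the three counted entries is lost; after the late reply arrives, the loss drops back to zero and the average latency rises instead.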

    It's important to keep in mind that latency and loss are reported as averages across the entire set. The default time period for dpinger is 30 seconds, with an echo request being sent every 250 milliseconds. This means that the latency and loss will be reported as averages across 115-120 samples. The alert check runs every second by default. So each time, the 4 oldest entries in the set have been replaced by the 4 newest ones.
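
    As a quick check of the numbers above, here is the arithmetic in Python, using only the values quoted in this post (the variable names are illustrative):

```python
time_period_ms = 30_000    # averaging window quoted above
send_interval_ms = 250     # one echo request every 250 ms
alert_interval_ms = 1_000  # alert check once per second

# Number of echo requests held in the rolling array.
slots_in_array = time_period_ms // send_interval_ms

# Entries replaced by newer ones between two consecutive alert checks.
replaced_per_check = alert_interval_ms // send_interval_ms

print(slots_in_array)      # 120
print(replaced_per_check)  # 4
```

    The upper end of the 115-120 range is the full array; the count used in a given report can be lower because the most recent requests may still be waiting for a reply.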

    Btw, if you want to have accurate loss reporting, it is important that the number of samples be sufficient. If you want to achieve 1% loss resolution, you need more than 100 samples in the set. The actual calculation for loss resolution is:

    100 * send_interval / (time_period - loss_interval)

    With the default settings, dpinger reports loss with a resolution of 0.87%.
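
    A small Python sketch of that formula is below. The function name is hypothetical, and the 1250 ms loss interval is an assumption: it is not stated in this thread, but together with the 250 ms send interval and 30 second time period it reproduces the 0.87% figure.

```python
def loss_resolution_percent(send_interval_ms, time_period_ms, loss_interval_ms):
    """Smallest loss step (in percent) that a single sample can represent."""
    return 100 * send_interval_ms / (time_period_ms - loss_interval_ms)


# Send interval and time period as quoted in this post;
# the 1250 ms loss interval is an assumed value (see above).
resolution = loss_resolution_percent(250, 30_000, 1_250)
print(round(resolution, 2))  # 0.87

# Samples that can be counted as received or lost under the same assumption.
countable_samples = (30_000 - 1_250) // 250
print(countable_samples)     # 115
```

    Lengthening the send interval or shortening the time period makes the resolution coarser, which is worth keeping in mind before moving away from the defaults.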

    I would generally recommend staying with the default values unless you are monitoring high latency links.

  • After reviewing the dpinger source code and making some associated changes/fixes to the pfSense GUI front-end, I looked at the previous text explanation of the parameter relationships, added some validation checks in the GUI for stuff that I thought "must be so in any use case I can think of", and then wrote that text explanation based on what I observed.

    If anyone has later thoughts about extra use cases that should be allowed or better wording of the explanation… feel free to make suggestions and PRs...
