Member Down triggering with 0% loss



  • We have been getting notifications like these every day or two recently:

    9:22:32 MONITOR: WANGW is down, omitting from routing group GWGROUP
    8.8.4.4|50.x.x.x|WANGW|515.097ms|647.962ms|0.0%|down

    and <1 minute later:

    9:23:06 4769MONITOR: WANGW is available now, adding to routing group GWGROUP
    8.8.4.4|50.x.x.x|WANGW|390.011ms|637.821ms|0.0%|delay

    Pinging 8.8.4.4 directly within a few minutes of this alert shows about 15 ms. Obviously load varies throughout the day, but per https://docs.netgate.com/pfsense/en/latest/routing/multi-wan.html, "Member Down: Triggers when the monitor IP has 100% packet loss." Am I missing something, or is this showing 0% loss yet still triggering? Is it a bug? Member Down is the trigger level selected in the gateway group. Do we need to change to a Packet Loss trigger of 100% (or whatever) to accomplish this?



  • Maybe the doc needs to be updated..??

    Go to System > Routing > Gateways > Edit, scroll down to "Advanced", and set your "Latency thresholds" to a higher number.


  • LAYER 8 Netgate

    Yes that triggered on Latency, not Loss. On a high-latency link you will want to adjust those thresholds higher.

    Docs are complete:

    https://docs.netgate.com/pfsense/en/latest/book/routing/gateway-settings.html#latency-thresholds
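
To make the latency point concrete, here is a rough sketch of how a monitor status could be derived from latency and loss. It is not pfSense code. The names `Thresholds` and `classify` are hypothetical, the 500 ms high latency default is the one mentioned in this thread, and the other defaults (200 ms, 10%, 20%) are assumed from the stock gateway settings. It also assumes the first millisecond figure in each notification is the average round-trip time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the per-gateway alert thresholds."""
    latency_low_ms: float = 200.0   # assumed default: above this, "delay"
    latency_high_ms: float = 500.0  # default cited in the thread: above this, "down"
    loss_low_pct: float = 10.0      # assumed default: above this, "loss"
    loss_high_pct: float = 20.0     # assumed default: above this, "down"


def classify(latency_ms: float, loss_pct: float,
             t: Thresholds = Thresholds()) -> str:
    """Return a status string in the style of the notifications above."""
    # Crossing either high threshold is enough to report "down",
    # so 0% loss does not protect a gateway with very high latency.
    if latency_ms > t.latency_high_ms or loss_pct > t.loss_high_pct:
        return "down"
    if loss_pct > t.loss_low_pct:
        return "loss"
    if latency_ms > t.latency_low_ms:
        return "delay"
    return "online"


# Values taken from the two notifications in the first post.
print(classify(515.097, 0.0))  # first alert: latency above 500 ms, 0% loss
print(classify(390.011, 0.0))  # recovery: latency between 200 ms and 500 ms
# Raising the high latency threshold changes the first result.
print(classify(515.097, 0.0, Thresholds(latency_high_ms=800.0)))
```

Under these assumptions the first alert comes out as "down" purely on latency, and raising the high latency threshold turns the same sample into "delay".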



  • Thanks, Derelict. I did not go look at the doc (got lazy).

    There ya go teamits :)



  • @Derelict said in Member Down triggering with 0% loss:

    Yes that triggered on Latency, not Loss

    So the doc page I cited should read something like, "Member Down: Triggers when the monitor IP is over latency thresholds"?

    I totally get there is a whole conversation around traffic shaping and so forth that I'm not getting into here, just looking for what Member Down means. :)

    Thanks all.


  • LAYER 8 Netgate

    The book is controlling here (and everywhere). It should be your primary source of documentation.



  • Re: the book, fair enough. That page (https://docs.netgate.com/pfsense/en/latest/book/multiwan/load-balancing-and-failover.html) says "Marks the gateway as down only when it is completely down, past one or both of the higher thresholds configured for the gateway." I gather "the thresholds" means either packet loss or latency, with Member Down looking at the higher threshold for each? And the individual choices, or "Packet Loss or High Latency" (which doesn't specifically say), look at the lower threshold?



  • I thought we'd be good now but I am still confused. With the gateway group set to use Packet Loss, and changes applied yesterday, we got these notifications today:

    9:47:06 MONITOR: WANGW is down, omitting from routing group GWGROUP
    8.8.4.4|50.x.x.x|WANGW|506.332ms|600.226ms|0.0%|down

    9:47:39 28411MONITOR: WANGW is available now, adding to routing group GWGROUP
    8.8.4.4|50.x.x.x|WANGW|400.057ms|594.767ms|0.0%|delay
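
The notifications above show the member being dropped on a "down" status even with the group trigger set to Packet Loss. One reading that would be consistent with those logs and with the book's wording, though nothing in this thread confirms it, is that a "down" status removes the member under every trigger level, and the trigger level only decides whether the milder "loss" and "delay" statuses also remove it. The sketch below models that hypothesis. The table `EXCLUDING_STATUSES` and the function `member_excluded` are made-up names for illustration, not pfSense internals.

```python
# Hypothetical model: which monitor statuses remove a member from the
# gateway group under each trigger level. "down" is assumed to remove
# the member regardless of the trigger level chosen.
EXCLUDING_STATUSES = {
    "Member Down": {"down"},
    "Packet Loss": {"down", "loss"},
    "High Latency": {"down", "delay"},
    "Packet Loss or High Latency": {"down", "loss", "delay"},
}


def member_excluded(trigger: str, status: str) -> bool:
    """Return True if a member with this status would be omitted."""
    return status in EXCLUDING_STATUSES[trigger]


# The case reported above: Packet Loss trigger, status "down" from latency.
print(member_excluded("Packet Loss", "down"))
# A "delay" status alone would not remove the member under Packet Loss.
print(member_excluded("Packet Loss", "delay"))
```

If this model is right, switching the trigger level does not stop latency-driven "down" events. Only raising the high latency threshold would.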



  • I upgraded the device to 2.4.4_3 five days ago. We did get a real, momentary packet loss outage yesterday with the gateway group set to Packet Loss, but we just got another alert with 0% loss and high latency. As these seem to be momentary spikes in latency, I guess I will set it back to Member Down and raise the high latency threshold above the default of 500 ms, and will report back if it recurs. Overall, I'm not sure the Packet Loss setting is triggering on packet loss...

    FWIW this is on an SG-2440.

