Failover didn't fall back to tier1 after downtime



  • - Tier1 came back online
    - Gateway group showed both 'online'
    - Default route = Tier1
    - Default gateway switching = enabled

    Still, for whatever reason, it kept sending some clients out through Tier2. This was still happening 10 hours after the last gateway event.

    To fix it I clicked "Reset all states" in the GUI.
    (The states can't still have been alive from before the gateway event, because nobody was around at 2 am.)

    
    May 2 13:38:35 	dpinger 		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 40% dest_addr 195.130.130.11 bind_addr 81.82.213.131 identifier "WAN_TELENET0 "
    May 2 13:38:35 	dpinger 		send_interval 500ms loss_interval 10000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 40% dest_addr 195.238.2.21 bind_addr 192.168.5.2 identifier "WAN_SCARLETGW "
    May 1 02:14:26 	dpinger 		WAN_TELENET0 195.130.130.11: Clear latency 9788us stddev 1395us loss 29%
    May 1 02:13:26 	dpinger 		WAN_TELENET0 195.130.130.11: Alarm latency 9607us stddev 1263us loss 41% 
    
    

    The values set for dpinger are the ones that made apinger work somewhat reliably.
    Perhaps they should be set back to sane values, now that we have a good pinger?

    That said, the system has been up for 20 days, and a couple of failover events took place … this is the first time it didn't fall back.

    suggestions?



  • Since the first post it has happened again, 3 times to be exact.
    I've reset all dpinger values to their defaults through the GUI.

    dpinger clears the alarm, but pfSense keeps sending traffic towards the Tier2 gateway (same as in the first post).
    I'm thinking the 'Clear' isn't picked up by the backend code in every circumstance.

    Today I changed the trigger level from 'Member down' to 'Packet loss or high latency'.
    I'll update this thread over the next couple of days/weeks.
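
    For comparing what dpinger last reported against where traffic is actually going, a rough sketch like the one below can pull the Alarm/Clear events out of a text copy of the gateway log. The function name `dpinger_events` and the idea of working from a saved copy are assumptions for illustration; the field layout is taken only from the log lines quoted in the first post.

```shell
#!/bin/sh
# Hypothetical helper: list dpinger Alarm/Clear events from a text copy of
# the gateway log, one per line as "<gateway> <state> <month> <day> <time>".
dpinger_events() {
    # $1 = path to the saved log text
    awk '{
        for (i = 3; i <= NF; i++) {
            if ($i == "Alarm" || $i == "Clear") {
                # The two fields before the keyword are "<gateway> <monitor-ip>:"
                print $(i - 2), $i, $1, $2, $3
                break
            }
        }
    }' "$1"
}
```

    Run against the two event lines from the first post, this prints `WAN_TELENET0 Clear May 1 02:14:26` followed by `WAN_TELENET0 Alarm May 1 02:13:26`. The dpinger startup lines are skipped, since they contain neither keyword as a separate field.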



  • Maybe you're hitting this: https://redmine.pfsense.org/issues/6110 ?



  • Perhaps, but there's no PPP(oE) involved. It is possible that default gateway switching is still enabled (from back in the day when there was a transparent proxy running). Will check if disabling this makes a difference.



  • Guessing it's already-established connections that are staying there maybe? That'd be expected.

    Two things influence traffic routing. Guessing your clients are being routed via a gateway group, which you can verify on the back end with:

    grep route-to /tmp/rules.debug
    

    The other thing would be the default gateway, for traffic matching firewall rules set to "default" rather than a gateway group. Check Diag>Routes to verify that.
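
    The two checks above could be wrapped up roughly as follows for the next time it happens. This is only a sketch: the function names are made up for illustration, and the only things assumed to exist are `grep`, `netstat -rn` and the `/tmp/rules.debug` file already mentioned.

```shell
#!/bin/sh
# Hypothetical helper names; only grep, netstat and /tmp/rules.debug are real.

# Print every route-to line in the generated ruleset (default: /tmp/rules.debug).
route_to_lines() {
    grep -- 'route-to' "${1:-/tmp/rules.debug}"
}

# Succeed if the given gateway IP appears as a whole word in any route-to line.
# $1 = gateway IP, $2 = optional path to a rules file
gateway_in_rules() {
    route_to_lines "${2:-/tmp/rules.debug}" | grep -F -w -q -- "$1"
}

# Print the default route(s) from the routing table, as an alternative to
# looking at Diag>Routes in the GUI.
default_route() {
    netstat -rn | grep '^default'
}
```

    If `gateway_in_rules` succeeds for the Tier2 gateway's IP while dpinger reports Tier1 as clear, that would point at the ruleset not having been regenerated. If only `default_route` shows Tier2, the default gateway is the more likely culprit.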



  • @cmb:

    Guessing it's already-established connections that are staying there maybe? That'd be expected.

    It kept sending new clients towards Tier2 for days after the gateway event … so they can't (all) have been established connections.

    @cmb:

    Two things influence traffic routing. Guessing your clients are being routed via a gateway group, which you can verify on the back end with:

    grep route-to /tmp/rules.debug
    

    The other thing would be the default gateway, for traffic matching firewall rules set to "default" rather than a gateway group. Check Diag>Routes to verify that.

    Will check rules.debug when/if it happens next.

