Netgate Discussion Forum

    Failover didn't fall back to tier1 after downtime

    2.3.1 Snapshots Testing and Feedback - ARCHIVED
    • heper

      - Tier1 came back online
      - Gateway group showed both as 'online'
      - Default route = Tier1
      - Default gateway switching = enabled

      Still, for whatever reason, it kept sending "some" clients out through Tier2. This was still happening 10 hours after the last gateway event.

      To fix it I clicked "reset all states" in the GUI.
      (It's impossible that the states were still alive from before the gateway event, because nobody was around at 2am.)
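Before resetting everything, it can help to see how many states still reference the Tier2 WAN. A rough sketch: `count_states_for` is a hypothetical helper (not part of pfSense), and using 192.168.5.2 (the bind_addr of WAN_SCARLETGW in the log below) as the Tier2 address is an assumption.

```shell
# count_states_for ADDR: read a state table dump on stdin and print how many
# lines mention ADDR as a whole token (so 192.168.5.2 does not match 192.168.5.20).
# Hypothetical helper for illustration; it only filters text.
count_states_for() {
  addr="$1"
  # grep -c prints the count; it exits non-zero on zero matches, hence || true
  grep -cFw -- "$addr" || true
}
```

On the firewall this would be fed from the state table, e.g. `pfctl -ss | count_states_for 192.168.5.2`. A count that keeps growing after Tier1 is back would point at new connections, not leftovers.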

      
      May 2 13:38:35 	dpinger 		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 40% dest_addr 195.130.130.11 bind_addr 81.82.213.131 identifier "WAN_TELENET0 "
      May 2 13:38:35 	dpinger 		send_interval 500ms loss_interval 10000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 40% dest_addr 195.238.2.21 bind_addr 192.168.5.2 identifier "WAN_SCARLETGW "
      May 1 02:14:26 	dpinger 		WAN_TELENET0 195.130.130.11: Clear latency 9788us stddev 1395us loss 29%
      May 1 02:13:26 	dpinger 		WAN_TELENET0 195.130.130.11: Alarm latency 9607us stddev 1263us loss 41% 
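To see at a glance which state dpinger last reported for each gateway, the Alarm/Clear lines can be reduced to one line per gateway. A minimal sketch: `last_gw_event` is a hypothetical helper, and it assumes the input is in chronological order (oldest first, unlike the GUI listing above) and that the gateway name sits two fields before the Alarm/Clear keyword, as in the lines shown.

```shell
# last_gw_event: read dpinger log lines on stdin (oldest first) and print
# "<gateway> <Alarm|Clear>" for the most recent event seen per gateway.
# Hypothetical helper; it only parses text and does not touch dpinger itself.
last_gw_event() {
  awk '{
    # Find the Alarm/Clear keyword; the gateway name is two fields before it.
    for (i = 3; i <= NF; i++)
      if ($i == "Alarm" || $i == "Clear") { state[$(i-2)] = $i; break }
  }
  END { for (gw in state) print gw, state[gw] }' | sort
}
```

On 2.3 the gateway log should be a circular clog file, so the input would probably come from something like `clog /var/log/gateways.log | last_gw_event`; treat that path as an assumption. If a gateway shows Clear here but traffic still leaves via Tier2, the problem is after dpinger.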
      
      

      The values set for dpinger are the ones that made apinger work somewhat reliably.
      Perhaps they need to be reset to saner values, now that we have a good pinger?

      That said, the system has been up for 20 days, and a couple of failover events took place … this is the first time it didn't fall back.

      Suggestions?

      • heper

        Since the first post it has happened again, 3 times to be exact.
        I've reset all dpinger values/variables to their default settings through the GUI.

        dpinger clears the error, but pfSense keeps sending traffic towards the Tier2 gateway (identical to the first post).
        I'm thinking the 'clear' isn't (always/under every circumstance) picked up by the backend code.

        Today I changed the trigger level from 'member down' to 'packet loss or high latency'.
        Will update this thread in the next couple of days/weeks.

        • maverick_slo

          Maybe you're hitting this: https://redmine.pfsense.org/issues/6110 ?

          • heper

            Perhaps, but there's no PPP(oE) involved. It is possible that default gateway switching is still enabled (from back in the day when there was a transparent proxy running). Will check if disabling it makes a difference.

            • cmb

              Guessing it's already-established connections that are staying there maybe? That'd be expected.

              Two things influence traffic routing. Guessing your clients are being routed via a gateway group, which you can verify on the back end with:

              grep route-to /tmp/rules.debug
              

              The other thing would be the default gateway, for traffic matching firewall rules set to "default" rather than a gateway group. Check Diag>Routes to verify that.
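Both checks can be wrapped in a couple of small shell functions so they are quick to re-run the next time failback misbehaves. This is only a sketch: `check_route_to` and `show_default_route` are hypothetical names, and the exact layout of the route-to lines in rules.debug is not assumed, so matches are printed as-is.

```shell
# check_route_to [FILE]: print the route-to lines (with line numbers) from a
# ruleset dump; FILE defaults to /tmp/rules.debug. Hypothetical helper.
check_route_to() {
  rules="${1:-/tmp/rules.debug}"
  if [ ! -r "$rules" ]; then
    echo "cannot read $rules" >&2
    return 1
  fi
  grep -n 'route-to' "$rules" || echo "no route-to entries in $rules"
}

# show_default_route: print the default entries from the kernel routing table
# (matches the "default" destination as printed by FreeBSD's netstat -rn).
show_default_route() {
  netstat -rn | awk '$1 == "default"'
}
```

Run `check_route_to` and `show_default_route` while clients are still going out through Tier2. If the route-to lines point at the Tier2 gateway, the gateway group is the culprit. If they look right, check the default route.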

              • heper

                @cmb:

                Guessing it's already-established connections that are staying there maybe? That'd be expected.

                It kept sending new clients towards Tier2 for days after the gateway event … those can't (all) have been established connections.

                @cmb:

                Two things influence traffic routing. Guessing your clients are being routed via a gateway group, which you can verify on the back end with:

                grep route-to /tmp/rules.debug
                

                The other thing would be the default gateway, for traffic matching firewall rules set to "default" rather than a gateway group. Check Diag>Routes to verify that.

                Will check rules.debug when/if it happens next.

                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.