    Netgate Discussion Forum

    Failback from Primary WAN after failover to Secondary WAN

    Routing and Multi WAN
    • S
      samtoopid last edited by

      I have 2 WAN connections configured with failover using gateway groups. Failing over works very well; however, connections opened after the failover do not fail back automatically. My secondary WAN connection is metered, so not failing back is a problem: it creates meaningful expense for no reason.

      I noticed that when failover occurs it is actually fairly smooth, with connections restarting themselves very quickly after a failure is detected. I wanted the failback to happen in a similar fashion, vs. taking 1-2 minutes for connections to come back live.

      I did not use default gateway switching because I need to make sure only certain users failover. The rest of the users do not require a high availability connection.

      After searching the forums and watching some hangouts, I came to understand that failback is not implemented as of 2.4.3-RELEASE-p1.

      I got this working and am sharing the procedure for others.

      1. Modify the /etc/rc.kill_states script so that it no longer checks to see if a gateway is down.

      1a. Copy /etc/rc.kill_states to /etc/my_kill_states, and then edit my_kill_states.
      1b. Comment out the if statement in my_kill_states:
      #if (isset($config['system']['gw_down_kill_states'])) {
      and its closing brace line as well
      #}
      1c. Now /etc/my_kill_states will gracefully reset the state table when you pass it the backup WAN interface followed by that interface's IP address. If the primary WAN is up and running, connections will automatically re-establish over the primary WAN. The entire my_kill_states file is duplicated at the end of this post.
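Steps 1a and 1b can also be scripted. The following is a minimal sketch rather than a tested pfSense tool: `make_my_kill_states` is a made-up helper name, and it assumes that the `if` line starts at column 0 and that the last line beginning with `}` is its closing brace, as in the file reproduced at the end of this post.

```sh
#!/bin/sh
# Sketch of steps 1a/1b: write a copy of rc.kill_states with the
# gw_down_kill_states check commented out.
# make_my_kill_states is a hypothetical helper name, not part of pfSense.
make_my_kill_states() {
    src="$1"   # e.g. /etc/rc.kill_states
    dst="$2"   # e.g. /etc/my_kill_states

    # Refuse to continue if the expected if line is not present.
    grep -q '^if (isset(.*gw_down_kill_states' "$src" || return 1

    # Assumption: the last line starting with "}" closes that if block.
    last_brace=$(grep -n '^}' "$src" | tail -n 1 | cut -d: -f1)
    [ -n "$last_brace" ] || return 1

    # Prefix the if line and its closing brace with "#".
    sed -e '/^if (isset(.*gw_down_kill_states/s/^/#/' \
        -e "${last_brace}s/^/#/" "$src" > "$dst" || return 1
    chmod +x "$dst"
}
```

It would be called as `make_my_kill_states /etc/rc.kill_states /etc/my_kill_states`. Review the result by hand before relying on it, since the layout of the stock script may differ between releases.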

      2. I created a script, check_backup_wan, that can be run by cron every few minutes. It checks whether there is live traffic on the backup WAN and, if so, whether the primary WAN is functioning. If the primary WAN is up while traffic is still on the backup WAN, it uses my_kill_states to kill the connections on the backup WAN gracefully. These connections then re-establish over the primary WAN.

      2a. My implementation of the check_backup_wan script is below.
      2b. The final step is to run the check_backup_wan script automatically via cron. I think a 2-minute interval is best, as it gives connections time to close out.
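For step 2b, a two-minute schedule in system-crontab format looks like the line printed by the sketch below. `cron_entry` is a hypothetical helper used only to show the field layout. On pfSense the entry is usually added through the Cron package rather than by editing `/etc/crontab` by hand; as an assumption worth verifying on your version, manual edits to that file may be overwritten.

```sh
#!/bin/sh
# cron_entry is a hypothetical helper: it prints a system-crontab line
# (minute hour day month weekday user command) that runs a script
# every N minutes as root.
cron_entry() {
    interval_min="$1"
    script_path="$2"
    printf '*/%s * * * * root %s\n' "$interval_min" "$script_path"
}
```

Calling `cron_entry 2 /root/check_backup_wan` prints `*/2 * * * * root /root/check_backup_wan`.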
      ===group
      /root/check_backup_wan

      #!/bin/sh
      # check_backup_wan script
      # mvneta0 is the 2nd WAN interface. mvneta2 is the primary WAN
      # 8.8.4.4 is set as the monitor IP on the primary WAN interface
      # The idea is to get the IP addresses of the primary and secondary WAN interfaces.
      # If the primary WAN IP address is not available, assume the primary WAN is still down.
      # Assuming the primary WAN is up, check if there are any live TCP connections on the backup WAN.
      # If live TCP connections are found on the backup WAN, check that the primary WAN is responding to
      # pings on the monitor IP address.  If the primary WAN is responding to pings, then kill the states
      # on the backup WAN, and they will automatically reconnect over the primary WAN.
      
      check_wan_time=`date "+%Y-%m-%d %H:%M:%S"`
      check_wan=8.8.4.4
      
      wan_ipaddress=`ifconfig mvneta2 | grep 'inet ' | awk '{ print $2}' | cut -d'/' -f1`
      wan2_ipaddress=`ifconfig mvneta0 | grep 'inet ' | awk '{ print $2}' | cut -d'/' -f1`
      
      echo 'primary, backup WAN IP address ' ${wan_ipaddress} '(primary) ' ${wan2_ipaddress} '(backup)'
      # check for valid primary WAN IP address.
      if [ -z "${wan_ipaddress}" ]; then
        echo ${check_wan_time} '... primary WAN is still down (no WAN IP)' | tee -a /var/log/check_backup_wan.log
        exit 0
      fi
      
      # check for active connections on backup_wan
      pfctl -i mvneta0 -ss | grep 'tcp'
      wan2_liveconn=`pfctl -i mvneta0 -ss | grep 'tcp'`
      if [ -n "${wan2_liveconn}" ]; then
      # found a tcp connection on the backup wan interface
        ping -c 2 -t 2 -S ${wan_ipaddress} ${check_wan} > /dev/null 2>&1
        wan1_resp=$?
        wan_resp=`expr ${wan1_resp}`
      
        echo 'primary WAN ping check (0 means passed)' ${wan1_resp}
      
        if [ ${wan_resp} -eq 0 ]; then
          echo ${check_wan_time} '... killing states and resetting connections on backup WAN' | tee -a /var/log/check_backup_wan.log
          /etc/my_kill_states mvneta0 ${wan2_ipaddress}
        else
          echo ${check_wan_time} '... primary WAN is still down (pings failing)' | tee -a /var/log/check_backup_wan.log
        fi
      else
        echo ${check_wan_time} '... no active tcp connections found on backup WAN' | tee -a /var/log/check_backup_wan.log
      fi
      

      /etc/my_kill_states:

      #!/usr/local/bin/php-cgi -f
      <?php
      /*
       * my_kill_states
       * derived from:
       * part of pfSense (https://www.pfsense.org)
       * Copyright (c) 2004-2018 Rubicon Communications, LLC (Netgate)
       * All rights reserved.
       *
       * Licensed under the Apache License, Version 2.0 (the "License");
       * you may not use this file except in compliance with the License.
       * You may obtain a copy of the License at
       *
       * http://www.apache.org/licenses/LICENSE-2.0
       *
       * Unless required by applicable law or agreed to in writing, software
       * distributed under the License is distributed on an "AS IS" BASIS,
       * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
       * See the License for the specific language governing permissions and
       * limitations under the License.
       */
      
      /* parse the configuration and include all functions used below */
      require_once("globals.inc");
      require_once("config.inc");
      require_once("interfaces.inc");
      require_once("util.inc");
      
      // Do not process while booting
      if (platform_booting()) {
      	return;
      }
      
      /* Interface address to cleanup states */
      $interface = str_replace("\n", "", $argv[1]);
      
      /* IP address to cleanup states */
      $local_ip = str_replace("\n", "", $argv[2]);
      
      if (empty($interface) || !does_interface_exist($interface)) {
      	log_error("rc.kill_states: Invalid interface '{$interface}'");
      	return;
      }
      
      if (!empty($local_ip)) {
      	list($local_ip, $subnet_bits) = explode("/", $local_ip);
      
      	if (empty($subnet_bits)) {
      		$subnet_bits = "32";
      	}
      
      	if (!is_ipaddr($local_ip)) {
      		log_error("rc.kill_states: Invalid IP address '{$local_ip}'");
      		return;
      	}
      }
      
      # for my_kill_states just assume the gateway is down and rebuild 
      #if (isset($config['system']['gw_down_kill_states'])) {
      	if (!empty($local_ip)) {
      		log_error("rc.kill_states: Removing states for IP {$local_ip}/{$subnet_bits}");
      		$nat_states = exec_command("/sbin/pfctl -i {$interface} -ss | " .
      			"/usr/bin/egrep '\-> +{$local_ip}:[0-9]+ +\->'");
      
      		$cleared_states = array();
      		foreach (explode("\n", $nat_states) as $nat_state) {
      			if (preg_match_all('/([\d\.]+):[\d]+[\s->]+/i', $nat_state, $matches, PREG_SET_ORDER) != 3) {
      				continue;
      			}
      
      			$src = $matches[0][1];
      			$dst = $matches[2][1];
      
      			if (empty($src) || empty($dst) || in_array("{$src},{$dst}", $cleared_states)) {
      				continue;
      			}
      
      			$cleared_states[] = "{$src},{$dst}";
      			pfSense_kill_states($src, $dst);
      		}
      
      		pfSense_kill_states("0.0.0.0/0", "{$local_ip}/{$subnet_bits}");
      		pfSense_kill_states("{$local_ip}/{$subnet_bits}");
      		pfSense_kill_srcstates("{$local_ip}/{$subnet_bits}");
      	}
      	log_error("rc.kill_states: Removing states for interface {$interface}");
      	mwexec("/sbin/pfctl -i {$interface} -Fs", true);
      #}
      

      ===
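To see which states the NAT filter in my_kill_states selects, the sketch below applies the same kind of pattern to made-up state lines. The sample lines are assumptions shaped to fit the script's own patterns (source, then translated WAN address, then destination); real `pfctl -ss` output may differ between versions. `nat_states_for_ip` and `sample_states` are hypothetical names, and 203.0.113.10 stands in for the backup WAN address.

```sh
#!/bin/sh
# nat_states_for_ip (hypothetical helper): read state lines on stdin and
# keep those whose middle (translated) address is the given IP, mirroring
# the egrep filter in my_kill_states.
nat_states_for_ip() {
    ip="$1"
    # As in the original pattern, dots in the IP are not escaped.
    grep -E -e "-> +${ip}:[0-9]+ +->"
}

# Made-up sample lines, not captured from a real firewall.
sample_states() {
    printf '%s\n' \
        'mvneta0 tcp 192.168.1.10:51234 -> 203.0.113.10:40001 -> 198.51.100.7:443 ESTABLISHED:ESTABLISHED' \
        'mvneta2 tcp 192.168.1.11:51300 -> 192.0.2.20:40002 -> 198.51.100.7:443 ESTABLISHED:ESTABLISHED'
}
```

Running `sample_states | nat_states_for_ip 203.0.113.10` keeps only the first line. For such a line, the PHP loop then kills the states between the first and third addresses (here 192.168.1.10 and 198.51.100.7) before flushing everything left on the interface.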

      • L
        lovingHDTV last edited by

        Thank you. I just got my failover WAN working. As it is also a metered connection I had noticed a lot of data use after failback and I was just looking into this.

        Was easy enough to deploy on my system. Just had to change the interface names to match mine.

        I triggered it manually to test and it seems to work fine. Will continue monitoring it.

        I was surprised that this was not already supported.

        thanks
        david

        • D
          dwr953topfsense last edited by

          Nice scripts. Is this still necessary in newer versions?

          Thanks

          • M
            mjh_ca last edited by mjh_ca

            Still required as of 2.4.5. Resetting states when the primary WAN recovers is a feature the core needs. Its absence is particularly a problem for IPsec connections.

            See:

            • https://redmine.pfsense.org/issues/855
            • https://redmine.pfsense.org/issues/6370
            • C
              cwitter last edited by

              Thanks for posting this script.
              Two hopefully simple questions:

              1. Is there anything that needs to be done to rotate the log "/var/log/check_backup_wan.log", or will pfSense take care of that for me?
              2. Will the scripts survive pfSense upgrades, or do they need to be restored afterwards?

              Thanks so much,

              Craig

              • A
                Asulu last edited by Asulu

                Nice script, works well if you use the second WAN as failover only.
                How would this work with policy-based routing?

                I use my low latency wan for gaming and my other wan for downloads only.
                [Attached image: Unbenannt.PNG]

                When the gaming WAN is down, there is a group (WAN1GW) with failover to wan2, which works fine. When the gaming WAN comes back up, traffic stays on wan2.

                Is there a way to force it back to my gaming WAN?

                • P
                  pfpv last edited by

                  So, this is still needed in 2.5.0. I am surprised. So few people use failover?

                  The /etc/rc.kill_states script was very slightly modified in 2.5.0 (added utf8_encode to IP variables). I recreated the /etc/my_kill_states and left the /root/check_backup_wan untouched (they weren't wiped out after the upgrade) and everything seems to work as in 2.4.5.
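Since the stock script can change between releases, one way to notice that /etc/my_kill_states should be recreated is to record a checksum of the rc.kill_states it was derived from and compare it after each upgrade. A minimal sketch follows; the function names and the stamp-file path in the usage note are hypothetical.

```sh
#!/bin/sh
# record_upstream / upstream_changed are hypothetical helpers.
# record_upstream stores a checksum of the stock script ($1) in a
# stamp file ($2).
record_upstream() {
    cksum < "$1" > "$2"
}

# upstream_changed succeeds (exit 0) when the stock script ($1) no longer
# matches the stored checksum in $2, or when no checksum was stored.
upstream_changed() {
    [ -f "$2" ] || return 0
    [ "$(cksum < "$1")" != "$(cat "$2")" ]
}
```

After creating my_kill_states, run `record_upstream /etc/rc.kill_states /root/rc.kill_states.cksum`. After an upgrade, `upstream_changed /etc/rc.kill_states /root/rc.kill_states.cksum && echo 'recreate /etc/my_kill_states'` flags a changed stock script.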

                  I hope the next build will have this or something better built-in.

                  • A
                    Asulu last edited by

                    Yes it is:
                    https://redmine.pfsense.org/issues/855
