Failback to Primary WAN after failover to Secondary WAN



  • I have 2 WAN connections configured with failover using gateway groups. Failover works very well, but connections opened after the failover do not fail back automatically. My secondary WAN connection is metered, so not failing back is a problem: it creates meaningful expense for no reason.

    I noticed that when failover occurs it is actually fairly smooth, with connections restarting themselves very quickly after a failure is detected. I wanted the failback to happen in a similar fashion, vs. taking 1-2 minutes for connections to come back live.

    I did not use default gateway switching because I need to make sure only certain users fail over. The rest of the users do not require a high availability connection.

    After searching the forums and watching some hangouts, I came to understand that failback is not implemented as of 2.4.3-RELEASE-p1.

    I got this working and am sharing the procedure for others.

    1. Modify the /etc/rc.kill_states script so that it no longer checks to see if a gateway is down.

    1a. Copy /etc/rc.kill_states to /etc/my_kill_states, and then edit my_kill_states.
    1b. Comment out the if statement in my_kill_states:
    #if (isset($config['system']['gw_down_kill_states'])) {
    and its closing brace line as well
    #}
    1c. Now /etc/my_kill_states will gracefully reset the state table when you pass it the interface and the IP address of the backup WAN, in that order. If the primary WAN is up and running, connections will automatically re-establish over the primary WAN. The entire my_kill_states file is duplicated at the end of this post.

    2. I created a script, check_backup_wan, that can be run by cron every few minutes. It checks whether there are live TCP connections on the backup WAN and, if so, whether the primary WAN is functioning. If the primary WAN is up while the backup WAN still carries traffic, the script uses my_kill_states to kill the connections on the backup WAN gracefully. These connections then re-establish over the primary WAN.

    2a. My implementation of the check_backup_wan script is below.
    2b. The final step is to run the check_backup_wan script automatically via cron. I think a 2 minute interval is best, as it gives time to close out connections.
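
Step 2b can be sketched as a single cron entry. This is only an illustration: it assumes the system crontab format (with a user field) and the script path /root/check_backup_wan used in this post. As far as I know, pfSense regenerates /etc/crontab from its configuration, so hand edits there may be lost; entering the same schedule through the Cron package is the usual way to make it persistent. The script also needs to be executable (chmod +x /root/check_backup_wan).

```
# minute hour mday month wday  who   command
*/2      *    *    *     *     root  /root/check_backup_wan > /dev/null 2>&1
```

Output is discarded here because the script already appends its status lines to /var/log/check_backup_wan.log.
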
    ===group
    /root/check_backup_wan:

    #!/bin/sh
    # check_backup_wan script
    # mvneta0 is the 2nd WAN interface. mvneta2 is the primary WAN
    # 8.8.4.4 is set as the monitor IP on the primary WAN interface
    # The idea is to get the IP addresses of the primary and secondary WAN interfaces.
    # If the primary WAN IP address is not available, assume the primary WAN is still down.
    # Assuming the primary WAN is up, check if there are any live TCP connections on the backup WAN.
    # If live TCP connections are found on the backup WAN, check that the primary WAN is responding to
    # pings on the monitor IP address.  If the primary WAN is responding to pings, then kill the states
    # on the backup WAN, and they will automatically reconnect over the primary WAN.
    
    check_wan_time=`date "+%Y-%m-%d %H:%M:%S"`
    check_wan=8.8.4.4
    
    wan_ipaddress=`ifconfig mvneta2 | grep 'inet ' | awk '{ print $2}' | cut -d'/' -f1`
    wan2_ipaddress=`ifconfig mvneta0 | grep 'inet ' | awk '{ print $2}' | cut -d'/' -f1`
    
    echo 'primary, backup WAN IP address ' ${wan_ipaddress} '(primary) ' ${wan2_ipaddress} '(backup)'
    # check for valid primary WAN IP address.
    if [ -z "${wan_ipaddress}" ]; then
      echo ${check_wan_time} '... primary WAN is still down (no WAN IP)' | tee -a /var/log/check_backup_wan.log
      exit 0
    fi
    
    # check for active connections on backup_wan
    pfctl -i mvneta0 -ss | grep 'tcp'
    wan2_liveconn=`pfctl -i mvneta0 -ss | grep 'tcp'`
    if [ -n "${wan2_liveconn}" ]; then
    # found a tcp connection on the backup wan interface
      ping -c 2 -t 2 -S ${wan_ipaddress} ${check_wan} > /dev/null 2>&1
      wan1_resp=$?
      wan_resp=`expr ${wan1_resp}`
    
      echo 'primary WAN ping check (0 means passed)' ${wan1_resp}
    
      if [ ${wan_resp} -eq 0 ]; then
        echo ${check_wan_time} '... killing states and resetting connections on backup WAN' | tee -a /var/log/check_backup_wan.log
        /etc/my_kill_states mvneta0 ${wan2_ipaddress}
      else
        echo ${check_wan_time} '... primary WAN is still down (pings failing)' | tee -a /var/log/check_backup_wan.log
      fi
    else
      echo ${check_wan_time} '... no active tcp connections found on backup WAN' | tee -a /var/log/check_backup_wan.log
    fi
    

    /etc/my_kill_states:

    #!/usr/local/bin/php-cgi -f
    <?php
    /*
     * my_kill_states
     * derived from:
     * part of pfSense (https://www.pfsense.org)
     * Copyright (c) 2004-2018 Rubicon Communications, LLC (Netgate)
     * All rights reserved.
     *
     * Licensed under the Apache License, Version 2.0 (the "License");
     * you may not use this file except in compliance with the License.
     * You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    
    /* parse the configuration and include all functions used below */
    require_once("globals.inc");
    require_once("config.inc");
    require_once("interfaces.inc");
    require_once("util.inc");
    
    // Do not process while booting
    if (platform_booting()) {
    	return;
    }
    
    /* Interface address to cleanup states */
    $interface = str_replace("\n", "", $argv[1]);
    
    /* IP address to cleanup states */
    $local_ip = str_replace("\n", "", $argv[2]);
    
    if (empty($interface) || !does_interface_exist($interface)) {
    	log_error("rc.kill_states: Invalid interface '{$interface}'");
    	return;
    }
    
    if (!empty($local_ip)) {
    	list($local_ip, $subnet_bits) = explode("/", $local_ip);
    
    	if (empty($subnet_bits)) {
    		$subnet_bits = "32";
    	}
    
    	if (!is_ipaddr($local_ip)) {
    		log_error("rc.kill_states: Invalid IP address '{$local_ip}'");
    		return;
    	}
    }
    
    # for my_kill_states just assume the gateway is down and rebuild 
    #if (isset($config['system']['gw_down_kill_states'])) {
    	if (!empty($local_ip)) {
    		log_error("rc.kill_states: Removing states for IP {$local_ip}/{$subnet_bits}");
    		$nat_states = exec_command("/sbin/pfctl -i {$interface} -ss | " .
    			"/usr/bin/egrep '\-> +{$local_ip}:[0-9]+ +\->'");
    
    		$cleared_states = array();
    		foreach (explode("\n", $nat_states) as $nat_state) {
    			if (preg_match_all('/([\d\.]+):[\d]+[\s->]+/i', $nat_state, $matches, PREG_SET_ORDER) != 3) {
    				continue;
    			}
    
    			$src = $matches[0][1];
    			$dst = $matches[2][1];
    
    			if (empty($src) || empty($dst) || in_array("{$src},{$dst}", $cleared_states)) {
    				continue;
    			}
    
    			$cleared_states[] = "{$src},{$dst}";
    			pfSense_kill_states($src, $dst);
    		}
    
    		pfSense_kill_states("0.0.0.0/0", "{$local_ip}/{$subnet_bits}");
    		pfSense_kill_states("{$local_ip}/{$subnet_bits}");
    		pfSense_kill_srcstates("{$local_ip}/{$subnet_bits}");
    	}
    	log_error("rc.kill_states: Removing states for interface {$interface}");
    	mwexec("/sbin/pfctl -i {$interface} -Fs", true);
    #}
    

    ===
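
The check_backup_wan script above hard-codes mvneta0 and mvneta2, so anyone reusing it has to change the interface names in several places. Below is a minimal sketch of one way to collect them at the top and wrap the address lookup in a helper. The names PRIMARY_IF, BACKUP_IF and first_inet_addr are hypothetical and not part of the original script. Unlike the original grep/awk pipeline, the helper returns only the first IPv4 address, which I assume is what you want if the interface carries aliases or virtual IPs.

```shell
#!/bin/sh
# Hypothetical settings block: adjust these to your own interface names.
PRIMARY_IF="mvneta2"   # primary WAN (value taken from the original script)
BACKUP_IF="mvneta0"    # backup WAN (value taken from the original script)

# Read ifconfig-style text on stdin and print the first IPv4 address.
# Lines starting with "inet6" are skipped because the field must be exactly "inet".
# Any "/prefix" suffix is stripped, as in the original script.
first_inet_addr() {
  awk '$1 == "inet" { print $2; exit }' | cut -d'/' -f1
}

# Intended use on the firewall (commented out so the sketch has no side effects):
#   wan_ipaddress=$(ifconfig "${PRIMARY_IF}" | first_inet_addr)
#   wan2_ipaddress=$(ifconfig "${BACKUP_IF}" | first_inet_addr)
```

An empty result still means "no IPv4 address on the interface", so the existing `[ -z "${wan_ipaddress}" ]` check keeps working unchanged.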



  • Thank you. I just got my failover WAN working. As it is also a metered connection I had noticed a lot of data use after failback and I was just looking into this.

    Was easy enough to deploy on my system. Just had to change the interface names to match mine.

    I triggered it manually to test and it seems to work fine. Will continue monitoring it.

    I was surprised that this was not already supported.

    thanks
    david

