Netgate Discussion Forum

    Some connections survive killing all states on Tier 1 gateway recovery

    Routing and Multi WAN
    3 Posts 2 Posters 507 Views
    This topic has been deleted. Only users with topic management privileges can see it.
      dbykov

      Hi.

      I have a pfSense 2.5.0 x86 machine with 2 WAN connections.
      I set up a gateway group WAN_GWs with failover:
      WAN1: Tier 1 (~1 Gbit/s PPPoE connection over a GPON modem)
      WAN2: Tier 2 (100 Mbit/s Ethernet connection)
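The tier logic behind WAN_GWs can be sketched as a small selection function: among the online members, the one with the lowest tier number wins, so WAN1 (Tier 1) is preferred and WAN2 (Tier 2) is used only while WAN1 is down. This is a simplified model under stated assumptions, not pfSense's own gateway-group code: `pick_failover_gateway()` and the member array layout are hypothetical, and load balancing between members that share a tier is ignored.

```php
<?php
// Simplified, hypothetical model of failover in a gateway group.
// Not pfSense's gwlb.inc implementation; load balancing between
// members on the same tier is deliberately left out.

/**
 * @param array<int, array{name: string, tier: int, online: bool}> $members
 * @return string|null name of the gateway to use, or null if all are offline
 */
function pick_failover_gateway(array $members): ?string
{
    $best = null;
    foreach ($members as $member) {
        if (!$member['online']) {
            continue; // offline members are never selected
        }
        // Lower tier number means higher priority (Tier 1 beats Tier 2)
        if ($best === null || $member['tier'] < $best['tier']) {
            $best = $member;
        }
    }
    return $best['name'] ?? null;
}

$wan_gws = [
    ['name' => 'WAN1', 'tier' => 1, 'online' => true],
    ['name' => 'WAN2', 'tier' => 2, 'online' => true],
];
echo pick_failover_gateway($wan_gws), PHP_EOL; // WAN1
$wan_gws[0]['online'] = false; // fiber on the WAN1 modem disconnected
echo pick_failover_gateway($wan_gws), PHP_EOL; // WAN2
```

Note that a function like this only decides where *new* connections go; it says nothing about connections that already have a state.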

      NAT rules:
      LAN to WAN1
      LAN to WAN2

      Firewall rule:
      PASS LAN to any, gateway: WAN_GWs group

      Both "Flush all states when a gateway goes down" and "Reset all states if WAN IP Address changes" checkboxes in the System -> Advanced menu are checked.

      On my LAN I have a machine with a Wireguard connection.

      When both WAN links are up, everything works fine.

      If I disconnect the fiber link on the WAN1 modem (the Ethernet interface on pfSense stays up), the corresponding gateway changes status to "Offline" and all traffic from the LAN, including the aforementioned Wireguard connection, goes over WAN2, as it should.

      Then I restore the WAN1 link. The WAN1 gateway status becomes "Online" again and all new connections are routed over WAN1, but the existing Wireguard connection is not: it remains on the slow WAN2 gateway (I can see this both from the ping times and from the state dump in the Web GUI).

      In the System -> General log I see this:

      Mar 31 22:25:00	php-fpm	338	/rc.newwanip: Gateway, switch to: WAN1
      Mar 31 22:25:00	php-fpm	338	/rc.newwanip: Default gateway setting Interface WAN1 Gateway as default.
      Mar 31 22:25:00	php-fpm	338	/rc.newwanip: IP Address has changed, killing all states (ip_change_kill_states is set).
      Mar 31 22:25:00	check_reload_status	376	Reloading filter
      

      and I can confirm that the states are indeed flushed: for example, my SSH connection to pfSense breaks at this moment. Yet the Wireguard connection somehow survives, and its traffic still goes over WAN2.

      If, after WAN1 recovers, I kill the states manually by running "pfctl -F states", the Wireguard connection finally switches to WAN1.
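One plausible reading of this behaviour is pf's state tracking: when a pass rule with a gateway (pfSense implements per-rule gateways with pf's route-to) creates a state, the chosen gateway is stored in that state, and later packets of the same connection follow the state without the rule being evaluated again. A Wireguard tunnel that keeps sending packets therefore keeps its state, and its gateway, until that state is removed. The toy model below illustrates only this stickiness; the `ToyStateTable` class, the connection key and the addresses are hypothetical, and it does not explain why the automatic flush in this thread failed to have the same effect.

```php
<?php
// Toy model (hypothetical, illustrative only): a pf-like state table in which
// the gateway chosen by a route-to rule is stored in the state when the
// connection starts, and later packets follow the state instead of the rule.

final class ToyStateTable
{
    /** @var array<string, string> connection key => gateway stored in the state */
    private array $states = [];

    // Returns the gateway for a packet of $conn. The rule's current choice
    // ($rule_gateway) only matters when no state exists yet.
    public function route(string $conn, string $rule_gateway): string
    {
        if (!isset($this->states[$conn])) {
            $this->states[$conn] = $rule_gateway; // state created here
        }
        return $this->states[$conn];
    }

    // Rough analogue of "pfctl -F states": all states are dropped, so every
    // connection is matched against the rules again on its next packet.
    public function flush(): void
    {
        $this->states = [];
    }
}

$pf = new ToyStateTable();
$wg = 'udp 192.168.1.10:51820 -> 198.51.100.20:51820'; // hypothetical Wireguard flow
echo $pf->route($wg, 'WAN2'), PHP_EOL; // WAN2: state created while WAN1 is offline
echo $pf->route($wg, 'WAN1'), PHP_EOL; // WAN2: WAN1 is back, but the state wins
$pf->flush();
echo $pf->route($wg, 'WAN1'), PHP_EOL; // WAN1: new state after the flush
```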

      I took a look at /etc/rc.newwanip, and it actually runs the same "pfctl -F states" internally (line 236):

      /* If the IP address changed, kill old states after rules and routing have been updated */
      if ($curwanip != $oldip) {
      	if (isset($config['system']['ip_change_kill_states'])) {
      		log_error("IP Address has changed, killing all states (ip_change_kill_states is set).");
      		pfSense_kill_states(utf8_encode($oldip));
      		filter_flush_state_table(); // <-- KILLING ALL STATES HERE
      	} else {
      		log_error("IP Address has changed, killing states on former IP Address $oldip.");
      		pfSense_kill_states(utf8_encode($oldip));
      	}
      }
      

      /etc/inc/filter.inc, line 1344:

      function filter_flush_state_table() {
      	return mwexec("/sbin/pfctl -F state");
      }
      

      So it basically runs the very same command I run manually (except that I use "-F states", as the pfctl(8) man page says; since "state" is an unambiguous abbreviation, it should work either way), yet it doesn't produce the same effect.
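The branch quoted from /etc/rc.newwanip can be restated as a pure function that returns the actions instead of performing them, which makes its conditions easy to see: the full flush only happens when the new WAN address differs from the old one and ip_change_kill_states is set, and the log above shows that both conditions were met. `newwanip_state_actions()` and its action strings are hypothetical names used for illustration; the real script calls `pfSense_kill_states()` and `filter_flush_state_table()` directly.

```php
<?php
// Hypothetical restatement of the quoted rc.newwanip branch: it returns
// descriptions of the actions rather than touching the state table.

/** @return list<string> */
function newwanip_state_actions(string $curwanip, string $oldip, bool $ip_change_kill_states): array
{
    if ($curwanip === $oldip) {
        return []; // address unchanged: no states are killed at all
    }
    $actions = ["kill states on $oldip"]; // pfSense_kill_states($oldip)
    if ($ip_change_kill_states) {
        $actions[] = 'flush all states';  // filter_flush_state_table()
    }
    return $actions;
}

echo implode(', ', newwanip_state_actions('203.0.113.7', '198.51.100.4', true)), PHP_EOL;
// kill states on 198.51.100.4, flush all states
```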

      Is there a race condition somewhere, or am I just missing something here?

        dbykov

        OK, I implemented a workaround for this problem.

        I wrote this little script:

        <?php
        
        require_once("interfaces.inc");
        
        // Remembers the default gateway interface seen on the previous run
        $cached_def_gw_if_file = '/tmp/cached_def_gw_if';
        
        $current_gw_ip = route_get_default('inet');
        $current_gw_if = get_gateway_interface($current_gw_ip);
        
        // No default route, or it matches no interface gateway: do nothing,
        // so a transient lookup failure does not trigger a flush
        if (empty($current_gw_if)) {
        	exit;
        }
        
        // First run: just remember the current interface
        if (!file_exists($cached_def_gw_if_file)) {
        	file_put_contents($cached_def_gw_if_file, $current_gw_if);
        	exit;
        }
        
        // Just in case the file was edited manually for test purposes and contains some whitespace
        $old_gw_if = trim(file_get_contents($cached_def_gw_if_file));
        
        if ($current_gw_if !== $old_gw_if) {
        	log_error("Default gateway interface changed from $old_gw_if to $current_gw_if, killing old states...");
        	mwexec("/sbin/pfctl -F states");
        	file_put_contents($cached_def_gw_if_file, $current_gw_if);
        }
        
        // Returns the real (OS-level) name of the interface whose gateway is $gateway_ip,
        // or null if no interface uses that gateway
        function get_gateway_interface($gateway_ip) {
        	$interfaces = get_interfaces_with_gateway();
        	
        	foreach ($interfaces as $interface) {
        		if ($gateway_ip == get_interface_gateway($interface)) {
        			return get_real_interface($interface);
        		}
        	}
        	
        	return null;
        }
        
        ?>
        

        and created a cron task that runs this script every minute.

        I am not sure how reliable this "solution" will be, but a dozen gateway switchovers have shown that everything works as expected: the Wireguard connection is routed over the Tier 1 gateway after it recovers.
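The cache-and-compare idea in the script above can be exercised outside pfSense by turning the pfSense calls (`route_get_default()`, `mwexec()` with pfctl) into a parameter and a return value. A minimal sketch, assuming the caller flushes states whenever the function returns true; `default_gw_changed()` and the interface names `pppoe0` and `igb1` are hypothetical.

```php
<?php
// Hypothetical, pfSense-free version of the script's change detection.
// Returns true when the caller should flush states; updates the cache file.
function default_gw_changed(string $cache_file, ?string $current_if): bool
{
    if ($current_if === null || $current_if === '') {
        return false; // no default route right now: leave states alone
    }
    $old_if = is_file($cache_file) ? trim((string) file_get_contents($cache_file)) : null;
    if ($old_if === $current_if) {
        return false; // same interface as on the previous run
    }
    file_put_contents($cache_file, $current_if);
    return $old_if !== null; // the very first run only seeds the cache
}

$cache = tempnam(sys_get_temp_dir(), 'gw');
unlink($cache); // start without a cache file
var_dump(default_gw_changed($cache, 'pppoe0')); // bool(false): cache seeded
var_dump(default_gw_changed($cache, 'igb1'));   // bool(true): failover to WAN2
var_dump(default_gw_changed($cache, 'igb1'));   // bool(false): unchanged
var_dump(default_gw_changed($cache, 'pppoe0')); // bool(true): WAN1 recovered
unlink($cache);
```

Because the cron task polls every minute, a connection can stay on the Tier 2 gateway for up to about a minute after recovery before the flush happens.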

        During my research I also found this post, but unfortunately samtoopid's script didn't work for me: it only kills states on the WAN interface, which is not enough in my case. Only flushing all states makes the Wireguard connection switch to the recovered Tier 1 gateway.

          Viper_Rus

          Does the script work on the latest version? It is very annoying that all VPNs remain on the backup line after the main WAN is restored.

          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.