Dual WAN Failover doesn't failover back to WAN 1 [Resolved]



  • Does this feature work or not? It should be simple to set up and work without problems. It doesn't work on any of my firewalls, not a single one. I put a brand new Protectli at my office with no extra configuration besides Dual WAN Failover. It switches to WAN 2 when WAN 1 goes down, but doesn't switch back from WAN 2 to WAN 1 when WAN 1 comes back up. What is the problem?



  • Hello!

    I have a simple dual-wan setup and it works without problems.

    John


  • LAYER 8 Global Moderator

    Going to move this to the multi-WAN section. If you want help with why your setup is not working, some details would be most helpful.



  • A simple tip: check whether there are still states alive on WAN2 when WAN1 comes back up.

    New states will be on WAN1



  • This is a testing firewall with no other configuration besides Failover. WAN 1 is Comcast DHCP, WAN 2 (OPT1) is a cellular modem on DHCP.



  • There are many discussions on this. Here is one example with a script to help kill states on the backup wan when the main wan comes back up.
    https://forum.netgate.com/topic/84269/multi-wan-gateway-failover-not-switching-back-to-tier-1-gw-after-back-online/87

    Unless you have some other problem, the failover to WAN2 and the switch back to WAN1 should be occurring. A problem many have seen, including myself, is that the states which were on the backup WAN2 connection remain unless they are manually killed or naturally die over time. This could make it appear as if the main WAN1 is not being used, but that's not the case. As mentioned by @noplan, the new states should be on the primary WAN1 if it's back up and running.

    Edit: This doesn't really impact my scenario all that much since we typically have very little traffic. So when an event does occur and the primary WAN is back up, I go to Diagnostics > States and filter for my secondary WAN interface IP. Then I kill all those states to make sure none remain on the link I don't want to be used.


  • LAYER 8 Global Moderator

    @Raffi_ said in Dual WAN Failover doesn't failover back to WAN 1:

    states which were on the backup WAN2 connection remain unless they are manually killed or naturally die over time

    Yeah, this is a common misconception about not switching back to primary.



  • Why wouldn't they be killed automatically, let's say after 2 mins? I just got busy with something else and can't check the states right now. Yesterday when I checked, it wouldn't switch even after about an hour.



  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    Why wouldn't they be killed automatically, let's say after 2 mins? I just got busy with something else and can't check the states right now. Yesterday when I checked, it wouldn't switch even after about an hour.

    Because killing states is typically something you don't want to do unless the client/server connection is truly dead. You should be able to set up an automatic method to kill those states with the script mentioned in that thread. I never tried it, but you can read what others are saying about it there.

    Edit: That thread was only one example. If that doesn't solve the issue some searching will show other threads on the topic, possibly with other scripts if I recall.


  • LAYER 8 Global Moderator

    @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    Yesterday when I checked it wouldn't switch even after about an hour.

    Did you check that new states were using WAN 1? A state is really not going to die unless traffic stops for a long time, or the session is ended by the server or client with FIN or RST, etc.

    So if you were checking via, say, a browser, looking up your IP on whatsmyip . com or similar, that state would still be using WAN 2, and traffic would continue to route out that connection.

    You would need to make sure you shut down any existing states using WAN 2, or bring up a new session to validate which WAN path you were taking.



  • So, this is the real setup in the small offices where I have pfSense: a static IP on WAN 1 and DHCP on WAN 2.
    I have IPsec and an IP phone service that we pay for. When WAN 1 comes back up, I would like Failover to switch to it within, let's say, 5 minutes, so that IPsec is up and running as fast as possible. I am not script savvy.



  • @pfrickroll If the phones or the VPN keep sending keepalives, these connections will only switch to WAN1 if you kill them manually or WAN2 goes down.
    This is intended behavior: connections should not be dropped while they are exchanging data.



  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    So, this is the real setup in the small offices where I have pfSense: a static IP on WAN 1 and DHCP on WAN 2.
    I have IPsec and an IP phone service that we pay for. When WAN 1 comes back up, I would like Failover to switch to it within, let's say, 5 minutes. I am not script savvy.

    Unfortunately, there is no GUI method to do what you're asking for automatically. I wish there was too. You don't have to be script savvy. Did you even look at the thread? They spell everything out for you. I'm not script savvy at all but I got someone else's script running on pfSense (different script) but it's the same idea. If I can do it, anyone can do it. I believe in you :)


  • LAYER 8 Global Moderator

    Well, with the default "normal" optimization for states, once a state is established it will stay open for 24 hours, even without any traffic.

    https://pfsense-docs.readthedocs.io/en/latest/config/advanced-setup.html

    So unless the server or client involved in the conversation closes the session/state with FIN or RST, the state will stay open. You could adjust the timeouts for established states, but even in aggressive mode you're still looking at 5 hours for an established state without any traffic, and that counter restarts every time there is any traffic on that session.

    If you want all traffic to switch back to WAN 1 once it comes up, you have to force it: you would really need to clear the states for anything using WAN 2.



  • @Raffi_ I skimmed through it but didn't find a script there. I am just doing a few things at a time at the moment.


  • LAYER 8 Global Moderator

    That link he provided took you right to the post with the script

    #!/bin/sh
    
    # get active gateway and current time
    CURRENT_TIME="$(date +"%c")"
    CURRENT_GW="$(netstat -rn | grep default | awk '{print $4}')"
    
    if [ "$CURRENT_GW" = "em2" ]; then
    	#check if WAN1 is up or not
    	WAN1_STATUS="$(pfSsh.php playback gatewaystatus brief | grep WANGW | awk '{print $2}')"
    	if [ "$WAN1_STATUS" = "none" ]; then
    		#WAN1 is back online, stop/start WAN2
    		echo "$CURRENT_TIME: Bringing down WAN2"
    		ifconfig em2 down
    		echo "$CURRENT_TIME: Sleeping for 30s"
    		sleep 30
    		echo "$CURRENT_TIME: Bringing up WAN2"
    		ifconfig em2 up
    	else
    		echo "$CURRENT_TIME: WAN1 is still down"
    	fi
    else
    	echo "$CURRENT_TIME: Nothing to do!"
    fi
    
    
    

    And just below that post was another with the cron info and a slightly modified script :) so not sure what link you followed?
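
    The script above takes the fourth column of the `default` line in `netstat -rn` as the active interface. On a box that also has an IPv6 default route, that pipeline can return two lines. A minimal sketch of a stricter lookup follows; the function name `default_iface` and the sample routing table are made up for illustration, and igb2 stands in for the WAN2 interface:

```sh
#!/bin/sh
# Print the interface of the first default route found in `netstat -rn`
# style output read from standard input (hypothetical helper).
default_iface() {
	awk '$1 == "default" { print $4; exit }'
}

# Made-up sample data in the FreeBSD `netstat -rn` layout.
sample_routes() {
	printf '%s\n' \
		'Internet:' \
		'Destination        Gateway            Flags     Netif Expire' \
		'default            203.0.113.1        UGS        igb2' \
		'Internet6:' \
		'Destination        Gateway            Flags     Netif Expire' \
		'default            fe80::1%igb0       UGS        igb0'
}

CURRENT_GW="$(sample_routes | default_iface)"

# Quoting keeps the comparison valid even when the variable is empty.
if [ "$CURRENT_GW" = "igb2" ]; then
	echo "default route is on WAN2 (igb2)"
else
	echo "default route is on: ${CURRENT_GW:-unknown}"
fi
```

    On a live firewall the same function could be fed with `netstat -rn -f inet`, which limits the output to IPv4 routes. Separately, the `grep WANGW` in the script filters by gateway name, so it presumably has to match the name of the primary gateway shown under System > Routing (for a DHCP WAN that is typically WAN_DHCP rather than WANGW).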



  • Hello!

    There is also a built-in script /etc/rc.kill_states that can be modified.

    https://forum.netgate.com/topic/135614/failback-from-primary-wan-after-failover-to-secondary-wan

    John



  • @serbus said in Dual WAN Failover doesn't failover back to WAN 1:

    Hello!

    There is also a built-in script /etc/rc.kill_states that can be modified.

    https://forum.netgate.com/topic/135614/failback-from-primary-wan-after-failover-to-secondary-wan

    John

    @serbus thank you. I knew I remembered reading about another script/method to get the job done. Now that I'm looking at these again, I might take a little time to give one a try. The built-in script seems to make more sense since it's already there.

    @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    I am just doing a few things at a time at the moment.

    Understood. You mentioned this being a new setup which you were beginning to test. This would be the best time to try any of these options out, especially if you haven't put this network into production yet. That is the most ideal scenario for testing anything you're not familiar with.



  • Hello!

    There is a built-in system for packages that allows custom plugin code to be called on certain events.

    It is in /etc/inc/pfsense-utils.inc in function pkg_call_plugins

    It looks like several packages use the callback plugin (carp, certs). The gateway system also uses the callback in /etc/inc/gwlb.inc when there is a state change.

    It is calling for a package that has set up a plugin called "plugin_gateway". I don't see any official packages that have a plugin named that, but it could be the one referenced in this post:

    https://forum.netgate.com/topic/139455/list-of-hooks

    https://github.com/jazzl0ver/pfSense-pkg-gatewayhook

    I don't know why they would modify the official gwlb.inc code to call a plugin for a package that is not part of the official release...

    John



  • FYI, I ended up using the first script in the original thread I linked (take wan2 down and back up when wan1 is back up). I didn't do that because it was a better solution, but I found it easier to modify. I only had to change the defined interface for WAN2.
    https://forum.netgate.com/topic/84269/multi-wan-gateway-failover-not-switching-back-to-tier-1-gw-after-back-online/67?_=1601399952603

    The second script below it seems fundamentally better (killing states), but I have a DHCP wan2 and didn't want to use that since I wasn't sure how to modify it for my scenario.

    I haven't tested it yet since it's not a big deal for me whether this works or not. I'm going to wait for a real event and see what happens.



  • @johnpoz said in Dual WAN Failover doesn't failover back to WAN 1:

    That link he provided took you right to the post with the script

    And just below that post was another with the cron info and a slightly modified script :) so not sure what link you followed?

    When I said I don't know about scripting, I meant that I know pretty much nothing.
    Do I run this script in Diagnostics > Command Prompt > Execute PHP Commands, or do I put it somewhere with Diagnostics > Edit File?



  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    When I said I don't know about scripting, I meant that I know pretty much nothing.
    Do I run this script in Diagnostics > Command Prompt > Execute PHP Commands, or do I put it somewhere with Diagnostics > Edit File?

    Alrighty, let me see if I can help.

    Edit: I forgot the most important step. Back up your config before doing anything you're not familiar with.

    First, take that script and copy/paste it into a text editor on your PC, e.g., Notepad++.

    Now figure out what your WAN2 interface is. Go to Interfaces > WAN2. In my case it is em1.
    Substitute that in place of anywhere it says em2 in the script. If your WAN2 is by chance also on em2, then you're in luck and don't have to edit anything.

    Save that text file and then change the name to something like failover_script.sh

    Now to upload that file, go to Diagnostics > Command Prompt. Use Upload File and choose the file you just made. By default it will go to /tmp/. For the sake of simplicity you can leave it there if you want. I think you should be able to run it from there.

    Now you have to create a cron job to run that script on a schedule. Install the Cron package if you don't already have it: System > Package Manager > Available Packages.

    Go to Services > Cron > Add
    Here is what mine looks like as an example.
    [Screenshot: cron job settings]
    This is set to run every 2 minutes. You can adjust that as you want. Also note that my script is in the root folder and has a different name.
    So in your case the command should be
    /tmp/failover_script.sh >> /tmp/failover_script.log

    Again, I'm not an expert on this either so if someone can point out better ways to do this or if I'm wrong, please let me know.
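
    One detail about the cron command above: `>>` appends only standard output to the log, so error messages from the script itself (a wrong path, a missing command) would not show up there. A minimal sketch of a wrapper that captures both streams follows; the helper name `run_logged` and the log path are placeholders, not anything that ships with pfSense:

```sh
#!/bin/sh
# Hypothetical log location for this demonstration; /tmp is cleared on reboot.
LOG="${LOG:-/tmp/failover_demo.log}"

# Run a command and append both stdout and stderr to the log file.
run_logged() {
	# Timestamp each run so that silent runs are still visible in the log.
	echo "$(date +"%c"): running $*" >> "$LOG"
	"$@" >> "$LOG" 2>&1
}

# Demonstration: one line goes to stdout, one to stderr; both reach the log.
run_logged sh -c 'echo "to stdout"; echo "to stderr" >&2'
cat "$LOG"
```

    The equivalent cron command would be along the lines of `/bin/sh /root/failover_script.sh >> /root/failover_script.log 2>&1`, assuming the script and log live in /root.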



  • @Raffi_ My WAN 1 - igb0, WAN2 - igb2
    Is this correct?

    #!/bin/sh
    
    # get active gateway and current time
    CURRENT_TIME="$(date +"%c")"
    CURRENT_GW="$(netstat -rn | grep default | awk '{print $4}')"
    
    if [ $CURRENT_GW = "igb2" ]; then
    	#check if WAN1 is up or not
    	igb0_STATUS="$(pfSsh.php playback gatewaystatus brief | grep WANGW | awk '{print $2}')"
    	if [ $WAN1_STATUS = "none" ]; then
    		#WAN1 is back online, stop/start WAN2
    		echo "$CURRENT_TIME: Bringing down igb2"
    		ifconfig em2 down
    		echo "$CURRENT_TIME: Sleeping for 30s"
    		sleep 30
    		echo "$CURRENT_TIME: Bringing up igb2"
    		ifconfig em2 up
    	else
    		echo "$CURRENT_TIME: igb0 is still down"
    	fi
    else
    	echo "$CURRENT_TIME: Nothing to do!"
    fi
    




  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    @Raffi_ My WAN 1 - igb0, WAN2 - igb2
    Is this correct?

    Close but not quite right. You missed two em2 lines. See below. I forgot to mention the WAN1 interface but it looks like you got that right.

    #!/bin/sh
    
    # get active gateway and current time
    CURRENT_TIME="$(date +"%c")"
    CURRENT_GW="$(netstat -rn | grep default | awk '{print $4}')"
    
    if [ $CURRENT_GW = "igb2" ]; then
    	#check if WAN1 is up or not
    	igb0_STATUS="$(pfSsh.php playback gatewaystatus brief | grep WANGW | awk '{print $2}')"
    	if [ $WAN1_STATUS = "none" ]; then
    		#WAN1 is back online, stop/start WAN2
    		echo "$CURRENT_TIME: Bringing down igb2"
    		ifconfig igb2 down
    		echo "$CURRENT_TIME: Sleeping for 30s"
    		sleep 30
    		echo "$CURRENT_TIME: Bringing up igb2"
    		ifconfig igb2 up
    	else
    		echo "$CURRENT_TIME: igb0 is still down"
    	fi
    else
    	echo "$CURRENT_TIME: Nothing to do!"
    fi
    


  • Never mind, you don't have to do anything with WAN1. That was right. Put that back to the way it was as shown below.

    #!/bin/sh
    
    # get active gateway and current time
    CURRENT_TIME="$(date +"%c")"
    CURRENT_GW="$(netstat -rn | grep default | awk '{print $4}')"
    
    if [ "$CURRENT_GW" = "igb2" ]; then
    	#check if WAN1 is up or not
    	WAN1_STATUS="$(pfSsh.php playback gatewaystatus brief | grep WANGW | awk '{print $2}')"
    	if [ "$WAN1_STATUS" = "none" ]; then
    		#WAN1 is back online, stop/start WAN2
    		echo "$CURRENT_TIME: Bringing down WAN2"
    		ifconfig igb2 down
    		echo "$CURRENT_TIME: Sleeping for 30s"
    		sleep 30
    		echo "$CURRENT_TIME: Bringing up WAN2"
    		ifconfig igb2 up
    	else
    		echo "$CURRENT_TIME: WAN1 is still down"
    	fi
    else
    	echo "$CURRENT_TIME: Nothing to do!"
    fi
    
    


  • @Raffi_ Are you sure? I thought any line without # I should modify WAN into my firewall interface name?



  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    @Raffi_ Are you sure? I thought any line without # I should modify WAN into my firewall interface name?

    I edited my script above. Only the places where it specified em2 had to be changed to igb2. The references to WAN1 and WAN2 are not hard-coded to an interface, so you should be able to leave those.



  • @Raffi_ said in Dual WAN Failover doesn't failover back to WAN 1:

    @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    @Raffi_ Are you sure? I thought any line without # I should modify WAN into my firewall interface name?

    I edited my script above. Only the places where it specified em2 had to be changed to igb2. The references to WAN1 and WAN2 are not hard-coded to an interface, so you should be able to leave those.

    It didn't switch after 10 mins


    So, after a pfSense reboot I checked in Diagnostics > Edit File, and my uploaded script is gone and failover_script.log is empty.



  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    It didn't switch after 10 mins

    I guess I was wrong on the script. Sorry, looks like mine won't work either :/
    You should see WAN2 taken down and then brought back up after 30 seconds if WAN1 is running again.
    Maybe you will have to adjust those variables in that case. Let me know if you get it to work. I will have to adjust mine. At least you know how to work with scripts now. I'm sure you'll get it working.

    @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    So, after a pfSense reboot I checked in Diagnostics > Edit File, and my uploaded script is gone and failover_script.log is empty.

    I was afraid that leaving the script in /tmp/ might lose it on reboot, but I wasn't sure.
    What you can do to solve that is upload it again, and then go to Diagnostics > Command Prompt and execute the command mv /tmp/failover_script.sh /root/
    That will move the file from /tmp/ to /root/. Then you will have to modify the command in your cron job for the new location, /root/failover_script.sh.



  • @Raffi_ In cron, under Command, I have "/root/failover_script.sh. >> /tmp/failover_script.log"
    But under /root it's "failover_script.sh.txt". Should I change the cron job command to "/root/failover_script.sh.txt >> /tmp/failover_script.log"?



  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    @Raffi_ In cron, under Command, I have "/root/failover_script.sh. >> /tmp/failover_script.log"
    But under /root it's "failover_script.sh.txt". Should I change the cron job command to "/root/failover_script.sh.txt >> /tmp/failover_script.log"?

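    No, don't change the cron command. Rename the file so that it ends in .sh and matches the path in the cron command. The upload kept a .txt extension, so the cron command was pointing at a file that doesn't exist, which might be why it didn't work the first time. You probably want to move the log file to /root/ as well.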
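
    On FreeBSD, which pfSense is built on, the file extension by itself does not decide whether a script runs. What matters is that the cron command points at the real file name, that the file has the execute bit set (or is passed to /bin/sh explicitly), and that it uses Unix line endings, which a Windows editor may not produce. A minimal sketch in a scratch directory, with made-up file contents standing in for the uploaded script:

```sh
#!/bin/sh
# Scratch directory so the demonstration does not touch real files.
dir="$(mktemp -d /tmp/failover_demo.XXXXXX)"

# Simulate an upload saved by a Windows editor: .txt suffix, CRLF line endings.
upload="$dir/failover_script.sh.txt"
printf '#!/bin/sh\r\necho "script ran"\r\n' > "$upload"

# Strip carriage returns and store the result under the name cron will use.
script="$dir/failover_script.sh"
tr -d '\r' < "$upload" > "$script"

# A freshly created file normally lacks the execute bit.
[ -x "$script" ] || echo "not executable yet"

# Set the execute bit so the path can be used directly as the cron command.
chmod +x "$script"

# Calling the interpreter explicitly works with or without the execute bit.
/bin/sh "$script"
```

    On the firewall the same steps would be applied to the uploaded file, for example with `chmod +x /root/failover_script.sh` from Diagnostics > Command Prompt.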



  • @Raffi_ I fixed everything but it doesn't work :(



  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    @Raffi_ I fixed everything but it doesn't work :(

    I can't really test it on my end so I can't really help much beyond that.



  • @Raffi_ Oh well, I will keep digging. I've got 36 pfSense boxes. I don't have time to manually reboot or kill states when stuff like this happens, to be honest. My SonicWalls handle this pretty easily. I am not a network vet, so I honestly can't fully grasp why pfSense is like that.



  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    @Raffi_ Oh well, I will keep digging. I've got 36 pfSense boxes. I don't have time to manually reboot or kill states when stuff like this happens, to be honest. My SonicWalls handle this pretty easily. I am not a network vet, so I honestly can't fully grasp why pfSense is like that.

    I'm sure you'll get it working. I would also suggest taking a look at the other script that was linked on a different thread mentioned above. That one was defined very well with instructions. Maybe you'll find it easier to follow/modify that one. Now that you have some understanding of how to go about it you might find that a better solution.



  • Thanks for your time and for spelling everything out for me. I've used cron for other things but didn't really pay attention to the command option there. Now I understand it pretty well.



  • Hello!

    You could try the gateway_plugin interface if you don't mind being a guinea pig...:)

    Download https://github.com/jazzl0ver/pfSense-pkg-gatewayhook/releases/download/v0.1/pfSense-pkg-gatewayhook-0_1.txz

    Use Diagnostics -> Command Prompt -> Upload File to save the pkg file to the /tmp folder on your device, then

    pkg install /tmp/pfSense-pkg-gatewayhook-0_1.txz
    

    The package code is close, but not quite.

    Edit /usr/local/pkg/gatewayhook.inc

    The main function is missing an assignment statement and is not calling the gateway script with any parameters. The fixed function should look like :

    function gatewayhook_plugin_gateway($pluginparams) {
        $type = $pluginparams['type'];
        $name = $pluginparams['name'];
        $event = $pluginparams['event'];
        $interface = $pluginparams['interface'];
        $gatewayhooklock = lock("gatewayhook", LOCK_EX);
        syslog(LOG_NOTICE, "gatewayhook: " . GATEWAY_ALARM_CUSTOM_SCRIPT . " script started - $name $event $interface");
        mwexec(GATEWAY_ALARM_CUSTOM_SCRIPT . " $name $event $interface");
        unlock($gatewayhooklock);
        return 0;
    }
    

    Edit the gateway plugin script the package created - /usr/local/etc/rc.d/rc.gateway_alarm_custom

    The plugin script could look something like this :

    #!/bin/sh
    
    # put what needs to be done before exit line
    
    # arg 1 should be the gateway name
    
    gwname=${1:-gwname}
    
    # arg 2 should be gateway.up or gateway.down
    
    event=${2:-gateway.unknown}
    
    # arg 3 should be the interface ... may not be present
    
    interface=${3:-interface}
    
    if [ "$gwname" = "WAN0" ] && [ "$event" = "gateway.up" ]
    then
       # clear the states on this interface
    
       /sbin/pfctl -i igb0 -Fs
    fi
    
    exit 0
    
    

    Basically, this is saying that when the plugin script is notified that WAN0 is up, igb0 should get all of its states cleared.

    John



  • @serbus said in Dual WAN Failover doesn't failover back to WAN 1:

    pkg install /tmp/pfSense-pkg-gatewayhook-0_1.txz

    Sure, few questions when I

    pkg install /tmp/pfSense-pkg-gatewayhook-0_1.txz
    

    Shell output

    Updating pfSense-core repository catalogue...
    pfSense-core repository is up to date.
    Updating pfSense repository catalogue...
    pfSense repository is up to date.
    All repositories are up to date.
    Checking integrity... done (0 conflicting)
    The following 1 package(s) will be affected (of 0 checked):
    
    New packages to be INSTALLED:
    	pfSense-pkg-gatewayhook: 0_1 [unknown-repository]
    
    Number of packages to be installed: 1
    
    Proceed with this action? [y/N]:
    

    How do i activate "yes"?

    Another question in script below, do i change any values to reflect my interface? For example WAN0?

    #!/bin/sh
    
    # put what needs to be done before exit line
    
    # arg 1 should be the gateway name
    
    gwname=${1:-gwname}
    
    # arg 2 should be gateway.up or gateway.down
    
    event=${2:-gateway.unknown}
    
    # arg 3 should be the interface ... may not be present
    
    interface=${3:-interface}
    
    if [ "$gwname" = "WAN0" ] && [ "$event" = "gateway.up" ]
    then
       # clear the states on this interface
    
       /sbin/pfctl -i igb0 -Fs
    fi
    
    exit 0
    


  • Hello!

    You should just be able to hit "y" when it asks you to proceed.

    If your failover gateway group looks like:

    WAN_DHCP -> tier1 -> igb0
    OPT1_DHCP -> tier2 -> igb2

    and WAN_DHCP is coming back online after being down...
    and you want any states on OPT1_DHCP to be cleared...
    the script would look like...

    if [ "$gwname" = "WAN_DHCP" ] && [ "$event" = "gateway.up" ]
    then
       # clear the states on this interface
    
       /sbin/pfctl -i igb2 -Fs
    fi
    

    John
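
    The condition above can be tried out without touching the firewall's state table. The following is a dry-run sketch only: `should_flush` and `handle_event` are made-up helper names, WAN_DHCP and igb2 are taken from the gateway group described above, and the pfctl command is printed rather than executed:

```sh
#!/bin/sh
# Placeholders for this sketch: primary gateway name and backup interface.
PRIMARY_GW="WAN_DHCP"
BACKUP_IF="igb2"

# Succeeds only when the primary gateway reports that it is back up.
should_flush() {
	# $1 = gateway name, $2 = event; either may be empty
	[ "${1:-}" = "$PRIMARY_GW" ] && [ "${2:-}" = "gateway.up" ]
}

handle_event() {
	if should_flush "$1" "$2"; then
		# Dry run: print the flush command instead of running it.
		echo "/sbin/pfctl -i $BACKUP_IF -Fs"
	else
		echo "nothing to do for ${1:-?} ${2:-?}"
	fi
}

handle_event WAN_DHCP gateway.up
handle_event WAN_DHCP gateway.down
handle_event "" ""
```

    Only the first call prints the flush command. Quoting the variables and using a single `=` keeps the test valid in plain /bin/sh even when an argument is missing.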



  • @pfrickroll said in Dual WAN Failover doesn't failover back to WAN 1:

    How do i activate "yes"?

    To make your life easier with more complex tasks like this, I would suggest enabling SSH under System > Advanced

    Then use an SSH client such as PuTTY to connect to pfSense. When you log in, use the same admin credentials as you would for the GUI. From the SSH console menu, use option 8 to get a shell prompt; then it's easier to follow instructions like the ones above and to answer prompts like the one you got.

