Multi-WAN Setup with 4G CradlePoint Not Working



  • Hi there,

    We've been trying to set up Multi-WAN on our pfSense box, which we got from the official store (the little red box), but we just can't get it to work.
    Here is a rough description of our setup:

    pfSense Box
    Main WAN Port- Ethernet Cable - Comcast Modem
    Failover Port - Ethernet Cable - CradlePoint 4G Modem

    pfSense Settings:
    System-Miscellaneous-Checked Allow Gateway Switching

    DNS:
    8.8.8.8 (None for Monitor IP)
    8.8.4.4 (None for Monitor IP)

    Interface:
    WAN - 123.0.0.123
    FAILOVERGW - 234.0.0.234

    Gateway:
    WANGW (Default) - 123.0.0.1
    FAILOVERGW - 234.0.0.1

    Gateway Group:
    Group1
      WANGW - Tier1
      FAILOVERGW - Tier2
    Criteria - Member Down

    Rules:
    LAN - Allow IPv4 All Protocol All Ports All Source All Destination with Advanced Settings Gateway set to Group1
    Moved above the default LAN Rule
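
    To make the intended behaviour of this gateway group concrete, here is a minimal Python sketch of tier-based failover: traffic should use the lowest-numbered tier that still has a live member. It is a simplified model for illustration only, not pfSense's actual code; the `Gateway` class and `pick_gateways` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    name: str
    tier: int   # 1 = preferred, 2 = fallback, ...
    up: bool    # result of gateway monitoring (hypothetical field)


def pick_gateways(group):
    """Return the names of live members in the lowest tier that has any live member."""
    live = [gw for gw in group if gw.up]
    if not live:
        return []
    best_tier = min(gw.tier for gw in live)
    return [gw.name for gw in live if gw.tier == best_tier]


group1 = [Gateway("WANGW", tier=1, up=True), Gateway("FAILOVERGW", tier=2, up=True)]
print(pick_gateways(group1))   # both members up: Tier1 is selected

group1[0].up = False           # simulate unplugging the Comcast cable
print(pick_gateways(group1))   # Tier1 down: Tier2 should take over
```

    With the group above, the expected result after the Tier1 member goes down is that only FAILOVERGW remains selected, which is the behaviour we are not seeing.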

    First, I checked Status>Gateway and saw that the gateways were green for both individual interfaces and the group.
    I also ran the ping test under Diagnostics for both the WAN and FAILOVER interfaces and got 100% received. We also confirmed that both modems are OK by connecting them directly to our laptops, so both connections are alive.
    If we swap Tier1 and Tier2, the IP address changes, which we confirmed by going to whatismyip. But unplugging the Tier1 cable still does not switch traffic over to Tier2.
    We tried adding another Group (Group2) with the reversed Tier levels, and added both LAN Rules above the default rule but no luck.
    After we got stuck, we tried rebooting the pfSense box but it's still not working.

    When unplugging the Tier1 cable, it does show that the Tier1 Interface is down.
    Looking at the system logs, we can also see that the Tier1 interface went down, but nothing in the log comes up regarding the Tier2 interface (FAILOVER interface in this case).

    There are no packages installed on the box; it's just a simple setup. We do have IPSec set up and are planning to set up failover with DDNS, but we'll first have to get this working.
    If someone could shed some light on this, that would be great.
    Thank you.



  • Your description of the setup sounds good, especially since you can see your public IP change when you swap Tier 1<->2 in the Gateway Group.
    Maybe try selecting Trigger Level "Packet Loss or High Latency". (But I do expect that "Member Down" should work when you physically unplug the cable.)



  • Thank you very much for your help; glad to know that the setup appears to be OK.
    We'll try changing the trigger level and see if it will work. If not, I think we'll try factory resetting the router once to see if that will help. I'll post back the results after this is done.
    Thanks again!



  • OK, we've tried the following two, but it's still not working:

    • Changed Criteria to "Packet Loss or High Latency"
    • Did a Factory Reset and did everything over from scratch

    Just to test it out, we've set up the DDNS service on pfSense, and it was able to update to the new IP address in a matter of 10 - 20 seconds, but we were still not able to ping 8.8.8.8.
    We also again confirmed that switching the Tier levels around got us a new IP in a matter of seconds.

    Any help will be greatly appreciated.
    Thank you.



  • I am struggling to think of what would be causing it not to process the failover to FAILOVERGW, from your description.
    Post some screenshots of the gateways, gateway group and rule settings. Maybe that will jog a thought in my mind.
    Anyone else with an idea - feel free to comment…



  • Thank you again for your kind support; I've attached the screenshots of our setup. (Sorry for the messy editing)
    Please let me know if you need any other screenshots or info.

    P.S. We've changed the default LAN rule rather than adding a new one the second time around just to make sure that the default rule wasn't in effect.



  • "Allow default gateway switching" should make the DNS servers specified in the General Setup page switch over to the other gateway automagically - when there are just 2 gateways there should be no ambiguity about that.
    Maybe for testing try specifying a gateway for each DNS server, that is what I normally do. I realise that if you want this to really be a failover link only, then you may not want one of the DNS server paths to be going out that all the time, but it would be good to test if that helps.
    So, is this only a DNS issue?
    When you pull the cable on WAN, can you ping things by IP address? (Note down the actual IP addresses of some sites other than 8.8.8.8 and 8.8.4.4 that respond to ping. I think pfsense.org even responds to ping.)
    What comes in the system log when you pull the cable?
    There should be a bunch of messages about the WAN going down, and then about FAILOVER group membership...



  • I think it's not just a DNS issue, since we can't ping 8.8.8.8 from the computers behind the firewall, but we'll try also with 208.67.222.222 next time.
    I'll check the logs and will post a screenshot as well at that time. Last time I checked, I did see WAN going down but not about the FAILOVER group.
    Thank you!


  • Netgate

    Did you create outbound NAT rules for WAN2?  It's been a while but I'm pretty sure Multi-WAN isn't handled by Automatic Outbound NAT.

    (ETA: Automatic makes rules for multi-WAN.  I was all wet.  But if you have configured manual outbound you need to duplicate the WAN rules for WAN2).



  • OK, we did another test and got the logs, but I didn't change the DNS settings and wasn't able to get all the logs due to time constraints.
    We've also added IPSec rules, hope this doesn't matter. (At least for WAN/FAILOVER connectivity, and DNS for local PC is configured at 8.8.8.8)
    I've attached the logs, but please let me know if I should get another one with the logs right from when we unplugged the port.

    And yes, we have outbound NAT set to Automatic Outbound NAT. I think I'll go ahead and test manual outbound NAT next time and make sure to copy the rules to the FAILOVER port.
    Thank you very much for the info!



  • I can't spot what is wrong there. I do something similar at home with "FAILOVER" being a device off OPT1 that has a 3G dongle in it for mobile data.

    A traceroute from a LAN client to some internet location when working normally and when it has supposedly failed over would be interesting - see what path the packet is trying to route. (Use the actual IP address if you have difficulty getting DNS when it is failed over)
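
    If it helps, here is a rough Python sketch for comparing two saved traceroute outputs and reporting the first hop where the path differs. The function names are made up for this example, and the parsing is an assumption: it takes the first IPv4 address on each numbered hop line and treats a timed-out hop as missing.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hops(traceroute_text):
    """Extract the first IPv4 address on each hop line; timed-out hops become None."""
    result = []
    for line in traceroute_text.splitlines():
        line = line.strip()
        if not line or not line[0].isdigit():
            continue  # skip header, footer, and blank lines
        match = IPV4.search(line)
        result.append(match.group(0) if match else None)
    return result


def first_divergence(before, after):
    """Return (hop_number, before_ip, after_ip) for the first differing hop, or None."""
    a, b = hops(before), hops(after)
    for i in range(max(len(a), len(b))):
        hop_a = a[i] if i < len(a) else None
        hop_b = b[i] if i < len(b) else None
        if hop_a != hop_b:
            return (i + 1, hop_a, hop_b)
    return None
```

    Feed it the text of a traceroute taken while things work normally and one taken after pulling the WAN cable. If the first difference is at the hop right after the firewall, the packet is probably leaving on the wrong interface or not leaving at all.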

    /tmp/rules.debug contains the pf rule set - take a copy of that before failover, and then during failover, and compare. pfSense should change things in that to push the traffic out a different interface.
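
    One way to do that comparison, sketched in Python with the standard `difflib` module. It assumes you have already copied /tmp/rules.debug to two files, one before and one during failover; the file names `rules.before` and `rules.failover` are just placeholders.

```python
import difflib
import sys


def diff_rules(before_text, after_text):
    """Return unified-diff lines between two saved copies of the rule set."""
    return list(difflib.unified_diff(
        before_text.splitlines(),
        after_text.splitlines(),
        fromfile="rules.before",    # placeholder label for the pre-failover copy
        tofile="rules.failover",    # placeholder label for the copy taken during failover
        lineterm="",
    ))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_rules.py rules.before rules.failover
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        for line in diff_rules(f_before.read(), f_after.read()):
            print(line)
```

    An empty diff would suggest the rule set was never regenerated for the failover; lines that differ only in the gateway or interface are what you would hope to see.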

    There has to be some simple setting somewhere that needs changing - this sort of thing works for me in a few locations.

    Anyone else got a good idea?



  • Leave it on automatic outbound NAT; if there is any problem there, it's in the interface's config. You'll need at least one DNS server going out the second WAN, though if you can't ping out by IP, that wouldn't fix the issue.

    How is your "Failover" interface configured under Interfaces>Failover? If it's static IP, you must select the gateway on that page. If no gateway is selected there, you're telling the system that it's not an Internet connection, which will break the automatic outbound NAT config as well as a variety of other things.



  • First of all, thank you everyone for all your help.
    And sorry, I haven't had a chance to test out the DNS side just yet, I'll post the results along with the traceroute and the rule set before/after the failover at that time.

    As far as the interface setup, yes it is set with Static IP with a separate gateway for the failover interface (FAILOVERGW).
    Thanks again.



  • OK, I finally got the chance to test this out again; it's still not working.
    The steps I did are as follows, along with the system logs.

    1. Plugged both cables into the pfSense box
    2. Started pinging 8.8.8.8
    3. Unplugged the cable from the Comcast modem
    4. Checked whether the dynamic DNS was updated; got a ping back from the new IP in less than a minute
    5. Waited 20 minutes for the 8.8.8.8 pings to come back, but got no response
    6. Plugged the Comcast modem's cable back in; pings to 8.8.8.8 returned instantly
    7. Checked whether the dynamic DNS was updated; got a ping back from the new IP in less than a minute
    8. The IPSec VPN did not come back until I restarted the racoon service on the other side (also pfSense)

    Jan 23 16:07:54 check_reload_status: Linkup starting vr1
    Jan 23 16:07:54 kernel: vr1: link state changed to DOWN
    Jan 23 16:07:59 php: rc.linkup: Hotplug event detected for WAN(wan) but ignoring since interface is configured with static IP (... )
    Jan 23 16:08:12 check_reload_status: updating dyndns WANGW
    Jan 23 16:08:12 check_reload_status: Restarting ipsec tunnels
    Jan 23 16:08:12 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Jan 23 16:08:12 check_reload_status: Reloading filter
    Jan 23 16:08:21 php: rc.dyndns.update: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:22 php: rc.dyndns.update: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:22 php: rc.filter_configure_sync: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:22 php: rc.dyndns.update: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:22 php: rc.filter_configure_sync: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:22 php: rc.dyndns.update: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:22 php: rc.dyndns.update: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:22 php: rc.dyndns.update: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:22 php: rc.dyndns.update: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:23 php: rc.dyndns.update: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:24 php: rc.dyndns.update: phpDynDNS: updating cache file /conf/dyndns_FAILGROUP***'..'0.cache: ...***
    Jan 23 16:08:24 php: rc.dyndns.update: phpDynDNS (...): (Success) DNS hostname update successful.
    Jan 23 16:08:36 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
    Jan 23 16:08:37 php: rc.newipsecdns: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:37 php: rc.newipsecdns: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:37 php: rc.newipsecdns: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:37 php: rc.newipsecdns: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:37 php: rc.newipsecdns: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:37 php: rc.newipsecdns: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:37 php: rc.newipsecdns: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:37 php: rc.newipsecdns: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:37 php: rc.newipsecdns: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:37 php: rc.newipsecdns: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:37 php: rc.newipsecdns: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:37 php: rc.newipsecdns: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:37 php: rc.newipsecdns: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:37 php: rc.newipsecdns: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:37 php: rc.newipsecdns: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:37 php: rc.newipsecdns: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:08:37 php: rc.newipsecdns: Default gateway down setting FAILOVERGW as default!
    Jan 23 16:08:37 php: rc.newipsecdns: MONITOR: WANGW is down, removing from routing group FAILGROUP
    Jan 23 16:24:38 check_reload_status: Linkup starting vr1
    Jan 23 16:24:38 kernel: vr1: link state changed to UP
    Jan 23 16:24:42 php: rc.linkup: Hotplug event detected for WAN(wan) but ignoring since interface is configured with static IP (... )
    Jan 23 16:24:42 check_reload_status: rc.newwanip starting vr1
    Jan 23 16:24:47 php: rc.newwanip: rc.newwanip: Informational is starting vr1.
    Jan 23 16:24:47 php: rc.newwanip: rc.newwanip: on (IP address: ...) (interface: WAN[wan]) (real interface: vr1).
    Jan 23 16:24:47 check_reload_status: Reloading filter
    Jan 23 16:24:49 check_reload_status: updating dyndns WANGW
    Jan 23 16:24:49 check_reload_status: Restarting ipsec tunnels
    Jan 23 16:24:49 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Jan 23 16:24:49 check_reload_status: Reloading filter
    Jan 23 16:25:00 php: rc.dyndns.update: phpDynDNS: updating cache file /conf/dyndns_FAILGROUP***'..'0.cache: ...***
    Jan 23 16:25:00 php: rc.dyndns.update: phpDynDNS (..***): (Success) DNS hostname update successful.
    Jan 23 16:25:12 php: rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
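
    Since the log is very repetitive, here is a small Python sketch that collapses repeated messages and measures the gap between the link going DOWN and coming back UP. The `parse` and `summarize` names are made up for this example, and it assumes every line starts with a "Mon DD HH:MM:SS" stamp and that all events fall on the same day.

```python
from collections import Counter
from datetime import timedelta


def parse(line):
    """Split 'Jan 23 16:07:54 proc: message' into (time of day, message)."""
    _month, _day, clock, message = line.strip().split(None, 3)
    h, m, s = (int(part) for part in clock.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s), message


def summarize(lines):
    """Count each distinct message and measure the first link DOWN to last UP gap."""
    counts = Counter()
    down_at = up_at = None
    for line in lines:
        if not line.strip():
            continue
        when, message = parse(line)
        counts[message] += 1
        if "link state changed to DOWN" in message and down_at is None:
            down_at = when
        elif "link state changed to UP" in message and down_at is not None:
            up_at = when
    outage = up_at - down_at if down_at is not None and up_at is not None else None
    return counts, outage


log = """\
Jan 23 16:07:54 kernel: vr1: link state changed to DOWN
Jan 23 16:08:21 php: rc.dyndns.update: Default gateway down setting FAILOVERGW as default!
Jan 23 16:08:22 php: rc.dyndns.update: Default gateway down setting FAILOVERGW as default!
Jan 23 16:24:38 kernel: vr1: link state changed to UP
"""
counts, outage = summarize(log.splitlines())
print(outage)
for message, n in counts.most_common():
    print(n, message)
```

    Going by the timestamps above, the first "setting FAILOVERGW as default" message appears about half a minute after the link went down, so the failover decision itself seems to be made; the open question is why traffic does not follow it.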

    Also, sorry for not mentioning this earlier, but we did try to install and run squid at first. We couldn't get it working, so we stopped it and uninstalled it.
    After that we even did a factory reset, but could it be that this is still running in the background?

    Any help is greatly appreciated, thank you.


  • Netgate

    Did you leave the same ping running and fail over the ports?  I don't think failover moves states from one interface to the other, does it?  That would be a trick, considering the inside global address changes.

    Did you try stopping the ping and starting a new one while it was failed over?

    It minimizes downtime but it's not magic.
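
    A rough Python sketch of that "stop it and start a new one" test: it launches a separate one-shot ping process for every probe instead of leaving one ping running. The function names are made up for this example. Whether a new process actually gets a new firewall state depends on the client OS and the state timeout, so treat this as a convenience, not proof.

```python
import platform
import subprocess
import time


def ping_command(host, system=None):
    """Build a one-shot ping command; Windows uses -n, most Unix-like systems use -c."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def probe(host, run=subprocess.run):
    """Run one new ping process and report whether it exited with code 0.

    The exit code is only an approximation of "got a reply"; on Windows,
    ping may exit 0 even for a "Destination host unreachable" answer.
    """
    result = run(ping_command(host), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def watch(host, attempts=10, interval=2.0, probe_fn=probe, sleep=time.sleep):
    """Probe repeatedly, starting a fresh ping process each time."""
    results = []
    for _ in range(attempts):
        results.append(probe_fn(host))
        sleep(interval)
    return results
```

    Calling `watch("8.8.8.8")` from a LAN client while the WAN cable is pulled gives a list of True/False results, one per fresh ping.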



  • Thank you for your support!
    We tried stopping the ping and starting a new one from a new command prompt, but it was still not working.
    Then we noticed that the new version 2.2 had come out, so we tested it, and we are now able to browse the web just fine. Thank you again for all your support!

    But now when setting up the VPN with the multi-wan setup, it's not working.
    If someone can shed some light on this problem, that would be great.
    Here's our setup:

    1. Setup Dynamic DNS for the Failover Group
    2. Setup VPN with Interface as the Failover Group
    3. Setup VPN on the other side with Remote Gateway as the Dynamic DNS hostname

    The VPN is working fine if everything is connected.
    When unplugging the main WAN line, Dynamic DNS does get updated; pings start going to the alternate IP, and the pfSense on the other side does try to connect to the alternate IP.
    However, although the IPSec Status does show the Local IP as the alternate IP address, the VPN connection status gets stuck at "Connecting…".
    Looking at the logs, it does show that it is attempting to still connect with the main IP. Is there a setting we missed somewhere?
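
    For reference, this is roughly how we check what the Dynamic DNS hostname currently resolves to, as a small Python sketch. The hostname and address in the usage note are placeholders, and the function names are made up. It only shows what the resolver on the machine running the script returns; it says nothing about an address the VPN daemon may still have cached.

```python
import socket


def resolved_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted set of IPv4 addresses the hostname currently resolves to."""
    infos = resolver(hostname, None, socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def points_at(hostname, expected_ip, resolver=socket.getaddrinfo):
    """True if the hostname resolves to exactly the expected address."""
    return resolved_addresses(hostname, resolver) == [expected_ip]
```

    For example, `points_at("vpn.example.com", "203.0.113.20")` would be run on the remote side after the failover, with the real hostname and the alternate IP substituted in.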



  • After writing the above, it seems that the VPN connection has also become unstable… it will lose connection about 3-4 times a day.
    When the connection gets lost, I was still able to access the local firewall from the other side of the VPN Connection, but the local users weren't able to ping the other side. Restarting the racoon service on the local firewall seems to fix the issue temporarily.
    I'll update this post if I find anything new; any advice is welcome.
    Thank you!



  • Now it seems that I can't connect from the other side either…
    My VPN status says connected, but there were 4 child SA entries with 0 Bytes-in. My Key Exchange version is set to V1, so I'll try switching to V2 and see how it goes.
    I'll update the results here, thank you.



  • OK, here's the final update!
    After going through the VPN forum, the general consensus was that IPSec in version 2.2 is somewhat broken, so we switched the VPN to OpenVPN.
    We tested the failover, and it switched over to the failover port and re-established the VPN connection in about 15 seconds. All is good! (Now I'm regretting getting the dynamic dns service set up, but anyway)
    So I'll mark this as officially solved, thank you very much for all your support.
    Best regards.