OpenVPN Client Multi-WAN failover

sseidel

Hi,

This is all highly experimental and buggy, but since it is still very useful, and since I saw such encouraging messages from the admins that they're working on improving this [1][2], I thought I'd share it.

This has been asked numerous times without satisfactory answers, simply because it wasn't possible in pfSense 1.2.x [3], and some advanced hackery makes it somewhat possible in 2.0.

Situation:

pfSense is used as an OpenVPN client (the server side is not relevant)
you have more than one WAN (e.g. DSL and 3G)
you need high availability of the OpenVPN connection, i.e. if DSL fails, the VPN should still be reachable

Problem:

even though Multi-WAN can be set up easily in pfSense, policy-based routing (needed for failover and load balancing) does not apply to traffic generated by pfSense itself
so the OpenVPN client running on pfSense itself cannot use failover, even though a client on the LAN making the same OpenVPN connection will fail over just fine

Solution:

set up your two WAN interfaces
make sure that the WAN gateway is pingable, otherwise choose a pingable target and create a GW_PINGABLE gateway under System->Routing
set up failover using a gateway group (System->Routing->Groups), select the primary WAN as Tier 1 and the secondary WAN (or the GW_PINGABLE) as Tier 2. Let's call the group MUWAN_GW.
check that both interfaces are up and the gateways pingable under Status->Gateways->Gateway Groups
set up your OpenVPN client, select LAN as the interface and 1194 as the local port (important), and let it use UDP (I have never tried TCP, actually)
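Because the whole trick depends on the client being bound to the LAN address on port 1194, it may help to confirm that binding from the pfSense shell. The sketch below assumes FreeBSD `sockstat` output with the columns USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS; the function name `openvpn_bound_to` and the address 192.168.1.1 are hypothetical placeholders.

```sh
#!/bin/sh
# Sketch: succeed if an openvpn process owns <LAN IP>:1194 over UDP.
# Assumption: sockstat columns are USER COMMAND PID FD PROTO LOCAL FOREIGN.
# openvpn_bound_to is a hypothetical helper name, not part of pfSense.

# Reads sockstat output on stdin; $1 is the LAN address of pfSense.
openvpn_bound_to() {
    lan_ip="$1"
    awk -v want="${lan_ip}:1194" '
        # column 2 = command, column 5 = protocol (udp4/udp6), column 6 = local address
        $2 == "openvpn" && $5 ~ /^udp/ && $6 == want { found = 1 }
        END { if (found) exit 0; exit 1 }
    '
}

# On the pfSense box itself (192.168.1.1 is a placeholder LAN address):
#   sockstat -4 -p 1194 | openvpn_bound_to 192.168.1.1 && echo "bound to LAN:1194"
```

If the check fails, the client is probably still bound to WAN or to a random local port, and the NAT and floating rules below will not match its traffic.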

Now head on to the firewall. I will describe a very basic and rough setup which works but needs to be refined for production use.

Firewall->NAT->Outbound: Select AON (Advanced Outbound NAT) and click save. I prefer to create the rules from scratch and delete the pre-created ones, but if you do not, make sure that your soon-to-be-created rules are on top of all the others.
create two rules, one for WAN and one for your secondary WAN. Select "Static Port". Set at least the protocol to UDP, and optionally 1194 as source and destination port (we route all traffic through the VPN, so I usually don't care about any other NAT rules). I also set source and destination to any.
again, either make sure that these are the only two rules or that they're on top
Firewall->Rules->Floating: create one rule, select interfaces WAN, LAN and your secondary WAN (don't ask), Direction out, Protocol UDP, and as Gateway select MUWAN_GW (the group you created before). If you want, narrow down the source and destination port range or the destination IP.
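To check that the GUI settings actually ended up in the loaded ruleset, one can dump the active NAT and filter rules with `pfctl -s nat` and `pfctl -s rules` and filter for the relevant keywords. This is a sketch under the assumption that "Static Port" shows up as the pf keyword `static-port` and a gateway group on a rule as `route-to`; the exact rule text pfSense generates may differ, and `relevant_rules` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: keep only the pf rule lines that matter for this setup.
# Assumption: static-port NAT and gateway routing appear as the pf keywords
# "static-port" and "route-to". relevant_rules is a hypothetical helper.

# Reads pf rule text on stdin and prints the matching lines.
relevant_rules() {
    grep -E 'static-port|route-to'
}

# On the pfSense box itself:
#   { pfctl -s nat; pfctl -s rules; } | relevant_rules
```

You would expect one `static-port` NAT line per WAN and a `route-to` line for the floating rule; if either is missing, re-check the rule order and the AON setting.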

In action:
It's not ideal, but it works. Your OpenVPN connection should start up using WAN. Now make the WAN connection fail ("disconnected"), but do not pull the Ethernet cable (more under Caveats). After the ping timeout, your OpenVPN connection will restart over the secondary WAN. However, it will not switch back when WAN comes back. Only when the secondary WAN goes down (if it is 3G/UMTS/other PPP, you can disconnect it in Status->Interfaces) will it renegotiate over WAN after the ping timeout.
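While testing the failover, it can help to watch the pf states of the tunnel to see which WAN address it currently uses. The sketch below assumes that lines from `pfctl -s state` contain the protocol name and `address:port` pairs; the layout differs between pf versions, and `tunnel_states` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: print pf state lines for UDP traffic on port 1194 (the tunnel).
# Assumption: state lines contain "udp" and "addr:port"; layout varies.
# tunnel_states is a hypothetical helper name.

tunnel_states() {
    # ":1194" must not be followed by another digit (so 11940 is excluded)
    grep -E 'udp.*:1194([^0-9]|$)'
}

# On the pfSense box, refresh every 5 seconds while pulling the WAN upstream:
#   while :; do clear; pfctl -s state | tunnel_states; sleep 5; done
```

After the ping timeout, the state should show the secondary WAN's address instead of the primary's.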

(Hint: if you enable the "Dynamic IP" option on the OpenVPN server, all TCP connections will be kept open, so except for a short "hang", users/clients will not notice the change)

Caveats:

Do not pull Ethernet cables: if an Ethernet link is down (esp. on WAN), you will get the dreaded "no buffer space available" errors in your OpenVPN log and it will not reconnect. I have no idea how to solve this, except to put an Ethernet switch right next to your pfSense router, between it and the modem. That way, the modem/router on WAN can fail completely (power down etc.) while the Ethernet link on pfSense stays "up", and the gateway ping will detect that the link is unusable.
There's no way back: once the WAN link has failed and OpenVPN uses the secondary, it will never switch back unless the secondary WAN also goes down. Of course, you will want to make sure that this only happens once the primary WAN is back up.
Side effects: I haven't tested everything, so there could be side effects, especially if you rely on other services on the same host, use NAT, or try to be creative with your firewall rules.

Looking forward to feedback :)

Stefan

References:
[1] http://redmine.pfsense.org/issues/1206
[2] http://forum.pfsense.org/index.php/topic,31125.msg167503.html (for DNS queries, but basically the same)
[3] pfSense book, ch. 11.3.3

jimp

I was trying to do that last week and couldn't make it work, though my floating rules were a bit different and I didn't have it set to listen on LAN.

What you have is probably the best way to make it happen, though you are right, it won't automatically fail back. There is no good way to ensure it will fail back cleanly. There is an open ticket to allow a user to choose to kill all states on the secondary link when the primary comes back up. That's not ideal for most people, but for people with an expensive secondary (like 3G) it may be a good choice.

sseidel

Thanks for the feedback, Jim. I also had a config working once where at least the fallback was immediate, but then it would only fall back once.

Some further things I found out:

this config works even with more than 2 WANs: just add another interface, include its gateway in the group, add a NAT rule and include the additional WAN(s) in the floating rule.
killing the OpenVPN state and then sending SIGUSR1 makes the reconnect faster, and also gives a (somewhat) graceful fallback to the primary WAN (when it is up) without forcing the secondary WAN to go down. Maybe one could add this as an afterfilterchange command (I think that's what it is called):

pfctl -k <lan ip="">-k <openvpn server="" ip="">killall -USR1 openvpn</openvpn></lan>

I have only tried this on the command line, though.
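For anyone who wants to try the two commands above as a reusable script, here is a minimal sketch. It has only been reasoned about, not run on pfSense; `LAN_IP`, `VPN_SERVER_IP`, `DRY_RUN`, `run` and `reset_tunnel` are hypothetical names. `pfctl -k host -k host` kills the states from the first host to the second, and SIGUSR1 makes OpenVPN perform a soft restart of its connection, which then goes through the floating rule again. Presumably, if the primary WAN is still down, the gateway group simply routes the new connection over the secondary again.

```sh
#!/bin/sh
# Sketch of the manual fail-back: drop the tunnel's pf state, then soft-restart
# OpenVPN so the new connection is routed via the gateway group again.
# LAN_IP and VPN_SERVER_IP are hypothetical variables you must set yourself.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reset_tunnel() {
    if [ -z "$LAN_IP" ] || [ -z "$VPN_SERVER_IP" ]; then
        echo "set LAN_IP and VPN_SERVER_IP first" >&2
        return 1
    fi
    # Kill all states from the LAN address (where the client binds) to the server.
    run pfctl -k "$LAN_IP" -k "$VPN_SERVER_IP"
    # SIGUSR1: OpenVPN restarts the connection without exiting.
    run killall -USR1 openvpn
}

# Example (addresses are placeholders):
#   LAN_IP=192.168.1.1 VPN_SERVER_IP=198.51.100.20 DRY_RUN=1 reset_tunnel
```

Running it with `DRY_RUN=1` first shows exactly what would be executed before touching the live states.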

Stefan