Load Balancing and Failover with 2 pfSense and 2 OpenVPN servers

Skytaxi

Dear all,

First of all, if my post is a duplicate, please point me to the solved topic. I have not found one yet; there are so many threads :(

My topology:

2 pfSense servers with CARP failover, using 1 VIP to represent both pfSense servers.
2 external OpenVPN servers. Connecting 1 pfSense to 1 OpenVPN server already works.
Internal clients use the VIP as their default route. Clients --> pfSense (VIP) --> External OpenVPN --> Internet

Is it possible to do load balancing and failover between the 2 pfSense servers and the 2 OpenVPN servers as follows:

Client A, for example, connects to pfSense 1 (via the VIP) and reaches the Internet through OpenVPN server 1. If OpenVPN server 1 goes down, A switches to OpenVPN server 2.
While both OpenVPN servers are alive, connections are balanced between the 2 servers by client source IP address.
In any case, as long as 1 pfSense and 1 OpenVPN server are still alive, everything should keep working!

Derelict

Establish two OpenVPN instances with tunnel networks only. No local or remote networks.
Create OpenVPN assigned interfaces for each tunnel
Create a gateway group for both OpenVPN gateways (Load balance, tiered failover, whatever)
NAT the outbound traffic on the tunnels
Policy-route traffic to the gateway group.

This is really no different than traditional Multi-WAN.

If an HA failover event occurs, the primary will drop the OpenVPN connections and the secondary will assume the CARP VIPs and bring up both OpenVPN connections.

It will not load balance based on source address but by the gateway weights, sticky connections, etc.
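
As a rough illustration of how a gateway group chooses among its members, here is a minimal Python sketch. It is a simplified model and not pfSense's actual implementation: the `Gateway` class, the `pick_gateway` function, and the tier and weight values are assumptions made for illustration. The lowest tier with a live member wins, and within that tier members are picked in proportion to their weight.

```python
import random
from dataclasses import dataclass


@dataclass
class Gateway:
    name: str
    tier: int        # lower tier is preferred
    weight: int = 1  # relative share within a tier
    up: bool = True


def pick_gateway(group, rng=random):
    """Return a live gateway from the lowest populated tier, or None."""
    alive = [gw for gw in group if gw.up]
    if not alive:
        return None
    best_tier = min(gw.tier for gw in alive)
    candidates = [gw for gw in alive if gw.tier == best_tier]
    weights = [gw.weight for gw in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Same tier: load balancing. Different tiers: failover.
balanced = [Gateway("VPN1_GW", tier=1), Gateway("VPN2_GW", tier=1)]
failover = [Gateway("VPN1_GW", tier=1), Gateway("VPN2_GW", tier=2)]

print(pick_gateway(failover).name)   # VPN1_GW while it is up
failover[0].up = False
print(pick_gateway(failover).name)   # VPN2_GW after tier 1 goes down
```

Note that nothing in this selection looks at the client's source address, which matches the point above.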

Skytaxi

Thank you for your reply. I was making things much too complex. I will try it this way and post my results when done.

Skytaxi

Here are my settings:

2 WAN interfaces, 1 LAN interface.
One VPN tunnel per WAN.
After both tunnels peered successfully, I created 1 Gateway Group from the 2 tunnel interfaces.
I added a firewall rule (e.g. destination IP 8.8.8.8) that uses this Gateway Group as its gateway.

I ping 8.8.8.8 and it works; running tcpdump on both external OpenVPN servers, I see the traffic on one of them.
In Status -> OpenVPN I then stop that tunnel ==> the ping is lost (the remaining tunnel is OK).

Derelict

Your pings will not continue if you shut down one of the VPNs. You have to create a new firewall state. Stop the ping and restart it.

There is no way to move a state from one "WAN" to another. Transition for your users will be far from hitless.
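
The behaviour described above can be pictured with a toy state table. This is only a sketch under stated assumptions: the `Firewall` class, its methods, and the client address `10.0.0.5` are hypothetical, and real state handling in pf is far more involved. The point it shows is that a flow stays pinned to the gateway chosen when its state was created.

```python
class Firewall:
    """Toy model: each flow is pinned to the gateway chosen at state creation."""

    def __init__(self, gateways):
        self.up = {name: True for name in gateways}  # gateway -> alive?
        self.states = {}                             # (src, dst) -> gateway

    def route(self, src, dst):
        """Return the gateway used for this packet, or None if it is lost."""
        flow = (src, dst)
        if flow not in self.states:
            # New flow: pick the first gateway that is currently up.
            alive = [gw for gw, ok in self.up.items() if ok]
            if not alive:
                return None
            self.states[flow] = alive[0]
        gw = self.states[flow]
        # An existing state is never moved to another gateway.
        return gw if self.up[gw] else None

    def kill_state(self, src, dst):
        """Model stopping the ping so that the old state goes away."""
        self.states.pop((src, dst), None)


fw = Firewall(["VPN1_GW", "VPN2_GW"])
print(fw.route("10.0.0.5", "8.8.8.8"))  # VPN1_GW
fw.up["VPN1_GW"] = False
print(fw.route("10.0.0.5", "8.8.8.8"))  # None: state still points at VPN1_GW
fw.kill_state("10.0.0.5", "8.8.8.8")
print(fw.route("10.0.0.5", "8.8.8.8"))  # VPN2_GW: new state, new gateway
```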

A quick test is:

Change the rule from the gateway group to VPN1_GW. Can you ping 8.8.8.8?

Change the rule from the gateway group to VPN2_GW. Can you ping 8.8.8.8?

If that doesn't work, the GW group won't work either.

Skytaxi

My apologies for using ping as the test. I now use ifconfig.io to check which public IP address is in use. It's OK now.

tunnel_1 down ---> tunnel_2 is chosen automatically. I notice that the client gets a short timeout before the connection through tunnel_2 succeeds. I use the Packet Loss trigger level (VPNs usually have high latency); maybe this timeout comes from the packet loss check interval?
tunnel_2 alive ---> tunnel_1 still in use ---> bring tunnel_1 down ---> tunnel_2 is used

Panerist

I am trying to configure a similar setup…

@Derelict:

Establish two OpenVPN instances with tunnel networks only. No local or remote networks.
Create OpenVPN assigned interfaces for each tunnel
Create a gateway group for both OpenVPN gateways (Load balance, tiered failover, whatever)
NAT the outbound traffic on the tunnels
Policy-route traffic to the gateway group.

…but only managed to do the first 3 steps so far. Could you please elaborate a bit on the proper NAT and policy-routing setting in this case? Please see the scheme attached.

Thank you!

[Attachment: netw_scheme.gif]

Derelict

You probably do not need any outbound NAT in your case. That is more for connecting to OpenVPN providers such as PIA as multiple WANs.

pfSense A:
Firewall > Rules, LAN (10.88.88.0 interface)

pass rule source LAN net dest 10.55.55.0/24 Advanced set the gateway to GG_B
pass rule source LAN net dest 10.77.77.0/24 Advanced set the gateway to GG_C

If you are only initiating connections from A to B/C you are done. reply-to will handle the reply traffic at the remote ends. Else you have to do the same on both B and C.

pfSense B:
Firewall > Rules, LAN (10.55.55.0 interface)

pass rule source LAN net dest 10.88.88.0/24 Advanced set the gateway to GG_A

pfSense C:
Firewall > Rules, LAN (10.77.77.0 interface)

pass rule source LAN net dest 10.88.88.0/24 Advanced set the gateway to GG_A

If you want pfSense B LAN to be able to speak with pfSense C LAN, you need to also add these policy routes to the LAN interface:

pfSense B:

pass rule source LAN net dest 10.77.77.0/24 Advanced set the gateway to GG_A

pfSense C:

pass rule source LAN net dest 10.55.55.0/24 Advanced set the gateway to GG_A

Or use aliases for the destination networks, etc.
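
To make the rule logic concrete, here is a small Python sketch of first-match policy routing for the pfSense A LAN rules listed above. It is an illustration only: the `policy_route` function and the final catch-all rule are assumptions, not something taken from the pfSense configuration.

```python
import ipaddress

# (destination network, gateway) in rule order; None means "use the routing table".
LAN_RULES_A = [
    (ipaddress.ip_network("10.55.55.0/24"), "GG_B"),
    (ipaddress.ip_network("10.77.77.0/24"), "GG_C"),
    (ipaddress.ip_network("0.0.0.0/0"), None),  # assumed default allow rule
]


def policy_route(dst, rules):
    """Return the gateway of the first rule whose destination contains dst."""
    addr = ipaddress.ip_address(dst)
    for network, gateway in rules:
        if addr in network:
            return gateway
    raise LookupError(f"no rule matches {dst}")  # would be blocked


print(policy_route("10.55.55.5", LAN_RULES_A))  # GG_B
print(policy_route("10.77.77.7", LAN_RULES_A))  # GG_C
print(policy_route("8.8.8.8", LAN_RULES_A))     # None
```

Because the first matching rule wins, the policy-route rules have to sit above any broader pass rule on the same tab.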

Panerist

Thanks for the quick reply.

Originally I tried to configure policy-based routing (PBR) exactly as you suggested, but with no luck. That's why I thought I missed something important.

I simplified the setup a bit and tried to configure PBR just for one OpenVPN tunnel first, not for a gateway group. Again, it is not working. At the same time, if I add the static routes on both ends, everything works just fine.

It seems there is something wrong either with some other settings or PBR is not working in this particular case.

I'm pretty sure that the rest of my settings for all three sites are default. Outbound NAT is in automatic rule generation mode. No Snort, no Suricata, no Traffic Shapers, no pfBlockerNG. The firewall rules for PBR are at the top of the list on the LAN tab (IPv4 any protocol from LAN net to 10.55.55.0/24 gateway WAN_A1-to-WAN_B and the like).

Does it matter which interface the OpenVPN server listens on? In my case, it listens on localhost (pfSense A in the scheme) and there is a port forward from each of the two WANs to localhost.

What direction would you suggest to dig deeper in for troubleshooting?

Derelict

Yes, the port forward on the Multi-WAN side is how I would do it too.

Probably packet captures on the OpenVPN interfaces. See what's happening there.

"Doesn't work" isn't a lot of information to go on.

Panerist

Somehow it worked, but only partially. Last time I tried to ping by FQDN instead of IP address. As I understand it, there is no connectivity between the pfSense boxes themselves in such a setup, because policy-based routing does not apply to traffic originating on the firewall itself. So the DNS Resolver on pfSense Site A could not query the DNS Resolvers on Sites B and C.

Now there is ping between server and client networks in both directions, but no ping between the client networks.

ping results:
A to B: from 10.88.88.5 to 10.55.55.5 = YES
A to C: from 10.88.88.5 to 10.77.77.7 = YES
B to A: from 10.55.55.5 to 10.88.88.5 = YES
C to A: from 10.77.77.7 to 10.88.88.5 = YES
B to C: from 10.55.55.5 to 10.77.77.7 = NO
C to B: from 10.77.77.7 to 10.55.55.5 = NO

Results of “tcpdump -ni [interface] icmp” while pinging from 10.77.77.7 to 10.55.55.5:

pfSense C, LAN interface: ICMP echo request only
pfSense C, OpenVPN assigned interface for WAN_A1-to-WAN_C tunnel: ICMP echo request only
pfSense C, OpenVPN assigned interface for WAN_A2-to-WAN_C tunnel: ICMP echo request only
pfSense A, OpenVPN assigned interface for WAN_A1-to-WAN_C tunnel: ICMP echo request only
pfSense A, OpenVPN assigned interface for WAN_A2-to-WAN_C tunnel: ICMP echo request only
pfSense A, OpenVPN assigned interface for WAN_A1-to-WAN_B tunnel: ICMP echo request and ICMP echo reply
pfSense A, OpenVPN assigned interface for WAN_A2-to-WAN_B tunnel: ICMP echo request and ICMP echo reply
pfSense B, OpenVPN assigned interface for WAN_A1-to-WAN_B tunnel: ICMP echo request and ICMP echo reply
pfSense B, OpenVPN assigned interface for WAN_A2-to-WAN_B tunnel: ICMP echo request and ICMP echo reply
pfSense B, LAN interface: ICMP echo request and ICMP echo reply

So, it seems that the ICMP request is properly routed in one direction…

                            10.3.8.2 -> 10.3.8.1    10.2.8.1 -> 10.2.8.2
10.77.77.7 -> 10.77.77.1 ->          or          ->          or          -> 10.55.55.1 -> 10.55.55.5
                            10.5.8.2 -> 10.5.8.1    10.4.8.1 -> 10.4.8.2

…but the reply fails to return back:

                            10.3.8.2    10.3.8.1    10.2.8.1 <- 10.2.8.2
10.77.77.7    10.77.77.1             or          X           or          <- 10.55.55.1 <- 10.55.55.5
                            10.5.8.2    10.5.8.1    10.4.8.1 <- 10.4.8.2

The following settings differ from (or were not mentioned in) your previous posts:

Site A – VPN / OpenVPN / Client Specific Overrides (otherwise no connectivity at all):
Client_B (apply to ovpns1 and ovpns5):
iroute 10.55.55.0 255.255.255.0
Client_C (apply to ovpns2 and ovpns6):
iroute 10.77.77.0 255.255.255.0
Site A – Firewall / Rules / OpenVPN:
Empty
Site A – Firewall / Rules / OpenVPN WAN_A1-to-WAN_B:
Protocol Source Port Destination Port Gateway Queue
IPv4 * * * 10.77.77.0/24 * GG_C none
IPv4 * * * * * * none
Site A – Firewall / Rules / OpenVPN WAN_A1-to-WAN_C:
Protocol Source Port Destination Port Gateway Queue
IPv4 * * * 10.55.55.0/24 * GG_B none
IPv4 * * * * * * none
Site A – Firewall / Rules / OpenVPN WAN_A2-to-WAN_B:
Protocol Source Port Destination Port Gateway Queue
IPv4 * * * 10.77.77.0/24 * GG_C none
IPv4 * * * * * * none
Site A – Firewall / Rules / OpenVPN WAN_A2-to-WAN_C:
Protocol Source Port Destination Port Gateway Queue
IPv4 * * * 10.55.55.0/24 * GG_B none
IPv4 * * * * * * none
Site B – Firewall / Rules / OpenVPN:
Empty
Site B – Firewall / Rules / OpenVPN WAN_A1-to-WAN_B:
Protocol Source Port Destination Port Gateway Queue
IPv4 * * * * * * none
Site B – Firewall / Rules / OpenVPN WAN_A2-to-WAN_B:
Protocol Source Port Destination Port Gateway Queue
IPv4 * * * * * * none
Site C – Firewall / Rules / OpenVPN:
Empty
Site C – Firewall / Rules / OpenVPN WAN_A1-to-WAN_C:
Protocol Source Port Destination Port Gateway Queue
IPv4 * * * * * * none
Site C – Firewall / Rules / OpenVPN WAN_A2-to-WAN_C:
Protocol Source Port Destination Port Gateway Queue
IPv4 * * * * * * none
Sites A,B,C – System / Advanced / Firewall & NAT (just in case, not sure if this could influence):
NAT Reflection mode for port forwards: Disabled
Enable NAT Reflection for 1:1 NAT: Unchecked
Enable automatic outbound NAT for Reflection: Unchecked

I must be doing something wrong. Would be very grateful for your help.

Derelict

Can you narrow all that down to one specific site/link and one specific problem? Maybe just repaste that with what you're referring to in bold or something?

DNS forwarder is more compatible in that situation because you can set a specific domain override to source from a specific source address (one that is interesting to OpenVPN, like LAN address). You can do a similar thing in the resolver using the Outgoing Network Interfaces setting using something like LAN there but that affects all queries all the time and is not granular like the domain overrides in the forwarder.

In all honesty such a situation, especially at a larger site, is really a sign you are out-growing DNS resolver on the firewall itself and might be better off using DNS resolvers on an inside network instead. They can be a simple pfSense install there too, just used for unbound. All of those strange sourced-from-firewall and policy routing issues just solve themselves in that case. Same can apply to RADIUS servers, etc. Moving them to something on LAN is often really the best, most-reliable solution.

Panerist

Sure. Just one problem - no connectivity between OpenVPN client networks (i.e. Site_B and Site_C, or 10.55.55.0/24 and 10.77.77.0/24).

Derelict

~~It looks like you have to add 10.55.55.0/24 as a remote network on the 10.77.77.0/24 side and 10.77.77.0/24 as a remote network on the 10.55.55.0/24 side. The routing table has to have those remote networks in its routing table so it knows to send that traffic over OpenVPN. This can be done using Local Networks in the CSOs.

OpenVPN should handle it from there.

This is why it is often easier to design your network so you can route a supernet into OpenVPN and subdivide it using CSOs.

Like a policy route for 10.64.0.0/16 into the gateway group.

CSO:

Local Networks 10.64.0.0/16 (Places a route in the client's routing table sending all traffic for 10.64.0.0/16 into OpenVPN)
Remote Networks: 10.64.0.0/22 (Places an iroute sending that network to that client - client site can then use 10.64.0.0/24 to 10.64.3.0/24 as local networks interesting to OpenVPN. Longer subnets so they will be used locally over the /16.) That would allow you to do 64 /22 client sites without touching the server other than CSOs.~~

I would also avoid specifically setting iroutes in CSOs in current pfSense versions. Putting them in as Remote Networks there does that for you.

Derelict

pfSense B:
Firewall > Rules, LAN (10.55.55.0 interface)

pass rule source LAN net dest 10.88.88.0/24 Advanced set the gateway to GG_A
pass rule source LAN net dest 10.77.77.0/24 Advanced set the gateway to GG_A

pfSense C:
Firewall > Rules, LAN (10.77.77.0 interface)

pass rule source LAN net dest 10.88.88.0/24 Advanced set the gateway to GG_A
pass rule source LAN net dest 10.55.55.0/24 Advanced set the gateway to GG_A

I am off-base in my last post because I did not fully consider the load balance group. You would not want to push the /16 using a CSO there. You would want to do something like this:

pfSense B:
Firewall > Rules, LAN (10.16.0.1 interface)

pass rule source LAN net dest 10.16.0.0/22 Advanced set no gateway (bypass policy routing for local networks)
pass rule source LAN net dest 10.16.0.0/16 Advanced set the gateway to GG_A

pfSense C:
Firewall > Rules, LAN (10.16.4.1 interface)

pass rule source LAN net dest 10.16.4.0/22 Advanced set no gateway (bypass policy routing for local networks)
pass rule source LAN net dest 10.16.0.0/16 Advanced set the gateway to GG_A

In that case you would be maintaining one OpenVPN server at Site A for both sites. Just a different way to do it that can be easier especially as the number of client sites grows.
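
The addressing arithmetic behind this layout can be checked with Python's `ipaddress` module. This is a sketch for illustration: the helper names `site_rules` and `gateway_for` and the sample host addresses are assumptions, while the networks and `GG_A` come from the example rules above.

```python
import ipaddress

supernet = ipaddress.ip_network("10.16.0.0/16")
sites = list(supernet.subnets(new_prefix=22))

print(len(sites))   # 64
print(sites[0])     # 10.16.0.0/22 (pfSense B in the example)
print(sites[1])     # 10.16.4.0/22 (pfSense C in the example)


def site_rules(local_net):
    """Local /22 first with no gateway, then the whole /16 via GG_A."""
    return [(local_net, None), (supernet, "GG_A")]


def gateway_for(dst, rules):
    """First matching rule wins; None means no policy routing."""
    addr = ipaddress.ip_address(dst)
    for network, gateway in rules:
        if addr in network:
            return gateway
    raise LookupError(f"no rule matches {dst}")


rules_b = site_rules(sites[0])
print(gateway_for("10.16.1.20", rules_b))  # None: stays local at site B
print(gateway_for("10.16.4.20", rules_b))  # GG_A: sent toward site C
```

The order matters: if the /16 rule came first, traffic between local networks at site B would also be pushed into the gateway group.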

For instance if you were to add a 10.66.66.0/24 client site you would have to add the OpenVPN Servers, assign new interfaces, add a new GW Group on the server, and add policy routes to all client sites for 10.66.66.0/24.

Panerist

Thanks for the advice on simplifying the networks ranges stuff - will do this later. But I guess my routing issue between the OpenVPN clients' networks is not related to this.

Let's revert back to an example with no ping from 10.55.55.5 to 10.77.77.7, if you don't mind:

IP packets from 10.55.55.5 are properly PBR'ed to 10.77.77.7. In pfSense_B's firewall logs I see the packets from 10.55.55.5 to 10.77.77.7 passed (green V sign) on the LAN interface. In pfSense_A's firewall logs I see the packets from 10.55.55.5 to 10.77.77.7 passed on either the WAN_A1-to-WAN_B interface or WAN_A2-to-WAN_B, depending on which particular route was chosen by the GG_B gateway group. In pfSense_C's firewall logs I see the packets from 10.55.55.5 to 10.77.77.7 passed on either the WAN_A1-to-WAN_C interface or WAN_A2-to-WAN_C. "tcpdump icmp" on 10.77.77.7 shows it receives ICMP echo requests from 10.55.55.5 and sends the replies.

But IP packets from 10.77.77.7 are not PBR'ed back to 10.55.55.5. In pfSense_A's firewall logs for the WAN_A1/2-to-WAN_C interfaces I don't see packets from 10.77.77.7 to 10.55.55.5 at all, neither passed nor blocked. I think this is the reason: the reply packets are not visible to the firewall rules on pfSense_A that should have PBR'ed them to the proper VPN tunnel. As proof, if I add a static route for 10.55.55.0/24 via, say, the WAN_A1-to-WAN_B gateway, then 10.55.55.5 can ping 10.77.77.7.

Derelict

Replies are not subject to policy-based routing - only initial connections are. In your case replies will be controlled by reply-to.

I would not place too much emphasis on the firewall logs unless you are actually seeing blocks. Passed replies will not be logged.

Packet captures are your friend here. Going to be a little tricky since you never know which of the two interfaces to capture on. You might want to policy route out one of them on each side until you figure out what the problem is.

Diagnostics > Packet Capture on the 10.77.77 interface. Do you see the replies?

Capture on the OpenVPN interface at the 10.77.77 side. Do you see the replies?

Capture on the OpenVPN interface on the 10.55.55 side. Do you see the replies?

Capture on the 10.55.55 interface. Do you see the replies?

Panerist

Tried Packet Capture - the same effect :(

| # | Host | Interface | ICMP request captured | ICMP reply captured |
|---|------|-----------|-----------------------|---------------------|
| 1 | pfSense_C | LAN (10.77.77.) | YES | YES |
| 2 | pfSense_C | OpenVPN (looking to pfSense_A) | YES | YES |
| 3 | pfSense_A | OpenVPN (looking to pfSense_C) | YES | YES |
| 4 | pfSense_A | OpenVPN (looking to pfSense_B) | YES | NO |
| 5 | pfSense_B | OpenVPN (looking to pfSense_A) | YES | NO |
| 6 | pfSense_B | LAN (10.55.55.) | YES | NO |

As I understand it, Packet Capture is tcpdump under the hood. So the results are not surprising, as I had already run tcpdump on the same interfaces from the CLI.
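
Read hop by hop, the capture results pin down where the reply disappears. The short Python sketch below encodes the "ICMP reply" column of the table, assuming the ping runs from 10.55.55.5 to 10.77.77.7 as in the earlier post; the `first_missing` helper is a hypothetical name used only for illustration.

```python
# Capture points in the order a reply travels from 10.77.77.7 back to 10.55.55.5,
# paired with whether the ICMP reply was seen there.
REPLY_SEEN = [
    ("pfSense_C LAN", True),
    ("pfSense_C OpenVPN (to pfSense_A)", True),
    ("pfSense_A OpenVPN (to pfSense_C)", True),
    ("pfSense_A OpenVPN (to pfSense_B)", False),
    ("pfSense_B OpenVPN (to pfSense_A)", False),
    ("pfSense_B LAN", False),
]


def first_missing(hops):
    """Return (last hop that saw the packet, first hop that did not)."""
    previous = None
    for name, seen in hops:
        if not seen:
            return previous, name
        previous = name
    return previous, None


last_seen, lost_at = first_missing(REPLY_SEEN)
print(f"reply last seen on: {last_seen}")
print(f"reply first missing on: {lost_at}")
```

Both boundary points are on pfSense_A, so the reply arrives from site C but is never forwarded into a tunnel toward site B.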

Panerist

I found the same problem in one of the posts here in the forum. However, it was never solved there:

https://forum.pfsense.org/index.php?topic=40672.0