HA in Azure
-
Has anyone successfully deployed pfSense as a highly available site-to-site VPN solution in Azure?
I have had a good go at it. Firstly, traditional CARP virtual IPs cannot work in Azure...the nodes won't sync with each other and both just become master. We can use the "HA option" that just synchronises the configuration from one node to the other, but full high availability for the site-to-site tunnels seems impossible.
The HA option we have is basically to use MS load balancers. These are supposed to be guaranteed, and we use them to publish the public IP and spray incoming connections on UDP 500/4500 between the two pfSense boxes. Of course, the problem you then have is that you need to create a route inside Azure for reaching the far side of the tunnel...and what IP address do you use as the next hop?
Microsoft came up with the idea of HA Ports load balancing for the internal load balancer, which basically forwards all traffic that hits the balancer to the backend devices. The idea is that you can use the internal IP of the load balancer as the next hop in the route table for the far-side subnet. Then when you route to that IP, traffic gets distributed between the two pfSense boxes.
Okay, so we have a public load balancer to distribute incoming connections and an internal HA Ports load balancer to distribute outgoing connections, with client IP affinity on both balancers to make sure flows don't swap between nodes. pfSense is set to replicate its config, so the IPsec tunnels are replicated. I also set pfSense to be responder only, so we hit the box assigned by the external load balancer and there's no conflict when bringing up the tunnels. Everything is connected and you think great...
Except it doesn't quite work properly...my guess is asymmetric routing, because I don't know if (or how) the affinity logic is the same for the internal and the external load balancer. What I noticed was that I had 2 services (HTTP and HTTPS) published on another load balancer that was using 2 VIPs...one service would work over the tunnel and one wouldn't.
To me this looks like the internal balancer deciding that the return path came from a different client IP combination, then spraying one connection to one pfSense box and the other to the second pfSense box. The incoming packets were arriving on node A, while the outgoing packets for the different services were going via node A and node B. It was very random in testing: sometimes HTTP would work and other times HTTPS would work. If I turned off the "passive" pfSense box, everything worked fine (because, presumably, the balancer takes it out of the pool).
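To illustrate the suspected asymmetry, here is a small simulation. It is only a sketch: the SHA-256 based `pick_backend` function is a stand-in I made up, since I don't know the real hash the Azure balancers use, and all IP addresses are invented examples. The point is just that affinity on the (source IP, destination IP) pair is computed independently per balancer and per address pair, so nothing ties the return path to the node that holds the tunnel.

```python
import hashlib

NODES = ["pfSense-A", "pfSense-B"]


def pick_backend(src_ip, dst_ip, backends):
    """Pick a backend from a hash of the (source IP, destination IP) pair.

    Hypothetical stand-in for 2-tuple affinity; not the real Azure hash.
    """
    key = f"{src_ip}|{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]


# Inbound: the remote VPN peer hits the public load balancer frontend.
# Whichever node is picked here ends up holding the IPsec tunnel.
remote_peer = "203.0.113.10"      # example address
public_frontend = "198.51.100.5"  # example address
tunnel_node = pick_backend(remote_peer, public_frontend, NODES)
print("tunnel lands on:", tunnel_node)

# Outbound: replies from two different service IPs go to a client behind
# the tunnel via the internal load balancer. Each pair is hashed on its own.
remote_client = "192.168.50.20"   # example address
for name, service_ip in [("HTTP", "10.0.1.10"), ("HTTPS", "10.0.1.11")]:
    node = pick_backend(service_ip, remote_client, NODES)
    verdict = "works" if node == tunnel_node else "dropped (no tunnel on this node)"
    print(f"{name} reply from {service_ip} goes via {node}: {verdict}")
```

If the real balancers behave anything like this, each service IP has a fixed but effectively arbitrary chance of returning through the wrong node, which would match one service working and the other not.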
Has anyone else had a better experience with this? I was looking at other ideas, like maybe using BGP so that the active pfSense box could announce the active routes to Azure, but messing around with VTI over the IPsec tunnel wasn't giving me the results I wanted. Furthermore, it relies on having a virtual network gateway (MS VPN), which costs more money and defeats the point of having pfSense as the replacement.
What I think would work well in theory is a "check port" similar to what KEMP provides. A check port is basically just a service that can listen on any port you want, on the primary node only; when that node goes down, the secondary starts listening on that port instead. That gives the MS load balancer something pretty static to probe: if the probe finds that, say, port 8444 is open on one of the pfSense nodes, it sends the traffic to that node. I don't know of anything built in to pfSense that already does this?
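To make the check port idea concrete, here is a rough sketch in Python. It is not something pfSense ships and not how KEMP implements it. The names `CheckPort`, `is_port_open` and `desired_state` are made up for illustration, and port 8444 is just the example from above. It also assumes Python is available on the box, which I have not verified on a stock pfSense install.

```python
import socket
import threading


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class CheckPort:
    """Minimal TCP listener that accepts and immediately closes connections."""

    def __init__(self, bind_addr="0.0.0.0", port=8444):
        self.bind_addr = bind_addr
        self.port = port
        self._thread = None
        self._stop = threading.Event()

    def start(self):
        if self._thread is not None:
            return  # already listening
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((self.bind_addr, self.port))
        sock.listen(5)
        sock.settimeout(0.2)  # lets the loop notice a stop request
        self.port = sock.getsockname()[1]  # real port if 0 was requested
        self._stop.clear()
        self._thread = threading.Thread(
            target=self._accept_loop, args=(sock,), daemon=True
        )
        self._thread.start()

    def _accept_loop(self, sock):
        while not self._stop.is_set():
            try:
                conn, _ = sock.accept()
            except socket.timeout:
                continue
            except OSError:
                break
            conn.close()  # a completed handshake is all a TCP probe needs
        sock.close()

    def stop(self):
        if self._thread is None:
            return
        self._stop.set()
        self._thread.join()
        self._thread = None


def desired_state(is_primary, peer_check_port_open):
    """Primary always listens; secondary listens only if the primary's port is closed."""
    return is_primary or not peer_check_port_open
```

On the secondary you would run something like `desired_state(False, is_port_open(primary_ip, 8444))` in a loop and call `start()` or `stop()` accordingly, where `primary_ip` is the other node's internal address. Both the public and the internal load balancer would need to probe the same port so they always agree on the active node, and the firewall rules would have to let the Azure probe source (168.63.129.16) reach it.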