CARP/Pfsync Across Multiple Sites

ehuk

I'm not sure what the benefit would be. If you are advertising your public address space out of both WAN circuits and you have internal address space, you shouldn't really need to sync anything between sites.

What devices are acting as the WAN routers/BGP peers? If you lose one site, the 1 Gbps cross-connect would mean you could still access all of your resources externally.

I guess I am confused about the network layout. Could you provide a detailed diagram?

Hi Mark,

The BGP side of things is controlled by the upstream, who will basically present a /27 range on a VLAN that is accessible from both site A and site B. The WAN connection takes a different route to the 1 Gbps circuit.
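Because the whole /27 lives on one shared VLAN, every firewall that attaches to it needs its own address in that range, plus the shared CARP VIP. A small sketch of how that might be carved up, using Python's `ipaddress` module. Everything here is hypothetical: `203.0.113.0/27` is a documentation prefix standing in for the real range, and the assumption that the upstream gateway (SVI) takes the first host address may not match what the upstream actually does.

```python
import ipaddress

# Hypothetical: documentation prefix standing in for the real /27
# that the upstream presents on the shared VLAN.
wan = ipaddress.ip_network("203.0.113.0/27")
hosts = list(wan.hosts())  # excludes the network and broadcast addresses

# Hypothetical allocation for one firewall per site sharing a CARP VIP.
# Assumes the upstream SVI takes the first host address.
plan = {
    "upstream gateway (SVI)": hosts[0],
    "CARP VIP (shared)": hosts[1],
    "site A firewall": hosts[2],
    "site B firewall": hosts[3],
}

for role, addr in plan.items():
    print(f"{role:24} {addr}")
print("addresses left for services:", len(hosts) - len(plan))
```

With one firewall per site, four of the 30 usable addresses go to infrastructure; a design with a CARP pair at each site would need two more.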

Behind the firewall will sit a number of virtual machines. The idea is to take snapshots of the VMs from primary site A to secondary site B, so that if the primary site fails we can bring the VMs up on site B using the same IP addresses and avoid changing DNS entries.

Hope that and the attached diagram will make it a little clearer. I don't see why it won't work, but thought I would see if anyone else is doing something similar.

Thanks.

[Attached diagram: topology.jpg]

MikeX

Hmm... OK, so as long as the routing nastiness of BGP is out of your hands...

I would say.... try to use pfsync, without CARP... if that makes sense.

Both sites would run as the production LAN space, with a separate routable management space (so you can access your VM consoles, SAN management interfaces, etc.).

You wouldn't need CARP because you wouldn't be failing addresses over... the addresses already are active but not accessible via the WAN.

You're probably saying... "But they are both on that VLAN. Wouldn't that mean I need to do a manual failover/intervention in the event of a network outage?"

Yes. :)

What you are talking about is DR (disaster recovery), which is not always an automatic process. I can think of a number of reasons why you wouldn't want it automated, but the biggest is an accidental failover of your pfSense clusters to the secondary site, followed by a whole mess of data issues when clients start hitting the secondary nodes.

Instead, get the configurations synced, and a solid process to 'flip the switch' at the secondary site to bring the routing online.

Are you using some sort of dynamic routing protocol between the ISP router and your pfSense boxes, or just static routes?

ehuk

Hi Mike,

Manual intervention is inevitable; however, I want as much of it as possible under our control. For example, if we use pfsync without CARP, then in the event of a failover we may need our upstream to flush their ARP entries to make sure the WAN IPs move over to Site B. The IPs need to be the same on both sides; otherwise our IPsec tunnels won't come up and some clients will need to update their DNS entries.

An unplanned failover would be the worst-case scenario. Is it possible to make CARP less 'sensitive'? I suppose we could also work with our upstream to set up some sort of IP SLA that monitors the primary site's next hop and only brings up the Site B WAN port if there is no response from Site A, but that just doesn't seem very elegant.
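The IP SLA idea boils down to a small piece of decision logic: only bring up Site B after several consecutive missed probes, and leave failback as a deliberate manual step, in line with the "flip the switch" advice earlier in the thread. A minimal Python sketch of that logic only. The class name, threshold, and `reset()` method are hypothetical, and how probes are actually sent (ping, the upstream's own IP SLA feature, etc.) is left out.

```python
class NextHopMonitor:
    """Hypothetical IP SLA-style logic deciding when Site B's WAN should come up.

    Each probe result (True = Site A's next hop answered) is fed to record().
    Site B is activated only after `fail_threshold` consecutive failures, so a
    single dropped probe cannot trigger a failover. Once active, it stays
    active until an operator calls reset(); failback is deliberately manual.
    """

    def __init__(self, fail_threshold: int = 5) -> None:
        self.fail_threshold = fail_threshold
        self.consecutive_failures = 0
        self.site_b_active = False

    def record(self, site_a_reachable: bool) -> bool:
        if self.site_b_active:
            return True  # sticky: do not fail back automatically
        if site_a_reachable:
            self.consecutive_failures = 0  # any success clears the counter
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.fail_threshold:
                self.site_b_active = True
        return self.site_b_active

    def reset(self) -> None:
        """Manual failback once Site A is confirmed healthy."""
        self.consecutive_failures = 0
        self.site_b_active = False


monitor = NextHopMonitor(fail_threshold=3)
probes = [True, False, False, True, False, False, False]
print([monitor.record(p) for p in probes])
# [False, False, False, False, False, False, True]
```

The two isolated misses in the middle do not trip the threshold because the successful probe in between resets the counter; only the final run of three misses activates Site B.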

As this is only two sites, we will only use static routing, I don't think it is worth going down the OSPF route until we add a 3rd site.

Thanks.

MikeX

I'll admit I don't know too much about the back end of CARP… but even in a CARP failover, the MAC address changes, yes? So you would be in the same boat I feel.

Maybe I'm wrong... I'm trying to look up some documentation on it so it might be a while before I get back to you. :)

ehuk

@MikeX:

I'll admit I don't know too much about the back end of CARP… but even in a CARP failover, the MAC address changes, yes? So you would be in the same boat I feel.

Maybe I'm wrong... I'm trying to look up some documentation on it so it might be a while before I get back to you. :)

Well, the MAC address corresponds to the VIP, so it wouldn't change (I believe?). I will look into the CARP docs.

jimp

The CARP MAC does not change during failover. It is based on the VHID and is shared between the nodes.
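To make jimp's point concrete: CARP borrows the VRRP virtual-MAC scheme, a fixed `00:00:5e:00:01` prefix with the VHID as the last octet, so any node configured with the same VHID answers ARP with the same MAC. A minimal Python sketch assuming that standard layout; the function name `carp_mac` is hypothetical.

```python
def carp_mac(vhid: int) -> str:
    """Return the virtual MAC used by a CARP VIP with the given VHID.

    Assumes the standard layout CARP shares with VRRP: the fixed prefix
    00:00:5e:00:01 followed by the VHID as the final octet.
    """
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return "00:00:5e:00:01:{:02x}".format(vhid)


# Both nodes use the same VHID, so they derive the same MAC and the
# upstream's ARP entry for the VIP stays valid across a failover.
primary_mac = carp_mac(10)
backup_mac = carp_mac(10)
print(primary_mac, primary_mac == backup_mac)  # 00:00:5e:00:01:0a True
```

This is also why the ARP-flush concern applies to the pfsync-only design but not to CARP: without a shared VIP, each site's firewall answers with its own interface MAC.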

MikeX

Well... that answers that question.

My only other concern would be… does your ISP use an actual layer 2 VLAN which is shared... or is it a form of layer 2 over layer 3... or is it some sort of route reflector?

ehuk

@MikeX:

Well.. that answers that question.

My only other concern would be… does your ISP use an actual layer 2 VLAN which is shared... or is it a form of layer 2 over layer 3... or is it some sort of route reflector?

I believe for these two sites it will be a stretched L2 VLAN (dark fibre between the sites). All the L3 stuff (including the SVIs for the VLAN) and BGP speakers are located in different sites with better connectivity.

The 1 Gbps private circuit will be a QinQ circuit to start with, before switching to an MPLS circuit.

I will let you know what we decide to do and how it works out.

huwjrr

Curious to know what you did here in the end and how it went. Cheers.

binary_bandit

@ehuk

Would you mind updating us? How did this work out?

We're considering the same thing.

best,

James

huwjrr

@binary_bandit I went with a solution roughly as Mike explained here. The advice came from elsewhere, but the comments were basically the same.

I have two sites, both always routable, each with its own CARP cluster (no pfsync across sites; it wasn't necessary for me), but only one is routed to at a time. I let my upstream provider do the routing for me, but I could do this myself later by enabling/disabling an IP at either site and having them route to that instead. Each site is completely independent, and although both advertise 3 public ranges, each also has its own native/local range of public IPs.

Really, the consensus from everyone I've spoken to is to do this with switches and BGP, not pfSense, which is a huge bottleneck. But it does work.