CARP/Pfsync Across Multiple Sites



  • Hi Guys,

    We are trying to build a resilient network across two data centres. We have a 1 Gbps circuit between the two sites, and our external IP range is available at both sites.

    We want to run two pfSense boxes at each site, and if one site fails completely we would like the secondary site to take over, so I was planning to put all four pfSense boxes in a single CARP cluster.

    pfsync will run over a dedicated VLAN spanned across the two sites (latency on the link is about 1 ms).

    Can anyone see any issues here? Has anyone successfully done this?

    Thanks,



  • I'm not sure what the benefit would be. If you are advertising your public address space out from both WAN circuits, and you have internal address space… You shouldn't really need to sync anything between sites.

    What devices are acting as the WAN routers/BGP peers? If you lose one site's WAN, the 1 Gbps cross-connect would mean you could still reach all of your resources externally through the other site.

    I guess I am confused as to the network layout, could you provide a detailed diagram?



  • @MikeX:

    Hi Mike,

    The BGP side of things is controlled by the upstream, who will basically present a /27 range on a VLAN that is accessible from both site A and site B. The WAN connection takes a different route from the 1 Gbps circuit.

    Behind the firewall will sit a number of virtual machines. The idea is to take snapshots of the VMs at primary site A and copy them to secondary site B, so that if the primary site fails we can bring the VMs up at site B using the same IP addresses and avoid changing DNS entries.

    Hope that and the attached diagram will make it a little clearer. I don't see why it won't work, but thought I would see if anyone else is doing something similar.

    Thanks.




  • Hmm.. Ok so as long as the routing nastiness of BGP is out of your hands….

    I would say.... try to use pfsync, without CARP... if that makes sense.

    Both sites would run as the production LAN space, with a separate routable management space (so you can access your VM consoles, SAN management interfaces, etc..).

    You wouldn't need CARP because you wouldn't be failing addresses over... the addresses are already active at both sites, just not reachable via the WAN at the secondary.

    You're probably saying... "But they are both on that VLAN, wouldn't that mean I need to do a manual failover/intervention in the event of a network outage?"

    Yes. :)

    What you are talking about is DR, which is not always an automatic process. I can think of a number of reasons why you wouldn't want an automatic process for this, but the biggest is the accidental failover of your pfsense clusters to the secondary site, then a whole mess of data issues when clients start hitting the secondary nodes.

    Instead, get the configurations synced, and a solid process to 'flip the switch' at the secondary site to bring the routing online.

    Are you using some sort of dynamic routing protocol between the ISP router and your pfsense boxes, or just static routes?



  • Hi Mike,

    Manual intervention is inevitable; however, I want as much of it as possible under our control. For example, if we just use pfsync and not CARP, then in the event of a failover we may need our upstream to flush their ARP entries to make sure the WAN IPs move over to Site B. The IPs would need to be the same on both sides, otherwise our IPsec tunnels won't come up and some clients will need to update their DNS entries.

    An unplanned failover would be the worst-case scenario. Is it possible to make CARP less 'sensitive'? I suppose we could also work with our upstream and try to set up some sort of IP SLA to monitor the primary site's next hop, and only bring up the Site B WAN port if there is no response from Site A. It just doesn't seem very elegant.

    As this is only two sites, we will only use static routing. I don't think it is worth going down the OSPF route until we add a third site.

    Thanks.
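
On the question of making CARP less 'sensitive': the relevant knobs are advbase and advskew. The sketch below models the timing as I understand the FreeBSD CARP implementation (advertisements sent every advbase + advskew/256 seconds, and a backup declaring the master down after roughly 3 x advbase + advskew/256 seconds without hearing one). Treat those formulas as an assumption to check against carp(4) for your version. The node names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarpNode:
    """One firewall's CARP settings for a single VHID (illustrative only)."""
    name: str
    advbase: int  # base advertisement interval in seconds (1-255)
    advskew: int  # skew in 1/256ths of a second (0-254)

    def advert_interval(self) -> float:
        # Assumed interval between advertisements while this node is master.
        return self.advbase + self.advskew / 256

    def master_down_timeout(self) -> float:
        # Assumed silence a backup tolerates before it tries to take over.
        return 3 * self.advbase + self.advskew / 256


def expected_master(nodes):
    """The node that advertises most frequently should win the election."""
    return min(nodes, key=lambda n: n.advert_interval())


# Hypothetical four-node layout: site A preferred, site B skewed higher.
NODES = [
    CarpNode("siteA-fw1", advbase=1, advskew=0),
    CarpNode("siteA-fw2", advbase=1, advskew=100),
    CarpNode("siteB-fw1", advbase=1, advskew=200),
    CarpNode("siteB-fw2", advbase=1, advskew=240),
]

if __name__ == "__main__":
    print("expected master:", expected_master(NODES).name)
    for node in NODES:
        print(f"{node.name}: takes over after ~{node.master_down_timeout():.2f}s of silence")
    # Raising advbase on every node lengthens the takeover delay for all of them.
    slow = CarpNode("siteB-fw1", advbase=5, advskew=200)
    print(f"with advbase=5: ~{slow.master_down_timeout():.2f}s")
```

Under these assumptions, advskew alone only spreads the nodes out by fractions of a second, so a brief loss of the inter-site link could still trigger a takeover at site B. A larger advbase widens the window, at the cost of slower failover within a site.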



  • I'll admit I don't know too much about the back end of CARP… but even in a CARP failover, the MAC address changes, yes? So you would be in the same boat I feel.

    Maybe I'm wrong... I'm trying to look up some documentation on it so it might be a while before I get back to you. :)



  • @MikeX:

    Well, the MAC address corresponds to the VIP, so it wouldn't change (I believe). I will look into the CARP docs.


  • Rebel Alliance Developer Netgate

    The CARP MAC does not change during failover. It is based on the VHID and is shared between the nodes.
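
To illustrate the point about the MAC being derived from the VHID: CARP uses the same virtual MAC range as VRRP, 00:00:5e:00:01:XX, where the last octet is the VHID. The helper name below is hypothetical; it is only a small sketch of that mapping.

```python
def carp_mac(vhid: int) -> str:
    """Return the virtual MAC address CARP uses for a given VHID.

    The prefix 00:00:5e:00:01 is fixed; the last octet is the VHID itself.
    """
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


if __name__ == "__main__":
    for vhid in (1, 2, 200):
        print(vhid, carp_mac(vhid))
```

Since every node sharing a VHID presents the same MAC, the upstream's ARP entry for the VIP should stay valid across a failover; what has to update is where the switches on the stretched VLAN see that MAC. It also means each VHID needs to be unique on any L2 segment that spans both sites.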



  • Well.. that answers that question.

    My only other concern would be… does your ISP use an actual layer 2 VLAN which is shared... or is it a form of layer 2 over layer 3... or is it some sort of route reflector?



  • @MikeX:

    I believe for these two sites it will be a stretched L2 VLAN (dark fibre between the sites). All the L3 stuff (including the SVIs for the VLAN) and BGP speakers are located in different sites with better connectivity.

    The 1 Gbps private circuit will be QinQ to start with, before we switch to an MPLS circuit.

    I will let you know what we decide to do and how it works out.

