Secondary takes over from functional master?



  • I have a new cluster that works great now except for CARP. The secondary wants to steal the CARP master role and I wind up with dual masters.

    I have them connected straight across with a cable on a dedicated Ethernet port I just labeled "hbeat", and have the first set up with an IP of 10.10.10.2, and the other as 10.10.10.3. The master is set up to replicate config across to .3 and that works great, the config is identical. However, I'm unsure as to how CARP actually communicates.

    As soon as I plug in the interfaces on the secondary, it takes over some or all of the interfaces, and both nodes end up claiming to be master. This breaks communications, obviously.

    The one thing I don't control is our ISP's equipment. It's a switch with two ports active, and I just plug in there. Is that something that might contribute to this? I've built an identical setup elsewhere on another ISP and had no such issues.

    Both ports are active, I can move the primary between 1 and 2 and have no issues (if the primary is the only one connected).

    Edit: I set the VHIDs up as 1 through 5. Perhaps the Cisco I'm connecting to is messing with that? Maybe I should hike that number up by tacking on a zero, just to be safe?


  • Netgate

    CARP does not have a "heartbeat" interface. The direct-connect cable often called SYNC is for XMLRPC and state sync. The status of that interface is irrelevant to the function of the CARP VIPs on the traffic interfaces.

    CARP requires good layer 2/multicast through the switching gear between the ports the CARP VIPs are on.

    If the secondary is going MASTER it is probably not seeing the advertisements from the primary arriving on its WAN port.

    Yes, that could certainly be something in the ISP switch.

    You can packet capture for just CARP on the ISP interface and see what other VHIDs might be out there. I generally use the last octet of the CARP IP address as the VHID. If everyone does that on a /24 or smaller, collision avoidance is self-regulated.

    You can also packet capture on the secondary interface to be sure it is seeing the advertisements from the primary. If it is not, CARP will not function and you will get dual-master.
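To make the capture advice above concrete, here is a minimal Python sketch of what to look for in the packets. It assumes CARP advertisements are IPv4 protocol 112 sent to 224.0.0.18, with the VHID in the second byte of the payload (my reading of FreeBSD's CARP header layout). VRRP shares that protocol number and keeps its router ID in the same byte, so VRRP packets from other gear show up here too. The names `parse_carp`, `vhid_from_ip` and `foreign_vhids` are made up for illustration, and getting raw packets out of a capture file is left out.

```python
import struct
from ipaddress import IPv4Address

CARP_PROTO = 112                         # IP protocol number used by CARP (and VRRP)
CARP_GROUP = IPv4Address("224.0.0.18")   # multicast group the advertisements go to


def parse_carp(packet: bytes):
    """Return (src_ip, vhid, advbase, advskew) for an advertisement, else None.

    `packet` is a raw IPv4 packet starting at the IP header (no Ethernet header).
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return None
    ihl = (packet[0] & 0x0F) * 4          # IP header length in bytes
    if packet[9] != CARP_PROTO or len(packet) < ihl + 8:
        return None
    src = IPv4Address(packet[12:16])
    dst = IPv4Address(packet[16:20])
    if dst != CARP_GROUP:
        return None
    # Assumed layout: version/type, vhid, advskew, authlen, pad, advbase
    _ver_type, vhid, advskew, _authlen, _pad, advbase = struct.unpack_from(
        "!6B", packet, ihl
    )
    return str(src), vhid, advbase, advskew


def vhid_from_ip(carp_ip: str) -> int:
    """VHID convention from the post: last octet of the CARP IP address."""
    vhid = int(IPv4Address(carp_ip)) & 0xFF
    if vhid == 0:
        raise ValueError("last octet is 0, which is not a usable VHID")
    return vhid


def foreign_vhids(packets, own_ips):
    """VHIDs advertised by hosts that are not one of our own nodes."""
    seen = set()
    for pkt in packets:
        info = parse_carp(pkt)
        if info and info[0] not in own_ips:
            seen.add(info[1])
    return seen
```

If `foreign_vhids` returns anything that overlaps with the VHIDs you planned, pick different ones. If a capture on the secondary yields no packets from the primary's address at all, that matches the dual-master symptom described above.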



  • So does every interface communicate via multicast on that VLAN? I have the machines set up on HP switches internally, with the separate VLANs running "untagged" for the VLAN I want to route through that interface.

    I.e., is it enough that the WAN port (VHID 1) can't communicate with the secondary (assuming that's the issue) to keep the other shared IPs from sorting themselves out either?


  • Netgate

    I added more to the post above. You might want to re-read it.

    It is possible to have just one VIP go dual-master if there is a layer 2 problem on that network.

    On every interface, the primary and secondary need to be able to exchange multicast traffic to 224.0.0.18 for CARP to function.
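When checking a switch or a capture for this traffic, it can help to know which Ethernet destination address the 224.0.0.18 group maps to. The sketch below applies the standard IPv4 multicast mapping (prefix 01:00:5e plus the low 23 bits of the group address); the function name `multicast_mac` is just an illustrative choice.

```python
from ipaddress import IPv4Address


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet destination MAC."""
    addr = IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF              # only the low 23 bits are carried over
    mac = (0x01005E << 24) | low23            # fixed 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_mac("224.0.0.18"))  # 01:00:5e:00:00:12
```

Frames to that MAC have to make it from the primary's switch port to the secondary's port on every VLAN that carries a CARP VIP.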



  • Thanks again.

    Yeah, the behavior is just plain odd. As soon as I enable CARP on the secondary, even with every network cable physically pulled except LAN, it starts shifting the master roles around. The WAN cable isn't even connected, but the secondary still sets WAN to master, as well as DMZ. It happens with nothing on the ISP switch except the primary. And the LAN port is in the default setting for an HP switch, VLAN 1 untagged.

    I shifted VHID 1-5 to 10-14 and it's still weird.

    I think I'm going to have to leave the secondary physically disconnected and wait for the weekend to mess around with this some more. I can't keep disrupting traffic and making the users livid. Should have tested better before going live I guess, just didn't expect these issues. :)


  • Netgate

    With an interface pulled the CARP status should be INIT, not MASTER. Not sure what you are seeing there.
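The two checks discussed so far (a pulled interface should read INIT, and a VIP must not be MASTER on both nodes) can be written down as a small Python sketch. The input dictionaries are hypothetical: they stand for statuses copied by hand from each node's CARP status view, not for anything pfSense exports in this form.

```python
def carp_problems(primary, secondary, secondary_link_up):
    """Flag suspicious CARP states on the secondary.

    primary / secondary: {vip_name: "MASTER" | "BACKUP" | "INIT"}
    secondary_link_up:   {vip_name: bool}, link state of the secondary's interface
    """
    problems = []
    for vip, sec_state in secondary.items():
        if not secondary_link_up.get(vip, True) and sec_state != "INIT":
            # A VIP on an interface with no link is expected to sit in INIT.
            problems.append(
                f"{vip}: link down but secondary reports {sec_state}, expected INIT"
            )
        elif sec_state == "MASTER" and primary.get(vip) == "MASTER":
            # Both nodes claim the VIP: advertisements are not getting through.
            problems.append(f"{vip}: dual master")
    return problems


# The situation described in the thread: only LAN is plugged in on the secondary.
primary = {"WAN": "MASTER", "DMZ": "MASTER", "LAN": "MASTER"}
secondary = {"WAN": "MASTER", "DMZ": "MASTER", "LAN": "MASTER"}
link_up = {"WAN": False, "DMZ": False, "LAN": True}
for line in carp_problems(primary, secondary, link_up):
    print(line)
```

Separating the two cases matters for troubleshooting: a dual master on a connected interface points at the layer 2 path, while MASTER on an unplugged interface points at something on the node itself.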



  • Me either. :) Oh well, I'll do some testing tomorrow; fortunately Saturdays are not very busy. On weekdays people are working from 5 am to 2 am, so there aren't a lot of maintenance windows.

    One thing that may be an issue is that I connected the firewalls to separate switches. I have two switches in the main rack configured to give redundant paths to every other switch, with spanning tree in RSTP mode, and the switches are trunked (HP's variant of trunking: multiple ports joined together, all VLANs tagged, default VLAN untagged, i.e. the default). Maybe there's an issue with broadcasts due to that.

    So step one is to reconfigure the switches, shuffle around some ports and get both firewalls on the same one just to eliminate that potential source of failure.

    ISP confirmed the switch we connect to is pure layer 2, so there's nothing there that should interfere.


  • Netgate

    If you're bored you can just plug a laptop with Wireshark into the switch ports that the secondary should be plugged into. You should see CARP advertisements from the primary. If not, it's not going to work.



  • Good tip, I'll try that as well.



  • Do you sync the VIPs? If so, that could be the cause…
    Had some issues with that in the past, see: https://forum.pfsense.org/index.php?topic=102740.msg572905#msg572905