Hypothetical "warm spare" setup using only XMLRPC sync (pfSync turned off)
-
I have a few office locations whose WAN configurations are not suitable for the traditional CARP setup that would normally be used for automatic failover. Specifically, one location gets a single WAN IP address via DHCP (neither DHCP on WAN nor a single WAN IP address is really suitable for CARP), and another location uses a PPPoE link. I am currently running "cold spare" routers in these locations: a preconfigured backup router that stays powered off until needed, at which point someone powers it on and moves the cables over to it.
The big downside of the cold spare setup is that it is a management pain in the neck to keep the cold spare's configuration up to date with the production router's configuration.
In a normal CARP/HA cluster you'd have two routers running at all times, with the backup router receiving state tables and configuration changes from the primary router. Both routers would have physical connections to WAN and LAN networks, obviously, so that failover would be swift and automatic.
Hypothetically, let's say I set up a backup router as a "warm spare" by configuring an interface for sync between the two, as you would with a normal HA cluster. The primary router would be physically wired up normally. The backup router would have no physical connections other than the link between its sync interface and the primary router's sync interface. In the HA settings on both the primary and the backup, leave pfSync turned off. However, do go ahead and set up XMLRPC sync on the primary only.
(Side note: since I am dealing with DHCP on WAN in these locations, I would have the backup router's WAN port, at least, clone the MAC address of the primary router's WAN port, just to avoid any DHCP issues caused by the ISP's DHCP server seeing a new MAC address. It might not cause a problem, but I don't care to find out in a failover situation.)
In theory, this would synchronize the configuration from the primary to the backup automatically. If the primary ever failed, all I would need is one staff member on site to move cables from the failed primary to the same ports on the secondary to restore function. States would not come over, but the manual moving of cables is slow enough to effectively kill all existing states anyway, so it's not relevant and is simply a limitation that will have to be accepted.
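One way I could check whether the sync is actually keeping the spare current would be to export config.xml from both routers and compare them section by section. Below is a minimal Python sketch of that idea, not a tested tool. It assumes only that each export is a single XML document whose top-level children are the configuration sections; the function names are my own, and which sections XMLRPC sync really copies depends on the options ticked on the primary.

```python
import xml.etree.ElementTree as ET


def section_fingerprints(config_xml: str) -> dict[str, str]:
    """Map each top-level section name to a canonical form of its XML."""
    root = ET.fromstring(config_xml)
    sections: dict[str, str] = {}
    for child in root:
        raw = ET.tostring(child, encoding="unicode")
        # Canonical XML (C14N 2.0) so formatting differences are not
        # reported as configuration differences.
        canon = ET.canonicalize(raw, strip_text=True)
        # Concatenate if a section name ever appears more than once.
        sections[child.tag] = sections.get(child.tag, "") + canon
    return sections


def differing_sections(primary_xml: str, spare_xml: str, ignore=()) -> list[str]:
    """Return the names of sections that differ between the two configs.

    `ignore` lists sections that are expected to differ (assumption:
    per-box settings such as interfaces are not part of the sync).
    """
    primary = section_fingerprints(primary_xml)
    spare = section_fingerprints(spare_xml)
    names = (primary.keys() | spare.keys()) - set(ignore)
    return sorted(n for n in names if primary.get(n) != spare.get(n))
```

To use it, I would read the two exported files (for example with `pathlib.Path("primary-config.xml").read_text()`, a hypothetical file name) and pass the strings to `differing_sections`. An empty list for the sections selected for sync would suggest the spare is current; sections that are not synced, such as per-box interface settings, would be expected to show up unless listed in `ignore`.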
Has anyone ever tried this type of configuration, and does it work?