Additional questions on CARP/HA behavior when using a single public IP
This forum was very helpful in getting this set up, as seen in this thread: https://forum.netgate.com/topic/167420/multi-wan-high-availability-question
I have this setup and am noticing some behaviors I am not sure are normal in a CARP/HA setup, so I wanted to ask about them. First of all, my current configuration:
We are using Netgate 6100s. ISP1 is the primary ISP and has a /29 subnet of public IPs. It is configured conventionally for CARP/HA as described in Netgate's documentation. ISP2 is the secondary ISP and has a single public IP, 200.200.200.200. The two gateways are arranged into a gateway group. Both are Tier 1. The primary ISP has weight 10 and the secondary ISP has weight 1. The gateway group is set as the system's default gateway.
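To make the weighting concrete, here is a rough sketch of the split I would expect from those weights once both ISPs are attached. It assumes that weights between same-tier gateways act as a simple ratio of connections, which is how I read the documentation; the function name and the gateway labels are just for illustration.

```python
from fractions import Fraction

def expected_shares(weights):
    """Return each gateway's expected share of new connections.

    Assumption: same-tier gateways split connections in proportion
    to their weights. Gateways that are down are simply left out of
    the dict passed in.
    """
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}

# Weights from my gateway group (both gateways are Tier 1).
both_up = expected_shares({"ISP1": 10, "ISP2": 1})

# Current situation: ISP1 is not attached yet, so only ISP2 remains.
isp2_only = expected_shares({"ISP2": 1})

if __name__ == "__main__":
    for name, share in both_up.items():
        print(f"{name}: {share} of new connections ({float(share):.1%})")
    print("With ISP1 down, ISP2 share:", isp2_only["ISP2"])
```

Under that assumption, roughly ten out of every eleven new connections would leave via ISP1 when both gateways are up, and everything goes out ISP2 while ISP1 is down.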
For HA on the secondary ISP, I set the WAN2 interface address to 10.255.255.1/30 on the primary router and 10.255.255.2/30 on the secondary router. The CARP VIP is 200.200.200.200. I then performed the outbound NAT and DHCP configuration as described in Netgate's documentation, and also added an additional outbound NAT rule translating traffic from 10.255.255.0/30 so that it is sourced from the secondary ISP's CARP IP address.
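As a sanity check on the WAN2 addressing, here is a small sketch using Python's standard `ipaddress` module. It only confirms what the numbers say: both interface addresses sit in the same private /30, and the CARP VIP lies outside that subnet, which is the unusual part of this setup. The variable and function names are mine, not anything from pfSense.

```python
import ipaddress

# Addresses taken from my configuration; names are illustrative only.
wan2_subnet = ipaddress.ip_network("10.255.255.0/30")
primary_wan2 = ipaddress.ip_interface("10.255.255.1/30")
secondary_wan2 = ipaddress.ip_interface("10.255.255.2/30")
carp_vip = ipaddress.ip_address("200.200.200.200")

def summarize():
    """Collect a few facts about the WAN2 addressing plan."""
    return {
        "primary_in_subnet": primary_wan2.ip in wan2_subnet,
        "secondary_in_subnet": secondary_wan2.ip in wan2_subnet,
        "same_network": primary_wan2.network == secondary_wan2.network,
        # The CARP VIP is the ISP's public IP, so it is NOT inside
        # the private /30 assigned to the physical interfaces.
        "vip_in_subnet": carp_vip in wan2_subnet,
        "usable_hosts": [str(h) for h in wan2_subnet.hosts()],
    }

if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

The /30 has exactly two usable host addresses, one per router, so nothing else can live on that private link.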
pfsync and XMLRPC sync were also configured as described in Netgate's documentation.
My primary ISP is currently NOT attached to these routers. We are in the process of migrating from another firewall system to Netgate. For now, the primary ISP is still connected to the current production router/firewall system. The secondary ISP is attached to my Netgates. As a result, WAN1 reads down on both routers. WAN2 reads up on the primary and down on the secondary. So far, this conforms to expectations.
Internet access via the secondary ISP works fine. I have a workstation on the LAN that has internet access as expected. I can simulate failover by shutting down the primary router. When I shut down the primary router, failover occurs as expected. The secondary router takes over and internet access for the workstation is essentially uninterrupted (when running a continuous ping, it is not uncommon for a single packet to be lost, but the interruption is minimal).
It is in the "failback" to primary router that things get odd. This behavior might be expected due to using an unusual configuration for WAN2 to work around having only a single public IP address to work with.
First question. When the primary router comes back online after being shut down, it does not automatically reassume the role of master. Is that expected behavior?
Second question. The primary router, in CARP status, shows a warning that CARP demotion events have occurred and presents me with a button to reset the CARP demotion status. Clicking the button causes the primary to reassume the master role. However, this causes internet access to be completely cut off for the workstation on the LAN. As best I can tell, this is because the routing table has the ISP1 gateway as the default route. I presume this has something to do with the fact that both gateways read down on the primary router after it comes back up, which is not unexpected. I can easily correct the issue by restarting the dpinger service. This restores internet access and corrects the routing table so that the ISP2 gateway has the default route. Is this expected behavior for HA failover, or is it an artifact of using the private IP workaround to make this work with a single public IP?
For what it's worth, this failback mode is acceptable for my use case. If this is a limitation imposed by using a private IP on WAN2 with a public CARP VIP on WAN2, it's not a problem. Failback procedure will simply be to wait until after business hours and perform the failback manually as described above. No problem, but the reason I am asking the questions is that if it isn't working automatically due to a misconfiguration, I'd like to track it down and change it.