IPsec on Dual WAN not spotting failover [I think]
Hi,
I know I must be missing something obvious here, but for the life of me ...
I have two sites I wish to connect via an IPsec VPN.
One site has a single WAN connection [let’s call this the ‘remote site’] and one has two [the ‘main site’].
The Main Site has a Gateway Group set up with two tiers, with a single address in each: say, 1.1.1.1 in Tier-1 and 2.2.2.2 in Tier-2.
I have Dynamic DNS set up so that VPN.mydomain resolves to whichever of 1.1.1.1 and 2.2.2.2 is active at any given time. The TTL on this record is 60 seconds, and we update it every 60 seconds.
Both VPN endpoints are set up using IKEv2. Retransmission timeout is set to 3 seconds and retries to 5, so by my reckoning that is 3 × 5 = 15 seconds per attempt, i.e. 4 ‘ticks’ a minute.
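To sanity-check that arithmetic, here is a minimal Python sketch. It compares constant 3-second gaps with an exponential backoff of the form timeout × base^n, which is how I understand strongSwan spaces its retransmissions. That the daemon underneath is strongSwan, and that the base is 1.8 (which I believe is its default), are both assumptions on my part; I have not checked what this box actually uses. The function name is purely illustrative.

```python
def give_up_after(timeout: float, tries: int, base: float) -> float:
    """Seconds from the first send until the peer is declared unreachable.

    Assumed model: wait timeout * base**n before retransmission n + 1,
    plus one final wait after the last retransmission.
    """
    return sum(timeout * base ** n for n in range(tries + 1))


if __name__ == "__main__":
    # Constant spacing (base 1.0): every wait is 3 seconds.
    print(give_up_after(3.0, 5, 1.0))
    # Exponential spacing with the assumed base of 1.8.
    print(round(give_up_after(3.0, 5, 1.8), 1))
```

If the backoff really is exponential with that base, a single attempt would take around two minutes to give up rather than 15 seconds, so my ‘4 ticks a minute’ may be optimistic.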
On the Remote Site, I have ‘Remote Gateway’ and ‘Peer Identifier’ set to VPN.mydomain.
On the Main Site, I have ‘Interface’ in the ‘IKE Interface’ section set to the Gateway Group, and ‘My identifier’ set to VPN.mydomain in the ‘Phase 1 Proposal’ setup.
On both VPN gateways, ‘Child SA Start Action’ is set to initiate a tunnel connection, and ‘Child SA Stop Action’ is set to restart/reconnect.
When I initially start the VPN gateways, regardless of whether Tier-1 or Tier-2 is the active Gateway, the tunnel comes up just fine.
Let’s say the tunnel is operating over Tier-1 …
If I shut down the interface hosting Tier-1 at the Main Site, then within a minute or two the endpoint address on the gateway at the Remote Site changes from 1.1.1.1 to 2.2.2.2, as expected. However, the ‘local host ID’ [in the status screen] at the Main Site remains stuck on 1.1.1.1, even though the Gateway Group has definitely switched to Tier-2 [2.2.2.2] and DNS is resolving VPN.mydomain correctly.
All the logs have to offer is what we already know … the Main Site endpoint ID doesn’t match what the Remote endpoint is expecting.
I’ve waited over 15 mins to be sure that we’re not looking at DNS/Gateway timeout issues. I’ve also tried clicking on the ‘connect P1’ buttons on both gateways, but no joy.
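To rule out DNS independently of the VPN status screens, a check along these lines can be run from either site. This is only a sketch: `resolves_to` is a name I made up, VPN.mydomain and 2.2.2.2 are the placeholders used above, and the `resolve` parameter exists only so the function can be tested without a live lookup.

```python
import socket
from typing import Callable


def resolves_to(hostname: str, expected_ip: str,
                resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if hostname currently resolves to expected_ip."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # Lookup failed (socket.gaierror is a subclass of OSError).
        return False
```

For example, after failing over to Tier-2, `resolves_to("VPN.mydomain", "2.2.2.2")` should come back True once the record has been updated and the 60-second TTL has expired.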
The only way I have found to ‘fix’ this problem is to restart the IPsec service on the Main Site, and within seconds, everything starts working again.
Same happens when I change back from Tier-2 to Tier-1.
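As a stopgap, the manual restart could presumably be scripted on the Main Site. A rough Python sketch of the idea: remember the address the DNS name last resolved to, and call a restart hook whenever it changes. Everything here is hypothetical, including the `watch_once` name and the `restart` callback; I have deliberately left out the actual restart command, as I don't know the right one for this platform.

```python
import socket
from typing import Callable, Optional


def watch_once(hostname: str,
               last_ip: Optional[str],
               restart: Callable[[], None],
               resolve: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Resolve hostname once; call restart() if the address has changed.

    Returns the address to remember for the next poll.
    """
    try:
        current_ip = resolve(hostname)
    except OSError:
        return last_ip  # lookup failed: keep the old value and do nothing
    if last_ip is not None and current_ip != last_ip:
        restart()  # hypothetical hook: whatever restarts the IPsec service
    return current_ip
```

Calling this in a loop every 60 seconds, feeding the returned address back in as `last_ip`, would match the DNS update interval. It only papers over the problem, though; it doesn't explain why the identifier gets stuck.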
Any suggestions as to what I’m doing wrong here?
Thanks.
Si.