IPSEC failover delay with CARP

Thale · Oct 19, 2023, 3:00 AM

To prove out a change to our site-to-site connections, I've set up a two-location network in my lab: Site A uses a single pfSense box with dual WAN, and Site B uses an HA configuration, also with dual WAN. Routed IPSEC VTI is used between the sites (with FRR OSPF for routing). Everything is working fine with acceptable failover times except for IPSEC during a CARP failover.

When an interface on the primary router fails, CARP and FRR fail over to the secondary router very quickly. IPSEC status on both sides shows the tunnels are still connected. At Site B, the primary router shows connected, and the secondary router, which has just become the CARP master, shows disconnected. I am unable to ping over the VTI addresses. Once the tunnel times out with DPD, a new tunnel is quickly established using the backup router and everything starts working again. The failover time is too long, however. I did play very briefly with the IKEv2 retransmission parameters in the IPSEC advanced settings, but that didn't seem to help. IPSEC takes about 3 minutes to be re-established when failing over from the primary router to the secondary, but only about 90 seconds when recovering back to the primary.

I can manually start an IPSEC connection from IPSEC status on the secondary router, and that will establish one of the VTI connections and allow the dynamic routing to be established, but the whole point is to fail over cleanly without manual intervention.

It seems like IPSEC tunnels should either move with a CARP failover or else fail much sooner, but that's not what I'm seeing. Is there a setting somewhere that I'm overlooking that helps with this?

luckman212 · Oct 19, 2023, 3:00 AM

@Thale Did you ever find any settings that help with this? I'm experiencing the same thing, VTI tunnels + 23.05.1

Thale · Oct 19, 2023, 5:13 PM

@luckman212 Yes, but only in the sense that I found it couldn't be done this way with the results I wanted. In effect, this seems to be how HA is intended to work.

We changed our approach and now avoid using the CARP interface for any IPSEC traffic. We have a separate VTI tunnel connecting from both the primary and secondary router to each of the routers at the remote location. This requires a separate public IP for each router on each WAN, of course, and if both locations have dual routers then it requires a second virtual IP (not CARP) for each router as well.

For example, routers A & B are at one location, and routers C & D are at a second location. A1.1 is the primary WAN1 interface on router A, and A1.2 is the secondary IP address for WAN1 on router A. A1.1 connects to C1.1, B1.1 connects to D1.1, A1.2 connects to D1.2, and B1.2 connects to C1.2. Repeat for the WAN2 connections. Then do it all again to cross them (A1.1 to C2.1, B1.1 to D2.1, etc.).

All VTI tunnels are up all the time. Then use your routing settings to weight the routes as needed. Remember to exclude your VTI addresses from being published by your routing protocol, or you may get some weird behavior, such as traffic being routed over an existing VTI tunnel to reach a second VTI endpoint address in an attempt to establish one of the other tunnels, which of course fails.

The routing protocol then becomes the primary determining factor in failover time. For each situation where both locations have 2 WANs and 2 routers, I have 16 VTI tunnels connecting the 4 routers so that I have full redundancy between routers and WANs. If you have only 1 router or only 1 WAN, or if you can't get enough public IP addresses from your ISP, it gets simpler very quickly.
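
To make the pairing scheme easier to follow, here is a rough Python sketch that enumerates the tunnels for the two-router, two-WAN case. It is only a planning aid, not pfSense configuration. The labels follow the naming used above (router, WAN number, IP slot). One assumption goes beyond what is spelled out in the post: crossed router pairs (A to D, B to C) are taken to use the second IP slot on cross-WAN tunnels as well, which is how the "etc." is read here. The names `tunnel_plan`, `LOCAL_ROUTERS`, `REMOTE_ROUTERS`, and `WANS` are made up for illustration.

```python
from itertools import product

LOCAL_ROUTERS = ("A", "B")   # routers at the first location
REMOTE_ROUTERS = ("C", "D")  # routers at the second location
WANS = (1, 2)                # WAN1 and WAN2 on every router


def tunnel_plan():
    """Return (local_endpoint, remote_endpoint) label pairs, e.g. ("A1.1", "C1.1")."""
    tunnels = []
    router_pairs = product(enumerate(LOCAL_ROUTERS), enumerate(REMOTE_ROUTERS))
    for (li, local), (ri, remote) in router_pairs:
        # Straight pairs (A-C, B-D) use IP slot .1 on each WAN.
        # Crossed pairs (A-D, B-C) use IP slot .2 (assumed for cross-WAN tunnels too).
        ip_slot = 1 if li == ri else 2
        # Every WAN combination: 1-1, 1-2, 2-1, 2-2.
        for local_wan, remote_wan in product(WANS, WANS):
            tunnels.append(
                (f"{local}{local_wan}.{ip_slot}", f"{remote}{remote_wan}.{ip_slot}")
            )
    return tunnels


plan = tunnel_plan()
for local_ep, remote_ep in plan:
    print(f"{local_ep} <-> {remote_ep}")
print(len(plan), "tunnels")
```

Four router pairs times four WAN combinations gives the 16 tunnels mentioned above. Dropping a router or a WAN from the tuples shows how quickly the count shrinks.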