It has been a long time and I totally forgot I posted this. I was searching for a solution when I came across my old post. I'm posting this as a future reference for myself or anyone who is looking for a similar solution. That said, this may not be the best solution, but it is the best one I have at this time.
In my most simplified example there are 3 routers (these routers are also pfSense firewalls).
R1, R2 and R3:
R1: CARP Primary at site A
R2: CARP Secondary at site A
R3: Standalone router at site B
A first-hop redundancy protocol (FHRP) like CARP, VRRP, or HSRP allows clients to have a highly available default gateway. At site A, a bunch of servers are connected to an L2 switch, and each server's default gateway points to the CARP VIP for its VLAN.
Setting up a single VTI IPsec tunnel between site B's physical WAN address and the WAN VIP at site A results in two main issues:
When CARP fails over to the secondary and makes it the master, OSPF failover depends on services starting based on the CARP status, neighbor establishment, adjacency formation, etc. Having a dynamic routing protocol like OSPF, which has HA capabilities baked in, rely on CARP seems wrong. I’m not even sure Bidirectional Forwarding Detection (BFD) would have any benefit in this config. At least with FRR, OSPF advertises the physical address of the FW (as it should) and not the VIP.
Managing R2 from site B while it is functioning as the backup results in an asymmetric routing issue. Basically, the path from the client’s network to R2 is not the same as the return path, which is expected since you would not want R2’s default gateway pointing to R1. This makes the pfSense management GUI unusably slow. Leveraging sloppy states looks like a poor workaround to me.
To circumvent these issues, I think the best approach is disabling IPsec sync between the clustered firewalls at site A and manually creating two VTI tunnels:
R3 -> R1
R3 -> R2
When setting up OSPF in this configuration, you must set the cost of the R2 VTI interface on R3 higher than the cost of the R1 VTI interface. The cost determines which router R3 prefers for the routes it receives: R1 in this case, since its cost is lower. If R1 dies and R2 takes over as CARP master, OSPF independently detects the failure as well and switches to the routes via R2.
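As a rough sketch of the cost setting described above, the relevant part of R3's FRR configuration might look something like this. The interface names ipsec1 and ipsec2 and the cost values 10 and 100 are assumptions for illustration only; pfSense assigns its own names to VTI interfaces, and on pfSense the cost is normally entered through the FRR package's OSPF interface settings rather than typed by hand.

```text
! R3: prefer the tunnel to R1, fall back to the tunnel to R2
! (interface names and cost values are hypothetical)
interface ipsec1
 description VTI to R1
 ip ospf cost 10
!
interface ipsec2
 description VTI to R2
 ip ospf cost 100
!
```

Any pair of values works as long as the R2 tunnel's cost is the higher one.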
Since the VTI tunnels from R1 and R2 to R3 are up simultaneously, the VTI addresses of R1 and R2 can be used to manage the firewalls from site B. This eliminates the asymmetric routing issue, because the interface your connection arrives on is the same interface whose address you are logging in to. I.e., if you are on the LAN network you access the firewall using its LAN IP, but if you are at site B you access it via its IPsec VTI IP.
This does create a DNS issue: you need a split-brain DNS configuration so that the same name resolves to a different address depending on the client's address. Something like this:
# Client subnets that count as "inside site A"
Add-DnsServerClientSubnet -Name "SiteAMgmt" -IPv4Subnet "10.0.10.0/24"
Add-DnsServerClientSubnet -Name "SiteALan" -IPv4Subnet "10.0.20.0/24"
Add-DnsServerClientSubnet -Name "SiteADmz" -IPv4Subnet "10.0.30.0/24"
# Separate zone scope served to every client outside site A
Add-DnsServerZoneScope -ZoneName "Contoso.com" -Name "NotSiteA"
# A records in the NotSiteA scope use the VTI transit IPs (replace the placeholder strings)
Add-DnsServerResourceRecord -ZoneName "Contoso.com" -A -Name "R1" -IPv4Address "R1 VTI IPv4 address" -ZoneScope "NotSiteA"
Add-DnsServerResourceRecord -ZoneName "Contoso.com" -A -Name "R2" -IPv4Address "R2 VTI IPv4 address" -ZoneScope "NotSiteA"
# Clients not in any site A subnet get R1/R2 answered from the NotSiteA scope
Add-DnsServerQueryResolutionPolicy -Name "NotSiteAPolicy" -Action ALLOW -FQDN "eq,R1.Contoso.com,R2.Contoso.com" -ClientSubnet "NE,SiteAMgmt,SiteALan,SiteADmz" -ZoneScope "NotSiteA,1" -ZoneName "Contoso.com"
The default zone scope would contain the A records for R1's and R2's addresses on the management VLAN, as normal.
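For completeness, the default-scope side might look something like the sketch below. The addresses 10.0.10.2 and 10.0.10.3 are made-up placeholders inside the 10.0.10.0/24 management subnet used in the example; substitute the real management addresses. The Get- cmdlets at the end are just one way to confirm that the scope, its records, and the policy exist.

```powershell
# Default zone scope: what clients inside site A resolve.
# 10.0.10.2 and 10.0.10.3 are hypothetical management VLAN addresses.
Add-DnsServerResourceRecord -ZoneName "Contoso.com" -A -Name "R1" -IPv4Address "10.0.10.2"
Add-DnsServerResourceRecord -ZoneName "Contoso.com" -A -Name "R2" -IPv4Address "10.0.10.3"

# Confirm the extra scope, its records, and the policy are in place.
Get-DnsServerZoneScope -ZoneName "Contoso.com"
Get-DnsServerResourceRecord -ZoneName "Contoso.com" -ZoneScope "NotSiteA" -RRType A
Get-DnsServerQueryResolutionPolicy -ZoneName "Contoso.com"
```

From a client at site B, Resolve-DnsName R1.Contoso.com should then return the VTI address, while a client in one of the three site A subnets should get the management address.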
Lastly, this configuration does create a dilemma. Say R1 does not crash, but a DMZ interface or something similar goes down and triggers a CARP failover to R2. Unlike VRRP, which can run in an active/active config with half the VLANs/interfaces on one router and the other half on the secondary, CARP does not appear to be able to do this. Consequently, OSPF on R1 will continue to advertise routes even though R1 is now the CARP backup, so R3 keeps preferring the lower-cost R1 tunnel. Putting R1 into maintenance mode causes the same issue. To rectify this, I used the “CARP Status IP” option in FRR’s Global Settings on R1 to shut down the FRR services when R1 is in backup status. However, I left the “CARP Status IP” option on R2 set to None, so FRR on R2 keeps running regardless of its CARP state.