IPSec taking long time to connect after CARP IP failover.
-
Hello, I have working CARP virtual IPs across two Netgate 8200-max devices. Failover of the IPs works fine.
It seems that it does automatically connect IPSec after failover, but it took over 3 minutes. Hoping to make it in less than a minute. Possible?
I am using pfsync, not using xmlrpc config sync (hoping it's not required for IPSec failover). The CARP failover itself is practically instant.
Any info is appreciated, thank you.
I found another old post with basically the exact same issue: https://forum.netgate.com/topic/174067/ipsec-failover-delay-with-carp but no solution other than spawning two separate tunnels (one for each host) which is going to be a pain (and more expensive). Hoping any "new" info has come out since then.
Original issue:
However, after CARP failed over, IPSec on the standby didn't react (it didn't even try to connect). I waited a minute after IP failover (literally about 60s) for it to do something, nothing happened. So I then manually clicked "Connect P1 and P2's" on the new-master and IPSec came up fine, everything worked. Do I need to configure something else to get it to connect IPSec automatically after CARP fails over?
-
@emmdee
Do you use an interface that is based on CARP? And also the identifier if the connection is restricted?
-
Thanks for the response.
So I tried this again with more patience and it DID eventually connect but it literally took 3 minutes for IPSec failover to take place. That is not at all ideal.
Any advice to get it to connect IPSec faster on failover?
The CARP IP failover was instant. The issue seems to be that the IPSec tunnel takes multiple minutes to come up after failover. IPSec is connecting to an AWS VPC VPN.
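For anyone who wants to put a number on the gap between the CARP switch and the tunnel passing traffic again, here is a minimal Python sketch. It assumes you log timestamped reachability probes against some host behind the tunnel; `tcp_probe`, its host and port arguments, and the sample data are hypothetical and are not something pfSense provides.

```python
import socket
from typing import Iterable, List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, probe succeeded)


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time.

    Hypothetical helper: point it at any service reachable only
    through the tunnel and call it in a loop while forcing a failover.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_windows(samples: Iterable[Sample]) -> List[Tuple[float, float]]:
    """Return (start, end) for each stretch of failed probes.

    start is the first failed probe, end is the first successful probe
    after it. An outage still open when the samples end is not reported.
    """
    windows = []
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts
        elif ok and down_since is not None:
            windows.append((down_since, ts))
            down_since = None
    return windows


def longest_outage(samples: Iterable[Sample]) -> float:
    """Length in seconds of the longest completed outage, or 0.0."""
    return max((end - start for start, end in outage_windows(samples)),
               default=0.0)
```

Feeding it one sample per second from `tcp_probe` during a forced failover gives a repeatable figure to compare against the 3 minutes I am seeing, rather than timing it by eye.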
-
I've had pretty extensive experience with this and the IPsec connection is usually very very near flawless when a failover event occurs, so this does sound like abnormal behavior.
Firstly though, why do you not want to use xmlrpc? That's what this is for, managing an HA cluster. XMLRPC will sync IPsec settings, so I would guess if you're not using that, then maybe the IPsec settings aren't matched between the 2 firewalls.
If you're absolutely sure your IPsec settings are 100% identical between the two, then the next step is to tell us what your settings are for the interface being used.
I'd also say check under Status > CARP and be sure your state creator host IDs actually match between both firewalls.
-
@planedrop said in IPSec taking long time to connect after CARP IP failover.:
Thanks I appreciate you discussing this with me and sharing your own experience with IPSec failover.
Unfortunately my post keeps getting flagged as spam so I can't give you a real proper reply here. I'll try to do it in sections.
I've had pretty extensive experience with this and the IPsec connection is usually very very near flawless when a failover event occurs, so this does sound like abnormal behavior.
Well that's at least a glimmer of hope that 3 minutes isn't normal, thanks.
CARP failover is basically instant, IPSec is about 3 minutes and my goal is 30 seconds -- what sort of IPSec failover timing are you seeing in your fleet? I thought it was interesting I was hitting 3 minutes just like the other user I had linked in the OP.
Firstly though, why do you not want to use xmlrpc?
The reason for not using XMLRPC is because we already manage a fleet of about 50 Netgate hosts via an existing configuration automation engine which utilizes Netgate ECL via config.xml templating. Introducing a one-off factor of configuration management via Netgate's internal config sync will force a lot of the automation engine to be re-engineered.
Since only this one site is using Netgate HA due to constraints at the facility, re-engineering the entire automation stack for this one site is not at all the ideal solution. At all other sites, HA is taken care of via fully redundant networking stack, so Netgate provided HA setup isn't required there. This is the only site with Netgate HA enabled.
Since XMLRPC config sync appears to be optional for HA and failover, we didn't go down the route of re-engineering the existing automation systems. Probably saves a lot of time for people that manually configure Netgates, but our fleet is already managed automatically via templates.
That's what this [xmlrpc] is for, managing an HA cluster.
I thought it was just to sync the config, while CARP/PFSYNC handle the actual HA portion. I didn't see anywhere in the documentation that XMLRPC is required for HA to operate properly, am I mistaken? I do have pfsync and CARP enabled. IPSec does fail over successfully, it just takes 3 minutes to do so.
EDIT: I went ahead and enabled XMLRPC sync for IPSec and Virtual IP's just for troubleshooting. It made no difference in the IPSec failover timing.
XMLRPC will sync IPsec settings, so I would guess if you're not using that, then maybe the IPsec settings aren't matched between the 2 firewalls.
EDIT: I went ahead and enabled XMLRPC sync for IPSec and Virtual IP's just for troubleshooting. It made no difference in the IPSec failover timing.
One thing I will note is that the Phase2 config is "VTI" mode on the IPSec connection, which appears to nullify the ping host setting in the IPSec config, which states "Can trigger initiation of a tunnel mode P2, but does not trigger initiation of a VTI mode P2."
I'll add more if this post doesn't get flagged as spam when trying to submit....
-
@planedrop said in IPSec taking long time to connect after CARP IP failover.:
Continued from the previous post....... I couldn't post this all at once without getting spam flagged.....
If you're absolutely sure your IPsec settings are 100% identical between the two, then the next step is to tell us what your settings are for the interface being used:
Please note the one diff above regarding IPSec settings.
The interface that IPSec runs on is the WAN interface; below are the interface configs for each fw and the virtual IP configs too.
I'd also say check under Status > CARP and be sure your state creator host IDs actually match between both firewalls.
I gave the state creators identifiers in the HA config of a and b respectively for each fw, they show up as such in the CARP status. CARP fails over nearly instantly without any problem.
For example, on FW A:
State Creator Host IDs: a (This node) b
On FW B:
State Creator Host IDs: a b (This node)
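In case it helps anyone automate this check, below is a rough Python sketch that compares the two status lines. It assumes the text looks exactly like the two lines quoted above (the real Status > CARP page may render differently), and `parse_creator_ids` and `check_pair` are names made up for this example.

```python
import re
from typing import List, Optional, Tuple

PREFIX = "State Creator Host IDs:"


def parse_creator_ids(line: str) -> Tuple[List[str], Optional[str]]:
    """Return (all host IDs, ID marked "(This node)") from one line.

    Assumed format: 'State Creator Host IDs: a (This node) b'.
    """
    if not line.startswith(PREFIX):
        raise ValueError("not a State Creator Host IDs line")
    body = line[len(PREFIX):]
    ids: List[str] = []
    local: Optional[str] = None
    # Each ID may optionally be followed by the "(This node)" marker.
    for match in re.finditer(r"(\S+)(\s+\(This node\))?", body):
        ids.append(match.group(1))
        if match.group(2):
            local = match.group(1)
    return ids, local


def check_pair(line_a: str, line_b: str) -> List[str]:
    """Return a list of problems; an empty list means the pair looks sane."""
    ids_a, local_a = parse_creator_ids(line_a)
    ids_b, local_b = parse_creator_ids(line_b)
    problems = []
    if set(ids_a) != set(ids_b):
        problems.append("nodes list different host IDs")
    if local_a is None or local_b is None:
        problems.append("a node did not mark its own ID")
    elif local_a == local_b:
        problems.append("both nodes claim the same host ID")
    return problems
```

With the two lines from my firewalls, `check_pair` returns an empty list, which matches what I see by eye: both nodes know about a and b, and each claims a different one.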
-
Are you using pfSense CE or Plus? That is my first follow-up question; Plus is supposed to have some more "stuff" in it to help with IPsec failover delays, as mentioned in the docs.
It's been a while since I've had to failover a node for testing so I could be remembering wrong but I think it was near instant failover. But the docs do mention it could take until the timeout of the tunnel if the peer is the one initiating.
Do you have dead peer detection enabled and do you know if the other side of the tunnel does? That should in theory cause the peer to initiate the tunnel again quickly.
Also, as far as I can tell, the backup node in the HA cluster should become an initiator when its status changes to Master; I'm sure it is, but can you confirm (when in failover) that the primary says Backup and the secondary says Master? Just to be 100% sure that is working.
Finally, from what I am seeing, I think it should work just as well without XMLRPC so that's the good news.