Issues with IPSec and CARP Failover
Having a strange issue with our IPSec tunnels and not sure where to go at this point.
Before going forward, for reference, this started after upgrading to 2.2.5 and has not resolved since.
Summary of what I did this past weekend:
Both sites run master/slave firewall setups. Updated both site1 and site2 to 2.3.1_1.
Both upgrades at site1 went OK, no issues.
Took site2fw2 down. Upgraded site2fw1. Confirmed the IPSec tunnel worked. Took down site2fw1, booted site2fw2, and upgraded it. Confirmed the tunnel worked. Powered down site2fw2 and powered up site2fw1. Everything connected and was stable. Booted site2fw2 and the tunnel dropped on site2fw1. Attempted to kill the connection from the site1fw1 end and rebooted… no dice. The IPSec tunnel now only connects through site2fw2 and will not work properly if site2fw1 is online at all.
I have seen the same behavior with every upgrade since 2.2.5. (These tunnels and firewalls worked flawlessly before 2.2.5; from 2.0.2 onward I'd had no problems.) On the first boot of a new firmware version, site2fw1 works great. As soon as I bring up site2fw2, things start going offline.
Rebooted fw1, disabled the old tunnel, and made a new tunnel. I have to log in to site1fw1 and tell it to disconnect and reconnect the tunnel, and then all is OK for a while again.
Tested OK for about 5-10 minutes, then the web interface locked up on fw1. Tried restarting the web configuration interface from the console, with a keyboard and monitor plugged straight into the box, but that didn't help. Had to initiate a reboot from the box itself.
fw2 took over as master during the reboot, but the tunnel would not reconnect. Had to restart the ipsec service on fw2, then all was OK.
fw1 came back online and resumed as master, but ipsec tunnel remained connected on fw2. Forced the service to stop running on fw2 so it would drop the tunnel. fw1 would not pick up the tunnel.
Took fw1 offline. Started ipsec service on fw2 again. ipsec tunnel reconnected OK.
Tried repeating everything a few times and didn't get any good results. I will say that throughout the IPSec starts and stops, another tunnel at site1 that goes to a partner site (not running pfSense) would go up and down with it each time, on both site1fw1 and site1fw2. That tunnel is not present at site2 at all.
A few other observations:
When fw1 was up as master, the internet connection would cut out every few minutes. Yet in one of the rare instances when the ipsec tunnel was up on fw1, the tunnel stayed up, since I could still connect to site1fw1 from site2. Could have just been a coincidental ISP issue, but I'm mentioning it in case it's indicative of something else as well.
Booting site2fw1 up without site2fw2 on at all does not change anything. fw1 will not establish an ipsec connection, even after restarting site1fw1. The only times I've observed it establishing an ipsec connection successfully is after a new software version upgrade and when making a new tunnel. It seems as soon as the new tunnel copies over to site2fw2, it starts to muck up.
It doesn't seem to me to be an IPSec settings problem either. I'm having the same issues with this new tunnel as I did with the original one. Perhaps something related to the CARP/failover? Were there any changes to that in 2.2.5 that have carried over to the latest version?
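To double-check that the tunnel settings really are identical on two boxes, one option is to diff the phase 1 entries from a configuration backup of each. Below is a minimal sketch of that idea. It assumes each backup is an XML file in which every tunnel appears as a `phase1` element; the function names are made up for illustration, and it makes no assumptions about which fields a `phase1` entry contains.

```python
import xml.etree.ElementTree as ET


def flatten(elem, prefix=""):
    """Map each leaf element's path to its text content.

    Note: if a tag repeats under the same parent, the last one wins.
    """
    path = f"{prefix}/{elem.tag}" if prefix else elem.tag
    children = list(elem)
    if not children:
        return {path: (elem.text or "").strip()}
    out = {}
    for child in children:
        out.update(flatten(child, path))
    return out


def phase1_entries(xml_text):
    """Return one flattened dict per <phase1> element found in the backup."""
    root = ET.fromstring(xml_text)
    return [flatten(p1) for p1 in root.iter("phase1")]


def diff_entries(a, b):
    """Return {path: (value_in_a, value_in_b)} for every field that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Running `phase1_entries` on the text of each backup and passing matching entries to `diff_entries` would list only the fields that differ. An empty result would back up the guess that the settings themselves are not the problem.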
A few more observations:
When site2fw1 is trying to connect, I see both an IKEv2 initiator and a responder entry in the IPSec Status on site1fw1. site2fw2 connects so fast I never see these entries.
site2fw2 IPSec connects to both site1fw1 and site1fw2.
The log files on either end just keep repeating entries of "IPSEC ignoring acquire, connection attempt pending" and "IPSEC received NO_PROPOSAL_CHOSEN" when trying to connect with site2fw1.
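To see how often each message repeats, and whether one end logs more of one than the other, the two messages can be tallied from an exported copy of the log. This is a minimal sketch that assumes the log has been saved as plain text, one entry per line. The function names are hypothetical, and matching is a simple substring test on the two message fragments quoted above.

```python
from collections import Counter

# Fragments of the two messages quoted in the post.
PATTERNS = (
    "ignoring acquire, connection attempt pending",
    "received NO_PROPOSAL_CHOSEN",
)


def tally(lines):
    """Count how many lines contain each of the known message fragments."""
    counts = Counter()
    for line in lines:
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts


def tally_file(path):
    """Tally a saved log file; the path is whatever the log was exported to."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return tally(fh)
```

NO_PROPOSAL_CHOSEN is the IKE notification a peer sends when none of the offered proposals is acceptable to it, so it may be worth comparing the counts from each end.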
Not sure where else to go with this, short of re-loading site2fw1 from scratch.
Thank you in advance for any help that can be given!