IPsec not reconnecting after site failure
-
I have two sites (main and remote), each with a pfSense box, connected via a site-to-site IPsec tunnel using routed VTI. The main site has a public static IP, but the remote site is behind CGNAT (so a private IP is assigned to its WAN interface). To make the tunnel work, I had to create a DDNS entry for the remote site's WAN interface and use it as the peer identifier in the main site's IPsec settings. I also had to check "Responder only" in the main site's IPsec settings. DPD is enabled on both sides.
To establish the connection, I have to click the Connect button under Status -> IPsec. After that, if I restart either pfSense box, the remote box reconnects and re-establishes the IPsec tunnel without any issues. The problem is that when either site has an Internet outage of, say, more than an hour, the tunnel does not reconnect automatically. I have to click "Connect" manually again under Status -> IPsec.
I also don't use the "automatically ping host" feature in the phase 2 settings on either side, because I already have gateway monitoring set up (pinging the IPsec interface IP on the far side). I read somewhere that this does the same thing with routed VTI.
@jimp Any ideas how I can solve the reconnection failure?
-
@kevindd992002 Did you make progress on this? There is a restart on child close option, but I have tried that and still do not get consistent connections. https://redmine.pfsense.org/issues/9767#note-1
-
@bbrendon said in IPsec not reconnecting after site failure:
@kevindd992002 Did you make progress on this? There is a restart on child close option, but I have tried that and still do not get consistent connections. https://redmine.pfsense.org/issues/9767#note-1
I know I resolved this in the past, but sorry, I forgot what I did because I have since switched to WireGuard. It's way faster than both OpenVPN and IPsec on a 200 Mbps link between the two sites.
-
I was about to start a topic for this. I have your exact issue verbatim, so you saved all the typing! I've also been able to recreate the issue in a lab environment. If anyone wants to see any logs, just let me know how to collect the data you want to see and I'll be happy to share it.
-
So I've been trying to figure this out in my lab environment. It seems that when the responder-only side (Site A) is taken offline, the other side (Site B) goes into "connecting" status for 5 minutes. If Site A is brought back online within that time, the tunnel reconnects. Otherwise, Site B changes to the "Disconnected" state and makes no further attempt to contact Site A. These are the last few lines in Site B's log:
Jul 11 16:30:06 rtr2 charon[69811]: 16[IKE] <con1000|2> giving up after 5 retransmits
Jul 11 16:30:06 rtr2 charon[69811]: 16[IKE] <con1000|2> establishing IKE_SA failed, peer not responding
Jul 11 16:30:06 rtr2 charon[69811]: 16[MGR] <con1000|2> checkin and destroy IKE_SA con1000[2]
Jul 11 16:30:06 rtr2 charon[69811]: 16[IKE] <con1000|2> IKE_SA con1000[2] state change: CONNECTING => DESTROYING
Jul 11 16:30:06 rtr2 charon[69811]: 16[MGR] checkin and destroy of IKE_SA successful
I've tried playing with DPD and reauth values, but they make no difference. It's always 5 minutes, and the log shows the same "giving up after 5 retransmits" message. I'm not sure what setting is causing it to stop retrying so quickly.
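One possible lead on the "giving up after 5 retransmits" line: strongSwan's charon retransmits IKE messages with an exponential backoff controlled by retransmit_tries, retransmit_timeout and retransmit_base in strongswan.conf, not by DPD or reauth. The sketch below computes that schedule using the stock strongSwan defaults (5 tries, 4.0 s, base 1.8). Whether pfSense leaves these defaults unchanged is an assumption here, and the function name is only illustrative.

```python
def retransmit_schedule(timeout=4.0, base=1.8, tries=5):
    """Waits (in seconds) used by charon's exponential backoff.

    Entry n is the wait after the n-th send: timeout * base ** n.
    The list has tries + 1 entries: one wait before each retransmit,
    plus the final wait before the IKE_SA attempt is abandoned.
    Defaults are the stock strongSwan values; pfSense may override them.
    """
    return [timeout * base ** n for n in range(tries + 1)]


waits = retransmit_schedule()
total = sum(waits)

for n, wait in enumerate(waits):
    print(f"wait {n}: {wait:.1f} s")
print(f"give up after about {total:.0f} s")
```

With those defaults a single attempt is abandoned after roughly 165 seconds, which is shorter than the 5 minutes seen above, so other settings are probably involved in the full window. The "5 retransmits" part, at least, lines up with the default retransmit_tries.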
-
@shellbr There is another thread going on about this. Someone suggested a script.
https://forum.netgate.com/post/992563
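For anyone who wants a rough idea of what such a script could look like, here is a minimal watchdog sketch, not the script from that thread. It pings the far-side VTI address and, if there is no reply, asks strongSwan to initiate the tunnel again. The peer address, the child name con1000 (taken from the log above; yours may differ) and the assumption that Python 3 and swanctl are available on the initiating box are all things to verify on your own setup.

```python
import subprocess

# Hypothetical values: replace with your far-side VTI address and the
# connection/child name shown in your own IPsec status or logs.
PEER_VTI_IP = "192.0.2.1"
INITIATE_CMD = ["swanctl", "--initiate", "--child", "con1000"]


def peer_reachable(ip, run=subprocess.run):
    """Return True if at least one of three pings to ip gets a reply."""
    result = run(
        ["ping", "-c", "3", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_once(ip=PEER_VTI_IP, cmd=INITIATE_CMD, run=subprocess.run):
    """Initiate the tunnel if the peer is unreachable.

    Returns True if a reconnect was attempted, False if the peer answered.
    The run parameter exists only so the logic can be exercised
    without touching the network.
    """
    if peer_reachable(ip, run=run):
        return False
    run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return True


# To use it, call check_once() from a small script run by cron,
# for example once a minute on the initiating (remote) side.
```

It only makes sense to run this on the side that initiates, since the main site is set to responder only.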