Netgate Discussion Forum

    IPsec not reconnecting after site failure

      kevindd992002

      I have two sites (main and remote), each with a pfSense box, connected by a site-to-site IPsec tunnel using routed VTI. The main site has a public static IP, but the remote site is behind CGNAT (so a private IP is assigned to its WAN interface). To make the tunnel work, I had to create a DDNS entry for the remote site's WAN interface and use it as the peer identifier in the main site's IPsec settings. I also had to check "Responder only" in the main site's IPsec settings. DPD is enabled on both sides.

      To establish the connection, I have to click the Connect button under Status -> IPsec. After that, if I restart either pfSense box, the remote box reconnects and re-establishes the IPsec tunnel without any issues. The problem is that when either site has an Internet outage lasting, say, more than an hour, the tunnel does not reconnect automatically. I have to repeat the manual "Connect" step under Status -> IPsec.

      I also don't use the "Automatically ping host" feature in the phase 2 settings on either side, because I already have gateway monitoring set up (it pings the IPsec interface IP on the far side). I read somewhere that this does the same thing with routed VTI.

      @jimp Any ideas how I can solve the reconnection failure?

        bbrendon @kevindd992002

        @kevindd992002 Did you make progress on this? There is a restart on child close option, but I have tried that and still do not get consistent connections. https://redmine.pfsense.org/issues/9767#note-1

          kevindd992002 @bbrendon

          I know I resolved this in the past, but sorry, I forgot what I did because I have since switched to WireGuard. It's way faster than both OpenVPN and IPsec on the 200 Mbps link between the two sites.

            shellbr

            I was about to start a topic for this. I have your exact issue verbatim, so you saved all the typing! I've also been able to recreate the issue in a lab environment. If anyone wants to see any logs, just let me know how to collect the data you want to see and I'll be happy to share it.

              shellbr

              I've been trying to figure this out in my lab environment. It seems that when the responder-only side (Site A) is taken offline, the other side (Site B) goes into "connecting" status for 5 minutes. If Site A is brought back online within that time, the tunnel reconnects. Otherwise, Site B changes to the "Disconnected" state and makes no further attempt to contact Site A. These are the last few lines in Site B's log:
              Jul 11 16:30:06 rtr2 charon[69811]: 16[IKE] <con1000|2> giving up after 5 retransmits
              Jul 11 16:30:06 rtr2 charon[69811]: 16[IKE] <con1000|2> establishing IKE_SA failed, peer not responding
              Jul 11 16:30:06 rtr2 charon[69811]: 16[MGR] <con1000|2> checkin and destroy IKE_SA con1000[2]
              Jul 11 16:30:06 rtr2 charon[69811]: 16[IKE] <con1000|2> IKE_SA con1000[2] state change: CONNECTING => DESTROYING
              Jul 11 16:30:06 rtr2 charon[69811]: 16[MGR] checkin and destroy of IKE_SA successful
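For anyone collecting logs to compare, the give-up event in the excerpt above can be pulled out of a larger log with a small filter. This is only a sketch: the regular expression is written against the exact line format shown in this post, and the function name `find_give_ups` is made up for illustration. Other charon versions or log formats may need a different pattern.

```python
import re

# Pattern written against the log lines quoted in this thread (an assumption:
# other charon versions or syslog formats may lay the line out differently).
GIVE_UP = re.compile(
    r"charon\[\d+\]: \d+\[IKE\] <(?P<conn>[^|>]+)\|(?P<sa>\d+)> "
    r"giving up after (?P<tries>\d+) retransmits"
)


def find_give_ups(lines):
    """Return (connection, sa_id, retransmits) for each give-up line found."""
    events = []
    for line in lines:
        match = GIVE_UP.search(line)
        if match:
            events.append(
                (match.group("conn"), int(match.group("sa")), int(match.group("tries")))
            )
    return events


# Two of the lines from the excerpt above; only the first is a give-up event.
SAMPLE = [
    "Jul 11 16:30:06 rtr2 charon[69811]: 16[IKE] <con1000|2> giving up after 5 retransmits",
    "Jul 11 16:30:06 rtr2 charon[69811]: 16[IKE] <con1000|2> establishing IKE_SA failed, peer not responding",
]

print(find_give_ups(SAMPLE))
```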

              I've tried adjusting the DPD and reauth values, but they make no difference. It's always 5 minutes, and the log always shows the same "giving up after 5 retransmits" message. I'm not sure which setting is causing it to stop retrying so quickly.
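One thing that may help narrow this down: the "giving up after 5 retransmits" message is governed by strongSwan's retransmission settings (`charon.retransmit_timeout`, `charon.retransmit_base`, `charon.retransmit_tries` in strongswan.conf), not by DPD or reauth. The sketch below computes the wait before each retransmit using the stock strongSwan defaults (4.0 s, 1.8, 5 tries). Whether pfSense leaves those defaults in place is an assumption, not something confirmed in this thread.

```python
def retransmit_schedule(timeout=4.0, base=1.8, tries=5):
    """Return (waits, total) for one IKE exchange attempt.

    The wait after the n-th transmission is timeout * base**n. There are
    `tries` retransmits plus one final wait before charon gives up.
    Defaults are the stock strongSwan values (assumed, not read from pfSense).
    """
    waits = [timeout * base ** n for n in range(tries + 1)]
    return waits, sum(waits)


waits, total = retransmit_schedule()
for n, wait in enumerate(waits):
    print(f"wait {n}: {wait:.1f} s")
print(f"total before giving up: {total:.0f} s")
```

With those defaults a single attempt gives up after roughly 165 seconds, which is shorter than the 5 minutes observed here, so the box may be using different values or making more than one attempt before stopping.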

                bbrendon @shellbr

                @shellbr There is another thread going on about this. Someone suggested a script.
                https://forum.netgate.com/post/992563
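For reference, a watchdog of the kind a script like that would implement could look roughly like the sketch below. This is not the script from the linked post. The names `peer_reachable`, `watchdog`, `FAR_SIDE_IP` and `RECONNECT_CMD` are hypothetical, the address is a placeholder, and the reconnect command is an assumption that depends on the pfSense/strongSwan version and the connection name (`con1000` is taken from the log earlier in this thread).

```python
import subprocess
import time

# Placeholders (assumptions): replace with the far side's VTI address and
# whatever command initiates the tunnel on your version.
FAR_SIDE_IP = "10.0.0.2"
RECONNECT_CMD = ["swanctl", "--initiate", "--child", "con1000"]


def peer_reachable(host, count=3):
    """Return True if ping reports at least one reply from host."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watchdog(check, reconnect, max_failures=3, interval=60, rounds=None, sleep=time.sleep):
    """Call reconnect() after max_failures consecutive failed checks.

    rounds=None loops forever; a number limits the loop (useful for testing).
    Returns how many times reconnect() was called.
    """
    failures = 0
    reconnects = 0
    done = 0
    while rounds is None or done < rounds:
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                reconnect()
                reconnects += 1
                failures = 0
        done += 1
        sleep(interval)
    return reconnects


# Example invocation (left commented out because it loops forever):
# watchdog(lambda: peer_reachable(FAR_SIDE_IP),
#          lambda: subprocess.run(RECONNECT_CMD))
```

Requiring several consecutive failures before reconnecting avoids bouncing the tunnel on a single lost ping.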

                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.