Netgate Discussion Forum

    AWS VPC second tunnel drops after certain amount of time (therefore receiving AWS notifications regarding VPN connections now and then)

    IPsec
    18 Posts 5 Posters 3.2k Views
    • T
      TomWork
      last edited by

      I am wondering if I am the only one having this issue or not.

      After some time, the second IPsec tunnel gets disconnected on the pfSense side. Click Connect and it reconnects fine. Other than that, everything works. It isn't specific to one VPC: we have 3 different VPCs and consequently 6 different tunnels, and after some time we always end up with only one tunnel up on each VPC...

      We tried pinging a target to keep both tunnels up (the "Automatically ping host" option in the P2 advanced config), but the result was the same.

      • Netgate SG-3100
      • 2.4.1-RELEASE (arm)

      Any ideas what it could be?

      Cheers,
      Thomas

      • DerelictD
        Derelict LAYER 8 Netgate
        last edited by Derelict

        Look at the IPsec logs and see who is requesting it be torn down and why.

        Chattanooga, Tennessee, USA
        A comprehensive network diagram is worth 10,000 words and 15 conference calls.
        DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
        Do Not Chat For Help! NO_WAN_EGRESS(TM)

        • T
          TomWork
          last edited by

          Hi,

          Unfortunately, it does not happen often and the logs are quite verbose. By default, pfSense seems to keep only a few thousand lines locally. If I had any useful information, I would have shared it.

          Thomas

          • DerelictD
            Derelict LAYER 8 Netgate
            last edited by

            Right. Sometimes you need to log to an external server to solve issues like this.

            • T
              TomWork
              last edited by

              Is there a way to auto-reconnect IPsec tunnel instead of staying in a disconnected state?

              It just happened again today on 2 of the 3 pairs of tunnels, so it took 12 to 14 days to happen.

              • S
                shalles
                last edited by

                Hi Thomas,

                We have a similar problem with some AWS tunnels. Before the tunnel goes down, I see the following message:

                DPD check timed out, enforcing DPD action
                

                Then it looks like the CHILD_SA is restarted, but one minute later the tunnel goes down.

                IKE_SA con24000[762] state change: ESTABLISHED => DELETING
                IKE_SA con24000[762] state change: DELETING => DELETING
                IKE_SA con24000[762] state change: DELETING => DESTROYING
                

                AWS Support tells me that their DPD detection was triggered at the same time.

                I really don't know why this is happening or where to look next.

                Regards,
                Sebastian
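
To see who tears a tunnel down, it can help to reduce the verbose IPsec log to just the two kinds of messages quoted above. What follows is only a minimal sketch, assuming the log is available as plain text (for example on a remote syslog server). The function name and the log path in the comment are placeholders, not something pfSense ships.

```shell
#!/bin/sh
# Read IPsec log lines on stdin and keep only the events quoted in this
# thread: DPD timeouts and IKE_SA state changes.
filter_teardown_events() {
  grep -E 'DPD check timed out|IKE_SA .* state change:'
}

# Example call (hypothetical path, adjust to wherever your syslog server
# stores the pfSense IPsec log):
#   filter_teardown_events < /var/log/remote/pfsense-ipsec.log
```

Comparing the timestamps of the surviving lines with the times AWS reports should show whether the DPD timeout on the pfSense side comes first.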

                • A
                  asdfasdf
                  last edited by

                  Same thing happens to us. From what I can tell the tunnel in the pair that goes down is always the one that isn't actively being used to pass traffic. Once it's down it never gets reconnected without a person going in and hitting the 'Connect' button. My guess is this is due to the automatic host pings only going through the 'active' tunnel which is still up. In the AWS-provided configuration for other firewalls (Juniper for example), I've noticed that they are essentially pinging the inside IP of the AWS VPN gateway for each tunnel individually. I'm assuming that is what keeps the passive tunnel connected (or reconnected) when you use other firewalls.

                  • DerelictD
                    Derelict LAYER 8 Netgate
                    last edited by

                    Have you tried setting the auto-ping addresses in pfSense to the APIPA addresses on the other side of the tunnels?

                    • S
                      shalles
                      last edited by

                      Hi,

                      Yes, we already use the APIPA address of the other side of the tunnel.
                      In our setup the active tunnel also goes down from time to time and never comes back up.

                      I solved it with a script that cron runs every minute. It checks all tunnels and,
                      if a tunnel is down, brings it up via "ipsec up con#".

                      Regards,
                      Sebastian

                      • A
                        asdfasdf
                        last edited by

                        What shalles posted is probably a far more elegant solution than what we ended up doing.

                        We're using static routing, so no BGP. I couldn't find a way to ping the remote APIPA address with our existing configuration, so I added an additional phase 2 entry for each tunnel, with that tunnel's remote APIPA address as the remote IP of the new entry. For this additional phase 2 entry I also set the remote APIPA address as the automatic ping target. This has been in place for 4 days now and all tunnels are connected, despite AWS bringing some of them down during this time. Previously, pfSense never showed all AWS tunnel pairs connected for longer than 24 hours, so for us this has been a huge improvement.

                        • T
                          TomWork
                          last edited by

                          Hi,

                          Glad to see we are not the only ones and that this issue is real.

                          Like you guys, we use no BGP, only static routes, and yes, it always seems to be the tunnel without traffic that gets killed (triggered by AWS, I would tend to believe).

                          @Derelict : We ping hosts in the remote network (EC2 instances in the AWS VPC) and the tunnel still dies. I am unsure how pinging the remote end of the tunnel would help, unless the host pings all go via the "active" tunnel, which would defeat their purpose (and is actually logical from a routing standpoint). Finally, how can we see the APIPA assignment? E.g. 169.254.12.32/30 is the assigned network. Neither .33 nor .34 is pingable from the pfSense side, and I don't know which one is the pfSense end and which the AWS end. (ipsec statusall does not display anything more than the Status / IPsec page of the interface.)

                          @asdfasdf : I will try your trick but how do you find the remote APIPA address?

                          @shalles : Would you mind sharing your script? Ideally what you implemented should be part of pfsense/strongswan aka : auto-reconnecting IPsec tunnels. At the time, I was really thinking I was missing that feature. But it seems it's a strongswan design choice : https://wiki.strongswan.org/issues/1501

                          PS: I enabled remote logging soon after @Derelict's suggestion but never found the time to follow up. Once a week or so, we go clicky-clicky in the interface to bring some of these tunnels up again after receiving AWS emails about it ;-) It isn't as bad as what @asdfasdf describes, but it's still an annoyance.

                          Let's keep this thread going.
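
On the /30 question: a /30 holds four addresses (network, two hosts, broadcast), so 169.254.12.32/30 leaves exactly .33 and .34 as tunnel endpoints. A small sketch of that arithmetic follows, with a made-up function name. It deliberately does not say which of the two is the pfSense end, because that has to come from the VPN configuration itself.

```shell
#!/bin/sh
# Print the two usable host addresses of a /30, given its network address
# (with or without the "/30" suffix).
slash30_hosts() {
  net=${1%/*}           # drop an optional "/30" suffix
  prefix=${net%.*}      # first three octets, e.g. 169.254.12
  last=${net##*.}       # last octet, e.g. 32
  # The network address of a /30 always has a last octet divisible by 4.
  if [ $((last % 4)) -ne 0 ]; then
    echo "not a /30 network address: $net" >&2
    return 1
  fi
  echo "$prefix.$((last + 1))"
  echo "$prefix.$((last + 2))"
}

slash30_hosts 169.254.12.32/30
# prints:
# 169.254.12.33
# 169.254.12.34
```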

                          • A
                            asdfasdf
                            last edited by

                            In your AWS account, under VPC -> Site-to-Site VPN Connections, there is the list of VPNs, where you can select one and download its configuration. You probably did what I did originally, which was to pick pfSense as the Vendor and Platform; that version of the config does not show the inside IPs. I re-downloaded the config choosing Generic for the Vendor and Platform. In that version of the config file you should see the heading "Inside IP Addresses" once for each tunnel. Under that heading there should be "Customer Gateway", which is pfSense's inside IP (local) for that tunnel, and "Virtual Private Gateway", which is AWS's inside IP (remote) for that tunnel.
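
If you have several tunnels, the inside IPs can be pulled out of the Generic config with a few lines of awk. This is only a sketch: the function name is made up, and it assumes each "Inside IP Addresses" heading is followed by a "Customer Gateway" line and a "Virtual Private Gateway" line whose last field is the address with a prefix length. Check that against your own downloaded file before relying on it.

```shell
#!/bin/sh
# Print "customer=<ip> aws=<ip>" once per tunnel from a Generic-format VPN
# configuration read on stdin.
# ASSUMPTION: the address is the last whitespace-separated field on the
# "Customer Gateway" / "Virtual Private Gateway" lines that follow each
# "Inside IP Addresses" heading.
inside_ips() {
  awk '
    /Inside IP Addresses/                 { in_block = 1; next }
    in_block && /Customer Gateway/        { cgw = $NF }
    in_block && /Virtual Private Gateway/ { vgw = $NF }
    in_block && cgw != "" && vgw != "" {
      sub(/\/.*/, "", cgw); sub(/\/.*/, "", vgw)   # strip the prefix length
      print "customer=" cgw " aws=" vgw
      cgw = ""; vgw = ""; in_block = 0
    }
  '
}

# Example call (hypothetical file name):
#   inside_ips < vpn-generic-config.txt
```

The "aws" value is the address to use as the remote side of the extra phase 2 entry and as its automatic ping target.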

                            • T
                              TomWork @asdfasdf
                              last edited by

                              @asdfasdf Thanks! I will give it a go when time allows.

                              • S
                                shalles
                                last edited by

                                @TomWork
                                Sure, but it's not very sophisticated ;-)
                                I additionally installed the Cron package, so I can manage cron runs via the GUI.

                                Regards,
                                Sebastian

                                restart_ipsec.txt

                                • T
                                  TomWork @shalles
                                  last edited by

                                  @shalles Thanks mate, that's good! It does not need to be sophisticated. It needs to work and be tested, which you obviously did. Thanks for sharing. I am trialling @asdfasdf's solution on a set of tunnels to see if that helps. If not, I will trial your way, which we had planned to do initially while hoping there was a hidden feature for auto-reconnection. The good thing is that we won't have to code it because you already did! ;-)

                                  Thanks all, I will report back in this thread in a week or two, once we have more data points on what works and what doesn't as a workaround.

                                  Cheers,
                                  Thomas

                                  • A
                                    asdfasdf
                                    last edited by

                                    We've been running with the additional phase 2 entry for about 3 weeks now and each pair of tunnels is still showing both in the pair connected. Prior to this the longest that both tunnels in a pair stayed connected was more like a couple of days. I'm not sure if this is the most correct solution or not but we're planning to just stick with it at this point.

                                    • T
                                      TomWork
                                      last edited by TomWork

                                      Hi,

                                      Unfortunately I could not make @asdfasdf's phase 2 trick work. Probably a PEBKAC issue. I'll let someone else test and confirm. We are therefore using @shalles's script, which should work via cron (only tested once manually so far). Below is a slightly modified version that simplifies the logging and reduces verbosity.

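                                      In /usr/local/bin/ipsec-tunnel-guard.sh:

                                      #!/bin/sh
                                      # Collect connection names: the text before the first ':' on every
                                      # `ipsec statusall` line that mentions dpddelay.
                                      tunnels=$( /usr/local/sbin/ipsec statusall | /usr/bin/grep dpddelay | /usr/bin/cut -d':' -f1 | /usr/bin/tr -d ' ' )

                                      for i in $tunnels; do
                                        # A connection whose status output contains 'no match' is treated as down.
                                        if /usr/local/sbin/ipsec status "$i" | /usr/bin/grep -q 'no match'; then
                                          echo "tunnel $i down"
                                          /usr/local/sbin/ipsec up "$i"
                                        fi
                                      done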
                                      

                                      Then, in /etc/cron.d/ipsec-monitor-guard, we run the above script every 5 minutes to bring any down tunnel back up:

                                      */5 * * * *   root   /usr/local/bin/ipsec-tunnel-guard.sh | logger -t ipsec-tunnel-guard
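
Since the script was only tested once by hand, the parsing can also be exercised without touching a live tunnel by putting the two text checks into functions and feeding them canned input. This is a sketch only: the function names are made up, and the sample text merely imitates the shape of `ipsec statusall` output (documentation addresses, illustrative fields), so real output will differ by version and configuration.

```shell
#!/bin/sh
# Extract connection names from `ipsec statusall` output read on stdin,
# using the same pipeline as the guard script.
list_tunnels() {
  grep dpddelay | cut -d':' -f1 | tr -d ' '
}

# Succeed if `ipsec status <name>` output (on stdin) says the connection
# has no matching SA, which the guard script treats as "down".
tunnel_is_down() {
  grep -q 'no match'
}

# Illustrative input only, not captured from a real system.
sample='Connections:
    con1000:  198.51.100.1...203.0.113.1  IKEv1, dpddelay=10s
    con1000:   child:  10.0.0.0/16 === 10.1.0.0/16 TUNNEL, dpdaction=restart
    con2000:  198.51.100.1...203.0.113.2  IKEv1, dpddelay=10s'

printf '%s\n' "$sample" | list_tunnels
# prints:
# con1000
# con2000
```

Only the lines containing dpddelay are picked up, so a connection configured without DPD would not be watched by this approach.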
                                      

                                      Thank you @shalles and @asdfasdf for your help. Much appreciated. I am sure it will help others. Hopefully our problem will now be sorted out.

                                      • S
                                        sepp_huber @TomWork
                                        last edited by sepp_huber

                                        @tomwork Thank you for sharing this great script, we have the same problem with the AWS tunnels ;-)
                                        We have a CARP HA setup and wanted to have it on both nodes.
                                        We therefore need a check so that the script only brings up down tunnels when the CARP state is MASTER, and does nothing on the BACKUP node.
                                        Here it is. There may be better solutions, but it works:

                                        #!/bin/sh

                                        # Only act when some interface reports CARP MASTER; otherwise exit quietly.
                                        master=$(ifconfig | grep "carp: MASTER")
                                        if [ -z "$master" ]; then
                                          echo "CARP Backup => exit script"
                                          exit 0
                                        fi
                                        echo "CARP Master verifying IPSec tunnels..."

                                        tunnels=$( /usr/local/sbin/ipsec statusall | /usr/bin/grep dpddelay | /usr/bin/cut -d':' -f1 | /usr/bin/tr -d ' ' )

                                        for i in $tunnels; do
                                          if /usr/local/sbin/ipsec status "$i" | /usr/bin/grep -q 'no match'; then
                                            echo "tunnel $i down"
                                            /usr/local/sbin/ipsec up "$i"
                                          fi
                                        done

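
One caveat with the grep above: it succeeds as soon as any single VHID is MASTER, even if others on the same node are BACKUP. If that matters in your setup, a stricter variant could look like the sketch below. The function name is made up, and it assumes ifconfig prints one "carp: STATE vhid ..." line per VHID.

```shell
#!/bin/sh
# Succeed only if the ifconfig output on stdin shows at least one CARP VHID
# in MASTER state and none in BACKUP or INIT.
carp_all_master() {
  awk '
    /carp: MASTER/        { master++ }
    /carp: (BACKUP|INIT)/ { other++ }
    END { exit !(master > 0 && other == 0) }
  '
}

# Intended use at the top of the guard script:
#   ifconfig | carp_all_master || exit 0
```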
                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.