AWS VPC second tunnel drops after certain amount of time (therefore receiving AWS notifications regarding VPN connections now and then)

TomWork

I am wondering if I am the only one having this issue or not.

After some time, the 2nd IPsec tunnel is disconnected on the pfSense side. Click Connect and it reconnects fine. Other than that, everything works fine. It isn't tied to a particular VPC, because we have 3 different VPCs and consequently 6 different tunnels, and after some time we always end up with only one tunnel up on each VPC...

We tried pinging a target to keep both tunnels up ("Automatically ping host" in the P2 advanced config), but same deal.

Netgate SG-3100
2.4.1-RELEASE (arm)

Any ideas what it could be?

Cheers,
Thomas

Derelict

Look at the IPsec logs and see who is requesting it be torn down and why.

TomWork

Hi,

Unfortunately, it does not happen often and the logs are quite verbose. By default, pfSense seems to keep only a few thousand lines locally. If I had some information, I would have shared it.

Thomas

Derelict

Right. Sometimes you need to log to an external server to solve issues like this.

TomWork

Is there a way to auto-reconnect IPsec tunnel instead of staying in a disconnected state?

Today it has just happened again on 2 of the 3 pairs of tunnels. So it took 12 to 14 days to happen.

shalles

Hi Thomas,

We have a similar problem with some AWS tunnels. Before the tunnel goes down, I see the following message:

DPD check timed out, enforcing DPD action

Then it looks like the CHILD_SA is restarted, but one minute later the tunnel goes down.

IKE_SA con24000[762] state change: ESTABLISHED => DELETING
IKE_SA con24000[762] state change: DELETING => DELETING
IKE_SA con24000[762] state change: DELETING => DESTROYING

AWS Support tells me that their DPD detection was also triggered at the same time.

I really don't know why this is happening or where to look further.
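For anyone who ends up with a long, verbose log, here is a minimal sketch of a filter that keeps only the DPD timeouts and the IKE_SA teardown transitions quoted above. The function name filter_teardown_events is made up for this illustration, and the sample input just reuses the lines from this post plus a placeholder line.

```shell
#!/bin/sh
# Keep only DPD timeouts and IKE_SA teardown transitions from a log read on stdin.
# The patterns are taken from the log lines quoted in this thread.
filter_teardown_events() {
  grep -E 'DPD check timed out|state change: ESTABLISHED => DELETING|state change: DELETING => DESTROYING'
}

# Demo: two lines from the thread and one placeholder line that should be dropped.
printf '%s\n' \
  'DPD check timed out, enforcing DPD action' \
  'placeholder: some unrelated log line' \
  'IKE_SA con24000[762] state change: ESTABLISHED => DELETING' \
  | filter_teardown_events
```

On a real log you would feed the file in on stdin instead, e.g. `filter_teardown_events < /path/to/ipsec.log`, where the path is whatever your remote syslog server uses.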

Regards,
Sebastian

asdfasdf

Same thing happens to us. From what I can tell the tunnel in the pair that goes down is always the one that isn't actively being used to pass traffic. Once it's down it never gets reconnected without a person going in and hitting the 'Connect' button. My guess is this is due to the automatic host pings only going through the 'active' tunnel which is still up. In the AWS-provided configuration for other firewalls (Juniper for example), I've noticed that they are essentially pinging the inside IP of the AWS VPN gateway for each tunnel individually. I'm assuming that is what keeps the passive tunnel connected (or reconnected) when you use other firewalls.

Derelict

Have you tried setting the auto-ping addresses in pfSense to the APIPA addresses on the other side of the tunnels?

shalles

Hi,

Yes, we already use the APIPA address of the other side of the tunnel.
In our setup the active tunnel also goes down from time to time and never comes back up.

I solved it with a script that is triggered via cron every minute. It checks all tunnels and,
if a tunnel is down, the script brings it up via "ipsec up con#".

Regards,
Sebastian

asdfasdf

What shalles posted is probably a far more elegant solution than what we ended up doing.

We're using static routing, so no BGP. I couldn't find a way to ping the remote APIPA address using our existing configuration, so I ended up adding an additional phase 2 entry for each tunnel, where the remote IP of this additional phase 2 entry is that tunnel's remote APIPA address. For this additional phase 2 entry I also set the remote APIPA address as the automatic ping target. This has been in place for 4 days now and all tunnels are connected despite AWS bringing some of them down during this time. Before this change, pfSense never showed all AWS tunnel pairs connected for longer than 24 hours, so for us this has been a huge improvement.

TomWork

Hi,

Glad to see we are not the only ones and that this issue is real.

Like you guys, no BGP, only static routes, and yes, it seems to always kill the tunnel without traffic (triggered by AWS, I would tend to believe).

@Derelict : We ping hosts in the remote network (EC2 instances in the AWS VPC) and the tunnel still dies. I am unsure how pinging the remote end of the tunnel would help, unless the host pings all go via the "active" tunnel, which would make the ping useless for the other tunnel (and which, from a routing standpoint, is logical). Finally, how can we see the APIPA assignment? E.g. 169.254.12.32/30 is the network assigned. Neither .33 nor .34 is pingable from the pfSense side, and I don't know which one is the pfSense end and which is the AWS end. (ipsec statusall does not display anything more than the Status / IPsec page in the interface.)

@asdfasdf : I will try your trick but how do you find the remote APIPA address?

@shalles : Would you mind sharing your script? Ideally what you implemented should be part of pfsense/strongswan aka : auto-reconnecting IPsec tunnels. At the time, I was really thinking I was missing that feature. But it seems it's a strongswan design choice : https://wiki.strongswan.org/issues/1501

PS: I enabled remote logging soon after @Derelict's suggestion but never found the time to follow up. Once a week or so, we go clicky-clicky in the interface to get some of these tunnels up again after receiving AWS emails about it ;-) It isn't as bad as what @asdfasdf describes, but it's still an annoyance.

Let's keep this thread going.

asdfasdf

In your AWS account, under VPC -> Site-to-Site VPN Connections, is the list of VPNs where you can select one and download the configuration for it. You probably did what I did originally, which was to pick pfSense as the Vendor and Platform; that version of the config does not show you the inside IPs. I re-downloaded the config choosing Generic for the Vendor and Platform. In that version of the config file you should see the heading "Inside IP Addresses", once for each tunnel. Under that heading there should be "Customer Gateway", which is pfSense's inside IP (local) for that tunnel, and "Virtual Private Gateway", which is AWS's inside IP (remote) for that tunnel.
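As a rough sketch, the inside IPs can be pulled out of the downloaded Generic file with grep. This assumes the "Customer Gateway" and "Virtual Private Gateway" entries sit on the two lines directly below each "Inside IP Addresses" heading; if your file is laid out differently, widen the -A value. The function name show_inside_ips and the file name in the usage line are made up for illustration.

```shell
#!/bin/sh
# Print each "Inside IP Addresses" heading plus the two lines that follow it.
# $1 is the path to the downloaded Generic configuration file.
show_inside_ips() {
  grep -A 2 'Inside IP Addresses' "$1"
}
```

Usage would be something like `show_inside_ips vpn-generic.txt`, which should print one block per tunnel.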

TomWork

@asdfasdf Thanks! I will give it a go when time allows.

shalles

@TomWork
Sure, but it's not very sophisticated ;-)
I additionally installed the Cron package, so I can manage cron runs via the GUI.

Regards,
Sebastian

restart_ipsec.txt

TomWork

@shalles Thanks mate, that's good! It does not need to be sophisticated. It needs to work and be tested, which you obviously did. Thanks for sharing. I am trialling @asdfasdf's solution on a set of tunnels to see if that helps. If not, I will trial your way, which we planned to do initially but were hoping that there was a hidden feature for auto-reconnections. The good thing is that we won't have to code it because you already did it! ;-)

Thanks all, I will report back in this thread in a week or two, once we have more data points to see what works and what doesn't as a workaround.

Cheers,
Thomas

asdfasdf

We've been running with the additional phase 2 entry for about 3 weeks now and each pair of tunnels is still showing both in the pair connected. Prior to this the longest that both tunnels in a pair stayed connected was more like a couple of days. I'm not sure if this is the most correct solution or not but we're planning to just stick with it at this point.

TomWork

Hi,

Unfortunately, I could not make @asdfasdf's phase 2 trick work. Probably a PEBKAC issue. I'll let someone else test and confirm. We are therefore using @shalles' script, which should work via cron (only tested once manually at this point in time). Below is a slightly modified version that simplifies the logging and removes verbosity.

In /usr/local/bin/ipsec-tunnel-guard.sh :

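#!/bin/sh
# Collect connection names: in the statusall output, the lines mentioning
# dpddelay start with the connection name followed by a colon.
tunnels=$( /usr/local/sbin/ipsec statusall | /usr/bin/grep dpddelay | /usr/bin/cut -d':' -f1 | /usr/bin/tr -d ' ' )

for i in $tunnels; do
  # "no match" means no SA was found for this connection, i.e. the tunnel is down
  if /usr/local/sbin/ipsec status "$i" | /usr/bin/grep -q 'no match'; then
    echo "tunnel $i down"
    /usr/local/sbin/ipsec up "$i"
  fi
done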

Then in /etc/cron.d/ipsec-monitor-guard, we run the above script every 5 minutes to bring any down tunnel back up.

*/5 * * * *   root   /usr/local/bin/ipsec-tunnel-guard.sh | logger -t ipsec-tunnel-guard
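Since the script has only been run once by hand, here is a minimal sketch of one way to exercise the down-tunnel check without touching live tunnels. It assumes the same "no match" output that the script above greps for. The IPSEC variable, the tunnel_is_down function and the fake_ipsec stub are all made up for this illustration; they are not part of pfSense or strongSwan.

```shell
#!/bin/sh
# IPSEC is a hypothetical override so that a stub can stand in for the real binary.
IPSEC=${IPSEC:-/usr/local/sbin/ipsec}

# Returns 0 (true) when "ipsec status <name>" prints "no match", i.e. no SA is up.
tunnel_is_down() {
  "$IPSEC" status "$1" | grep -q 'no match'
}

# Stub that pretends con1 is down and every other connection is up.
# $1 is the subcommand ("status"), $2 is the connection name.
fake_ipsec() {
  if [ "$2" = "con1" ]; then
    echo "no match"
  else
    echo "placeholder: connection is up"
  fi
}

# Point the check at the stub and run it against two made-up connection names.
IPSEC=fake_ipsec
for name in con1 con2; do
  if tunnel_is_down "$name"; then
    echo "tunnel $name down"
  else
    echo "tunnel $name up"
  fi
done
```

Leaving IPSEC unset makes the same function call the real /usr/local/sbin/ipsec, so the check itself does not change between the dry run and the cron job.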

Thank you @shalles and @asdfasdf for your help. Much appreciated. I am sure it will help others. Hopefully our problem will now be sorted out.

sepp_huber

@tomwork Thank you for sharing this great script, we have the same problem with the AWS tunnels ;-)
We have a CARP HA setup and wanted to have it on both nodes.
We therefore need a check so that the script only brings up down tunnels when the CARP state is MASTER and does nothing on the BACKUP node.
Here it is. There may be better solutions, but it works.

#!/bin/sh

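# Only act on the CARP MASTER node; exit on BACKUP.
# Full paths are used because cron may run with a minimal PATH.
master=$(/sbin/ifconfig | /usr/bin/grep "carp: MASTER")
if [ -z "$master" ]; then
  echo "CARP Backup => exit script"
  exit 0
fi
echo "CARP Master verifying IPsec tunnels..."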

tunnels=$( /usr/local/sbin/ipsec statusall | /usr/bin/grep dpddelay | /usr/bin/cut -d':' -f1 | /usr/bin/tr -d ' ' )

for i in $tunnels; do
  if /usr/local/sbin/ipsec status "$i" | /usr/bin/grep -q 'no match'; then
    echo "tunnel $i down"
    /usr/local/sbin/ipsec up "$i"
  fi