IPsec and Tiered Gateway Groups
-
I have pfSense 2.3.4 on an SG-2440. It has two WAN connections configured as a basic tiered gateway group, with no load balancing, traffic shaping, or CARP. An IPsec tunnel connects to a remote Cisco ASA (version 9.1), using the gateway group as the local endpoint. IPsec on the ASA side is configured in "respond only" mode.
Problem: The IPsec tunnel keeps using its original local endpoint regardless of which gateway is "active". Stopping and starting IPsec appears to resolve the issue.
Scenario: The tier 1 ISP fails because its monitor IP stops responding, so pfSense marks that gateway as down and routes traffic through the tier 2 ISP. The tunnel is eventually torn down due to lost service. When pfSense tries to rebuild the tunnel, it continues to use the tier 1 (offline) ISP. I have confirmed this in the logs.
Workaround I'm using: Go to Status -> IPsec. Click Stop service and wait until it has stopped. Click Start service and wait until it has started. Click Connect VPN. The VPN then connects successfully using the correct ISP. The same steps are needed when the tier 1 ISP comes back online. If the service is not stopped and started, the VPN keeps trying to use the incorrect gateway.
This is a horrible workaround, since it requires manual intervention whenever an ISP goes up or down. Can anyone offer guidance on how to resolve this? I'm open to any solutions.
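To make the expected behaviour concrete, here is a simplified model of a tiered gateway group: the active gateway is the "up" gateway with the lowest tier number, and the IPsec local endpoint should follow it. This is only a sketch, not pfSense's own code; the `Gateway` type, the function names, and the addresses (taken from the RFC 5737 documentation ranges) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gateway:
    name: str
    tier: int      # 1 = preferred, higher numbers = backups
    address: str   # public IP of this WAN (hypothetical values below)
    up: bool       # result of gateway monitoring, e.g. monitor IP responding


def active_gateway(group: list[Gateway]) -> Gateway | None:
    """Return the 'up' gateway with the lowest tier number, or None if all are down."""
    candidates = [gw for gw in group if gw.up]
    if not candidates:
        return None
    return min(candidates, key=lambda gw: gw.tier)


def endpoint_is_stale(configured_local: str, group: list[Gateway]) -> bool:
    """True when the IPsec local endpoint differs from the active gateway's address."""
    gw = active_gateway(group)
    return gw is not None and gw.address != configured_local


if __name__ == "__main__":
    group = [
        Gateway("WAN1", tier=1, address="198.51.100.10", up=False),  # tier 1 monitor failed
        Gateway("WAN2", tier=2, address="203.0.113.20", up=True),
    ]
    print(active_gateway(group).name)                 # WAN2
    print(endpoint_is_stale("198.51.100.10", group))  # True: tunnel still bound to WAN1
```

In the scenario described above, the tunnel keeps the WAN1 address after failover, which is exactly the "stale" case this model flags.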
Thanks!
-
We have the same issue, except that restarting IPsec doesn't solve it for us.
We have to reboot the entire pfSense box for it to clear its cached endpoints and use the new ones.
I've even tried creating a new phase 1 connection with the new endpoints, and it still somehow uses the endpoints cached on the existing phase 1 connection.
It's a real problem, particularly in HA environments, where you don't want to take down the entire network just to restore a single VPN connection.
-
I have found that you may need to wait about 60 seconds after the service is restarted. I don't have auto ping set up, so the tunnel doesn't try to reconnect automatically in my test environment; I'm not sure whether that plays a part.
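Rather than waiting a fixed 60 seconds after a restart, a script could poll until the tunnel is actually up, with a timeout. The sketch below assumes you supply your own `check` callback (for example, a ping to a host on the remote LAN); that callback, and the `wait_until` helper itself, are hypothetical and not part of pfSense.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout: float = 90.0,
               interval: float = 5.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call check() every `interval` seconds until it returns True.

    Returns True as soon as check() succeeds, or False once `timeout`
    seconds have elapsed without success. `sleep` and `clock` are
    injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Polling stops as soon as the tunnel answers, so a fast renegotiation isn't penalised, while a timeout still bounds how long a script waits before giving up.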
-
Kind of surprised I haven't gotten more feedback on this; I thought this would be a common configuration. Am I going about this the wrong way?
-
I was able to look into this a little more today. I believe the issue is that the file /usr/local/etc/ipsec.conf is not updated when the gateway changes. I tested by watching the "left" IP address:
1. Disabled the primary WAN.
2. Verified that "left" still showed x.x.x.x. The tunnel did not work.
3. Restarted the service via the web page. Verified that "left" now showed the correct IP, y.y.y.y, and the tunnel worked.
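The check in the steps above can be scripted. This sketch reads /usr/local/etc/ipsec.conf and reports which `conn` sections have a `left` value that differs from the WAN address you expect to be active. It assumes strongSwan's `key = value` layout (with or without spaces around `=`); the function names are hypothetical, the conn name `con1000` in the test is illustrative, and the expected WAN IP must be supplied by you. It only detects the stale value; it does not rewrite the file.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches the exact key "left" (not leftid, leftsubnet, etc.), e.g. "left = 198.51.100.10".
LEFT_RE = re.compile(r"^\s*left\s*=\s*(\S+)\s*$")
# Matches a section header such as "conn con1000".
CONN_RE = re.compile(r"^\s*conn\s+(\S+)\s*$")


def left_endpoints(conf_text: str) -> dict[str, str]:
    """Map each conn section name to its 'left' (local endpoint) value."""
    result: dict[str, str] = {}
    current = None
    for line in conf_text.splitlines():
        m = CONN_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = LEFT_RE.match(line)
        if m and current is not None:
            result[current] = m.group(1)
    return result


def stale_connections(conf_text: str, active_wan_ip: str) -> list[str]:
    """Names of conn sections whose 'left' does not match the active WAN IP."""
    return [name for name, left in left_endpoints(conf_text).items()
            if left != active_wan_ip]


if __name__ == "__main__":
    path = Path("/usr/local/etc/ipsec.conf")
    if path.exists():
        print(left_endpoints(path.read_text()))
```

In the test above, "left" stayed at the old address after the primary WAN was disabled, which is the condition `stale_connections` reports; restarting the service from the web page was what actually corrected it.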
-
After much on-and-off troubleshooting over the last few weeks, I think I have this working "good enough". Our old ASA 5505s handle failover slightly more reliably, but in the end the cost and flexibility of the pfSense devices seem to be worth it.
I think one of the main issues was DNS resolution. The firewall was unable to update the dynamic DNS service when it failed over to the backup WAN. One way to resolve that is to check "Enable default gateway switching" under Advanced, but that seemed to cause issues with a static route I had set up to allow the firewall itself to reach the remote LAN of the IPsec tunnel.
My resolution was to:
1. Make sure DNS servers are configured for all interfaces under System -> General Setup. (Why can't there be duplicates?)
2. Under Services -> DNS Resolver, check "Enable Forwarding Mode".
3. Set DPD to a lower retry rate so that a broken tunnel is torn down more quickly.
4. Enable Cisco Extensions under the VPN -> IPsec advanced settings. I'm not sure whether this helped anything.
5. Install Cron and change the rc.dyndns.update task to run every 3 minutes. This probably didn't fix anything, but it helped with troubleshooting.
Remaining issues:
1. When the primary WAN comes back online, the tunnel never seems to renegotiate on that circuit automatically (maybe I'm not waiting long enough?). This isn't critical in my view.
2. IPsec is currently using aggressive mode, which is not very secure; more testing is needed to check main mode.
And maybe someday I'll be able to look into some sort of IPsec over GRE connection that handles failures more gracefully, but that looks quite complex since I don't have much knowledge in this area.
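Item 3 of the resolution list above (lowering the DPD retry rate) is a trade-off between how fast a dead tunnel is torn down and how tolerant it is of brief packet loss. As a rough model, a DPD probe is sent every "delay" seconds and the peer is declared dead after "max failures" missed probes, so detection takes about delay × max failures. This is a simplified estimate, not the exact conversion pfSense performs when it writes its strongSwan configuration, and the example values are illustrative.

```python
def approx_dpd_detection_seconds(delay: int, max_failures: int) -> int:
    """Rough time before an unresponsive peer is declared dead.

    Simplified model (an assumption, not pfSense's exact formula):
    one probe every `delay` seconds, give up after `max_failures` misses.
    """
    if delay <= 0 or max_failures <= 0:
        raise ValueError("delay and max_failures must be positive")
    return delay * max_failures


if __name__ == "__main__":
    # Compare a more relaxed setting with a more aggressive one (illustrative values).
    for delay, failures in [(10, 5), (5, 3)]:
        seconds = approx_dpd_detection_seconds(delay, failures)
        print(f"delay={delay}s, max failures={failures}: ~{seconds}s")
```

Shorter values let the tunnel be rebuilt sooner after a failover, at the cost of possibly tearing down a healthy tunnel during a short outage.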
Hope this helps someone.