PFSense 2.5 problems with Site-to-Site AWS VPN connection
-
Last week we stood up a pair of bare-metal PFSense 2.5 servers in HA mode to bridge traffic between a VLAN in our colo and a VPC in AWS, using the AWS managed Site-to-Site VPN service. Since trying to set up the VPN connection, however, we have had nothing but very strange problems. The hardware is a pair of SuperMicro servers with dual Xeon E5620 CPUs and Intel 10 Gbit interfaces on both the WAN and LAN. Basic networking for the servers is working beautifully right now.
I configured the VPN on Amazon's side per the instructions, downloaded the PFSense configuration instructions, and set up the IPSec VPN connection per those instructions, but the VPN never connects. In fact, if I start a packet capture of all traffic to the AWS VPN endpoint, then go to Status-IPSec-Overview and click the "Connect VPN" button, I see zero traffic to the AWS VPN endpoint whatsoever, so to the best of my knowledge it never even attempts to make the connection.
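For anyone repeating the capture described above, here is a minimal sketch of a capture filter narrowed to the traffic an IPsec initiator would normally send: IKE on UDP 500, NAT-T on UDP 4500, and ESP as IP protocol 50. The helper name `ike_filter` and the interface name `ix0` are assumptions for illustration, not taken from the original setup.

```shell
# Build a tcpdump/BPF filter expression for IPsec traffic to or from one peer.
# Matches IKE (UDP 500), NAT-T (UDP 4500) and ESP (IP protocol 50).
# ike_filter is a hypothetical helper name used only for this sketch.
ike_filter() {
    printf 'host %s and (udp port 500 or udp port 4500 or ip proto 50)' "$1"
}

# Usage, as root on the firewall (ix0 is an assumed WAN interface name):
#   tcpdump -ni ix0 "$(ike_filter 52.207.141.26)"
```

If nothing at all shows up with this filter while the connect button is clicked, that supports the conclusion that no initiation is being attempted.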
swanctl --list-conns:
bypass: IKEv1/2, no reauthentication, rekeying every 14400s
  local:  %any
  remote: 127.0.0.1
  local unspecified authentication:
  remote unspecified authentication:
  bypasslan: PASS, no rekeying
    local:  172.31.92.0/24|/0
    remote: 172.31.92.0/24|/0
con100000: IKEv2, no reauthentication, rekeying every 25920s, dpd delay 10s
  local:  66.152.77.120
  remote: 52.207.141.26
  local pre-shared key authentication:
    id: 66.152.77.120
  remote pre-shared key authentication:
    id: 52.207.141.26
  con100000: TUNNEL, rekeying every 3240s, dpd action is hold
    local:  172.31.92.0/24|/0
    remote: 10.50.0.0/16|/0
swanctl --load-all --file /var/etc/ipsec/swanctl.conf --debug 1:
loaded ike secret 'ike-0'
no authorities found, 0 unloaded
no pools found, 0 unloaded
loaded connection 'bypass'
loaded connection 'con100000'
successfully loaded 2 connections, 0 unloaded
/var/etc/ipsec/swanctl.conf:
connections {
    bypass {
        remote_addrs = 127.0.0.1
        children {
            bypasslan {
                local_ts = 172.31.92.0/24
                remote_ts = 172.31.92.0/24
                mode = pass
                start_action = trap
            }
        }
    }
    con100000 {
        fragmentation = yes
        unique = replace
        version = 2
        proposals = aes256-sha512-ecp521
        dpd_delay = 10s
        dpd_timeout = 40s
        rekey_time = 25920s
        reauth_time = 0s
        over_time = 2880s
        rand_time = 2880s
        encap = no
        mobike = no
        local_addrs = 66.152.77.120
        remote_addrs = 52.207.141.26
        pools =
        local {
            id = 66.152.77.120
            auth = psk
        }
        remote {
            id = 52.207.141.26
            auth = psk
        }
        children {
            con100000 {
                dpd_action = trap
                mode = tunnel
                policies = yes
                life_time = 3600s
                rekey_time = 3240s
                rand_time = 360s
                start_action = trap
                remote_ts = 10.50.0.0/16
                local_ts = 172.31.92.0/24
                esp_proposals = aes256-sha512-ecp521
            }
        }
    }
}
secrets {
    ike-0 {
        secret = 0sOHQyWGROdFE4ZzBfSXhrQW5pYm1wcEc1b1YyQ05nWHQ=
        id-0 = %any
        id-1 = 52.207.141.26
    }
}
What's interesting to me is the ipsec.log file - I see no entries for IKE - it's mostly CFG entries, with a few KNL, LIB, JOB entries. A dump of my log from earlier testing is available at:
https://pastebin.com/F8XHwvaw
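To put numbers on the "mostly CFG, no IKE" observation, a rough sketch like the following could tally log lines per charon subsystem tag. It assumes the usual charon line shape, where each message is prefixed with a thread number and a three-letter tag such as `05[CFG]`. The function name `count_subsystems` and the log path in the usage comment are assumptions.

```shell
# Count charon log entries per subsystem tag (IKE, CFG, KNL, LIB, JOB, ...).
# Assumes lines shaped like: "... charon[1234]: 05[CFG] loaded connection ..."
# count_subsystems is a hypothetical helper name used only for this sketch.
count_subsystems() {
    grep -oE '[0-9]+\[[A-Z]{3}\]' "$1" |    # pull out tokens such as 05[CFG]
        sed -E 's/^[0-9]+\[//; s/\]$//' |   # keep only the three-letter tag
        sort | uniq -c |                    # tally identical tags
        awk '{print $2, $1}'                # print as "TAG count"
}

# Usage (the log path is an assumption):
#   count_subsystems /var/log/ipsec.log
```

An output with no IKE line at all would match what is described above: the configuration loads, but no negotiation is ever started.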
Some other troubleshooting I've done:
Connected the primary of these machines with IPSec to another, older PFSense server (2.4.5) successfully
Connected that older PFSense 2.4.5 to an identically configured AWS Site-To-Site VPN in our dev account without difficulty.
My next test is to stand up a new pfSense 2.5 and try to connect it to the same VPN in our dev account to see if that works.
Does anyone have any ideas I have not thought of?
-
Do you see it try to connect if you initiate via traffic (e.g. ping from 172.31.92.x to 10.50.x.x)?
There are several issues with IPsec on 21.02/2.5 that might affect what you are seeing, such as one bug which prevents the connect button on the status page from working properly in cases like yours.
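As a cross-check that does not depend on the GUI button, the tunnel can presumably also be initiated from a shell with `swanctl --initiate`. This is only a sketch: the wrapper name `initiate_child` is hypothetical, and `con100000` is the child name from the swanctl.conf posted above.

```shell
# Ask charon to initiate a child SA by name, bypassing the status page button.
# initiate_child is a hypothetical wrapper name used only for this sketch.
initiate_child() {
    swanctl --initiate --child "$1"
}

# Usage on the firewall shell (child name taken from the posted swanctl.conf):
#   initiate_child con100000
```

If this produces IKE traffic while the button does not, that would point at the GUI rather than the tunnel configuration.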
You can install the System Patches package and then create entries for the following commit IDs to apply the fixes:
-
Thanks Jim!
Those patches seem to have done the trick! Our S2S VPN is now up and passing traffic between our in-house DC VLAN and an EC2 instance inside the AWS VPC. We have some issues with IPSec failover (shift the primary into CARP maintenance mode and we only drop a single ping during the switchover, but put it back to normal and traffic stops), but I will investigate that separately.
Quick question - will applying those patches imply any additional steps when we upgrade to the next release?
-
No, when there is another release you can simply remove those patches (don't revert them, just remove the entries).
-
Having the same issue here.
Applied the patches, which helped a bit. Before the patches, 5 tunnels out of 15 were connecting. After the patches, 8 tunnels out of 15 are connecting.
Once the patches were applied, if I selected a VPN connection under Status/IPsec and clicked the green button to connect it, all the other VPNs vanished from Status/IPsec and the message "Collecting IPsec status information." was displayed. That message stayed there until a cold restart of the server. If I waited long enough, the GUI would crash with the message "504 Gateway Time-out".
I will revert to the previous version for now until the next version comes along.
Hope this bit of info helps the community.
-
Thank you,
After the upgrade (from 2.4.5_p1) to 2.5.0, only 1 of 3 IPSec tunnels was up.
After applying these patches, all 3 tunnels are back alive (without further changes) and showing up in the widget.
2 x pfSense 2.5.0 (with patches) to pfSense 2.4.5_p1 (IKEv2)
1 x pfSense 2.5.0 (with patches) to strongswan 5.7.2-1 on Debian 10.8 (IKEv2)
I've enabled the "auto apply" option on these patches but haven't rebooted since patching.
So for me it fixes the problem(s).
-
@jimp - Thanks for the solution; this seems to have resolved the connectivity issue. I have another issue which is causing IPSec to disconnect. Also, the ipsec service does not restart unless the entire pfSense instance is rebooted, but that looks like a different issue. I'll troubleshoot it and raise a separate thread if required.
Thank you so much for the help.