Random issues with IPsec

  • We have several tunnels (in the tens) between pfSense IPsec <> AWS IPsec, spread across 2 different pfSense setups at two different sites (with different internet connections), and we keep having random issues with the IPsec tunnels. Sometimes they disconnect and do not reconnect, sometimes they disconnect but still show as connected while no traffic goes through, and sometimes we restart the whole IPsec service to no avail and then things start reconnecting on their own.

    We have been encountering the same issue for well over a year now. We have tried the IPsec best practices from the Netgate wiki, but the issues still popped up. We also switched the tunnels from the default AWS pfSense config to the latest encryption settings, but both the default and the new configs have issues. It doesn't seem to be the reqid issue mentioned in the forum a few days back.

    We cannot really pinpoint when or why these disconnections happen and cannot find anything in the logs either. We would be really grateful for any pointers on where to start looking into this.

  • I think this is not an IPsec issue but a network issue between Site A and Site B.
    I had similar issues with a DDoS protection service that temporarily blocked the traffic whenever a DDoS was detected in the same subnet.

    Maybe you should run a continuous ping from site A to site B (over the WAN IPs) and monitor whether there are any drops.

    I have no experience with AWS, but maybe this helps.
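A minimal sketch of the WAN ping monitoring suggested above, assuming a Unix-like `ping` that accepts `-c 1` and a monitoring host with Python 3. The function names and the address in the usage comment are hypothetical, not something from this thread. It probes the remote WAN IP at a fixed interval and groups consecutive failures into outages with timestamps, so they can be compared against the times the tunnels dropped.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2):
    """Send one ICMP echo request with the system ping; True if a reply came back."""
    # Timeout flags differ between ping implementations, so the timeout
    # is enforced from Python rather than with a ping option.
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def find_outages(samples):
    """Turn [(timestamp, ok), ...] into [(first_fail, last_fail, missed_probes), ...]."""
    outages = []
    start = None
    last_fail = None
    missed = 0
    for stamp, ok in samples:
        if not ok:
            if start is None:
                start = stamp
            last_fail = stamp
            missed += 1
        elif start is not None:
            # A successful probe closes the current outage.
            outages.append((start, last_fail, missed))
            start, missed = None, 0
    if start is not None:
        outages.append((start, last_fail, missed))
    return outages


def monitor(host, count, interval_s=5, probe=ping_once):
    """Probe `host` `count` times and return the outages that were seen."""
    samples = []
    for i in range(count):
        samples.append((datetime.now(), probe(host)))
        if i < count - 1:
            time.sleep(interval_s)
    return find_outages(samples)


# Example use (203.0.113.10 is a documentation address, replace it with
# the remote WAN IP): monitor("203.0.113.10", count=720) runs for about an hour.
```

If the outages line up with the tunnel drops, the problem is likely in the path between the sites rather than in IPsec itself.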

    Are these using routed IPsec (VTI)?

    If so, on 2.4.5, changing the Child SA Close Action to Restart/Reconnect may help. See https://redmine.pfsense.org/issues/9767 for more.

    It's also possible that the failure to reconnect is fixed by https://redmine.pfsense.org/issues/9954 (also in 2.4.5).

    And if you were using the "restart" service button for IPsec, that doesn't stop/start the service, it refreshes the config. So try stopping and then starting as separate actions.

  • @jimp Ok, so we have updated to pfSense 2.4.5 and switched from policy-based to routed IPsec following the guidelines. We have a couple of questions though:

    • How can we monitor the new VTI gateway? It seems we can reach the first IP on the /30, but not the second. If we monitor the gateway, it shows as down even though it's working fine.

    • Under the interfaces status page, we can see outbound traffic on the new interfaces, but the inbound traffic is always 0 on all new IPsec interfaces, even though these are working fine.

    Any pointers where to start looking would be appreciated!

    Monitoring the VTI should be no different than any other gateway, though I know some people have had issues here and there. If the gateway monitoring started before the VTI interface was up it might have a bad state table entry for traffic leaving the wrong direction. You can check for that (Diag > States, search for the remote inside VTI IP address, then kill those states)
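As a rough illustration of the state check described above, here is a sketch that filters a state table dump for the remote inside VTI address. It assumes the dump is the text output of `pfctl -s states` and that Python 3 is available where the script runs. The helper names and the sample lines in the usage note are illustrative, not taken from a real firewall. The Diag > States page remains the straightforward way to do this. The sketch only lists matching states and does not kill anything.

```python
import re
import subprocess


def read_states():
    """Return the text of the pf state table (needs root on the firewall)."""
    # Assumption: pfctl is on the PATH and "-s states" prints one state per line.
    result = subprocess.run(
        ["pfctl", "-s", "states"], capture_output=True, text=True, check=True
    )
    return result.stdout


def states_for_address(pfctl_output, address):
    """Return the state lines that mention `address` as a whole IPv4 address."""
    # Reject matches that are part of a longer dotted number, so that
    # 169.254.10.1 does not also match 169.254.10.10.
    pattern = re.compile(
        r"(?<![\d.])" + re.escape(address) + r"(?!\d)(?!\.\d)"
    )
    return [line for line in pfctl_output.splitlines() if pattern.search(line)]
```

For example, given an illustrative dump containing the lines `all icmp 169.254.10.2:4321 -> 169.254.10.1:4321       0:0` and `all icmp 169.254.10.10:99 -> 169.254.10.9:99       0:0`, calling `states_for_address(dump, "169.254.10.1")` returns only the first line.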

    Traffic counting won't work properly on a per-interface basis. Because of the way VTI is hooked into FreeBSD, the traffic doesn't flow in both directions on the VTI interfaces: it all goes over enc0 and some gets counted twice. Similar problems prevent things like NAT and per-interface rules from working as expected on VTI.

  • You'll also need to allow incoming ICMP echo requests on the IPsec tab of the remote firewall.

  • @jimp Thanks for the heads-up on the direction flow. I was thinking that might be related to the monitoring issues. With regards to the gateway monitoring, as an example we have a /30 on a tunnel, with one address being the local IP and the other being the remote IP. The gateway monitor should reach the remote IP but it doesn't. The local IP can be reached fine. These tunnels are connecting to AWS on the other side, if it makes any difference. Since the remote IP can't be reached, the gateway shows as offline if we monitor it.

    This is connecting to AWS on the other end

  • If I'm not mistaken, with Site-to-Site VPNs on AWS you can only use link-local addresses, i.e. addresses from 169.254.x.x, for your inside tunnel interfaces with VTIs.

  • @marcquark Thanks for the pointer. I have switched from my own /30 to the /30 set by AWS, and the gateways are now reachable and monitoring fine!
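For anyone checking their own tunnels, here is a small sketch of the link-local /30 rule mentioned above, using Python's standard `ipaddress` module. The function name and the returned keys are hypothetical. It also assumes, based on how the generated AWS configurations usually look, that AWS takes the first usable address of the /30 and the customer side (pfSense here) takes the second. AWS additionally reserves a few specific blocks, which this sketch does not check.

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def describe_inside_cidr(cidr):
    """Validate a tunnel inside CIDR and return the two usable addresses."""
    # strict=True rejects values with host bits set, such as 169.254.10.1/30.
    net = ipaddress.ip_network(cidr, strict=True)
    if net.version != 4 or net.prefixlen != 30:
        raise ValueError(f"{cidr} is not an IPv4 /30")
    if not net.subnet_of(LINK_LOCAL):
        raise ValueError(f"{cidr} is not inside {LINK_LOCAL}")
    # A /30 has exactly two usable host addresses.
    first, second = net.hosts()
    # Assumption: AWS side is the first host, customer gateway the second.
    return {"aws_side": str(first), "pfsense_side": str(second)}
```

With this, `describe_inside_cidr("169.254.10.0/30")` gives 169.254.10.1 for the AWS side and 169.254.10.2 for the pfSense side, while a self-chosen range such as `10.0.0.0/30` is rejected. Under that assumption, the gateway monitor on pfSense would point at the AWS-side address.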