Stale WG session ?
-
@chudak said in Stale WG session ?:
@eirikrcoquere
Yes, in my case.
And I saw somebody saying it's the same problem on Android.
I need to give it a try. Are there any good up-to-date tutorials for setting things up on pfSense and iPhone? Last time I tried, the handshake went well but I had no internet.
-
@eirikrcoquere After monitoring this for a few more days, I think the stale connection in my case may be related to transitions between the home network behind the firewall and external mobile/Wi-Fi networks. The official WG client on my Android phone is configured as an always-on VPN connection to pfSense on the home network, using a dynamic DNS address that maps to the WAN IP.
When I move from the home network to an external network there is typically no issue, but when connecting back to the home Wi-Fi the WG session often goes stale. Sometimes it's immediate, other times it happens after some hours. Disabling and re-enabling the interface in the WireGuard client fixes it, so I'm not sure whether this is an underlying issue with the way the home network is configured (pure NAT) or whether something in the handoff between networks goes amiss under certain conditions. It does not seem related to, e.g., changes in the IP address on the WAN interface, because that typically remains stable for months on end. I don't think I've encountered any stale WG sessions when connected to outside networks.
-
@hvbakel
I will keep an eye on this use case. Off the top of my head, I've seen it while on my home network or on T-Mobile cellular, but I'm not 100% sure yet.
Thx
-
Obviously my lab is rebooted quite often as part of the normal daily development cycle. However, my kit has been up for 21 days without a reboot thanks to some timing with traveling and remote work over the past few weeks. All my tunnels to Mullvad, IVPN, etc. have persisted this entire time. So this might be a clue that there is something funky with the WireGuard Go implementation, which is what provides WG support on iOS and Android. I don't currently have the tooling set up to work on the iOS/Android ports, but I'm going to reach out to some people who do and see what they think. There was an issue identified by Kyle Evans a few months ago in the FreeBSD kernel implementation that could lead to a stale WG state... but right now it's really hard to tell where the problem lies.
-
@cmcdonald I've tried to do some additional troubleshooting at times when the WG session has gone stale. When this happens, the Android client shows repeated log messages stating "Handshake did not complete after 5 seconds, retrying". If I do nothing, the handshake typically completes eventually, after maybe ~5 mins.
In my case, the issue only seems to occur (at least I've only noticed it) when the phone is connected to my IOT WiFi network behind the firewall. When looking at the state of the WG port at the time the handshake issue occurs, I see the following:
IOT udp <WAN_IP>:51420 -> <LAN_IP>:39844 MULTIPLE:SINGLE 33 / 270 4 KiB / 34 KiB
If I kill this state, the next handshake will succeed and the state then changes to:
IOT udp <WAN_IP>:51420 -> <LAN_IP>:39844 MULTIPLE:MULTIPLE 246 / 212 47 KiB / 46 KiB
I'm not sure if any of this helps shed any light on the issue and I'm no expert, but I wonder if there is perhaps an underlying issue in NAT reflection for the WAN address?
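For anyone who wants to spot this condition in a saved state listing, here is a minimal sketch that flags UDP states where one side is MULTIPLE and the other is not, as in the stale line above. The regular expression is based only on the two lines quoted in this post, and the function names are made up for illustration; other pfSense versions may print states differently.

```python
import re

# Layout assumed from the two state lines quoted above:
# <iface> <proto> <src> -> <dst> <SRC_STATE>:<DST_STATE> <counters...>
STATE_RE = re.compile(
    r"^(?P<iface>\S+)\s+(?P<proto>\S+)\s+"
    r"(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+"
    r"(?P<src_state>[A-Z_]+):(?P<dst_state>[A-Z_]+)\b"
)


def parse_state(line):
    """Return the fields of one state line as a dict, or None if it does not match."""
    match = STATE_RE.match(line.strip())
    return match.groupdict() if match else None


def looks_one_sided(line):
    """True for a UDP state where exactly one side has reached MULTIPLE."""
    state = parse_state(line)
    if state is None or state["proto"] != "udp":
        return False
    return (state["src_state"] == "MULTIPLE") != (state["dst_state"] == "MULTIPLE")


stale = "IOT udp <WAN_IP>:51420 -> <LAN_IP>:39844 MULTIPLE:SINGLE 33 / 270 4 KiB / 34 KiB"
healthy = "IOT udp <WAN_IP>:51420 -> <LAN_IP>:39844 MULTIPLE:MULTIPLE 246 / 212 47 KiB / 46 KiB"
```

A one-sided state is only a hint, not proof of a stale tunnel, since a brand-new flow can briefly look the same.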
-
@hvbakel I switched to using a split DNS setup with a host override for the dynamic DNS name to point to the internal firewall address rather than the WAN address. Cautiously optimistic that this may have resolved the handshake issues I was seeing when connected to the internal network, as I've not encountered any since switching. I will keep monitoring.
-
@hvbakel I cheered too soon, I'm afraid: the split-DNS setup does not solve the periodic handshake failures and stale sessions either. The issue also persists after upgrading to the recently released 2.6/22.01 version of pfSense.
-
I believe I finally got to the root of the issue on my end. As background, my goal was to have my phone remain connected to the home network through WireGuard when leaving the house. The official iPhone WireGuard client has the option to conditionally connect on network changes, but this feature is not available in the Android client. While it is possible to use e.g. Tasker to control WireGuard tunnels, this is more convoluted and requires location access to read out the WiFi SSID. Therefore I was looking for a way to have an always-on WG connection whether the phone is connected to the home IOT WiFi or to an external network. The WAN interface has a hostname registered through Dynamic DNS, and my initial attempt was to use NAT reflection to maintain the WG connection when switching between external networks and the internal IOT WiFi. Unfortunately, it seems that NAT reflection for WG is rather unstable and will periodically lose the ability to complete handshakes when on the internal WiFi network, requiring a manual off/on toggle of the WG connection to get things working again.
The alternative solution I tried next was to turn off NAT reflection altogether and use split DNS instead. This works in the sense that it maintains a stable connection on the internal network without handshake issues, but it leads to a new problem: once the WG connection has been established, the client expects the host IP address to remain the same. The connection is therefore lost when moving between internal and external networks, because with split DNS the hostname resolves to a different IP address on each side.
My solution to this issue was to switch from the official WG Android client to VPN Client Pro. This VPN client has two options to force a WG reconnection and re-resolution of the host DNS name when switching networks and before re-establishing handshakes. This, in combination with the split DNS setup, has finally resulted in stable WG connections on internal and external networks. VPN Client Pro also has extensive options to conditionally activate the WG tunnel when connected to or disconnected from certain networks, though this again requires location access and location to be always on. One downside of VPN Client Pro is that it requires a subscription, but in my view it's a small cost compared to the benefits it brings.
In summary, at least in my case, the stale connection issues were limited to connections on the home network and related to instability of NAT reflection with WG tunnels. It does not seem to be an issue with WG itself. A split DNS setup with re-resolving of hostnames allows for seamless transitions between networks with an always-on VPN connection inside or outside the firewall.
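To illustrate the re-resolution idea, here is a rough sketch of the check a client would need to make after a network change: resolve the hostname again and compare it with the IP the tunnel is currently pinned to. This is not how VPN Client Pro is implemented (I have no insight into that); the function names, the hostname and the addresses below are placeholders.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the hostname resolves to right now."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_DGRAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def needs_reconnect(current_endpoint_ip, hostname, resolver=resolve_ipv4):
    """True when the pinned endpoint IP is no longer among the resolved addresses.

    The resolver is injectable so the check can be tried without real DNS.
    """
    return current_endpoint_ip not in resolver(hostname)


# Hypothetical split-DNS setup: the internal resolver answers with the firewall's
# internal address, while public DNS answers with the WAN address.
def internal_dns(hostname):
    return ["192.168.1.1"]


def public_dns(hostname):
    return ["203.0.113.7"]


# Tunnel was brought up outside, phone is now back on the home WiFi.
back_home = needs_reconnect("203.0.113.7", "vpn.example.net", resolver=internal_dns)
# Tunnel was brought up outside, phone is still outside.
still_away = needs_reconnect("203.0.113.7", "vpn.example.net", resolver=public_dns)
```

In the first case the tunnel has to be restarted against the newly resolved address, which is what the reconnect options are for.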
-
@hvbakel I think you hit the spot! Thanks.
-
After @hvbakel's analysis, I wonder how to apply it on iPhone...
-
I was experiencing this issue on my iPhone as well. Setting a keep-alive interval of 25 seconds in the peer configuration in pfSense did the trick. It's been perfect for the past 48 hours. No more stale sessions, no more toggling the WireGuard connection on and off.
Hope this helps someone.
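I set this in the pfSense peer settings, but for anyone working with a plain WireGuard config file, the same setting is the `PersistentKeepalive` key in the `[Peer]` section. Below is a small sketch that renders such a section; the helper name, the key placeholder and the addresses are made up for illustration, and 25 is just the value that worked for me.

```python
def render_peer(public_key, allowed_ips, endpoint=None, keepalive=25):
    """Render a WireGuard [Peer] section as text.

    keepalive is in seconds; 0 or None leaves PersistentKeepalive out,
    which means no keepalive packets are sent.
    """
    lines = [
        "[Peer]",
        f"PublicKey = {public_key}",
        f"AllowedIPs = {', '.join(allowed_ips)}",
    ]
    if endpoint:
        lines.append(f"Endpoint = {endpoint}")
    if keepalive:
        lines.append(f"PersistentKeepalive = {keepalive}")
    return "\n".join(lines)


# Placeholder values only; substitute your own key, tunnel address and endpoint.
peer_section = render_peer(
    "<PEER_PUBLIC_KEY>",
    ["10.0.0.2/32"],
    endpoint="vpn.example.net:51420",
)
print(peer_section)
```

With a keepalive set, the peer sends a small packet at that interval even when idle, which keeps NAT and firewall states for the tunnel from timing out.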
-
@johnnytheguy much appreciated!
-
I see no difference
-
@chudak Don't know what to tell you. For me, it's been fixed for three days now. Maybe 25 seconds is the wrong value for your network? Also, sometimes different problems can have the same symptoms.