VPN for IMS stops working randomly (EE Wi-Fi calling)
I’m trying to troubleshoot Wi-Fi Calling on EE in the UK from an Android phone, via pfSense out to the Internet. The issue is that after a random amount of time, the established VPN connection stops working and calls can no longer be made or received.
I’ve used pfTop to filter on port 4500 and can see keepalives being sent every 20 seconds. I’ve also raised the UDP timeouts to silly high values, but that made no difference. Watching pfTop, it appears the keepalives stop being sent and at that point the connection is dead; however, the NAT ports still remain open because I’ve set a high timeout for troubleshooting.
I then ran a packet capture of port 4500 traffic to see if that gave any more information.
What I can see are NAT-keepalives going out every 20 seconds as expected. These are never responded to, which I think is by design, as they are only meant to keep the NAT mapping open. Every couple of minutes an “Informational MID=n Initiator Request” is sent and a “Responder Response” comes back, which must be a “hello, are you still there?” type of exchange.
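To tell these packet types apart in a capture, here is a minimal sketch, assuming the raw UDP payloads on port 4500 have already been extracted from a pfSense packet capture. It applies the RFC 3948 rules (a NAT-keepalive is a single 0xFF byte, IKE messages start with a four-zero-byte non-ESP marker, anything else is ESP) and, for IKE, reads the exchange type, flags and Message ID from the RFC 7296 header, which stays unencrypted. Those header fields are presumably what produces labels such as “Informational MID=n Initiator Request”. The names `classify_udp4500` and `Udp4500Packet` are hypothetical, for illustration only.

```python
import struct
from dataclasses import dataclass
from typing import Optional

NON_ESP_MARKER = b"\x00\x00\x00\x00"  # RFC 3948: prefixes IKE messages on UDP/4500
IKE_EXCHANGE_NAMES = {34: "IKE_SA_INIT", 35: "IKE_AUTH",
                      36: "CREATE_CHILD_SA", 37: "INFORMATIONAL"}
FLAG_INITIATOR = 0x08  # RFC 7296 header flag: sender is the original initiator
FLAG_RESPONSE = 0x20   # RFC 7296 header flag: message is a response


@dataclass
class Udp4500Packet:
    kind: str                        # "NAT-keepalive", "IKE", "ESP" or "unknown"
    exchange: Optional[str] = None   # IKE exchange name, e.g. "INFORMATIONAL"
    message_id: Optional[int] = None
    initiator: Optional[bool] = None
    response: Optional[bool] = None


def classify_udp4500(payload: bytes) -> Udp4500Packet:
    """Classify one UDP/4500 payload (hypothetical helper, sketch only)."""
    # A NAT-keepalive is a single 0xFF octet and is never answered.
    if payload == b"\xff":
        return Udp4500Packet("NAT-keepalive")
    # IKE: non-ESP marker followed by the fixed 28-byte IKE header.
    if payload[:4] == NON_ESP_MARKER and len(payload) >= 4 + 28:
        ike = payload[4:]
        exchange_type, flags = ike[18], ike[19]
        (message_id,) = struct.unpack("!I", ike[20:24])
        return Udp4500Packet(
            "IKE",
            exchange=IKE_EXCHANGE_NAMES.get(exchange_type, f"type {exchange_type}"),
            message_id=message_id,
            initiator=bool(flags & FLAG_INITIATOR),
            response=bool(flags & FLAG_RESPONSE),
        )
    # Anything else with a full ESP header (non-zero SPI + sequence number) is ESP.
    if len(payload) >= 8 and payload[:4] != NON_ESP_MARKER:
        return Udp4500Packet("ESP")
    return Udp4500Packet("unknown")
```

A request from the phone would show `initiator=True, response=False`, and the server's reply `initiator=False, response=True`, with the same Message ID.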
Then, at some point after being connected for a while (a random time: it can work fine for a few hours, or sometimes a lot less), an Initiator Request is sent but never answered. Several more are sent over the course of a minute with no replies received. At this point the Android phone appears to treat the VPN connection as dead, which is also when the keepalives stop; that makes sense if the connection is dead.
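The failure point can be pulled out of a capture mechanically. The sketch below assumes the INFORMATIONAL messages for one IKE SA have already been reduced to `(timestamp, message_id, is_response)` tuples. In IKEv2 a retransmitted request keeps the same Message ID, so repeated unanswered attempts group under one ID; if the phone used new IDs instead, each attempt would appear as its own unanswered exchange. `Exchange`, `find_unanswered` and the sample events are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Exchange:
    message_id: int
    first_sent: float   # capture timestamp of the first request, in seconds
    attempts: int = 0   # requests seen with this Message ID (first send + retransmits)
    answered: bool = False


def find_unanswered(events: Iterable[Tuple[float, int, bool]]) -> List[Exchange]:
    """Return INFORMATIONAL exchanges on one IKE SA that never got a response.

    events: (timestamp_seconds, message_id, is_response) in capture order.
    """
    exchanges: Dict[int, Exchange] = {}
    for ts, mid, is_response in events:
        if is_response:
            # A response closes the exchange with the same Message ID.
            if mid in exchanges:
                exchanges[mid].answered = True
            continue
        # IKEv2 retransmissions reuse the Message ID, so count them as attempts.
        exchange = exchanges.setdefault(mid, Exchange(mid, ts))
        exchange.attempts += 1
    return [ex for ex in exchanges.values() if not ex.answered]


# Hypothetical capture: two healthy DPD exchanges, then one that is never answered.
events = [
    (0.0, 5, False), (0.1, 5, True),
    (120.0, 6, False), (120.1, 6, True),
    (240.0, 7, False), (242.0, 7, False), (246.0, 7, False), (254.0, 7, False),
]
for ex in find_unanswered(events):
    print(f"MID={ex.message_id}: {ex.attempts} attempts from t={ex.first_sent}s, no response")
```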
Interestingly, because of the higher-than-normal UDP timeout I added, the NAT remains open for incoming packets on the established ports. If I call my mobile from another phone, the packet capture shows incoming UDP packets on port 4500 trying to get a response; of course these get no reply from the phone, which has terminated the VPN. Checking my mobile confirms it has lost Wi-Fi calling.
My question is: is this likely to be a problem on my side? Given that packets still arrive, are received and traverse the NAT, I think everything is okay at my end, i.e. it isn’t a configuration issue with pfSense, and a route is still open from EE’s network back to mine.
I’m not suffering any packet loss on my network that might explain why the Initiator Requests fail to get a response, and I’ve tried two different access points on the network.
My suspicion is an issue with EE’s network; perhaps they are having periods of packet loss. But before I raise this with them, I wanted to be sure I had checked enough at my end to be highly confident the problem isn't with me.
Excellent tools in pfSense for troubleshooting, by the way; they made getting this far very easy.
Vanguard772
On the EE site, pal, it says there are two ports that might be the problem:
UDP 500 (IKEv2)
UDP 4500 (IKEv2)
Thank you for the link. I should have said that port 500 is used initially to set up the VPN connection and, once a NAT is detected, isn't used again: all traffic then goes over port 4500 (that's how IPsec NAT traversal works). Hence I'm only monitoring port 4500, as there is no problem establishing the connection; it just stops at some point.
Vanguard772
@phil_d Sorry mate, that's all I could find. Best to wait for someone with a better understanding of this to reply. Good luck.
I would not expect to have to do anything special in pfSense to make that work. Unless you have deliberately locked down outgoing connections, this should all be the internal client connecting out, which is allowed by default.
The default UDP state timeouts should also be fine for a 20s keep-alive though setting them higher won't hurt.
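As a rough check of that claim, here is a sketch assuming pf's documented default UDP timeouts (udp.first 60 s, udp.single 30 s, udp.multiple 60 s). pfSense's firewall optimization setting or custom state timeouts may change these, so substitute the values configured on your own system. Each packet matching a state resets its expiry, so the keepalive interval only has to stay below the timeout for the state's current stage. `keepalive_holds_state` is a hypothetical helper.

```python
# pf's documented default UDP state timeouts, in seconds (assumption: no
# pfSense optimization mode or custom timeouts have overridden them).
PF_DEFAULT_UDP_TIMEOUTS = {"udp.first": 60, "udp.single": 30, "udp.multiple": 60}


def keepalive_holds_state(interval_s: float,
                          replies_seen: bool = True,
                          timeouts: dict = PF_DEFAULT_UDP_TIMEOUTS) -> bool:
    """True if packets every interval_s seconds keep a pf UDP state from expiring.

    The interval only has to be shorter than the timeout for the state's stage:
    udp.multiple once both sides have sent packets, udp.single if only the
    client has sent more than one packet.
    """
    key = "udp.multiple" if replies_seen else "udp.single"
    return interval_s < timeouts[key]


print(keepalive_holds_state(20))                      # IKE/ESP replies flowing
print(keepalive_holds_state(20, replies_seen=False))  # keepalives alone
```

Even in the worst case, where only the phone's unanswered keepalives refresh the state, a 20 s interval sits inside the 30 s udp.single default.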
I can only suggest running a packet capture on the WAN to try to catch the failing connection. Is that last response actually coming back? Is the packet unusually large etc?
Have you tested this at a different location behind a different firewall?
I haven't deliberately locked anything down; the firewall rules are pretty much at their defaults.
No, the last response isn't coming back. It seems the phone is doing dead peer detection every couple of minutes: the request goes out, which can be seen in the packet capture along with its response, and according to the captures this can keep working fine for hours. Then, randomly, these requests receive no responses; there are 4 attempts from the phone with no response, and at that point the phone, understandably, drops the VPN connection. If I call my phone soon afterwards (while the firewall still hasn't timed out the connections), a stream of data packets arrives fine, but of course the phone ignores it as it has taken down the VPN.
I'm not getting any packet loss on the network in my tests, and the request/response either works first time or stops completely. There is no in-between, i.e. it has never needed two tries and succeeded on the second; it's all or nothing, so it doesn't point to random packet loss.
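The all-or-nothing argument can be made concrete with a little arithmetic. Assuming each packet is lost independently with probability p, one DPD attempt fails if either the request or the response is lost, so q = 1 - (1 - p)^2, and four failed attempts in a row has probability q^4. The sketch compares that with the probability that the first attempt fails but a retry succeeds (q - q^4). The function name is hypothetical, and the four attempts are taken from the captures described above.

```python
def dpd_outcome_probabilities(packet_loss: float, attempts: int = 4):
    """Probabilities for one DPD check under independent random packet loss.

    Returns (attempt_fails, all_fail, recovered_on_retry).
    """
    # One attempt fails if either the request or the response is lost.
    attempt_fails = 1 - (1 - packet_loss) ** 2
    # The phone gives up only if every attempt fails.
    all_fail = attempt_fails ** attempts
    # First attempt fails but one of the retries gets through.
    recovered_on_retry = attempt_fails - all_fail
    return attempt_fails, all_fail, recovered_on_retry


for p in (0.001, 0.01, 0.05):
    q, dead, retried = dpd_outcome_probabilities(p)
    print(f"loss {p:.1%}: all 4 attempts fail {dead:.2e}, "
          f"recovered on a retry {retried:.2e} ({retried / dead:,.0f}x more likely)")
```

Under random, independent loss, "recovered on the second attempt" should turn up far more often than total failure. A sustained outage or path change somewhere upstream would produce exactly the all-or-nothing pattern instead, so this only argues against random loss.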
So EE's end (the VPN server) still thinks the VPN is live after the phone has decided it's dead. Either the server is receiving the dead peer detection requests and only the responses are being lost, or it isn't receiving the requests at all but simply has a longer timeout before it considers the link dead. Of course, I can't know which without seeing what is happening at the other end.
The only change I have made is adding port 4500 traffic to the qVoip queue, so I will take that out just in case something funny is going on there.
I'm pretty sure the problem is not with pfSense; I just wanted some feedback on the troubleshooting and on whether my conclusion, that the problem is outside my control, is heading in the right direction.
There are people complaining that Wi-Fi calling on EE can be unreliable, so I think it is a problem at their end. I suspect they have stats for dead connections and dropped clients, but given the nature of Wi-Fi calling on mobile phones, with people moving in and out of hotspots and Wi-Fi connections dropping and reconnecting all the time, any other problems just get lost in the noise.
Mmm, it does seem unlikely we can do anything to help much in pfSense then.
If you are at some other location, not behind pfSense, and it still behaves the same then that would confirm it.