Intermittent TLS Handshake problem
-
Dear pfSense Community,
We've been running a small pfSense box as our main router/gateway in our fairly normal small business network and we're loving it so far! There are no proxies involved and the pfSense box does normal NAT for all the clients in our subnet.
But there is one problem I've noticed whose exact cause I can't quite pinpoint, so I'm really hoping someone on here can point me in the right direction.
Roughly once or twice a day, internet connectivity seemingly drops. Throughout the entire network, TV streams freeze and websites can't be loaded from any client. I've been able to narrow this down to a TLS handshake problem when HTTPS sites are being loaded. This is further supported by the fact that other services still work: ICMPv4 pings work, RDP connections work, and DNS works.
Then, after anywhere between 2 and 15 minutes, everything magically returns to normal again. I've also been able to confirm that a restart of pfSense makes everything work normally again.
I've started to prepare and run packet captures when the outages happen. My testing method has been to fetch a simple HTML page via HTTPS from one of our own public servers. I'll attach the captures from one client and from pfSense, taken at the same time for the same TCP stream. I'll also attach the output of the pfSense Packet Capture GUI. The client capture was taken with Wireshark, and the pfSense capture was taken through the packet capture GUI and then downloaded.
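The HTTPS fetch test described above can be split into its TCP and TLS stages to confirm which one stalls during an outage. The following is only a minimal sketch, assuming Python 3 is available on a client behind pfSense; `probe_tls` is a hypothetical helper name, not part of pfSense or Wireshark, and the host name in the usage note is a placeholder.

```python
import socket
import ssl
import time


def probe_tls(host, port=443, timeout=5.0):
    """Return (stage, seconds).

    stage is "ok" if the TLS handshake completed, "tcp_connect" if the
    TCP connection itself failed, or "tls_handshake" if TCP connected
    but the handshake failed or timed out.
    """
    start = time.monotonic()
    try:
        # Stage 1: plain TCP three-way handshake.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_connect", time.monotonic() - start
    try:
        # Stage 2: TLS handshake (Client Hello, Server Hello, ...).
        context = ssl.create_default_context()
        with context.wrap_socket(sock, server_hostname=host):
            pass
        return "ok", time.monotonic() - start
    except OSError:
        # Covers timeouts and ssl.SSLError, which are OSError subclasses.
        return "tls_handshake", time.monotonic() - start
    finally:
        sock.close()
```

Calling `probe_tls("www.example.com")` once a minute and logging the result would show whether outages consistently report `tls_handshake` while the TCP connect still succeeds, which is the pattern described in this post.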
On the client side, everything looks normal until the client sends the "Client Hello" for the TLS handshake. After that, the server never responds with its "Server Hello", and the only things the client receives are packets flagged "TCP Previous segment not captured" and keep-alives.
On pfSense I can see the "Server Hello", but Wireshark prints a warning for that packet saying "IPv4 total length exceeds packet length". For that same packet, the packet capture GUI prints "truncated-ip - 18 bytes missing!"
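Both warnings appear to describe the same condition: the total length field in the IPv4 header claims more bytes than the capture actually contains for that packet. A minimal sketch of that comparison follows, assuming the input starts at the IPv4 header (link-layer header already stripped); `ipv4_missing_bytes` is a hypothetical helper for illustration, not how Wireshark or the packet capture GUI is implemented.

```python
import struct


def ipv4_missing_bytes(packet: bytes) -> int:
    """Return how many bytes the IPv4 header claims beyond what is present."""
    if len(packet) < 20:
        raise ValueError("too short to contain an IPv4 header")
    # First 4 bytes: version/IHL, DSCP/ECN, total length (network byte order).
    version_ihl, _tos, total_length = struct.unpack("!BBH", packet[:4])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return max(0, total_length - len(packet))


# Example: a header announcing 1500 bytes while only 1482 bytes are present.
header = struct.pack("!BBH", 0x45, 0, 1500) + bytes(16)
packet = header + bytes(1482 - len(header))
print(ipv4_missing_bytes(packet))  # 18
```

A 1460-byte TCP payload plus 20 bytes of TCP header and 20 bytes of IPv4 header (both without options) gives a total length of 1500, so 18 missing bytes would mean only 1482 bytes of that packet were seen.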
I've already disabled hardware checksum offload, but it doesn't seem to have changed anything.
Client Capture
pfSense Capture
pfSense Packet Capture GUI output:
14:35:06.063207 IP [pfSense WAN IP].37163 > [Server WAN IP].443: tcp 1
14:35:06.086529 IP [Server WAN IP].443 > [pfSense WAN IP].37163: tcp 0
14:35:10.724287 IP [pfSense WAN IP].65405 > [Server WAN IP].443: tcp 0
14:35:10.748028 IP [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 0
14:35:10.748645 IP [pfSense WAN IP].65405 > [Server WAN IP].443: tcp 0
14:35:10.750626 IP [pfSense WAN IP].65405 > [Server WAN IP].443: tcp 298
14:35:10.777189 IP [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 0
14:35:10.780290 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
14:35:10.780492 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
14:35:10.780602 IP [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 101
14:35:10.781052 IP [pfSense WAN IP].65405 > [Server WAN IP].443: tcp 0
14:35:10.824723 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
14:35:11.064614 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
14:35:11.540664 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
14:35:12.500638 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
14:35:14.388982 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
14:35:16.097261 IP [pfSense WAN IP].37163 > [Server WAN IP].443: tcp 1
14:35:16.121156 IP [Server WAN IP].443 > [pfSense WAN IP].37163: tcp 0
14:35:18.292615 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
14:35:20.782296 IP [pfSense WAN IP].65405 > [Server WAN IP].443: tcp 1
14:35:20.805890 IP [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 0
14:35:24.948658 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].37163: tcp 1460
14:35:25.972668 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
14:35:26.134393 IP [pfSense WAN IP].37163 > [Server WAN IP].443: tcp 1
14:35:26.158210 IP [Server WAN IP].443 > [pfSense WAN IP].37163: tcp 0
14:35:30.819387 IP [pfSense WAN IP].65405 > [Server WAN IP].443: tcp 1
14:35:30.843502 IP [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 0
14:35:36.171555 IP [pfSense WAN IP].37163 > [Server WAN IP].443: tcp 1
14:35:36.195961 IP [Server WAN IP].443 > [pfSense WAN IP].37163: tcp 0
14:35:40.856519 IP [pfSense WAN IP].65405 > [Server WAN IP].443: tcp 1
14:35:40.880213 IP [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 0
14:35:41.076905 IP truncated-ip - 18 bytes missing! [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460
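In the output above, every full-size 1460-byte segment from the server carries the truncation warning, while the smaller segments (0, 1, 101 and 298 bytes) do not. For longer captures, a tally per flow makes that pattern easier to spot. The following is a rough sketch, assuming one packet per line in the same text format as the GUI output above; `summarize` and the regular expression are hypothetical helpers written for this format only.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 14:35:10.780290 IP truncated-ip - 18 bytes missing! A.443 > B.65405: tcp 1460
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP "
    r"(?:truncated-ip - (?P<missing>\d+) bytes missing! )?"
    r"(?P<src>.+?) > (?P<dst>.+?): tcp (?P<length>\d+)$"
)


def summarize(lines):
    """Count packets and truncated packets per (source, destination) flow."""
    stats = defaultdict(lambda: {"packets": 0, "truncated": 0})
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        flow = (match["src"], match["dst"])
        stats[flow]["packets"] += 1
        if match["missing"] is not None:
            stats[flow]["truncated"] += 1
    return dict(stats)


sample = [
    "14:35:10.750626 IP [pfSense WAN IP].65405 > [Server WAN IP].443: tcp 298",
    "14:35:10.780290 IP truncated-ip - 18 bytes missing! "
    "[Server WAN IP].443 > [pfSense WAN IP].65405: tcp 1460",
    "14:35:10.780602 IP [Server WAN IP].443 > [pfSense WAN IP].65405: tcp 101",
]
for flow, counts in summarize(sample).items():
    print(flow, counts)
```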
I would be eternally grateful if anyone here has an idea what could be the cause of the issue.
Regards,
Mo -
MTU issues ?
-
@mopritz said in Intermittent TLS Handshake problem:
IPv4 total length exceeds packet length
That quite often is just a red herring. Did you set "Packet Length" in your capture, or did you leave it at zero?
After that the server never responds with its "Server Hello"
Did the server actually get the hello? You need to validate that the server actually got this hello if you're not seeing an answer. If the client sent it but got no answer: did the server get it and send a response that the client never received? Or did the server never get it? Where was the packet lost, etc.
-
@gertjan said in Intermittent TLS Handshake problem:
MTU issues ?
I don't know how I haven't tried changing the MTU yet. I've just changed it to 1400 on the WAN interface. Guess I'll see over the next days whether that's it or not. From looking up some examples of packet captures with MTU issues it does look like a strong possibility though. Thanks for the nudge :)
@johnpoz said in Intermittent TLS Handshake problem:
that quite often is just red herring.. Did you set "Packet Length" in your capture? Or did you leave it at zero.
I left it at 0.
Did the server actually get the hello?? [...]
Since I can see the "Server Hello" in the pfSense packet capture, I just assumed everything was good on that front.
Regards,
Mo -
@gertjan said in Intermittent TLS Handshake problem:
MTU issues ?
So after two weeks of testing, with the problem still occurring, it seems like it's not exclusively an MTU issue. I can't rule out that it has to do with MTU, but over the last 14 days I've continually lowered the WAN MTU and I'm now at 1000. The problem still occurs in just the same way it did before.
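For reference, the relationship between an MTU value and the largest packet that fits can be worked out with simple arithmetic. This is a small sketch, assuming IPv4 and TCP headers without options (20 bytes each) and an 8-byte ICMP echo header; the function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_tcp_payload(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ping_payload_for_mtu(mtu: int) -> int:
    """ICMP echo payload size that exactly fills one packet of this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 1400, 1000):
    print(mtu, max_tcp_payload(mtu), ping_payload_for_mtu(mtu))
# 1500 1460 1472
# 1400 1360 1372
# 1000 960 972
```

The 1460-byte segments in the original capture are consistent with a 1500-byte MTU under these assumptions. The ping payload sizes can be used with the "don't fragment" option of ping (`-D` on FreeBSD, `-M do` on Linux, `-f` on Windows) to check by hand which packet sizes make it through during an outage.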
The thing that seems weird to me is that it's not a continuous problem. It happens about once a day, and then no TLS handshakes work for a few minutes, no matter which server I try. Then it just magically works again, at our full bandwidth of 1 Gb/s.
Our pfSense box is plugged into our ISP provided modem/router combo. The cable modem/router combo is set to "bridge mode", so pfSense gets the public IP via DHCP.
Are there any more things I could look for?
Regards,
Mo -
It's hard to imagine anything in pfSense causing this; it looks like some upstream issue to me.
Have you tried using a VPN and connecting over that? Does that also fail?
Steve