Intermittent website timeout
-
@johnpoz I thought it seemed like an ISP thing, but it seems weird that it would affect both my DSL and cable connections. I marked the cable gateway as down to force everything over the DSL, and the problem didn't go away any faster than just waiting for it to clear up on its own.
-
1472 is normal. And yeah, if you set do-not-fragment, 1473 would fail on a normal 1500 MTU connection. You have the overhead: 20 bytes of IP header plus 8 bytes of ICMP header.
Let's go over this again: if you just sniff on the client and you see retransmissions, you don't know where the packet got lost. pfSense maybe never got it. You have to sniff upstream of your client to find out why it's retransmitting. It only retransmits because it didn't get an answer.
-
What are you pinging there? If it's an intermittent PMTU problem it could be anywhere in the path between you and the web server.
Look at the pcap you referenced before. Are the retransmissions only when the packets get large?
1472 ICMP payload passing and 1473 being rejected as too large is exactly what one would expect with a normal ethernet MTU of 1500.
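To make the arithmetic concrete, here is a minimal sketch of the calculation. It assumes a plain IPv4 header with no options (20 bytes) and an ICMP echo header (8 bytes); the function names are made up for illustration and are not part of any tool.

```python
# Assumed header sizes: IPv4 without options, and the ICMP echo header.
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def max_icmp_payload(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def fits_without_fragmenting(payload: int, mtu: int = 1500) -> bool:
    """True if a ping with this payload size fits within the given MTU."""
    return payload <= max_icmp_payload(mtu)


if __name__ == "__main__":
    print(max_icmp_payload(1500))
    print(fits_without_fragmenting(1472))
    print(fits_without_fragmenting(1473))
```

If the largest payload that passes with do-not-fragment set is noticeably smaller than 1472, that would suggest a smaller MTU somewhere along the path.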
-
I'll sniff on the router next time it happens to see what I can find and try some more MTU tests too.
-
You could look at the capture you already took if you still have it.
-
Thanks for the help @Derelict and @johnpoz.
I have a packet capture from the router during an episode and I spent some time looking over it. Some traffic gets through, but there are lots of retransmissions, resets, "ACKed unseen segments", etc. The errors show up for things going out to the internet, but also for some traffic between VLANs.
For the local traffic, DNS has almost no problems, but things like NFS traffic has lots of errors.
Packet size doesn't seem to make a difference.
-
So you're using NFS over TCP?
Starting a sniff in the middle of any traffic is going to show stuff like unseen segments for ACKs, because the sniff missed the first part of the conversation.
If you see loads of retransmissions in your conversations, even between your own local networks, you might want to dig into this.
You will want to catch the full conversation. Start with, say, sniffing on the client and the server at the same time, and do an NFS transfer. In this transfer, are you seeing lots of retransmissions? Did the server not ACK in a timely manner? Did the server send an ACK for that sequence number, but the client never saw it, or saw it delayed?
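As a rough illustration of comparing the two captures, the sketch below takes the TCP sequence numbers of client-to-server data segments as seen at each capture point and guesses where the loss happened. It assumes the sequence numbers have already been exported from both captures into plain lists. The function name and sample numbers are hypothetical, and it ignores sequence wraparound and partially overlapping segments.

```python
from collections import Counter


def classify_retransmissions(client_sent, server_seen):
    """Guess where loss occurred for each retransmitted segment.

    client_sent: sequence numbers of data segments captured on the client.
    server_seen: sequence numbers of the same flow captured on the server.
    """
    sent_counts = Counter(client_sent)
    seen_counts = Counter(server_seen)
    report = {}
    for seq, times_sent in sent_counts.items():
        if times_sent < 2:
            continue  # sent only once, so not a retransmission
        arrived = seen_counts.get(seq, 0)
        if arrived == 0:
            report[seq] = "never reached the server"
        elif arrived < times_sent:
            report[seq] = "some copies lost on the way to the server"
        else:
            report[seq] = "all copies arrived; ACK likely lost or delayed"
    return report


if __name__ == "__main__":
    # Hypothetical sequence numbers, not taken from a real capture.
    client = [1, 1461, 1461, 2921, 2921, 4381, 4381]
    server = [1, 1461, 2921, 2921]
    for seq, verdict in sorted(classify_retransmissions(client, server).items()):
        print(seq, verdict)
```

A segment that the client retransmitted but the server received every time points at the return path (the ACKs), while one the server never saw points at the forward path.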
-
Well, I ran out of time trying to figure this one out and just replaced the 2558 board with a 3558 one. We've been running for several hours now with no problems. Previously we would have had several.
I'm hoping it was just something funky with the old motherboard or its NICs.
-
Follow up post
Replacing the router didn't solve the problem. It ran smoothly for several hours, then we started having lots of dropped TCP packets again. I realized that marking the cable gateway as down to force traffic over our DSL line didn't actually change the traffic flow. I think that's because I used failover rules in gateway groups instead of sending the traffic directly to the interface (I should look into this more). This led me to think that the traffic problem was happening with both IPs. It turns out the only problem was with the cable connection. The ISP came out yesterday and replaced the modem. Now it seems fixed (12+ hours), but we'll see what happens.
For reference, here is a clean packet capture from the router showing the problem. The TCP connection would handshake properly, but after the Client Hello and its ACK, only a handful of packets make it through, not nearly enough to establish a connection.
-
@dansherman said in Intermittent website timeout:
The ISP came out yesterday and replaced the modem
That would have ZERO to do with problems on your own local network.
For the local traffic, DNS has almost no problems, but things like NFS traffic has lots of errors.
-
That would have ZERO to do with problems on your own local network.
Yes. The main problem was the internet access timing out; the internal problem only surfaced when I was looking into the packet dumps. There still might be an issue there, but I think it's more likely that I wasn't looking at a full conversation.
We've had zero issues with the NFS uses, so I'm chalking it up to my lack of experience with reading packet captures.
Thanks for the help!