TCP Retransmission flooding LAN network

brookheather

I have another simple example of this TCP Retransmission flooding. I was watching Netflix on an LG TV that is wired to my switch. At 23:38 I turned off the TV, and ten minutes later I noticed the internet wasn't working for a minute. Looking at the Wireshark log for that time period, I can see a data packet arriving at 23:47 (nearly ten minutes later) for the TV's IP address 192.168.1.126. It immediately causes thousands of TCP Retransmission packets to be sent from the pfSense router, and these all have the same sequence and acknowledgement numbers as the original data packet.
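One way to put a number on a flood like this is to export the capture from Wireshark and count how many segments repeat the same flow with identical sequence and acknowledgement numbers. A minimal sketch, assuming each packet has already been parsed into a hypothetical `Segment` record; the server address, ports and the threshold of 100 are made-up illustration values, not taken from the capture above:

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical record for one TCP segment exported from a capture."""
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    ack: int


def find_floods(segments, threshold=100):
    """Return {(flow, seq, ack): count} for keys seen at least `threshold` times."""
    counts = Counter(
        (s.src, s.sport, s.dst, s.dport, s.seq, s.ack) for s in segments
    )
    return {key: n for key, n in counts.items() if n >= threshold}


# Made-up example: one ordinary retransmission vs. a burst of 3000 identical segments.
server, tv = "203.0.113.5", "192.168.1.126"
normal = [Segment(server, tv, 443, 50000, 1000, 2000)] * 2
flood = [Segment(server, tv, 443, 50000, 5000, 6000)] * 3000
print(find_floods(normal + flood))
# {('203.0.113.5', 443, '192.168.1.126', 50000, 5000, 6000): 3000}
```

The ordinary duplicate stays under the threshold, so only the burst is reported.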

This only happens occasionally. When I've tried to reproduce it by watching Netflix and then turning off the TV, I can see the data packet arrive 10 minutes later, but the router usually sends just a single TCP Retransmission packet, not a flood. So I think there is some bug in the router's network implementation?
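For comparison with the single retransmission seen in the normal case: a sender following the standard TCP retransmission timer (RFC 6298) doubles its timeout after each unanswered retransmission, so a stalled connection normally produces a handful of retransmissions spread over seconds to minutes, not thousands in a burst. A simplified sketch, assuming a 1 second initial RTO (the RFC's recommended initial value) and a 60 second cap (the RFC allows a maximum of at least 60 seconds); real stacks use their own minimums and retry limits:

```python
def retransmission_times(initial_rto=1.0, max_rto=60.0, retries=8):
    """Seconds after the original send at which each retransmission fires."""
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(retries):
        elapsed += rto                # wait one RTO, then retransmit
        times.append(elapsed)
        rto = min(rto * 2, max_rto)   # exponential backoff, capped at max_rto
    return times


print(retransmission_times())
# [1.0, 3.0, 7.0, 15.0, 31.0, 63.0, 123.0, 183.0]
```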

As a workaround I have reduced the ARP table timeout from 20 minutes to 5 minutes, so disconnected devices are now removed from the router's ARP table before this data packet arrives. So far this seems to have stopped the flooding, as the data packet can't be sent to the IP address of the TV. I added a system tunable for net.link.ether.inet.max_age with a value of 300 seconds.
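The timing behind this workaround can be checked with a toy model of ARP cache aging. This is a simplified sketch, not how the FreeBSD ARP code actually ages entries: it assumes, for simplicity, that the TV's entry was last confirmed right when the TV was switched off and is never refreshed afterwards. The 1200 and 300 second values correspond to the 20 minute default and the 300 second tunable mentioned above:

```python
def arp_entry_valid(last_confirmed, now, max_age):
    """True if a cached ARP entry would still be used at `now` (all times in seconds).

    Simplified model: the entry stays valid for `max_age` seconds after it was
    last confirmed and is not refreshed once the device goes silent.
    """
    return now - last_confirmed < max_age


tv_off = 0              # TV switched off (23:38 in the post), used as time zero
late_packet = 9 * 60    # data packet arrives about 9 minutes later (23:47)

for max_age in (1200, 300):  # 20 minute default vs. the 300 second tunable
    if arp_entry_valid(tv_off, late_packet, max_age):
        print(max_age, "entry still cached: packet is sent to the TV's old MAC")
    else:
        print(max_age, "entry expired: router must ARP again and gets no reply")
```

With the 20 minute timeout the stale entry is still there when the late packet arrives; with 5 minutes it has already expired.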

DaddyGo

@brookheather

I'm glad you were able to move forward with this issue...

If I remember correctly, for a long time there were serious problems with smart TVs from big manufacturers (LG, Samsung, etc.).
(It was a disaster until LG webOS 2, and I don't think newer versions are much better either.)

What I remember is that it was not possible to turn off the WiFi from the TV's menu, and all kinds of unmanageable packets raced across the network, even when the TV was in standby mode.

Since I don't need such smart TV features, I solved this by connecting the TV via Ethernet to an empty, separate switch port. This turns off the WiFi, and the switch port doesn't lead anywhere, but it keeps the Ethernet link up.
So I tricked the TV with this switch trick.

I should also mention that a segmented network is the secure option.

I connect and distribute such streaming video channels from a separate VM environment onto the TVs' coaxial network using a DVB-T/T2 modulator.

PS:

Your ASUS devices are SOHO routers that also work in AP mode.
A better and more configurable choice would be, for example, UBNT WiFi APs.
APs are also capable of generating strange traffic, so they need to be separated too.

https://forum.netgate.com/topic/128481/best-wireless-ap

johnpoz

@brookheather said in TCP Retransmission flooding LAN network:

TCP Retransmission packet not a flood so I think there is some bug in the router network implementation?

pfSense doesn't duplicate packets or send them on its own. If you're seeing a retransmission, it came from the client that sent it; pfSense just passes it on.

If you're seeing a flood of traffic towards your TV, who is 192.168.1.10 in this sniff? pfSense wouldn't even see traffic between 2 devices on the LAN. Is pfSense 192.168.1.10? Do you have some sort of bridge setup?

Are you doing some sort of NAT reflection with source NATting?

brookheather

@johnpoz The 192.168.1.10 IP is my Windows 10 server running the Wireshark logging plus Plex and other media services. The pfSense router is on 192.168.1.254. There is no bridge setup, and I'm not sure what "nat reflection with source natting" means. The setup is pretty vanilla: an FTTP ONT connected to a pfSense router, which is connected to a Netgear managed gigabit switch with three Asus wireless access points attached, each of which has multiple other wired devices attached.

Are you saying the flood of TCP Retransmission packets is actually coming from the external IP address? I have an external ping running every second to my WAN port, and it shows no sign of increased internet traffic when the flooding is happening; the latency stays low. If a flood of packets were saturating my download, I would expect to see an uptick in the ping latency.

ts_itops

Hi, did you guys find a solution for this? We have the same problem with Aruba APs: randomly, about 10 minutes after some clients disconnect, the network gets flooded with TCP retransmissions. We already have separate VLANs. The network architecture is Juniper.

brookheather

@ts_itops Reducing the ARP table timeout from 20 minutes to 5 minutes fixed it for me - I have since changed my setup to Unifi switches and wireless access points but haven't bothered to retest with the default ARP timeout. Have you tried a 5 minute ARP timeout?

ts_itops

@brookheather It's already at 5 minutes in our configuration, but we will try to reduce it further. This is a huge problem here at the moment, and we've been searching for a solution for days.

johnpoz

@ts_itops If you have something on your network that really wants to talk to 192.168.1.100, for example, and its MAC changes to something else, then yeah, that could cause a lot of traffic to the wrong MAC.

But normally IP and MAC combos don't change very often. The only thing lowering the ARP cache timeout from 20 to 5 minutes would do is point IP X to whatever the new MAC is a bit faster.

But normally when a device gets a new IP, it sends a gratuitous ARP, which should update the cache saying "hey, 192.168.1.100 is at MAC xyz".
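To illustrate what such an announcement contains, here is a minimal sketch that builds the raw bytes of a gratuitous ARP request with Python's standard library: a broadcast ARP packet whose sender and target IP are the same address, so every host (and the router) that hears it can update its cache. The MAC address below is a made-up stand-in for "xyz"; the frame is left unpadded (the NIC pads to the 60 byte minimum), and actually sending it would need a raw socket and root privileges, which this sketch doesn't attempt:

```python
import ipaddress
import struct


def gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip` at `mac`."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = ipaddress.IPv4Address(ip).packed
    broadcast = b"\xff" * 6
    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP)
    eth = broadcast + mac_bytes + struct.pack("!H", 0x0806)
    # ARP header: htype 1 (Ethernet), ptype 0x0800 (IPv4), hlen 6, plen 4, oper 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (zeros) and target IP (same as sender IP)
    arp += mac_bytes + ip_bytes + b"\x00" * 6 + ip_bytes
    return eth + arp


frame = gratuitous_arp("00:11:22:33:44:55", "192.168.1.100")
print(len(frame), frame[12:14].hex())
# 42 0806
```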

If the device went away, then a short ARP cache timeout would prevent traffic from being sent, because there would be no MAC for it. But if some device on your network is sending the traffic, the ARP cache timeout on pfSense would have nothing to do with that. Devices like Windows normally have an ARP cache timeout of only about 30 seconds.

ts_itops

@johnpoz Yeah, this makes sense. But I cannot explain how the device sends packets after it has left the network. For example, I left our campus in the evening and drove home; the next day I saw on our Wireshark tracker that my phone had sent the TCP retransmission storm 15 minutes after I left the building. It lasted about 5 minutes and then stopped, and all CPUs on the switches went to 100% for the duration of the storm. But it doesn't happen with every device, and not even with mine every time.

johnpoz

@ts_itops So you take your phone from network X to network Y, and then on Y you see a storm of retransmissions still trying to talk to IP 192.168.1.100, even when you're now on, say, 192.168.2.0/24?

Or are you seeing this traffic on network X, even though your phone is no longer on the X network?

And does this traffic come from pfSense, or go through pfSense? If it comes through or from pfSense, then yeah, the ARP cache on pfSense would still think the phone's IP with MAC xyz is there, and pfSense could keep sending traffic even though the phone is no longer on the network. In that case, lowering the ARP cache time on pfSense would shorten the window in which such traffic could be sent.