gateways flapping due to delay / highdelay
-
Netgate 3100 23.09.1-RELEASE (arm)
I receive dozens of these notifications every weekend:
Notifications in this message: 3
================================
03:05:57 MONITOR: GW_1 is available now, adding to routing group SecondaryFailover
8.8.8.8|XXX.XXX.XXX.XXX|GW_1|695.716ms|474.411ms|5%|online|delay
03:05:58 MONITOR: GW_1 has high latency, omitting from routing group SecondaryFailover
8.8.8.8|XXX.XXX.XXX.XXX|GW_1|701.145ms|479.108ms|5%|down|highdelay
03:06:01 MONITOR: GW_1 is available now, adding to routing group SecondaryFailover
8.8.8.8|XXX.XXX.XXX.XXX|GW_1|695.249ms|487.598ms|5%|online|delay
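For anyone who wants to tabulate these notifications, here is a minimal Python sketch that splits one pipe-delimited status line into typed fields. The field names (monitor IP, source IP, gateway name, RTT, RTT standard deviation, loss, status, substatus) are an assumption based on how the values read, not something stated in the thread, and `parse_status` is a hypothetical helper. It needs Python 3.9+ for `str.removesuffix`.

```python
from dataclasses import dataclass


@dataclass
class GatewayStatus:
    # Field names are assumed from the shape of the log line.
    monitor_ip: str
    source_ip: str
    name: str
    rtt_ms: float
    rtt_stddev_ms: float
    loss_pct: float
    status: str
    substatus: str


def parse_status(line: str) -> GatewayStatus:
    """Split one pipe-delimited status line into typed fields."""
    fields = line.strip().split("|")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    monitor_ip, source_ip, name, rtt, rtt_sd, loss, status, substatus = fields
    return GatewayStatus(
        monitor_ip,
        source_ip,
        name,
        float(rtt.removesuffix("ms")),
        float(rtt_sd.removesuffix("ms")),
        float(loss.removesuffix("%")),
        status,
        substatus,
    )


# The three status lines from the notification above.
lines = [
    "8.8.8.8|XXX.XXX.XXX.XXX|GW_1|695.716ms|474.411ms|5%|online|delay",
    "8.8.8.8|XXX.XXX.XXX.XXX|GW_1|701.145ms|479.108ms|5%|down|highdelay",
    "8.8.8.8|XXX.XXX.XXX.XXX|GW_1|695.249ms|487.598ms|5%|online|delay",
]
statuses = [parse_status(line) for line in lines]
for s in statuses:
    print(f"{s.name}: rtt={s.rtt_ms} ms, status={s.status}/{s.substatus}")
```

Read this way, the RTT moves only between about 695 ms and 701 ms while the status flips, which would be consistent with the latency hovering right at a high-latency threshold near 700 ms. The actual threshold values are not shown in the thread, so treat that as a guess.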
It happens when a download / upload script runs on a client behind my firewall.
The script executes:
rsync --bwlimit=256000 -av --partial --partial-dir=rsync.tmp rsync://YYY.YYY.YYY.YYY
aws s3 cp --profile myAWSprofile myfile.gz s3://myAWSbucket/myfile.gz
Both the rsync download and myfile.gz are about 9 GB in size, so nothing extreme.
The network seems to be performing fine during the run.
I've already relaxed most of the monitoring parameters:
Any suggestions on how to improve the above and safely handle this weekly flapping?
-
@adamw Can you use traffic shaping to deprioritize the backup traffic?
What is your bandwidth in relation to your bwlimit? IIRC bwlimit was a bit odd to me…bytes instead of bits…?
-
We are on a 150 Mb/s symmetric fibre line.
Before introducing bwlimit, the script was causing us network congestion.
First I tried --bwlimit=12800
This was expected to limit transfers to 100 Mb/s, leaving a comfortable 50 Mb/s for all other traffic.
We were still seeing alerts, and execution time increased from about 7 hours to a whopping 37 hrs.
Once we went with --bwlimit=256000, execution stabilised at 9 hrs.
We didn't see any alerts or practical problems for months, until this weekend.
We have no other traffic shaping in place.
I'm a bit scared to experiment as this script ("aws s3 cp" specifically) is capable of crashing the firewall.
-
@adamw Now that I’m by a PC, bwlimit is Kbytes per second. Somewhere I also recall that rsync doesn’t necessarily limit at a constant speed:
“Rsync writes data over the socket in blocks, and this option both limits the size of the blocks that rsync writes, and tries to keep the average transfer rate at the requested limit. Some “burstiness” may be seen where rsync writes out a block of data and then sleeps to bring the average rate into compliance.”
https://www.cyberciti.biz/faq/how-to-set-keep-rsync-from-using-all-your-bandwidth-on-linux-unix/
That page also has other possible solutions.
Or as I mentioned, traffic shaping to make this low priority traffic.
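To put numbers on the unit question, here is a rough Python sketch that converts the two --bwlimit values from this thread into Mb/s. It assumes rsync's documented default of 1024-byte units when no suffix is given; the function name `bwlimit_to_mbps` is made up for illustration, and the 150 Mb/s line speed is taken from the post above.

```python
KIB = 1024  # assumed unit for an unsuffixed --bwlimit value (1024 bytes)
LINE_MBPS = 150  # symmetric line speed quoted in the thread


def bwlimit_to_mbps(bwlimit_kib: int) -> float:
    """Convert an rsync --bwlimit value (KiB/s) to megabits per second."""
    return bwlimit_kib * KIB * 8 / 1_000_000


for value in (12800, 256000):
    mbps = bwlimit_to_mbps(value)
    if mbps < LINE_MBPS:
        note = "below line rate"
    else:
        note = "above line rate, so not an effective cap"
    print(f"--bwlimit={value}: {mbps:.1f} Mb/s ({note})")
```

If that reading of the units is right, 12800 is about 105 Mb/s as intended, but 256000 is roughly 2 Gb/s, far above the 150 Mb/s line, so rsync would effectively be running uncapped. That may be why the runtime dropped back to 9 hrs, and it would leave the link free to saturate during the run.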