Downstream drops when upstream is saturated
-
I have a new install of pfsense on a laptop.
With no significant network activity, a speed test will yield approx 200M down / 8M up. I am running backup-to-cloud software which uploads. It seems to break up a file into 10 MB chunks and uploads up to 100 streams at a time, saturating the upload.
When this is happening, if I repeat the speed test, my downstream will drop to approx 10M. Sometimes it drops all the way down to 1-2M.
I thought this was a bufferbloat issue, so I added a limiter. The Limiter and Queue have the Queue Management Algorithm set to Tail Drop, and the Limiter has the Scheduler set to FQ_CODEL. Down has bandwidth set to 160M (I'm not sure what to set since the downstream varies so much); Up is set to 6M. That change got me up to an A+ on a bufferbloat test. It has helped the latency issues, but not the downstream issue.
Here is the limiter info:
Limiters:
00001: 160.000 Mbit/s    0 ms burst 0
q131073  1000 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
 sched 65537 type FIFO flags 0x0 0 buckets 0 active
00002:   6.000 Mbit/s    0 ms burst 0
q131074  1000 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
 sched 65538 type FIFO flags 0x0 0 buckets 0 active
Schedulers:
00001: 160.000 Mbit/s    0 ms burst 0
q65537  50 sl. 0 flows (1 buckets) sched 1 weight 0 lmax 0 pri 0 droptail
 sched 1 type FQ_CODEL flags 0x0 0 buckets 1 active
 FQ_CODEL target 5ms interval 100ms quantum 1514 limit 10240 flows 1024 NoECN
   Children flowsets: 1
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 ip           0.0.0.0/0             0.0.0.0/0       12     2000  0    0   0
00002:   6.000 Mbit/s    0 ms burst 0
q65538  50 sl. 0 flows (1 buckets) sched 2 weight 0 lmax 0 pri 0 droptail
 sched 2 type FQ_CODEL flags 0x0 0 buckets 1 active
 FQ_CODEL target 5ms interval 100ms quantum 1514 limit 10240 flows 1024 NoECN
   Children flowsets: 2
  0 ip           0.0.0.0/0             0.0.0.0/0   762959 890964498 221 293551 150113
Queues:
q00001  50 sl. 0 flows (1 buckets) sched 1 weight 0 lmax 0 pri 0 droptail
q00002  50 sl. 0 flows (1 buckets) sched 2 weight 0 lmax 0 pri 0 droptail
-
If your upstream is completely filled then the ACKs required for the downstream cannot pass, and the added latency will impact the throughput.
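As a rough sketch of the numbers involved (assuming typical ~1500-byte data segments and one ~40-byte ACK per two segments via delayed ACK; these are generic assumptions, not measurements from this link):

```python
def ack_bandwidth_mbit(down_mbit, mss_bytes=1500, ack_bytes=40, segs_per_ack=2):
    """Upstream Mbit/s consumed by ACKs for a given downstream rate."""
    segments_per_sec = down_mbit * 1e6 / 8 / mss_bytes
    acks_per_sec = segments_per_sec / segs_per_ack
    return acks_per_sec * ack_bytes * 8 / 1e6

# A 200 Mbit/s download needs roughly 2.7 Mbit/s of clean upstream
# just for ACKs -- a big slice of an 8 Mbit/s uplink.
print(round(ack_bandwidth_mbit(200), 2))
```

So on an asymmetric 200/8 link, a saturated upload leaves little headroom for the ACK stream the download depends on.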
FQ_CoDel is usually quite good at mitigating that, so perhaps you have not applied it to the traffic correctly?
Otherwise you could use ALTQ shaping to prioritise ACK traffic. Or set a limiter on the backup upload traffic so it does not saturate the WAN.
Steve
-
@stephenw10 Yes, I agree, that's why I added the limiter. I guess it's not configured correctly; I followed an online guide. I'm a bit confused as to why both a down AND an up limiter are needed. I tried turning off the downstream limiter (leaving the upstream one on) and it seemed to go back to normal, so I guess the limiter isn't configured correctly.
-
The issue I am having is with the limiter bandwidth setting.
If I do a speed test, then set the limiter just under those numbers, I get no bufferbloat, but the limiter caps the bandwidth so I'm not getting full speed. And when the upstream is saturated, the downstream is still something like 10X lower.
If I setup the bandwidth to a higher number, I get full speed but the bufferbloat latency goes up.
My internet speeds change randomly. My downstream can vary from 150M to 250M, and my upstream can vary from 3M to 8M. I guess that's the nature of cable internet, but I'm not sure how to set up the limiter.
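One common rule of thumb (an assumption on my part, not something pfSense mandates) is to set the limiter a bit below the worst speed you reliably measure, so the queue forms in the limiter rather than in the modem even on a bad day:

```python
def limiter_setting_mbit(measured_mbits, fraction=0.9):
    """Pick a limiter bandwidth from several speed-test samples:
    ~90% of the lowest rate observed."""
    return round(min(measured_mbits) * fraction, 1)

print(limiter_setting_mbit([150, 200, 250]))  # downstream samples -> 135.0
print(limiter_setting_mbit([3, 5, 8]))        # upstream samples   -> 2.7
```

The trade-off is exactly the one described above: the more the rate varies, the more top-end speed you give up to keep latency controlled.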
-
@eng3 wouldn't it just be easier to limit your backup upload? Is there no way in the backup software to limit its upload so it never saturates your upload vs trying to shape it or limit it at the router?
I have no idea how much you have to back up, etc. But can you not just limit it to run after hours? Or at a slow enough rate to make sure it doesn't eat up your upload, be it 3 or 8? Can you not set it to, say, a limit of 1, and then between midnight and 6am let it use full speed?
This is why pretty much every company does backups after hours. I would never schedule a backup during hours I want to use the connection. If needed, it would be limited at the backup software to use minimal speed, etc., during working/play hours.
-
@johnpoz No, not that I am aware of. This is an initial backup which will take many weeks or months. However, there may also be other times where I want to do a large upload, perhaps for different reasons, with software that may have no way to limit it either. I'd rather have a solution that works for everything vs having to deal with bufferbloat every time this happens.
This isn't really related to the base question.
- Is the download limiter necessary in this situation?
- I thought bufferbloat affected latency. Why would the downstream bandwidth drop 10X when the upstream is saturated (and only when limiting)? Without limiting, speed is fine; only latency suffers.
-
@eng3 said in Downstream drops when upstream is saturated:
I am aware of
And what software are you using? I have never seen any software that backs up to the cloud that doesn't have a way to control the bandwidth used.
What is this, just some small mom and pop shop with some SOHO router and nothing for rate limiting or QoS? They would just be dead for the week it took to do an initial backup?
Bufferbloat is normally an issue for latency-sensitive things: VoIP, gaming, video conferencing, etc. It's a problem on the upside for DNS, or for trying to get to new sites, if your upload is just saturated. It's more noticeable on asymmetrical connections, 100/5 sorts of connections, etc., like yours where the ratio is way off.
If you had a bunch of stuff going on just filling your pipe, sure, going the shaping/limiting route would be the direction you're going. But it seems from what you have stated that you have this one thing killing your pipe when you want to use it. It's simpler to just limit that one thing. Most backup-to-the-cloud software should have a way to limit that upload use, and even be able to schedule it so it can use more off hours than during use hours.
-
Even if the software won't do it you could just use a Limiter pipe in pfSense for that host.
-
@stephenw10 As I said, this isn't the only software that may saturate the upstream. It also may not always be from the same device. Can we just focus on my actual question? Why would the downstream bandwidth drop 10X when the upstream is saturated while latency stays OK?
-
Well I imagine it's because it's dropping TCP ACK packets in the upload queue causing the TCP window to close down to something very small.
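For a rough sense of scale, TCP throughput is bounded by window size divided by round-trip time; the window and RTT values below are illustrative assumptions, not measurements from this connection:

```python
def max_throughput_mbit(window_bytes, rtt_ms):
    """Upper bound on TCP throughput: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6

# Healthy: a 256 KB window at 20 ms RTT
print(max_throughput_mbit(256 * 1024, 20))   # ~104.9 Mbit/s
# ACK-starved: window collapsed to ~4 segments at the same RTT
print(max_throughput_mbit(4 * 1500, 20))     # ~2.4 Mbit/s
```

A window squeezed down to a few segments by lost ACKs would produce exactly the kind of 10X-or-worse downstream collapse described here.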
Can we see how you have those Limiters applied?
In the limiter info above you have 12 packets total shown on the download pipe, which seems wrong.
Steve
-
@stephenw10 Shouldn't it be dropping more without the limiter? Without the limiter, speeds are high. I just followed the manual exactly: https://docs.netgate.com/pfsense/en/latest/recipes/codel-limiters.html
Limiter and Queue has Queue Management Algorithm set to "Tail Drop"
Limiter has Scheduler set to "FQ_CODEL"
Everything else was left at default. Then I played around with the bandwidth settings.
I assume the 12 packets on the down pipe were because I wasn't downloading anything at the time.
If I do a speed test, then it goes up a little. Of course, with the downstream being so limited, there isn't much traffic. On the speed test, I notice "single stream" will only go at around 2M whereas "multi stream" will get up to 50M. Without the limiter it's like 200M.
Limiters:
00001: 160.000 Mbit/s    0 ms burst 0
q131073  1000 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
 sched 65537 type FIFO flags 0x0 0 buckets 0 active
00002:   6.000 Mbit/s    0 ms burst 0
q131074  1000 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
 sched 65538 type FIFO flags 0x0 0 buckets 0 active
Schedulers:
00001: 160.000 Mbit/s    0 ms burst 0
q65537  50 sl. 0 flows (1 buckets) sched 1 weight 0 lmax 0 pri 0 droptail
 sched 1 type FQ_CODEL flags 0x0 0 buckets 1 active
 FQ_CODEL target 5ms interval 100ms quantum 1514 limit 10240 flows 1024 ECN
   Children flowsets: 1
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 ip           0.0.0.0/0             0.0.0.0/0      251   330328 46 69000   0
00002:   6.000 Mbit/s    0 ms burst 0
q65538  50 sl. 0 flows (1 buckets) sched 2 weight 0 lmax 0 pri 0 droptail
 sched 2 type FQ_CODEL flags 0x0 0 buckets 1 active
 FQ_CODEL target 5ms interval 100ms quantum 1514 limit 10240 flows 1024 ECN
   Children flowsets: 2
  0 ip           0.0.0.0/0             0.0.0.0/0   474062 593056906 265 298902 66543
Queues:
q00001  50 sl. 0 flows (1 buckets) sched 1 weight 0 lmax 0 pri 0 droptail
q00002  50 sl. 0 flows (1 buckets) sched 2 weight 0 lmax 0 pri 0 droptail
-
Hmm, the totals on the queues there are ~600MB upload and ~300KB download.
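Those totals come straight from the byte counters in the flowset rows of the output above (flowset 2 is the 6M pipe, flowset 1 the 160M pipe):

```python
# Byte counters copied from the posted scheduler output
up_bytes = 593_056_906    # flowset 2 (6 Mbit/s pipe)
down_bytes = 330_328      # flowset 1 (160 Mbit/s pipe)

print(round(up_bytes / 1e6, 1), "MB through the upload pipe")     # ~593.1 MB
print(round(down_bytes / 1e3, 1), "KB through the download pipe") # ~330.3 KB
```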
So either the download traffic is not correctly using the limiter, or your firewall rule is applying the limiters reversed, which might explain the very low speeds.
Did you note the warning that the queues are reversed for an outbound floating rule?
-
@stephenw10 I have IN set to the Up queue and OUT set to the Down queue.
The Up limiter is set to 6M and the Down limiter is set to 160M. The issue only occurs with the limiters active AND the upstream saturated.
With no traffic and speed test running during download phase:
Limiters:
00001: 160.000 Mbit/s    0 ms burst 0
q131073  1000 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
 sched 65537 type FIFO flags 0x0 0 buckets 0 active
00002:   6.000 Mbit/s    0 ms burst 0
q131074  1000 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
 sched 65538 type FIFO flags 0x0 0 buckets 0 active
Schedulers:
00001: 160.000 Mbit/s    0 ms burst 0
q00001  50 sl. 0 flows (1 buckets) sched 1 weight 0 lmax 0 pri 0 droptail
 sched 1 type FQ_CODEL flags 0x0 0 buckets 1 active
 FQ_CODEL target 5ms interval 100ms quantum 1514 limit 10240 flows 1024 ECN
   Children flowsets: 1
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 ip           0.0.0.0/0             0.0.0.0/0   201056 301086289 114 171000 1325
00002:   6.000 Mbit/s    0 ms burst 0
q65538  50 sl. 0 flows (1 buckets) sched 2 weight 0 lmax 0 pri 0 droptail
 sched 2 type FQ_CODEL flags 0x0 0 buckets 1 active
 FQ_CODEL target 5ms interval 100ms quantum 1514 limit 10240 flows 1024 ECN
   Children flowsets: 2
  0 ip           0.0.0.0/0             0.0.0.0/0        3      144  0    0   0
Queues:
q00001  50 sl. 0 flows (1 buckets) sched 1 weight 0 lmax 0 pri 0 droptail
q00002  50 sl. 0 flows (1 buckets) sched 2 weight 0 lmax 0 pri 0 droptail
Now with upstream saturated, speed test during download phase:
Limiters:
00001: 160.000 Mbit/s    0 ms burst 0
q131073  1000 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
 sched 65537 type FIFO flags 0x0 0 buckets 0 active
00002:   6.000 Mbit/s    0 ms burst 0
q131074  1000 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
 sched 65538 type FIFO flags 0x0 0 buckets 0 active
Schedulers:
00001: 160.000 Mbit/s    0 ms burst 0
q00001  50 sl. 0 flows (1 buckets) sched 1 weight 0 lmax 0 pri 0 droptail
 sched 1 type FQ_CODEL flags 0x0 0 buckets 1 active
 FQ_CODEL target 5ms interval 100ms quantum 1514 limit 10240 flows 1024 ECN
   Children flowsets: 1
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
  0 ip           0.0.0.0/0             0.0.0.0/0       76    82028  0    0   0
00002:   6.000 Mbit/s    0 ms burst 0
q65538  50 sl. 0 flows (1 buckets) sched 2 weight 0 lmax 0 pri 0 droptail
 sched 2 type FQ_CODEL flags 0x0 0 buckets 1 active
 FQ_CODEL target 5ms interval 100ms quantum 1514 limit 10240 flows 1024 ECN
   Children flowsets: 2
  0 ip           0.0.0.0/0             0.0.0.0/0    91307 121523662 170 216086 15670
Queues:
q00001  50 sl. 0 flows (1 buckets) sched 1 weight 0 lmax 0 pri 0 droptail
q00002  50 sl. 0 flows (1 buckets) sched 2 weight 0 lmax 0 pri 0 droptail
-
That's what I would expect. As a test try setting the download limit to, say, 100Mbps and make sure it's actually catching that.
-
@stephenw10 What do you mean by "catching it"? What do I look for? Also, I just edited my post.
-
I mean if the downstream limiter is set to 100Mbps and the rule it's applied to is correctly matching all outbound connections, you should not be able to see more than 100Mbps in a speed test.
-
@stephenw10 Yes I was able to verify this
-
@stephenw10 The other thing I noticed is that with the limiter active, if I go on another computer and try to ping a random site (google.com), every 10th-20th ping will just time out. I tried applying the limiter only to the IP of the computer currently uploading (saturating) and now every ping works.
Overall performance seems a little better. However, certain sites (e.g. united.com, tripadvisor.com, capitalone.com; maybe sites with a lot of dynamically loading content) are very slow to load, and some parts just fail, causing me to have to click reload.
-
Just all the time or only when the upload is saturated still?
Are you still using FQ_CoDel with the limiter only applied to the uploading host?
I would start with something basic here. Just set a limiter with default config to only the uploading host so it cannot saturate the upload bandwidth and go from there.
Steve
-
@stephenw10 Only when saturated. Yes, for now; I figured I'd give it some time to see if the issue persists or was a coincidence.
OK. If this works, it's a decent temporary fix, but in the future I may have multiple IPs that could saturate the upstream.