Playing with fq_codel in 2.4
-
lol :P Thanks, at least that gives me a step in a direction. I will try out what you suggested and post back; at the moment it is not possible because,
for some reason, from 19:00 - 21:00 the ping goes up to a locked 100ms every day. I'm still trying to get them to give an explanation of what is going on,
as it is not only my connection; it seems to be the whole LTE network from the provider. Maybe they are too congested, but I'm not really sure, as it
has been going on for a month now. As soon as I try the setup I will let you know what's happening from my side. Thanks again.
-
Ok, so I tried it as advised and it seems to be working, although my service is so bad at the moment that I will have to post back later with results.
Thanks guys
-
TheNarc, this quote has me thinking.
when dynamic pipes are used, each flow will get the same bandwidth as defined by the pipe, whereas when dynamic queues are used, each flow will share the parent's pipe bandwidth evenly with other flows generated by the same queue (note that other queues with different weights might be connected to the same pipe).
I will give a scenario.
One device on the network is running a Steam download as fast as it can, with 32 download threads.
The other one has a single YouTube video playing.
With a 0 IPv4 mask, both devices share a queue.
As I understand it, this would give the Steam device a 32/33 portion of the bandwidth. Each thread on the Steam device would get 1/33 of the queue's (and pipe's) bandwidth, since the queue would carry 33 flows in total.
With a /32 IPv4 mask, each device gets its own queue.
The queues are allocated half the bandwidth each, assuming equal priority?
So the Steam threads get a total of 16/32 of the pipe's bandwidth, or rather 1/2, and each thread effectively has (1/2)/32 = 1/64 of the pipe, whilst the YouTube video on the second device, if it's able to saturate the bandwidth, gets 1/2 the pipe to itself.
Do I understand correctly that queues share bandwidth in this manner? If yes, I prefer the per-host masking for sure.
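For what it's worth, stripped down to raw ipfw/dummynet commands, the per-host masking case looks roughly like the sketch below (the 50Mbit bandwidth, the rule number and the igb0 interface are placeholders, not anyone's actual config):

# one parent pipe shared by every host
ipfw pipe 1 config bw 50Mbit/s
# the /32 destination mask spawns one dynamic queue per LAN host
ipfw queue 1 config pipe 1 mask dst-ip 0xffffffff
# send inbound traffic through it
ipfw add 1000 queue 1 ip from any to any in via igb0

With equal weights, the Steam box and the YouTube box would then get equal shares of pipe 1 no matter how many flows each opens, which is the behaviour described above for the /32 mask.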
-
I can confirm that your understanding matches my understanding. What I can't confirm is that my understanding is correct :D It's obviously a pretty confusing topic. Because of course you can have more than one child queue per pipe as well, and any child queue may be dynamic or not. In my setup right now, for example, I have two pipes (upload and download) and two child queues for each pipe.
Consider only my download pipe. It has two child queues: one with a weight of 30 and one with a weight of 70 (for low and high priority traffic respectively). Both child queues are dynamic, with /32 masks on the destination address. Based on my understanding, this means that every host on my LAN should get its own queue.
Now, if that's true, suppose I have 5 hosts that are downloading and directed to my 30-weight "low priority" download queue based on firewall rules and 1 host that is downloading and directed to my 70-weight "high priority" download queue. Each of the 5 "low priority" hosts will get their own queue, but will each of those queues have a weight of 30? My expectation would be no; instead, the weight of the child queue should be equally distributed among however many dynamic queues are spawned from it. So in this case, there would be 5 dynamic queues each with a weight of 6 spawned from the "low priority" child queue of weight 30 and 1 dynamic queue of weight 70 spawned from the "high priority" child queue of weight 70.
Maybe that's not exactly how it works, but my hope is that it's at least conceptually accurate. Because it wouldn't make sense if dynamic queues each had the same weight as the child queue from which they were spawned. In the example above, I'd end up with 5 queues of weight 30 and 1 queue of weight 70. So collectively, my 5 low priority hosts would be getting a share of (150/220) or roughly 68% of the parent pipe, and my 1 high priority host would be getting roughly 32%. That situation would turn on its head my original intention of reserving 30% of a saturated pipe for low priority hosts and 70% for high priority hosts.
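If it helps to picture it, that setup boils down to something like the following in raw dummynet terms (the 100Mbit figure and the numbering are placeholders; whether the spawned queues split the parent weight or each inherit it is exactly the part I can't confirm):

# parent download pipe
ipfw pipe 1 config bw 100Mbit/s
# low-priority child queue, weight 30, one dynamic queue per destination host
ipfw queue 1 config pipe 1 weight 30 mask dst-ip 0xffffffff
# high-priority child queue, weight 70, also per destination host
ipfw queue 2 config pipe 1 weight 70 mask dst-ip 0xffffffff

Firewall rules then decide whether a given connection lands in queue 1 or queue 2.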
I don't know if these ruminations are helpful or simply add to the confusion . . . but at least it's fun trying to think it through ;) Still hoping for a true dummynet prodigy to poke his or her head into this thread.
-
I’ve noticed all of the fq_codel config examples have inbound and outbound queues.
It’s been my experience that shaping or prioritizing inbound WAN traffic, on a home/office router, is ineffective and degrades network performance.
Quick testing on DSLreports, I get much better results when I disable the inbound fw rules and only shape the outbound WAN traffic:
P4 2.4GHz, 2 cores w/HT
Spectrum 30 down / 15 up
No shaping - normal throughput & C+ buffer bloat
IN & OUT shaping - reduced download throughput (50%) & A+ buffer bloat
OUT only shaping - normal or slightly improved overall throughput & A+ buffer bloat
Has anyone seen download performance increase with the inbound shaper activated?
Btw, I’m new to the forum, thanks to everyone who’s posted, great discussion.
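For reference, an out-only setup like the one described reduces to something like this outside the GUI (a rough sketch only; the 14Mbit figure, the numbering and em0 are placeholders, and on pfSense you would normally do this with a limiter plus a floating rule rather than raw ipfw):

# upload pipe sized just under the provisioned 15Mbit upstream
ipfw pipe 1 config bw 14Mbit/s
# put fq_codel on that pipe's scheduler
ipfw sched 1 config pipe 1 type fq_codel
# only outbound WAN traffic goes through the shaper
ipfw add 1000 pipe 1 ip from any to any out xmit em0

No inbound pipe at all, which is the whole point of the comparison.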
-
I’ve noticed all of the fq_codel config examples have inbound and outbound queues.
It’s been my experience that shaping or prioritizing inbound WAN traffic, on a home/office router, is ineffective and degrades network performance.
Quick testing on DSLreports, I get much better results when I disable the inbound fw rules and only shape the outbound WAN traffic:
P4 2.4GHz, 2 cores w/HT
Spectrum 30 down / 15 up
No shaping - normal throughput & C+ buffer bloat
IN & OUT shaping - reduced download throughput (50%) & A+ buffer bloat
OUT only shaping - normal or slightly improved overall throughput & A+ buffer bloat
Has anyone seen download performance increase with the inbound shaper activated?
Btw, I’m new to the forum, thanks to everyone who’s posted, great discussion.
I agree, but only because of my similarly limited experience. Looking at others' experiences, download rate-limiting has its use, but it's highly dependent on who your ISP is and what hardware they use.
From a bufferbloat perspective, avoiding any buffering that is beyond your control is vital… whether these uncontrollable buffers are making a noticeable impact, well, that is very situational.
(IIRC) I saw a ~20% decrease in worst-case latency on download, but I lost ~10% of my average download speed. It was not worth it to me.
-
To go along with what Nullity said: generally, upload has the highest bloat, gives you the most return, and is the direction you actually have full control over. One of the reasons for higher bloat on upload is that most bufferbloat is caused by fixed-size buffers that are sized for the maximum theoretical provisioned rate the device supports. This means a 30Mb cable connection may have the buffer of a 300Mb connection. To make matters worse, the buffer for 300Mb is also larger than optimal. The good news for download is that the bottleneck is more likely to be in the uplink, which should have properly sized buffers. Downloading tends not to suffer too badly from bloat.
Upload, on the other hand, tends to be much slower than download for most residential connections, making the fixed-size buffers an even worse issue. And the source of the data is on the wrong side of the bloat when uploading.
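A rough back-of-the-envelope shows why that matters (the 256 KB buffer is made up purely for the example, not any particular modem):

256 KB of buffer is roughly 2 Mb of data
at 300 Mb/s it drains in about 7 ms
at 30 Mb/s the same buffer is about 70 ms of queueing delay
at a 5 Mb/s upload it becomes roughly 420 ms

Same buffer, wildly different worst-case latency depending on the rate it actually drains at.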
-
I think I might be the exception here - I actually get (seemingly) slightly better performance when I have fq_codel applied to both upload and download on my fiber connection. That being said, the only way I have tested this to date has been through speed tests, and in particular the DSL Reports speed test to get an idea of bufferbloat. With fq_codel applied to the download side as well, my download during the test is a bit more stable, comes in slightly higher (bandwidth) and has lower average latency during the test. In fact, I have found the best performance so far for me has been by using a 940/940 limit on a gigabit FTTH connection, with a slightly more aggressive target of 3ms and interval of 60ms – the fq_codel defaults are 5ms and 100ms. This does limit the download and upload speed to about 915-920Mbit on my connection, but I'm willing to take a 3 - 3.5% hit on bandwidth to have lower average latency and connection stability. One could argue that on a gigabit fiber connection all this really doesn't matter anyway, since there are very few cases where the bandwidth is maxed out, and that's probably true. But nonetheless I do like having these settings to ensure stability.
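For reference, those settings map onto raw dummynet commands roughly as follows (the pipe/sched numbering is a placeholder, and on pfSense the 940Mbit figures would normally be set on the limiters in the GUI rather than by hand; the bare target/interval numbers mirror the shellcmd style used elsewhere in this thread, so double-check how your build interprets the units):

ipfw pipe 1 config bw 940Mbit/s
ipfw pipe 2 config bw 940Mbit/s
ipfw sched 1 config pipe 1 type fq_codel target 3 interval 60
ipfw sched 2 config pipe 2 type fq_codel target 3 interval 60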
-
I’ve noticed all of the fq_codel config examples have inbound and outbound queues.
It’s been my experience that shaping or prioritizing inbound WAN traffic, on a home/office router, is ineffective and degrades network performance.
Quick testing on DSLreports, I get much better results when I disable the inbound fw rules and only shape the outbound WAN traffic:
P4 2.4GHz, 2 cores w/HT
Spectrum 30 down / 15 up
No shaping - normal throughput & C+ buffer bloat
IN & OUT shaping - reduced download throughput (50%) & A+ buffer bloat
OUT only shaping - normal or slightly improved overall throughput & A+ buffer bloat
Has anyone seen download performance increase with the inbound shaper activated?
Btw, I’m new to the forum, thanks to everyone who’s posted, great discussion.
50% is a very large throughput decrease - do you mind sharing your settings?
-
I’ve noticed all of the fq_codel config examples have inbound and outbound queues.
It’s been my experience that shaping or prioritizing inbound WAN traffic, on a home/office router, is ineffective and degrades network performance.
Quick testing on DSLreports, I get much better results when I disable the inbound fw rules and only shape the outbound WAN traffic:
P4 2.4GHz, 2 cores w/HT
Spectrum 30 down / 15 up
No shaping - normal throughput & C+ buffer bloat
IN & OUT shaping - reduced download throughput (50%) & A+ buffer bloat
OUT only shaping - normal or slightly improved overall throughput & A+ buffer bloat
Has anyone seen download performance increase with the inbound shaper activated?
Btw, I’m new to the forum, thanks to everyone who’s posted, great discussion.
Well, given that for it to work you have to set the inbound pipe lower than your line capacity, yes, download speeds will be a bit slower, to match the pipe size.
Shaping isn't about maximum possible throughput but about maintaining fairness across different network applications and QoS.
Some applications will, by their design, completely swamp a line when downloading and act as a sort of DDoS.
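As a rough illustration for the 30Mbit Spectrum example above (the 27Mbit figure is just a common starting point of roughly 90% of the line rate, not a rule):

# download pipe deliberately below the provisioned 30Mbit rate,
# so the queue builds here under fq_codel instead of in the ISP's buffer
ipfw pipe 1 config bw 27Mbit/s
ipfw sched 1 config pipe 1 type fq_codel

The few percent of raw throughput you give up is the price of keeping the bottleneck, and therefore the queue, on a box you control.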
-
I think I might be the exception here - I actually get (seemingly) slightly better performance when I have fq_codel applied to both upload and download on my fiber connection. That being said, the only way I have tested this to date has been through speed tests, and in particular the DSL Reports speed test to get an idea of bufferbloat. With fq_codel applied to the download side as well, my download during the test is a bit more stable, comes in slightly higher (bandwidth) and has lower average latency during the test. In fact, I have found the best performance so far for me has been by using a 940/940 limit on a gigabit FTTH connection, with a slightly more aggressive target of 3ms and interval of 60ms – the fq_codel defaults are 5ms and 100ms. This does limit the download and upload speed to about 915-920Mbit on my connection, but I'm willing to take a 3 - 3.5% hit on bandwidth to have lower average latency and connection stability. One could argue that on a gigabit fiber connection all this really doesn't matter anyway, since there are very few cases where the bandwidth is maxed out, and that's probably true. But nonetheless I do like having these settings to ensure stability.
You're not the exception :)
-
I'm running fq_codel along with HFSC classes.
It helped my bufferbloat go from A to A+, even when I put the max speed in HFSC and the limiter!
Before, with just HFSC, I had to run 2 or 3 Mbit lower for up and down to get just an A.
I also have 2 other limiters set lower, for unknown and known guest clients.
In shellcmd (also on filter change):
ipfw sched 1 config pipe 1 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 2 config pipe 2 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 3 config pipe 3 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 4 config pipe 4 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 5 config pipe 5 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 6 config pipe 6 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50
-
I think I might be the exception here - I actually get (seemingly) slightly better performance when I have fq_codel applied to both upload and download on my fiber connection. That being said, the only way I have tested this to date has been through speed tests, and in particular the DSL Reports speed test to get an idea of bufferbloat. With fq_codel applied to the download side as well, my download during the test is a bit more stable, comes in slightly higher (bandwidth) and has lower average latency during the test. In fact, I have found the best performance so far for me has been by using a 940/940 limit on a gigabit FTTH connection, with a slightly more aggressive target of 3ms and interval of 60ms – the fq_codel defaults are 5ms and 100ms. This does limit the download and upload speed to about 915-920Mbit on my connection, but I'm willing to take a 3 - 3.5% hit on bandwidth to have lower average latency and connection stability. One could argue that on a gigabit fiber connection all this really doesn't matter anyway, since there are very few cases where the bandwidth is maxed out, and that's probably true. But nonetheless I do like having these settings to ensure stability.
I see gigabit maxed out all of the time. YouTube, Netflix, and Hulu microburst their ~250KiB chunks at 1Gb/s. If I packet-sniff these TCP connections, I see back-to-back 1500-byte frames for about 2ms at a time. That's for steady state. I technically only have a 150Mb connection, but it's a 1Gb link that is policed to 150Mb. If I keep jumping around the video timelines, I can keep the video stream in a perma-buffering state, which attempts to send at the full 1Gb/s. I can see 1Gb/s for about the first 100ms or so before the policer starts ramping up. That could represent a 100ms burst in latency if it were not for my ISP's AQM plus my HFSC shaping.
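The numbers line up, for what it's worth: 250 KiB is about 2 Mb, which takes roughly 2 ms to serialise at 1 Gb/s, i.e. on the order of 170 back-to-back 1500-byte frames per chunk.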
-
https://github.com/pfsense/pfsense/pull/3941
And happiness ensued…
:)
-
https://github.com/pfsense/pfsense/pull/3941
And happiness ensued…
:)
Fired up about it. Looking forward to this addition.
-
;D ;D ;D ;D
-
Great news!!!
I wonder if it's a good idea to clear the current fq_codel config before upgrading to 2.4.4 and rebuild it using the GUI after the upgrade? Otherwise, maybe things might not upgrade correctly? Does anyone have any thoughts on that?
Thanks in advance.
-
https://github.com/pfsense/pfsense/pull/3941
And happiness ensued…
:)
Fired up about it. Looking forward to this addition.
Me too! I submitted that PR after working on it this week; I actually got the idea from reading this thread. I'm testing it on my own right now. I have multi-LAN and multi-WAN, and I needed something to be able to classify traffic coming in from WAN A or WAN B and going out the same LAN paths (LAN A/B). Dummynet works awesome for this, and I've gone from a Bufferbloat score of C to A with the patch. If you want to load the patch, you just have to make a queue under the limiter, and assign it with a floating rule on the WAN interface (out direction).
Gripes about the Traffic Shaper right now that the Limiter PR seems to help out with:
- Not enough parameter configuration for FQ_CODEL in the GUI (default FQ_CODEL target is 5000 on my install)
- Couldn't get the traffic shaper (ALTQ) to bind to LAGG interfaces
- Couldn't get the traffic shaper (ALTQ) to work with a multi-LAN setup for download classification
- PIE support
For those of you interested, here's a SS: https://i.imgur.com/N36gpXF.png
If anyone has any suggestions for changes or notices any problems, I'd be happy to factor them into my fork (and therefore the PR). Full disclosure, I am by no means an expert on dummynet/ipfw. I just have a lot of free time on my hands…
-
Really appreciate the work, Matt!
-
@matt, why is the interval so large? The interval should be roughly equal to your upper typical RTT. 100,000ms is a pretty big RTT.