Playing with fq_codel in 2.4

TheNarc

I feel that I should emphasize that I'm not any sort of expert on this subject, so I don't want to lead you astray :) But . . . the following is based on my current understanding:

If the only thing you care about is limiting (capping) bandwidth, either globally or on a per-host basis, then all you need are pipes. You only need queues underneath your pipes if you care about how the pipes' limited bandwidth is distributed among various hosts. Let's start with your example, focusing for now only on the pipes. You don't want the cumulative bandwidth of your "upload pipes" to be more than your ISP's upload cap, and the same goes for download. In fact, the prevailing rule of thumb for preventing bufferbloat seems to be that you don't want it to be more than about 85% of your ISP's caps.
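As a rough illustration of the 85% rule of thumb above, here is a minimal sketch. The 10000 kbit/s caps and the `pipe_kbit` helper are hypothetical, chosen only to make the arithmetic concrete; substitute your own measured ISP caps.

```python
# Sketch: derive pipe bandwidths from ISP caps using the ~85% rule of thumb.
# The 10000 kbit/s caps below are hypothetical, not taken from the thread.

def pipe_kbit(isp_cap_kbit: int, percent: int = 85) -> int:
    """Return a pipe bandwidth (kbit/s) set to `percent` of the ISP cap."""
    return isp_cap_kbit * percent // 100

upload_pipe = pipe_kbit(10_000)
download_pipe = pipe_kbit(10_000)
print(f"upload pipe: {upload_pipe} kbit/s, download pipe: {download_pipe} kbit/s")
```

Integer kbit/s values are used so the result can be pasted into a limiter field without rounding surprises.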

The difficult thing about restricting bandwidth on a per-host basis is that I am unaware of a means by which you can, for example, restrict one host to never use more than 2Mbps of upload bandwidth while also allowing other hosts to use all of your available upload bandwidth in the event that the one restricted host is not using its 2Mbps. Because if you make two pipes, one 8Mbps and one 2Mbps, then any hosts directed through the former will never be allowed more than 8Mbps and any hosts directed through the latter will never be allowed more than 2Mbps.

But this would also be an unusual goal, as far as I can tell. Because you can easily use queues to configure things such that a host is allowed to use as much upload bandwidth as it wants so long as no other hosts are using any, but be throttled down to 2Mbps if other hosts start consuming upload bandwidth. I'm going to assume from here on that this is acceptable to you, but if not, let me know.

I would have only a single upload pipe and a single download pipe, maybe try setting both to 9Mbps at first and lower that if you still see unacceptable bufferbloat from the dslreports speed test. Don't set a mask on either pipe.

Next add child queues to each pipe. I'll assume that a "high priority" and a "low priority" queue are sufficient, but you can add as many as you want. I believe that the weights of all child queues should add up to 100, but the system may not enforce this. Nevertheless, it makes things clearer. For example, your existing queue weights add up to 115. If the system didn't give you any error, then I have to assume that the queue with weight 90 is being given 90 / 115 or ~78% of its parent pipe's bandwidth and the queue with weight 25 is being given ~22%. But that's confusing. So I would, as an example with 2 child queues, give the high priority queue a weight of 80 and the low priority queue a weight of 20.

For queues under your upload pipe, set a source address mask of 32; for queues under your download pipe, set a destination address mask of 32. That results in "dynamic queues" such that each host will get its own queue, rather than every host on your network being funneled into the same queue.

I must admit that this is one of the points on which I am least clear myself, because I don't know how the queue weights come into play then. My expectation would be that however many dynamic queues are spawned from a queue with a weight of 10, that weight would be equally distributed among them, but I'm not certain. Referring back to our example of two child queues, one with weight 80 and the other 20, suppose that 4 hosts are currently being directed to the queue with weight 20. So with the settings I have described, that should result in 4 dynamic queues, one for each host. But is each of those queues then guaranteed a minimum of 5% of the bandwidth of the parent pipe (as I expect) or 20%? I'd need to do more testing/research to know for sure.
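The weight arithmetic above can be sketched as follows. This assumes, as the post does, that a pipe's bandwidth is divided among its child queues in proportion to their weights; the `shares` helper and the queue names are made up for the illustration.

```python
# Sketch: share of a saturated pipe each child queue would get if bandwidth
# is split in proportion to queue weight (an assumption stated in the post).

def shares(weights: dict) -> dict:
    """Map each queue name to its fraction of the parent pipe."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

existing = shares({"queue_90": 90, "queue_25": 25})   # weights sum to 115
suggested = shares({"high": 80, "low": 20})           # weights sum to 100

for label, result in (("existing", existing), ("suggested", suggested)):
    for name, fraction in result.items():
        print(f"{label:9s} {name:9s} {fraction:.1%}")
```

With weights that sum to 100, each weight can be read directly as a percentage, which is the only reason to prefer 80/20 over 90/25.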

Coming back out of the weeds a bit . . . once you have your pipes and child queues set up as described above, you need to assign traffic to them using firewall rules, as you already know. You want to do that in rules on your LAN interface, in the inbound direction. If you run any servers and may have connections initiated by inbound traffic on your WAN interface, then you'd want rules there too. But if that's not the case, then you only need to match on inbound LAN traffic, setting the "in pipe" to the desired upload queue and the "out pipe" to the desired download queue. Note that if you do need to match on inbound traffic on your WAN interface, this is reversed, because inbound traffic on the LAN interface is upload while inbound traffic on the WAN interface is download.
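The in/out reversal between LAN and WAN rules is easy to get backwards, so here is a small sketch of the mapping. The function name and the queue labels are hypothetical; it only encodes the rule stated above for rules that match inbound traffic on an interface.

```python
# Sketch: which queue goes in the "in pipe" and "out pipe" fields of a rule
# that matches traffic arriving INBOUND on the given interface.

def limiter_fields(interface: str, upload_queue: str, download_queue: str) -> dict:
    if interface == "LAN":
        # Inbound on LAN is traffic leaving the network: upload.
        return {"in": upload_queue, "out": download_queue}
    if interface == "WAN":
        # Inbound on WAN is traffic entering the network: download.
        return {"in": download_queue, "out": upload_queue}
    raise ValueError(f"unknown interface: {interface}")

print(limiter_fields("LAN", "upload_high", "download_high"))
print(limiter_fields("WAN", "upload_high", "download_high"))
```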

Your shellcmd examples look fine to me, although obviously they won't be as long if you only have two pipes. I can't provide sound advice on whether you should tweak the default settings for things like quantum and ECN. I'd just experiment and see what works best for your connection.

With respect to your question about fq_codel vs. wf2q+, you should see fq_codel in the output from ipfw sched show after those shellcmds have been run. I did notice that a filter reload seemed to revert back to wf2q+, so I made two shellcmds, one of type "shellcmd" and one of type "afterfilterchangeshellcmd". I hadn't seen any references to anyone else doing this though, so it may or may not truly be necessary, but I don't think it can hurt anything either.

So, looking back on this, it's kind of a jumble, and definitely reflects my initial disclaimer that I don't understand all of it myself :) Hopefully it will at least be somewhat useful. I'm still hoping for a real dummynet guru to step in . . .

Gatekeeper-ZA

lol :P Thanks, at least that points me in a direction. I will try out what you suggested and post back; at the moment it is not possible. For some reason, every day from 19:00 to 21:00 the ping goes up to 100ms and stays locked there, and I'm still trying to get my provider to explain what is going on. It is not only my connection; it seems to be the provider's whole LTE network. Maybe they are too congested, but I'm not really sure, as it has been going on for a month now. As soon as I try the setup I will let you know what's happening on my side. Thanks again.

Gatekeeper-ZA

OK, so I tried as advised and it seems to be working, although my service is so bad at the moment that I will have to post back later with results.
Thanks, guys.

chrcoluk

TheNarc this quote has me thinking.

when dynamic pipes are used, each flow will get the same bandwidth as defined by the pipe, whereas when dynamic queues are used, each flow will share the parent's pipe bandwidth evenly with other flows generated by the same queue (note that other queues with different weights might be connected to the same pipe).

I will give a scenario.

One device on the network is running a Steam download as fast as it can, with 32 download threads.
Another one has a single YouTube video playing.

With a 0 ipv4 mask, both devices share a queue.

As I understand it, this would give the Steam device a 32/33 portion of the bandwidth. Each thread on the Steam device would get 1/33 of the queue's (and pipe's) bandwidth, as the queue would have 33 flows.

With a 32 ipv4 mask, both devices have their own queue.

The queues are each allocated half the bandwidth, assuming equal priority?

So the Steam threads get a total of 16/32 of the pipe's bandwidth, or rather 1/2, and each thread effectively has 0.5/32 of the pipe, whilst the YouTube video on the second device, if it's able to saturate the bandwidth, gets 1/2 of the pipe to itself.

Do I understand this right, that queues share bandwidth in this manner? If yes, I prefer masking per host for sure.
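The two scenarios can be worked through with exact fractions. This is only a sketch of the understanding described in this post (even per-flow sharing inside a queue, equal weight between per-host queues); it is not a confirmed model of what dummynet actually does.

```python
# Sketch: per-flow shares of the pipe under the two mask settings,
# following the reasoning in the post rather than verified dummynet behaviour.
from fractions import Fraction

STEAM_FLOWS = 32    # download threads on the first device
YOUTUBE_FLOWS = 1   # single video stream on the second device

# Mask 0: both devices share one queue, split evenly per flow.
total_flows = STEAM_FLOWS + YOUTUBE_FLOWS
shared = {
    "steam_total": Fraction(STEAM_FLOWS, total_flows),
    "per_steam_thread": Fraction(1, total_flows),
    "youtube": Fraction(YOUTUBE_FLOWS, total_flows),
}

# Mask 32: one queue per host with equal weight, flows split each host's half.
per_host = Fraction(1, 2)
masked = {
    "steam_total": per_host,
    "per_steam_thread": per_host / STEAM_FLOWS,
    "youtube": per_host / YOUTUBE_FLOWS,
}

for label, result in (("mask 0", shared), ("mask 32", masked)):
    print(label, {name: str(value) for name, value in result.items()})
```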

TheNarc

I can confirm that your understanding matches my understanding. What I can't confirm is that my understanding is correct :D It's obviously a pretty confusing topic. Because of course you can have more than one child queue per pipe as well, and any child queue may be dynamic or not. In my setup right now, for example, I have two pipes (upload and download) and two child queues for each pipe.

Consider only my download pipe. It has two child queues: one with a weight of 30 and one with a weight of 70 (for low and high priority traffic respectively). Both child queues are dynamic, with /32 masks on the destination address. Based on my understanding, this means that every host on my LAN should get its own queue.

Now, if that's true, suppose I have 5 hosts that are downloading and directed to my 30-weight "low priority" download queue based on firewall rules and 1 host that is downloading and directed to my 70-weight "high priority" download queue. Each of the 5 "low priority" hosts will get their own queue, but will each of those queues have a weight of 30? My expectation would be no; instead, the weight of the child queue should be equally distributed among however many dynamic queues are spawned from it. So in this case, there would be 5 dynamic queues each with a weight of 6 spawned from the "low priority" child queue of weight 30 and 1 dynamic queue of weight 70 spawned from the "high priority" child queue of weight 70.

Maybe that's not exactly how it works, but my hope is that it's at least conceptually accurate. Because it wouldn't make sense if dynamic queues each had the same weight as the child queue from which they were spawned. In the example above, I'd end up with 5 queues of weight 30 and 1 queue of weight 70. So collectively, my 5 low priority hosts would be getting a share of (150/220) or roughly 68% of the parent pipe, and my 1 high priority host would be getting roughly 32%. That situation would turn on its head my original intention of reserving 30% of a saturated pipe for low priority hosts and 70% for high priority hosts.
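To make the two possibilities concrete, here is a small sketch comparing them for the 5 low priority hosts and 1 high priority host described above. Both interpretations are hypotheses from this post; which one dummynet implements is exactly the open question.

```python
# Sketch: total share of a saturated pipe that the low priority hosts receive
# under two hypothetical ways of assigning weight to dynamic queues.

LOW_WEIGHT, HIGH_WEIGHT = 30, 70
LOW_HOSTS, HIGH_HOSTS = 5, 1

def low_priority_share(weights: list) -> float:
    """Fraction of the pipe taken by the first LOW_HOSTS dynamic queues."""
    return sum(weights[:LOW_HOSTS]) / sum(weights)

# Interpretation 1: the child queue's weight is divided among its dynamic queues.
split = [LOW_WEIGHT / LOW_HOSTS] * LOW_HOSTS + [HIGH_WEIGHT / HIGH_HOSTS] * HIGH_HOSTS

# Interpretation 2: every dynamic queue inherits the full child queue weight.
inherited = [LOW_WEIGHT] * LOW_HOSTS + [HIGH_WEIGHT] * HIGH_HOSTS

print(f"weight split:     low priority hosts get {low_priority_share(split):.0%}")
print(f"weight inherited: low priority hosts get {low_priority_share(inherited):.0%}")
```

Under the second interpretation the low priority share grows with every additional host, which is why it would defeat a 30/70 reservation.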

I don't know if these ruminations are helpful or simply add to the confusion . . . but at least it's fun trying to think it through ;) Still hoping for a true dummynet prodigy to poke his or her head into this thread.

ntwk

I’ve noticed all of the fq_codel config examples have inbound and outbound queues.

It’s been my experience that shaping or prioritizing inbound WAN traffic, on a home/office router, is ineffective and degrades network performance.

Quick testing on DSLreports, I get much better results when I disable the inbound fw rules and only shape the outbound WAN traffic:

P4 2.4Ghz 2 core w/HT
Spectrum 30down 15up

No shaping - normal throughput & C+ buffer bloat
IN & OUT shaping - reduced download throughput (50%) & A+ buffer bloat
OUT only shaping - normal or slightly improved overall throughput & A+ buffer bloat

Has anyone seen download performance increase with the inbound shaper activated?

Btw, I’m new to the forum, thanks to everyone who’s posted, great discussion.

Nullity

@ntwk:

I agree, but only because of my similarly limited experience. Looking at others' experiences, download rate-limiting has its uses, but it's highly dependent on who your ISP is and what hardware they use.

From a bufferbloat perspective, avoiding any buffering that is beyond your control is vital… whether these uncontrollable buffers are making a noticeable impact, well, that is very situational.

(IIRC) I saw ~20% decrease in worst-case latency on download, but I lost ~10% with my average download speed. It was not worth it to me.

Harvy66

To go along with what Nullity said, upload generally has the highest bloat, gives you the most return, and is the direction you actually have full control over. One of the reasons for higher bloat on upload is that most bufferbloat is caused by fixed-size buffers that are sized for the maximum theoretical provisioned rate that the device supports. This means a 30Mb cable connection may have the buffer of a 300Mb connection. To make matters worse, the buffer for 300Mb is also larger than optimal. The good news for download is that the bottleneck is more likely to be in the uplink, which should have properly sized buffers. Downloading tends not to have bloat that is too bad.

Upload on the other hand tends to be much slower than download for most residential connections, making the fixed size buffers even worse of an issue. And the source of the data is on the wrong side of the bloat for uploading.

tman222

I think I might be the exception here - I actually get (seemingly) slightly better performance when I have fq_codel applied to both upload and download on my fiber connection. That being said, the only way I have tested this to date has been through speed tests, and in particular the DSL Reports speed test to get an idea of bufferbloat. With fq_codel applied to the download side as well, my download during the test is a bit more stable, comes in slightly higher (bandwidth) and has lower average latency during the test. In fact, I have found the best performance so far for me has been by using a 940/940 limit on a gigabit FTTH connection, with a slightly more aggressive target of 3ms and interval of 60ms (the fq_codel defaults are 5ms and 100ms). This does limit the download and upload speed to about 915-920Mbit on my connection, but I'm willing to take a 3 - 3.5% hit on bandwidth to have lower average latency and connection stability. One could argue that on a gigabit fiber connection all this really doesn't matter, since there are very few cases where the bandwidth is maxed out anyway, and that's probably true. But nonetheless I do like having these settings to ensure stability.

tman222

@ntwk:

50% is a very large throughput decrease - do you mind sharing your settings?

chrcoluk

@ntwk:

Well, given that for it to work you have to set the inbound pipe lower than your line capacity, yes, download speeds will be a bit slower to match the pipe size.

Shaping isn't about max possible throughput but about maintaining fairness across different network applications and QoS.

Some applications will, by their design, completely swamp a line when downloading and act as a sort of DDoS.

chrcoluk

@tman222:

You're not the exception :)

robnitro

I'm running fq_codel along with HFSC classes.
It helped my bufferbloat go from A to A+ even when I put the max speed in HFSC and the limiter!
Before, with just HFSC, I had to run 2 or 3 Mbit lower for up and down to get just an A.
I also have 2 other limiters set lower, for guest clients unknown and known.

In shellcmd (also on filter change)
ipfw sched 1 config pipe 1 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 2 config pipe 2 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 3 config pipe 3 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 4 config pipe 4 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 5 config pipe 5 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50 && ipfw sched 6 config pipe 6 type fq_codel target 5 noecn quantum 300 limit 1000 interval 50
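Since the same scheduler options are repeated for each of the six pipes, the one-liner above could be generated rather than typed by hand. This is just a sketch: the `sched_command` helper is hypothetical, and the option string is copied verbatim from the command above rather than checked against the ipfw documentation.

```python
# Sketch: build the chained shellcmd string for pipes 1..6 from one option set.
# FQ_CODEL_OPTS is copied as-is from the post; adjust it to your own settings.

FQ_CODEL_OPTS = "type fq_codel target 5 noecn quantum 300 limit 1000 interval 50"

def sched_command(pipe_ids) -> str:
    """Join one 'ipfw sched N config pipe N ...' command per pipe with '&&'."""
    return " && ".join(
        f"ipfw sched {n} config pipe {n} {FQ_CODEL_OPTS}" for n in pipe_ids
    )

command = sched_command(range(1, 7))
print(command)
```

The printed string can then be pasted into both the shellcmd and the filter-change entry, so the two never drift apart.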

Harvy66

@tman222:

I see gigabit maxed out all of the time. YouTube, Netflix, and Hulu microburst their ~250KiB chunks at 1Gb/s. When I packet sniff these TCP connections, I see back-to-back 1500 byte frames for about 2ms at a time. That's for steady state. I technically only have a 150Mb connection, but it's a 1Gb link that is policed to 150Mb. If I keep jumping around the video timelines, I can keep the video stream in a perma-buffering state, which attempts to send at full 1Gb/s. I can see 1Gb/s for about the first 100ms or so before the policer starts ramping up. That could represent a 100ms burst in latency if it were not for my ISP's AQM plus my HFSC shaping.
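The "about 2ms" figure follows directly from the chunk size and link rate. A quick sketch of the arithmetic, ignoring frame and protocol overhead (a simplifying assumption), with the 150Mb policed rate shown for comparison:

```python
# Sketch: how long a ~250 KiB chunk occupies the link at a given line rate.
# Header overhead is ignored, so real bursts would last slightly longer.

def burst_ms(chunk_kib: float, link_bits_per_s: float) -> float:
    """Time in milliseconds to send chunk_kib KiB at link_bits_per_s."""
    bits = chunk_kib * 1024 * 8
    return bits / link_bits_per_s * 1000

at_line_rate = burst_ms(250, 1e9)      # 1 Gb/s physical link
at_policed_rate = burst_ms(250, 150e6) # 150 Mb/s policed rate

print(f"250 KiB at 1 Gb/s:   {at_line_rate:.2f} ms")
print(f"250 KiB at 150 Mb/s: {at_policed_rate:.2f} ms")
```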

dennypage

https://github.com/pfsense/pfsense/pull/3941

And happiness ensued…

:)

gsmornot

@dennypage:

Fired up about it. Looking forward to this addition.

strangegopher

;D ;D ;D ;D

tman222

Great news!!!

I wonder if it's a good idea to clear the current fq_codel config before upgrading to 2.4.4 and rebuild it using the GUI after the upgrade? Otherwise, maybe things might not upgrade correctly? Does anyone have any thoughts on that?

Thanks in advance.

matt_

@gsmornot:

Me too! I submitted that PR after working on it this week, actually got the idea from reading this thread. I'm testing it on my own right now; I have multi-LAN and multi-WAN, and I needed something to be able to classify traffic coming from WAN A / WAN B out LAN A/B and WAN B out the same LAN paths. Dummynet works awesome for this, and I've gone from a Bufferbloat score of C to A with the patch. If you want to load the patch, you just have to make a queue under the limiter, and assign it with a floating rule on the WAN interface (out direction).

Gripes about Traffic Shaper right now that the Limiter PR seems to help out with:

Not enough parameter configuration for FQ_CODEL on the GUI (default FQ_CODEL target is 5000 on my install)
Couldn't get traffic shaper (altq) to bind to LAGG interfaces
Couldn't get traffic shaper (altq) to work with a multi-LAN setup for download classification
No PIE support

For those of you interested, here's a screenshot: https://i.imgur.com/N36gpXF.png

If anyone has any suggestions for changes or notices any problems, I'd be happy to factor them into my fork (and therefore the PR). Full disclosure, I am by no means an expert on dummynet/ipfw. I just have a lot of free time on my hands…

dennypage

Really appreciate the work Matt!