CoDel - How to use

kieranc

Given that the 2.2.4 'correct' settings seem to give worse results than the 'incorrect' 2.2.3 ones (for me at least), it seems that we need the ability to tune both interval and target in order to make codel useful for everyone.
I'm guessing it would be complicated to add an extra field to the traffic shaper setup page, but since the queue limit field is currently being reused to set the target, could we add some logic to use it to set both target and interval? If the field contains a single integer, use it as the target; if it contains something else (t5i100? 5:100?), use it as target and interval.
It's a bit beyond my skill level but it seems like it should be possible in theory, or is there a better way to do it?

edit: can we just set the interval and derive the target from that, if it's easier?
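A rough sketch of what that parsing logic could look like, written in Python purely for illustration. The function name `parse_codel_field`, the `target:interval` syntax, and the fallback of deriving the target as 5% of the interval are all assumptions here, not anything pfSense actually implements.

```python
def parse_codel_field(value, default_interval_ms=100):
    """Parse a hypothetical queue-limit field into (target_ms, interval_ms).

    "5"     -> target 5, default interval
    "5:100" -> target 5, interval 100
    ":100"  -> interval 100, target derived from the interval
    """
    text = value.strip()
    if ":" not in text:
        # A single integer is treated as the target only
        target_ms, interval_ms = int(text), default_interval_ms
    else:
        target_part, interval_part = text.split(":", 1)
        interval_ms = int(interval_part)
        if target_part:
            target_ms = int(target_part)
        else:
            # Assumption: target is 5% of interval, and never below 1 ms
            target_ms = max(1, interval_ms // 20)
    if target_ms >= interval_ms:
        raise ValueError("target must be smaller than interval")
    return target_ms, interval_ms


print(parse_codel_field("5"))
print(parse_codel_field("5:100"))
print(parse_codel_field(":100"))
```

Keeping the single-integer form working means existing configurations that only set a target would not need to change.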

Nullity

@kieranc:

Given that the 2.2.4 'correct' settings seem to give worse results than the 'incorrect' 2.2.3 ones (for me at least), it seems that we need the ability to tune both interval and target in order to make codel useful for everyone.
I'm guessing it would be complicated to add an extra field to the traffic shaper setup page, but since the queue limit field is currently being reused to set the target, could we add some logic to use it to set both target and interval? If the field contains a single integer, use it as the target; if it contains something else (t5i100? 5:100?), use it as target and interval.
It's a bit beyond my skill level but it seems like it should be possible in theory, or is there a better way to do it?

edit: can we just set the interval and derive the target from that, if it's easier?

Those who created the codel algorithm are the ones who dictate that, and I presume they chose wisely. Target is dynamic anyway, I think.

Though, if you choose CoDel as the primary/parent scheduler, then you can choose whatever interval/target you want, via command-line, at least for temporary testing purposes.
Our problem is that we cannot customize CoDel's parameters when it is a sub-discipline aka "Codel Active Queue" checkbox.

Whether to expose the CoDel params in the GUI or not… if it is anything like the HFSC params, people will needlessly tweak them with unforeseen consequences simply because they are there. I dunno... maybe we can use the System Tunables tab and add custom params and values that way?

Harvy66

Like Nullity mentioned, the default values were chosen because they work best for the bulk of users. They are one-size-fits-all values that are not optimal for every user, but they are still better than FIFO.

The optimal interval should be the lowest value that covers the bulk of the RTTs of your flows. For me, the default 100ms is my latency to London and way too large for many of the 10ms-20ms servers that I communicate with. Until they add the ability to change the interval, I can't complain that they're choosing the recommended defaults. But I will post some before and after results once I'm allowed to upgrade.
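One way to make "the lowest value that covers the bulk of the RTTs" concrete is a percentile over measured round-trip times. This is only a sketch: `suggest_interval_ms`, the `coverage` parameter, and the sample values are made up for illustration, and nothing here is part of CoDel or pfSense.

```python
import math


def suggest_interval_ms(rtts_ms, coverage=0.95):
    """Return the smallest sampled RTT that covers `coverage` of the samples."""
    if not rtts_ms:
        raise ValueError("need at least one RTT sample")
    ordered = sorted(rtts_ms)
    # Nearest-rank percentile: 1-based rank of the sample at the coverage point
    rank = max(1, math.ceil(coverage * len(ordered)))
    return ordered[rank - 1]


# Illustrative samples only: mostly nearby servers plus one distant one
samples = [10, 12, 15, 18, 20, 11, 14, 19, 16, 100]
print(suggest_interval_ms(samples, coverage=0.9))
```

With samples like these, a 90% coverage ignores the single distant server, while 100% coverage would be dominated by it.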

Harvy66

I just realized that I should mention that I am using HFSC with CoDel as a child discipline.

PFSense 2.2.3 - this is actually a fairly typical graph, so I didn't run it more than once

PFSense 2.2.4 - I ran this test a few times, they all pretty much showed the same

I see no real difference.

Nullity

@Harvy66:

I just realized that I should mention that I am using HFSC with CoDel as a child discipline.

PFSense 2.2.3 - this is actually a fairly typical graph, so I didn't run it more than once

PFSense 2.2.4 - I ran this test a few times, they all pretty much showed the same

I see no real difference.

Did you make sure that the values were different? I thought we still did not know how to display the live values of CoDel's params when it is a sub-discipline.

Maybe we should attempt some real testing though. Local tests measuring small time-spans, probably at a kernel timer debugging level of granularity. That DSLreports test is great for introducing regular folks to bufferbloat, but it is not accurate. HTML5 within a browser just cannot perform well enough, especially with your 10ms-latency connection as the test.

I tried to set ALTQ to debugging state on pfSense then FreeBSD but I never got far. That is my best guess for a way to measure CoDel with enough accuracy to be useful. Any ideas?

Harvy66

@Nullity:

@Harvy66:

I just realized that I should mention that I am using HFSC with CoDel as a child discipline.

PFSense 2.2.3 - this is actually a fairly typical graph, so I didn't run it more than once

PFSense 2.2.4 - I ran this test a few times, they all pretty much showed the same

I see no real difference.

Did you make sure that the values were different? I thought we still did not know how to display the live values of CoDel's params when it is a sub-discipline.

Maybe we should attempt some real testing though. Local tests measuring small time-spans, probably at a kernel timer debugging level of granularity. That DSLreports test is great for introducing regular folks to bufferbloat, but it is not accurate. HTML5 within a browser just cannot perform well enough, especially with your 10ms-latency connection as the test.

I tried to set ALTQ to debugging state on pfSense then FreeBSD but I never got far. That is my best guess for a way to measure CoDel with enough accuracy to be useful. Any ideas?

Like you said, last time I tried to check the values as a child discipline, it didn't show them. I figured if the defaults were changed for the scheduler, it may have affected these as well, but I have a hard time telling.

If I bypass PFSense and go straight to the Internet, I get very distinctive bufferbloat on the DSLReports tests.

You can also see with this that if I change from 16 streams to 32 streams, it starts to tax the simple CoDel algorithm's ability to smooth out latency.

This is what my connection looks like when I bypass PFSense, which means no CoDel. Same 16 streams.

It's hard to see because of the large range, but the base idle ping is 17ms avg, the download is 27ms avg, and the upload is 44ms avg. That is distinctly different from what I get through PFSense with HFSC+CoDel, which was +2ms over idle instead of +10-27ms over idle. Roughly an order of magnitude difference.

As you can tell, my ISP has horrible bufferbloat /sarc 30ms, most horrible.

I'm not sure of the best way to measure the effects of CoDel with a simple test. You'd probably need to use something like iperf to load a rate-limited connection (not line rate), then run pings at the same time. My expectation is that there should be a measurable difference in both the avg ping and the std-dev of ping.
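The comparison described here could be summarized with something like the sketch below. It assumes the ping round-trip times have already been collected into lists (idle, and while iperf or similar is loading the link); the function names and dictionary keys are made up for illustration.

```python
import statistics


def summarize_pings(samples_ms):
    """Return (mean, sample standard deviation) of ping RTTs in milliseconds."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def compare(idle_ms, loaded_ms):
    """Compare idle pings against pings taken while the link is saturated."""
    idle_avg, idle_sd = summarize_pings(idle_ms)
    load_avg, load_sd = summarize_pings(loaded_ms)
    return {
        "added_latency_ms": load_avg - idle_avg,  # rise in average ping under load
        "jitter_change_ms": load_sd - idle_sd,    # change in ping std-dev under load
    }


# Placeholder numbers only, not measurements from this thread
print(compare([17, 18, 16, 17], [19, 20, 18, 21]))
```

Running the same comparison with the shaper enabled and disabled would give two sets of numbers to put side by side.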

You may need to be careful doing this test on a LAN where the latency can be measured in microseconds. All TCP implementations send a minimum of 2 segments when streaming data. A 0.1ms ping with 1500-byte segments puts a lower limit of 120Mb/s on the rate. All known TCP congestion algorithms will not back off below this speed for that latency. That's per stream. If you have 8 streams, that's 960Mb/s.
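The arithmetic can be checked with a few lines of Python. The helper `min_rate_mbps` and its `segments_in_flight` parameter are assumptions for illustration. Note that 120Mb/s corresponds to one 1500-byte segment per 0.1ms round trip; with two segments in flight per round trip the same formula gives 240Mb/s per stream.

```python
def min_rate_mbps(rtt_ms, segment_bytes=1500, segments_in_flight=1):
    """Lowest rate (Mb/s) a stream can fall to if it always keeps
    `segments_in_flight` full segments in flight per round trip."""
    bits_per_rtt = segment_bytes * 8 * segments_in_flight
    rtt_seconds = rtt_ms / 1000
    return bits_per_rtt / rtt_seconds / 1e6


one_segment = min_rate_mbps(0.1)                         # about 120 Mb/s
two_segments = min_rate_mbps(0.1, segments_in_flight=2)  # about 240 Mb/s
print(round(one_segment), round(two_segments), round(8 * one_segment))
```

Either way, the point stands: at LAN latencies a handful of streams cannot slow down enough to stay under a typical shaper limit.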

0.1ms is actually a high latency for a LAN. I measure as low as 0.014ms using a high resolution ping program, but my switch is rated for 2.3 microseconds. It really depends on how often the kernel schedules.

Nullity

@Harvy66:

I'm not sure of the best way to measure the effects of CoDel with a simple test. You'd probably need to use something like iperf to load a rate-limited connection (not line rate), then run pings at the same time. My expectation is that there should be a measurable difference in both the avg ping and the std-dev of ping.

You may need to be careful doing this test on a LAN where the latency can be measured in microseconds. All TCP implementations send a minimum of 2 segments when streaming data. A 0.1ms ping with 1500-byte segments puts a lower limit of 120Mb/s on the rate. All known TCP congestion algorithms will not back off below this speed for that latency. That's per stream. If you have 8 streams, that's 960Mb/s.

0.1ms is actually a high latency for a LAN. I measure as low as 0.014ms using a high resolution ping program, but my switch is rated for 2.3 microseconds. It really depends on how often the kernel schedules.

Network measurements are not what we want. We only care about the time before packets are on the wire. All CoDel controls is the scheduling of the local buffers, so that is what we would want to measure, right?

CoDel has essentially no influence over a packet's latency once the packet hits the wire.

Harvy66

I was thinking of a simple test that indirectly measures CoDel by the characteristics of the network. If you want a more direct measurement, one would need to implement some wrapper code that acts like the network stack and OS and simulates the network pushing packets through. More direct, much more work.

mifronte

I just got fiber to my home (pfSense WAN plugged into an optical network terminal/modem) with symmetrical gigabit service.

With the default pfSense install where there was no traffic shaping, I was receiving an F for BufferBloat and my speeds were only in the 600's Mbps on the DSLReports speed tests. I found this thread and started to play with the codelq.

So far, I have found that just enabling codelq without specifying any other values works great. It got me from an F rating to A for BufferBloat. I also played with the Queue Limit (target) by specifying 5 (pfSense defaults to 50). The value of 5 resulted in more packet drops but no performance gain. I also noticed that I get more BufferBloat on my downloads so I specified a bandwidth of 980Mbps on my LAN interface. This seems to give me the best results. In my speed tests, I see spikes above 1000Mbps and so I think my ISP fiber connection is faster than my gigabit LAN and that is why I needed to limit my LAN bandwidth to 980Mbps to get better overall performance.

I am just running codelq on the WAN and LAN with no sub-queues.

Harvy66

@mifronte:

I see spikes above 1000Mbps and so I think my ISP fiber connection is faster than my gigabit LAN

You shouldn't be able to see bursts above your LAN speed because that would require transferring data faster than your LAN. There are ways things may seem to burst, but it's probably just timing issues. The overall average should be fairly accurate.

Cake will be awesome once it comes out. I plan on dropping HFSC+codel and just using Cake.

tuffcalc

@mifronte:

I just got fiber to my home (pfSense WAN plugged into an optical network terminal/modem) with symmetrical gigabit service.

With the default pfSense install where there was no traffic shaping, I was receiving an F for BufferBloat and my speeds were only in the 600's Mbps on the DSLReports speed tests. I found this thread and started to play with the codelq.

So far, I have found that just enabling codelq without specifying any other values works great. It got me from an F rating to A for BufferBloat. I also played with the Queue Limit (target) by specifying 5 (pfSense defaults to 50). The value of 5 resulted in more packet drops but no performance gain. I also noticed that I get more BufferBloat on my downloads so I specified a bandwidth of 980Mbps on my LAN interface. This seems to give me the best results. In my speed tests, I see spikes above 1000Mbps and so I think my ISP fiber connection is faster than my gigabit LAN and that is why I needed to limit my LAN bandwidth to 980Mbps to get better overall performance.

I am just running codelq on the WAN and LAN with no sub-queues.

Based on my (many) tests - I think this is working properly for you because your ports are physically limited to 1Gbps. If your internet upload speed is lower than your port speed, you need (in my experience) to limit your bandwidth speed in the traffic shaper to 95% of your upload speed for codel to properly work.

Thanks for posting this!

Nullity

So, 2.3's CODELQ (and presumably the sub-discipline "CoDel Active Queue" checkbox version) has a CoDel with proper "interval" and "target" values (rather than reversed and/or wrong):

```
[2.3-RELEASE][admin@pfsense.wlan]/root: pfctl -vsq | grep codel
altq on pppoe1 codel( target 5 interval 100 ) bandwidth 640Kb tbrsize 1492
```


Hmm… anyone got some stats to share?

Harvy66

What kind of stats? I did a DSLReports speed test after 2.3 and the bufferbloat part looks the same. Still A+.

kpa

In 2.3 it seems you have to enter a bandwidth value to enable CODELQ. Just to verify that I've understood the information correctly: I have a DSL connection with a theoretical maximum of 24Mb/s down and 2Mb/s up, which in practice delivers something like 20Mb/s down and 1.4Mb/s up. Based on what I read here, I should set the WAN bandwidth to about 95% of that 1.4Mb/s, is that right?

Harvy66

Correct. Good that it forces you to fill out your bandwidth, because it's useless if you just forward the packets at max rate.

vesikk

I'm trying to enter 5.3 Mbps as the bandwidth value but I get a popup box saying "Please enter a valid value. The two nearest values are 5 and 6". Please help.

bodosom

@vesikk:

I'm trying to enter 5.3 Mbps …

Change the multiplier from Mbit/s to Kbit/s.
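In other words, express the fractional value as a whole number in the smaller unit. A tiny sketch of the conversion, assuming decimal units (1 Mbit/s = 1000 Kbit/s); the helper name `mbit_to_kbit` is made up for illustration.

```python
def mbit_to_kbit(mbit_per_s):
    """Convert Mbit/s to a whole number of Kbit/s (decimal units assumed)."""
    return round(mbit_per_s * 1000)


print(mbit_to_kbit(5.3))
```

So 5.3 Mbit/s would be entered as 5300 with the multiplier set to Kbit/s.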

bodosom

I have a nominal 30/5 link that runs at ~ 40/6.
I've tried a variety of bandwidth settings on the WAN link and I still get C-D grades on the DSLreports "bufferbloat" test and see high echo RTT when the link is loaded.

What settings result in a DSLreports A grade?

My previous experience was with Linux and fq_codel which worked perfectly.

kpa

Test your upstream speed a few times without any shaping and set the upstream bandwidth in the CODELQ settings to about 90-95% of the average value you got from the tests.
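As a quick sketch of that calculation: average the unshaped test results, take a fraction of it, and express the result in whole Kbit/s so it can be typed into the bandwidth field. The function name `shaper_bandwidth_kbit` and the sample numbers are illustrative assumptions.

```python
import statistics


def shaper_bandwidth_kbit(measured_mbit, fraction=0.95):
    """Suggest a shaper bandwidth (Kbit/s) from unshaped speed test results (Mbit/s)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    average_mbit = statistics.mean(measured_mbit)
    # Decimal units assumed: 1 Mbit/s = 1000 Kbit/s
    return round(average_mbit * fraction * 1000)


# Example: three upstream tests on a link that delivers about 1.4 Mbit/s
print(shaper_bandwidth_kbit([1.4, 1.4, 1.4], fraction=0.95))
```

Starting at the lower end (90%) and raising the value while watching latency under load is one cautious way to find the highest setting that still keeps the queue under CoDel's control.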

Nullity

@bodosom:

I have a nominal 30/5 link that runs at ~ 40/6.
I've tried a variety of bandwidth settings on the WAN link and I still get C-D grades on the DSLreports "bufferbloat" test and see high echo RTT when the link is loaded.

What settings result in a DSLreports A grade?

My previous experience was with Linux and fq_codel which worked perfectly.

As the other poster said, 90-95% is a safe choice. You can go even higher if your connection is stable.
You might try doing your own tests, since they will likely be more accurate. Simply upload a file to an FTP server (or any other service) and check the Quality graph in the Monitoring section of the pfSense GUI to see what your latency was while the upload was saturated.

(FYI, fq_codel/codel only helps upload.)