CoDel - How to use
-
I have been experimenting with my home router, which runs DD-WRT, and enabled HTB along with FQ_CoDel. My latency under load has improved from 300ms+ to just under 20ms, as measured by pinging Google DNS. The connection is a Frontier FIOS 25M/25M. My throughput, as verified by speedtest.net, is good whether I am using a Seattle-based server or one in Atlanta. I thought that was great and looked into turning CoDel on in pfSense at work.
We have two pfSense firewalls, both in the Seattle area. One is sitting on a 50M/50M Comcast EDI circuit. The other is sitting on a collocated burstable gigabit circuit that we shape to 100M/100M. Both use a simple PRIQ shaper with a 500 packet queue limit, which works out to a max queue delay of roughly 114ms for 50M and 57ms for 100M. I have found that the 500 packet queue limit offers the best throughput with the fewest drops, and this has worked for us for a few years now.
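As a rough check of those queue delay figures, here is a minimal sketch. It assumes full-size 1500 byte packets and binary megabits (1M = 1024 * 1024 bits), which is what seems to reproduce the numbers quoted; the function name is made up for illustration.

```python
def max_queue_delay_ms(queue_limit_packets, link_rate_bps, packet_bytes=1500):
    """Worst-case time to drain a completely full queue, in milliseconds."""
    queued_bits = queue_limit_packets * packet_bytes * 8
    return queued_bits / link_rate_bps * 1000

# Assumption: binary megabits, not stated in the post.
MBIT = 1024 * 1024

for rate_mbit in (50, 100):
    delay = max_queue_delay_ms(500, rate_mbit * MBIT)
    print(f"{rate_mbit}M: about {delay:.0f} ms")
```

With decimal megabits (1M = 1,000,000 bits) the same calculation gives about 120ms and 60ms, so either way the order of magnitude holds.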
For most general traffic I really don't care that it queues up and gets delayed, but less intensive flows should not get delayed. That is what I see CoDel doing on my home router, and pfSense was able to do the same on our work connections. However, something weird is going on when I do speed tests from behind the pfSense boxes. When I choose a local Seattle server with an RTT of under 10ms, both connections come close to their shaped throughput. When I choose a server in Atlanta or Miami, where the RTT is around 80ms, the upload throughput is suddenly 25-50% lower, and it takes longer to ramp up to that throughput. Download throughput is never affected, even though CoDel is enabled on the LAN queues and I verified via ping that it is working. If I turn off CoDel AQM, the upload for high RTT servers goes back to roughly the shaped throughput.
Can anyone explain why the longer RTT is causing issues for upload on pfSense using CoDel? Why does DD-WRT not experience this issue?
-
Are you using the same computer when doing speed tests? I've seen large variations in speed tests between computers. Too bad PFSense doesn't have fq_codel yet, but there are some other awesome changes in the pipeline.
-
I am using Windows Server 2008 R2 when testing the work connections. I am using Mac OS X 10.10.2 and Windows 8.1 Pro at home. I'll see if Windows 8.1 Pro makes a difference on the office connection tomorrow, though I doubt it.
I am thinking that the queue length has something to do with it on pfSense.
-
If stream fairness is your goal then FAIRQ might be a better choice. There is a picture somewhere showing graphs of fq_codel, codel, SFQ (stochastic fair queue) and the latency changes when each algorithm is dealing with dozens of simultaneous streams. SFQ was a close 2nd behind fq_codel for best latency. FAIRQ is very similar to SFQ (both give each stream a hash then iterate through them round-robin style).
Codel is lacking "fair queueing" (there are many papers on this topic) so it does poorly with multiple streams, unlike fq_codel.
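To illustrate the "hash each stream, then round-robin" idea behind SFQ and FAIRQ, here is a minimal sketch. It is not the pfSense or Linux implementation; the class name, the bucket count, and the integer flow IDs are all made up for illustration.

```python
from collections import OrderedDict, deque

class FairQueue:
    """Toy fair queue: one FIFO per hash bucket, serviced round-robin."""

    def __init__(self, buckets=8):
        self.buckets = buckets
        self.flows = OrderedDict()  # bucket index -> deque of packets

    def enqueue(self, flow_id, packet):
        # Packets of the same flow always land in the same bucket.
        bucket = hash(flow_id) % self.buckets
        self.flows.setdefault(bucket, deque()).append(packet)

    def dequeue(self):
        if not self.flows:
            return None
        # Take one packet from the bucket at the front of the rotation.
        bucket, queue = next(iter(self.flows.items()))
        packet = queue.popleft()
        del self.flows[bucket]
        if queue:
            self.flows[bucket] = queue  # still backlogged: go to the back
        return packet

fq = FairQueue()
for i in range(5):
    fq.enqueue(1, f"bulk{i}")   # hypothetical flow 1: a bulk upload
fq.enqueue(2, "ping")           # hypothetical flow 2: a single small packet

order = [fq.dequeue() for _ in range(6)]
print(order)
```

The single packet from the second flow gets sent after only one bulk packet instead of waiting behind all five, which is the latency benefit a plain single-queue Codel does not provide.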
-
There are a few things at play.
- If your queue is too small, it will drop packets too aggressively. You can look at queue statistics to find out if there are any drops happening.
- If your queue is too large and you are not using something like Codel with time-based dropping, your bandwidth can also be used less efficiently
- My personal most common reason for poor upload speeds is the TCP stack of the OS I'm using
Windows 2008 R2 uses the Win7 kernel, and Win7 defaults to some latency-sensitive TCP congestion control. This may not be the same for the server edition, but when I switched to using CTCP, my upload bandwidth to higher latency targets increased substantially. All versions of Win8 default to CTCP.
And never assume two similar machines would get the same performance. I had two identical computers, exactly the same hardware, both with a fresh install of Win7, and one was over 50% faster than the other for speed tests. The only thing I could think of that would make the difference was heuristics. Win7 tries to be "smart" about certain things, which can cause it to get confused. A quick trip into the registry to change some settings and a reboot and both systems were getting identical speedtests.
So even freshly installed identical hardware can get large variations. ALWAYS test using the same machine. Or use an OS that doesn't suck. Freaking Windows.
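One way to see why RTT matters so much for the sender's TCP stack is the bandwidth-delay product: the amount of data that has to be in flight to keep the link full. This is only an illustration using the 50M circuit and the RTTs mentioned earlier in the thread, with decimal megabits assumed; it is not a diagnosis of that setup.

```python
def bdp_bytes(rate_bps, rtt_ms):
    """Bytes that must be in flight to fill a link at the given RTT."""
    return rate_bps * rtt_ms / 1000 / 8

RATE = 50_000_000  # assumption: 50 Mbit/s, decimal

for rtt_ms in (10, 80):
    print(f"RTT {rtt_ms} ms: {bdp_bytes(RATE, rtt_ms):,.0f} bytes in flight")
```

At 80ms the sender needs eight times as much data in flight as at 10ms, so after a drop it has a much larger window to rebuild, and how quickly it does so depends on the congestion control algorithm in use.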
-
You are right about the TCP stack differences. Windows 2008 R2 copes much worse than Windows 8.1 or Windows 2012 R2. Still, even on Windows 8.1, where I checked that CTCP is enabled, I am getting an average upload of 40-45Mbits where it should be 48-49Mbits. So I guess CoDel is just not worth it, at least the way it's implemented on pfSense. Maybe they'll fix it when they do FQ_CoDel.
-
PFSense only implements the original Codel, which has a large buffer length and a target latency of 5ms. This allows it to do well if lots of small or large packets come through at the same time. One of the big issues with buffer bloat is that if your buffer is too small you can drop small packets, but if your buffer is too large, then large packets cause too much backlog.
fq_Codel extends this to include "fair" queuing, which breaks up data flows into hash buckets and does a mixture of prioritizing packets arriving into empty buckets and dequeuing backlogged buckets equally. Codel is still pretty much the best option for now. Set and forget.
-
Some have said CoDel is not a traffic shaper. This is confusing because CoDel drops packets to keep the buffers in check. Dropped TCP packets result in a throttling effect.
Perhaps I am confusing a "traffic shaper" with a "traffic policer".
http://www.cisco.com/c/en/us/support/docs/quality-of-service-qos/qos-policing/19645-policevsshape.html
CoDel is one of the 2 though, right?
I am confused. :o
-
To oversimplify it quite a bit:
Shaping can delay sending traffic (as well as drop) to smooth out usage, whereas policing simply lops off anything over the max rate and chucks it in the bit bucket.
Shaping typically employs queues as well as the occasional drop, whereas policing just says "nope" and drops it hard if it crosses the high rate.
Policing is very harsh, if you have ever had to deal with a circuit that had traffic policing, you know that both ends MUST have the same policing set or it's a nightmare of dropped packets. I haven't personally seen a circuit with traffic policing in probably 10 yrs or so, thankfully.
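The difference can be sketched with a toy model. This assumes time is split into ticks and the rate is a fixed number of packets per tick, with no burst allowance; the function names and numbers are invented for illustration.

```python
def police(arrivals, rate_per_tick):
    """Policer: forward up to the rate each tick, drop the excess immediately."""
    sent, dropped = [], 0
    for count in arrivals:
        sent.append(min(count, rate_per_tick))
        dropped += max(0, count - rate_per_tick)
    return sent, dropped

def shape(arrivals, rate_per_tick, queue_limit):
    """Shaper: queue the excess and send it later; drop only if the queue is full."""
    backlog, sent, dropped = 0, [], 0
    for count in arrivals:
        accepted = min(count, queue_limit - backlog)
        dropped += count - accepted
        backlog += accepted
        out = min(backlog, rate_per_tick)
        backlog -= out
        sent.append(out)
    return sent, dropped

burst = [5, 0, 0, 3, 0]  # packets arriving in each tick
print("policed:", police(burst, rate_per_tick=2))
print("shaped: ", shape(burst, rate_per_tick=2, queue_limit=4))
```

For the same burst, the policer throws away everything over the rate in each tick, while the shaper spreads the burst over the following idle ticks and only drops what does not fit in its queue.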
-
Traffic shapers do not drop packets; they dequeue packets from queues at specified rates. It's the queues that drop packets, but it's the traffic shaper's job to decide which queue to service and when.
-
I think I understand what you are saying, but the post above you and the Cisco link both say that shapers drop packets. :o
-
Shaping can drop but only by way of it dropping out of a queue. It still had to be queued, possibly delayed, etc.
The only action of Policing is to drop, no queue.
-
Perhaps it is my confusion between incoming and outgoing egress. CoDel throttles (shapes?) incoming egress TCP streams based on queueing delay, but this queueing delay is controlled by outgoing egress speeds, which are controlled by the traffic-shaper.
I should probably just head back to the books… :-X
:D
Edit: I am referring to WAN interface.
-
Codel is just a regular queue. Just like the default queue, it drops packets when it gets full. The difference is that the default queue does tail drops and drops abruptly once full. Codel does head drops and defines "full" not as a number of packets but as how long a packet has been in the queue. Even then, it doesn't drop abruptly; it drops at an ever-increasing rate.
It is impossible to have a network interface without a queue, even if it's a queue of one. The whole point of a queue is to buffer packets. Codel does so in a way that reduces buffer bloat while allowing high throughput relative to the default fixed-size tail-drop that has been around for decades.
When writing multi-threaded code, you use queues a lot because synchronizing threads is expensive and you rarely have two threads that process data at the same rate. You need to buffer that data somewhere. Queues!
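To make the "time in queue, head drop, increasing drop rate" behaviour concrete, here is a heavily simplified sketch loosely modelled on the published CoDel pseudocode. It is not pfSense's implementation. The 5ms target is the value mentioned in this thread; the 100ms interval is the commonly cited default and is an assumption here, as are all the names. Times are plain millisecond numbers passed in by the caller.

```python
from collections import deque
from math import sqrt

TARGET_MS = 5      # acceptable time a packet may sit in the queue
INTERVAL_MS = 100  # assumption: how long delay must persist before dropping

class SimpleCoDel:
    def __init__(self):
        self.queue = deque()     # entries are (enqueue_time_ms, packet)
        self.first_above = None  # time at which delay counts as persistent
        self.dropping = False
        self.count = 0           # drops in the current dropping episode
        self.drop_next = 0.0
        self.drops = 0           # total drops, for inspection only

    def enqueue(self, now, packet):
        self.queue.append((now, packet))

    def _pop(self, now):
        """Pop the head packet; report whether it is eligible to be dropped."""
        if not self.queue:
            self.first_above = None
            return None, False
        enqueued, packet = self.queue.popleft()
        if now - enqueued < TARGET_MS:
            self.first_above = None      # delay is fine again
            return packet, False
        if self.first_above is None:
            self.first_above = now + INTERVAL_MS
            return packet, False
        return packet, now >= self.first_above

    def dequeue(self, now):
        packet, ok_to_drop = self._pop(now)
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False
            while self.dropping and now >= self.drop_next:
                self.drops += 1          # head drop
                self.count += 1
                packet, ok_to_drop = self._pop(now)
                if not ok_to_drop:
                    self.dropping = False
                else:
                    # Drops come closer together as count grows.
                    self.drop_next += INTERVAL_MS / sqrt(self.count)
        elif ok_to_drop:
            self.drops += 1              # first head drop of the episode
            packet, _ = self._pop(now)
            self.dropping = True
            self.count = 1
            self.drop_next = now + INTERVAL_MS / sqrt(self.count)
        return packet

q = SimpleCoDel()
for i in range(100):
    q.enqueue(0, i)                      # a burst of 100 packets at t=0
sent = [q.dequeue(10 * k) for k in range(1, 12)]  # drained one per 10 ms
print(sent, "drops:", q.drops)
```

In this run nothing is dropped for the first 100ms even though every packet waits longer than 5ms; only once the delay has persisted for a full interval is the packet at the head (not the tail) discarded.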
-
I think I get it.
Part of my confusion stemmed from when I tested CoDel: it caused my upload/download to drop to ~75% of my maximum bitrate, and the throughput was unsteady. I never experienced this problem with "regular" queues. This caused me to assume that CoDel was doing something extra (shaping) to keep my queueing delay low. Without CoDel, I achieved the bitrate assigned to the interface.
I now realize that CoDel should not have acted that way. I will need to revisit CoDel and see if I get the same results again.
My real-world internet speeds are 6.34Mb/666Kb.
-
Did you set an upload bandwidth limit? Set it at 95% of 666Kb and have another run at it.
-
I did. I usually set to less than 600Kbit. CoDel's official site states that <768Kbit connections are troublesome.
Though, the reason for my download falling from ~730kB/sec without CoDel to ~500kB/sec with CoDel is still unknown to me. I had this type of result numerous times.
I may have misconfigured something back when I tested. Hopefully that explains it… :)
I used the CODELQ setup, not the "Codel Active Queue" check-box.
-
Codel uses a target of 5ms, which at 768Kb/s is only 480 bytes. This means a single 1500 byte packet will cause Codel to want to start dropping packets. On my 100Mb connection, 5ms is 62,500 bytes, which is nearly 42 1500 byte packets. A single 1500 byte packet takes 5ms at 2.4Mb/s. It may be best to recommend Codel only for 3Mb/s+ connections. fq_Codel probably wouldn't fare much better for bandwidth utilization, but would do better at not dropping small packets that immediately follow a 1500 byte packet.
I think I remember reading that 10Mb+ is recommended for Codel, but I'm not sure if that was an official value or just an easy to remember number given as a rule of thumb.
Anyway, 1500 bytes is way too large for slow connections when latency is an issue.
edit: I think the 10Mb comment was in reference that 5ms is not optimal for connections below 10Mb, but not to say it won't work. I assume there is a lower bound where Codel is definitely not good, like the 657Kb someone else said they read or the 2.4Mb/s rate required to transmit 1500 bytes in 5ms.
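The arithmetic behind those figures can be reproduced with a short sketch. It assumes decimal units (1Kb = 1000 bits) and the 5ms target; the function names are made up.

```python
def bytes_in_target(rate_bps, target_ms=5):
    """How many bytes the link can transmit within the target time."""
    return rate_bps * target_ms / 1000 / 8

def min_rate_for_packet(packet_bytes=1500, target_ms=5):
    """Link rate (bit/s) at which one packet takes exactly the target time."""
    return packet_bytes * 8 * 1000 / target_ms

print(bytes_in_target(768_000))             # 768Kb/s link
print(bytes_in_target(100_000_000))         # 100Mb/s link
print(bytes_in_target(100_000_000) / 1500)  # in 1500 byte packets
print(min_rate_for_packet())                # rate where 1500 bytes = 5ms
```

Below the rate returned by min_rate_for_packet, a single full-size packet already takes longer than the target to transmit.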
-
I just tried the "Codel Active Queue" on my outgoing bulk HFSC queue and it worked like a charm. Dropped my average queue size from ~30 to ~1 and dropped my ping from ~600ms to ~50ms during a single stream upload.
Now I need to test out FAIRQ with Codel check-marked to see how that setup deals with multiple concurrent streams.
-
Hmm… I just ran into an unexpected negative side-effect of CoDel. I knew it had some drawbacks. ;)
I check-marked Codel on my WAN HFSC qBulk queue, which was configured to have an increased worst-case delay with link-share [0Kb, 25, 300Kb], so other packets would be prioritized. It also had a queue limit of 500, so that if things got bad, it could just queue up packets and let the delay climb. But CoDel won't allow that, because it keeps the packet queueing delay at 5ms… right?
Without much consideration, I excitedly chose to decrease the delay of my greediest queue in order to decrease the delay of all other queues. Dumb... I think I should have taken the more direct route of decreasing the delay of only the non-qBulk queues, leaving qBulk to become backlogged and delayed as it increasingly yields to packets with more priority.
Small queues are not always the answer, apparently. :)