Playing with fq_codel in 2.4
-
@zwck What we saw with ecn on in fq_codel at low upload speeds (< 40mbit) was an occasional lockout issue where ecn'd flows seemed to cause too many drops of non-ecn'd flows. If you have a reasonable amount of upload bandwidth feel free to try enabling ecn there also. Go hog wild about turning it on on your tcp stacks also...
but do so on an informed basis.
The potential problems with ecn are vast... as is the potential benefit. I worry about overuse and mis-applications of it so much that we started a new (sadly unfunded) project and mailing list for it:
https://www.bufferbloat.net/projects/ecn-sane/wiki/
My position statement is here: https://www.bufferbloat.net/projects/ecn-sane/wiki/dtaht_ecn_editorial/ - and the related rant at the bottom is one of my best and worth reading before you fiddle with ecn at all.
Our reasoning (in 2012) for only enabling it on inbound shaping was that more bw was available & the packet had already traversed the internet, so why drop it? At the point of congestion on outbound, though, I'd rather clear room immediately.
I note that I am presently in the minority as to grave concern over widespread ecn deployment - but willing to encourage others to try it on an informed basis.
-
@teh-g I can certainly see running out of cheap cpu at these speeds.
I have to note that usually it's the shaper (limiter) that accounts for 80% or more of the cpu cost of a fq_codel based qos system, and thus I don't get why folk are claiming here fairq + codelq eats less than fq_codel does unless that is effectively (?) multi-cored (?).
fq_codel adds a hash calculation and a ptr lookup and is almost immeasurably the same weight as codel alone (as, at least in linux, the hash often occurs elsewhere)
Elsewhere, I started work on a multi-core capable shaped fq_codel instance but ran out of money and time for now.
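For anyone wondering what "a hash calculation and a ptr lookup" amounts to per packet, here is a rough sketch in Python. It is not the kernel code: zlib.crc32 stands in for the real hash, and flow_bucket, enqueue, the packet dictionary layout, and the 1024-bucket table size are all names and values assumed for illustration.

```python
import zlib

def flow_bucket(src_ip, dst_ip, proto, src_port, dst_port,
                n_buckets=1024, salt=0):
    """Map a 5-tuple to one of n_buckets sub-queues (illustrative hash only)."""
    key = f"{salt}|{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_buckets

# Bucket index -> that flow's own queue. This is the single lookup per packet.
queues = {}

def enqueue(packet):
    """Hash the packet's 5-tuple, then append it to the matching sub-queue."""
    idx = flow_bucket(*packet["tuple"])
    queues.setdefault(idx, []).append(packet)
    return idx
```

Everything after the lookup (the codel logic on each sub-queue) is the same work plain codel would do, which is why the extra cost over codel alone is small.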
-
@gsmornot Is there a way to increase your token bucket size on the shaper at these speeds? BSD does not have high resolution timers.
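To put numbers on why the bucket size matters without high resolution timers: if the shaper can only release packets once per timer tick, the bucket has to hold at least one tick's worth of bytes or the configured rate can never be reached. A minimal sketch of that arithmetic follows; the 1000 Hz tick rate and 1500-byte packet size are assumptions for the example, not values read from any particular box.

```python
def min_bucket_bytes(rate_bits_per_s, timer_hz):
    """Bytes that must be releasable per timer tick to sustain the rate."""
    return rate_bits_per_s / 8 / timer_hz

def packets_per_tick(rate_bits_per_s, timer_hz, packet_bytes=1500):
    """The same quantity expressed in full-size packets."""
    return min_bucket_bytes(rate_bits_per_s, timer_hz) / packet_bytes

# Assumed example: a 1 Gbit/s shaper driven by a 1000 Hz timer.
print(min_bucket_bytes(1_000_000_000, 1000))   # 125000.0 bytes per tick
print(packets_per_tick(1_000_000_000, 1000))   # roughly 83 packets per tick
```

Under those assumptions the shaper needs to burst on the order of 80 full-size packets per tick, so a bucket sized for much lower rates would cap throughput well below the configured limit.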
-
@dtaht said in Playing with fq_codel in 2.4:
@teh-g I can certainly see running out of cheap cpu at these speeds.
I have to note that usually it's the shaper (limiter) that accounts for 80% or more of the cpu cost of a fq_codel based qos system, and thus I don't get why folk are claiming here fairq + codelq eats less than fq_codel does unless that is effectively (?) multi-cored (?).
fq_codel adds a hash calculation and a ptr lookup and is almost immeasurably the same weight as codel alone (as, at least in linux, the hash often occurs elsewhere)
Elsewhere, I started work on a multi-core capable shaped fq_codel instance but ran out of money and time for now.
Thanks for the info. My little Celeron J3455 just isn't built for fq_codel at gigabit down speeds. Luckily I don't run into bufferbloat issues too much, since download seems to be impacted most by it, and it isn't too frequent that I cap out the line with one gigabit.
A multi-core capable fq_codel would be great, sad to hear that money and time were lacking. I'm only pegging a core, so it is definitely a limitation there.
-
Always have funding problems. Used to it. What happens usually is someone with more time and anger tackles it for us. :) https://github.com/dtaht/fq_codel_fast is where I am with it presently. Single core it's about 5% faster so far. :(
I think this thread has established that 1.6 ghz arm and low end atoms bottleneck on a single core for inbound shaping on this OS at around 500Mbit? Higher end boxes are fine? I do think tuning the shaper might help some (see for example: https://github.com/tohojo/sqm-scripts/issues/71 and the related pull request - but this is for linux, not bsd), and the bulk of the bloat problem is usually outbound and at lower speeds, so the bulk of y'all can solve half the problem at least.
-
I would expect the traffic would respect the CPU affinity of the packet when it's received at the kernel level, over into the dummynet code. That may be why single cores are pinned -- there's only one TCP stream, thusly one core involved in the transport? Multiple flows may scale better. In this scenario, making FQ_CODEL multi-threaded would work for single flows well of course.
All this thinking gets me anxious to factor CAKE into dummynet, upstream in BSD @dtaht ... I'm no expert on C, but maybe there is a possibility if it's not too buried in Linux only features. Why stop with FQ_CODEL right? :)
-
As an update to my previous post concerning "Unable to configure flowset, flowset busy!", I was able to determine the following from the FreeBSD source with some level of confidence:
- The error is related to the re-configuring of the selected AQM.
- The error only occurs when you attempt to explicitly configure an AQM on a queue (aka flowset) that is currently passing traffic, hence the intermittent messages.
- The error only indicates your AQM setting (and its related tune-ables) failed to change as instructed, but everything else should have saved fine.
- As Reddit users have pointed out, the error may recur on its own for no reason due to the filter resync job. Besides the log spam (sorry), it isn't affecting anything since you're not changing settings at that time.
As mentioned in my last post, the patch's commands do explicitly reconfigure the AQM, via the use of the directives: codel, droptail, red, or gred. They are always present/generated by the patch, to ensure consistency. Unfortunately, dummynet has a limitation where it cannot "hot-swap" the AQM as I anticipated it could.
I have tried configuring the assigned sched/pipe # to -1 first prior to the currently bugged queue configuration command, but this produces errors of its own (hah!). My thinking there was to disconnect the parent pipe, so the queue has nothing to complain about. I have also tried explicitly deleting any existing queue prior, but if a pipe does not exist, these commands fail and it breaks the rest of the rules.limiter execution. We could check for the existence of a queue first, and delete it if one already exists. Keep in mind that these commands get run at a regular interval, so this could cause packet loss every 15 minutes if we're not careful...
This is a tricky one. But, I'll keep trying to get it error-less and working.
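The "check for the existence of a queue first" idea could look roughly like the sketch below. This is only the decision logic: plan_queue_reconfig and its arguments are hypothetical, and the command strings are illustrative placeholders that would need checking against ipfw(8) and the generated rules.limiter before anyone relies on them.

```python
def plan_queue_reconfig(queue_id, pipe_id, aqm, existing_queue_ids):
    """Return the command lines to (re)create one queue.

    The strings are illustrative placeholders, not verified ipfw syntax.
    """
    commands = []
    if queue_id in existing_queue_ids:
        # Only delete when the queue is known to exist, so a missing
        # queue cannot abort the rest of the rule file.
        commands.append(f"queue delete {queue_id}")
    commands.append(f"queue {queue_id} config pipe {pipe_id} {aqm}")
    return commands

# Hypothetical example: queue 1 already exists, queue 2 does not.
print(plan_queue_reconfig(1, 1, "codel", {1}))
print(plan_queue_reconfig(2, 1, "codel", {1}))
```

Note that a sketch like this still deletes and recreates an existing queue on every run, which is exactly the periodic packet-loss risk described above.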
-
@mattund said in Playing with fq_codel in 2.4:
[...] these commands get run at a regular interval, so this could cause packet loss every 15 minutes if we're not careful...
I'm trying to wrap my head around this thread (521 posts and growing, it's a real doozy...)
Just to clarify, you are talking theoretically about the filter reload causing packet loss every ~15m right? I took a stab at FQ_CODEL a couple of days ago and it caused me a lot of misery. I think mainly because I have a fairly complex multiwan config with lots of outbound NAT rules, VPNs, and policy-based routes. I think I have a fundamental misunderstanding of how the floating rules interact with everything else.
But, long story short - if we are just using the standard GUI to configure limiters, without any other patches, there is no packet loss issue, right?
-
I don't think there is any packet loss issue presently. However, some of the potential solutions I'm imagining for the "Unable to configure flowset, flowset busy!" problem could cause, say, 1 packet to be dropped if we disconnect or delete the receiving pipe on a limiter to avoid the error being printed in console.
As long as the error messages we're seeing are palatable, all one needs to do at this time is stop any traffic flowing through a queue (disable its firewall rule), save the new AQM configuration, then plug it all back together. Besides that, I'm not aware of other problems as far as packet delivery or reliability are concerned.
When it comes to floating rules, my rule of thumb (no pun intended...) has been to treat the FQ_CODEL/Limiter stuff as a separate rule, all on its own, and any other floating rules can remain the same. So long as it's matched, it'll pick up the configuration and carry it through the rest of the further rule processing, so as to make it a "drop-in" setup on an existing configuration without messing with too much behavior-wise.
-
Thanks @mattund. I plan to try again this weekend. Is there any workaround for traceroute becoming brain dead when FQ_CODEL is on? Unfortunately I really need working traceroute for $DAYJOB.
-
Not a UDP-based traceroute unfortunately, unless you want to avoid shaping UDP which, for me at least, is a deal-breaker (I'd rather deal with a potato'd traceroute). I sacrificed ICMP shaping and avoided shaping that protocol to fix Windows traceroutes. I actually don't know if anyone's identified why this happens yet, that might be the first step to getting a workaround.
Or optionally, we could all settle and choose to live in an imaginary world where hosts are all one hop away, but refuse to accept our packets until an arbitrary TTL is reached. Sounds funny, but doesn't seem so fun...
-
@mattund said in Playing with fq_codel in 2.4:
Not a UDP-based traceroute unfortunately
I tried the other protocols available for traceroute, which on my Mac are: UDP, TCP, GRE, or ICMP. Sadly, none of them besides UDP gave any meaningful results. And this is without any shaping enabled. Just vanilla.
TCP just flat out doesn't work (timeouts across the board).
GRE worked a little, but always got blocked inevitably at one of the hops along the way. So I could never get a trace to complete using GRE.
ICMP always just immediately completed and told me that whatever I was tracing to was 1 hop away.
-
@mattund Sorry to ask again, Matt: is this post from earlier in this thread still accurate? https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/339 If not, can you recommend some settings, in screenshot form? That would be lovely.
-
To work around the traceroute issue, I just applied the in/out pipe on the LAN interface instead.
This won't work for larger setups with multiple LAN interfaces though.
-
I am still using that configuration presently
-
@muppet I don't think that would work if you have squid set up either right? Because the source for the proxied traffic would not be LAN... ?
-
@muppet I have my system with multiple vlans going through wan or openvpn connections being limited by the same limiters, and even prioritizing works.
-
@mattund Thanks (also out direction, wan interface in/out = up and down and not reversed like the description states it?)
-
@zwck said in Playing with fq_codel in 2.4:
@mattund Thanks (also out direction, wan interface in/out = up and down and not reversed like the description states it?)
The way to make sure it is working the right way is to set up arbitrarily low limits with different rates, say 5 and 10 Mbit/s. That way, you can figure out immediately if the order is swapped by doing a speedtest. Once you are sure for one rule, you can replicate the order for all other rules.
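If you want to take the guesswork out of reading the speedtest, the comparison can be written down as a small sketch. The function name limiter_order and the sample numbers are made up for illustration; the 5 and 10 Mbit/s defaults are the example limits from the post.

```python
def limiter_order(measured_down_mbit, measured_up_mbit,
                  down_limit_mbit=5.0, up_limit_mbit=10.0):
    """Classify a speedtest result against two deliberately different limits."""
    # Distance of the measurement from the limits as configured...
    as_configured = (abs(measured_down_mbit - down_limit_mbit)
                     + abs(measured_up_mbit - up_limit_mbit))
    # ...and from the limits with the two directions exchanged.
    swapped = (abs(measured_down_mbit - up_limit_mbit)
               + abs(measured_up_mbit - down_limit_mbit))
    if as_configured == swapped:
        return "inconclusive"
    return "as configured" if as_configured < swapped else "swapped"

# Hypothetical speedtest readings in Mbit/s.
print(limiter_order(4.7, 9.5))   # as configured
print(limiter_order(9.4, 4.8))   # swapped
```

This only works because the two limits differ; with equal limits the result is always inconclusive, which is the reason for picking two distinct rates in the first place.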
-
My apologies for the delay in responding. Thankfully it looks like most of the questions were answered by others in the meantime. :) Just a few more thoughts from me, FWIW:
-
Regarding ECN: I have it enabled in both up and down directions, and have had it on ever since I started using fq_codel a few pfSense releases ago. I have not experienced any adverse performance related to its use so it has not been a parameter I have spent a lot of time with. I saw that @dtaht already shared his thoughts on this topic, and I have also found this link on fq_codel tuning on the Bufferbloat.net site helpful: https://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel/
-
Regarding Masks: I used to have these enabled before, but after watching the Netgate YouTube video on fq_codel and reading the description in the GUI, I decided to leave them disabled when I reenabled fq_codel in 2.4.4. There has been no difference in behavior or performance that I have noticed. In my case, I'm interested in limiting bandwidth by interface rather than by host. Plus, unless I'm misunderstanding, the FQ (or fair queuing) part of fq_codel should help ensure that all hosts get fair access (unless of course one host opens a ton of connections relative to the others, but this is unlikely in my current use case). That being said, one way to test this out might be to set the limiters arbitrarily low (i.e. lower than the limits of your internet connection). Then run two sets of speed tests with two hosts on the same network segment, one test with the masks enabled and one test without. In the former case I would expect 2x the limit as throughput (since it is applied per source/destination IP), in the latter case it should be just 1x since it is the limit for the entire interface. Now, I have not actually run this test, so I could be wrong, but it seems worth trying to better understand how masks work.
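The expectation for that mask test can be written out as a tiny sketch. It encodes the untested hypothesis from the post and nothing more; expected_aggregate_mbit and the optional link_mbit cap are names assumed for the example.

```python
def expected_aggregate_mbit(limit_mbit, active_hosts, masks_enabled,
                            link_mbit=None):
    """Aggregate throughput predicted by the hypothesis described above."""
    # With masks, each host is assumed to get its own limiter instance;
    # without masks, all hosts share a single limit.
    total = limit_mbit * active_hosts if masks_enabled else limit_mbit
    # The real link can never deliver more than its own capacity.
    return total if link_mbit is None else min(total, link_mbit)

# Assumed example: a 10 Mbit/s limiter and two hosts testing at once.
print(expected_aggregate_mbit(10, 2, masks_enabled=True))    # 20
print(expected_aggregate_mbit(10, 2, masks_enabled=False))   # 10
```

If the measured totals with and without masks come out the same, the hypothesis about how masks apply is wrong for that configuration.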
-
Performance: I have a symmetric 1Gbit fiber interconnection and with fq_codel enabled using 950Mbit limiters, I'm able to pass around 920-930Mbit in each direction while keeping latencies and bufferbloat in check. This is on a 2.2GHz quad core Xeon D based system. For those of you experiencing performance issues, it might be worth trying to emulate fq_codel by using ALTQ traffic shaping. All one has to do is enable the FAIRQ traffic shaper and then enable Codel on the default queue. I used this setup for a little while before switching to fq_codel and performance was similar for me (in fact there might be a few notes comparing the two further up in the thread).
A Couple Other Thoughts:
-
One parameter I have had to increase (much to my surprise) was the Limit parameter. I've decided to double this from the default 10240 to 20480 because I was seeing enqueuing errors in the system log. There are two main reasons why I believe this was necessary: a) I have 10Gbit LAN hosts sending data into a 1Gbit WAN link (so this becomes the choke point) and b) I also enabled the TCP BBR congestion control on my 10Gbit Linux hosts. I really like BBR because it does seem to improve upload performance, especially over long distances, but it is a pretty aggressive algorithm so I could see how it might possibly overwhelm the default queue size.
-
From what I can tell, MacOS X now actually has fq_codel enabled by default as part of the operating system. I ran a speedtest on an iMac running the latest Mac OS X release (10.13.6) with fq_codel disabled on pfsense, and sure enough, bufferbloat and latencies stayed under control. Here is some additional discussion on this topic on the Bufferbloat mailing list: https://lists.bufferbloat.net/pipermail/bloat/2018-September/008615.html
-