Playing with fq_codel in 2.4

mattund

I don't think there is any packet loss issue presently. However, some of the potential solutions I'm imagining for the "Unable to configure flowset, flowset busy!" problem could cause, say, 1 packet to be dropped if we disconnect or delete the receiving pipe on a limiter to avoid the error being printed in console.

As long as the error messages we're seeing are palatable, all one needs to do at this time is stop any traffic flowing through a queue (disable its firewall rule), save the new AQM configuration, then plug it all back together. Besides that, I'm not aware of other problems as far as packet delivery or reliability are concerned.

When it comes to floating rules, my rule of thumb (no pun intended...) has been to treat the FQ_CODEL/Limiter stuff as a separate rule, all on its own, and any other floating rules can remain the same. So long as it's matched, it'll pick up the configuration and carry it through the rest of the rule processing, which makes it a "drop-in" setup on an existing configuration without changing much behavior-wise.

luckman212

Thanks @mattund. I plan to try again this weekend. Is there any workaround for traceroute becoming brain dead when FQ_CODEL is on? Unfortunately I really need working traceroute for $DAYJOB.

mattund

@luckman212

Not a UDP-based traceroute unfortunately, unless you want to avoid shaping UDP which, for me at least, is a deal-breaker (I'd rather deal with a potato'd traceroute). I sacrificed ICMP shaping and avoided shaping that protocol to fix Windows traceroutes. I actually don't know if anyone's identified why this happens yet, that might be the first step to getting a workaround.

Or optionally, we could all settle and choose to live in an imaginary world where hosts are all one hop away, but refuse to accept our packets until an arbitrary TTL is reached. Sounds funny, but doesn't seem so fun...

luckman212

@mattund said in Playing with fq_codel in 2.4:

Not a UDP-based traceroute unfortunately

I tried the other protocols available for traceroute, which on my Mac are: UDP TCP GRE or ICMP. Sadly, none of them besides UDP gave any meaningful results. And this is without any shaping enabled. Just vanilla.

TCP just flat out doesn't work (timeouts across the board).

GRE worked a little, but always got blocked inevitably at one of the hops along the way. So I could never get a trace to complete using GRE.

ICMP always just immediately completed and told me that whatever I was tracing to was 1 hop away.

zwck

@mattund Sorry to ask again, Matt, but is this post from earlier in this thread still accurate? https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/339 If not, can you recommend some settings, ideally in screenshot form? That would be lovely.

A Former User

To work around the traceroute issue, I just applied the in/out pipe on the LAN interface instead.

This won't work for larger setups with multiple LAN interfaces though.

mattund

@zwck

I am still using that configuration presently

luckman212

@muppet I don't think that would work if you have squid set up either right? Because the source for the proxied traffic would not be LAN... ?

bafonso

@muppet I have my system with multiple vlans going through wan or openvpn connections being limited by the same limiters, and even prioritizing works.

zwck

@mattund Thanks (also out direction, wan interface in/out = up and down and not reversed like the description states it?)

bafonso

@zwck said in Playing with fq_codel in 2.4:

@mattund Thanks (also out direction, wan interface in/out = up and down and not reversed like the description states it?)

The way to make sure it is working the right way is to set up arbitrarily low limits with different rates, say 5 and 10 Mbit/s. That way, you can figure out immediately if the order is swapped by doing a speedtest. Once you are sure for one rule, you can replicate the order for all other rules.
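
A rough sketch of that check in Python, in case it helps to see it spelled out. The function name, the 25% tolerance, and the sample readings are made up for illustration. The 5 and 10 Mbit/s defaults are just the example rates mentioned above.

```python
def limiter_direction(measured_down_mbps, measured_up_mbps,
                      down_limit_mbps=5.0, up_limit_mbps=10.0,
                      tolerance=0.25):
    """Compare a speedtest result with two deliberately different limiter rates.

    Returns "as configured", "swapped", or "inconclusive".
    The tolerance is an arbitrary allowance for speedtest noise (assumption).
    """
    def near(value, target):
        # True when value is within +/- tolerance (as a fraction) of target.
        return abs(value - target) <= tolerance * target

    if near(measured_down_mbps, down_limit_mbps) and near(measured_up_mbps, up_limit_mbps):
        return "as configured"
    if near(measured_down_mbps, up_limit_mbps) and near(measured_up_mbps, down_limit_mbps):
        return "swapped"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical speedtest readings, not real measurements.
    print(limiter_direction(4.7, 9.5))
    print(limiter_direction(9.4, 4.8))
```

The two test rates need to be far enough apart that the tolerance bands do not overlap, which is why 5 and 10 work well and 9 and 10 would not.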

tman222

My apologies for the delay in responding. Thankfully it looks like most of the questions were answered by others in the meantime. :) Just a few more thoughts from me, FWIW:

Regarding ECN: I have it enabled in both up and down directions, and have had it on ever since I started using fq_codel a few pfSense releases ago. I have not experienced any adverse performance related to its use so it has not been a parameter I have spent a lot of time with. I saw that @dtaht already shared his thoughts on this topic, and I have also found this link on fq_codel tuning on the Bufferbloat.net site helpful: https://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel/
Regarding Masks: I used to have these enabled before, but after watching the Netgate YouTube video on fq_codel and reading the description in the GUI, I decided to leave them disabled when I reenabled fq_codel in 2.4.4. There has been no difference in behavior or performance that I have noticed. In my case, I'm interested in limiting bandwidth by interface rather than by host. Plus, unless I'm misunderstanding, the FQ (or fair queuing) part of fq_codel should help ensure that all hosts get fair access (unless of course one host opens a ton of connections relative to the others, but this is unlikely in my current use case). That being said, one way to test this out might be to set the limiters arbitrarily low (i.e. lower than the limits of your internet connection). Then run two sets of speed tests with two hosts on the same network segment, one test with the masks enabled and one test without. With masks enabled I would expect 2x the limit as throughput (since it is applied per source/destination IP); without masks it should be just 1x since it is the limit for the entire interface. Now, I have not actually run this test, so I could be wrong, but it seems worth trying to better understand how masks work.
Performance: I have a symmetric 1Gbit fiber interconnection and with fq_codel enabled using 950Mbit limiters, I'm able to pass around 920-930Mbit in each direction while keeping latencies and bufferbloat in check. This is on a 2.2GHz quad core Xeon D based system. For those of you experiencing performance issues, it might be worth trying to emulate fq_codel by using ALTQ traffic shaping. All one has to do is enable the FAIRQ traffic shaper and then enable Codel on the default queue. I used this setup for a little while before switching to fq_codel and performance was similar for me (in fact there might be a few notes comparing the two further up in the thread).
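
To make the expectation for the mask test above concrete, here is a small Python sketch. It only encodes the untested hypothesis from the Masks paragraph (per-host limit with masks, one shared limit without). The function name and the 10 Mbit/s figure are made up for illustration, and real results would also be capped by the actual link speed.

```python
def expected_aggregate_mbps(limit_mbps, active_hosts, per_host_mask):
    """Expected total throughput across all hosts under the hypothesis above.

    per_host_mask=True  -> each host gets its own limiter instance.
    per_host_mask=False -> all hosts share a single limiter.
    """
    if active_hosts < 0:
        raise ValueError("active_hosts must not be negative")
    if active_hosts == 0:
        return 0.0
    if per_host_mask:
        return float(limit_mbps * active_hosts)
    return float(limit_mbps)


if __name__ == "__main__":
    # Two hosts, limiter set to a hypothetical 10 Mbit/s.
    print(expected_aggregate_mbps(10, 2, per_host_mask=True))   # with masks
    print(expected_aggregate_mbps(10, 2, per_host_mask=False))  # without masks
```

If the two measured totals come out the same, the hypothesis about masks is wrong or the masks are not taking effect.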

A Couple Other Thoughts:

One parameter I have had to increase (much to my surprise) was the Limit parameter. I've decided to double this from the default 10240 to 20480 because I was seeing enqueuing errors in the system log. There are two main reasons why I believe this was necessary: a) I have 10Gbit LAN hosts sending data into a 1Gbit WAN link (so this becomes the choke point) and b) I also enabled the TCP BBR congestion control on my 10Gbit Linux hosts. I really like BBR because it does seem to improve upload performance, especially over long distances, but it is a pretty aggressive algorithm so I could see how it might possibly overwhelm the default queue size.
From what I can tell, MacOS X now actually has fq_codel enabled by default as part of the operating system. I ran a speedtest on an iMac running the latest Mac OS X release (10.13.6) with fq_codel disabled on pfsense, and sure enough, bufferbloat and latencies stayed under control. Here is some additional discussion on this topic on the Bufferbloat mailing list: https://lists.bufferbloat.net/pipermail/bloat/2018-September/008615.html

dtaht

couple notes:

In linux we've evolved away from a per packet limit in both fq_codel and sch_cake in favor of memory limits. Bytes is a proxy for time, packets (in linux, with gso/gro) have a dynamic range of 1000x1 - 64 bytes to 64k bytes. I don't know to what extent freebsd does gro/gso. The 10k packet limit however should be enough at a gige, but perhaps not at 10gige, if you are also not using gso/gro.
While it is wonderful to see fq_codel as a default on osx wifi, it only applies when that box is the bottleneck. You still need something to manage the overall up/down characteristics of the link at the isp bottleneck for all the other users, and also many access points have begun to adopt fq_codel as well. So for example, a 1gbit isp link will end up bottlenecking on your 300Mbit wifi link, but your ethernet users will bottleneck on the isp link. I thought strongly when we started the project that wifi was going to become the core bottleneck for most people and we needed to fix that first - but it was complicated and we fixed ethernet and isp links first.

Still, wifi aps using qca's ath10k and ath9k chipsets are performing beautifully now - not just in the openwrt/dd-wrt world but eero and google wifi.

http://blog.cerowrt.org/post/real_results/

I'm pretty sure that this christmas's wifi models from tplink, netgear, etc have picked up at least some of the code on boxes using qca's chipsets. (not a peep from broadcom, sigh) - but I universally reflash 'em anyway so wouldn't know.

I mostly like bbr. the fq part of fq_codel helps it find the right rate, and stay there. Most of my objection to bbr is it can take a long time to find that rate and there's presently no defined ecn response for it. And yea, it can send a lot of packets while not probing for that rate quickly enough.
The biggest cpu hog we have is in the shaper. In linux - if you are running at line rate (e.g. 1gbit ethernet) you don't need a shaper on the outbound half at least. I think this is presently the biggest flaw in this version here ? - I am curious what happens on bsd if you try to run it without a shaper? Linux's device driver tech includes BQL ( https://lwn.net/Articles/469652/ ).
what's the difference in latency at a gbit unshaped vs shaped with your fiber connection? I've seen 60ms or more in both directions unshaped. You can't trust dslreports at this speed, you have to use a tool like flent targeted at a 10gigE box in the cloud.
I would love pfsense gbit shaping numbers on the apu2. It's not that I can't justify using fat neon boxes to save everybody on the link minutes of delay and frustration, jitter, bad videoconferences, and the like, it's that I hate fans!! and I did recently get the linux version of sqm for that to shape at 900mbit.
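
On the packet-limit note at the top of this post: a quick Python sketch of why a packet count is a loose proxy for bytes and time. It takes the 10240-packet default mentioned earlier in the thread and the 64 byte to 64k byte range from above. The function name and the 1500-byte "typical" size are assumptions for illustration, and the drain time assumes the queue is completely full and served at exactly link rate.

```python
def queue_bounds(packet_limit, link_bps, packet_sizes):
    """For each packet size, return (bytes held, seconds to drain) for a full queue."""
    bounds = {}
    for label, size_bytes in packet_sizes.items():
        total_bytes = packet_limit * size_bytes
        drain_seconds = total_bytes * 8 / link_bps
        bounds[label] = (total_bytes, drain_seconds)
    return bounds


if __name__ == "__main__":
    sizes = {
        "64-byte packets": 64,
        "1500-byte packets (assumed typical MTU)": 1500,
        "64k superpackets (gso/gro)": 65536,
    }
    for label, (total, secs) in queue_bounds(10240, 1_000_000_000, sizes).items():
        print(f"{label}: {total / 1e6:.1f} MB queued, {secs * 1000:.1f} ms to drain at 1 Gbit/s")
```

The same packet limit spans roughly three orders of magnitude in bytes and drain time depending on packet size, which is the motivation for limiting by memory instead.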

gsmornot

Interesting. MacOS 10.14.1 Beta. Removed all shaping from my gig connection and got an A on DSLReports.
WiFi is A as well via iPhone iOS 12.1 Beta and reaches over 400Mb in each direction. Not bad.

dtaht

What's your AP?

What sort of stats is OSX reporting?

netstat -I en0 -qq # where en0 is your wifi interface

gsmornot

@dtaht said in Playing with fq_codel in 2.4:

What's your AP?

What sort of stats is OSX reporting?

netstat -I en0 -qq # where en0 is your wifi interface

AP is AirPort Extreme AC. I will have to check the other later.
I have 4 AirPorts around the house for coverage and all on channels based on least used per location to reduce interference.

bafonso

You may not see bad results of bufferbloat if the ISP has taken the proper measures to curb it.

Harvy66

@tman222 BBR2 is in the works. Better utilization, less aggressive. 1.0 was a great first attempt.

tman222

Hi @Harvy66 - that's great to hear. I'm pretty sure I'm using a fairly dated version of the algorithm as it has been enabled on machines which are running off the stable Debian branch (so older 4.9.x kernel).

Regarding fq_codel on Apple devices, I can confirm that it appears to be consistent across both iOS and Mac OS X devices using the latest version of the respective operating systems:

iPad (2x2 AC, iOS 12):
https://www.dslreports.com/speedtest/39645428

Laptop (2x2 AC, Mac OS X 10.13.6):
https://www.dslreports.com/speedtest/39645618

The WiFi AP used in this case was a Ubiquiti AC-HD. I'm pretty impressed that I'm able to transmit 400-500Mbit over wireless with minimal bloat; pretty cool! I wish I had saved some of the old tests I ran, but I do recall struggling to get any higher than a "B" on the bufferbloat score.

occamsrazor

@tman222 said in Playing with fq_codel in 2.4:

Regarding fq_codel on Apple devices, I can confirm that it appears to be consistent across both iOS and Mac OS X devices using the latest version of the respective operating systems:

As I use almost exclusively Apple devices, at least for those that utilize significant traffic, does this affect how one might set up the fq_codel traffic limiter?

And I've become a bit lost following all the recent discussion especially with @tman222 and @dtaht..... Is the basic setup outlined in the pfSense hangout here still the recommended basic setup? https://www.youtube.com/watch?v=o8nL81DzTlU&t=380