Playing with fq_codel in 2.4

bafonso

@zwck said in Playing with fq_codel in 2.4:

@mattund Thanks (also out direction, wan interface in/out = up and down and not reversed like the description states it?)

The way to make sure it is working the right way is to set up arbitrarily low limits with different rates, say 5 and 10 Mbit/s. That way, a speedtest tells you immediately whether the order is swapped. Once you are sure for one rule, you can replicate the order for all other rules.
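A minimal sketch of that check in Python, assuming hypothetical test limits of 10 Mbit/s down and 5 Mbit/s up. The function name and thresholds are made up for illustration; the speedtest figures are typed in by hand.

```python
def limiter_direction(measured_down, measured_up, down_limit=10.0, up_limit=5.0):
    """Compare a speedtest result (Mbit/s) against two deliberately different limits."""
    # How far the result is from the intended assignment of limits...
    intended_error = abs(measured_down - down_limit) + abs(measured_up - up_limit)
    # ...and from the assignment with the two limits swapped.
    swapped_error = abs(measured_down - up_limit) + abs(measured_up - down_limit)
    if intended_error == swapped_error:
        return "inconclusive"
    return "as intended" if intended_error < swapped_error else "swapped"


print(limiter_direction(9.4, 4.7))  # result close to 10 down / 5 up
print(limiter_direction(4.8, 9.5))  # result close to 5 down / 10 up
```

The further apart the two test limits are, the less a noisy speedtest can blur the answer.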

tman222

My apologies for the delay in responding. Thankfully it looks like most of the questions were answered by others in the meantime. :) Just a few more thoughts from me, FWIW:

Regarding ECN: I have it enabled in both up and down directions, and have had it on ever since I started using fq_codel a few pfSense releases ago. I have not experienced any adverse performance related to its use so it has not been a parameter I have spent a lot of time with. I saw that @dtaht already shared his thoughts on this topic, and I have also found this link on fq_codel tuning on the Bufferbloat.net site helpful: https://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel/
Regarding Masks: I used to have these enabled, but after watching the Netgate YouTube video on fq_codel and reading the description in the GUI, I decided to leave them disabled when I re-enabled fq_codel in 2.4.4. I have not noticed any difference in behavior or performance. In my case, I'm interested in limiting bandwidth by interface rather than by host. Plus, unless I'm misunderstanding, the FQ (fair queuing) part of fq_codel should help ensure that all hosts get fair access (unless of course one host opens a ton of connections relative to the others, but this is unlikely in my current use case). That being said, one way to test this might be to set the limiters arbitrarily low (i.e. lower than the limits of your internet connection), then run two sets of speed tests with two hosts on the same network segment, one with the masks enabled and one without. In the former case I would expect 2x the limit as throughput (since the limit is applied per source/destination IP); in the latter case it should be just 1x, since it is the limit for the entire interface. I have not actually run this test, so I could be wrong, but it seems worth trying to better understand how masks work.
Performance: I have a symmetric 1Gbit fiber connection and, with fq_codel enabled using 950Mbit limiters, I'm able to pass around 920-930Mbit in each direction while keeping latencies and bufferbloat in check. This is on a 2.2GHz quad-core Xeon D based system. For those of you experiencing performance issues, it might be worth trying to emulate fq_codel with ALTQ traffic shaping. All one has to do is enable the FAIRQ traffic shaper and then enable Codel on the default queue. I used this setup for a little while before switching to fq_codel and performance was similar for me (in fact there might be a few notes comparing the two further up in the thread).
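The expected outcome of the mask test described under "Regarding Masks" can be written down as a small Python sketch. It only encodes the untested hypothesis from that paragraph, assuming the mask is set on the limiter itself so that each host gets its own copy of the limit. The function and parameter names are hypothetical.

```python
def expected_aggregate_mbit(limit_mbit, active_hosts, mask_on_limiter):
    """Combined throughput expected when every active host saturates its share."""
    if active_hosts <= 0:
        return 0
    if mask_on_limiter:
        # Assumption: one dynamic limiter per host, each with the full limit.
        return limit_mbit * active_hosts
    # No mask: a single limiter shared by all hosts on the interface.
    return limit_mbit


# Two hosts testing at the same time against a 10 Mbit/s limiter.
print(expected_aggregate_mbit(10, 2, mask_on_limiter=True))
print(expected_aggregate_mbit(10, 2, mask_on_limiter=False))
```

If the measured totals do not match these two numbers, the hypothesis about how the masks apply is wrong for that configuration.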

A Couple Other Thoughts:

One parameter I have had to increase (much to my surprise) was the Limit parameter. I've decided to double this from the default 10240 to 20480 because I was seeing enqueuing errors in the system log. There are two main reasons why I believe this was necessary: a) I have 10Gbit LAN hosts sending data into a 1Gbit WAN link (so this becomes the choke point) and b) I also enabled the TCP BBR congestion control on my 10Gbit Linux hosts. I really like BBR because it does seem to improve upload performance, especially over long distances, but it is a pretty aggressive algorithm so I could see how it might possibly overwhelm the default queue size.
From what I can tell, Mac OS X now actually has fq_codel enabled by default as part of the operating system. I ran a speedtest on an iMac running the latest Mac OS X release (10.13.6) with fq_codel disabled on pfSense, and sure enough, bufferbloat and latencies stayed under control. Here is some additional discussion on this topic on the Bufferbloat mailing list: https://lists.bufferbloat.net/pipermail/bloat/2018-September/008615.html
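A rough back-of-the-envelope sketch in Python of why a 10Gbit sender can overrun the Limit on a 1Gbit link. It assumes full-size 1500-byte packets and a sender that never backs off, which is a worst case; real TCP (including BBR) and CoDel's own drops will behave differently.

```python
def fill_time_ms(limit_packets, packet_bytes, in_mbit, out_mbit):
    """Milliseconds until a queue of limit_packets overflows, ignoring sender back-off."""
    if in_mbit <= out_mbit:
        return float("inf")  # the queue drains at least as fast as it fills
    backlog_bits = limit_packets * packet_bytes * 8
    net_fill_bits_per_s = (in_mbit - out_mbit) * 1e6
    return backlog_bits / net_fill_bits_per_s * 1000


# Default and doubled Limit, 10Gbit LAN host feeding a 1Gbit WAN link.
for limit in (10240, 20480):
    print(limit, round(fill_time_ms(limit, 1500, 10_000, 1_000), 2), "ms")
```

Under these assumptions the default queue fills in a little under 14 ms of unrestrained line-rate sending, and doubling the Limit doubles that window.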

dtaht

couple notes:

In Linux we've evolved away from a per-packet limit in both fq_codel and sch_cake in favor of memory limits. Bytes are a proxy for time; packets (in Linux, with GSO/GRO) have a dynamic range of roughly 1000:1, from 64 bytes to 64k bytes. I don't know to what extent FreeBSD does GRO/GSO. The 10k packet limit should however be enough at GigE, but perhaps not at 10GigE, if you are also not using GSO/GRO.
While it is wonderful to see fq_codel as a default on osx wifi, it only applies when that box is the bottleneck. You still need something to manage the overall up/down characteristics of the link at the isp bottleneck for all the other users, and also many access points have begun to adopt fq_codel as well. So for example, a 1gbit isp link will end up bottlenecking on your 300Mbit wifi link, but your ethernet users will bottleneck on the isp link. I thought strongly when we started the project that wifi was going to become the core bottleneck for most people and we needed to fix that first - but it was complicated and we fixed ethernet and isp links first.

Still, wifi aps using qca's ath10k and ath9k chipsets are performing beautifully now - not just in the openwrt/dd-wrt world but eero and google wifi.

http://blog.cerowrt.org/post/real_results/

I'm pretty sure that this christmas's wifi models from tplink, netgear, etc have picked up at least some of the code on boxes using qca's chipsets. (not a peep from broadcom, sigh) - but I universally reflash 'em anyway so wouldn't know.

I mostly like bbr. the fq part of fq_codel helps it find the right rate, and stay there. Most of my objection to bbr is it can take a long time to find that rate and there's presently no defined ecn response for it. And yea, it can send a lot of packets while not probing for that rate quickly enough.
The biggest cpu hog we have is in the shaper. In linux - if you are running at line rate (e.g. 1gbit ethernet) you don't need a shaper on the outbound half at least. I think this is presently the biggest flaw in this version here - I am curious what happens on bsd if you try to run it without a shaper? Linux's device driver tech includes BQL ( https://lwn.net/Articles/469652/ ).
what's the difference in latency at a gbit unshaped vs shaped with your fiber connection? I've seen 60ms or more in both directions unshaped. You can't trust dslreports at this speed, you have to use a tool like flent targeted at a 10gigE box in the cloud.
I would love pfsense gbit shaping numbers on the apu2. It's not that I can't justify using fat neon boxes to save everybody on the link minutes of delay and frustration, jitter, bad videoconferences, and the like, it's that I hate fans!! and I did recently get the linux version of sqm for that to shape at 900mbit.
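To put numbers on the packet-versus-byte point at the top of this post, here is a small Python sketch. It compares the worst-case queueing delay of a 10240-packet limit for 64-byte and 64 KiB packets at 1 Gbit/s, next to a byte limit. The 4 MB byte limit is a made-up value for illustration, not a default taken from any implementation.

```python
def backlog_ms(total_bytes, link_bit_per_s):
    """Time to drain total_bytes at the given link rate, in milliseconds."""
    return total_bytes * 8 / link_bit_per_s * 1000


def packet_limit_delay_range_ms(packet_limit, link_bit_per_s, small=64, large=65536):
    """Delay of a full queue if it holds only small packets vs only large ones."""
    return (backlog_ms(packet_limit * small, link_bit_per_s),
            backlog_ms(packet_limit * large, link_bit_per_s))


GIGE = 1e9
small_ms, large_ms = packet_limit_delay_range_ms(10240, GIGE)
print(f"packet limit: {small_ms:.2f} ms to {large_ms:.0f} ms")
# A byte limit bounds the delay regardless of packet size (hypothetical 4 MB cap).
print(f"byte limit:   {backlog_ms(4_000_000, GIGE):.0f} ms")
```

With a packet count as the limit, the worst-case delay depends entirely on how big the queued packets happen to be; with a byte limit it does not.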

gsmornot

Interesting. MacOS 10.14.1 Beta. Removed all shaping from my gig connection and got an A on DSLReports.
WiFi is A as well via iPhone iOS 12.1 Beta and reaches over 400Mb in each direction. Not bad.

dtaht

What's your AP?

What sort of stats is OSX reporting?

netstat -I en0 -qq # where en0 is your wifi interface

gsmornot

@dtaht said in Playing with fq_codel in 2.4:

What's your AP?

What sort of stats is OSX reporting?

netstat -I en0 -qq # where en0 is your wifi interface

AP is AirPort Extreme AC. I will have to check the other later.
I have 4 AirPorts around the house for coverage and all on channels based on least used per location to reduce interference.

bafonso

You may not see bad results of bufferbloat if the ISP has taken the proper measures to curb it.

Harvy66

@tman222 BBR2 is in the works. Better utilization, less aggressive. 1.0 was a great first attempt.

tman222

Hi @Harvy66 - that's great to hear. I'm pretty sure I'm using a fairly dated version of the algorithm as it has been enabled on machines which are running off the stable Debian branch (so older 4.9.x kernel).

Regarding fq_codel on Apple devices, I can confirm that it appears to be consistent across both iOS and Mac OS X devices using the latest version of the respective operating systems:

iPad (2x2 AC, iOS 12):
https://www.dslreports.com/speedtest/39645428

Laptop (2x2 AC, Mac OS X 10.13.6):
https://www.dslreports.com/speedtest/39645618

The WiFi AP used in this case was a Ubiquiti AC-HD. I'm pretty impressed that I'm able to transmit 400-500Mbit over wireless with minimal bloat; pretty cool! I wish I had saved some of the old tests I ran, but I do recall struggling to get any higher than a "B" on the bufferbloat score.

occamsrazor

@tman222 said in Playing with fq_codel in 2.4:

Regarding fq_codel on Apple devices, I can confirm that it appears to be consistent across both iOS and Mac OS X devices using the latest version of the respective operating systems:

As I use almost exclusively Apple devices, at least for those that utilize significant traffic, does this affect how one might set up the fq_codel traffic limiter?

And I've become a bit lost following all the recent discussion especially with @tman222 and @dtaht..... Is the basic setup outlined in the pfSense hangout here still the recommended basic setup? https://www.youtube.com/watch?v=o8nL81DzTlU&t=380

Pentangle

Hi, I've been directed here by Jim Pingle. I have a 2.4.4 pfSense and have deployed FQ_Codel as per the hangouts YouTube video.
However, I have a problem. The bufferbloat tests now give A+ as you'd expect, but whenever I use the bufferbloat test at DSLReports the download test is fine, while towards the end of the upload test it knocks out all the other connections such as streaming radio, VoIP calls or RDP connections in the office.
Initially I thought it would be due to an over-optimistic setting for bandwidth (the WAN is an 80/20 FTTC connection which nominally performs at 71Mbit/s downstream and 17.5Mbit/s upstream as per speedtest.net tests done at the time). I originally had it set to 68/15, so I brought it down to 60/12, and whilst it's a bit better it still dumps all the connections right at the end of the upstream test.
So, I have a feeling it might be an exhaustion of the queue? How do I view the counters in real time? And if it's not to do with this, does anyone have any idea how to go about troubleshooting?
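For reference, the headroom those limiter settings leave below the measured line rate can be worked out with a few lines of Python. The numbers are the ones quoted above; the function name is just for illustration.

```python
def headroom_pct(limit_mbit, measured_mbit):
    """Percentage by which a limiter sits below the measured line rate."""
    if measured_mbit <= 0:
        raise ValueError("measured rate must be positive")
    return (1 - limit_mbit / measured_mbit) * 100


measured_down, measured_up = 71.0, 17.5
for down, up in ((68, 15), (60, 12)):
    print(f"{down}/{up}: "
          f"{headroom_pct(down, measured_down):.1f}% down, "
          f"{headroom_pct(up, measured_up):.1f}% up")
```

Both upload settings are already well below the measured rate, which suggests the bandwidth figure alone may not explain the drops.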

Second question - whilst I'd like to use FQ_Codel, can someone propose how I can combine this with a queue or limiter structure in order to guarantee bandwidth for things such as VoIP?

TheNarc

@pentangle I assume this was implied, but just to clarify, you do not see these connections being dropped when you don't use any limiters? One thing that may be worth trying (and at least is easy to try) is setting masks on the queues. That will cause a dynamic queue to be created for each host, rather than all hosts funneling into a single queue, while still enforcing a cumulative bandwidth limit. To be clear, don't set masks on the limiters, but on the child queues of the limiters. For download queues set a destination mask and for upload queues set a source mask; for both set them to 32/128 bits for IPv4/IPv6.
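As an illustration of what the masks do, here is a Python sketch that groups packets into dynamic queues by masked address: destination for download, source for upload. It is a toy model of the idea, not how dummynet is implemented, and the addresses are documentation examples.

```python
import ipaddress
from collections import defaultdict


def queue_key(ip, mask_bits_v4=32, mask_bits_v6=128):
    """Address after masking; packets with the same key share one dynamic queue."""
    addr = ipaddress.ip_address(ip)
    bits = mask_bits_v4 if addr.version == 4 else mask_bits_v6
    network = ipaddress.ip_network(f"{addr}/{bits}", strict=False)
    return str(network.network_address)


def group_by_queue(packets, direction, mask_bits_v4=32, mask_bits_v6=128):
    """packets: iterable of (src_ip, dst_ip). Returns packet counts per queue key."""
    index = 1 if direction == "download" else 0  # dst mask down, src mask up
    queues = defaultdict(int)
    for packet in packets:
        queues[queue_key(packet[index], mask_bits_v4, mask_bits_v6)] += 1
    return dict(queues)


packets = [
    ("203.0.113.9", "192.168.1.10"),
    ("203.0.113.9", "192.168.1.11"),
    ("198.51.100.7", "192.168.1.10"),
]
print(group_by_queue(packets, "download"))       # one queue per LAN host
print(group_by_queue(packets, "download", 24))   # a /24 mask lumps the hosts together
```

With a 32-bit mask each LAN host ends up in its own queue; a shorter mask merges hosts into shared queues.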

tman222

I would try @TheNarc's suggestions first. If you still experience drops, here are a couple more things to try:

Reducing the quantum parameter on the algorithm to something lower like 300. Reference: https://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel/
If that also does not help, you could create separate weighted queues under your limiters (e.g. one set of queues for VoIP and RDP traffic, another set for the other traffic) to ensure that VoIP and RDP are guaranteed a certain amount of bandwidth. This would also require you to create the appropriate firewall rules to route VoIP and RDP through the one set of queues, while the rest of the traffic would go through the other set of queues.

Hope this helps.

bafonso

As far as I know, there are no weights or priorities with FQ_CODEL. If you want such functionality you could try using QFQ + codel. From ipfw man page:

implements the QFQ algorithm, which is a very fast variant of
WF2Q+, with similar service guarantees and O(1) processing
costs (roughly, 200-250ns per packet).

Using QFQ + Codel works pretty well for me when it comes to sharing bandwidth based on priority.
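What weighted sharing means in practice can be sketched in Python. This is the idealized outcome when every queue is backlogged, not QFQ itself, and the queue names and weights are hypothetical.

```python
def weighted_shares(limit_mbit, weights):
    """Split a limiter's bandwidth among backlogged queues in proportion to weight."""
    active = {name: weight for name, weight in weights.items() if weight > 0}
    total = sum(active.values())
    if total == 0:
        return {}
    return {name: limit_mbit * weight / total for name, weight in active.items()}


# Hypothetical 12 Mbit/s upload limiter with a 3:1 weighting for VoIP over bulk.
print(weighted_shares(12, {"voip": 3, "bulk": 1}))
```

The sketch does not model the case where one queue needs less than its share; a weighted fair scheduler would normally let the other queues use the leftover.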

dtaht

@tman222 try your test further from the ap.

dtaht

@pentangle normally things like voip "just work" with fq_codel. no classification required. https://www.researchgate.net/publication/327781871_Analysing_the_Latency_of_Sparse_Flows_in_the_FQ-CoDel_Queue_Management_Algorithm

jasonraymundo31

How do I set up fq_codel on a dual WAN (load balancing + failover) setup?

I tried to follow jim-p's video on fq_codel. It works on a single WAN, but on dual WAN it's not working.

dtaht

More long duration flent based tests would help. I'm concerned about various bits of flakiness y'all are reporting, like dropping all connections at the end of a test.
If there is someone out there that would like to send me a box to play with, or (more simply) open up an ssh port, that would help.

bafonso

@knowbe4 said in Playing with fq_codel in 2.4:

How do I set up fq_codel on a dual WAN (load balancing + failover) setup?

I tried to follow jim-p's video on fq_codel. It works on a single WAN, but on dual WAN it's not working.

I have several gateways (WAN + VPNs) and I did it using limiter rules on LAN. They all get put into the same pipe. The tricky part with your setup is that your overall bandwidth is shared over two connections, so if I were you I would use interface QoS per WAN interface and a round-robin WAN assignment, which I'm assuming you are already doing given that you are load balancing. The only way to properly load balance and share bandwidth evenly between your customers would be to weight the random gateway assignment based on real-time bandwidth usage, effectively load balancing the bandwidth as opposed to the connections. I'm not sure such a thing exists off the shelf.
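The idea of weighting the random gateway assignment by real-time usage could look roughly like this in Python. It is a purely hypothetical sketch, not an existing pfSense feature; the gateway names, capacities and usage figures are made up.

```python
import random


def pick_gateway(gateways, rng=random):
    """gateways: dict of name -> (capacity_mbit, current_usage_mbit).

    Picks a gateway at random, weighted by its currently unused bandwidth.
    """
    names = list(gateways)
    spare = [max(gateways[name][0] - gateways[name][1], 0.0) for name in names]
    if sum(spare) == 0:
        # Everything is saturated: fall back to weighting by raw capacity.
        spare = [gateways[name][0] for name in names]
    return rng.choices(names, weights=spare, k=1)[0]


# wan1 is idle, wan2 is half used, so wan1 should get about two thirds of new connections.
state = {"wan1": (100, 0), "wan2": (100, 50)}
rng = random.Random(0)
picks = [pick_gateway(state, rng) for _ in range(1000)]
print(picks.count("wan1"), picks.count("wan2"))
```

A real implementation would also need a source for the live usage figures and some smoothing so that assignments do not oscillate.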

zwck

@dtaht

What exactly would you need? A port open on the pfSense box so you could ssh into it and play with fq_codel? Or are you more interested in pure data acquisition using flent against a standard pfSense config? I have not looked into flent; are there some guides I can follow? Or would it already be enough to set up an Ubuntu VM and you'll do the rest?