Playing with fq_codel in 2.4
-
@gsakes try the ack-filter option on your outbound cake instance, and try turning on ecn on your src and destination tcp stacks; a quick sketch of both follows.
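A minimal sketch of both suggestions on a Linux box (the interface name and shaped rate here are placeholders, not your actual config):

    # enable cake's ack-filter on the upload side
    tc qdisc replace dev eth0 root cake bandwidth 5mbit ack-filter
    # let the tcp stack negotiate ecn on incoming and outgoing connections
    sysctl -w net.ipv4.tcp_ecn=1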
While I'm indulging in flent featureitis: you can also capture cake and fq_codel stats from the gw, if you set up ssh authorized_keys to let you get there without a login. Some things (sigh) require root, so you set up your .ssh/config as I do with:
    Host gw*
        User root
    Host apu*
        User root

and then hand flent the qdisc stats options:

    --te=qdisc_interfaces=enp1s0 --te=qdisc_stats_hosts=hosta
it would be good to one day be able to poll a pfsense ipfw instance this way also.
You can monitor/plot ongoing cpu_stats on the gw with
--te=cpu_stats_hosts=hostA,hostB,hostC # and if you allow ssh to localhost, you can monitor local stuff too.

While I'm at it, you can also change congestion control algorithms with --te=CC=reno # for example. I don't have bbr universally or publicly deployed; linode doesn't build it in. Note, we have no way of verifying, except by eyeball, whether we actually switched CC algos. I can certainly see (after being trained by flent) what reno, cubic, cdg, and bbr "look like". Perhaps we need to turn an AI on these graphs! :)
I just mentioned that a vm's network can get overburdened. so there's a

    --te=ping_hosts=hosta,hostb,hostc

instead of just watching ping in another window. That's low cost.
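Pulling the mixins above together into one run might look like this (server and host names are placeholders; the stats hosts need the ssh setup described earlier):

    flent rrul -H netperf-west.bufferbloat.net \
        --te=qdisc_interfaces=enp1s0 --te=qdisc_stats_hosts=gw \
        --te=cpu_stats_hosts=gw --te=ping_hosts=hosta,hostb \
        -t 'rrul_with_gw_stats'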
A full list of the admittedly underdocumented and sometimes buggy additional test data collection mixins is in flent/tests/*.inc. I do note that by default we try to make flent not heisenbug the tests by hitting a cpu burden or bandwidth limit elsewhere. For example, remotely polling for cake stats is very intensive at -s .02 and seriously impacts the performance of a low-end openwrt router, so I wrote a tool in C that does it way faster (but it broke on a recent release of iproute) - and it is still intensive, so beware.
pull requests for better documentation, blog posts about the joy of flenting your network, gladly accepted. :)
I do tend to script this stuff with a huge variable holding all the extra tests I run; toke uses the "batch" facility also built into flent. If you like [ini] file formats, go for "batch". A sketch of the variable approach is below.
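A minimal sketch of that shell-variable style (hostnames and server are hypothetical):

    #!/bin/sh
    # collect the same extra stats on top of every test in the list
    EXTRAS="--te=qdisc_stats_hosts=gw --te=cpu_stats_hosts=gw --te=ping_hosts=gw"
    for t in rrul tcp_ndown tcp_nup; do
        flent $t -H flent-server.example.org $EXTRAS -t "${t}_baseline"
    done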
There's also tcpdump, tcptrace -G, and xplot.org. I DO - when I spot a weirdness - fire off a tcpdump while flenting and look at the real capture with wireshark or xplot.org. (There's a good java version of xplot, also.) Doing that tcpdump/xplot.org plot of your before/after test is quite informative; you can see all the carnage going on in a tcp flow even more directly. (We used tcpdump a lot to verify that flent's sampling and stats were indeed correct - but tcpdump, even with -s 128, is very cpu intensive, and often you want to be dumping at the server side of the thing...)
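That workflow, roughly (the interface and file names are placeholders; tcptrace names its output files per connection, so the .xpl name below is illustrative):

    # capture with a small snaplen while the flent test runs
    tcpdump -i eth0 -s 128 -w flent-run.pcap
    # generate xplot graphs (time/sequence and friends) for every connection
    tcptrace -G flent-run.pcap
    # eyeball the time/sequence graph of one flow
    xplot.org a2b_tsg.xpl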
-
@strangegopher "didn't work". This is the -s .02 test? If so, it's pretty normal to show that your upload throughput is very spotty over 20ms interval at this low bandwidth. The default sampling rate of -s .2 "hides" that. see nyquist theorem. This is another one of my rants - humans thing of bandwidth as data/interval, and set the interval to seconds. Where, here, we just set it to data/20ms and the results got "interesting" - you got a lot more detail on the download sawtooth.
For the upload... (nyquist bit us again)...
Arguably the plot idea itself is wrong here. We should show dots or crosses rather than connect the dots with lines when the data rate is this low, or rework how the average is smoothed.
(I really am trying to encourage folk to post their .flent.gz data so I can verify you did it right.)
-
@gsakes try that --socket-stats test without -s .02
-
@strangegopher having a fat cpu may or may not help on inbound shaping. you are bound by the context switch latency which is 1000s of clocks on "modern" intel hardware. The (paper) Mill computer cpu can do it in 5. If we could do it in offloaded hardware, cool.
You are also bound by clock resolution, interrupt coalescing, and numerous other potential difficulties.
I regard the biggest remaining technical problem in the whole bufferbloat effort as doing inbound shaping cheaply enough, at high rates, to eliminate the "badwidth" folk are providing at these speeds (a recent test of 5G showed 2 seconds of buffering).
I'd have preferred the core algos just roll out to more vendor hardware, and pursued political solutions.
I'd hoped that outbound shaping would get fixed more broadly by (one example) having a programmable completion interrupt in the ethernet/gpon/cable chip - which essentially makes it "free" for bql-derived systems (I still haven't found the intel chipsets that do it). Or that ISPs would get clue and demand a better shaper + fq_codel from their CMTS/BRAS/ENODE-B/GPON vendors on their side, so we wouldn't have to inbound shape at all. It's just a programmable interrupt per subscriber to do it right in hardware on outbound in their case. Totally doable. And profitable for whatever chipmaker/vendor gets there first. http://jvimal.github.io/senic/
'cause fixing the internet on my dime and time costs. (But do you say "no" when vint cerf, jim gettys, esr, paul vixie and dave reed tell you you're the only guy that can make even a proof of concept work? (cerowrt). I couldn't... and I was thoroughly POed about it in the first place: while I was (retired) in nicaragua, my wifi "upgrade" from g to n one summer, for my 14km-long link to the internet, failed completely during the (2-month-long) rainy season - when there was nothing to do but surf the internet. So, great. It's 8 years later, I can finally get a decent wifi link to the boonies down there, and I keep thinking about retiring again.... sailing down there... never logging in again... reading books... writing one... surfing a lot...)
As it happens, I have what I think is a better algorithm for inbound policing (it's called "bobbie"), but I abandoned the work when early versions of cake were 40% faster than fq_codel was (cake is now twice as slow as htb + fq_codel due to featureitis), and decided to focus on fixing wifi as the most bang for what little bucks I had left, and as something I knew would work. Even with grant money for it, proving whether bobbie could work was going to take a long time. I published (could have patented) the idea behind it in the hope that someone else would take up the work - as (for example) the fq_codel work for bsd was taken up by a team in australia, and other volunteers keep pushing more good stuff out, fixing bugs, and figuring out how to deploy things.
you can also look at some of the work in the conex working group of the ietf.
Anyway, I don't know why you are only getting 120mbit down on a 150mbit config on a beefy box running an OS I don't know a lot about. Tuning the limiter is the last idea I have... trying to up the target and interval in codel, perhaps...
-
there has been a windows version of flent at various points in its existence. getting all the dependencies right has been hard, so it's mostly used on linux and osx. yes, having a windows version packaged up would be good. There was also an effort to make it work over the web at one point (flent's output is json, and with a good json library for plotting, the prototype looked good; we needed a backend database and some other stuff).
def goin to bed
-
@dtaht Porting it into pfsense as a package would be nice too.
-
@pentangle It's in openbsd... please point a freebsd packager at it. https://repology.org/metapackage/flent/versions
-
@dtaht Unfortunately I'm one of those Windows experts who has little or no knowledge of BSD aside from what the GUI gives. I'll leave that to someone competent in it.
-
@dtaht said in Playing with fq_codel in 2.4:
@pentangle It's in openbsd... please point a freebsd packager at it. https://repology.org/metapackage/flent/versions
ahh, this would be ideal. The only other hardwired hosts I have to test from are either raspberry pi or an ancient mac mini - pretty sure the mini is the reason my flent tests show such a different picture than when i test using netperf on the pfsense router.
here was my latest: charter can't tell me what i'm provisioned for, so i don't know if i have 400mb or 300mb, so i adjusted my shaper down to 290 from 390:
-
@sciencetaco Is that a live link? What's your "before" result? That would give you an exact idea as to what you are getting from them, and an even more exact one would come from running the tcp_download test instead of rrul. What does dslreports show on a "before" setting? You can try watching your link quality on the cablemodem itself also. Bad SNR there can be a problem.
(helps to have the flent.gz files)
Still don't know why you aren't getting close to the set rate. (Can the limiter burst size be tuned?) Another thing to try is setting the codel algo to kick in (for test purposes) much later, like target 30ms, but I'm still pointing fingers at the limiter / cpu context switch latency, running out of cpu, etc.
-
Going back to my alice's restaurant quote (which is an arcane bit of americana - you can lose 20 minutes to this https://www.youtube.com/watch?v=W5_8U4j51lI&start_radio=1&list=RDW5_8U4j51lI , easily, it's a thanksgiving tradition here)
(I think y'all are mostly in england or the continent?)
and being sad about how few views the various core bufferbloat presos that jim, I, and others have done ( https://www.youtube.com/results?search_query=bufferbloat ) have gotten - even the highly entertaining one by stephen hemminger ( https://www.youtube.com/watch?v=y5KPryOHwk8&t=828s ) only has 3k views (take 10 minutes out of your life, it's great, but not as great as alice's restaurant)
and seeing https://www.youtube.com/results?search_query=fq_codel+pfsense have 30k views...
I'd love to try to get some of what we've discussed and solved above with flent into a script and talk with a good presenter like lawrence or Mark Furneaux to get more folk to pay attention....
video is not my forte. But you can get anything you want at alice's restaurant.
-
@dtaht said in Playing with fq_codel in 2.4:
@sciencetaco Is that a live link? What's your "before" result? That would give you an exact idea as to what you are getting from them, and an even more exact one would come from running the tcp_download test instead of rrul. What does dslreports show on a "before" setting? You can try watching your link quality on the cablemodem itself also. Bad SNR there can be a problem.
(helps to have the flent.gz files)
Still don't know why you aren't getting close to the set rate. (Can the limiter burst size be tuned?) Another thing to try is setting the codel algo to kick in (for test purposes) much later, like target 30ms, but I'm still pointing fingers at the limiter / cpu context switch latency, running out of cpu, etc.
link is live, only traffic is 1 streaming client.
I disabled my limiters and ran a few tests, as you suggested. Here are the outputs:
tcp_down limited:
rrul46 without a limiter:
tcp_down no limit:
i really appreciate everyone that's taken the time to take a look at my output and offer suggestions. thank you all.
flent .gz files: (4 of them in the tarball)
0_1539269433292_flent.tar
-
@sciencetaco your limited and unlimited tcp_download plots are essentially the same, and latency is good. Are you sure you turned it off? Either way, your latency is not bad.
Otherwise, the fact that four flows do a bit better than one implies that we have a flow per CMTS channel (take a look at your cablemodem's status page and see how many channels are operational), or a limit on the tcp stack on my server (I usually test with four flows), or your receive window... (which, btw, I'd not considered before on this thread - I can't do much testing of the real world at real rtts outside the lab. At short RTTs I can certainly do 1gb on various hardware.)
There's some hint that powerboost might be on with charter (first 20 sec solid then a major drop (yea, optimizing for speedtest)).
but heck, rather than fiddle with the tcp config, try killing it with a hammer. try using
--te=download_streams=16 tcp_ndown
with the limiter on and off. Cool that you just tested both ipv6 and ipv4 at the same time, also. I kind of hope we end up with a "superscript" that has all 10 tests nicely done in a row... against someone elses server.
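Spelled out as a full command line (same server as your netperfrunner tests; the title string is just a suggestion):

    flent tcp_ndown -H netperf-west.bufferbloat.net --te=download_streams=16 \
        -t 'limiter_on_16_flows'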
The totals plot is much easier to deal with.
thx for sharing the flent data! However I meant to go to bed about 8 hours ago... bbl
-
@dtaht Here are the results with ack filter on the outbound interface, and ecn enabled:
-
@gsakes That result is a thing of beauty. If only the whole internet worked like that.
-
@dtaht lol - i must be reading that wrong - that result is actually better?
-
@gsakes well, post the flent files? and tc -s stats? To my eye - tcp up/download oscillations are less, bandwidth is ruler-flat, uplink bandwidth improved by a few percent, and induced latency was cut from 5ms to about 4. Love to do a comparison plot.
Another flent trick: you can lock the scaling to one plot when flipping between them, so your first plot had more oscillations than you see here. I think... let me scroll back.
5 years of work for those few percentage points compared to the orders of magnitude win fq_codel was... but I'll take it, and frame it.
-
@dtaht Yes- frame it - it was worth it, no - really. Ok, here's without ack-filter, and ecn, I'll post those flent files next:
-
a comparison plot is easier if you make your -t 'text-to-be-included-in-plot' actually meaningful. :)
-
Flent files for ack-filter/ecn:
0_1539273758800_rrul-2018-10-11T155505.344180.text-to-be-included-in-plot.flent.gz
-
@dtaht lol - ok - if you don't mind, give me the command line for what you want me to run; it'll be easier than me looking through the gazillion tests/combinations/options. Sure, text is easy, but let me know what other tests you want me to run:)
-
I can fix it in post, but for a future run of any given new thing, it helps minutes, hours, or months later if you run it with a meaningful title...
-t 'gsakes_cake_noecn_noack-filter'
or use the --notes option so you capture more of the basic test parameters.
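Pulling those together (the server name is a placeholder, and I'm using the --notes option as just described):

    flent rrul -H flent-server.example.org \
        -t 'gsakes_cake_noecn_noack-filter' \
        --notes 'cake on outbound, ecn off, ack-filter off'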
I'm personally sitting on top of a few hundred thousand rrul tests....
Anyway, would love the prior noecn noackfilter test flent file before I try to get to bed
-
@dtaht said in Playing with fq_codel in 2.4:
-t 'gsakes_cake_noecn_noack-filter'
Sure thing:
0_1539274464800_rrul-2018-10-11T161222.683165.gsakes_cake_noecn_noack-filter.flent.gz
-
@dtaht (facepalm) - Oh, I see - Flent's got a rather nice gui, now it makes sense:)
-
going back to the fios example earlier: having fq_codel shaping at 48mbit appears to get more bandwidth than fios with 120ms worth of buffering at 50mbit does. Why? You get a drop, it takes 120ms to start recovery, you get a drop at the end of that recovery, and the process repeats. Someone tell FIOS that engineering their fiber network for speedtest only works for 20 seconds.
It's kind of hard to see at this resolution, but the green line's median and whiskers are locked solid at 48mbit while the other... (btw, I should go pull the actual data transferred out of flent - I didn't, and you can't actually trust it due to 120ms worth of oscillation anyway - but the improved bandwidth here is real. As is the lowered latency.)
-
Hey all,
here are some measurements
Setup:
Baseline measurement: Pc1 - Pc2 just via switch, no sense involved:
0_1539274374287_rrul-2018-10-11T071821.166928.no_shaper_vm_to_vm.flent.gz
Measurement with sense in between, without shaper:
0_1539274525822_rrul-2018-10-11T073544.338424.no_shaper_sense_inbetween.flent.gz
Then I started shaping/playing around (no methodology involved) at 100/100Mbit and played with tunables.
For me the only tunables that were useful were:

    machdep.hyperthreading_allowed="0"
    hw.igb.rx_process_limit="-1"
For the setup, I found that I only adjusted the quantum and turned off ecn:
    10000: 800.000 Mbit/s    0 ms burst 0
    q75536  50 sl. 0 flows (1 buckets) sched 10000 weight 0 lmax 0 pri 0 droptail
     sched 10000 type FQ_CODEL flags 0x0 0 buckets 0 active
     FQ_CODEL target 5ms interval 100ms quantum 1518 limit 10240 flows 1024 NoECN
       Children flowsets: 10000
    10001: 800.000 Mbit/s    0 ms burst 0
    q75537  50 sl. 0 flows (1 buckets) sched 10001 weight 0 lmax 0 pri 0 droptail
     sched 10001 type FQ_CODEL flags 0x0 1024 buckets 0 active
     FQ_CODEL target 5ms interval 100ms quantum 1518 limit 10240 flows 1024 NoECN
       Children flowsets: 10001
0_1539274939477_rrul-2018-10-11T172420.808123.100mbit1_1518_down_noecn_upecn.flent.gz
Finally i went up to 800Mbit
0_1539274972854_rrul-2018-10-11T174303.861055.800mbit1_1518_down_noecn_upecn.flent.gz
maybe it's interesting :D
-
@gsakes heh. You were living this whole time at the command line? flent-gui *.flent.gz - hit view->everything, select two or more files from the open files, select a different plot type like cdf, switch files around in the tabs.
I run this thing sometimes against 30+ test runs - it uses up every core you have to predict that next plot - it's just an amazing tool... it's why we succeeded where others in the field are still painfully looking at tcptrace'd packet captures....
I would knight toke if I could.
anyway, good, thx for the data! cake with ack filtering is actually slightly higher latency than without (it gets rid of more little packets so more big ones fit in), you got .35mbits extra out of your 5mbit uplink, and something in the noise hit you...
And this shows how tcp oscillates less (for the same throughput) by using ecn rather than drop for congestion control.
It's subtle at this rtt, but you can see how the data is bursty? That's head-of-line blocking for a 35ms rtt against the 20ms sample rate.
a tcpdump is good for observing this.
I've noted earlier that I'm not all that huge on ecn. I can totally live with tcp blocking for 35ms (compared to hundreds of the pre-fq_codel-era) and with drop as the sole indicator of congestion. At rtts of greater than 100ms for human interactive traffic, well, ok, ecn might be good... but for that I stopped using ssh and switched to mosh.
wide deployment of ecn worries me. But it's happening anyway and why not get more folk playing with it while being aware we might break the internet with it...
there's other benefits in cake, like per-host fq... kind of hard to test using this. One example of seeing per-host fq work better is to load up a web page benchmark of some (sane) kind on a different box while running things like the rrul. We used to use the chrome web page benchmarker, but that broke... even then, fq_codel still does really well on web traffic... (a cake sketch is below)
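For reference, a hypothetical upload-side cake config with per-host fairness and the ack-filter (interface and rate are placeholders; the nat keyword lets the per-host isolation see the pre-NAT addresses):

    tc qdisc replace dev eth0 root cake bandwidth 19mbit dual-srchost nat ack-filter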
... those extra .35 mbits from the ack-filter came hard. 6 months worth of work by 5 people.... over 9 months total... 12 if you include the initial research into the rfcs.
-
@dtaht heh, that'll teach me to start using new tools at 03:30 in the morning - yeah, totally tunneled on flent, didn't even bother looking at the GUI:)
Yep, I get slightly better results turning off ecn, but leaving ack filter on - less latency, very slightly less throughput. What you and the bufferteam did was well worth it; I'm going to call this 'QOS 2.0' even though the term might already exist:)
-
All this flent testing was a bit inspiring. So I just set up a new server in linode's cloud in singapore: singapore.taht.net. Turns out their current default distro enables fq_codel by default and also ships bbr. flent and irtt were part of their ubuntu 18.04 LTS package set, so a few minutes later I had a server running. I left the default CC at cubic. Three things to note:
A) At such a long rtt, there is no tcp that will work particularly well. Built into BBR is an RTT check which I think defaults to 200ms, and singapore is 190ms from california. BBR's primary use case is from a datacenter usually located MUCH closer to you.
If you are located on, like, the island of mauritius (as one of our more devout testers is), with a typical rtt of 320ms, he found it necessary to up target and interval to 30ms and 300ms respectively.
B) fq_codel has two tunables that normally you shouldn't touch. target is the target local queue delay, and should be 5%-10% of the interval - except at low bandwidths, where the time to transmit a single MTU exceeds that, and I fudge factor it even more. 5ms is a good default above 4Mbit. The interval should be your normal (say 90th percentile) maximum rtt for the flows you have in your locality. So if you are in england, you can fiddle with lower intervals - like 30ms! - on the continent I've seen people using 60, and in most cases 100 is just fine. (See the tc sketch after this list.)
On wifi we found it helpful to up the target to 20ms during our early tests; in my deployment on real networks I'm finding 5% to be "just fine", but 20 is the kernel default. I would like to make that tunable.
C) BBR does not pay much attention to the codel part of fq_codel.
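The tc sketch promised above - tuning target and interval for different path lengths (the interface name is a placeholder; the targets follow the ~10% rule of thumb):

    # long paths, e.g. ~300ms rtts like the mauritius example
    tc qdisc replace dev eth0 root fq_codel target 30ms interval 300ms
    # short paths, e.g. mostly-domestic uk traffic
    tc qdisc replace dev eth0 root fq_codel target 3ms interval 30ms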
since y'all are having such fun with flent, care to try irtt? :)
As for singapore, well, I think y'all are getting better at pattern recognition, but you'll see not much of a good one running as far as singapore, and perhaps that's good input for a fuzzy recognition algorithm...
-
When I got home, I ran the dslreports test, with codel+fq_codel at 290/19, from my laptop on wifi. The results were very good, especially considering i was concurrently watching hulu.
netperfrunner.sh v6:
2018-10-11 16:19:53 Testing netperf-west.bufferbloat.net (ipv6) with 4 streams down and up while pinging gstatic.com. Takes about 60 seconds.
Download: 242.21 Mbps
Upload: 7.98 Mbps
Latency: (in msec, 61 pings, 0.00% packet loss) Min: 35.790 10pct: 38.426 Median: 44.195 Avg: 45.566 90pct: 52.161 Max: 65.077
netperfrunner.sh v4:
2018-10-11 16:21:14 Testing netperf-west.bufferbloat.net (ipv4) with 4 streams down and up while pinging gstatic.com. Takes about 60 seconds.
Download: 229.08 Mbps
Upload: 12.07 Mbps
Latency: (in msec, 30 pings, 50.82% packet loss) Min: 61.389 10pct: 62.261 Median: 67.240 Avg: 67.030 90pct: 70.341 Max: 77.469
flent rrul46:
0_1539293261991_rrul46-2018-10-11T161530.029411.290mb_19mb.flent.gz
I need to move some stuff around in an attempt to free up a switch port in the basement to test from this machine while wired. Either that, or i'll disconnect lan one day. holy packet loss over v4.
-
Three more tests; all were run with the same up/down b/w parameters, within a few minutes of each other.
1) PFSense - no shaping, baseline:
2) PFSense - codel + fq_codel:
3) Linux - sch_cake, noecn + ack-filter:
-
Well, after watching the pfSense video on fq_codel again, I noticed it was mentioned that the sub (child) queues under the limiters are required:
https://www.youtube.com/watch?v=o8nL81DzTlU
I'm not sure if they are required to make sure that the shaping works properly when using floating rules on the WAN interface, or just in general. From my personal experience, applying just the limiter on the LAN side (no child queues) works as well. So I'm still on the quest to find a better explanation as to why child queues need to be created.
-
Any chance you can share whether you are configuring the bufferbloat testing servers in a special way (sysctl tweaks or something else)? The only thing I do over vanilla ubuntu is to set up iptables and set fq_codel as the default qdisc.
I've also sorted my weird results. Turns out there was a bottleneck between me and my netperf server.
-
tail-drop + fq_codel
-
pie + fq_pie
-
Since people have started sharing flent test results, I figured I'd share a couple interesting ones as well from some 10Gbit testing that I have done.
Test 1: RRUL test between two 10Gbit Linux hosts (Debian 9) across the firewall (i.e. two different subnets) with IDS (Snort) enabled. Both hosts have the TCP BBR algorithm enabled. A couple spikes in the ping - at this speed with IDS enabled the firewall's CPU cores are essentially pegged.
Test 2: RRUL test between a 10Gbit Linux host (Debian 9) and an iMac with a 10Gbit Thunderbolt 3 to ethernet adapter. Both on the same subnet. The Linux host has BBR enabled. Not sure why there is such a slow ramp on the iMac side when it is receiving traffic from the Linux host; unfortunately I have not found a way to fix it. Also note how much more stable the upload is on the Linux host with BBR enabled.
Test 3: RRUL test between two 10Gbit Linux hosts (Debian 9) on the same subnet. Both hosts have the TCP BBR algorithm enabled. In my opinion this test is essentially perfect: around 19 Gbit/s of traffic and an average latency of less than 1ms, plus a very stable transfer with BBR.
@dtaht - any thoughts?
-
@tman222 I've tested similarly, albeit with lower configured pipes, and I actually find that using queues lowers latency.
Here codel+fq_codel is used on pfSense 2.4.4 limiters and flent/netserver hosts are cubic, kernel 4.4.0-137. (q denotes that one up and one down queue was used)
-
Here codel+fq_codel is used on pfSense 2.4.4 limiters and flent/netserver hosts are fq+bbr, kernel 4.15.0-36. (q denotes that one up and one down queue was used)
-
And then the opposite occurs as throughput increases: latency becomes higher when utilizing queues.
@dtaht any thoughts on what we might be running into here?
-
My first thought, this morning, when asked about my thoughts, was to get some food, a sixpack of beer, and go sailing. :)
Loving seeing y'all doing the comparison plot thing now. you needn't post your *.flent.gz files here if you don't want to (my browser plugin dumps them into flent automatically), but feel free to tarball 'em and email dave dot taht at gmail.com.
I don't get what you mean by codel + fq_codel. did you post a setup somewhere?
bb tomorrow. PDT.
-
@tman222 said in Playing with fq_codel in 2.4:
Since people have started sharing flent test results, I figured I'd share a couple interesting ones as well from some 10Gbit testing that I have done.
these results were so lovely that I figured the internet didn't need me today.
Test 3) BQL, TSQ, and BBR are amazing. One of my favorite little tweaks is that, in aiming for 98.x% of the rate, it just omits the ip headers from its cost estimate. As for "perfect", one thing you might find ironic is that the interpacket latency you are getting for this load is about what you'd get at a physical rate of 100mbits with cubic + cake + a 3k bql, or sch_fq + bbr (possibly cubic also) with a 3k bql. That's basically an emulation of the original decchip tulip ethernet card from the good ole days. Loved that card ( https://web.archive.org/web/20000831052812/http://www.rage.net:80/wireless/wireless-howto.html )
If cpu context switch time had improved over the years, well, ~5usec would be "perfect" at 10gige. So, while things are about, oh, 50-100x better than they were 6 years back, it really would be nice to have cpus that could context switch in 5 clocks.
Still: ship it! we won the internets.
could you rerun this test with cubic instead? (--te=CC=cubic)
sch_fq instead of fq_codel on the client/server? (or fq_codel if you are already using fq)
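Roughly, on the Linux ends (the interface name is a placeholder; the sysctl changes the system default, while --te=CC= only affects that one flent run):

    # change the default congestion control on client/server
    sysctl -w net.ipv4.tcp_congestion_control=cubic
    # swap the root qdisc from fq_codel to sch_fq
    tc qdisc replace dev eth0 root fq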
Test 2) OSX uses reno, with a low IW, and I doubt they've (post your netstat -I thatdevice -qq ?) fq'd that driver.
So (probably?) their transmit queue has filled with acks and there's not a lot of space to fit full-size tcp packets. Apple's "innovation" in their stack is stretch acks, so after a while they send fewer acks per transfer. I think this is tunable on/off, but they don't call the option stretch ack (it may be the delack option you'll find via googling). You can clearly see it kick in with tcptrace, though, on the relevant xplot.org tsg plot. But 10 sec? weird. bbr has a 10 sec probe phase... hmmm.... Everybody here uses tcptrace -G + xplot.org daily on their packet caps, yes?
Test 1) you are pegging it with tcpdump? there are faster alternatives out there. Does the ids need more than 128 bytes?
-
@xciter327 said in Playing with fq_codel in 2.4:
Any chance you can share whether you are configuring the bufferbloat testing servers in a special way (sysctl tweaks or something else)? The only thing I do over vanilla ubuntu is to set up iptables and set fq_codel as the default qdisc.
I've also sorted my weird results. Turns out there was a bottleneck between me and my netperf server.
-
tail-drop + fq_codel
-
pie + fq_pie
I will share my bufferbloat.net configuration if you gimme these fq_pie vs fq_codel flent files!!! fq_pie shares the same fq algo as fq_codel, but the aqm is different: it swipes the rate estimator from codel while retaining a random drop probability like RED (or pie). So the "wobblyness" of the fq_pie up/down graphs is possibly due to the 20ms target and longer tcp rtt in pie, or some other part of the algo... but unless (when posting these) you lock them to the exact same scale (yep, there's a flent option for this) that's hard to judge by eyeball here. Compare bandwidth via a bar chart. Even then, the tcp_nup --socket-stats option (we really need to add that to rrul) makes it easier to look harder at the latency being experienced by tcp.
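Something like this, for the upload side (the server name is a placeholder):

    flent tcp_nup -H flent-server.example.org --socket-stats -t 'fq_pie_up'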
ecn on on the hosts? ecn on pie is very different.
Pure pie result?
PS
I don't get what anyone means when they say pie + fq_pie or codel + fq_codel. I really don't. I was thinking you were basically doing the FAIRQ -> lots of codel or pie queues method?
-