Playing with fq_codel in 2.4

Pentangle

@fabrizior "Is this really bufferbloat, with the limiters turned on, or simply the lag to whatever testing servers dslReports is picking? no-load latency dropped substantially with the limiters turned on; just not as low as desired."

It's bufferbloat, just not YOUR bufferbloat. Your internet connection between you and the next hop (which is likely to be the DSLAM or equivalent fibre concentrator in your ISP's phone exchange) is not now suffering bufferbloat to any great extent, but the pipe connecting the phone exchange to the wider internet is likely to have latency or oversubscription issues. If you can find a pingable next hop that doesn't incorporate all that additional latency then you could test to the nth degree, but for now it's probably pretty pointless.

"With regards to the loss of maximum bandwidth (for the sake of stable latency) am I correct in my understanding that with this baseline, I can re-test with higher limiter bandwidth settings to figure out how high I can crank it up before my latency numbers become undesirably unstable?"

You can test as much as you like. You will find that as you start specifying bandwidth limits that are close to the maximum the line can reliably tolerate, your latency will go down the toilet, so it's good to get to that point and then crank it back a notch so that you're never hitting that scenario.

mind12

@pentangle
I have used your configuration and achieved A grade. Thank you.

The DSLReports site gives me much lower speeds than speedtest.net with the same configuration, so I searched for another test site and found the one below. It gave speeds similar to speedtest.net and the same A grade.
https://www.waveform.com/tools/bufferbloat

Which configuration should I modify to achieve even better results?
My connection is a 150/10.

bolmsted

Can the developers comment when/if we will see support for BBR TCP Congestion Control in pfSense/OPNsense? I understand that it may be a feature of FreeBSD 13.x and if HardenedBSD is based on the same release level we should expect this to show up in HBSD 13.x as well when OPNsense integrates HBSD 13.x?

There is nothing official on this but I came across links like this
https://fasterdata.es.net/host-tuning/freebsd/
https://lists.freebsd.org/pipermail/freebsd-current/2020-April/075930.html

The FreeBSD list post doesn't really say which version of FreeBSD has the non-CC method.

Linux has had this integrated for quite some time, but there is no elegant front end like pfSense or OPNsense for Linux. Too bad neither was ported to Linux with an nftables backend. If I build on Linux I have to do all the setup from scratch (DNS, DHCP, firewall, etc.), and I'm not a big fan of the Debian model for network configs; I would much prefer the RHEL model.

I have a gigabit internet connection and they sync the GPON at 2.5Gbps which means that on upload I can actually exceed the provisioned upload rate causing packet loss, packet retransmissions, and performance degradation - thus not getting the provisioned upload rate.

I have 1Gbps/750Mbps, but I should be able to get 1G/1G since they overprovision the line. Right now I usually get 930/660, but if I switch to a 1Gbps sync I get 700/930, so it flips around. It therefore seems realistic that I could get 1G/1G if I could prevent flooding by using BBR on FreeBSD/HBSD under pfSense/OPNsense. I may consider the 1.5G/1G package one day, but the 1G plan is good for me, especially if I could get the full upload rate.

Some folks on a forum I hang out on show results using Linux with BBR when syncing at 1Gbps and 2.5Gbps on a 1.5Gbps/1000Mbps plan.

This is what it looks like for me under various conditions. Performance loss is affected by latency.

1000 (No Throttle)                  Latency (ms)  Down (Mbps)  Up (Mbps)
Bell Alliant - Halifax (3907)                 24          792        919
Bell Canada - Montreal (17567)                 1          901        933
Bell Canada - Toronto (17394)                  8          765        933
Bell Mobility - Winnipeg (17395)              30          874        913
Bell Mobility - Calgary (17399)               47          782        898

2500 (No Throttle)                  Latency (ms)  Down (Mbps)  Up (Mbps)
Bell Alliant - Halifax (3907)                 24         1564        358
Bell Canada - Montreal (17567)                 1         1638        933
Bell Canada - Toronto (17394)                  8         1619        771
Bell Mobility - Winnipeg (17395)              30         1569        324
Bell Mobility - Calgary (17399)               47         1570        236

2500 (Throttled 1000 up)            Latency (ms)  Down (Mbps)  Up (Mbps)
Bell Alliant - Halifax (3907)                 24         1558        915
Bell Canada - Montreal (17567)                 1         1650        931
Bell Canada - Toronto (17394)                  8         1620        931
Bell Mobility - Winnipeg (17395)              30         1569        885
Bell Mobility - Calgary (17399)               47         1600        890

2500 (Throttled 1000 up + bbr)      Latency (ms)  Down (Mbps)  Up (Mbps)
Bell Alliant - Halifax (3907)                 24         1541        922
Bell Canada - Montreal (17567)                 1         1639        940
Bell Canada - Toronto (17394)                  8         1583        936
Bell Mobility - Winnipeg (17395)              30         1617        928
Bell Mobility - Calgary (17399)               47         1547        916
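
To make the comparison between the four scenarios easier to read, here is a minimal Python sketch that summarizes the upload columns. The numbers are copied from the tables above; the scenario labels and the helper names `summarize` and `worst_server_latency` are made up for illustration.

```python
from statistics import mean

# Per-server latency (ms) and upload rates (Mbps), copied from the tables
# above, in the order Halifax, Montreal, Toronto, Winnipeg, Calgary.
LATENCY_MS = [24, 1, 8, 30, 47]
UPLOAD_MBPS = {
    "1000 sync, no throttle": [919, 933, 933, 913, 898],
    "2500 sync, no throttle": [358, 933, 771, 324, 236],
    "2500 sync, throttled 1000 up": [915, 931, 931, 885, 890],
    "2500 sync, throttled 1000 up + bbr": [922, 940, 936, 928, 916],
}


def summarize(uploads):
    """Return (mean, minimum) upload rate in Mbps for one scenario."""
    return mean(uploads), min(uploads)


def worst_server_latency(uploads, latencies=LATENCY_MS):
    """Return the latency (ms) of the server with the lowest upload rate."""
    return latencies[uploads.index(min(uploads))]


for name, uploads in UPLOAD_MBPS.items():
    avg, worst = summarize(uploads)
    print(f"{name:36s} mean {avg:6.1f}  worst {worst:4d} "
          f"(server at {worst_server_latency(uploads)} ms)")
```

With these figures the unthrottled 2.5Gbps sync averages about 524 Mbps up, against roughly 910 Mbps once upload is throttled to 1000 and roughly 928 Mbps with BBR added. The worst unthrottled result is on the highest-latency server.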

Pentangle

@mind12 I'd say you would now need to benchmark your ISP. By this I mean that your connection to the internet does not just consist of your connection to the hole in your wall, but also the structure of your ISP's network and its peering to the internet. As such, you may want to bring down your limiter bandwidth settings, by maybe 5 Mbit/s each, and see whether your latency still increases by roughly +21ms and +8ms. If it's constant then there's not much else you can do on your end of the link, and that latency increase may be due to whatever's in your ISP's network or how they deal with your packets when they get them. Once you've done this you can slowly increase the bandwidth settings again until you get to the point where the latency goes through the roof. That is the maximum speed you can send through your pfSense instance, and you should then drop it back a bit to ensure your limiters can do their stuff as much as possible. Once you've finished your testing you should be able to see whether your ISP is adding a reasonable amount of latency or not. Personally, from here I wouldn't be too unhappy with a 23ms upstream latency, but a 36ms downstream latency looks a little suboptimal (although a lot depends upon a number of factors probably beyond your control).

Gertjan

@pentangle : your keyboard seems broken ...

Pentangle

@gertjan said in Playing with fq_codel in 2.4:

@pentangle : your keyboard seems broken ...

In what way?

mind12

@pentangle Thanks I will try this.
Those results are not bad at all; I would just like to optimize my limiter as far as possible.

What if it doesn't change when I lower the bandwidth?

How will I know if my limiter is the bottleneck? The other advanced parameters (queue length, flow limits, etc.) could also affect the results, right?

Pentangle

@mind12 If the results don't change then you are within your bandwidth limits. The idea is that the limiters and FQ_CoDel need a certain amount of 'headroom' in order to operate (shuffling smaller packets to the front of the queue, etc.), so they will work well until they no longer have enough 'headroom' to play with. You need to determine that headroom (bandwidth limit), and the easiest way to do it is to edge the bandwidth limits up until latency spikes. At that point you know that FQ_CoDel's efficiency is being impaired by the lack of headroom, so you dial it back a notch and voila: the fastest throughput you can get whilst retaining low latency.
As regards other limits, I suggest you do a little reading on what those do for you. The settings I gave should be more than adequate for your connection - they work well with my 300/50 connection here.
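
The "edge it up until latency spikes, then dial it back" procedure can be sketched in a few lines of Python. This is only an illustration of the selection logic, not something pfSense provides: the function name `pick_limit`, the 10 ms tolerance, and the sample readings are all hypothetical, and the readings themselves would have to come from manual test runs at each limiter setting.

```python
def pick_limit(measurements, baseline_ms, tolerance_ms=10.0):
    """Return the highest tested limit (Mbit/s) whose loaded latency stays
    within tolerance_ms of the idle baseline, or None if none qualifies.

    measurements: iterable of (limit_mbit, loaded_latency_ms) pairs, one
    pair per test run at a given limiter bandwidth setting.
    """
    best = None
    for limit, latency in sorted(measurements):
        if latency - baseline_ms > tolerance_ms:
            break  # latency has spiked: headroom ran out at this limit
        best = limit
    return best


# Hypothetical readings for the download limiter on a 300/50 line.
readings = [(270, 16.0), (280, 17.0), (290, 18.5), (295, 61.0), (300, 140.0)]
print(pick_limit(readings, baseline_ms=15.0))
```

Stopping at the first spike rather than taking the overall best reading mirrors the advice above: once latency has blown up at one setting, higher settings are not worth considering.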

fabrizior

@mind12 I drove myself nuts trying to tune the limiter & queue knobs at various limiter bandwidth settings on a 400/25 Mbps service with and without load (100 sockets generating ~30+ MB/s download throughput).

What I’ve learned from folks here (thank you):

The only setting that induced definitive change in my test results was the bandwidth limit. The other settings recommended seem sufficiently high to allow proper functionality under load while varying the limiter bandwidth as suggested to identify the required headroom.

The general consensus seems to be that, once configured for appropriate headroom based on your provisioned rates, any variable and higher-than-desired latency results are likely induced somewhere upstream, with the ISP's equipment suffering from bufferbloat (or oversubscription).

If speed tests and bufferbloat latency numbers stay fairly stable with and without significant load running in parallel with the test, then your side of things is well-tuned.
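
As a rough illustration of that last check, the Python sketch below compares latency samples taken with and without load. The function names, the 5 ms threshold, and the sample values are hypothetical; pick a threshold that matches what you consider acceptable.

```python
from statistics import median


def latency_increase(idle_ms, loaded_ms):
    """Median loaded latency minus median idle latency, in ms."""
    return median(loaded_ms) - median(idle_ms)


def looks_well_tuned(idle_ms, loaded_ms, max_increase_ms=5.0):
    """True if latency under load stays close to the idle latency."""
    return latency_increase(idle_ms, loaded_ms) <= max_increase_ms


# Hypothetical ping samples (ms): idle, loaded with a working limiter,
# and loaded with a limiter set too high.
idle = [14, 15, 15, 16, 15]
shaped = [17, 18, 16, 19, 18]
bloated = [40, 95, 120, 80, 60]

print(looks_well_tuned(idle, shaped))   # small increase under load
print(looks_well_tuned(idle, bloated))  # large increase under load
```

The median is used instead of the mean so that a single stray ping does not decide the outcome.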

MoonKnight

Got this test result without playing with fq_codel.
I have a 500/500 connection.

Pentangle

@ciscox With a 500/500 connection, unless you're regularly maxing it out (unlikely) then you might not find shaping is necessary for you.

Pentangle

P.S. Here's mine whilst watching a YouTube video at 1080p (because I couldn't be bothered to pause it):
https://www.waveform.com/tools/bufferbloat?test-id=1117b948-fafd-4eaa-9332-1b3a09c50819

thiasaef

I did everything according to the instructions in the reply to #815, but traceroute does not work. Any idea how to fix this? Adding the icmp exception rule to LAN is not an option for me.

q54e3w

@thiasaef said in Playing with fq_codel in 2.4:

Adding the icmp exception rule to LAN is not an option for me.

Curious why? It might help folks advise if we understand.

thiasaef

@q54e3w

I have multiple LAN interfaces, so I thought it would be a bad idea to try that. And I don't understand why the guide does not work (I'm still on 2.4.5-RELEASE-p1).

bartkowski

@thiasaef I have mine on the Floating tab (with Quick checked), applied to WAN. Maybe that can work for you?

thiasaef

@bartkowski, my floating rules look like this:

Traceroute output:

traceroute netgate.com
traceroute to netgate.com (208.123.73.73), 30 hops max, 60 byte packets
 1  _gateway (192.168.20.1)  0.098 ms  0.138 ms  0.080 ms
 2  208.123.73.73 (208.123.73.73)  2.615 ms  2.822 ms  4.052 ms
 3  * * * 
 4  208.123.73.73 (208.123.73.73)  22.185 ms  17.234 ms  17.226 ms
...
 8  208.123.73.73 (208.123.73.73)  18.800 ms  18.792 ms  21.285 ms
 9  * * * 
10  * * * 
11  208.123.73.73 (208.123.73.73)  167.760 ms  169.189 ms  169.182 ms
...
15  208.123.73.73 (208.123.73.73)  167.513 ms *  164.364 ms

thiasaef

I'm stupid ... all I had to do to make it work was to enable the --icmp option in traceroute, since traceroute uses UDP by default on Linux.

PS: Could someone explain to me why fq_codel still works in both directions when I disable the 3rd floating rule (WAN-In FQ-CoDel queue)?

mind12

@thiasaef Are you really sure that it works?
I made the same mistake before: the states to the test IPs had not been cleared, so the results were the same as before. Make sure to kill all states to the testing server before testing again.

thiasaef

@mind12 it definitely works if I add the -I flag to the traceroute command, but the 1st floating rule (policy routing traceroute workaround) seems to have nothing to do with it.

I logged the outgoing traceroute traffic both with and without the -I flag using Wireshark, but I could not find any packets of the ICMP subtype: Traceroute.
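
One possible reason for the empty Wireshark search, offered as a sketch rather than a diagnosis: the "Traceroute" ICMP message is type 30 (RFC 1393), which is deprecated and which common traceroute implementations do not send. The Python below lists the ICMP types a capture would normally show for each probe mode. The type numbers are the standard ICMP assignments; the function name `expected_icmp_types` is made up for illustration.

```python
ICMP_TYPES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    8: "Echo Request",
    11: "Time Exceeded",
    30: "Traceroute (RFC 1393, deprecated)",
}


def expected_icmp_types(probe):
    """Return the ICMP types a capture should show for a traceroute run.

    probe: "udp" (the Linux default) or "icmp" (traceroute -I / --icmp).
    """
    replies = {11}  # Time Exceeded from each intermediate hop
    if probe == "icmp":
        # Echo Request probes go out; the target answers with Echo Reply.
        return {8, 0} | replies
    if probe == "udp":
        # Probes are UDP datagrams; the target answers with Port
        # Unreachable, which is a Destination Unreachable (type 3) message.
        return {3} | replies
    raise ValueError(f"unknown probe type: {probe!r}")


for mode in ("udp", "icmp"):
    names = [ICMP_TYPES[t] for t in sorted(expected_icmp_types(mode))]
    print(mode, "->", ", ".join(names))
```

In neither mode does type 30 appear, so a capture filter for that subtype would be expected to come up empty either way.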

I would be glad if someone with more expertise than us would chime in on this.