Playing with fq_codel in 2.4

mind12

@pentangle
This is a great guide except the firewall rules, where you should use Match action instead of Pass, because rule evaluation will stop with Quick checked and you will unintentionally allow any TCP/UDP traffic in and out of your WAN.
The pfsense docs also recommend Match action for limiters.

"The match action is unique to floating rules. A rule with the match action will not pass or block a packet, but only match it for purposes of assigning traffic to queues or limiters for traffic shaping. Match rules do not work with Quick enabled."
https://docs.netgate.com/pfsense/en/latest/firewall/floating-rules.html#match-action

fabrizior

@mind12 said in Playing with fq_codel in 2.4:

https://docs.netgate.com/pfsense/en/latest/firewall/floating-rules.html#match-action

For reals?
Why has this never come up in the many threads about how to configure the fq_codel limiter?

Also, after setting all this up, traceroute on router is working fine, but on LAN clients, traceroute is now only showing the final IP/FQDN for all hops unexpectedly.

What the hell did I break now?

mind12

@fabrizior
It has come up, I saw it also in this thread. I was also shocked that even in Tom's from Lawrence Systems video was a pass rule not a match.
Of course if there is no NAT that could match a connection they can't connect but the possibility is there.

Traceroute needs another floating rule to work if you use a limiter:
https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/814

fabrizior

@Pentangle
Thanks for the specific recommendations.

I've done as you suggest and dropped/rebuilt my limiters and firewall rules.

I did make two changes to the firewall rules though (just had to...) the WAN-IN rules for IPv6 and IPv4 I did as one, setting Address Family to IPv4+IPv6 - since there is no gateway definition needed, this seemed reasonable.
I also set to Action = Match vs Pass (and quick = not-ticked) all all floating rules for this config.

None of these changes (yours or my deviations herein) has made any difference in reducing my bufferbloat latency results.

I do see where this config provides benefit, keeping latencies consistent (such as they are), when under high load.

I tested under load by adding a 10GB download using 100 sockets set to consume 240Mbit/s. This ran for about 5m39s from another LAN client to a remote internet service while I ran the dslreports speedtest twice from my primary desktop (24-core, 64GB RAM, 2x1gE bonded NICs). The two tests I ran under load (w/ & w/o limiters) were both completed during the 10GB download period.

WAN Limiter bandwidth was set at 415Mbit/s (down) and 22 Mbit/s (up) for these tests.

Test                              Bandwidth       Latency
# Conditions                  (Down/Up Mbit/s)  (Down/Up ms)
1 lim: disabled, load:  no         452/22          89/65
2 lim: disabled, load:  no         460/24          64/66
3 lim: disabled, load:  no         472/24          68/66
4 lim: disabled, load:  no         426/25          62/67
5 lim:   415/22, load:  no         390/21          53/34
6 lim:   415/22, load: yes         116/16          42/33
7 lim: disabled, load: yes         191/21          96/62

Seems interesting that download latency was lower under load than without while the limiters were enabled. If this was at work... I would probably run multiple tests per each set of conditions and pull min/max/median/std.dev... but this data at least leads me to conclude that the limiters are functional and of benefit under load, even with the seemingly higher than desired latency floor.

Is this really bufferbloat, with the limiters turned on, or simply the lag to whatever testing servers dslReports is picking? no-load latency dropped substantially with the limiters turned on; just not as low as desired.

With regards to the loss of maximum bandwidth (for the sake of stable latency) am I correct in my understanding that with this baseline, I can re-test with higher limiter bandwidth settings to figure out how high I can crank it up before my latency numbers become undesirably unstable?

fabrizior

@mind12 said in Playing with fq_codel in 2.4:

https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/814

floating rule (pass, quick) for IPv4 ICMP traceroute is set, as is the one for ICMP echoreq/echorep.
LAN clients tracceroute still not working properly.
What's next in how to triage this?

@mind12 @pentangle @TheNarc Thanks a LOT for your time. I appreciate it.

mind12

@fabrizior
I have just tested it, it also failed on one of my windows clients.
I added the ICMP floating rule per the guide I sent you and it's working now:

Add quick pass floating rule to handle ICMP echo-request and echo-reply. This rule matches ping packets so that they are not matched by the limiter rules. See bug 9024 for more info.

Action: Pass
Quick: Tick Apply the action immediately on match.
Interface: WAN
Direction: any
Address Family: IPv4
Protocol: ICMP
ICMP subtypes: Echo reply, Echo Request
Source: any
Destination: any
Description: limiter drop echo-reply under load workaround
Click Save

Pentangle

@mind12 You are of course correct - typo from me, should be "match"

Pentangle

@mind12 "Traceroute needs another floating rule to work if you use a limiter:"

This is why I chose tcp/udp rather than "any", since it was a problem with ICMP.

Pentangle

@fabrizior "None of these changes (yours or my deviations herein) has made any difference in reducing my bufferbloat latency results."

I did fear that that would be the case, since with a 450/20 connection you would have to be going some to continually saturate your internet connection. In which case, if your pfSense throughput isn't going anywhere near capacity whilst you're testing then the issue lies upstream of you, and there's not a lot you can do about that aside from shouting at your ISP.

Pentangle

@fabrizior "Is this really bufferbloat, with the limiters turned on, or simply the lag to whatever testing servers dslReports is picking? no-load latency dropped substantially with the limiters turned on; just not as low as desired."

It's bufferbloat, just not YOUR bufferbloat. Your internet connection between you and the next hop (which is likely to be the DSLAM or equivalent fibre concentrator in your ISP's phone exchange) is not now suffering bufferbloat to any great extent, but the pipe connecting the phone exchange to the wider internet is likely to have latency or oversubscription issues. If you can find a pingable next hop that doesn't incorporate all that additional latency then you could test to the nth degree, but for now it's probably pretty pointless.

"With regards to the loss of maximum bandwidth (for the sake of stable latency) am I correct in my understanding that with this baseline, I can re-test with higher limiter bandwidth settings to figure out how high I can crank it up before my latency numbers become undesirably unstable?"

You can test as much as you like. You will find that as you start specifying bandwidth limits that are close to the maximum the line can reliably tolerate, your latency will go down the toilet, so it's good to get to that point and then crank it back a notch so that you're never hitting that scenario.

mind12

@pentangle
I have used your configuration and achieved A grade. Thank you.

The DSL reports site gives me much lower speeds than speedtest.net with the same configuration. So I searched for another test site and found this. Got similar speeds like on speedtest.net and the same A grade.
https://www.waveform.com/tools/bufferbloat

Which configuration should I modify to achieve even better results?
My connection is a 150/10.

bolmsted

Can the developers comment when/if we will see support for BBR TCP Congestion Control in pfSense/OPNsense? I understand that it may be a feature of FreeBSD 13.x and if HardenedBSD is based on the same release level we should expect this to show up in HBSD 13.x as well when OPNsense integrates HBSD 13.x?

There is nothing official on this but I came across links like this
https://fasterdata.es.net/host-tuning/freebsd/
https://lists.freebsd.org/pipermail/freebsd-current/2020-April/075930.html

The FreeBSD list doesn't really refer to which version of FreeBSD with the non CC method.

Linux has this integrated for quite some time but there is no elegant front end interface like pfSense or OPNsense for Linux - too bad it wasn't ported to Linux as well with nftables backend. If I build on linux I have to do all the setup from scratch link DNS, DHCP, firewalls, etc and not a big fan of the debian model for network configs and would much prefer the RHEL model.

I have a gigabit internet connection and they sync the GPON at 2.5Gbps which means that on upload I can actually exceed the provisioned upload rate causing packet loss, packet retransmissions, and performance degradation - thus not getting the provisioned upload rate.

I have 1Gbps/750Mbps but I should be able to get 1G/1G since they overprovision the line and right now I get 930/660 usually but if I switch to 1Gbps sync then I get 700/930 so it flips around so it seems realistic I should be able to get 1G/1G if I could prevent flooding using BBR on FreeBSD/HBSD under pfSense/OPNsense. I may consider 1.5G/1G package one day but 1G plan is good for me especially if I could get full upload rate.

Some folks on a forum I hang out show results using Linux with BBR when syncing at 1Gbps and 2.5Gbps on a 1.5Gbps/1000Gbps plan.

This is what it looks like for me under various conditions. performance loss is affected by latency.

1000 (No Throttle) Latency Down Up
Bell Alliant - Halifax (3907) 24 792 919
Bell Canada - Montreal (17567) 1 901 933
Bell Canada - Toronto (17394) 8 765 933
Bell Mobility - Winnepeg (17395) 30 874 913
Bell Mobility - Calgary (17399) 47 782 898

2500 (No Throttle) Latency Down Up
Bell Alliant - Halifax (3907) 24 1564 358
Bell Canada - Montreal (17567) 1 1638 933
Bell Canada - Toronto (17394) 8 1619 771
Bell Mobility - Winnepeg (17395) 30 1569 324
Bell Mobility - Calgary (17399) 47 1570 236

2500 (Throttled 1000 up) Latency Down Up
Bell Alliant - Halifax (3907) 24 1558 915
Bell Canada - Montreal (17567) 1 1650 931
Bell Canada - Toronto (17394) 8 1620 931
Bell Mobility - Winnepeg (17395) 30 1569 885
Bell Mobility - Calgary (17399) 47 1600 890

2500 (Throttled 1000 up +bbr ) Latency Down Up
Bell Alliant - Halifax (3907) 24 1541 922
Bell Canada - Montreal (17567) 1 1639 940
Bell Canada - Toronto (17394) 8 1583 936
Bell Mobility - Winnepeg (17395) 30 1617 928
Bell Mobility - Calgary (17399) 47 1547 916

Pentangle

@mind12 I'd say you would now need to benchmark your ISP. By this I mean that your connection to the internet does not just consist of your connection to the hole in your wall, but also the structure of your ISP's network and its peering to the internet. As such, you may want to bring down your limiter bandwidth settings, just by maybe 5mbit/s each, and see again whether your latency increases by roughly +21ms and +8ms again. If it's constant then there's nothing much else you can do on your end of the link and that latency increase may be due to whatever's in your ISP's network or how they deal with your packets when they get them. Once you've done this you can slowly increase the bandwidth settings again until you get to a point whereby the latency goes through the roof. This would then be the speed at which you can send through your pfsense instance and you should then drop it back a bit to ensure your limiters can do their stuff as much as possible. Then once you've finished your testing you should be able to see whether your ISP is adding a reasonable amount of latency or not. Personally, from here I wouldn't be too unhappy with a 23ms upstream latency but a 36ms downstream latency looks a little suboptimal (although a lot depends upon a number of factors probably beyond your control).

Gertjan

@pentangle : your keyboard seems broken ...

Pentangle

@gertjan said in Playing with fq_codel in 2.4:

@pentangle : your keyboard seems broken ...

In what way?

mind12

@pentangle Thanks I will try this.
Those results are not that bad at all, I just would like to maximally optimize my limiter.

What if it doesnt change by lowering the bandwidth?

How will I know if my limiter is the bottleneck? The other advanced parameters could also affect the results right? Queue length, limit flows etc.

Pentangle

@mind12 If the results don't change then you are within your bandwidth limits. The idea being that the limiters and FQ_CoDel need a certain amount of 'headroom' in order to operate (shuffling smaller packets to the front of the queue, etc), and so it'll operate well until it doesn't have enough 'headroom' to play with. You need to determine that headroom (bandwidth limit) and the easiest way to do it is to edge the bandwidth limits up until latency takes a nosedive at which point you know that your FQ_CoDel's efficiency is being impaired by the amount of headroom it has to play with, so you dial it back a notch and voila - the fastest throughput you could get whilst retaining low latency.
As regards other limits, I suggest you do a little reading on what those do for you. The settings I gave should be more than adequate for your connection - they work well with my 300/50 connection here.

fabrizior

@mind12 I drove myself nuts trying to tune the limiter & queue knobs at various limiter bandwidth settings on a 400/25 mbps service with and without load (100 sockets generating ~30+ MB/s download throughput).

What I’ve learned from folks here (thank you):

The only setting that induced definitive change in my test results was the bandwidth limit. The other settings recommended seem sufficiently high to allow proper functionality under load while varying the limiter bandwidth as suggested to identify the required headroom.

The general consensus seems to be that, once configured for appropriate headroom based on your provisioned rates, any variable and higher-than-desired latency results are likely induced somewhere upstream with the ISP’s equipment suffering from bufferbloat (or over-provisioning).

If speed tests and bufferbloat latency numbers stay fairly stable with and without significant load running in parallel with the test, then your side of things is well-tuned.

@pentangle said in Playing with fq_codel in 2.4:

@mind12 If the results don't change then you are within your bandwidth limits. The idea being that the limiters and FQ_CoDel need a certain amount of 'headroom' in order to operate (shuffling smaller packets to the front of the queue, etc), and so it'll operate well until it doesn't have enough 'headroom' to play with. You need to determine that headroom (bandwidth limit) and the easiest way to do it is to edge the bandwidth limits up until latency takes a nosedive at which point you know that your FQ_CoDel's efficiency is being impaired by the amount of headroom it has to play with, so you dial it back a notch and voila - the fastest throughput you could get whilst retaining low latency.
As regards other limits, I suggest you do a little reading on what those do for you. The settings I gave should be more than adequate for your connection - they work well with my 300/50 connection here.

MoonKnight

Got this test without playing with fq_codel
I do have 500/500 connection

Pentangle

@ciscox With a 500/500 connection, unless you're regularly maxing it out (unlikely) then you might not find shaping is necessary for you.