QoS / Traffic Shaping / Limiters / FQ_CODEL on 22.05

emikaadeo

My current anti-bufferbloat config is a combination of these two guides:
https://docs.netgate.com/pfsense/en/latest/recipes/codel-limiters.html
https://isc.sans.edu/forums/diary/Securing+and+Optimizing+Networks+Using+pfSense+Traffic+Shaper+Limiters+to+Combat+Bufferbloat/27102/
It gives me an A/A+ score on https://www.waveform.com/tools/bufferbloat with a 400 Mbit/s down / 15 Mbit/s up ISP line.

My config:

WAN Upload Limiter

Firewall > Traffic Shaper > Limiters > New Limiter
• Enable > Enable limiter and its children [X]
• Name: WANUpload
• Bandwidth: 95% of your ISP upload speed
• Mask: None
• Description: empty
• Queue Management Algorithm: Tail Drop
• Scheduler: FQ_CODEL
• Queue length: 1000
• ECN: [X]
target: 5
interval: 100
quantum: 300
limit: 10240
flows: 1024

WAN Upload Limiter Queue

• Enable > Enable this queue [X]
• Name: WANUploadQ
• Mask: None
• Description: empty
• Queue Management Algorithm: Tail Drop
• Queue Length: empty
• ECN: [ ]

WAN Download Limiter

Firewall > Traffic Shaper > Limiters > New Limiter
• Enable > Enable limiter and its children [X]
• Name: WANDownload
• Bandwidth: 95% of your ISP download speed
• Mask: None
• Description: empty
• Queue Management Algorithm: Tail Drop
• Scheduler: FQ_CODEL
• Queue length: 1000
• ECN: [X]
target: 5
interval: 100
quantum: 300
limit: 10240
flows: 1024

WAN Download Limiter Queue

• Enable > Enable this queue [X]
• Name: WANDownloadQ
• Mask: None
• Description: empty
• Queue Management Algorithm: Tail Drop
• Queue Length: empty
• ECN: [ ]
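
For anyone who wants to double-check the "95% of your ISP speed" values, here is a minimal sketch of the arithmetic. The helper name shaped_bandwidth is made up for illustration, and the 400/15 figures are assumed to be in Mbit/s.

```python
def shaped_bandwidth(isp_mbit: float, fraction: float = 0.95) -> float:
    """Return the limiter bandwidth in Mbit/s for a given ISP line rate.

    Hypothetical helper: it only reproduces the 95% rule of thumb above.
    """
    if isp_mbit <= 0 or not 0 < fraction <= 1:
        raise ValueError("rate must be positive and fraction must be in (0, 1]")
    return round(isp_mbit * fraction, 2)


# Line from this post, assumed to be 400 Mbit/s down and 15 Mbit/s up.
download = shaped_bandwidth(400)
upload = shaped_bandwidth(15)
print(f"WANDownload: {download} Mbit/s, WANUpload: {upload} Mbit/s")
```

If the Bandwidth field does not accept a fractional value, entering the equivalent in a smaller unit is one option (an assumption; check what the GUI allows on your version).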

Floating Rule #1

Firewall > Rules > Floating

• Action: Pass
• Quick: Apply the action immediately on match [X]
• Interface: WAN
• Direction: out
• Address Family: IPv4
• Protocol: ICMP
• ICMP subtypes: Traceroute
• Source: any
• Destination: any
• Description: Traceroute routing workaround
• Advanced Options > Gateway: WAN_DHCP

Floating Rule #2

• Action: Pass
• Quick: Apply the action immediately on match [X]
• Interface: WAN
• Direction: out
• Address Family: IPv4
• Protocol: ICMP
• ICMP subtypes: Echo reply, Echo request
• Source: any
• Destination: any
• Description: Limiter drop ping traffic under load workaround
• Advanced Options > Gateway: WAN_DHCP

Floating Rule #3

• Action: Pass
• Quick: Apply the action immediately on match [X]
• Interface: WAN
• Direction: out
• Address Family: IPv4
• Protocol: any
• Source: WAN Address
• Destination: any
• Description: WAN CoDel Limiters
• Advanced Options:
Gateway: WAN_DHCP
In Pipe: WANUploadQ
Out Pipe: WANDownloadQ
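
The order of the three floating rules matters because all of them have Quick set, so the first matching rule decides. The sketch below is a toy model of that first-match behaviour, not pf itself: the FloatingRule class, the ICMP type labels, and first_match are invented for illustration, and the source/destination fields are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FloatingRule:
    description: str
    protocol: str                          # "icmp" or "any"
    icmp_types: frozenset = frozenset()    # empty means "all ICMP types"
    in_pipe: Optional[str] = None          # None means "no limiter"
    out_pipe: Optional[str] = None


# Same order as Floating Rule #1, #2, #3 above.
RULES = [
    FloatingRule("Traceroute routing workaround", "icmp",
                 frozenset({"traceroute"})),
    FloatingRule("Limiter drop ping traffic under load workaround", "icmp",
                 frozenset({"echoreq", "echorep"})),
    FloatingRule("WAN CoDel Limiters", "any",
                 in_pipe="WANUploadQ", out_pipe="WANDownloadQ"),
]


def first_match(protocol: str, icmp_type: Optional[str] = None):
    """Return the first rule that matches, mimicking 'quick' evaluation."""
    for rule in RULES:
        if rule.protocol not in ("any", protocol):
            continue
        if rule.icmp_types and icmp_type not in rule.icmp_types:
            continue
        return rule
    return None


print(first_match("tcp").description)               # lands in the limiter rule
print(first_match("icmp", "echoreq").description)   # ping bypasses the limiter
```

In this model, ping and traceroute are passed by rules #1 and #2 without a pipe, and everything else falls through to rule #3 and is queued in WANUploadQ / WANDownloadQ.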

TheNarc

@jackyaz To clarify, you're referring to the queue length of the child queue of the limiter, not the queue length of the limiter itself, right? As I understand it, the latter is ignored when the scheduler is fq_codel? And what is the pipe limit system tunable you refer to? I looked but it wasn't obvious to me by name, and I couldn't seem to find it by googling. Thanks for your response!

jackyaz

@thenarc I'm referring to the limiter itself, but I may be wrong; I haven't done any testing against lower limits on my 1000/50 connection.

in the docs from here: https://docs.netgate.com/pfsense/en/latest/recipes/codel-limiters.html

Queue Length
Can vary depending on the speed of the link, but 1000 should be a safe default for most high speed WANs (100Mbit/s). For very high speed WANs (e.g. 1Gbit/s+), consider increasing further to 3000-5000.

The tunable is referenced on https://docs.netgate.com/pfsense/en/latest/trafficshaper/limiters.html

Tip

In cases where there are several limiters or limiters with large Queue Size values, a System Tunable may need to be set to increase the value of net.inet.ip.dummynet.pipe_slot_limit above the total number of configured queue slots among all pipes and queues.
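
A rough way to apply that tip is to add up the configured queue lengths and compare the total with the tunable. The sketch below follows the wording of the tip only. The function names are hypothetical, and both the 50-slot default for queues left empty and the current value of 100 are assumptions, so read the real value from your own system.

```python
TUNABLE = "net.inet.ip.dummynet.pipe_slot_limit"


def total_queue_slots(queue_lengths, default_slots=50):
    """Sum the queue slots of all pipes and queues.

    None stands for a Queue Length left empty in the GUI; default_slots is an
    assumed default, not a value confirmed in this thread.
    """
    return sum(default_slots if q is None else q for q in queue_lengths)


def tunable_needs_raising(current_limit, queue_lengths, default_slots=50):
    """True when the tunable is not above the total, per the tip's wording."""
    return current_limit <= total_queue_slots(queue_lengths, default_slots)


# Two limiters with Queue length 1000 and two child queues left empty,
# as in the first post of this thread.
lengths = [1000, None, 1000, None]
total = total_queue_slots(lengths)
print(f"total slots: {total}")
if tunable_needs_raising(100, lengths):   # 100 is an assumed current value
    print(f"consider setting {TUNABLE} above {total}")
```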

TheNarc

@jackyaz Thanks for the additional information. I tried these settings on my end and didn't notice any difference, but I'm beginning to believe this is my ISP and not pfSense. I've got a consumer router to try swapping out later today; I just have to wait a bit so as not to knock out other people in the house working from home. I'll provide an update with those test results as soon as I have them.

TheNarc

@thenarc Well, I ran some tests with an old Trendnet router I had around and the results were the same. It has some rudimentary bandwidth limiting too, and I'm finding that whether I use it or pfSense, I can limit my 400Mbps downstream all the way to 50Mbps or less and I still get catastrophic latency spikes (up to 1s or more) if I run a multi-stream download test (again, I'm just using the fast.com test with the parallel connections maxed out at 30). I'm not really certain how to diagnose further: could it be my cable modem, for example (it's an SB6180, which is not a Puma6 modem), or just crappy ISP configuration? In either case, I'm satisfied based on this testing that it's not pfSense, but I still have no solution :)

luckman212

@emikaadeo Thanks for posting this. I don't know what's wrong, but I've been at this all day today and gotten nowhere.

Can someone explain how the below is even possible?

As a test I set my bandwidth to 25Mbit/s to see if the limiter was even working at all...

# cat /tmp/rules.limiter

pipe 1 config  bw 25Mb droptail
sched 1 config pipe 1 type fq_codel target 5ms interval 100ms quantum 300 limit 10240 flows 1024 ecn
queue 1 config pipe 1 droptail

pipe 2 config  bw 25Mb droptail
sched 2 config pipe 2 type fq_codel target 5ms interval 100ms quantum 300 limit 10240 flows 1024 ecn
queue 2 config pipe 2 droptail

And yet...
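
One way to rule out a typo in the generated rules is to parse the /tmp/rules.limiter output and compare it with what was entered in the GUI. This is a minimal sketch that only understands the line shapes pasted above; parse_limiter_rules is a made-up helper, not a pfSense or dummynet tool.

```python
RULES_LIMITER = """\
pipe 1 config  bw 25Mb droptail
sched 1 config pipe 1 type fq_codel target 5ms interval 100ms quantum 300 limit 10240 flows 1024 ecn
queue 1 config pipe 1 droptail

pipe 2 config  bw 25Mb droptail
sched 2 config pipe 2 type fq_codel target 5ms interval 100ms quantum 300 limit 10240 flows 1024 ecn
queue 2 config pipe 2 droptail
"""


def parse_limiter_rules(text):
    """Return (pipes, scheds): bandwidth per pipe and fq_codel params per sched."""
    pipes, scheds = {}, {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[2] != "config":
            continue  # skip blank or unrecognised lines
        kind, number = tokens[0], int(tokens[1])
        if kind == "pipe":
            # The token after "bw" is the rate, e.g. "25Mb".
            pipes[number] = tokens[tokens.index("bw") + 1]
        elif kind == "sched":
            params = {}
            for key in ("target", "interval", "quantum", "limit", "flows"):
                params[key] = tokens[tokens.index(key) + 1]
            params["ecn"] = "ecn" in tokens
            scheds[number] = params
    return pipes, scheds


pipes, scheds = parse_limiter_rules(RULES_LIMITER)
print(pipes)
print(scheds[1])
```

This only confirms that the file says what the GUI was told. It does not show whether traffic is actually hitting the pipes.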

luckman212

Just read through about 9 other threads reporting various breakage with ipfw limiters on 2.6 / 22.0x

Before I lose another day, @jimp or @stephenw10 is it the case that limiters are bugged on the latest builds of pfSense? Specifically for multi-wan setups with gateway groups? It would be nice to know, otherwise if the answer is "no, everything works fine" then I will keep trying or maybe even buy TAC to figure this out because it is driving me nuts.

Bob.Dig

@luckman212 Make sure you have IPv6 disabled on your machine, otherwise the test will not work correctly.

thiasaef

@luckman212 said in QoS / Traffic Shaping / Limiters / FQ_CODEL on 22.0x:

is it the case that limiters are bugged on the latest builds of pfSense? Specifically for multi-wan setups with gateway groups?

They seem to work fine on my multi-wan setup with gateway groups.

Before I lose another day

Maybe throw in a quick downgrade to 2.4.5-p1 just to be sure?

luckman212

@bob-dig said:

@luckman212 Make sure you have IPv6 disabled on your machine, otherwise the test will not work correctly.

Yes at this time I don't have IPv6 enabled at all.

@thiasaef said:

Maybe throw in a quick downgrade to 2.4.5-p1 just to be sure?

I can try that but it's a fair bit of work since my config has changed a lot since 2.5.x/22.x was released, and the configs are not backwards-compatible. So before doing that I'd like to know if I'm barking up the wrong tree here. Since you say it works for you, would you mind sharing how you've got it configured?

tman222

I'm not sure how helpful this will be, but I've got two separate locations, both on 1Gbit/s FiOS circuits, running pfSense 22.01 with limiters + FQ-CoDel configured. No issues at either site. The instructions I followed for the limiter setup are the ones originally posted in the large FQ-CoDel thread:

https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/814

The main difference I suppose is that I've only got the one FiOS connection at either location (i.e. no multi-wan or gateway groups configured).

Hope this helps.

thiasaef

@luckman212 said in QoS / Traffic Shaping / Limiters / FQ_CODEL on 22.0x:

would you mind sharing how you've got it configured?

Settings:

Firewall > Traffic Shaper > Limiters:
- WAN1Down/WAN1DownQ
- bandwidth: 265Mbps
- Queue mgmt algo: Tail Drop
- Scheduler: FQ_CODEL (5/100/1514/10240/8192)
- Queue length: empty
- ECN: not checked
Firewall > Rules > Floating:
- Action: Match
- Quick: unchecked
- Interface: WAN1
- Direction: out
- Family: IPv4
- Protocol: any
- Source: WAN1 address
- Dest: Any
- Gateway: WAN1
- In/Out Pipe: WAN1UpQ / WAN1DownQ

but it also works when I apply your exact settings (except for the different bandwidth).
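
To compare the two setups, the shorthand 5/100/1514/10240/8192 can be expanded into named parameters. The field order target/interval/quantum/limit/flows is an assumption based on the order the fields appear in the first post, and expand_fq_codel and diff_params are hypothetical helpers.

```python
FIELDS = ("target", "interval", "quantum", "limit", "flows")


def expand_fq_codel(shorthand: str) -> dict:
    """Turn 'target/interval/quantum/limit/flows' into a dict of ints."""
    values = [int(v) for v in shorthand.split("/")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def diff_params(a: dict, b: dict) -> dict:
    """Return only the fields whose values differ, as (a, b) pairs."""
    return {k: (a[k], b[k]) for k in FIELDS if a[k] != b[k]}


first_post = expand_fq_codel("5/100/300/10240/1024")    # values from the first post
this_post = expand_fq_codel("5/100/1514/10240/8192")    # values from this post
print(diff_params(first_post, this_post))
```

Under that assumed field order, the two configurations differ only in quantum and flows.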

@luckman212 said in QoS / Traffic Shaping / Limiters / FQ_CODEL on 22.0x:

triggering failover to my 4G LTE backup connection which does not have any shaper applied

As a side note, I also have a shaper on my 4G LTE backup that works wonders in terms of latency under load.

Bob.Dig

@luckman212 I did it exactly like what you already posted in your first post.

luckman212

@thiasaef What version of pfSense are you running there? Do you use gateway groups? What's your System > Routing > default gw IPv4 set to?

thiasaef

What version of pfSense are you running there?

2.6.0-RELEASE (amd64)

Do you use gateway groups?

What's your System > Routing > default gw IPv4 set to?

jimp

@luckman212 said in QoS / Traffic Shaping / Limiters / FQ_CODEL on 22.0x:

Just read through about 9 other threads reporting various breakage with ipfw limiters on 2.6 / 22.0x

Before I lose another day, @jimp or @stephenw10 is it the case that limiters are bugged on the latest builds of pfSense? Specifically for multi-wan setups with gateway groups? It would be nice to know, otherwise if the answer is "no, everything works fine" then I will keep trying or maybe even buy TAC to figure this out because it is driving me nuts.

There is a known issue with limiters if you also have Captive Portal enabled but that's the only problem I'm aware of at the moment:

https://redmine.pfsense.org/issues/12954

It's working fine for me on multi-WAN on my edge at home with this setup:

https://docs.netgate.com/pfsense/en/latest/recipes/codel-limiters.html

luckman212

@jimp Are you running 22.05 snaps on that system? Any possible chance you'd share a sanitized config.xml with me?

jimp

22.05 snapshot, yes, but I haven't updated that system in a couple weeks, it's on a snapshot from the 14th.

No need to share config, it's exactly as described on the docs page I linked. I wrote that based on the config I have been using successfully for months. Only difference is maybe the queue lengths since I have two fast WANs (1Gbit/s and 300Mbit/s) though I don't use limiters on my 1Gbit/s WAN since it's not necessary. I have to use the codel setup on my 300/30 WAN or the performance is crap under load.

A couple common mistakes people make:

• Do not over-match with the floating rules. Outbound floating rules happen after NAT so the source can only be the IP address(es) on that interface, or perhaps routed IP address blocks if you have any. Don't use a source of 'any', private addresses, or the address of other WANs. For most people the best source to use is the interface address.
• Don't re-use limiters for multiple interfaces/purposes. You should have one upload limiter+queue and one download limiter+queue for each WAN.
• Some people might need or want to exclude ICMP traffic from being put in limiters. It can mess with traceroute results and maybe give a false sense of latency that doesn't really exist. That said, any traffic not put through the limiter will potentially mess with how accurate the limiter can be when it comes to knowing how full a circuit is.
• Use large enough queue lengths on the limiter to hold any potential backlog. On my 300/30 WAN I'm using a queue length of 3000 on the limiter (parent) and I've left the default on the queues. Might be overkill, but it works for me.
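
The first two mistakes in the list above can be checked mechanically. The sketch below is a hedged illustration, not a pfSense feature: lint_limiter_rules and the rule dictionaries are invented, and the allowed sources per interface (interface address plus any routed blocks) have to be supplied by hand.

```python
from collections import defaultdict


def lint_limiter_rules(rules, allowed_sources):
    """Flag over-matching sources and limiters shared between interfaces.

    rules: dicts with 'interface', 'source', 'in_pipe', 'out_pipe'.
    allowed_sources: interface name -> set of acceptable source labels.
    """
    warnings = []
    pipe_users = defaultdict(set)
    for rule in rules:
        iface = rule["interface"]
        allowed = allowed_sources.get(iface, set())
        if rule["source"] not in allowed:
            warnings.append(
                f"{iface}: source {rule['source']!r} over-matches; "
                f"expected one of {sorted(allowed)}"
            )
        for pipe in (rule["in_pipe"], rule["out_pipe"]):
            pipe_users[pipe].add(iface)
    for pipe, ifaces in sorted(pipe_users.items()):
        if len(ifaces) > 1:
            warnings.append(
                f"{pipe}: shared by {sorted(ifaces)}; use one limiter+queue per WAN"
            )
    return warnings


# Hypothetical two-WAN config with both mistakes on the WAN2 rule.
rules = [
    {"interface": "WAN1", "source": "WAN1 address",
     "in_pipe": "WAN1UpQ", "out_pipe": "WAN1DownQ"},
    {"interface": "WAN2", "source": "any",
     "in_pipe": "WAN1UpQ", "out_pipe": "WAN2DownQ"},
]
allowed = {"WAN1": {"WAN1 address"}, "WAN2": {"WAN2 address"}}
warnings = lint_limiter_rules(rules, allowed)
for w in warnings:
    print(w)
```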

TheNarc

@jimp Thanks for the common mistakes bullet points; in particular I don't recall having seen the limiter queue length guidance before so that's especially useful. Quick question on the floating rules: for a basic single-WAN setup is there still a compelling reason to match on WAN out and WAN in as opposed to LAN in and WAN in? I certainly understand that with multi-WAN you'd lose the granularity required to assign one limiter per WAN by matching on LAN in. But with single-WAN - and especially if ovpn client tunnels are in use - it has seemed more straightforward to me to match on LAN in. Probably a dumb question, but hoping to understand whether doing so may be problematic in a way I don't understand. Thanks again.

jimp

If you only have one WAN and one LAN and no VPNs then matching in on LAN may be OK. One of the main reasons to do it on WAN outbound is because there is no chance you are catching local traffic in the limiter (to/from the firewall, to/from other LANs, VPNs, other unrelated WANs, etc) -- there is a ton of room for error there so for most people it's much easier to take care of it outbound on WAN instead.

Sure, you can set up a lot more rules to pass to the other destinations without the limiter, but you end up adding so much extra complexity that it's just not worth the effort to avoid using floating rules when they're a much cleaner solution.
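
The difference between matching inbound on LAN and outbound on WAN can be shown with a toy model. Everything here is made up for illustration: the flow list, the interface names LAN2 and OVPN, and the two matcher functions. It also ignores that encapsulated tunnel packets themselves leave via WAN.

```python
# Each flow records where the packet enters and where it leaves the firewall.
# egress None means the traffic terminates on the firewall itself.
FLOWS = [
    {"name": "LAN host -> internet", "ingress": "LAN", "egress": "WAN"},
    {"name": "LAN host -> firewall GUI", "ingress": "LAN", "egress": None},
    {"name": "LAN host -> other LAN", "ingress": "LAN", "egress": "LAN2"},
    {"name": "LAN host -> VPN tunnel", "ingress": "LAN", "egress": "OVPN"},
]


def matched_by_lan_in(flow):
    """A limiter rule matching inbound on LAN sees everything entering LAN."""
    return flow["ingress"] == "LAN"


def matched_by_wan_out(flow):
    """A floating rule matching outbound on WAN sees only traffic leaving WAN."""
    return flow["egress"] == "WAN"


lan_in_hits = [f["name"] for f in FLOWS if matched_by_lan_in(f)]
wan_out_hits = [f["name"] for f in FLOWS if matched_by_wan_out(f)]
print("LAN in :", lan_in_hits)
print("WAN out:", wan_out_hits)
```

In this model the LAN-inbound match also catches local and VPN-bound traffic, which is what the extra pass rules would have to exclude, while the WAN-outbound match only catches traffic actually headed out the WAN.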