25.03 beta - Bufferbloat / FQ CoDel issues
-
@RobbieTT
https://www.waveform.com/tools/bufferbloat
And what does it show here?
-
As mentioned, it still gives me an A+ but the score does not reflect the issues now seen at higher flows:
It's one of the aspects that confused me until I worked out the limitations of this site (at least using it from here in the UK).
-
@RobbieTT
Hmm, interesting, really.
Have you tested it on 24.11 already? I mean the Apple networkQuality tool.
-
Not that recently, but all was OK back then, so I didn't appreciate the differing flow-generation capabilities between it and the online tools, as they all gave similar results. I guess you don't look that hard when all is well.
The Apple / IETF tool came with macOS Monterey, so it's been around for a few years now. I was still rocking an EdgeRouter back then and it did a pretty good job with PPPoE and fq_codel, so not much to see.
Looking into my current issue in a bit more detail, I can see that it is only noticeable in the real world when there is heavy traffic and many flows in both directions (i.e. simultaneously). Running tests sequentially shows that upload is more impacted than download.
Running pure download I get full bandwidth, low latency and good responsiveness scores. That gives me something to focus on tomorrow. Of course, simultaneous tests are not really reflected in the online bufferbloat tests. Another reason why my real-world performance is bad and yet I get a reassuring A+ on waveform.com.
Wish I had more bandwidth to throw around or at least a symmetrical service...
-
@RobbieTT
I see something similar only on a wireless connection, but it's always been like that. I just tested fast.com with 16 streams, and the jitter didn't exceed 7 ms on the wired connection. This was without any limiters applied; I'll test it later with limiters as well.
But I think that for my 1 Gbps symmetrical connection, even 16 or 30 streams may not be enough to fully saturate it. It probably requires something like 160 streams, and I don't see any way to achieve that, as I don't have any Apple devices anyway.
Edit:
This is what I see with fast.com set to 30 connections. Drops are only on the upload pipe.
-
@w0w
Similar results on fast.com for me, with my normal fq_codel settings. There is a drop in throughput between 8 and 16 streams though. Not that I find fast.com to be particularly trustworthy, as it sometimes reports throughput well beyond my max bandwidth.
16 streams:
8 streams:
I think the main issue I have is only apparent whilst at (or near) being fully loaded in both directions; fast.com only tests sequentially rather than simultaneously, so it isn't enough of a trigger. My bandwidth is quite asymmetric but it is all I can get.
The old PPPoE backend seems to cope better when tapping on the upload and download limits at the same time, albeit it took a fast CPU to cope with the load on a single core; my Netgate 6100 would struggle with this but it was pretty easy for my Xeon system.
Perhaps if_pppoe has an issue that only manifests on simultaneous loads as it shares the workload across multiple cores, or perhaps the fq_codel implementation is now running into issues with PPPoE on multiple cores/flows/directions?
-
@RobbieTT
Your fast.com settings are just too weak. Here's how I use it:
But of course, I admit that it's much easier to run into bufferbloat issues on a 100 Mbps connection. I also assume that it’s enough to overload a 100 Mbps upstream channel for bufferbloat to become noticeable.
By the way, what are your shaper settings? What does Diagnostics – Limiter Info show?
And what about the power-saving settings? They were changed for newer hardware in version 23.05, weren't they?
-
Working fast.com harder doesn't really change my results. Presumably because the download and upload sessions are sequential:
During the fast.com run above, my limiters looked like this for download:
And for upload:
Going through the data I think tweaking the upload bandwidth down on my fq_codel settings may help for simultaneous upload+download sessions. I can only refine that on the Apple / IETF tool though.
Yes, the power saving was changed in 23.x and 24.x. 25.03 also had an Intel microcode change, but I have not looked into the details. Either way, the sleep settings are not a factor and the CPU isn't working that hard throughout the tests. I could be hitting a NIC limitation, but both of the relevant NICs are reasonably competent and should have margin to spare.
-
@RobbieTT
Yeah, interesting...
If possible, I'd repeat the tests on version 24.11. Do you still have an old boot environment? Just in case the issue turns out to be caused by some changes on the provider's side.
-
@w0w
Ok, switched back to 24.11 and ran the Apple tool again:
rob@Smaug ~ % networkQuality
==== SUMMARY ====
Uplink capacity: 90.237 Mbps
Downlink capacity: 805.436 Mbps
Responsiveness: High (33.661 milliseconds | 1782 RPM)
Idle Latency: 12.625 milliseconds | 4752 RPM
rob@Smaug ~ %
The Responsiveness score returns to 'High' again.
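For anyone comparing runs, the RPM figure in that output is just the latency expressed as round trips per minute (60 seconds divided by the round-trip time). A minimal Python sketch of the conversion; the function name is made up for illustration and is not part of the Apple tool:

```python
def latency_ms_to_rpm(latency_ms: float) -> float:
    """Convert a round-trip time in milliseconds to round trips per minute."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    # 60,000 ms in a minute, so RPM = 60000 / RTT in ms.
    return 60_000.0 / latency_ms


# Figures taken from the 24.11 networkQuality run quoted above.
for label, ms in (("Responsiveness", 33.661), ("Idle latency", 12.625)):
    print(f"{label}: {ms} ms = {latency_ms_to_rpm(ms):.0f} RPM")
# Responsiveness: 33.661 ms = 1782 RPM
# Idle latency: 12.625 ms = 4752 RPM
```

Both values match the RPM numbers the tool printed, so either unit can be used when comparing 24.11 against 25.03.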
I find it perplexing that the older firmware with single-core PPPoE is, in this regard, working better than multiple cores with if_pppoe. It was a valid idea to double check again though.
Edit: Scratch the above for now as I think I found a misplaced patch being applied when it should not have been. This may have polluted my real-world experience and the testing....
-
I'm also starting to recall and analyze a bit what's going on with these traffic limiters. It's actually quite interesting that I'm seeing packet drops on the PPPoE upload, even though I haven’t set any actual bandwidth limit. It's configured to the maximum. Still, under load—though it's actually below 1 Gbit/s—I’m seeing drops specifically on the upload, on PPPoE using the new backend. I haven’t tested it yet on the old backend. However, I did test it on the second provider (which is behind triple NAT through ROOter using a 5G mobile network). Yes, I have Multi-WAN, but the second provider is only used for failover. So... either I didn’t notice, or under the same test conditions as before, I’m not seeing any drops at all on the second WAN, which is ~200/~50Mbit/s. Obviously, the same limiters are in place, and the bandwidth cap is still 1 Gbit/s, but logically, it shouldn't be active in either case, right?
Edit: just tested using the old PPPoE backend, same drops on the upload pipe.
-
@w0w
Some of your fq_codel settings are really demanding though. With a usual latency variance over the internet of around ±1ms or more (when unloaded) and a usual fq_codel target of 5ms, you have a setting of 1µs. That's quite brutal I guess and probably more suited to use inside a data centre than over the net.
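The gap between a 1µs target and the usual values can be checked mechanically. A rough Python sketch, assuming the RFC 8290 defaults of a 5 ms target and a 100 ms interval; the helper name and the factor-of-ten threshold are arbitrary choices for illustration, not anything pfSense itself does:

```python
# Defaults from RFC 8290 (FQ-CoDel), in microseconds.
# The 5 ms target matches the value quoted in this thread.
RFC8290_DEFAULTS_US = {"target": 5_000, "interval": 100_000}


def suspicious_params(configured_us: dict[str, int], factor: float = 10.0) -> list[str]:
    """Return the parameters that are zero/negative or more than
    `factor` times away from the RFC default (hypothetical threshold)."""
    flagged = []
    for name, default in RFC8290_DEFAULTS_US.items():
        value = configured_us.get(name)
        if value is None:
            continue  # parameter not supplied, nothing to judge
        if value <= 0 or value * factor < default or value > default * factor:
            flagged.append(name)
    return flagged


print(suspicious_params({"target": 1}))                 # ['target']
print(suspicious_params({"target": 0, "interval": 0}))  # ['target', 'interval']
print(suspicious_params({"target": 5_000, "interval": 100_000}))  # []
```

Only the 1µs target comes from the limiter discussed here; the other inputs are made-up test values.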
My router crashed in the early hours for no explicable reason, so my testing today was borked. Outside of testing or configuration changes it's my first ever hard crash of pfSense.
-
@RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:
Some of your fq_codel settings are really demanding though
Those are new default settings, I think. I have seen something on redmine regarding it, but... Ignored it
@RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:
My router crashed in the early hours for no explicable reason, so my testing today was borked
It just happens sometimes, any crash dumps available?
-
@w0w said in 25.03 beta - Bufferbloat / FQ CoDel issues:
@RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:
Some of your fq_codel settings are really demanding though
Those are new default settings, I think. I have seen something on redmine regarding it, but... Ignored it
@RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:
My router crashed in the early hours for no explicable reason, so my testing today was borked
It just happens sometimes, any crash dumps available?
Hi @w0w - I'm curious about this too. Where did you see that there might be new defaults on FQ CoDel parameters? Unless I missed it and that particular traffic shaping algorithm was changed / improved, 1us seems way too low. Thanks in advance.
-
@tman222 said in 25.03 beta - Bufferbloat / FQ CoDel issues:
Where did you see that there might be new defaults on FQ CoDel parameters?
https://redmine.pfsense.org/issues/16037
And this is what I see when I select an already created limiter — but you also don’t see any of those parameters when creating one...
And when you try to create the new one
I don't really think those are new defaults, because all the fq-codel man pages I can find on the web reference the same 5ms value that @RobbieTT mentioned.
-
@w0w said in 25.03 beta - Bufferbloat / FQ CoDel issues:
It just happens sometimes, any crash dumps available?
No crash log or anything of note in the usual logs. It just stopped doing its stuff.
-
@w0w said in 25.03 beta - Bufferbloat / FQ CoDel issues:
@tman222 said in 25.03 beta - Bufferbloat / FQ CoDel issues:
Where did you see that there might be new defaults on FQ CoDel parameters?
And this is what I see when I select an already created limiter — but you also don’t see any of those parameters when creating one...
I don't really think those are new defaults, because all the fq-codel man pages I can find on the web reference the same 5ms value that @RobbieTT mentioned.
The defaults can be corrupted and show as zero, according to the Redmine issue. The pfSense manual still has the correct defaults listed.
You do see the parameters when creating a new one, only they do not appear until you set and save that page. If you look closely at your screenshot, below Scheduler: FQ_CODEL, you will see this note:
Save this limiter to see algorithm parameters.
Caution, coffee may be hot etc.
It catches many of us out when we haven't set a new one in ages. It's a weird UI human factor fail thing and I have no idea why pfSense makes it so complicated compared to other routers.
As Douglas Adams would have it "It's a black panel with a black button that lights-up black when you press it..."*
*Hotblack's ship, when he was spending a year dead, for tax reasons.
-
@RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:
Caution, coffee may be hot etc.
It catches many of us out when we haven't set a new one in ages.
Absolutely. Of course, that doesn’t change the fact that no one expects the default parameters to have values different from those stated in the documentation — or at the very least, everyone is used to trusting that those parameters actually exist and are being applied. I just didn’t check them myself, of course.
-
@w0w
No it doesn't, and until your link to the Redmine issue I had no idea it was a thing. It doesn't look like Netgate has addressed the issue, presumably because it is both intermittent and potentially unnoticed when new limiters are set.
-
@RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:
Working fast.com harder doesn't really change my results. Presumably because the download and upload sessions are sequential:
They are.
The reason is that a massive upload will not only saturate the upload pipe, but also use "a lot of" the download pipe.
After all, every TCP packet (about 1500 bytes in size) has to be acknowledged by a downstream "ACK", which has the size of a minimal TCP ACK packet, or 46 bytes (40 bytes of IP and TCP headers, padded to the minimum Ethernet payload).
This means you would lose about 3 % (46 / 1500) of the upload rate as ACK traffic on the download pipe.
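The same arithmetic can be applied in both directions of an asymmetric link. A small Python sketch, assuming one 46-byte ACK per 1500-byte segment as in the estimate above (real stacks with delayed ACKs usually send fewer, hence the `segments_per_ack` knob); the function name is made up and the capacities are the networkQuality figures posted earlier in this thread:

```python
def ack_overhead_mbps(data_rate_mbps: float,
                      segment_bytes: int = 1500,
                      ack_bytes: int = 46,
                      segments_per_ack: int = 1) -> float:
    """Estimate the ACK traffic (Mbps) flowing in the reverse direction
    of a bulk TCP transfer running at `data_rate_mbps`."""
    return data_rate_mbps * ack_bytes / (segment_bytes * segments_per_ack)


UPLINK_MBPS = 90.237     # from the networkQuality output in this thread
DOWNLINK_MBPS = 805.436  # from the networkQuality output in this thread

# ACKs for a saturating upload travel on the download pipe.
up_acks = ack_overhead_mbps(UPLINK_MBPS)
print(f"Upload ACKs on download pipe: {up_acks:.2f} Mbps "
      f"({up_acks / DOWNLINK_MBPS:.1%} of downlink)")

# ACKs for a saturating download travel on the upload pipe.
down_acks = ack_overhead_mbps(DOWNLINK_MBPS)
print(f"Download ACKs on upload pipe: {down_acks:.2f} Mbps "
      f"({down_acks / UPLINK_MBPS:.1%} of uplink)")
```

Under these assumptions the reverse case is the costly one on an asymmetric service: ACKs for a full-rate download would take roughly a quarter of the 90 Mbps uplink, which may be part of why simultaneous upload and download hurts more than either alone.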