Netgate Discussion Forum

    25.03 beta - Bufferbloat / FQ CoDel issues

    Scheduled Pinned Locked Moved Development
    26 Posts 4 Posters 760 Views
    • w0wW
      w0w @RobbieTT
      last edited by

      @RobbieTT
      https://www.waveform.com/tools/bufferbloat
      And what does it show here?

      • RobbieTTR
        RobbieTT @w0w
        last edited by

        @w0w

        As mentioned, it still gives me an A+ but the score does not reflect the issues now seen at higher flows:

         2025-05-09 at 16.58.05.png

        It's one of the aspects that confused me until I worked out the limitations of this site (at least using it from here in the UK).

        ☕️

        • w0wW
          w0w @RobbieTT
          last edited by

          @RobbieTT
          Hmm, interesting, really.
          Have you tested it on 24.11 already? I mean this Apple network quality tool.

          • RobbieTTR
            RobbieTT @w0w
            last edited by RobbieTT

            @w0w

            Not that recently, but all was ok back then, so I didn't appreciate the differing flow generation capabilities between it and the online tools, as they all gave similar results then. I guess you don't look that hard when all is well.

            The Apple / IETF tool came with macOS Mojave, so it's been around for a few years now. I was still rocking an EdgeRouter back then and it did a pretty good job with pppoe and fq_codel, so not much to see.

            Looking into my current issue in a bit more detail I can see that it is only real-world noticeable when there is heavy traffic & flows in both directions (ie simultaneously). Running tests sequentially shows that upload is more impacted than download.

            Running pure download I get full bandwidth, low latency and good responsiveness scores. That gives me something to focus on tomorrow. Of course, simultaneous tests are not really reflected in the online buffer bloat tests. Another reason why my real-world performance is bad and yet I get a reassuring A+ on waveform.com.

            Wish I had more bandwidth to throw around or at least a symmetrical service...

            ☕️

            • w0wW
              w0w @RobbieTT
              last edited by w0w

              @RobbieTT
              I see something similar only on a wireless connection, but it's always been like that. I just tested fast.com with 16 streams, and the jitter didn’t exceed 7 ms on the wired connection. This was without any limiters applied — I’ll test it later with limiters as well.

              But I think that for my 1 Gbps symmetrical connection, even 16 or 30 streams may not be enough to fully saturate it. It probably requires something like 160 streams, and I don’t see any way to achieve that — I don’t have any Apple devices anyway.

              Edit:
              This is what I see with fast.com 30 connections. Drops are only on upload pipe.
              f87ddb02-5373-4a97-b402-f3a6eab843af-image.png

              • RobbieTTR
                RobbieTT @w0w
                last edited by RobbieTT

                @w0w
                Similar results on fast.com for me, with my normal fq_codel settings. There is a drop in throughput between 8 and 16 streams though. Not that I find fast.com to be particularly trustworthy as it sometimes reports throughput well beyond my max bandwidth:

                16 streams:

                 2025-05-10 at 12.52.08.png

                8 streams:

                 2025-05-10 at 12.53.58.png

                I think the main issue I have is only apparent whilst at (or near) being fully loaded in both directions; fast.com only tests sequentially rather than simultaneously, so it isn't enough of a trigger. My bandwidth is quite asymmetric but it is all I can get.

                The old pppoe backend seems to cope better when tapping on the upload and download limits at the same time, albeit it took a fast CPU to cope with the load on a single core; my Netgate 6100 would struggle with this but it was pretty easy for my Xeon system.

                Perhaps if_pppoe has an issue that only manifests on simultaneous loads as it shares the workload across multiple cores, or perhaps the fq_codel implementation is now running into issues with pppoe on multiple cores/flows/directions?

                ☕️

                • w0wW
                  w0w @RobbieTT
                  last edited by

                  @RobbieTT
                  Your fast.com settings are just too weak. Here's how I use it:
                  684ec5a2-c506-4ec3-841c-54b4856e9337-image.png
                  But of course, I admit that it's much easier to run into bufferbloat issues on a 100 Mbps connection. I also assume that it’s enough to overload a 100 Mbps upstream channel for bufferbloat to become noticeable.
                  By the way, what are your shaper settings? What does Diagnostics – Limiter Info show?
                  And what about the power-saving settings, by the way? They were changed for newer hardware in version 23.05, weren't they?

                  • RobbieTTR
                    RobbieTT @w0w
                    last edited by RobbieTT

                    @w0w

                    Working fast.com harder doesn't really change my results. Presumably because the download and upload sessions are sequential:

                     2025-05-10 at 16.44.56.png

                    During the fast.com run above, my limiters looked like this for download:

                     2025-05-10 at 16.42.54.png

                    And for upload:

                     2025-05-10 at 16.42.54.png

                    Going through the data I think tweaking the upload bandwidth down on my fq_codel settings may help for simultaneous upload+download sessions. I can only refine that on the Apple / IETF tool though.

                    Yes, the power saving was changed in 23.x and 24.x. 25.03 also had an Intel microcode change but I have not looked into the details. Either way, the sleep settings are not a factor and the CPU isn't working that hard throughout the tests. I could be hitting a NIC limitation but both of the relevant NICs are reasonably competent and should have margin to spare.

                    ☕️

                    • w0wW
                      w0w @RobbieTT
                      last edited by

                      @RobbieTT
                      Yeah, interesting...
                      If possible, I’d repeat the tests on version 24.11 — do you still have an old boot environment? Just in case the issue turns out to be caused by some changes on the provider’s side.

                      • RobbieTTR
                        RobbieTT @w0w
                        last edited by RobbieTT

                        @w0w
                        Ok, switched back to 24.11 and ran the Apple tool again:

                        rob@Smaug ~ % networkQuality             
                        ==== SUMMARY ====
                        Uplink capacity: 90.237 Mbps
                        Downlink capacity: 805.436 Mbps
                        Responsiveness: High (33.661 milliseconds | 1782 RPM)
                        Idle Latency: 12.625 milliseconds | 4752 RPM
                        rob@Smaug ~ % 
                        

                        Responsiveness score returns back to 'High' again.
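
For anyone reading the output above: the two figures on the Responsiveness and Idle Latency lines look like the same measurement in two units, with RPM being round trips per minute, i.e. 60,000 divided by the latency in milliseconds. A minimal Python sketch that reproduces the numbers (the function name is made up for illustration and is not part of the Apple tool; truncating to a whole number is an assumption that happens to match this output):

```python
def rpm_from_latency_ms(latency_ms: float) -> int:
    """Round trips per minute for a given round-trip time in milliseconds.

    Truncation (rather than rounding) is an assumption; both give the
    same result for the two values in the output above.
    """
    return int(60_000 / latency_ms)


# Values taken from the networkQuality summary in this post.
print(rpm_from_latency_ms(33.661))  # responsiveness under load
print(rpm_from_latency_ms(12.625))  # idle latency
```

So a higher RPM simply means a shorter round trip while the link is loaded.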

                        I find it perplexing that the older firmware with single-core PPPoE is, in this regard, working better than multiple cores with if_pppoe.

                        It was a valid idea to double check again though.

                        Edit: Scratch the above for now as I think I found a misplaced patch being applied when it should not have been. This may have polluted my real-world experience and the testing....

                        ☕️

                        • w0wW
                          w0w
                          last edited by w0w

                          I'm also starting to recall and analyze a bit what's going on with these traffic limiters. It's actually quite interesting that I'm seeing packet drops on the PPPoE upload, even though I haven’t set any actual bandwidth limit. It's configured to the maximum. Still, under load—though it's actually below 1 Gbit/s—I’m seeing drops specifically on the upload, on PPPoE using the new backend. I haven’t tested it yet on the old backend. However, I did test it on the second provider (which is behind triple NAT through ROOter using a 5G mobile network). Yes, I have Multi-WAN, but the second provider is only used for failover. So... either I didn’t notice, or under the same test conditions as before, I’m not seeing any drops at all on the second WAN, which is ~200/~50Mbit/s. Obviously, the same limiters are in place, and the bandwidth cap is still 1 Gbit/s, but logically, it shouldn't be active in either case, right?
                          Edit: just tested using old PPPoE backend, same drops on the upload pipe.

                          • RobbieTTR
                            RobbieTT @w0w
                            last edited by

                            @w0w
                            Some of your fq_codel settings are really demanding though.

                            With a usual latency variance over the internet of around ±1ms or more (when unloaded) and with a usual setting of 5ms on fq_codel, you have a setting of 1µs. That's quite brutal I guess and probably more suited to use inside a data centre than over the net.
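
To put that 1µs figure in perspective, here is a rough Python sketch of how long a single full-size packet takes just to be sent at the rates mentioned in this thread. The assumptions are mine: a 1500-byte packet, roughly 90 Mbps (the uplink in the networkQuality run) and 1 Gbps (w0w's line), and that the 1µs value is the CoDel delay target, which the thread does not state explicitly. The function name is hypothetical.

```python
def serialisation_delay_us(packet_bytes: int, rate_mbps: float) -> float:
    """Time to transmit one packet at the given rate, in microseconds."""
    # bits divided by Mbit/s gives microseconds directly
    return packet_bytes * 8 / rate_mbps


for rate_mbps in (90, 1000):
    delay = serialisation_delay_us(1500, rate_mbps)
    print(f"{rate_mbps} Mbps: {delay:.1f} us per 1500-byte packet")
```

If those assumptions hold, a packet waiting behind even one other packet has already been queued for far longer than 1µs, so the queue would look "over target" almost any time there is traffic, whereas 5ms leaves room for a short standing queue.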

                            My router crashed in the early hours for no explicable reason, so my testing today was borked. Outside of testing or configuration changes it's my first ever hard crash of pfSense.

                            ☕️

                            • w0wW
                              w0w @RobbieTT
                              last edited by

                              @RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                              Some of your fq_codel settings are really demanding though

                              Those are new default settings, I think. I have seen something on redmine regarding it, but... Ignored it 😁

                              @RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                              My router crashed in the early hours for no explicable reason, so my testing today was borked

                              It just happens sometimes, any crash dumps available?

                              • T
                                tman222 @w0w
                                last edited by

                                @w0w said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                                @RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                                Some of your fq_codel settings are really demanding though

                                Those are new default settings, I think. I have seen something on redmine regarding it, but... Ignored it 😁

                                @RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                                My router crashed in the early hours for no explicable reason, so my testing today was borked

                                It just happens sometimes, any crash dumps available?

                                Hi @w0w - I'm curious about this too. Where did you see that there might be new defaults on FQ CoDel parameters? Unless I missed it and that particular traffic shaping algorithm was changed / improved, 1us seems way too low. Thanks in advance.

                                • w0wW
                                  w0w @tman222
                                  last edited by w0w

                                  @tman222 said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                                  Where did you see that there might be new defaults on FQ CoDel parameters?

                                  https://redmine.pfsense.org/issues/16037

                                  And this is what I see when I select an already created limiter — but you also don’t see any of those parameters when creating one...

                                  dec7c970-e1de-4e27-b1f5-7c0aeb280913-image.png
                                  And when you try to create the new one
                                  1c5b29fd-5adc-4b5c-89f6-e36fdff28a4c-image.png

                                  I don't really think those are new defaults, because all the fq-codel man pages I can find on the web reference the same 5ms value that @RobbieTT mentioned.

                                  • RobbieTTR
                                    RobbieTT @w0w
                                    last edited by

                                    @w0w said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                                    It just happens sometimes, any crash dumps available?

                                    No crash log or anything of note in the usual logs. It just stopped doing its stuff.

                                    ☕️

                                    • RobbieTTR
                                      RobbieTT @w0w
                                      last edited by

                                      @w0w said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                                      @tman222 said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                                      Where did you see that there might be new defaults on FQ CoDel parameters?

                                      And this is what I see when I select an already created limiter — but you also don’t see any of those parameters when creating one...

                                      I don't really think those are new defaults, because all the fq-codel man pages I can find on the web reference the same 5ms value that @RobbieTT mentioned.

                                      The defaults can be messed up and showing zero, according to the redmine. The pfSense manual still has the correct defaults listed.

                                      You do see the parameters when creating a new one, only that they do not appear until you set and save that page. If you look closely on your screenshot, below Scheduler: FQ_CODEL, you will see this note:

                                      Save this limiter to see algorithm parameters.

                                      Caution, coffee may be hot etc.

                                      It catches many of us out when we haven't set a new one in ages. It's a weird UI human factor fail thing and I have no idea why pfSense makes it so complicated compared to other routers.

                                      As Douglas Adams would have it "It's a black panel with a black button that lights-up black when you press it..."*


                                      *Hotblack's ship, when he was spending a year dead, for tax reasons.

                                      • w0wW
                                        w0w @RobbieTT
                                        last edited by

                                        @RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                                        Caution, coffee may be hot etc.

                                        It catches many of us out when we haven't set a new one in ages.

                                        Absolutely. Of course, that doesn’t change the fact that no one expects the default parameters to have values different from those stated in the documentation — or at the very least, everyone is used to trusting that those parameters actually exist and are being applied. I just didn’t check them myself, of course.

                                        • RobbieTTR
                                          RobbieTT @w0w
                                          last edited by

                                          @w0w
                                          No it doesn't and until your link to the redmine I had no idea it was a thing. It doesn't look like Netgate has addressed the issue, presumably because it is both intermittent and potentially unnoticed when new limiters are set.

                                          ☕️

                                          • GertjanG
                                            Gertjan @RobbieTT
                                            last edited by

                                            @RobbieTT said in 25.03 beta - Bufferbloat / FQ CoDel issues:

                                            Working fast.com harder doesn't really change my results. Presumably because the download and upload sessions are sequential:

                                            They are.
                                            The reason is: a massive upload will not only saturate the upload pipe, but also use "a lot of" the download pipe.
                                            After all, every TCP packet (about 1500 bytes in size) has to be acknowledged by a downstream "ACK", which will have the size of a minimal TCP ACK packet, or 46 bytes.
                                            This means you would lose about 3% (46 / 1500) of the upload rate as ACK traffic on the download side.
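
The arithmetic behind that figure can be sketched in a few lines of Python. The 1500 and 46 byte sizes come from the post; the 90 Mbps upload rate is borrowed from the networkQuality run earlier in the thread, and one ACK per data packet is an assumption (a stack using delayed ACKs would send fewer). The names are made up for illustration.

```python
DATA_PACKET_BYTES = 1500  # full-size TCP packet, as stated in the post
ACK_PACKET_BYTES = 46     # minimal ACK size, as stated in the post


def ack_rate_mbps(upload_mbps: float, acks_per_packet: float = 1.0) -> float:
    """Downstream bandwidth used by ACKs for a given upload rate.

    acks_per_packet=1.0 assumes every data packet is acknowledged
    individually, which is the case the post describes.
    """
    return upload_mbps * acks_per_packet * ACK_PACKET_BYTES / DATA_PACKET_BYTES


ratio = ACK_PACKET_BYTES / DATA_PACKET_BYTES
print(f"ACK overhead: {ratio:.1%} of the upload rate")
print(f"ACK traffic for a 90 Mbps upload: {ack_rate_mbps(90):.2f} Mbps")
```

The same ratio applies in reverse: a saturated download needs a share of the upload pipe for its ACKs, which matters more on an asymmetric line.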

                                            No "help me" PM's please. Use the forum, the community will thank you.
                                            Edit : and where are the logs ??

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.