Netgate Discussion Forum

Playing with fq_codel in 2.4

Traffic Shaping
  • L
    luckman212 LAYER 8 @mattund
    last edited by Sep 26, 2018, 1:49 PM

    @mattund said in Playing with fq_codel in 2.4:

    EDIT: I'm continuing to dig into this more.

    In case you missed it:

    https://www.reddit.com/r/PFSENSE/comments/9j1h8u/244_codel_limiter_error/?st=JMJ7GJB0&sh=4db7939a

    • M
      mattund @zwck
      last edited by mattund Sep 26, 2018, 1:58 PM Sep 26, 2018, 1:55 PM

      @zwck

      Personally, I don't use it on that side (upload), and I haven't noticed any performance loss. I am not sure where the idea of not using ECN on the outgoing queues came from, though I don't mean to discredit it. I have a limited understanding of what ECN actually accomplishes besides setting a TCP flag for the channel participants when the queue/link is at capacity, so I'll have to pass on saying much more than that. I will say that, at first impression, it seems as though it would help to have it set on the upload side. We may need to carefully benchmark it with ECN enabled, on connections to hosts that support ECN (not all do).

      As for masks, I have played with them a little. I do believe they work still, and you can use them if you choose to. Personally, my setup is extremely basic so I don't need them configured outside of the default. FQ_CODEL will show one flow if you have one mask set up, by the way, usually 0.0.0.0/0 for the source and destination. From my experiences so far, this doesn't mean it's not working, it's just how it seems a dummynet scheduler configured as FQ_CODEL ingests streams. I think the developer of dummynet chose to use the internals of the scheduler type to determine the flow instead of using dummynet's capabilities of identifying unique flows, so maybe it "anonymizes" the traffic before heading into the FQ_CODEL code to save on CPU cycles.

      Unrelated to that post, I am looking into why people aren't able to configure their limiters after upgrading to 2.4.4. I had no trouble, however, I was using the 2.4.3 patch. I hope I'm not too late to help there

      • W
        w0w
        last edited by w0w Sep 26, 2018, 2:01 PM Sep 26, 2018, 2:00 PM

        You don't need masks unless you want additional features or filtering, such as evenly shared bandwidth. Anyway, I followed the Netgate guide and even tried deliberately wrong settings; nothing I've tried affects the bufferbloat or bandwidth numbers at all.

        • G
          gsmornot @w0w
          last edited by gsmornot Sep 26, 2018, 8:14 PM Sep 26, 2018, 2:21 PM

          @w0w said in Playing with fq_codel in 2.4:

          @gsmornot said in Playing with fq_codel in 2.4:

          pfblocker

          I do think that pfBlocker has issues with limiters. That's been reported before. Can you uninstall it and try again sometime?

          pfBlocker is not the issue, it's horsepower. My older router is a multicore PC that I replaced with the SG-3100 for power savings. The SG-3100 is fantastic but in this case is underpowered for the needs of fq_codel. The older server is A+ across the board at nearly full gigabit rate without breaking a sweat. So, that answers that for me. I will stick with CODELQ and the SG-3100 because it has some advantages in being lower power and compact.

          Just for comparison, my older server peaks at about 10% CPU at full rate without shaping, while my SG-3100 needs 95%. I bet my old server would be faster for my OpenVPN service as well, but I will stick with compact.

          With shaping, the old server (2.1GHz Intel) needs about 26% CPU to run the dslreports test. The SG-3100 (1.6GHz ARM) is flat out.

          Following a guide I found, I set up a FAIRQ shaper on the WAN with a CODEL Active Queue child. A+ across the board and less processor needed. I peaked at roughly 60%, so that sounds good to me.

          • T
            teh g @gsmornot
            last edited by Sep 26, 2018, 8:45 PM

            @gsmornot I'm running into similar CPU limits when attempting to use fq_codel. This subsequently caused a drop in outbound speeds I think. Running without the floating rules from the video I can hit my ~1000 Mb/s down, when the rules are enabled I only get ~650 Mb/s down and still get a C on bufferbloat.

            I'll have to give it a go with CODELQ.

            • S
              satadru
              last edited by Sep 26, 2018, 8:53 PM

              I set up an upload limiter for my primary WAN connection using the codel settings in the pfsense video from this summer here on 2.4.4: https://www.youtube.com/watch?v=o8nL81DzTlU&t=380

              This gave me an "A" on the dslreports speedtest site.

              With the floating rule set to send outgoing IPv4 traffic using a quick pass through the limiter, I get all sorts of problems with network connectivity on my LAN side. For instance my Google Home devices on my LAN side refused to connect.

              I have a failover WAN setup though, and it turns out that after setting this up for my primary WAN connection, my setup decided my primary WAN connection might be having a problem, and failed over to my backup (more expensive) WAN connection.

              Do any of you have a working setup with both a codel limiter and a failover WAN setup, and if so would you be willing to share your configuration?

              • G
                gsmornot @teh g
                last edited by Sep 26, 2018, 11:35 PM

                @teh-g said in Playing with fq_codel in 2.4:

                @gsmornot I'm running into similar CPU limits when attempting to use fq_codel. This subsequently caused a drop in outbound speeds I think. Running without the floating rules from the video I can hit my ~1000 Mb/s down, when the rules are enabled I only get ~650 Mb/s down and still get a C on bufferbloat.

                I'll have to give it a go with CODELQ.

                Same here. About 650 and packet loss on gateways.

                • M
                  mattund @satadru
                  last edited by mattund Sep 26, 2018, 11:49 PM Sep 26, 2018, 11:43 PM

                  @satadru said in Playing with fq_codel in 2.4:

                  Do any of you have a working setup with both a codel limiter and a failover WAN setup, and if so would you be willing to share your configuration?

                  On your floating rule, have you tried changing it to a match only rule (not pass), and turning off quick match? I didn't use the pass option, and I also don't use quick match like Jim shows. I do have smart home devices that all work perfectly fine and I can reach them without issue. They are even on a different VLAN + AP.

                  My floating rule(s):

                  • Action: Match
                  • Direction: out
                  • Interface: WAN
                  • Address Family: IPv4 or IPv6, but not both
                  • Protocol: TCP/UDP (to avoid the ICMP traceroute craziness others have demonstrated)
                  • Destination: Invert match, Single host or alias, RFC_1918 (an alias of mine, to prevent shaping to modem)
                  • Gateway: Self-explanatory; must be matching address family
                  • In / Out pipe: qWANUpload / qWANDownload
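
As a rough illustration of which packets the rule listed above would hand to the limiter, here is a minimal Python sketch. It models only the fields in the list (direction, interface, protocol, inverted destination match); the function name `rule_matches` is hypothetical, and the contents of the RFC_1918 alias are an assumption (the three standard private IPv4 ranges), since the post does not list them.

```python
import ipaddress

# Assumption: the poster's RFC_1918 alias holds the three private IPv4 ranges.
RFC_1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def rule_matches(direction, interface, protocol, dst_ip):
    """Return True if a packet would be matched (and so shaped) by the rule."""
    # Direction: out, Interface: WAN
    if direction != "out" or interface != "WAN":
        return False
    # Protocol: TCP/UDP only, so ICMP is never sent through the limiter
    if protocol not in ("tcp", "udp"):
        return False
    # Destination: invert match on RFC_1918, so private targets
    # (for example a modem's management address) are left unshaped
    dst = ipaddress.ip_address(dst_ip)
    return not any(dst in net for net in RFC_1918)
```

For example, `rule_matches("out", "WAN", "tcp", "8.8.8.8")` is matched, while traffic to `192.168.100.1` or any ICMP packet is not.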

                  What hardware is everyone using? Shaping probably takes a bit of horsepower

                  • ?
                    A Former User @mattund
                    last edited by Sep 26, 2018, 11:58 PM

                    @mattund that Protocol: TCP/UDP only avoids it for Windows, which uses ICMP.
                    Mac (I think) and other unix hosts use UDP so with this config, I still saw these loops.

                    I'm going to try and find a simple repro for it and log a ticket (not FQ_CODEL related, btw)

                    • W
                      w0w
                      last edited by Sep 27, 2018, 1:26 AM

                      I've searched the internet a bit, and so far I've found that fq_codel is limited to using only one CPU core, so it definitely fails to achieve the best results on CPUs with low per-core horsepower. Also, if you have an igb NIC on WAN configured as PPPoE, then you will be limited to one core's performance as well.

                      • T
                        teh g @w0w
                        last edited by Sep 27, 2018, 1:36 AM

                        @w0w said in Playing with fq_codel in 2.4:

                        I've searched the internet a bit, and so far I've found that fq_codel is limited to using only one CPU core, so it definitely fails to achieve the best results on CPUs with low per-core horsepower. Also, if you have an igb NIC on WAN configured as PPPoE, then you will be limited to one core's performance as well.

                        That likely explains the performance hit I am seeing. I have a quad core with not the fastest speeds per core.

                        • S
                          satadru @mattund
                          last edited by Sep 27, 2018, 1:58 AM

                          @mattund said in Playing with fq_codel in 2.4:

                          On your floating rule, have you tried changing it to a match only rule (not pass), and turning off quick match? I didn't use the pass option, and I also don't use quick match like Jim shows. I do have smart home devices that all work perfectly fine and I can reach them without issue. They are even on a different VLAN + AP.

                          My floating rule(s):

                          Thanks for that. I'm going to try match only the next time I'm not likely to disrupt anyone. I didn't know that ICMP was going to be an issue either, so changing the Protocol in the rule to TCP/UDP is also on my list.

                          For what it is worth my hardware CPU-wise is:

                          Intel(R) Atom(TM) CPU 330 @ 1.60GHz
                          4 CPUs: 1 package(s) x 2 core(s) x 2 hardware threads

                          Even with the limiter I'm able to get 100/10 throughput according to fast.com

                          • Z
                            zwck @strangegopher
                            last edited by Sep 27, 2018, 1:37 PM

                            @mattund said in Playing with fq_codel in 2.4:

                            Personally, I don't use it (ECN) on that side (upload), and I haven't noticed any performance loss. I am not sure where the idea came from to not use ECN on the outgoing queues, however in saying that I don't mean to discredit the idea.

                            I remember just this comment,

                            @strangegopher said in Playing with fq_codel in 2.4:

                            According to openwrt ecn should only be enabled on inbound packets.

                            which sources the openwrt wiki, maybe @dtaht can chime in :D

                            • D
                              dtaht @zwck
                              last edited by dtaht Sep 27, 2018, 7:15 PM Sep 27, 2018, 7:14 PM

                              @zwck What we saw with ecn on in fq_codel at low upload speeds (< 40mbit) was an occasional lockout issue where ecn'd flows seemed to cause too many drops of non-ecn'd flows. If you have a reasonable amount of upload bandwidth feel free to try enabling ecn there also. Go hog wild about turning it on on your tcp stacks also...

                              but do so on an informed basis.

                              The potential problems with ecn are vast... as is the potential benefit. I worry about overuse and mis-applications of it so much that we started a new (sadly unfunded) project and mailing list for it:

                              https://www.bufferbloat.net/projects/ecn-sane/wiki/

                              My position statement is here: https://www.bufferbloat.net/projects/ecn-sane/wiki/dtaht_ecn_editorial/ - and the related rant at the bottom is one of my best and worth reading before you fiddle with ecn at all.

                              Our reasoning (in 2012) for only enabling it on inbound shaping was that more bw was available & the packet had already traversed the internet, so why drop it? At the point of congestion on outbound, though, I'd rather clear room immediately.

                              I note that I am presently in the minority as to grave concern over widespread ecn deployment - but willing to encourage others to try it on an informed basis.
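
To make the mark-versus-drop trade-off discussed above concrete, here is a minimal Python sketch of what an AQM does once it has decided to signal congestion on a packet. The codepoint values come from RFC 3168; the function name `on_congestion` is hypothetical, and the sketch ignores everything else a real fq_codel does (for instance, dropping regardless of ECN when the queue overflows).

```python
# ECN field codepoints (two low bits of the IP TOS / traffic class byte), per RFC 3168
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def on_congestion(ecn_bits, ecn_enabled):
    """Return (action, new_ecn_bits) for a packet chosen as a congestion signal."""
    # With ECN enabled on the queue, an ECN-capable packet is marked CE and
    # still delivered; it keeps occupying queue space and link time.
    if ecn_enabled and ecn_bits in (ECT_0, ECT_1, CE):
        return ("mark", CE)
    # Otherwise the packet is dropped, which frees room immediately.
    return ("drop", None)
```

With ECN on, only flows that negotiated ECN get marked; non-ECN flows are still dropped, which is consistent with the lockout concern raised for low upload rates.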

                              • D
                                dtaht @teh g
                                last edited by Sep 27, 2018, 7:22 PM

                                @teh-g I can certainly see running out of cheap cpu at these speeds.

                                I have to note that usually it's the shaper (limiter) that accounts for 80% or more of the cpu cost of a fq_codel based qos system, and thus I don't get why folk are claiming here fairq + codelq eats less than fq_codel does unless that is effectively (?) multi-cored (?).

                                fq_codel adds a hash calculation and a ptr lookup and is almost immeasurably the same weight as codel alone (as, at least in linux, the hash often occurs elsewhere)

                                Elsewhere, I started work on a multi-core capable shaped fq_codel instance but ran out of money and time for now.
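
The "hash calculation and a ptr lookup" mentioned above can be sketched in a few lines of Python. This is an illustration only: `zlib.crc32` stands in for whatever hash a real implementation uses, 1024 flow queues is assumed as the usual default, and the names `flow_index` and `enqueue` are hypothetical.

```python
import zlib

NUM_FLOWS = 1024  # assumption: the commonly cited default number of flow queues

# One sub-queue per hash bucket; CoDel would run independently on each.
queues = [[] for _ in range(NUM_FLOWS)]


def flow_index(src_ip, dst_ip, src_port, dst_port, proto, num_flows=NUM_FLOWS):
    """Hash a packet's 5-tuple to a flow queue index (the 'hash calculation')."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_flows


def enqueue(packet):
    """Append a packet (a dict holding the 5-tuple) to its flow queue (the 'lookup')."""
    idx = flow_index(packet["src_ip"], packet["dst_ip"],
                     packet["src_port"], packet["dst_port"], packet["proto"])
    queues[idx].append(packet)
    return idx
```

Per packet this adds one hash and one index into an array on top of plain CoDel, which is the point being made about the relative cost.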

                                • D
                                  dtaht @gsmornot
                                  last edited by Sep 27, 2018, 7:28 PM

                                  @gsmornot Is there a way to increase your token bucket size on the shaper at these speeds? BSD does not have high resolution timers.
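
A back-of-the-envelope Python sketch of why bucket size matters at these speeds: a shaper that only releases traffic on a timer tick has to release a full tick's worth of bytes at once to sustain the configured rate. The 1000 Hz tick rate used in the usage note is an assumption for illustration, not a measured value for any particular box, and the function names are hypothetical.

```python
import math


def bytes_per_tick(rate_bits_per_s, ticks_per_s):
    """Bytes the shaper must release on each timer tick to sustain the rate."""
    return rate_bits_per_s / 8 / ticks_per_s


def packets_per_tick(rate_bits_per_s, ticks_per_s, packet_bytes=1500):
    """Whole packets of the given size needed per tick, rounded up."""
    return math.ceil(bytes_per_tick(rate_bits_per_s, ticks_per_s) / packet_bytes)
```

Under that assumption, `bytes_per_tick(1_000_000_000, 1000)` gives 125000.0 bytes, i.e. `packets_per_tick(1_000_000_000, 1000)` is 84 full-size packets per tick; a bucket smaller than that would cap throughput below the line rate.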

                                  • T
                                    teh g @dtaht
                                    last edited by Sep 27, 2018, 7:28 PM

                                    @dtaht said in Playing with fq_codel in 2.4:

                                    @teh-g I can certainly see running out of cheap cpu at these speeds.

                                    I have to note that usually it's the shaper (limiter) that accounts for 80% or more of the cpu cost of a fq_codel based qos system, and thus I don't get why folk are claiming here fairq + codelq eats less than fq_codel does unless that is effectively (?) multi-cored (?).

                                    fq_codel adds a hash calculation and a ptr lookup and is almost immeasurably the same weight as codel alone (as, at least in linux, the hash often occurs elsewhere)

                                    Elsewhere, I started work on a multi-core capable shaped fq_codel instance but ran out of money and time for now.

                                    Thanks for the info. My little Celeron J3455 just isn't built for fq_codel at gigabit down speeds. Luckily I don't run into bufferbloat issues too much, since download seems to be impacted most by it, and it isn't too frequent that I cap out the line with one gigabit.

                                    A multi-core capable fq_codel would be great, sad to hear that money and time were lacking. I'm only pegging a core, so it is definitely a limitation there.

                                    • D
                                      dtaht
                                      last edited by dtaht Sep 27, 2018, 7:57 PM Sep 27, 2018, 7:42 PM

                                      Always have funding problems. Used to it. What happens usually is someone with more time and anger tackles it for us. :) https://github.com/dtaht/fq_codel_fast is where I am with it presently. Single core it's about 5% faster so far. :(

                                      I think this thread has established that 1.6 GHz ARM and low-end Atoms bottleneck on a single core for inbound shaping on this OS at around 500Mbit? Higher end boxes are fine? I do think tuning the shaper might help some (see for example https://github.com/tohojo/sqm-scripts/issues/71 and the related pull request, but this is for linux, not bsd)

                                      and the bulk of the bloat problem is usually outbound and at lower speeds, so the bulk of y'all can solve half the problem at least.

                                      • M
                                        mattund
                                        last edited by mattund Sep 28, 2018, 5:49 PM Sep 28, 2018, 5:48 PM

                                        I would expect the traffic would respect the CPU affinity of the packet when it's received at the kernel level, over into the dummynet code. That may be why single cores are pinned -- there's only one TCP stream, thusly one core involved in the transport? Multiple flows may scale better. In this scenario, making FQ_CODEL multi-threaded would work for single flows well of course.

                                        All this thinking gets me anxious to factor CAKE into dummynet, upstream in BSD @dtaht ... 😏 I'm no expert on C, but maybe there is a possibility if it's not too buried in Linux only features. Why stop with FQ_CODEL right? :)

                                        • M
                                          mattund @mattund
                                          last edited by mattund Sep 28, 2018, 7:03 PM Sep 28, 2018, 6:59 PM

                                          @mattund

                                          As an update to my previous post concerning "Unable to configure flowset, flowset busy!", I was able to determine the following from the FreeBSD source with some level of confidence:

                                          • The error is related to the re-configuring of the selected AQM.
                                          • The error only occurs when you attempt to explicitly configure an AQM on a queue (aka flowset) that is currently passing traffic, hence the intermittent messages.
                                          • The error only indicates your AQM setting (and its related tune-ables) failed to change as instructed, but everything else should have saved fine.
                                          • As Reddit users have pointed out, the error may recur on its own for no reason due to the filter resync job. Besides the log spam (sorry), it isn't affecting anything since you're not changing settings at that time.

                                          As mentioned in my last post, the patch's commands do explicitly reconfigure the AQM, via the use of the directives: codel, droptail, red, or gred. They are always present/generated by the patch, to ensure consistency. Unfortunately, dummynet has a limitation where it cannot "hot-swap" the AQM as I anticipated it could.

                                          I have tried configuring the assigned sched/pipe # to -1 first prior to the currently bugged queue configuration command, but this produces errors of its own (hah!). My thinking there was to disconnect the parent pipe, so the queue has nothing to complain about. I have also tried explicitly deleting any existing queue prior, but if a pipe does not exist, these commands fail and it breaks the rest of the rules.limiter execution. We could check for the existence of a queue first, and delete it if one already exists. Keep in mind, that these commands get run at a regular interval, so this could cause packet loss every 15 minutes if we're not careful...

                                          This is a tricky one. But, I'll keep trying to get it error-less and working.
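
The "check for the existence of a queue first" idea above can be sketched as a small planning function in Python. The step names are abstract, hypothetical labels, not real dnctl or ipfw syntax, and skipping the delete when the AQM is unchanged is an added assumption aimed at the periodic-resync packet loss concern, not something the patch is known to do.

```python
def plan_queue_steps(current_aqm_by_queue, queue_id, desired_aqm):
    """Return the abstract steps needed to bring one queue to the desired AQM.

    current_aqm_by_queue maps existing queue ids to their configured AQM name.
    """
    current = current_aqm_by_queue.get(queue_id)
    if current is None:
        # Queue does not exist yet: nothing to delete, so no failing delete step.
        return [("configure", queue_id, desired_aqm)]
    if current == desired_aqm:
        # Assumption: leave a live queue alone when its AQM is unchanged, so the
        # periodic resync neither triggers "flowset busy" nor drops packets.
        return []
    # The AQM cannot be swapped in place, so delete and recreate the queue.
    return [("delete", queue_id), ("configure", queue_id, desired_aqm)]
```

For example, with `{1: "codel"}` as the current state, asking for `codel` on queue 1 yields no steps, asking for `red` yields a delete followed by a configure, and asking for any AQM on a missing queue 2 yields a single configure.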

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.