Netgate Discussion Forum

    Playing with fq_codel in 2.4

    Traffic Shaping
    1.1k Posts 123 Posters 1.7m Views
      gsmornot

      I’m glad you guys are working through this. I hope it gets to the point where I can just have it on by default in the background, making my connection nice.

        strangegopher @dtaht

        @dtaht darn my modem, xb6, is on the list, didn't know puma 7 was also affected. Explains my crappy results in flent.

          tman222

          So I did a bit of reading today and found some fantastic resources on limiters, dummynet, and how the other schedulers in pfSense (FreeBSD) work:

          http://info.iet.unipi.it/~luigi/ip_dummynet/original.html
          http://info.iet.unipi.it/~luigi/qfq/
          http://info.iet.unipi.it/~luigi/doc/20100513-bsdcan10dn.pdf

          https://www.netgate.com/docs/pfsense/trafficshaper/limiters.html

          These are a bit dated, but still relevant.

          After doing a lot of reading today, I wanted to share a couple additional thoughts - one regarding setting up fq_codel, the other about an interesting alternative setup I started experimenting with.

          Regarding fq_codel Setup:

          For the most basic setup, I think one only needs to create one or two limiters (up and down) and then, in the Queue section, enable and configure the fq_codel algorithm under Scheduler. In my opinion the Queue Management Algorithm section does not need to be changed from the default, and child queues under the limiter aren't necessary either. fq_codel creates its own queues and handles the AQM for them, so there is no need to select Codel again under Queue Management Algorithm. If you were using another algorithm that is strictly a scheduler (e.g. RR or QFQ), the proper queue management algorithm would need to be selected. Having said that, the Queue Management section can still be filled in when fq_codel is chosen as the scheduler; I'm just not sure how much additional benefit there would be (vs. just more compute cycles) from running AQM on the incoming packet queue and then again in fq_codel on the flow queues it creates and manages. To finish setting up, all one needs to do is apply the names of the up and down limiters to the in and out pipe sections in the firewall rules.

          Masks on the queues, in my opinion, also aren't necessary to get fq_codel to work properly because the algorithm handles the mapping of flows to queues.

          Child queues can be created and these can be applied to the firewall rules, but it's not required to get the algorithm to work. Child queues become more interesting if one wanted to e.g. split the total bandwidth into weighted queues (e.g. schedule 9 packets out of queue 1 before scheduling 1 packet out of queue 2 in a 90/10 weighted scheme). But as @bafonso already mentioned fq_codel does not support weighted queues. For that one would have to use a different scheduler such as QFQ, for example.
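To illustrate the 90/10 weighting idea mentioned above, here is a minimal Python sketch of plain weighted round robin. It is only an illustration of the scheduling order: it is not QFQ and not dummynet code, and the function name, queue contents, and weights are made up for the example.

```python
from collections import deque


def weighted_round_robin(queues, weights):
    """Drain queues in rounds, taking up to weights[i] packets from queue i per round."""
    order = []
    while any(queues):  # an empty deque is falsy, so this stops when all are drained
        for q, w in zip(queues, weights):
            for _ in range(w):
                if not q:
                    break
                order.append(q.popleft())
    return order


# Hypothetical traffic: 18 packets waiting in queue 1, 2 packets in queue 2.
q1 = deque(f"q1-{i}" for i in range(18))
q2 = deque(f"q2-{i}" for i in range(2))

sent = weighted_round_robin([q1, q2], [9, 1])
print(sent[:10])  # nine packets from queue 1, then one from queue 2
```

With weights of 9 and 1, each round sends nine packets from the first queue before one packet from the second, which is the behaviour fq_codel does not offer.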

          I apologize in advance if anything in the above is incorrect; if so, someone please correct me. This is just my interpretation after doing some additional reading and testing today.

          An Interesting Alternative Setup:

          After reading about dummynet, limiters, pipes, queues, etc. today, I decided to try this alternative setup:

          1. Create two limiters: Up and Down. Fill in the bandwidth and then choose RR (Round Robin) for scheduler
          2. Create a child queue under each limiter and select Codel and ECN on each to enable AQM.
          3. For the upload child queue, choose "Source Addresses" for mask and change the bucket size to 1024.
          4. For the download child queue, choose "Destination Addresses" for mask and change the bucket size to 1024.
          5. Apply the upload and download queue to your LAN firewall rule (that allows outbound traffic) under in and out pipe.

          To me this setup seems very similar to what fq_codel does. A queue (managed by Codel AQM) gets created for each IP/flow and then the scheduler traverses those queues in round robin fashion. Besides not being able to adjust the quantum parameter, for instance, can someone tell me how this setup is different from fq_codel? Performance, from what I can tell so far, seems quite similar. However, I'm sure there are probably more differences.

          Thanks in advance for your help, I really appreciate it.

            dtaht

            fq_codel hashes on the 5 tuple, you are hashing on the source addr. The source addr is often not visible post nat, thus the 5 tuple (src,dst,src port, dst port, protocol) is a better distinguishing characteristic.

            A single fq_codel instance contains 1024 shared queues based on a hash of that.

            Having the filter up front into 1024 codel queues means a memory limit of 1024 * X packets. Usually not a problem, but it's losing the 5 tuple hash that hurts.
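A small Python sketch of why losing the 5 tuple hurts. This is not dummynet code: the addresses and ports are documentation examples, and CRC32 is only a stand-in for whatever hash the real implementation uses. It models 64 connections that all share one post-NAT source address.

```python
import zlib

NUM_QUEUES = 1024  # number of queues / mask buckets, as discussed above


def queue_by_src(flow):
    """Pick a queue from the source address only (like a source-address mask)."""
    src, _dst, _sport, _dport, _proto = flow
    return zlib.crc32(src.encode()) % NUM_QUEUES


def queue_by_5tuple(flow):
    """Pick a queue from src, dst, src port, dst port and protocol together."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % NUM_QUEUES


# Hypothetical flows: same source and destination, 64 different source ports.
flows = [("203.0.113.7", "198.51.100.20", 40000 + i, 443, "tcp") for i in range(64)]

src_queues = {queue_by_src(f) for f in flows}
tuple_queues = {queue_by_5tuple(f) for f in flows}

print("queues used with source-address mask:", len(src_queues))
print("queues used with 5-tuple hash:", len(tuple_queues))
```

With the source-address key every connection lands in the same queue, so one bulk transfer can still delay everything else from that address; the 5-tuple key spreads the connections out.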

            It looks like this bug needs to be reopened for the ping through nat bug.

            https://redmine.pfsense.org/issues/4326

              tman222

              Thanks @dtaht - I think you are right, that is the biggest difference. I was originally thrown off on this point, because if you look here:

              http://info.iet.unipi.it/~luigi/doc/20100513-bsdcan10dn.pdf

              Slide 33 claims that masks are applied to the 5-tuple of each packet (so similar to fq_codel). However, in the Netgate documentation I see this:

              https://www.netgate.com/docs/pfsense/trafficshaper/limiters.html

              "Dummynet pipes have a feature called dynamic queue creation which allows unique queues based on the uniqueness of a connections source protocol, IP address, source port, destination address or destination port. They can also be used in combination. pfSense currently only allows setting the source address or the destination address as the mask."

              So it looks like the limitation here might be pfSense and not dummynet itself? Does anyone know why this limitation exists in pfSense?

              I'm currently playing around with Quick Fair Queuing (QFQ) and weighted queues a little bit to see how that performs. Any suggestions for performance comparison tests I could run?

              Anyway, I don't mean to take this thread off track since it is about fq_codel after all and not the other scheduling algorithms available in dummynet/pfSense. However, after doing some reading, tinkering is a lot of fun :). That said, for simplicity and an algorithm that just works, fq_codel wins hands down, and the configuration is very easy on pfSense.

                dtaht

                I put a bug over here: https://redmine.pfsense.org/issues/9024

                I am not in a position to "help" much more here. You've got one bad modem, one proof of a nat problem with ping, another as yet unproven report of "all nat connections collapsing after a test" (or was that the bad modem?), and proof that fq_codel is doing the right things (both with and without ecn) without nat in place.

                  dtaht @tman222

                  @tman222 fq_codel and qfq vs rrul.

                    tman222 @dtaht

                    @dtaht - that sounds like a good idea. Since I'm on a fast WAN connection, should I try to artificially limit the speed to e.g. maybe 500Mbit/s or 250Mbit/s so I can use external Flent servers?

                      dtaht

                      goferit

                        dtaht

                        as for bad cablemodems, I'm dying for someone to try this out: https://express.google.com/product/Arris-SURFboard-Cable-Modem-and-AC2350-Wi-Fi-Router-with-Arris-Secure-Home-Internet-by-McAfee/0_17937886568302066345_0

                        or a pure modem of the same generation from arris.

                          tman222

                          After doing a bit more thinking, I'm more curious about how the performance of fq_codel is impacted by enabling Codel AQM on the input queue. For instance, consider the following two setups:

                          1. Setup 1: Up and down limiters created with appropriate bandwidth for each. Enable Codel for Active Queue Management and then enable fq_codel for scheduler. Adjust queue size as necessary. Apply limiters to firewall rules. This setup to me looks like this:

                          Limiter (Pipe) Input Queue (managed by Codel AQM) ---> fq_codel scheduler ---> 1....N output queues (managed by Codel AQM), where N is number of flows.

                          2. Setup 2: Up and down limiters created with appropriate bandwidth for each. Leave Active Queue Management as is and then enable fq_codel for scheduler. Adjust queue size as necessary. Apply limiters to firewall rules. This setup to me looks like this:

                          Limiter (Pipe) Input Queue (No AQM, just tail drop) ---> fq_codel scheduler ---> 1....N output queues (managed by Codel AQM), where N is number of flows.

                          I can imagine that setup 1) could potentially yield better performance especially if there is a big enough difference between the local interface (LAN) speed and the WAN connection speed. However, does the additional processing required (AQM x2) result in poorer performance on slower equipment?

                          I'm curious whether anyone has run any tests using both of these setups and noticed any difference. Also, it would be great to hear any thoughts on the performance of these options in general.

                          Thanks in advance.

                            kjstech @xRaisen

                            @xraisen In my pfSense 2.4.4 under CoDel there are two parameters: target, which defaults to 5, and interval, which defaults to 100. Is there any merit to adjusting these?

                              zwck @Harvy66

                              @harvy66

                              1. how do you typically go forward in tuning your pfsense?
                              2. does hw.igb.fc_setting=0 actually exist?
                                xciter327

                                @zwck said in Playing with fq_codel in 2.4:

                                hw.igb.fc_setting=0

                                Does not actually work on my Supermicro Atom 2758. I use "hw.igb.0.fc=0", which does exist when I run "sysctl -a".
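One way to answer the "does this tunable exist?" question is to search the output of "sysctl -a" for the exact name. The Python sketch below does that on a text blob; the helper name and the sample output (including its values) are made up for illustration, and it assumes the usual "name: value" line format.

```python
def sysctl_names(sysctl_output):
    """Return the set of tunable names found in 'name: value' formatted text."""
    names = set()
    for line in sysctl_output.splitlines():
        name, sep, _value = line.partition(":")
        if sep:  # skip lines that do not contain a colon
            names.add(name.strip())
    return names


# Hypothetical excerpt; real output would come from running "sysctl -a".
sample = """hw.igb.0.fc: 0
hw.igb.rx_process_limit: 100
"""

for wanted in ("hw.igb.0.fc", "hw.igb.fc_setting"):
    status = "exists" if wanted in sysctl_names(sample) else "not found"
    print(wanted, status)
```

On a real system the text could be captured with subprocess.run(["sysctl", "-a"], capture_output=True, text=True).stdout and passed to the same helper.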

                                  tman222

                                  @zwck - there are two main ways I'm aware of:

                                  1. Edit your loader.conf.local file
                                  2. Go to System --> Advanced --> System Tunables.

                                  @kjstech - Yes, with very slow connections (low upload or download speeds) the target and limit may need to be increased to avoid excessive drops in the queue.

                                  https://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel/
                                  https://lists.bufferbloat.net/pipermail/bloat/2017-November/007975.html
                                  http://caia.swin.edu.au/freebsd/aqm/patches/README-0.2.1.txt
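As a rough illustration of why slow links can need a larger target, the Python sketch below computes how long a single full-size packet takes to transmit and compares that with the 5 ms default. The rule of thumb used here (target should be at least one packet's transmission time) and the 1500-byte MTU are my assumptions for the example, not settings taken from this thread.

```python
def serialization_delay_ms(packet_bytes, rate_mbit):
    """Time in milliseconds to put one packet on the wire at the given rate."""
    return packet_bytes * 8 * 1000 / (rate_mbit * 1_000_000)


def suggested_target_ms(rate_mbit, mtu_bytes=1500, default_target_ms=5.0):
    """Assumed heuristic: never set target below one MTU-sized packet's delay."""
    return max(default_target_ms, serialization_delay_ms(mtu_bytes, rate_mbit))


for rate in (0.5, 1, 10, 100):
    delay = serialization_delay_ms(1500, rate)
    target = suggested_target_ms(rate)
    print(f"{rate} Mbit/s: one packet takes {delay:.2f} ms, target {target:.2f} ms")
```

At 1 Mbit/s a 1500-byte packet already takes 12 ms to send, so a 5 ms target would be exceeded by a single queued packet; at 10 Mbit/s and above the default is comfortably larger than one packet time.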

                                  Hope this helps.

                                    zwck @tman222

                                    @tman222
                                    Hey, thanks mate. I guess I am aware of both methods; I am more wondering how you find the proper settings to type in there. I read through, and played around with, this guide: https://calomel.org/freebsd_network_tuning.html but could not see any difference.

                                    Also, for people who want to play around with flent:
                                    quick installation guide for Ubuntu 16+

                                    # Update the system and install git
                                    sudo apt update
                                    sudo apt upgrade
                                    sudo apt install git
                                    
                                    # Fetch netperf and install its build dependencies
                                    git clone https://github.com/HewlettPackard/netperf.git
                                    cd netperf
                                    sudo apt install texinfo
                                    sudo apt install iperf
                                    sudo apt-get install automake -y
                                    sudo apt install autoconf -y
                                    sudo apt install python-pip -y
                                    pip install netlib
                                    pip install cpp
                                    ./autogen.sh
                                    
                                    # Build netperf with demo mode enabled and install it
                                    autoconf configure.ac > configure
                                    sudo chmod 755 configure
                                    ./configure --enable-demo
                                    make
                                    sudo make install
                                    
                                    # Install flent from its PPA
                                    sudo add-apt-repository ppa:tohojo/flent
                                    sudo apt update
                                    sudo apt install flent
                                    
                                    
                                    # Run a 60 second RRUL test and save the plot
                                    flent rrul -p all_scaled -l 60 -H flent-london.bufferbloat.net -t no_shaper -o RRUL_no_shaper.png
                                    
                                      tman222

                                      Hi @zwck

                                      It's a lot of trial and error (i.e. testing) to see what works best for your use case(s). Keep in mind that a lot of the guides you will find are for tuning host computers and some of those suggestions may not work well for a firewall appliance.

                                      One other site that I have gotten some helpful tuning info from has been the BSD Router Project, for example:
                                      https://bsdrp.net/documentation/technical_docs/performance

                                      Hope this helps.

                                        sciencetaco

                                        Is there any reason you folks can think of why, when I run the flent rrul/rrul_noclassification tests, my download seems to top out at 40 Mbit/s to netperf-west.bufferbloat.net? When I run "netperfrunner.sh" from the same host, I get the following:

                                        flent:
                                        [image: flent plot, not preserved]

                                        script:

                                        2018-10-10 08:59:19 Testing netperf-west.bufferbloat.net (ipv4) with 4 streams down and up while pinging gstatic.com. Takes about 60 seconds.
                                         Download:  150.21 Mbps
                                           Upload:  10.27 Mbps
                                          Latency: (in msec, 61 pings, 0.00% packet loss)
                                              Min: 29.343
                                            10pct: 33.824
                                           Median: 44.323
                                              Avg: 45.461
                                            90pct: 57.069
                                              Max: 74.273
                                        

                                        I'm applying the limiter via floating rules on WAN. I'm using codel+fq_codel set to 390 Mbit/s down and 19 Mbit/s up.

                                        I've seen some people incorporating their limiters via in/out pipe on the default LAN allow rule. Is there some consensus on which method is "best"? I've got a bunch of VLANs off that interface; if I went with this method, would I need to include the in/out pipe on every default allow rule for each VLAN?

                                        thank you for all you've managed to figure out and explain to me thus far.

                                          zwck @sciencetaco

                                          @sciencetaco

                                          I asked about this as well. Some posts up @dtaht explains it: it's actually 4 x 40 Mbps ~ 160 Mbps and 4 x 3 Mbps ~ 12 Mbps (when you start flent with the option --gui you can check total download and upload values).
                                          Why it tops out at about half your speed limit is difficult to say; maybe hardware/line limitations on your side or the host's? I started setting up the codel params with extremely reduced speeds, i.e. 1 Gbit line limit, codel limiters set to 100 Mbit.
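The arithmetic above, written out as a tiny Python sketch. The per-flow figures are the approximate values read off the plot as described in this post (four flows in each direction); the helper name is just for illustration.

```python
def aggregate_mbps(per_flow_mbps):
    """Total throughput is the sum of the individual flows."""
    return sum(per_flow_mbps)


download = aggregate_mbps([40, 40, 40, 40])  # four download flows of ~40 Mbps
upload = aggregate_mbps([3, 3, 3, 3])        # four upload flows of ~3 Mbps

print(download, upload)  # 160 12
```

Those totals are in the same range as the 150.21 Mbps down and 10.27 Mbps up reported by netperfrunner.sh, so the two tools roughly agree once the per-flow lines are added up.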

                                            zwck @tman222

                                            @tman222 said in Playing with fq_codel in 2.4:

                                            Hi @zwck

                                            It's a lot of trial and error (i.e. testing) to see what works best for your use case(s). Keep in mind that a lot of the guides you will find are for tuning host computers and some of those suggestions may not work well for a firewall appliance.

                                            One other site that I have gotten some helpful tuning info from has been the BSD Router Project, for example:
                                            https://bsdrp.net/documentation/technical_docs/performance

                                            Hope this helps.

                                            I just quickly skimmed this section. The outcome of changing these settings:

                                            machdep.hyperthreading_allowed="0" -> 24% increased performance
                                            net.inet.ip.fastforwarding=1 (useless since freebsd11)
                                            hw.igb.rxd or  hw.igb.txd -> decrease performance
                                            hw.igb.rx_process_limit=100 to -1 -> improvement, 1.7% 
                                            max_interrupt_rate from 8000 to 32000 -> no benefit
                                            Disabling LRO and TSO -> no impact
                                            
                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.