Netgate Discussion Forum

    Playing with fq_codel in 2.4

    Traffic Shaping
    1.1k Posts 123 Posters 1.6m Views
    • D
      dtaht @zwck
      last edited by dtaht

      It's awesome to have more folks doing bidirectional network stress testing with flent. Nobody in product marketing wants you to do that.

      @zwck OK, I rebooted the box in London (it had a TCP tweak I didn't like). It should be back up in a minute. It looks to me, though, like you just peak out at 1 Gbit total on this hardware.

      • D
        dtaht
        last edited by dtaht

        @zwck said in Playing with fq_codel in 2.4:

        burst 0

        Is there a way to tune the "burst" value in the limiter above? It's nice to see the DSCP values actually being respected end to end here, too. That never happens.

        • D
          dtaht
          last edited by

          By the way, the rrul test does not account for TCP ACK traffic. When I see ~480 Mbit of perfect fq_codel'ed bandwidth at 500 Mbit, it's a good assumption that the remaining ~20 Mbit was ACKs, as there's about a 20:1 ratio there.
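The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration using the two round numbers quoted above (a 500 Mbit shaper and ~480 Mbit of measured data); the function name `ack_share` is made up for this sketch, and it simply assumes everything the test did not measure was ACK traffic.

```python
def ack_share(shaped_mbit: float, measured_mbit: float) -> tuple[float, float]:
    """Return (leftover_mbit, data_to_leftover_ratio).

    Assumption: all bandwidth not counted by the test is ACK traffic.
    """
    leftover = shaped_mbit - measured_mbit
    if leftover <= 0:
        raise ValueError("measured rate must be below the shaped rate")
    return leftover, measured_mbit / leftover


leftover, ratio = ack_share(500.0, 480.0)
print(f"{leftover:.0f} Mbit/s left over, about {ratio:.0f}:1")
# prints "20 Mbit/s left over, about 24:1"
```

With these inputs the implied ratio comes out at 24:1, which is in the same ballpark as the rough 20:1 figure.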

          • D
            dtaht
            last edited by

            @zwck said in Playing with fq_codel in 2.4:

            ipfw sched show

            during a test would be interesting.

            • Z
              zwck @dtaht
              last edited by zwck

              @dtaht

              speedtest

              after London rebooted :D

              1_1538721917996_rrul-2018-10-05T083834.071452.zwck-shaper_on_500Mbit.flent.gz

              0_1538721917996_RRUL_Test001_bufferbloat-shaper_on_500Mbit.png

              admin@pfsense:~ # ipfw sched show
              10000: 500.000 Mbit/s    0 ms burst 0
              q75536  50 sl. 0 flows (1 buckets) sched 10000 weight 0 lmax 0 pri 0 droptail
               sched 10000 type FQ_CODEL flags 0x0 512 buckets 1 active
               FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                 Children flowsets: 10000
              BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
                0 ip           0.0.0.0/0             0.0.0.0/0     2357  2546546  0    0   0
              10001: 500.000 Mbit/s    0 ms burst 0
              q75537  50 sl. 0 flows (1 buckets) sched 10001 weight 0 lmax 0 pri 0 droptail
               sched 10001 type FQ_CODEL flags 0x0 512 buckets 1 active
               FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                 Children flowsets: 10001
                0 ip           0.0.0.0/0             0.0.0.0/0     306719 434714257 106 154656   7
              admin@pfsense:~ # ipfw sched show
              10000: 500.000 Mbit/s    0 ms burst 0
              q75536  50 sl. 0 flows (1 buckets) sched 10000 weight 0 lmax 0 pri 0 droptail
               sched 10000 type FQ_CODEL flags 0x0 512 buckets 1 active
               FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                 Children flowsets: 10000
              BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
                0 ip           0.0.0.0/0             0.0.0.0/0     4507  5174782  8 6208   0
              10001: 500.000 Mbit/s    0 ms burst 0
              q75537  50 sl. 0 flows (1 buckets) sched 10001 weight 0 lmax 0 pri 0 droptail
               sched 10001 type FQ_CODEL flags 0x0 512 buckets 1 active
               FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                 Children flowsets: 10001
                0 ip           0.0.0.0/0             0.0.0.0/0     362125 513262875 133 199500   7
              admin@pfsense:~ # ipfw sched show
              10000: 500.000 Mbit/s    0 ms burst 0
              q75536  50 sl. 0 flows (1 buckets) sched 10000 weight 0 lmax 0 pri 0 droptail
               sched 10000 type FQ_CODEL flags 0x0 512 buckets 1 active
               FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                 Children flowsets: 10000
              BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
                0 ip           0.0.0.0/0             0.0.0.0/0       46    61760  0    0   0
              10001: 500.000 Mbit/s    0 ms burst 0
              q75537  50 sl. 0 flows (1 buckets) sched 10001 weight 0 lmax 0 pri 0 droptail
               sched 10001 type FQ_CODEL flags 0x0 512 buckets 1 active
               FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                 Children flowsets: 10001
                0 ip           0.0.0.0/0             0.0.0.0/0     5427  7667181  0    0   0
              admin@pfsense:~ # ipfw sched show
              10000: 500.000 Mbit/s    0 ms burst 0
              q75536  50 sl. 0 flows (1 buckets) sched 10000 weight 0 lmax 0 pri 0 droptail
               sched 10000 type FQ_CODEL flags 0x0 512 buckets 1 active
               FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                 Children flowsets: 10000
              BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
                0 ip           0.0.0.0/0             0.0.0.0/0     3294  3669449 14 10864   0
              10001: 500.000 Mbit/s    0 ms burst 0
              q75537  50 sl. 0 flows (1 buckets) sched 10001 weight 0 lmax 0 pri 0 droptail
               sched 10001 type FQ_CODEL flags 0x0 512 buckets 1 active
               FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                 Children flowsets: 10001
                0 ip           0.0.0.0/0             0.0.0.0/0     90572 128064966 100 147104   1
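For comparing several `ipfw sched show` snapshots like the ones above, a small parser for the bucket lines can help. The following is a minimal sketch, not an official tool: it assumes the last five numeric fields follow the column header shown in the output (`Tot_pkt/bytes Pkt/Byte Drp`), read here as total packets, total bytes, currently queued packets, currently queued bytes, and drops. The names `BucketStats`, `parse_bucket_line`, and `drop_fraction` are invented for the example.

```python
from typing import NamedTuple


class BucketStats(NamedTuple):
    tot_pkts: int      # Tot_pkt
    tot_bytes: int     # bytes
    queued_pkts: int   # Pkt (assumed: packets currently queued)
    queued_bytes: int  # Byte (assumed: bytes currently queued)
    drops: int         # Drp


def parse_bucket_line(line: str) -> BucketStats:
    """Parse one bucket line; the last five fields are the counters."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"not a bucket line: {line!r}")
    return BucketStats(*(int(f) for f in fields[-5:]))


def drop_fraction(stats: BucketStats) -> float:
    """Drops as a fraction of total packets seen by the scheduler."""
    return stats.drops / stats.tot_pkts if stats.tot_pkts else 0.0


# Last download-side bucket line from the output above.
line = "  0 ip           0.0.0.0/0             0.0.0.0/0     90572 128064966 100 147104   1"
stats = parse_bucket_line(line)
print(stats, f"drop fraction {drop_fraction(stats):.6f}")
```

Running this against each snapshot in turn makes it easier to see how the drop counter moves between samples.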
              
              • D
                dtaht
                last edited by

                Well, the 500 Mbit results are awesome. There are 4 bursty drop episodes on the download that could be coming from anywhere, for any cause: my box, yours, Linode's shapers, the path, cosmic radiation.

                Try a rrul_be test to see if you get that big bursty drop. It's midnight here. I'm fading.

                • Z
                  zwck @dtaht
                  last edited by

                  @dtaht Thanks for the awesome help. It's morning over here and I need to get to work. I have to read your flent documentation properly. Enjoy your sailing trip.

                  • D
                    dtaht
                    last edited by dtaht

                    I don't have much insight into that drop, but the recovery pattern looks normal.

                    0_1538722775830_bigdrop.4.4_500Mbit_ECN.png

                    I don't have BBR on that box so I can't try that, and is not the miracle of the juniper bushes enough? 800 Mbit is still weird, though?

                    This also shows diffserv CS1 being respected.

                    ... You normally shouldn't see all 3 flows dropping a packet at the same time, just one (and you'd see, as earlier in the test, the flows trading bandwidth back and forth in the TCP sawtooth pattern). With 3 simultaneous drops they all cut their bandwidth in half, and utilization is lowered while they recover.
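To see why simultaneous drops hurt utilization more than a single drop, here is a rough sketch of the halving step only. It is a toy model with made-up, equal flow rates: it assumes each flow that sees a drop halves its rate (classic multiplicative decrease) and ignores the recovery phase and the other flows picking up the slack. The function name `utilization_after_drops` is hypothetical.

```python
def utilization_after_drops(rates: list[float], dropped: set[int]) -> float:
    """Fraction of the pre-drop aggregate rate left right after the drops.

    Flows whose index is in `dropped` halve their rate; the rest are unchanged.
    """
    total_before = sum(rates)
    after = [r / 2 if i in dropped else r for i, r in enumerate(rates)]
    return sum(after) / total_before


flows = [160.0, 160.0, 160.0]  # three equal flows (illustrative values)
one_drop = utilization_after_drops(flows, {0})
all_drop = utilization_after_drops(flows, {0, 1, 2})
print(f"one flow drops:  {one_drop:.0%} of the aggregate remains")
print(f"all three drop: {all_drop:.0%} of the aggregate remains")
```

In this model a single drop leaves about five sixths of the aggregate in place, while three simultaneous drops leave only half until the flows ramp back up.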

                      • D
                        dtaht
                        last edited by

                        have a song: https://plus.google.com/u/0/107942175615993706558/posts/UtcLY2W9NXy

                        • Z
                          zwck @dtaht
                          last edited by

                          @dtaht have fun

                          10000: 800.000 Mbit/s    0 ms burst 0
                          q75536  50 sl. 0 flows (1 buckets) sched 10000 weight 0 lmax 0 pri 0 droptail
                           sched 10000 type FQ_CODEL flags 0x0 512 buckets 1 active
                           FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                             Children flowsets: 10000
                          BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
                            0 ip           0.0.0.0/0             0.0.0.0/0     3780  4324863  0    0   0
                          10001: 800.000 Mbit/s    0 ms burst 0
                          q75537  50 sl. 0 flows (1 buckets) sched 10001 weight 0 lmax 0 pri 0 droptail
                           sched 10001 type FQ_CODEL flags 0x0 512 buckets 1 active
                           FQ_CODEL target 5ms interval 60ms quantum 1514 limit 10240 flows 1024 ECN
                             Children flowsets: 10001
                            0 ip           0.0.0.0/0             0.0.0.0/0     107473 153093543 201 297156   1
                          
                          

                          1_1538723446600_RRUL_Test001_bufferbloat-shaper_on_800Mbit.png

                          0_1538723446599_rrul-2018-10-05T090400.526623.zwck-shaper_on_800Mbit.flent.gz

                          • D
                            dtaht
                            last edited by

                            I do gotta say I think these major drops are significant... but I'm tired! I need to fire up a different netperf server in a different cloud to see if it's on my end. Got a favorite cloud provider? This is Linode...

                            Or @uptownvagrant can weigh in.

                            • Z
                              zwck
                              last edited by

                              It's probably on my end. I have beefier hardware that I can try, plus I can maybe set it up similar to what vagrant is doing.

                              • D
                                dtaht
                                last edited by

                                Do you have any major daemons running out of cron or elsewhere? This is happening every ~40 sec, it looks like... or a GC interval in the kernel?

                                bed, calling
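One way to check whether dips really recur on a fixed period is to note the time of each dip and look at the spacing between them. This is a minimal sketch under stated assumptions: the timestamps below are hypothetical values invented for illustration, not taken from the flent data in this thread, and `median_spacing` is a made-up helper name.

```python
from statistics import median


def median_spacing(timestamps: list[float]) -> float:
    """Median gap in seconds between consecutive event timestamps."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return median(gaps)


# Hypothetical dip times (seconds into a test), for illustration only.
dips = [12.0, 52.5, 91.0, 131.5]
print(f"median spacing: {median_spacing(dips)} s")
# prints "median spacing: 40.5 s"
```

A stable spacing would point toward a timer-driven cause (cron job, periodic kernel work), while irregular gaps would suggest something external.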

                                • D
                                  dtaht
                                  last edited by dtaht

                                  Those 40-second dropouts are the sort of tiny long-term misbehavior I have OCD over, even though you'd hardly notice it in normal use. For example, this bug in wifi causes drones to physically crash:

                                  http://blog.cerowrt.org/post/disabling_channel_scans/

                                  • D
                                    dtaht
                                    last edited by

                                    So I'd end up running tests for hours while poking at all the other system resources, watching top for high cpu processes, cron, syslogs...

                                    0_1538749722832_ocd.png

                                    • Z
                                      zwck @dtaht
                                      last edited by

                                      @dtaht

                                      Probably Saturday morning I'll do some more testing.

                                      • Z
                                        zwck @dtaht
                                        last edited by

                                        @dtaht Thanks again! Please go and enjoy your boating!!!!

                                        In the meantime I'll try to isolate my network a bit. Are there some good examples of how it should look, so I can quickly compare? I'll see what I can come up with. I am also waiting for some better hardware for my pfSense box.

                                        • D
                                          dtaht @zwck
                                          last edited by dtaht

                                          @zwck I will! The times I've seen something like this are:

                                          local process eating 100% cpu briefly
                                          system management interrupt
                                          kernel gc on something
                                          renewing an ip address via dhcp (router or host on the path)
                                          another program on the network wanting some bandwidth
                                          missing an arp
                                          route update or flap (somewhere)
                                          channel scan (in wifi)
                                          unaligned access trap (in mips)

                                          ... cosmic radiation and other explanations from the BOFH. :)

                                          • D
                                            dtaht
                                            last edited by dtaht

                                            Last bit of OCD. I'm not sure if the interrupt change or the increase in RX ring buffer size did any good, but I note these are not fixed numbers and need to scale with the bandwidth. "More" interrupts is generally better for low-latency networking, but too many interrupts overwhelm a modern CPU faster.
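To get a feel for how those numbers scale with bandwidth, the packet rate and the time a receive ring takes to fill can be worked out directly. This is a back-of-the-envelope sketch: it assumes full-size 1514-byte packets (the quantum shown in the fq_codel output earlier in the thread), and the 1024-slot ring is a hypothetical value chosen for illustration, not a setting reported here.

```python
def packets_per_second(rate_mbit: float, packet_bytes: int = 1514) -> float:
    """Packets per second at a given rate, assuming every packet is full size."""
    return rate_mbit * 1_000_000 / (packet_bytes * 8)


def ring_fill_ms(ring_slots: int, rate_mbit: float, packet_bytes: int = 1514) -> float:
    """Milliseconds for a ring of `ring_slots` packets to fill if nothing drains it."""
    return ring_slots / packets_per_second(rate_mbit, packet_bytes) * 1000


for rate in (500.0, 800.0):
    pps = packets_per_second(rate)
    fill = ring_fill_ms(1024, rate)  # 1024 slots is an assumed ring size
    print(f"{rate:.0f} Mbit/s: {pps:,.0f} pkt/s, 1024-slot ring fills in {fill:.1f} ms")
```

The same ring holds less time's worth of traffic as the rate goes up, which is why a value that works at one speed may need revisiting at another.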

                                            I think the sysctls for the TSO and "large receive" (LRO) offloads on the igb card are here:

                                            net.inet.tcp.tso="0"
                                            net.inet.tcp.lro="0"

                                            These "bulk up packets" in the card and lower system memory and interrupt requirements. I'm all about unbulking and interleaving (FQ-ing) packets. LRO in particular is often notoriously buggy, but worth enabling, as Intel's network cards (igb) in particular generally have good support for it. (Be prepared to totally crash the network side of the box, though, or break something elsewhere in the network stack.)

                                            Now that it is repeatable, can the ICMP NAT issue get reported somewhere?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.