Traffic not going to Limiters after 2.4.4



  • @jimp
    So, I've deleted all my limiters and unassigned the queues from the firewall rules.
    I was then able to reconfigure all limiters and queues from scratch. The child queues are now showing up and I can reassign them to the firewall rules.

    I did a "reset states" and a reboot, but traffic is still not going to the queues.

    On the console I periodically see this error:

    config_aqm Unable to configure flowset, flowset busy
    

    I then changed all limiters / queues back to "TailDrop" / "FIFO" and reset the states. I no longer see the error message above, but all limiters and queues still show "0 flows" in the limiter info. :-(

    Any ideas?



  • Now this forum software goes crazy as well ... wanted to add my ipfw output to my post and the forum repeatedly shows the error message "post was flagged as spam by Akismet".

    So I can't post my ipfw output here. Stupid piece of software!


  • Rebel Alliance Developer Netgate

    @jacotec said in Traffic not going to Limiters after 2.4.4:

    Now this forum software goes crazy as well ... wanted to add my ipfw output to my post and the forum repeatedly shows the error message "post was flagged as spam by Akismet".

    So I can't post my ipfw output here. Stupid piece of software!

    Post it as a text attachment and not in the body of a message.



  • OK, I found that the /tmp/rules.limiter file did not contain all of the queues after I recreated them, and the ones that were there did not match what I had configured.

    Interestingly, the file got a new timestamp whenever I applied a change in the limiters GUI, but its content did not change!

    I deleted the file completely, edited a queue in the GUI, and then applied the limiters again in the GUI. The newly created file looks good!

    Afterward I needed to touch every firewall rule again (I set them all to the base limiter and applied, then set them back to the child queues and applied again). I now see some traffic in the queues.

    It's too early to say whether it's working as expected, since the limiter info is neither fast nor very detailed, but I'll keep observing it.
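
    One way to spot a stale /tmp/rules.limiter is to compare the queue numbers it defines with the ones `ipfw queue show` reports. Below is a minimal Python sketch, assuming both outputs were saved as text and are analysed offline. The function names are made up for illustration, and the sample strings are shortened excerpts in the format of the outputs posted in this thread.

```python
import re

def configured_queues(rules_limiter_text):
    """Queue numbers defined by 'queue N config ...' lines in rules.limiter."""
    pattern = re.compile(r"^\s*queue\s+(\d+)\s+config\b", re.MULTILINE)
    return {int(m.group(1)) for m in pattern.finditer(rules_limiter_text)}

def kernel_queues(queue_show_text):
    """Queue numbers listed by `ipfw queue show` (lines such as 'q00001 ...')."""
    pattern = re.compile(r"^\s*q(\d+)\s", re.MULTILINE)
    return {int(m.group(1)) for m in pattern.finditer(queue_show_text)}

def compare_queues(rules_limiter_text, queue_show_text):
    """Report queues present on one side only."""
    cfg = configured_queues(rules_limiter_text)
    krn = kernel_queues(queue_show_text)
    return {
        "missing_in_kernel": sorted(cfg - krn),
        "missing_in_file": sorted(krn - cfg),
    }

# Shortened sample data (illustrative only).
SAMPLE_LIMITER = (
    "pipe 1 config  bw 9500Kb droptail\n"
    "sched 1 config pipe 1 type fifo\n"
    "queue 1 config pipe 1 weight 20 mask dst-ip6 /128 dst-ip 0xffffffff droptail\n"
    "queue 2 config pipe 1 weight 1 mask dst-ip6 /128 dst-ip 0xffffffff droptail\n"
)
SAMPLE_QUEUE_SHOW = (
    "q00001  50 sl. 0 flows (256 buckets) sched 1 weight 20 lmax 0 pri 0 droptail\n"
    "    mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000\n"
)

print(compare_queues(SAMPLE_LIMITER, SAMPLE_QUEUE_SHOW))
# {'missing_in_kernel': [2], 'missing_in_file': []}
```

    Two empty lists only mean that the file and the kernel agree on which queues exist. The sketch does not compare their parameters (weight, mask, AQM).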



  • @jacotec Hey!! I'm glad you're having the same problem I initially reported at the beginning of my post. For a while I thought I was being ignored by the developers...

    Now that you have recreated your subqueues in the GUI (losing them is of course an upgrade bug; they shouldn't have been deleted in the first place), we can move forward and analyze why rules are not sending traffic to the subqueues. IMHO that is a serious bug, because it involves low-level pieces of software (kernel, pf, etc.).

    So, @jimp, here is my output:

    [2.4.4-RELEASE][root@firewall]/root: ipfw queue show
    q00001  50 sl. 0 flows (256 buckets) sched 1 weight 20 lmax 0 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    q00002  50 sl. 0 flows (256 buckets) sched 1 weight 1 lmax 0 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    q00003  50 sl. 0 flows (256 buckets) sched 2 weight 20 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
    q00004  50 sl. 0 flows (256 buckets) sched 2 weight 1 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
    
    [2.4.4-RELEASE][root@firewall]/root: ipfw pipe show
    00001:   9.500 Mbit/s    0 ms burst 0 
    q131073  50 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
     sched 65537 type FIFO flags 0x0 0 buckets 0 active
    00002: 950.000 Kbit/s    0 ms burst 0 
    q131074  50 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
     sched 65538 type FIFO flags 0x0 0 buckets 0 active
    
    [2.4.4-RELEASE][root@firewall]/root: cat /tmp/rules.limiter 
    
    pipe 1 config  bw 9500Kb droptail
    sched 1 config pipe 1 type fifo
    queue 1 config pipe 1 weight 20 mask dst-ip6 /128 dst-ip 0xffffffff droptail
    queue 2 config pipe 1 weight 1 mask dst-ip6 /128 dst-ip 0xffffffff droptail
     
    
    pipe 2 config  bw 950Kb droptail
    sched 2 config pipe 2 type fifo
    queue 3 config pipe 2 weight 20 mask src-ip6 /128 src-ip 0xffffffff droptail
    queue 4 config pipe 2 weight 1 mask src-ip6 /128 src-ip 0xffffffff droptail
    

    I will also post two of my rules that use limiters, taken from /tmp/rules.debug:

    pass  in  quick  on $LAN  $GWTC_failover_WiFi inet proto { tcp udp }  from 192.168.211.0/24 to any port $navegacion_libre tracker 0100000101 keep state  dnqueue( 3,1)  label "USER_RULE: LAN Nav"
    
    pass  in  quick  on $GUEST  $GWTC_failover_WiFi inet proto { tcp udp }  from 10.0.4.0/24 to any port $navegacion_libre tracker 1522121250 keep state  dnqueue( 4,2)  label "USER_RULE: Guest nav"
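
    To rule out rules that point at queues which no longer exist, the `dnqueue( a,b)` numbers in /tmp/rules.debug can be checked against the `queue N config` lines in /tmp/rules.limiter. This is a rough Python sketch under the assumption that both files were copied off the firewall as text. The helper names are hypothetical, and the sample rules are shortened versions of the two rules above.

```python
import re

DNQUEUE = re.compile(r"dnqueue\(\s*(\d+)\s*,\s*(\d+)\s*\)")
LABEL = re.compile(r'label\s+"([^"]*)"')
QUEUE_DEF = re.compile(r"^\s*queue\s+(\d+)\s+config\b", re.MULTILINE)

def defined_queues(rules_limiter_text):
    """Queue numbers defined in rules.limiter."""
    return {int(m.group(1)) for m in QUEUE_DEF.finditer(rules_limiter_text)}

def undefined_queue_refs(rules_debug_text, rules_limiter_text):
    """Return (rule label, queue number) pairs for dnqueue references
    that have no matching 'queue N config' line."""
    known = defined_queues(rules_limiter_text)
    problems = []
    for line in rules_debug_text.splitlines():
        ref = DNQUEUE.search(line)
        if ref is None:
            continue  # rule does not use child queues
        label = LABEL.search(line)
        name = label.group(1) if label else line.strip()
        for queue in (int(ref.group(1)), int(ref.group(2))):
            if queue not in known:
                problems.append((name, queue))
    return problems

# Shortened sample data (illustrative only).
RULES_DEBUG = (
    'pass  in  quick  on $LAN  inet proto { tcp udp }  from 192.168.211.0/24 to any '
    'keep state  dnqueue( 3,1)  label "USER_RULE: LAN Nav"\n'
    'pass  in  quick  on $GUEST  inet proto { tcp udp }  from 10.0.4.0/24 to any '
    'keep state  dnqueue( 4,2)  label "USER_RULE: Guest nav"\n'
)
RULES_LIMITER = (
    "queue 1 config pipe 1 weight 20 mask dst-ip6 /128 dst-ip 0xffffffff droptail\n"
    "queue 2 config pipe 1 weight 1 mask dst-ip6 /128 dst-ip 0xffffffff droptail\n"
    "queue 3 config pipe 2 weight 20 mask src-ip6 /128 src-ip 0xffffffff droptail\n"
    "queue 4 config pipe 2 weight 1 mask src-ip6 /128 src-ip 0xffffffff droptail\n"
)

print(undefined_queue_refs(RULES_DEBUG, RULES_LIMITER))  # [] when every reference resolves
```

    For the configuration posted here the list is empty, so the rules and the limiter file are at least consistent with each other.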
    

    I can also confirm that if I move traffic to the base limiters as @jacotec did, I can see some activity. But of course we lose the dynamic bandwidth assignment when doing that:

    [2.4.4-RELEASE][root@firewall]/root: ipfw pipe show
    00001:   9.500 Mbit/s    0 ms burst 0 
    q131073  50 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
     sched 65537 type FIFO flags 0x0 0 buckets 1 active
    BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
      0 ip           0.0.0.0/0             0.0.0.0/0        6     4581  0    0   0
    00002: 950.000 Kbit/s    0 ms burst 0 
    q131074  50 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
     sched 65538 type FIFO flags 0x0 0 buckets 1 active
      0 ip           0.0.0.0/0             0.0.0.0/0        6      654  0    0   0
    

    Hope we can work together to debug this problem and find a solution. Thanks.
    Victor



  • @vpreatoni In my opinion at least your /tmp/rules.limiter seems to match your configuration. When you move all your firewall rules to the base queues, then press "reload the firewall" ... and then move all rules back to the child queues (reload firewall again then) - do you still see no traffic in the limiters?

    I've changed mine to "Codel" / "QFQ" now instead of "Taildrop/FIFO". Here I see the traffic in the children.

    As mentioned, I needed to delete my /tmp/rules.limiter file and let pfSense recreate it, because in my case it did not reflect my config. Maybe do the same to be on the safe side? Do this before you switch your firewall rules away and back.



  • Just a shot in the dark: did you use fq_codel in the previous pfSense version? If yes, do you still have a shell command, system patch, or script active that was used to switch your limiters to fq_codel?

    I'm running a pretty complex HFSC-based traffic shaping setup with multiple (child) queues, and that upgraded fine from 2.4.3p1 to 2.4.4 and works flawlessly here.



  • @jacotec WOW WOW WOW!!!! We have something!!! Changing Limiter to Codel/QFQ as suggested made it work.

    [Attached screenshot: 0_1538604620188_1.png]

    In my case I only changed the Download limiter, and you can see how the dynamic Download queues get filled:

    Limiters:
    00001:   9.500 Mbit/s    0 ms burst 0 
    q131073  50 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0  AQM CoDel target 5ms interval 100ms NoECN
     sched 65537 type FIFO flags 0x0 0 buckets 0 active
    00002: 950.000 Kbit/s    0 ms burst 0 
    q131074  50 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
     sched 65538 type FIFO flags 0x0 0 buckets 0 active
    
    
    Queues:
    q00001  50 sl. 5 flows (256 buckets) sched 1 weight 20 lmax 1500 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
     90 ip           0.0.0.0/0      192.168.211.11/0        9     4635  0    0   0
     94 ip           0.0.0.0/0      192.168.211.15/0        1      330  0    0   0
     99 ip           0.0.0.0/0      192.168.211.50/0      112    20579  0    0   0
    108 ip           0.0.0.0/0      192.168.211.61/0        5      260  0    0   0
    109 ip           0.0.0.0/0      192.168.211.60/0       94    22025  0    0   0
    q00002  50 sl. 0 flows (256 buckets) sched 1 weight 1 lmax 1500 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    q00003  50 sl. 0 flows (256 buckets) sched 2 weight 20 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
    q00004  50 sl. 0 flows (256 buckets) sched 2 weight 1 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
    

    I wouldn't have imagined that choosing a more complex scheduler would make it work...
    Anyway, I see this as a "yellow alarm", because there should be no reason for the Taildrop/FIFO AQM/scheduler combination to fail.

    PS: I didn't need to delete /tmp/rules.limiter. As soon as I applied the limiter config, it began to work.



  • @grimson Nope, I've never used that. I was on a regular 2.4.3 installation.



  • @vpreatoni Good to hear it works for you now as well! 👍🏻



  • I can see that our traffic shaper is nonfunctional as of 2.4.4 in terms of per-host dynamic bandwidth shaping.

    The workaround for the missing queues / inability to create queues in 2.4.4 was to delete all limiters and queues, recreate them, and then reassign the queues to the firewall rules. This part worked eventually. I then found out that the limiter diagnostic info (ipfw pipe show and ipfw queue show) was not functional with Taildrop/FIFO. Switching to Codel/QFQ allowed monitoring the queues using ipfw pipe show and ipfw queue show, but while the overall bandwidth limiting was working (capping at the maximum allocated bandwidth), the shaper was NOT actually shaping, either with Taildrop/FIFO or with Codel/QFQ. Codel/QFQ eventually caused the system to crash, and it had to be restarted manually.

    The whole point of the setup was the amazing per-host dynamic bandwidth division that pfSense was so good at. I can now confirm that although the limiters/queues are recreated and do limit the maximum aggregate bandwidth, the masks by destination address (Down_LAN) and source address (Up_LAN) do not seem to work. These queues sit under the DownLimiter and UpLimiter limiters. Up_LAN is assigned to the In pipe on a LAN interface firewall rule and Down_LAN is assigned to the Out pipe. All of this was working before 2.4.4! The hosts always showed identical traffic during peak usage, dividing the total bandwidth evenly. This is nonfunctional now.

    Is there any way I can directly verify whether traffic is actually going through the per-host queues when using Taildrop/FIFO, even though ipfw pipe show and ipfw queue show do not show that happening?
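
    For the counters that `ipfw queue show` does report, a small parser makes it easier to compare runs and see which hosts ended up in which child queue. The following Python sketch assumes the output format seen in this thread (a `qNNNNN ... N flows` header followed by optional bucket rows); `parse_queue_show` is a made-up name. It cannot tell whether Taildrop/FIFO traffic is being shaped when the counters stay at zero.

```python
import re

HEADER = re.compile(r"^q(\d+)\s+\d+\s+sl\.\s+(\d+)\s+flows")

def parse_queue_show(text):
    """Map queue number -> {'flows': int, 'hosts': [bucket rows]}."""
    queues = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = int(header.group(1))
            queues[current] = {"flows": int(header.group(2)), "hosts": []}
            continue
        parts = line.split()
        # Bucket rows: BKT, proto, source, destination, then five counters.
        if current is not None and len(parts) == 9 and parts[0].isdigit():
            queues[current]["hosts"].append({
                "src": parts[2],
                "dst": parts[3],
                "packets": int(parts[4]),
                "bytes": int(parts[5]),
            })
    return queues

# Shortened sample in the format posted earlier in this thread.
SAMPLE = """\
q00001  50 sl. 2 flows (256 buckets) sched 1 weight 20 lmax 1500 pri 0 droptail
    mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
 90 ip           0.0.0.0/0      192.168.211.11/0        9     4635  0    0   0
 99 ip           0.0.0.0/0      192.168.211.50/0      112    20579  0    0   0
q00002  50 sl. 0 flows (256 buckets) sched 1 weight 1 lmax 1500 pri 0 droptail
    mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
"""

for number, info in sorted(parse_queue_show(SAMPLE).items()):
    total = sum(host["bytes"] for host in info["hosts"])
    print(f"queue {number}: {info['flows']} flows, {total} bytes")
```

    Saving the output twice a few seconds apart and comparing the per-host byte counts gives a crude view of whether hosts in the same queue are being served at similar rates.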



  • Update: I have now switched to Codel/Round Robin. This combination seems to work: traffic goes to the child queues as expected and we get the per-host dynamic bandwidth allocation. It would be nice if other combinations, including Taildrop with FIFO or QFQ, also worked in the future, so we could try to find optimal settings.



  • After some testing, I can confirm CoDel/QFQ is pretty fucked up! My server restarts after every 2 or 3 days of use, and I get flooded with:

    qfq_dequeue BUG/* non-workconserving leaf */
    

    I'm attaching my debug info here: 0_1539455038230_textdump.tar.0
    Could any dev please reply to this post? I'm happy to provide any debugging information or do some testing, but IT IS NOT SERIOUS TO RELEASE A VERSION SO FUCKED UP WITH BUGS, and there was no rush to do it. 2.4.3_1 was working fine. Our servers are production machines.
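
    To put numbers on "flooded", the two console messages quoted in this thread can be counted in a saved copy of the system log. This is only an illustrative Python sketch: `count_limiter_errors` is a hypothetical helper, the sample log lines are invented, and the matching is a plain case-sensitive substring test.

```python
from collections import Counter

# Message texts exactly as quoted in this thread.
KNOWN_MESSAGES = (
    "config_aqm Unable to configure flowset, flowset busy",
    "qfq_dequeue BUG/* non-workconserving leaf */",
)

def count_limiter_errors(log_text):
    """Count how many log lines contain each known limiter error message."""
    counts = Counter()
    for line in log_text.splitlines():
        for message in KNOWN_MESSAGES:
            if message in line:
                counts[message] += 1
    return counts

# Invented sample log for demonstration.
SAMPLE_LOG = (
    "kernel: qfq_dequeue BUG/* non-workconserving leaf */\n"
    "kernel: qfq_dequeue BUG/* non-workconserving leaf */\n"
    "kernel: config_aqm Unable to configure flowset, flowset busy\n"
    "kernel: some unrelated message\n"
)

for message, hits in count_limiter_errors(SAMPLE_LOG).items():
    print(f"{hits:4d}  {message}")
```

    Counts taken before and after a configuration change give a simple indication of whether the change reduced the errors.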



  • @vpreatoni said in Traffic not going to Limiters after 2.4.4:

    After some testing, I can confirm CoDel/QFQ is pretty fucked up! My server restarts after every 2 or 3 days of use, and I get flooded with:

    qfq_dequeue BUG/* non-workconserving leaf */
    

    This problem was driving me crazy, but I fixed it by doing this:

    • Removed every limiter on the traffic shaper page.
    • Did a full reboot of the pfSense box.
    • Recreated the limiters, using different names for them, just in case...
    • Assigned the new limiters to the relevant firewall rules.

    That fixed my problem.



  • I'm also having problems with limiters and the shaper in general.
    I upgraded to 2.4.4 and the limiters disappeared. I recreated them, but there was no incoming traffic at all. Upload traffic seemed OK. So I reverted to 2.4.3 (it's a VM), but then I realized that in 2.4.3 I cannot edit the limiters or the shaper without bad things happening. Just modifying the bandwidth on the limiters resulted in very low incoming bandwidth (something like 700 kbps instead of 3.5 Mbps). So I decided to delete all limiters and recreate them, and I got the same problem as with the upgrade to 2.4.4: no incoming traffic. Removing the traffic shaper (CBQ) also resulted in the same problem. Fortunately, I just reverted to a previous snapshot, which restored normal functionality. But it seems I cannot modify anything at all on the shapers/limiters without bad things happening. Not good.
    Maybe part of the issue was already present in 2.4.3? 🤔
    Adding new functionality is nice, but lots of things can go wrong with every change. To me, reliability is much more important than new stuff, and lately I have second thoughts about clicking the update button.



  • Howdy folks, I don't know if this is related, but I will just throw it out here. I have the traffic shaper set up with CoDel as per the Netgate video, and I started to get a lot of log messages (config_aqm unable to configure flow set). The WAN connection would also break.
    This all started when I changed the interface settings from "default" to "autoselect". After making the change to autoselect, everything ran fine for 8 hours with no sign of problems. Rebooting pfSense and the modem changed nothing.
    I then changed the interfaces back to default and all problems evaporated.
    Whenever I made changes, I rebooted and reset the state table, per best practice.
    Using: 2.4.4
    Supermicro C2558
    cable
    I hope this is helpful.



  • I see that there's a 2.4.4-p1 version now. Has this problem been fixed there?


  • Rebel Alliance Developer Netgate

    @fsr said in Traffic not going to Limiters after 2.4.4:

    I see that there's a 2.4.4-p1 version now. Has this problem been fixed there?

    Yes, assuming it was this: https://redmine.pfsense.org/issues/8973



  • @fsr Yes, it's fixed! Now it is clear which scheduler is the default one (WF2Q+), and it works perfectly.
    I haven't tested QFQ yet, but I'm pretty happy with the behavior of CoDel AQM with the WF2Q+ scheduler.

    Some other issues have been solved in 2.4.4_1 too, like the unbound memory leak.



  • Excellent!! Thanks a lot!!