Traffic not going to Limiters after 2.4.4



  • Shame 2.4.4 was called a RELEASE... I had a lot of trouble after upgrading. First, all my limiter queues disappeared. I tried creating them again, but that did not work. I had to manually edit the XML file, erase everything about Limiters, and start from zero.

    Now, with all Limiters and their queues created, I redirected traffic back to the In/Out pipes (yes, I had to edit my rules again since the 2.4.4 upgrade fucked them all!!), but the dynamic queues are empty! They show NO traffic:

    Limiters:
    00001:   9.500 Mbit/s    0 ms burst 0 
    q131073  50 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
     sched 65537 type FIFO flags 0x0 0 buckets 0 active
    00002: 950.000 Kbit/s    0 ms burst 0 
    q131074  50 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
     sched 65538 type FIFO flags 0x0 0 buckets 0 active
    
    
    Queues:
    q00001  50 sl. 0 flows (256 buckets) sched 1 weight 20 lmax 0 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    q00002  50 sl. 0 flows (256 buckets) sched 1 weight 1 lmax 0 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    q00003  50 sl. 0 flows (256 buckets) sched 2 weight 20 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
    q00004  50 sl. 0 flows (256 buckets) sched 2 weight 1 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
    

    Limiters are configured according to https://www.netgate.com/docs/pfsense/book/trafficshaper/limiters.html

    This is my Pipe config:
    0_1537917050989_1.png

    And this is the child pipe (dynamic queue) config. As you can see, the mask is applied correctly:
    0_1537917090172_2.png

    This is the traffic assignment in the outbound LAN rule:
    0_1537917162835_3.png

    But what's worse... IT WAS WORKING PERFECTLY ON 2.4.3_1!!! So why is it bugged now??



  • I have the same issues with limiters when a child limiter is created. Limiters only work when there are no sub-categories. Looks like a bug to me.



  • Glad I wasn't the only one having this issue. All my child queues for my traffic limiters vanished. Creating a queue does not do anything. At this stage I'm going to just install a fresh VM and then restore my config to see if that works. It also seems that this update forced the In/Out pipes of all firewall rules to "none" (even for those who didn't have child queues).



  • So, should we file a bug report? This issue is quite critical.


  • Rebel Alliance Developer Netgate

    There is already a report here: https://redmine.pfsense.org/issues/8956



  • Seems nobody cares about this bug, and it is quite critical. Traffic shaping is one of the most important sections, and the main reason many people use pfSense.

    So, only 4 guys are affected by this? Is there any way to BUMP the reported pfSense bug?


  • Rebel Alliance Developer Netgate

    It already has a target of 2.4.4-p1, there is no need to "bump" it or draw more attention to it. We're all busy here and it hasn't made it to the top of anyone's todo list yet.



  • Hi jimp, sorry for disturbing.
    Reading bug #8956 in detail, I can see it's a different situation. That report is about not being able to create queues under each limiter. The workaround for that is manually deleting all Limiters from the XML file and starting from scratch.

    I filed a new bug, https://redmine.pfsense.org/issues/8973, because in this case the queues are properly created: they are shown in the GUI, and I also double-checked with the ipfw pipe show command that the queues are there. They just receive no traffic.



  • +1 for a quick fix. This issue is way too critical to wait weeks for a -p1 release, in my opinion!

    There's a presentation video on limiters from August 2018 for the upcoming 2.4.4 release. I can't understand how a presentation was given when the feature seems to be completely untested.

    As much as I love pfSense and appreciate the work Netgate puts in, I really wonder how such a bug can make it into a release version ...



  • @vesikk said in Traffic not going to Limiters after 2.4.4:

    At this stage I'm going to just install a fresh VM and then restore my config to see if that works.

    So for those wondering I did install a fresh pfSense VM and I tested the limiters before restoring my backup config. Limiters and child queues were working perfectly but as soon as I restored my backup I could not create any queues for limiters. At the moment it's not an issue for me and I'm happy to wait for the patch release.



  • @vpreatoni said in Traffic not going to Limiters after 2.4.4:

    Hi jimp, sorry for disturbing.
    Reading bug #8956 in detail, I can see it's a different situation. That report is about not being able to create queues under each limiter. The workaround for that is manually deleting all Limiters from the XML file and starting from scratch.

    I filed a new bug, https://redmine.pfsense.org/issues/8973, because in this case the queues are properly created: they are shown in the GUI, and I also double-checked with the ipfw pipe show command that the queues are there. They just receive no traffic.

    Could you check whether this is just a GUI issue where the traffic is not shown (but the limiters themselves are working), or are they not working at all?

    I'd delete my shaper config via the XML file as well and redo it if that would solve the problem, but as far as I understand your post, this still won't help me even when I'm able to create the queues in the GUI.


  • Rebel Alliance Developer Netgate

    @jacotec said in Traffic not going to Limiters after 2.4.4:

    +1 for a quick fix. This issue is way too critical to wait weeks for a -p1 release, in my opinion!

    There's a presentation video on limiters from August 2018 for the upcoming 2.4.4 release. I can't understand how a presentation was given when the feature seems to be completely untested.

    As much as I love pfSense and appreciate the work Netgate puts in, I really wonder how such a bug can make it into a release version ...

    Because maybe it isn't quite that clear and it doesn't affect everyone?

    There are a large number of users with limiters on 2.4.4 working just fine, with traffic using the limiters as expected. You need only peek at the FQ_CODEL thread for evidence.


  • Rebel Alliance Developer Netgate

    If someone has a limiter problem where the queues DO NOT show up, including if you re-created them, I'd like to see the contents of the limiters from config.xml from before the upgrade as well as after. The section I'm looking for is the <dnshaper> ... </dnshaper> section. There should not be anything too private in there, with a possible exception of a masked subnet if you used that.

    I'd also like to see the contents of /tmp/rules.limiter, ipfw pipe show, and ipfw queue show.

    And as always, make sure you reset states between any limiter config change or test.
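
    If it helps anyone gathering this, here is a minimal Python sketch (not an official tool) that pulls the <dnshaper> ... </dnshaper> section out of a saved config.xml. The function name extract_dnshaper is made up for illustration, and the commented path /cf/conf/config.xml is an assumption about where the live config sits; point it at your own backup files instead.

```python
import xml.etree.ElementTree as ET


def extract_dnshaper(config_xml):
    """Return the <dnshaper> section of a config as text, or '' if it is absent.

    Accepts the whole file content as str or bytes.
    """
    root = ET.fromstring(config_xml)
    # Allow a trimmed sample whose root element is already <dnshaper>.
    section = root if root.tag == "dnshaper" else root.find(".//dnshaper")
    if section is None:
        return ""
    return ET.tostring(section, encoding="unicode").strip()


# Example (the path is an assumption; use your own backup file):
# with open("/cf/conf/config.xml", "rb") as fh:
#     print(extract_dnshaper(fh.read()))
```

    Running it once on the pre-upgrade backup and once on the current config gives two snippets that can be compared side by side.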



  • @jacotec It's not a GUI issue (in my case), check my first post, there is the output of ipfw pipe show and ipfw queue show. Pipes and subqueues are created properly.



  • @jimp Please find the requested info here: https://jaycloud.de/f/4a4b8a11ff4a49cfb179/

    There seems to be no command "ipfw limiter show":

    ipfw: bad command `limiter'
    

    Let me know if you need any more information


  • Rebel Alliance Developer Netgate

    That should have been ipfw queue show, sorry. I edited the message.



  • @jimp OK, that one is empty. I've updated my document above.


  • Rebel Alliance Developer Netgate

    You have queues defined but they are not loaded. Do your firewall rules have the queue selected or the base limiter itself?

    Also the "after" settings look like they were changed after the upgrade. Was that what you have right now after attempting to make changes, or from immediately after the upgrade?



  • @jimp The base limiters were still there after the update, but the child queues were completely gone. My floating rules in the firewall are still there, but they were using the child queues before the update. After the update, the child queue assignment in all floating rules was gone, just showing "none". So pfSense deleted the configured pipes at this point, after the update.

    My child queues are no longer available as a selection for the In/Out pipe of the rules; I see only the base limiters there.

    You're right, I changed the base queues to "FQCodel" later, after the update, hoping that I could see or recreate my children after changing the settings. That did not happen. But the children had already vanished before that, right after the update and still with the old settings.

    Do you think it would make sense to delete the dnshaper section from the XML, reboot and recreate the limiters and children in the GUI to see if they would work then?



  • @jimp
    So, I've deleted all my limiters and unassigned the queues from the firewall rules.
    I then was able to reconfigure all limiters and queues from scratch, and the child queues are now showing up and I can reassign them to the firewall rules.

    Did a "reset states", did a reboot - but traffic is not going to the queues.

    On the console I periodically see errors:

    config_aqm Unable to configure flowset, flowset busy
    

    I've then changed all limiters / queues back to "TailDrop" / "FIFO", Reset states ... I don't see the error messages above, but still all limiters and queues are showing "0 flows" in the limiter info. :-(

    Any ideas?



  • Now this forum software goes crazy as well ... wanted to add my ipfw output to my post and the forum repeatedly shows the error message "post was flagged as spam by Akismet".

    So I can't post my ipfw output here. Stupid piece of software!


  • Rebel Alliance Developer Netgate

    @jacotec said in Traffic not going to Limiters after 2.4.4:

    Now this forum software goes crazy as well ... wanted to add my ipfw output to my post and the forum repeatedly shows the error message "post was flagged as spam by Akismet".

    So I can't post my ipfw output here. Stupid piece of software!

    Post it as a text attachment and not in the body of a message.



  • OK, I found that the /tmp/rules.limiter file did not show all of the queues after I recreated them, and the ones that were there did not match what I had configured.

    Interestingly, the file got a new timestamp after I applied a change in the limiters GUI, but the file content did not change!

    I deleted the file completely, touched a queue in the GUI, and then applied the limiters again in the GUI. The file that was created now looks good!

    Afterward I needed to touch every firewall rule again (I've set them all to the base limiter and applied, then set them back to the child queues and applied again). I now see some traffic in the queues.
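
    For anyone who wants to check for the same "new timestamp, same content" symptom, a small Python sketch follows. It is only an illustration: fingerprint and rewritten_but_unchanged are hypothetical helper names, and it simply compares the modification time and a SHA-256 hash taken before and after applying the limiters.

```python
import hashlib
import os


def fingerprint(path):
    """Return (modification time, SHA-256 hex digest) for a file."""
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return os.stat(path).st_mtime, digest


def rewritten_but_unchanged(before, after):
    """True if the timestamp moved but the content hash stayed the same."""
    return after[0] != before[0] and after[1] == before[1]


# Example:
# before = fingerprint("/tmp/rules.limiter")
# ... apply the limiter change in the GUI ...
# after = fingerprint("/tmp/rules.limiter")
# print(rewritten_but_unchanged(before, after))
```

    A True result would match what I saw: the file is rewritten, but with the old content.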

    It's hard to say this quickly whether it's working as expected, as the limiter info is neither fast nor very detailed, but I'll keep observing it.



  • @jacotec Hey!! I'm glad you're having the same problem I reported at the beginning of this thread. For a while I thought I was being ignored by the developers...

    Now that you have recreated your subqueues in the GUI (their deletion is of course an upgrade bug; they shouldn't have been deleted in the first place), we can move forward and analyze why rules are not sending traffic to subqueues, which IMHO is a serious bug because it involves low-level pieces of software (kernel, pf, etc).

    So, @jimp, here are my outputs:

    [2.4.4-RELEASE][root@firewall]/root: ipfw queue show
    q00001  50 sl. 0 flows (256 buckets) sched 1 weight 20 lmax 0 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    q00002  50 sl. 0 flows (256 buckets) sched 1 weight 1 lmax 0 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    q00003  50 sl. 0 flows (256 buckets) sched 2 weight 20 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
    q00004  50 sl. 0 flows (256 buckets) sched 2 weight 1 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
    
    [2.4.4-RELEASE][root@firewall]/root: ipfw pipe show
    00001:   9.500 Mbit/s    0 ms burst 0 
    q131073  50 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
     sched 65537 type FIFO flags 0x0 0 buckets 0 active
    00002: 950.000 Kbit/s    0 ms burst 0 
    q131074  50 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
     sched 65538 type FIFO flags 0x0 0 buckets 0 active
    
    [2.4.4-RELEASE][root@firewall]/root: cat /tmp/rules.limiter 
    
    pipe 1 config  bw 9500Kb droptail
    sched 1 config pipe 1 type fifo
    queue 1 config pipe 1 weight 20 mask dst-ip6 /128 dst-ip 0xffffffff droptail
    queue 2 config pipe 1 weight 1 mask dst-ip6 /128 dst-ip 0xffffffff droptail
     
    
    pipe 2 config  bw 950Kb droptail
    sched 2 config pipe 2 type fifo
    queue 3 config pipe 2 weight 20 mask src-ip6 /128 src-ip 0xffffffff droptail
    queue 4 config pipe 2 weight 1 mask src-ip6 /128 src-ip 0xffffffff droptail
    

    I will also post two of my rules that use limiters, from /tmp/rules.debug:

    pass  in  quick  on $LAN  $GWTC_failover_WiFi inet proto { tcp udp }  from 192.168.211.0/24 to any port $navegacion_libre tracker 0100000101 keep state  dnqueue( 3,1)  label "USER_RULE: LAN Nav"
    
    pass  in  quick  on $GUEST  $GWTC_failover_WiFi inet proto { tcp udp }  from 10.0.4.0/24 to any port $navegacion_libre tracker 1522121250 keep state  dnqueue( 4,2)  label "USER_RULE: Guest nav"
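
    As a sanity check on the two files above, here is a hedged Python sketch that verifies every queue number referenced by a dnqueue(...) option in rules.debug is actually defined in rules.limiter. The helper names are hypothetical, and the regular expressions only assume the line shapes shown in this post, not the full syntax of either file.

```python
import re

# "queue 3 config pipe 2 ..." lines as they appear in /tmp/rules.limiter above.
QUEUE_DEF = re.compile(r"^\s*queue\s+(\d+)\s+config\s+pipe\s+(\d+)", re.MULTILINE)
# "dnqueue( 3,1)" options as they appear in /tmp/rules.debug above.
DNQUEUE_REF = re.compile(r"dnqueue\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def defined_queues(rules_limiter):
    """Map queue number -> parent pipe number from rules.limiter text."""
    return {int(q): int(p) for q, p in QUEUE_DEF.findall(rules_limiter)}


def undefined_references(rules_debug, rules_limiter):
    """Queue numbers used in dnqueue(...) that rules.limiter never defines."""
    known = defined_queues(rules_limiter)
    used = {int(n) for pair in DNQUEUE_REF.findall(rules_debug) for n in pair}
    return used - set(known)
```

    With my files the result should be an empty set, since the rules reference queues 1 to 4 and all four are defined, so the mismatch is not at this level.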
    

    I can also confirm that if I move traffic to the base limiters as @jacotec did, I can see some activity. But of course we lose the dynamic bandwidth assignment by doing that:

    [2.4.4-RELEASE][root@firewall]/root: ipfw pipe show
    00001:   9.500 Mbit/s    0 ms burst 0 
    q131073  50 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0 droptail
     sched 65537 type FIFO flags 0x0 0 buckets 1 active
    BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
      0 ip           0.0.0.0/0             0.0.0.0/0        6     4581  0    0   0
    00002: 950.000 Kbit/s    0 ms burst 0 
    q131074  50 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
     sched 65538 type FIFO flags 0x0 0 buckets 1 active
      0 ip           0.0.0.0/0             0.0.0.0/0        6      654  0    0   0
    

    Hope we can work together to debug this problem and find a solution. Thanks.
    Victor



  • @vpreatoni In my opinion at least your /tmp/rules.limiter seems to match your configuration. When you move all your firewall rules to the base queues, then press "reload the firewall" ... and then move all rules back to the child queues (reload firewall again then) - do you still see no traffic in the limiters?

    I've changed mine to "Codel" / "QFQ" now instead of "Taildrop/FIFO". Here I see the traffic in the children.

    As mentioned, I needed to delete my /tmp/rules.limiter file and let pfSense recreate it, but in my case the file did not reflect my config. Maybe you should do the same to be on the safe side? Do this before you switch your firewall rules away and back.



  • Just a shot in the dark: did you use fq_codel in the previous pfSense version? If yes, do you still have a shell command, system patch or script active that was used to switch your limiters to fq_codel?

    I'm running a pretty complex HFSC-based traffic shaping setup with multiple (child) queues, and that upgraded fine from 2.4.3p1 to 2.4.4 and works flawlessly here.



  • @jacotec WOW WOW WOW!!!! We have something!!! Changing Limiter to Codel/QFQ as suggested made it work.

    0_1538604620188_1.png

    In my case I only changed the Download Limiter, and you can see how the dynamic Download queues fill up:

    Limiters:
    00001:   9.500 Mbit/s    0 ms burst 0 
    q131073  50 sl. 0 flows (1 buckets) sched 65537 weight 0 lmax 0 pri 0  AQM CoDel target 5ms interval 100ms NoECN
     sched 65537 type FIFO flags 0x0 0 buckets 0 active
    00002: 950.000 Kbit/s    0 ms burst 0 
    q131074  50 sl. 0 flows (1 buckets) sched 65538 weight 0 lmax 0 pri 0 droptail
     sched 65538 type FIFO flags 0x0 0 buckets 0 active
    
    
    Queues:
    q00001  50 sl. 5 flows (256 buckets) sched 1 weight 20 lmax 1500 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pkt/Byte Drp
     90 ip           0.0.0.0/0      192.168.211.11/0        9     4635  0    0   0
     94 ip           0.0.0.0/0      192.168.211.15/0        1      330  0    0   0
     99 ip           0.0.0.0/0      192.168.211.50/0      112    20579  0    0   0
    108 ip           0.0.0.0/0      192.168.211.61/0        5      260  0    0   0
    109 ip           0.0.0.0/0      192.168.211.60/0       94    22025  0    0   0
    q00002  50 sl. 0 flows (256 buckets) sched 1 weight 1 lmax 1500 pri 0 droptail
        mask:  0x00 0x00000000/0x0000 -> 0xffffffff/0x0000
    q00003  50 sl. 0 flows (256 buckets) sched 2 weight 20 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
    q00004  50 sl. 0 flows (256 buckets) sched 2 weight 1 lmax 0 pri 0 droptail
        mask:  0x00 0xffffffff/0x0000 -> 0x00000000/0x0000
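
    To watch this without reading the whole dump each time, a minimal Python sketch can summarize how many dynamic flows each queue holds. It is only an illustration: flows_per_queue is a hypothetical helper, and the regular expression assumes the "qNNNNN  50 sl. N flows" line shape of the ipfw queue show output pasted above.

```python
import re

# Matches header lines such as:
# q00001  50 sl. 5 flows (256 buckets) sched 1 weight 20 lmax 1500 pri 0 droptail
QUEUE_LINE = re.compile(r"^\s*q(\d+)\s+\d+\s+sl\.\s+(\d+)\s+flows", re.MULTILINE)


def flows_per_queue(queue_show):
    """Map queue number -> active flow count from `ipfw queue show` text."""
    return {int(q): int(f) for q, f in QUEUE_LINE.findall(queue_show)}


def idle_queues(queue_show):
    """Queue numbers that currently report 0 flows."""
    return sorted(q for q, f in flows_per_queue(queue_show).items() if f == 0)
```

    Fed the Queues section above, this reports 5 flows on queue 1 and none on queues 2 to 4, which fits the fact that only the Download limiter was changed.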
    

    I wouldn't have imagined that choosing a more complex scheduler would make it work...
    Anyway, I see this as a "Yellow alarm", because there should be no reason for the Taildrop/FIFO AQM/scheduler combination to fail.

    PS: I didn't need to delete /tmp/rules.limiter. As soon as I applied the Limiters config, it began to work.



  • @grimson Nope, I've never used that. I was on a regular 2.4.3 installation.



  • @vpreatoni Good to hear it works for you now as well! 👍🏻



  • I can see that our traffic shaper is nonfunctional as of 2.4.4 in terms of per-host dynamic bandwidth shaping.

    The workaround for the missing queues and the inability to create queues in 2.4.4 was to delete all limiters and queues, recreate them, and then reassign the queues to the firewall rules. This part worked eventually. Then I found out that the limiter diagnostic info (ipfw pipe show and ipfw queue show) was not functional with Taildrop/FIFO. Switching to Codel/QFQ allowed monitoring the queues using ipfw pipe show and ipfw queue show, but while the overall bandwidth limiting was working (capping to the max allocated bandwidth), the shaper was NOT actually shaping, either with Taildrop/FIFO or with Codel/QFQ. Codel/QFQ eventually caused the system to crash, and it had to be restarted manually.

    The whole point of the setup was to do the amazing per-host dynamic bandwidth dividing that pfSense was so good at. I can confirm now that although the limiters/queues are recreated and working to limit the maximum aggregate bandwidth, the mask by destination address (Down_LAN) or source address (Up_LAN) does not seem to work. These queues are under the DownLimiter and UpLimiter limiters. Up_LAN is assigned to the In pipe on a LAN interface firewall rule and Down_LAN is assigned to the Out pipe. All of this was working before 2.4.4! The hosts always showed identical traffic during peak usage, dividing the total bandwidth evenly. This is nonfunctional now.

    Is there any way I can directly verify whether traffic is actually going through the per-host queues when using Taildrop/FIFO, even though ipfw pipe show and ipfw queue show do not show that happening?



  • Update: Have now switched to Codel/Round Robin. This combination seems to work -- traffic goes to child queues as expected and we can achieve the per-host dynamic bandwidth allocation. Would be nice if any other combinations including Taildrop with FIFO or QFQ would also work in the future to try and find optimal settings.



  • After some testing, I can confirm CoDel/QFQ is pretty fucked up! My server restarts after every 2 or 3 days of use, and I get flooded with:

    qfq_dequeue BUG/* non-workconserving leaf */
    

    I'm attaching my debug info here: 0_1539455038230_textdump.tar.0
    Could any dev please reply to this post? I'm happy to provide any debugging information or do some testing, but IT IS NOT SERIOUS TO RELEASE A VERSION SO FUCKED UP WITH BUGS, and there was no rush to do it. 2.4.3_1 was working fine. Our servers are production machines.



  • @vpreatoni said in Traffic not going to Limiters after 2.4.4:

    After some testing, I can confirm CoDel/QFQ is pretty fucked up! My server restarts after every 2 or 3 days of use, and I get flooded with:

    qfq_dequeue BUG/* non-workconserving leaf */
    

    This problem was driving me crazy, but I fixed it by doing this:

    • Removed every limiter on the traffic shaper page.
    • Did a full reboot of the pfSense box.
    • Recreated the limiters, and I even used different names for them, just in case...
    • Assigned the new limiters to the relevant firewall rules.

    That fixed my problem.



  • I'm also having problems with limiters and shaper in general.
    I upgraded to 2.4.4, and the limiters disappeared. I recreated them, but there was no incoming traffic at all. Upload traffic seemed OK. So I reverted to 2.4.3 (it's a VM), but then I realized that in 2.4.3 I cannot edit the limiters or the shaper without bad things happening. Just modifying the bandwidth on the limiters resulted in very low incoming bandwidth (something like 700 kbps instead of 3.5 Mbps). So I decided to delete all limiters and recreate them, and I got the same problem as with the upgrade to 2.4.4: no incoming traffic. Removing the traffic shaper (CBQ) also resulted in the same problem. Fortunately, I just reverted to a previous snapshot, which restored normal functionality. But it seems I cannot modify anything at all on the shapers/limiters without bad things happening. Not good.
    Maybe part of the issue was already in 2.4.3? 🤔
    Adding new functionality is nice, but lots of things can go wrong with every change. To me, reliability is much more important than new stuff, and lately I have had second thoughts about clicking the update button.