HFSC shaper dropping packets before bandwidth reached
-
I see the queue sizes for some of the other queues are only 50. I'd recommend checking "Codel Active Queue" under a queue's "Scheduler options" section. Averages are very deceiving; networks are bursty.
YouTube likes to send me video streams averaging about 5Mb/s by bursting about 250KiB (roughly 170 1500-byte packets) at 1Gb/s, 2-3x per second. Percentiles are much more useful.
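Those burst figures can be sanity-checked with some quick arithmetic. The numbers below are the ones from this post, just worked through:

```python
# Quick check of the YouTube burst numbers quoted above.
burst_bytes = 250 * 1024      # ~250 KiB per burst
mtu = 1500                    # bytes in a full-size packet

packets_per_burst = burst_bytes // mtu            # ~170 packets per burst
burst_time_ms = burst_bytes * 8 / 1e9 * 1000      # time on the wire at 1 Gb/s
avg_mbps = burst_bytes * 8 * 2.5 / 1e6            # 2-3 bursts/s -> ~5 Mb/s average

print(packets_per_burst, round(burst_time_ms, 2), round(avg_mbps, 2))
```

So a link that is "only" averaging 5Mb/s is actually running at full line rate for a couple of milliseconds at a time, which is exactly what a small 50-packet queue can't absorb.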
-
I see the queue sizes for some of the other queues are only 50. I'd recommend checking "Codel Active Queue" under a queue's "Scheduler options" section. Averages are very deceiving; networks are bursty.
YouTube likes to send me video streams averaging about 5Mb/s by bursting about 250KiB (roughly 170 1500-byte packets) at 1Gb/s, 2-3x per second. Percentiles are much more useful.
Thanks for the info. I hadn't checked the Codel Active Queue because the documentation is so sketchy on all those options that I figured I'd best leave them alone. Should I enable that on every queue, parent and child? Is there somewhere I can read up on what exactly "Codel Active Queue" does?
What do you mean by "percentiles are more useful"? Do you mean looking at bandwidth used as an aggregate over time, like under the monitoring RRD graphs?
-
With HFSC, only leaf queues can have traffic, so there's no point checking parent queues.
Codel is a dynamic AQM whose goal is to keep any packet from spending more than some target (5ms by default) in the queue. It pairs that with an interval (100ms by default): once queueing delay has stayed above the target for a full interval, it drops a packet, then waits a shrinking, non-linearly decreasing interval before dropping the next. The target says it does not want more than 5ms of bufferbloat, and the interval is 100ms because, assuming a 100ms RTT (ping), that is roughly how long it takes before the sender gets the signal to back off.
Codel does a few things:
- Has a large queue that allows transient bursts to be readily absorbed
- Uses head-drop instead of tail-drop, which penalizes old traffic rather than new traffic, while also signaling the sender more quickly
- Ramps up the drop rate while trying to keep the queue as small as possible. Since it drops one packet at a time instead of in bursts like a full FIFO buffer, it allows better overall utilization.
It also has a secondary effect of hitting "bandwidth hogs" harder than small, light flows. This not only helps stabilize the bandwidth, it also keeps latency low, minimizes packet loss, and to some degree helps distribute bandwidth more fairly.
Codel was one of the first AQMs of its kind. There are newer and better ones, like fq_codel or CAKE.
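The drop scheduling described above can be sketched as a toy state machine. This is a simplified reading of the CoDel algorithm (RFC 8289), not pfSense's actual implementation; the class name and structure are mine:

```python
import math

TARGET = 0.005    # 5 ms: max tolerated standing-queue delay
INTERVAL = 0.100  # 100 ms: roughly one worst-case RTT

class CoDelSketch:
    """Toy model of CoDel's drop scheduling (simplified from RFC 8289)."""
    def __init__(self):
        self.dropping = False
        self.count = 0           # drops so far in the current dropping episode
        self.first_above = None  # deadline set when sojourn first exceeds TARGET
        self.next_drop = None    # when the next head-drop is scheduled

    def on_dequeue(self, now, sojourn):
        """Return True if the packet at the head should be dropped.
        `sojourn` is how long that packet sat in the queue (seconds)."""
        if sojourn < TARGET:
            # Delay back under target: leave the dropping state entirely.
            self.first_above = None
            self.dropping = False
            return False
        if self.first_above is None:
            # First above-target sample just arms a timer; no drop yet.
            self.first_above = now + INTERVAL
            return False
        if not self.dropping and now >= self.first_above:
            # Delay stayed above TARGET for a whole INTERVAL: start dropping.
            self.dropping = True
            self.count = 1
            self.next_drop = now + INTERVAL / math.sqrt(self.count)
            return True
        if self.dropping and now >= self.next_drop:
            # Each successive drop comes sooner: interval / sqrt(count),
            # the "non-linear reducing interval".
            self.count += 1
            self.next_drop += INTERVAL / math.sqrt(self.count)
            return True
        return False
```

The key behaviors from the list above are visible here: bursts shorter than one interval never trigger a drop, drops come off the head of the queue one at a time, and the drop rate ramps up the longer the delay persists.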
Yes, like the aggregate over time, but at much smaller time-slices, which the RRD graphs do not provide.
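A toy illustration of why the average hides what the shaper actually sees (the numbers here are made up for the example, not measurements):

```python
# 100 throughput samples taken in small time-slices: idle most of the
# time, but bursting at line rate (1 Gb/s) in 3 of them.
samples_mbps = [0] * 97 + [1000] * 3

avg = sum(samples_mbps) / len(samples_mbps)                 # looks tame
p99 = sorted(samples_mbps)[int(0.99 * len(samples_mbps))]   # the real peak

print(avg, p99)
```

The average says 30 Mb/s; the 99th percentile says the link is periodically saturated, which is what fills a 50-packet queue.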
-
I'm at the remote site this afternoon and I'm testing different settings for the traffic shaper. I cannot find one that is consistently good. I'm trying to check using the DSLReports buffer bloat test tool.
What I'm seeing is that my qStreaming leaf queue is dropping and queueing packets at around 400Kbps when the queue has 2880K available to it. Sometimes I get all A's on the DSLReports test, and then I test again and get C's and F's. What would cause a queue to queue packets prematurely, before it reaches its bandwidth limit?
I enabled "Codel Active Queue" on all of my queues and thought it made a difference. But now I'm back to getting F's on the test and dropping all kinds of packets while my queue length never goes above zero. I thought I used to understand HFSC traffic shaping on pfSense, but nothing makes sense anymore. I'm not sure if I'm dealing with bugs, misconfigurations, or both.
Are there any other testing tools besides the ones I'm using? I used to rely on pfTop and the web GUI to see what is happening with the queues, but that doesn't seem to work anymore.
Should I be assigning Codel Active to all queues, or just those I need low delay on? Should it be applied to a leaf queue that has child queues?
Another fun fact I'll throw in: I'm using a LAN rule to "mark" packets in order to catch them on the WAN side for shaping. Could that be causing any issues?
-
I noticed something when I was comparing pfTop with the web GUI. It looks like if I multiply the B/S column value by 8, I get close to the number showing in the GUI. So, if I'm not off base here, I would say the GUI is incorrectly labeling the traffic as Kbits/s when it is really showing KBytes per second. That probably accounts for some of the reason I thought my queues were dropping packets before they were full. Is this a bug?
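If that theory is right, the discrepancy is just the bytes-to-bits factor of 8. A tiny check (the function name and the sample reading are mine, for illustration):

```python
def pftop_bs_to_bits(bytes_per_sec):
    """pfTop's B/S column is bytes per second; multiply by 8 for bits per
    second, which is the value the GUI number appears to match."""
    return bytes_per_sec * 8

# Example with an assumed reading: a queue pfTop shows at 60,000 B/S is
# 480,000 bit/s (480 Kbit/s) -- an 8x gap if the GUI labels it KBytes/s.
print(pftop_bs_to_bits(60_000))
```

An 8x labeling error would make a queue look like it was hitting its limit at one eighth of its configured bandwidth, which matches the "dropping at 400Kbps out of 2880K" symptom earlier in the thread.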
-
- Does qLimit have its upper limit set?
- Are you using RealTime at all? Because it pulls directly from the root queue and IGNORES upper limits
- Simple thing to try: set your WAN interface to be limited to the 4.8Mb/s that qLimit has. This way there is zero question whether some traffic is being missed and bypassing qLimit.
- Don't forget to shape LAN. While downloading tends to have much less bufferbloat, if you're not shaping your downloads, it could be adding to your overall bloat.
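For reference, the checklist above roughly corresponds to how these queues would look in raw ALTQ pf.conf syntax. The interface name, child queues, and numbers below are guesses assembled from this thread, so treat it as a sketch, not a working config:

```
# Sketch only: interface and queue layout are assumptions from this thread.
altq on em0 hfsc bandwidth 4.8Mb queue { qVOIP, qStreaming, qDefault }

# RealTime service curve: guaranteed 180Kb/s carved straight from the root,
# which is why RealTime traffic can bypass upper limits.
queue qVOIP bandwidth 180Kb hfsc(realtime 180Kb)

# Link-share with an explicit upper limit; qlimit 50 is the queue depth
# (in packets) mentioned earlier in the thread.
queue qStreaming bandwidth 2880Kb qlimit 50 hfsc(upperlimit 2880Kb)

queue qDefault bandwidth 1Mb hfsc(default)
```

Seeing it in this form makes the first two checklist points concrete: `upperlimit` only caps what flows through that curve, and `realtime` allocation sits outside it.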
-
- Does qLimit have its upper limit set?
Yes, I have qLimit set at 4800Kb/s (4.8Mb/s).
- Are you using RealTime at all? Because it pulls directly from the root queue and IGNORES upper limits
Yes, I have my qVOIP set up with 180Kb/s RealTime. However, there was no VOIP traffic on the network at the time of my testing.
- Simple thing to try: set your WAN interface to be limited to the 4.8Mb/s that qLimit has. This way there is zero question whether some traffic is being missed and bypassing qLimit.
I will try this suggestion and see what it does for me. I have set them both the same in the past, but I don't know what other settings I was using in combination at the time.
- Don't forget to shape LAN. While downloading tends to have much less bufferbloat, if you're not shaping your downloads, it could be adding to your overall bloat.
I guess this is a possibility. I don't shape my LAN traffic for two reasons. One is that lots of people say it makes no difference to shape traffic that's already been delivered to the interface (unless you want to artificially limit some traffic). The other is that I have multiple LAN interfaces, and from what I've figured out, the queues don't talk to each other (though I could be wrong on this). At the time of my testing, my 60Mbit/s download speed was not tapped more than 50%, so I don't think the downloads are a contributing factor.
So, I'm wondering what you think about the difference between pfTop and the Monitoring > Queues GUI for traffic reporting. Which can I trust? Is there any way to get a faster refresh on these so I can better see what's happening in real time?
Thanks for your help with this.
-
I noticed something when I was comparing pfTop with the web GUI. It looks like if I multiply the B/S column value by 8, I get close to the number showing in the GUI. So, if I'm not off base here, I would say the GUI is incorrectly labeling the traffic as Kbits/s when it is really showing KBytes per second. That probably accounts for some of the reason I thought my queues were dropping packets before they were full. Is this a bug?
The GUI has always been screwy. pftop is the best source. Did pftop ever show dropped packets?
-
The GUI has always been screwy. pftop is the best source. Did pftop ever show dropped packets?
Yes, pfTop shows dropped packets; however, I hardly ever see the queue length change from zero even when it is dropping packets, and it seems to drop packets at just a fraction of the bandwidth I have designated for the queue.
-
The GUI has always been screwy. pftop is the best source. Did pftop ever show dropped packets?
Yes, pfTop shows dropped packets; however, I hardly ever see the queue length change from zero even when it is dropping packets, and it seems to drop packets at just a fraction of the bandwidth I have designated for the queue.
Have you double-checked that your WAN link isn't having bandwidth problems?