A definitive, example-driven, HFSC Reference Thread
-
"If my 600kbit upload is 80% utilized with a 480kbit backlog" you mean 100% utilized? A backlog indicates that packets are coming in faster than they're going out, which means your interface is at 100%. And your packets are not actually transferred at 12kb/s, they're transferred at full line rate.
"decoupling bandwidth and delay" just means latency is kept stable while bandwidth is honored through advanced scheduling, include noting how large the head packet is in each queue because dequeuing large packets takes longer than smaller ones.
It took me a bit to find a speedtest server where I didn't get my full 100Mb, but I found some in Europe that gave me around 80Mb. My queue sizes were pretty much 0 the entire time, with a few blips into the teens. My upload did manage to reach 100 for a brief moment during the TCP ramp-up phase, then backed off and stabilized with a 0 queue. My point being that if you're below 80% utilization, your queue should be pretty much empty the entire time.
I meant 12kbits, as in size, not bitrate. I was trying to simplify the bit/byte conversions.
About the utilization percentage: during a 1-second time span, if I send 480kbits through a 600kbit connection, is it not 80% utilized? (I'm kinda confused about this myself, but perhaps it is a conversation for another thread.)

Just for clarity, can you point out what corrections I need to make to my original post?
I have kinda gotten lost in our long-winded posts.

Regarding how HFSC defines delay and what "decoupled bandwidth and delay" means:
Here is an excerpt about "Delay and fairness properties of H-FSC" and "Real-time Guarantees" from an HFSC paper: http://www.ecse.rpi.edu/homepages/koushik/shivkuma-teaching/sp2003/case/CaseStudy/stoica-hfsc-ton00.pdf

"For the rest of the discussion, we consider the arrival time of a packet to be the time when the last bit of the packet has been received, and the departing time to be the time when the last bit of the packet has been transmitted."
…
"Clearly, H-FSC achieves much lower delays for both audio and video sessions. The reduction in delay with H-FSC is especially significant for the audio session. This is a direct consequence of H-FSC's ability to decouple delay and bandwidth allocation."

To achieve decoupled delay and bandwidth you must use a 2-part service curve by setting the m1, d, and m2 parameters. This is not done automatically and cannot be achieved any other way (except via the somewhat deprecated u-max/d-max parameters). I do not think "decoupled bandwidth and delay" means what you think it means.
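To make the m1/d/m2 decoupling concrete, here is a minimal sketch of a two-part service curve. The numbers are illustrative assumptions of mine, not taken from the paper:

```python
def service_bits(t_ms: float, m1_bps: float, d_ms: float, m2_bps: float) -> float:
    """Cumulative service (in bits) guaranteed by time t_ms: slope m1 for the
    first d_ms milliseconds, slope m2 afterwards."""
    if t_ms <= d_ms:
        return m1_bps * t_ms / 1000.0
    return m1_bps * d_ms / 1000.0 + m2_bps * (t_ms - d_ms) / 1000.0

# A 1000-bit voice packet under a flat 32 kbit/s curve vs. a two-part curve
# (m1 = 200 kbit, d = 10 ms, m2 = 32 kbit): same long-run bandwidth, but the
# steeper initial slope m1 bounds the packet's delay at 5 ms instead of ~31 ms.
flat_delay_ms = 1000 / 32_000 * 1000                  # 31.25 ms with a flat curve
assert service_bits(5, 200_000, 10, 32_000) >= 1000   # served within 5 ms here
```

The long-term rate (m2) is identical in both cases; only the two-part curve buys the low delay. That is the decoupling.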
Can you please cite some sources with your posts?
I would rather focus on understanding the HFSC papers and share personal anecdotes later.
-
"If my 600kbit upload is 80% utilized with a 480kbit backlog" you mean 100% utilized? A backlog indicates that packets are coming in faster than they're going out, which means your interface is at 100%. And your packets are not actually transferred at 12kb/s, they're transferred at full line rate.
"decoupling bandwidth and delay" just means latency is kept stable while bandwidth is honored through advanced scheduling, include noting how large the head packet is in each queue because dequeuing large packets takes longer than smaller ones.
It took me a bit to find a speedtest server that I didn't get my full 100mb to, but I found some in Europe that gave me around 80Mb. My queue sizes were pretty much 0 the entire time, a few blips into the teens. My upload did manage to reach 100 for a brief moment during the TCP building phase, which then backed off and stabilized with a 0 queue. My point being that if you're below 80% utilization, your queue should be pretty much empty the entire time.
I meant 12kbits, as in size, not bitrate. I was trying to simplify the bit/byte conversions.
About the utilization percentage; during a 1-second time-span, if I send 480kbits through a 600kbit connection, is it not 80% utilized? (kinda confused about this myself, but perhaps it is a conversation for another thread)Just for clarity, can you point out what corrections I need to make to my original post?
I have kinda got lost in our long-winded posts.Regarding how HFSC defines delay and what "decoupled bandwidth and delay" means;
Here is an excerpt about "Delay and fairness properties of H-FSC" and "Real-time Guarantees" from an HFSC paper: http://www.ecse.rpi.edu/homepages/koushik/shivkuma-teaching/sp2003/case/CaseStudy/stoica-hfsc-ton00.pdfFor the rest of the discussion, we consider the arrival time of a packet to be the time when the last bit of the packet has been received, and the departing time to be the time when the last bit of the packet has been transmitted.
…
Clearly, H-FSC achieves much lower delays for both audio and video sessions. The reduction in delay with H-FSC is especially significant for the audio session. This is a direct consequence of H-FSC’s ability to decouple delay and bandwidth allocation.To achieve decoupled delay and bandwidth you must use a 2-part service curve by setting the m1, d, and m2 parameters. This is not done automatically and cannot be achieved any other way (except the somewhat depreciated u-max/d-max parameters). I do not think "decoupled bandwidth and delay" means what you think it means.
Can you please cite some sources with your posts?
Backlog and utilization are two separate things. A backlog means packets are enqueuing faster than they're dequeuing, which also means your interface is at 100%. You can have 80% utilization without any backlog/buffering, and that is the case more often than not. The further you get past 80%, the more common buffering becomes. And 80% doesn't mean a smooth 80%; it's an average, which means time spent both above and below 80%.
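A toy fluid model of this point, assuming a 600 kbit/s link fed in bursts that average 80% utilization (the burst pattern is invented purely for illustration):

```python
# Drain 600 bits per 1 ms step; arrivals burst at 4x line rate for 1 ms,
# then go idle for 4 ms, averaging 80% utilization over each 5 ms cycle.
link_bits_per_ms = 600_000 / 1000
arrivals = ([link_bits_per_ms * 4] + [0.0] * 4) * 200  # one second of traffic

backlog = peak = total = 0.0
for a in arrivals:
    backlog = max(0.0, backlog + a - link_bits_per_ms)
    peak = max(peak, backlog)
    total += a

utilization = total / (link_bits_per_ms * len(arrivals))
print(utilization)  # -> 0.8: only 80% average utilization
print(peak)         # -> 1800.0 bits of transient backlog (3 ms of queueing)
```

So average utilization and instantaneous backlog really are separate quantities: the link averages 80% busy, yet a queue still forms briefly inside every burst.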
Ahhh.. Seems I misunderstood your 12kbit example.
"Decoupling" bandwidth and latency is talking about how buffering has been heavily used for naive traffic shapers. Bandwidth and delay(buffering) has gone hand-in-hand for a long time for many algorithms because they are much simpler to implement. What is hard to do is to give a queue a certain amount of bandwidth while maintaining guarantees on latency.
-
Realtime can only use up to 80% of the link but gets a delay guarantee, while linkshare can use up to 100% but can have "delay issues". 80% is fine for non-bulk flow types; it is plenty for my games, DNS, ICMP, NTP, etc., and I can keep those kinds of traffic within guaranteed delays. HTTP and P2P like to have up to 100% but are less latency sensitive. I'm not talking about large latency, just dequeuing jitter.
It is generally "bad" if a queue has realtime bandwidth assigned, but does not have enough realtime to satisfy its bandwidth needs. It would be typically undesirable to have your VoIP to have 1Mb of realtime and 1Mb of linkshare, but needs 2Mb of bandwidth. I assume the primary thing that would happen is as you go further and further beyond your realtime, jitter will start to approach that of linkshare. But below the realtime limit, all bandwidth should be "delay free".
In theory, if your total bandwidth usage is below 80%, there is no real difference between linkshare and realtime. In practice, packets arrive in bursts for one reason or another, so HFSC can smooth out those bursts and keep jitter low. For constant-rate UDP-style traffic there is little that can be done, but with TCP, the per-flow packet rate will attempt to stabilize.
I wonder what the target delay is for HFSC in pfSense, because it could possibly interact badly with CoDel, which is also delay sensitive. This would probably become an issue as you approach link saturation, or above 80% utilization, because HFSC is also "fair".
Based on the abstract summaries of HFSC, it sounds like there is no real "target delay" so much as tweaking the quantum of bytes to dequeue per iteration. I think HFSC's "delay" is effectively its quantum. If you assume the quantum is 10,000 bytes, which is a number I have seen thrown around, then your delay will be roughly capped by the time it takes to transfer 10,000 bytes of data. I think this is why they recommend certain minimum connection speeds, because a large quantum on slow connections can mess up the delays. But your quantum can be no smaller than your MTU.
Another issue that can affect delays is the thread scheduler. If the thread is only woken every 10ms, then the packet scheduler can be no more accurate than that. With pfSense on my current box, I see the CPU timer is 2,000/s, which is 0.5ms.
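For what it's worth, the quantum-as-delay idea can be put into numbers. This is only a sketch of the model described above (the 10,000-byte quantum is the figure mentioned in this post, not a documented HFSC constant):

```python
def quantum_delay_ms(quantum_bytes: int, link_bps: float) -> float:
    """Time to serialize one quantum's worth of bytes at the given line rate."""
    return quantum_bytes * 8 / link_bps * 1000.0

print(quantum_delay_ms(10_000, 100_000_000))  # 100 Mb/s link -> 0.8 ms
print(quantum_delay_ms(10_000, 1_000_000))    # 1 Mb/s link   -> 80.0 ms

# The timer tick adds a separate floor on scheduling accuracy:
print(1000 / 2_000)  # kern.hz = 2000 -> 0.5 ms per tick
print(1000 / 100)    # kern.hz = 100  -> 10 ms per tick
```

The same byte budget that is invisible on a fast link dominates delay on a slow one, which is the claimed reason for minimum-speed recommendations.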
-
When you say things like this it is painfully obvious that you have not read the HFSC papers. The target delay is set by the user with the m1, d, and m2 parameters. If you read more than the paper's abstract you would know this.
In the HFSC paper I cited earlier, dequeueing and delay are defined as two separate conditions and should not be used interchangeably. Dequeueing overhead is measured, in the cited paper, to be below 20 microseconds on outdated hardware. Delay is measured and configured in milliseconds. This is another example showing that you have not read the HFSC papers.
The term "quantum" is not used once in the HFSC paper I cited. Your use of this term is confusing. The paper also says nothing about a mimimum connection speed. Perhaps you are confusing HFSC with Codel?
The standard kern.hz is 1000 (though nanobsd uses 100), so changing this setting is unnecessary for most users. You are correct that system clock tick rates are important, but beyond that you are bordering on spreading misinformation.
Again, please cite your sources.
-
If you actually read about implementations of HFSC, they all talk about a quantum. m1 and d are completely optional if you don't care about burst; you can still benefit from the service curves without those two parameters. The notion of a quantum is extremely widespread for many types of buffer management that work with bytes, all with ever so slightly different usages of the term, but pretty much the same no matter what. For example, fq_codel treats the term "quantum" nearly identically.
A quantum in this context is the maximum number of bytes to be dequeued per pass/iteration/whatever. The scheduler will look at the head packet of each queue and combine the current service curves with the size of the head packet to determine which packets get processed this quantum. Once the total number of bytes the quantum represents has been consumed, the scheduler decides whether another quantum must be consumed or whether to go to sleep. With HFSC, the priority determines the order in which the packets are consumed within the quantum, but has no influence on which packets will be consumed.
If you have an older 10ms timer, many quanta may be consumed in a row to play catch-up, so packet scheduling tends to be more bursty.
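A sketch of the dequeue model described above. To be clear, this is that poster's mental model rather than the actual HFSC algorithm, and the queue contents are invented for illustration:

```python
from collections import deque

def dequeue_one_quantum(queues, quantum_bytes):
    """Drain head packets, highest priority first, until the byte budget for
    this pass is spent. `queues` is a list of (priority, deque of packet sizes)."""
    budget = quantum_bytes
    sent = []
    for _prio, q in sorted(queues, key=lambda pq: -pq[0]):
        # Stop pulling from a queue once its head packet no longer fits the budget.
        while q and q[0] <= budget:
            size = q.popleft()
            budget -= size
            sent.append(size)
    return sent

voip = deque([200, 200])          # small, high-priority packets
bulk = deque([1500, 1500, 1500])  # full-size, low-priority packets
print(dequeue_one_quantum([(1, bulk), (7, voip)], 2000))  # -> [200, 200, 1500]
```

With a 2000-byte budget, both small packets go out ahead of the bulk traffic, and only one full-size packet fits in the remainder of the pass.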
-
Please start another thread. HFSC Theory or something. This one is supposed to be for configuration examples.
-
Please post a link to an HFSC paper that includes references to quantums.
Edit: Please PM me the link since our fruitless back and forth is cluttering this thread.
-
Apologies Derelict. :( I will delete my posts.
Could you add the few links from https://forum.pfsense.org/index.php?topic=79589.msg494188#msg494188 and perhaps http://www.ecse.rpi.edu/homepages/koushik/shivkuma-teaching/sp2003/case/CaseStudy/stoica-hfsc-ton00.pdf to the OP?
-
Please don't delete. It's good content. If anything ask a mod to move them to a new thread.
-
First off, hats off to @Derelict for starting this thread. And a HUGE thank you to @Georgeman for contributing the time and patience to help educate, along with the others contributing. This thread has single-handedly given me the most complete and best view of how to use HFSC for traffic shaping. I've read a lot, but I needed some of the (as the title suggests) example-driven scenarios to help solidify and illustrate what is meant by a whole blurb of words. I liken this to watching Law & Order vs. just reading the jury transcript and interpreting it (ignoring the fact, of course, that Law & Order is all fake).
That said, I've got a similar setup to Derelict's from the first post, but I'm currently just focusing on the WAN/LAN and will handle DMZ/Guest later. I did not start from scratch, only because I wanted some of the rules from the Wizard in place (RDP ports, Xbox ports, etc.). Rather than note them all down and re-create them, I figured I would let the wizard do the work. Since doing so, I've created some additional queues and will be removing others once I figure out the issues illustrated here. I know there may be some nuanced changes in the latest 2.3.x version I'm currently running vs. these threads from over 2 years ago, so I figure why not continue a very USEFUL thread.
I've included a few screenshots to help illustrate a few points, and need a few bits of clarity to help:
1. Why are my queues WAY above the actual link values? My WAN link is ~125/13 (cable variations). As seen in the screenshots, I've got Bandwidth values that measure ABOVE both of those numbers on LAN and WAN. My understanding was that up/down could happen on either link (for example, FTP established from LAN could upload or download files), but in my example I'm seeing numbers even above my 125Mb maximum.
2. How can I identify why I have such a LARGE amount of DROPS? Are these all drops from actual queue traffic, or does this also include firewall drops based on bad traffic or rules terminating attempts to hit my network? I'm assuming this is only showing drops for the queues; when uploading at about 7Mb up, it starts dropping about 100 packets every refresh cycle while I watch the queues.
3. Assuming the DROPS are correct, why is the Quality monitoring showing NO packet loss? I would assume it would look similar to the trend of drops.

I've also added a screenshot of the 2 speed tests run on the DSLReports speedtest to illustrate a side benefit of enabling traffic shaping: bufferbloat resolution! You can see it goes from a C to an A, and the quality even jumps up to an A+. Hoping some of you are still around the boards and can offer some input to help me move along.
-
- You can see in your LAN traffic that all of your data is actually going into qLink; that's at least one reason why you're not actually shaping
- Did you actually set an upper limit on your qInternet? Shaping only works well if you tell it how much bandwidth you have
-
It looks like you are not properly tagging the outbound traffic, since most of your download ends up in the default queue. Tagging is best done with floating rules: action match, interface WAN, direction OUT (I neither remember nor care what the wizard does). Remember that floating rule processing for match rules doesn't stop with the first match, so the LAST matching rule wins. Make sure you catch all relevant traffic with these rules.
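The last-match-wins behavior can be sketched like this (the packet representation, rule shapes, and queue names are hypothetical illustrations, not pf syntax):

```python
def classify(packet, match_rules):
    """Evaluate floating-style match rules: every rule is checked, and the
    LAST one that matches decides the queue."""
    queue = "qDefault"  # fallback when nothing matches
    for matches, queue_name in match_rules:
        if matches(packet):
            queue = queue_name  # no early return: a later match overrides
    return queue

rules = [
    (lambda p: True, "qLow"),                      # broad catch-all first
    (lambda p: p.get("dport") == 53, "qHighest"),  # specific rule last, so it wins
]
print(classify({"dport": 53}, rules))  # -> qHighest
print(classify({"dport": 80}, rules))  # -> qLow
```

Hence the advice to order match rules from broad to specific: whatever matches last decides the queue.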
As regards the drops, remember that drops are NOT a bad thing. Dropping packets is the natural way TCP tells the other end that the packet rate needs to be lowered. It is better for this to happen on your router, where you have control of it, instead of on some upstream ISP router. This is why it is SO important to set correct upper limits for all of this to work, as Harvy66 just said. If you set a higher-than-real upper limit, your pfSense will never drop packets; they will be dropped by the ISP router instead, so you won't actually be shaping anything.
-
Also, do you have a qDefault queue on your LAN? If you don't, this is (another) flaw in the wizard. When you tag a TCP packet going out of WAN, the return traffic (the actual download) gets into the queue on LAN that has the same name as the one previously tagged on WAN. If it is not there, ends up in the default queue. This seems to be your case
-
Removing my message, as one question is irrelevant, and the other is below in the next post with more detail.
-
OK, so I've worked on putting together the queues and run into an issue stopping me from getting very far. It seems I can't leave Bandwidth blank on the WAN/LAN "top level" queues, so I input the 95% values there (10/125). I then attempted putting in the numbers as advised for qInternet (95%, aka 10Mb) and qLink (20%), but I can't save and create the qLink queue. I keep getting the message:
"The sum of child bandwidth is higher than parent."
And for clarity and reference, this is the current "planned" setup. For the time being I've used 5% so I can at least build out the Queues:
(All below are Bandwidth/Linkshare m2 values, made to be the same per George's instructions.)

WAN - 10Mb (95%)
-qInternet - 95% or 10Mb
-qHighest - 15%
-qACK - 20%
-qHigh - 15%
-qMedium - 20%
-qDefault - 20%
-qLow - 8%
-qLowest - 2%
-qLink (Default) 20%
LAN - 125Mb (95%)
-qInternet - 95% or 125Mb
-qHighest - 15%
-qACK - 20%
-qHigh - 15%
-qMedium - 20%
-qDefault - 20%
-qLow - 8%
-qLowest - 2%
-qLink (Default) 20%

![Screen Shot 2016-06-25 at 10.34.15 PM.png](/public/imported_attachments/1/Screen Shot 2016-06-25 at 10.34.15 PM.png)
![Screen Shot 2016-06-25 at 10.35.09 PM.png](/public/imported_attachments/1/Screen Shot 2016-06-25 at 10.35.09 PM.png)
![Screen Shot 2016-06-25 at 10.40.45 PM.png](/public/imported_attachments/1/Screen Shot 2016-06-25 at 10.40.45 PM.png)
-
That's because the child queues are indeed exceeding the parent one!!
Just put 1Gbps or whatever the physical interface is (on the interface queue)
-
If he sets the Link to 1Gb, then he needs to set the Upper Limit in the qInternet queues, which he has not.
Remember guys, "bandwidth" is the minimum bandwidth, but you still need to set your maximum.
95% + 20% > 100%
I don't get the LAN part where you show "LAN - 125Mb (95%)" and "qInternet - 95% or 125Mb". If qInternet is 95% of LAN, and LAN is 125, then qInternet is ~119.
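The arithmetic behind that error message, as a sketch. The validation rule here is my reading of the pfSense error text, not its actual source code:

```python
def check_children(parent_mb, children_mb):
    """Reject a configuration whose child 'bandwidth' values sum past the parent."""
    total = sum(children_mb)
    return total <= parent_mb, total

# WAN parent at 10Mb with qInternet = 95% and qLink = 20%:
ok, total = check_children(10, [10 * 0.95, 10 * 0.20])
print(ok, total)   # -> False 11.5 ("sum of child bandwidth is higher than parent")

print(125 * 0.95)  # -> 118.75, i.e. qInternet at 95% of a 125Mb LAN is ~119Mb
```

9.5 + 2.0 = 11.5Mb against a 10Mb parent, so the save is rejected regardless of queue names.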
-
Also, isn't "Bandwidth" really just link-share's m2 param (except on root interface queue, yeah?)? If so, that means it is only a proportional value, not a hard minimum/maximum. So, to easily avoid exceeding the parent just use low but proportionally equivilant values like 2Kb and 5Kb rather than 20Mb and 50Mb, respectively.
-
All - I'm going to thank everyone in advance for their help and patience. I think the picture is starting to become a bit clearer.
The reason I was setting my WAN/LAN is that pfSense won't let me NOT set these values. So I opted to fill them with 95% of REAL bandwidth as suggested originally, which worked out to WAN = 10Mb, LAN = 125Mb. The next instruction was to create the hierarchy of qInternet/qDNS-qBulk and qLink, where qInternet was to match the 95% of REAL bandwidth (again inputting 10Mb/125Mb). qLink was then supposed to be set to 20%, which, yes, I get is above 100% (95 + 20 > 100). I basically hit a wall there, not understanding how I could mimic the original setup.
So I think I get this but wanted to confirm a few ideas:
- WAN should be set to the actual limit (aka 95% of REAL speed) for the interface because there is no overhead available between my modem and the provider?
- qLink isn't really necessary on the WAN interface, because qLink is intended for local LAN traffic only?
- LAN should be set to the line/link speed of the interface (aka 1Gb port, set to 1Gb Bandwidth) to allow for handling LOCAL traffic as well
- UL needs to be set on qInternet for LAN (per Harvy66's comment)
Assuming those 4 statements are correct, my setup would then be:
WAN - Bandwidth = 10Mb
- qInternet - Bandwidth = 10Mb / UL = 10Mb / LS = 10Mb
-qHighest - LS = 15%
-qACK - LS = 20%
-qHigh - LS = 15%
-qMedium - LS = 20%
-qDefault - LS = 20% (default)
-qLow - LS = 8%
-qLowest - LS = 2%
LAN - Bandwidth =1Gb
- qInternet - Bandwidth = 125Mb / UL = 125Mb / LS = 125Mb
-qHighest - LS = 15%
-qACK - LS = 20%
-qHigh - LS = 15%
-qMedium - LS = 20%
-qDefault - LS = 20%
-qLow - LS = 8%
-qLowest - LS = 2%
- qLink - Bandwidth = 875Mb / UL = 1Gb / LS = 875Mb (default)

(Crossing my fingers I'm headed in the right direction!) Thanks guys!
-
Looks pretty much OK.
There's not really a need to set UL on qLink, and I would still enforce the limit on qInternet on WAN and set the interface to the physical interface speed (so the limits are always in the same place; if you have to raise the limit in the future, you'll forget that it's also set on the interface and will end up debugging this).
Tell us how it goes!
-
I agree with georgeman. Nothing wrong stands out.