Netgate Discussion Forum

    A definitive, example-driven, HFSC Reference Thread

    Traffic Shaping
    93 Posts 14 Posters 43.4k Views
    Derelict LAYER 8 Netgate

      I highly doubt it.  I don't know where they'd put them in the frame/packet.

      Verified -

      "Tags are internal identifiers. Tags are not sent out over the wire."

      http://www.openbsd.org/faq/pf/tagging.html

      You should be able to match on a pf internal tag and set a DSCP tag instead.  That should survive the trip if your gear is set to trust them.

      Chattanooga, Tennessee, USA
      A comprehensive network diagram is worth 10,000 words and 15 conference calls.
      DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
      Do Not Chat For Help! NO_WAN_EGRESS(TM)

      Nullity

        Edit: I created a thread dedicated to understanding HFSC, focusing on its "decoupled bandwidth and delay" capabilities. That thread will (hopefully) have a better revision of this post.
        https://forum.pfsense.org/index.php?topic=89367.0

        ----Original post----

        Great thread! :)
        Though I see nothing about the primary reason for HFSC's adoption: the separation of bandwidth and delay allocation. Without this feature, HFSC is only a slight improvement over previous class-based hierarchical link-sharing algorithms.

        Only use real-time when you require it. It is "unfair", unlike link-share. Please read the documentation for more details. Generally, only NTP and VoIP should be using real-time queueing.

        The wording used in the HFSC paper is "decoupled delay and bandwidth allocation". Usually, you can only allocate bandwidth and delay together. Let's say you allocate 25kbit to NTP (average speed). Sadly, this means that in a worst-case scenario a 1500-byte (12kbit) NTP packet may take approximately 500ms to finish transmitting after it is received. This 500ms delay is unacceptable for NTP.

        HFSC allows you to allocate not only an average bitrate of 25kbit but also an initial bandwidth, or "burst" as it is sometimes called. To improve the delay of NTP, we set m1 to 480kbit (80% of my 600kbit upper limit for upload, which I think is the max for real-time allocation), which means a 1500-byte (12kbit) packet would send in 25ms; we set d to 26ms (give d a little room to relax, so I add 1ms to 25ms); then we set m2 to 25kbit. Now NTP packets are guaranteed to send within 26ms. Hell of an improvement over 500ms, and I still have only 25kbit of my 600kbit upload allocated. NTP packets are now allocated the delay of a 480kbit connection, but the bandwidth of a 25kbit connection. This is delay and bandwidth, decoupled.
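        To sanity-check the arithmetic above, here is a tiny calculation (my own sketch; the helper name is made up, only m1/m2/d follow the thread's convention):

        ```python
        # Worked arithmetic for the NTP example above: a 1500-byte (12kbit) packet
        # at the 25kbit average rate versus the 480kbit burst rate (m1).
        def tx_time_ms(size_kbit, rate_kbit):
            """Time in ms to serialize size_kbit of data at rate_kbit."""
            return size_kbit / rate_kbit * 1000

        print(tx_time_ms(12, 25))    # 480.0 ms, the ~500ms worst case at the average rate
        print(tx_time_ms(12, 480))   # 25.0 ms with the m1 burst, hence d = 26ms
        ```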

        Delay is measured as the time between the last bit being received and the last bit being transmitted.

        Here are some links that helped me.
        http://man7.org/linux/man-pages/man7/tc-hfsc.7.html
        http://www.sonycsl.co.jp/person/kjc/software/TIPS.txt
        http://serverfault.com/questions/105014/does-anyone-really-understand-how-hfsc-scheduling-in-linux-bsd-works
        http://linux-ip.net/articles/hfsc.en/
        and any texts I could find by the HFSC authors. The papers include plenty of accessible, non-academic explanation, so do not be afraid to read them.

        Please post any questions or corrections. :)

        Please correct any obvious misinformation in my posts.
        -Not a professional; an arrogant ignoramus.

          Harvy66

           HFSC does not affect how quickly a packet will be serialized, but it does affect when a packet will be sent. An issue you can get on low-bandwidth connections like your example is that a 1500-byte MTU is quite large relative to the time slices the scheduler is targeting.

           The fq_codel documentation mentions this same issue when determining which bucket to dequeue from next: connections below 10Mb need to increase their target latency to accommodate large packets. 100Mb+ is optimal for 1500-byte MTUs and the default 5ms target.

           Realtime is useful for any traffic you feel should have crazy-low jitter, but such traffic should not rely on linkshare at all. If your traffic makes real use of linkshare, then realtime is not a good fit; just use linkshare. This goes along with the link-utilization guidance that many upstream providers have given over the years: a link is considered 100% full at 80% utilization, because below 80% the buffers are mostly empty, and past 80% they start to grow. Many hardware QoS implementations, even on high-end managed switches, use the same logic: if the port is below 80% utilization, QoS is disabled.
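           The "buffers start to grow past 80%" behavior matches what a textbook single-server queue predicts. A rough illustration (this uses the idealized M/M/1 model, which is my own assumption for the sketch and not anything HFSC-specific):

           ```python
           # Average backlog of an idealized M/M/1 queue: rho / (1 - rho), where rho
           # is utilization. Purely illustrative; real traffic is burstier than this.
           def avg_queue_len(rho):
               return rho / (1 - rho)

           for rho in (0.5, 0.8, 0.95):
               print(f"{rho:.0%} utilization -> average backlog {avg_queue_len(rho):.1f} packets")
           ```

           The backlog stays small until roughly 80% and then climbs steeply, consistent with the rule of thumb above.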

           This same idea applies to realtime in HFSC. If you are at or below 80%, there should be roughly "0" dequeue latency, even if the connection is at 100%. The remaining 20% above the 80% is your linkshare and is subject to increasing amounts of jitter as you approach 100%, but realtime should be nearly unaffected. "Zero" latency here is relative to the quantum of time that HFSC is targeting.

           I have a relatively stable ping to Google, like fractional milliseconds of variation when averaged over 30+ seconds. With HFSC on pfSense, I can be at 100% utilization and not see a difference. The measurements I was taking at the time were made with hrping, which gives the jitter within a single standard deviation. When my upload was at 100%, the jitter was "identical" down to the tenths position (0.1ms). I probably get crazy-good results because I have a 1Gb connection that is rate-limited to 100Mb, which means my NIC can put packets on the line really fast relative to my bandwidth. If I was trying to move 1Gb over my 1Gb link, it probably wouldn't be as stable, but I'm sure it would still be "great".

            Nullity

            @Harvy66:


            If my 600kbit upload is 80% utilized with a 480kbit backlog and then receives a 1500-byte (12kbit) packet, I would have an 800ms delay added to the best-case delay of 20ms (12kbit at 600kbit) without QoS. I do not agree with your statement that "Many hardware QoS implementations on even high end managed switches also use this logic. If the port is below 80% utilization, QoS is disabled."
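            For anyone following along, the numbers in this example check out (my own sketch of the arithmetic, nothing HFSC-specific):

            ```python
            # 480kbit already queued on a 600kbit uplink must drain before the new
            # 1500-byte (12kbit) packet can even start transmitting.
            BACKLOG_KBIT = 480
            LINK_KBIT = 600
            PKT_KBIT = 12

            queue_wait_ms = BACKLOG_KBIT / LINK_KBIT * 1000  # time to drain the backlog
            serialize_ms = PKT_KBIT / LINK_KBIT * 1000       # best-case transmit time

            print(queue_wait_ms)  # 800.0
            print(serialize_ms)   # 20.0
            ```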

            Have you read the HFSC paper(s)?
            HFSC is all about delay improvements over previous queueing algorithms by decoupling bandwidth and delay. It says this in the introduction to the paper. It is not really a point that can be argued, because they have the mathematical proofs and ran simulations to back up their claims.


              Harvy66

              @Nullity:


              "If my 600kbit upload is 80% utilized with a 480kbit backlog" — you mean 100% utilized? A backlog indicates that packets are coming in faster than they're going out, which means your interface is at 100%. And your packets are not actually transferred at 12kb/s; they're transferred at full line rate.

              "Decoupling bandwidth and delay" just means latency is kept stable while bandwidth is honored through advanced scheduling, including noting how large the head packet is in each queue, because dequeuing large packets takes longer than small ones.

              It took me a bit to find a speedtest server where I didn't get my full 100Mb, but I found some in Europe that gave me around 80Mb. My queue sizes were pretty much 0 the entire time, with a few blips into the teens. My upload did manage to reach 100 for a brief moment during the TCP ramp-up phase, then backed off and stabilized with a 0 queue. My point being: if you're below 80% utilization, your queue should be pretty much empty the entire time.

                Nullity

                @Harvy66:


                I meant 12kbits as a size, not a bitrate. I was trying to simplify the bit/byte conversions.
                About the utilization percentage: during a 1-second time-span, if I send 480kbits through a 600kbit connection, is it not 80% utilized? (I'm kinda confused about this myself, but perhaps it is a conversation for another thread.)

                Just for clarity, can you point out what corrections I need to make to my original post?
                I have gotten a bit lost in our long-winded posts.

                Regarding how HFSC defines delay and what "decoupled bandwidth and delay" means;
                Here is an excerpt about "Delay and fairness properties of H-FSC" and "Real-time Guarantees" from an HFSC paper: http://www.ecse.rpi.edu/homepages/koushik/shivkuma-teaching/sp2003/case/CaseStudy/stoica-hfsc-ton00.pdf

                For the rest of the discussion, we consider the arrival time of a packet to be the time when the last bit of the packet has been received, and the departing time to be the time when the last bit of the packet has been transmitted.
                …
                Clearly, H-FSC achieves much lower delays for both audio and video sessions. The reduction in delay with H-FSC is especially significant for the audio session. This is a direct consequence of H-FSC’s ability to decouple delay and bandwidth allocation.

                To achieve decoupled delay and bandwidth you must use a 2-part service curve by setting the m1, d, and m2 parameters. This is not done automatically and cannot be achieved any other way (except via the somewhat deprecated umax/dmax parameters). I do not think "decoupled bandwidth and delay" means what you think it means.
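                A two-piece service curve like this is easy to write down. The following is my own illustration of the m1/d/m2 convention discussed here, not code from any HFSC implementation:

                ```python
                # Cumulative service (kbit) guaranteed by time t_ms under a two-piece
                # (concave) service curve: slope m1 until time d, then slope m2.
                def service_kbit(t_ms, m1_kbit, d_ms, m2_kbit):
                    if t_ms <= d_ms:
                        return m1_kbit * t_ms / 1000
                    return m1_kbit * d_ms / 1000 + m2_kbit * (t_ms - d_ms) / 1000

                # With m1=480, d=26, m2=25 (the NTP example): 12 kbit is served by
                # 25 ms, so a 1500-byte packet meets the 26 ms deadline.
                print(service_kbit(25, 480, 26, 25))  # 12.0
                ```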

                Can you please cite some sources with your posts?
                I would rather focus on understanding the HFSC papers and share personal anecdotes later.


                  Harvy66

                  @Nullity:


                  Backlog and utilization are two separate things. A backlog means packets are enqueuing faster than they're dequeuing, which also means your interface is at 100%. You can have 80% utilization without any backlog/buffering, and in fact that is usually the case. The further you get past 80%, the more common buffering becomes. And 80% doesn't mean a smooth 80%; it's an average, which means time spent above and below 80%.

                  Ahhh.. Seems I misunderstood your 12kbit example.

                  "Decoupling" bandwidth and latency refers to how heavily buffering has been used by naive traffic shapers. Bandwidth and delay (buffering) have gone hand in hand for a long time in many algorithms because coupled designs are much simpler to implement. What is hard is giving a queue a certain amount of bandwidth while maintaining guarantees on latency.

                    Harvy66

                    Realtime can only use up to 80% but gets a delay guarantee, while linkshare can use up to 100% but can have "delay issues". 80% is fine for non-bulk flow types; it is plenty for my games, DNS, ICMP, NTP, etc., and I can give those kinds of traffic guaranteed delays. HTTP and P2P like to have up to 100% but are less latency-sensitive. I'm not talking about large latency, just dequeuing jitter.

                    It is generally "bad" if a queue has realtime bandwidth assigned but does not have enough realtime to satisfy its bandwidth needs. It would typically be undesirable for your VoIP queue to have 1Mb of realtime and 1Mb of linkshare but need 2Mb of bandwidth. I assume the primary effect is that as you go further and further beyond your realtime allocation, jitter will start to approach that of linkshare. But below the realtime limit, all bandwidth should be "delay free".

                    In theory, if your total bandwidth usage is below 80%, there is no real difference between linkshare and realtime. In practice, packets arrive in bursts for one reason or another, so HFSC can smooth out those bursts and keep jitter low. For constant-rate UDP-style traffic there is little that can be done, but with TCP the per-flow PPS will attempt to stabilize.

                    I wonder what the target delay is for HFSC in pfSense, because it could possibly interact badly with Codel, which is also delay-sensitive. This would probably be an issue as you approach link saturation, or above 80% utilization, because HFSC is also "fair".

                    Based on the abstract summaries of HFSC, it sounds like there is no real "target delay" so much as a tweakable quantum of bytes to dequeue per iteration. I think HFSC's "delay" is effectively its quantum. If you assume the quantum is 10,000 bytes, which is a number I have seen thrown around, then your delay will be roughly capped by the time it takes to transfer 10,000 bytes of data. I think this is why they recommend certain minimum connection speeds: a large quantum on a slow connection can mess up the delays. But your quantum can be no smaller than your MTU.
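                    Taking the 10,000-byte figure at face value (a number from this discussion, not an official HFSC constant), the serialization time per quantum at different link rates shows why slow links would suffer:

                    ```python
                    # Time to put one 10,000-byte quantum on the wire at various link rates.
                    QUANTUM_BITS = 10_000 * 8

                    for mbit in (1, 10, 100, 1000):
                        ms = QUANTUM_BITS / (mbit * 1_000_000) * 1000
                        print(f"{mbit:>4} Mb/s: {ms:.2f} ms per quantum")
                    ```

                    At 1Mb/s one quantum alone occupies the link for 80ms, while at 100Mb/s it is under a millisecond.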

                    Another issue that can affect delays is the thread scheduler. If the thread is only woken every 10ms, then the packet scheduler can be no more accurate than that. With pfSense on my current box, I see the CPU timer is 2,000/s, which is 0.5ms.

                      Nullity

                      @Harvy66:


                      When you say things like this it is painfully obvious that you have not read the HFSC papers. The target delay is set by the user with the m1, d, and m2 parameters. If you read more than the paper's abstract you would know this.

                      In the HFSC paper I cited earlier, dequeueing and delay are defined as two separate things and should not be used interchangeably. Dequeueing overhead is measured, in the cited paper, to be below 20 microseconds on outdated hardware; delay is measured and configured in milliseconds. This is another sign that you have not read the HFSC papers.

                      The term "quantum" is not used once in the HFSC paper I cited, so your use of the term is confusing. The paper also says nothing about a minimum connection speed. Perhaps you are confusing HFSC with Codel?

                      The standard kern.hz is 1000 (though NanoBSD uses 100), so changing this setting is unnecessary for most users. You are correct that system clock tick rates matter, but beyond that you are bordering on spreading misinformation.

                      Again, please cite your sources.


                        Harvy66

                        @Nullity:


                        If you actually read about implementations of HFSC, they all talk about a quantum. m1 and d are completely optional if you don't care about burst; you can still benefit from the service curves without those two parameters. The notion of a quantum is extremely widespread in buffer management that works with bytes, with ever-so-slightly different usages of the term but pretty much the same meaning everywhere. For example, fq_codel uses the term "quantum" nearly identically.

                        A quantum in this context is the maximum number of bytes to be dequeued per pass/iteration/whatever. The scheduler looks at the head packet of each queue and combines the current service curves with the size of the head packet to determine which packets get processed this quantum. Once the total number of bytes the quantum represents has been consumed, the scheduler decides whether another quantum must be consumed or it can go to sleep. With HFSC, priority determines the order in which packets are consumed within the quantum, but has no influence on which packets will be consumed.
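                        A much-simplified sketch of what "consuming a quantum" could look like (my own toy illustration; real HFSC picks queues by eligible/virtual times computed from the service curves, not by a simple scan):

                        ```python
                        from collections import deque

                        def dequeue_one_quantum(queues, quantum_bytes):
                            """Send head packets from queues, in order, until the byte budget is spent."""
                            sent, budget = [], quantum_bytes
                            for q in queues:                 # assume queues are pre-sorted by priority
                                while q and q[0] <= budget:  # head packet still fits in the budget
                                    budget -= q[0]
                                    sent.append(q.popleft())
                            return sent

                        # Two queues; the 9000-byte packet doesn't fit in what's left of the quantum.
                        qs = [deque([1500, 1500]), deque([500, 9000])]
                        print(dequeue_one_quantum(qs, 10000))  # [1500, 1500, 500]
                        ```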

                        If you have an older 10ms timer, many quanta may be consumed at once to play catch-up, so packet scheduling tends to be more bursty.

                          Derelict LAYER 8 Netgate

                          Please start another thread.  HFSC Theory or something.  This one is supposed to be for configuration examples.


                            Nullity

                            @Harvy66:

                            If you actually read about implementations of HFSC, they all talk about quantum. "m1, d" are completely optional if you don't care about burst. You can still benefit from the service curves without those two parameters. The notion of a quantum is extremely wide spread for many types of buffer management that work with bytes, all with every so slightly different usages of the term, but pretty much the same no matter what. An example is fq_codel treats the term "quantum" nearly identically.

                            A quantum in this context is the number of maximum bytes to be dequeued for an pass/iteration/whatever. The scheduler will look at the head packet of each queue and combine the current service curves with the size of the head packet to determine which packets will get processed this quantum. Once the total number of bytes the quantum represents has been consumed, the scheduler will decide if another quantum must be consumed or to go to sleep. With HFSC, the priority will determine the order in which the packets are consumed for the quantum, but has no influence on which packets will be consumed.

If you have an older 10 ms timer, many quantums may be consumed in order to play catch-up, so packet scheduling tends to be more bursty.
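The per-quantum dequeue loop described above is essentially deficit round robin, the same mechanism fq_codel's quantum comes from. A minimal Python sketch of that idea (of the quantum bookkeeping only, not of HFSC's service curves):

```python
from collections import deque

def drr_dequeue(queues, quantum):
    """One deficit-round-robin pass.

    queues: list of dicts with 'pkts' (deque of packet sizes in bytes)
    and 'deficit' (byte credit carried over from earlier passes).
    Returns the packet sizes dequeued this pass, in order.
    """
    sent = []
    for q in queues:
        q['deficit'] += quantum          # each queue earns one quantum per pass
        while q['pkts'] and q['pkts'][0] <= q['deficit']:
            size = q['pkts'].popleft()   # head packet fits within the credit
            q['deficit'] -= size
            sent.append(size)
        if not q['pkts']:
            q['deficit'] = 0             # an idle queue does not bank credit
    return sent

queues = [
    {'pkts': deque([1500, 1500]), 'deficit': 0},   # bulk flow, MTU-sized packets
    {'pkts': deque([100, 100, 100]), 'deficit': 0} # small-packet flow (e.g. ACKs)
]
print(drr_dequeue(queues, 1500))  # -> [1500, 100, 100, 100]
```

Note how the small-packet queue drains several packets in the same pass in which the bulk queue sends only one, since the quantum is counted in bytes, not packets.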

                            Please post a link to an HFSC paper that includes references to quantums.
                            Edit: Please PM me the link since our fruitless back and forth is cluttering this thread.

                            @Derelict:

                            Please start another thread.  HFSC Theory or something.  This one is supposed to be for configuration examples.

                            Apologies Derelict. :( I will delete my posts.
                            Could you add the few links from https://forum.pfsense.org/index.php?topic=79589.msg494188#msg494188 and perhaps http://www.ecse.rpi.edu/homepages/koushik/shivkuma-teaching/sp2003/case/CaseStudy/stoica-hfsc-ton00.pdf to the OP?

                            Please correct any obvious misinformation in my posts.
-Not a professional; an arrogant ignoramus.

                            • DerelictD
                              Derelict LAYER 8 Netgate
                              last edited by

                              Please don't delete.  It's good content.  If anything ask a mod to move them to a new thread.


                              • 1
                                1activegeek
                                last edited by

First off, hats off to @Derelict for starting this thread, and a HUGE thank you to @Georgeman for contributing the time and patience to help educate, along with the others contributing. This thread has single-handedly given me the most complete and best view of how to use HFSC for traffic shaping. I've read a lot, but the (as the title suggests) example-driven scenarios help solidify and illustrate what is meant by a whole blurb of words. I liken this to watching Law & Order vs. just reading the jury transcript and interpreting (ignoring the fact, of course, that Law & Order is all fake).

That said, I've got a similar setup to Derelict's from the first post, but I'm currently just focusing on WAN/LAN and will handle DMZ/Guest later. I did not start from scratch, only because I wanted some of the rules from the wizard in place (RDP ports, Xbox ports, etc.). Rather than note them all down and re-create them, I figured I would let the wizard do the work. Since doing so, I've created some additional categories and will be removing others once I figure out the issues illustrated here. I know there may be some nuanced changes in the latest 2.3.x version I'm currently running vs. these threads from over two years ago, so I figure why not continue a very USEFUL thread.

                                I've included a few screenshots to help illustrate a few points, and need a few bits of clarity to help:

1. Why are my queues WAY above the actual link values? My WAN link is ~125/13 (cable variations). As seen in the screenshots, I've got Bandwidth values that measure ABOVE both of those numbers on LAN and WAN. My understanding was that up/down traffic could happen on either link (e.g. an FTP session established from LAN could upload or download files), but in my example I'm seeing numbers even above my 125Mb maximum.
2. How can I identify why I have such a LARGE number of DROPS? Are these all drops from actual queue traffic, or does this also include firewall drops based on bad traffic or rules terminating attempts to hit my network? I'm assuming this only shows drops for the queues; when uploading at about 7Mb up, it starts dropping about 100 packets every refresh cycle while I watch the queues.
3. Assuming the DROPS are correct, why is the Quality monitoring showing NO packet loss? I would assume it should look similar to the trend of drops.

I've also added a screenshot of two speed tests run on the DSLReports speed test to illustrate a side benefit of enabling traffic shaping: bufferbloat resolution! You can see it goes from a C to an A, and the quality even jumps up to an A+. Hoping some of you are still around the boards and can offer some input to help me move along.

pf1.png
pf2.png
pf3.png
pf4.png

                                • H
                                  Harvy66
                                  last edited by

1. You can see in your LAN traffic that all of your data is actually going into qLink; that's at least one reason why you're not actually shaping.
2. Did you actually set an upper limit on your qInternet? Shaping only works well if you tell it how much bandwidth you have.
                                  • G
                                    georgeman
                                    last edited by

It looks like you are not properly tagging the outbound traffic, since most of your download ends up in the default queue. Tagging is best done with floating rules: action match, interface WAN, direction OUT (I neither remember nor care what the wizard does). Remember that processing of floating match rules doesn't stop with the first match, so the LAST matching rule wins. Make sure you catch all relevant traffic with these rules.
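In raw pf terms, floating match rules look something like the sketch below. The interface macro, ports, and queue names are illustrative, not taken from this thread; note that with match rules every rule is evaluated and the last one that fits assigns the final queue:

```
wan_if = "igb0"

# NTP first; a later, broader rule could still override it if it matched
match out on $wan_if proto udp to port 123         queue qHighest
match out on $wan_if proto tcp to port { 80, 443 } queue (qDefault, qACK)
match out on $wan_if proto tcp to port 22          queue (qHigh, qACK)
```

The two-queue form `(qHigh, qACK)` sends small empty ACKs into the second queue, which is why most examples pair every bulk queue with qACK.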

As regards the drops, remember that drops are NOT a bad thing. Dropping packets is the natural way TCP tells the other end that the packet rate needs to be lowered. It is better for it to happen on your router, where you have control of it, than on some upstream ISP router. This is why it is SO important to set the correct upper limits for all this to work, as Harvy66 just said. If you set a higher-than-real upper limit, your pfSense will never drop packets; they will be dropped by the ISP router instead, so you won't actually be shaping anything.

                                    If it ain't broke, you haven't tampered enough with it

                                    • G
                                      georgeman
                                      last edited by

Also, do you have a qDefault queue on your LAN? If you don't, this is (another) flaw in the wizard. When you tag a TCP packet going out of WAN, the return traffic (the actual download) goes into the queue on LAN that has the same name as the one previously tagged on WAN. If no queue with that name exists there, it ends up in the default queue. This seems to be your case.
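A sketch of what this name matching implies, with made-up interface names and speeds: the same queue names must exist under both the WAN and LAN roots for return traffic to land anywhere other than the default queue:

```
altq on igb0 hfsc bandwidth 13Mb  queue { qACK, qHigh, qDefault }
altq on igb1 hfsc bandwidth 125Mb queue { qACK, qHigh, qDefault }
# An upload tagged into qHigh on igb0 (WAN) has its return traffic
# (the download) delivered through the igb1 (LAN) queue of the same
# name; if LAN had no qHigh, the download would fall into LAN's
# default queue instead.
```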


                                      • 1
                                        1activegeek
                                        last edited by

                                        Removing my message, as 1 question is irrelevant, and the other is below in next post with more detail.

                                        • 1
                                          1activegeek
                                          last edited by

Ok, so I've worked on putting together the queues and run into an issue stopping me from getting very far. It seems I can't leave the Bandwidth field blank on the WAN/LAN top-level queues, so I input the 95% values there (10/125). I then attempted putting in the numbers as advised for qInternet (95%, i.e. 10Mb) and qLink (20%), but I can't save and create the qLink queue. I keep getting the message:

                                          "The sum of child bandwidth is higher than parent."

And for clarity and reference, this is the current planned setup. For the time being I've used 5% so I can at least build out the queues:
(All Bandwidth/Linkshare m2 values below are made to be the same, per George's instructions)

                                          WAN - 10Mb (95%)
                                          -qInternet - 95% or 10Mb
                                            -qHighest - 15%
                                            -qACK - 20%
                                            -qHigh - 15%
                                            -qMedium - 20%
                                            -qDefault - 20%
                                            -qLow - 8%
                                            -qLowest - 2%
                                          -qLink (Default) 20%
                                          LAN - 125Mb (95%)
                                          -qInternet - 95% or 125Mb
                                            -qHighest - 15%
                                            -qACK - 20%
                                            -qHigh - 15%
                                            -qMedium - 20%
                                            -qDefault - 20%
                                            -qLow - 8%
                                            -qLowest - 2%
                                          -qLink (Default) 20%

![Screen Shot 2016-06-25 at 10.34.15 PM.png](/public/imported_attachments/1/Screen Shot 2016-06-25 at 10.34.15 PM.png)
![Screen Shot 2016-06-25 at 10.35.09 PM.png](/public/imported_attachments/1/Screen Shot 2016-06-25 at 10.35.09 PM.png)
![Screen Shot 2016-06-25 at 10.40.45 PM.png](/public/imported_attachments/1/Screen Shot 2016-06-25 at 10.40.45 PM.png)

                                          • G
                                            georgeman
                                            last edited by

                                            That's because the child queues are indeed exceeding the parent one!!

Just put 1Gbps, or whatever the physical interface speed is, on the interface queue.
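In pf terms, the fix looks roughly like this (assumed gigabit NIC `igb1`; numbers taken from the plan above): the interface root must be large enough that qInternet plus qLink fit under it, while qInternet's own limit stays at the real WAN speed:

```
# Root at physical link speed: 125Mb + 20% of 1Gb (200Mb) fits easily,
# so the "sum of child bandwidth" check passes.
altq on igb1 hfsc bandwidth 1Gb queue { qInternet, qLink }
queue qLink     bandwidth 20% hfsc(default)
queue qInternet bandwidth 125Mb hfsc(linkshare 125Mb, upperlimit 125Mb) \
                { qACK, qHigh, qDefault, qLow }
```

The upperlimit on qInternet is what actually pins internet traffic to the real line rate; the 1Gb root only has to be believable for the local link.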


                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.