A definitive, example-driven, HFSC Reference Thread
-
I've been quite busy lately, but I would like to keep helping here :)
To keep this simple, always try to make the linkshare values of the child queues sum up to the value of the parent queue. This is because HFSC uses a "subtractive" method for the percentages (I can elaborate on this later).
Could you elaborate on this? Your other information has helped me learn more about this.
A CBQ parent queue assigns a generic "100%" of its bandwidth to be shared by its children. With HFSC, the percentage is an absolute value, not a fraction of the parent.
The best way to understand this is to analyze this example, taken from the book "Building Firewalls with OpenBSD and PF - 2nd Edition" by Jacek Artymiak.
While CBQ uses a 'proportional' method, HFSC uses a 'subtractive' method. To see how it works in practice, compare the following rules, which divide bandwidth in the same way, yet the percentage notation is different:
CBQ
altq on $ext_if cbq bandwidth 20Mb queue { dmznet, prvnet, others }
queue prvnet bandwidth 40% { host1, host2 }   # prvnet gets 8Mb
queue host1 bandwidth 50%                     # host1 gets 4Mb
queue host2 bandwidth 50%                     # host2 gets 4Mb
----
HFSC
altq on $ext_if hfsc bandwidth 20Mb queue { dmznet, prvnet, others }
queue prvnet hfsc(linkshare 40%) { host1, host2 }   # prvnet gets 8Mb
queue host1 hfsc(linkshare 20%)                     # host1 gets 4Mb
queue host2 hfsc(linkshare 20%)                     # host2 gets 4Mb
Basically, the 40% assigned to the parent is divided 50%/50% in CBQ (relative percentages), while it is 20%/20% in HFSC (absolute percentages).
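To make the arithmetic concrete, here is a tiny Python sketch of just the percentages from the example above (my own illustration of the thread's claim, not a model of either scheduler):

```python
# Root bandwidth from the example above: 20Mb.
ROOT = 20_000_000

# CBQ: a child's percentage is relative to its PARENT's share.
cbq_prvnet = ROOT * 0.40        # 40% of the root -> 8Mb
cbq_host1 = cbq_prvnet * 0.50   # 50% of prvnet -> 4Mb

# HFSC (per this thread): linkshare percentages are absolute,
# taken against the root, so the two 20% children together
# account for ("subtract from") the parent's 40%.
hfsc_prvnet = ROOT * 0.40       # 8Mb
hfsc_host1 = ROOT * 0.20        # 20% of the root -> 4Mb

print(int(cbq_host1), int(hfsc_host1))  # both come out to 4000000
```

Both notations land each host at 4Mb; only the reference point of the percentage differs.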
Best regards!
-
I was wondering if anyone here can lend a hand.
I've read this thread line by line - there's a lot of great info here, but I still can't get my QoS to work.
Everything is still on the default queues for each interface. I'm using CBQ, but it's more or less the same setup.
I have 3 WANs and 1 LAN.
The WANs are:
WAN
PIAUS <-- OpenVPN to a Canadian server
PIACA <-- OpenVPN to a US server
(I've attached a screenshot of how the queues are set up.)
I made sure to divvy up the BW properly, set the right queues to default, etc... Easy enough so far...
Then I go to make the floating rules (see attachments).
I want to prioritize entire host machines, not ports. I have dedicated VMs on my network that I give certain tasks to (torrent downloading, usenet downloading, etc... ) so I want to be able to route all traffic from certain hosts to certain queues.
I made aliases (queHigh, queMed, queLow) to put the hosts in. I set up the floating rules as instructed, except for the small difference that I'm using more than one WAN, so I chose all 3 WANs as the Interface (with direction: OUT) in each of these rules. Then I picked the alias as the Source address. Not sure if this is correct, but I tried it in both Source and Destination and neither worked.
So, on to my LAN interface rules (see screen shot).
Here's where I send hosts on my LAN to the specific gateway I want them to leave on, either one of the VPNs or unencrypted on the WAN. Thing is, at one point I had this working... then just the other day I did a full reinstall of pfSense in order to install the 64-bit version.
It was so long ago that I got it up and running that I can't remember what the heck I did!! I really hope someone can lend some advice here; I've been researching all day and still haven't found a solution.
Thanks in advance!!!
-
This thread is about HFSC.
-
Yeah, I get that… but as far as the firewall rules are concerned, aren't they very similar, if not exactly the same?
-
Yes, HFSC vs CBQ vs PRIQ is just the underlying algorithm. The rules and queue defs are mostly the same with some exceptions like bandwidth definitions and allocation.
Impossible to tell what's going on without seeing your rule definitions. Your floating rules look funky: you have aliases that look like queue names as the Source. Here is my Floating Rules page as a simple example.
-
Hey KOM, thanks for the reply and screen shot.
I've done a bit more reading and found this thread: https://forum.pfsense.org/index.php?topic=61106.0
In it, it's stated (regarding floating rules): "Note that NAT has happened before the rules apply so you can't match on a private IP source that has gone through NAT, you have to match on the destination or the translated source."
This is something I've always wondered but could never find an answer to. What order do things happen in pfSense?
LAN traffic -> into pfSense firewall -> LAN interface rules -> NAT -> floating rules -> desired gateway -> out of pfSense firewall -> modem / internet ?
If I had a better understanding of the "signal path", so to speak, I'm sure I could figure this out. In my LAN interface rules I have rules that put certain hosts on my LAN onto certain gateways - either one of the two VPNs or the WAN.
Does this happen before the floating rules are applied? If so, the "source" in my floating rules wouldn't be my private LAN IP addresses like I've done there, because the address would already have been translated to the WAN or VPN interface. Am I getting this right?
So, if I want to send certain LAN hosts to: a) a specific gateway and b) a certain traffic shaping queue… I have to... ?
Make LAN interface rules for every combination of priority queue and gateway, then place the hosts into the alias that would correspond to that rule?
Oh, and regarding my aliases that look like traffic queues - I name them that way (high, med, low) and then place in them the IPs of the hosts whose entire traffic I want shaped. I don't do anything at the port level.
-
So the way it works is (at least to my understanding):
Traffic in ---> WAN interface ---> floating rules parsed ---> traffic hits queues ---> traffic out from LAN ---> hits inverse of floating-rule queues ---> LAN rules parsed ---> traffic out to the Internet.
When a packet comes in from the WAN, floating rules are applied and traffic hits queues, and then any traffic that was matched goes out the inverse queue that was automatically created. Interface rules are processed last, and the first matching rule wins, so the order of the rules on the interface side is critical.
Case in point - I had applied a limiter to say that any traffic from the LAN Subnet with TCP protocol that was not going to the LAN Subnet via a gateway group was subjected to a limiter. Well it seems that Plants Versus Zombies and ESO both use TCP now for gaming traffic so they were hitting this and causing them to not connect or have lag.
To fix it I had to make a LAN interface rule for the ports they were using and place that above the limiter rule and apply it to the traffic queue - qGames .
Who would have thought that a game is using TCP for actual gaming traffic since the majority of the games out there have been using UDP for the longest time.
You want to queue traffic coming in from the WAN before the NAT happens so you apply the floating rules to the WAN interface. You would use LAN rules to send specific traffic out specific gateways.
I would recommend you create multiple gateway groups for this. Something like:
1. ALLGATES - All your WAN gateways
2. HIGATE - Traffic for your high queues that needs the bandwidth
3. LOWGATE - Traffic for your lower queues.
In the ALLGATES group you have all gateways set equally, with failover on packet loss or member down.
In the HIGATE group you have a primary WAN and a secondary WAN with a lower setting, with failover on packet loss or member down.
In the LOWGATE group you would have the same thing, just with different WANs.
In your LAN rules the last rule - the any/any rule - would use ALLGATES. Split your other rules up among the other gateway groups. Make sure you have DNS allocated to each gateway under the System tab, otherwise it will not work.
The goal with the groups is to make it so that if a WAN goes down, everything will still function. Make the groups and then test by unplugging a WAN and seeing what happens. If you don't get the desired results, make some modifications and try again.
The bottom line here is that you need to test, check, test again, and then know how to troubleshoot to resolve the issue.
HFSC is a constant tuning process especially in a LAN party setting where you are dealing with large packet amounts and periods of high demand and then low demand. I regularly adjust bandwidth amounts several times during the event to provide the maximum amount of bandwidth to tourney games when needed.
-
Thanks for the reply.
I'm not sure the WAN groups are necessary as I only have one physical WAN. My other two are VPNs that go out over that WAN.
Currently, I'm able to successfully send traffic out over whichever "WAN" I want by using LAN interface rules and applying the alias (containing hosts) to a rule that selects the gateway I want them to leave on.
That part is working fine; it's getting the same traffic into the right queue that I'm struggling with. I think my issue was that I followed this guide (earlier posts), where it said to place the traffic into queues using floating rules set up as:
interface = WAN
direction = OUT
I then placed the LAN IP addresses that I wanted put in a certain queue into those same floating rules as the "source", except I didn't realize that NAT had already happened, effectively changing the "source" IP address from the private LAN address to whichever gateway that host was put on in the LAN rules… I think...
So I feel like I need to place the traffic into the queues BEFORE NAT happens. Which I assume would be in LAN rules.
I'm going to test this when I get home from work later today.
When I drew my little "signal path" in my previous post, I was starting from the point of view of a host on my LAN.
Are you looking at it the other way around?
-
Yes, you always look at it from the point of view of traffic coming in from the WAN, as that is how the shaping is designed to work before NAT happens. The only way to shape like that is to use floating rules with Interface set to WAN; I use direction "any" on my rules.
-
So I ran into a problem tonight. I wanted to take a specific OPT2 device, 192.168.225.65, and place it in a "Penalty Box" for egress to the WAN. I created an alias Penaltybox containing Host 192.168.225.65 and created a floating rule on WAN out placing anything sourced from that alias into qPenaltyBox.
There is also a pass any any any rule on OPT2 that assigns no queues.
No traffic was ever placed in qPenaltyBox. States cleared several times. 0 packets put into qPenaltyBox ever.
Does the Pass rule on OPT2 create the state with no queue assigned before the floating rule has a chance to assign the queue?
I did manage to get this traffic into qPenaltyBox by creating a rule on OPT2 that passed traffic from the Penaltybox host and marked it with "PB". I then created a floating rule on WAN out putting all traffic from any to any and marked with PB into qPenaltyBox.
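That workaround can be sketched abstractly like this (a toy model of pf's `tag`/`tagged` pair, not pf internals; the post-NAT address below is a made-up example):

```python
# Toy model of the tag-then-match workaround.
# The OPT2 interface rule stamps matching packets with a mark ("PB");
# the floating rule on WAN out matches the mark instead of the source
# address, which NAT may already have rewritten by that point.

def opt2_rule(pkt):
    # Interface rule: pass traffic from the penalty host, mark it "PB".
    if pkt["src"] == "192.168.225.65":
        pkt["tag"] = "PB"

def wan_out_rule(pkt):
    # Floating rule on WAN out: classify on the mark, not the source.
    return "qPenaltyBox" if pkt.get("tag") == "PB" else "qDefault"

pkt = {"src": "192.168.225.65"}
opt2_rule(pkt)                # mark applied on the internal interface
pkt["src"] = "203.0.113.10"   # hypothetical post-NAT source address
print(wan_out_rule(pkt))      # qPenaltyBox
```

The mark travels with the packet inside pf even after the source address changes, which is why matching on it works where matching on the private source did not.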
-
In the stickies for traffic shaping, there is a note from Ermal (perhaps collecting several postings together, from the look of it and the thread it links to) that makes reference to needing to kill the any/any rule. That effectively implies that we need to nail down getting everything else classified and sorted. There are also the confusing (to me, thus far) bits about order sensitivity in firewall rule processing, and that being different for interface rules vs. floating rules, and… in short, I often find things don't do what I think they should do there. I read, I think I get it, I try things, they don't work as expected based on my reading, lather, rinse, repeat. Perhaps if I had a month and nothing else to do...
Ermal wrote:
Now back to why you need to disable the anti-lockout rule and the default LAN rule.
The pf packet filter is stateful and if it registers a state about a stream of traffic it will not check the ruleset again.
On this packet filter that is used in pfSense traffic is assigned to a queue by specifying it explicitly with the rule that matches the traffic/ the rule that creates the state.
The default anti-lockout rule is the same as the default LAN rule, just created automatically for the user to prevent him from doing stupid things.
But this rule is too generic: it matches all the traffic passing from the LAN, and nothing else in the ruleset gets executed. As such it sends all the traffic to the default queue, which is not what the user wants with a QoS policy on.
The same applies to the default LAN rule pfSense ships with. Since you now have to explicitly choose the queue the traffic has to go to when creating a rule, there is no easy solution to this other than disabling these settings and having more fine-tuned rules for classifying traffic to the proper queue.
-
Maybe someone can expand on WHY this is, I just know that for the order of rules processing, it follows that:
WAN and LAN rules are applied to the first matching condition working its way down from top to bottom.
Floating rules apply to the LAST matching rule from top to bottom.
Hope this helps.
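A toy sketch pulling these points together (my own illustration, not pf internals): the queue is fixed by whichever rule creates the state, interface rules use first-match semantics, and floating rules without `quick` use last-match semantics:

```python
def first_match(rules, pkt):
    # Interface-tab semantics: implicit "quick" - the FIRST match wins.
    for cond, queue in rules:
        if cond(pkt):
            return queue
    return None

def last_match(rules, pkt):
    # Floating-tab default (no "quick"): the LAST matching rule wins.
    winner = None
    for cond, queue in rules:
        if cond(pkt):
            winner = queue
    return winner

states = {}  # conn-id -> queue, fixed when the state is created

def classify(conn, rules, evaluate=first_match):
    # pf is stateful: only the state-creating packet sees the ruleset.
    if conn not in states:
        states[conn] = evaluate(rules, conn)
    return states[conn]

rules = [
    (lambda c: True, "qDefault"),     # generic catch-all on top
    (lambda c: c == "ntp", "qHigh"),  # more specific rule below it
]
print(last_match(rules, "ntp"))   # qHigh: floating-style evaluation
print(classify("ntp", rules))     # qDefault: first match won...
print(classify("ntp", rules, last_match))  # ...and the state sticks
```

This is also why the too-generic any/any rule mentioned above starves every rule below it: it wins the first match, and the state it creates keeps its queue for the life of the connection.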
-
I did manage to get this traffic into qPenaltyBox by creating a rule on OPT2 that passed traffic from the Penaltybox host and marked it with "PB". I then created a floating rule on WAN out putting all traffic from any to any and marked with PB into qPenaltyBox.
Can you expand on the marking functionality? I have some ideas on how this would be useful to me but not 100% sure on how to implement it.
I want to mark certain packets in the LAN rules, then find those marked packets in the outgoing WAN rules (floating rules) to put them on a certain gateway. I posted a thread about it here: https://forum.pfsense.org/index.php?topic=83972.msg460314#msg460314
Thank you!
-
Not sure what you want. It's pretty simple. In the advanced section of a firewall rule you can either mark a packet or match based on a previous mark.
-
Do these marks stay on the packets for upstream pfSense instances to read?
(I know this question is a little (a lot) off the thread topic)… sorry.
-
I highly doubt it. I don't know where they'd put them in the frame/packet.
Verified
-
"Tags are internal identifiers. Tags are not sent out over the wire."
http://www.openbsd.org/faq/pf/tagging.html
You should be able to match on a pf internal tag and set a DSCP tag instead. That should survive the trip if your gear is set to trust them.
-
Edit: I created a thread dedicated to understanding HFSC, focusing on its "decoupled bandwidth and delay" capabilities. That thread will (hopefully) have a better revision of this post.
https://forum.pfsense.org/index.php?topic=89367.0
---- Original post ----
Great thread! :)
Though, I see nothing about the primary reason for HFSC's adoption: the separation of bandwidth and delay allocation. Without employing this feature, HFSC is only a slight improvement over previous class-based hierarchical link-sharing algorithms.
Only use real-time when you require it. It is "unfair", unlike link-share. Please read the documentation for more details. Generally, only NTP and VoIP should be employing real-time queueing.
The wording used in the HFSC paper is "decoupled delay and bandwidth allocation". Usually, you can only allocate bandwidth and delay together. Let's say you allocate 25kbit to NTP (average speed). Sadly, this means that in a worst-case scenario a 1500-byte (12kbit) NTP packet may take approximately 500ms to completely transmit upon receipt. This 500ms delay is unacceptable for NTP.
HFSC allows you to allocate not only an average bitrate of 25kbit but also an initial bandwidth, or "burst" as it is sometimes called. To improve the delay of NTP we will set m1 to 480kbit (80% of my 600kbit upper limit for upload, which I think is the max for real-time allocation), which means a 1500-byte (12kbit) packet would send in 25ms; so set d to 26ms (give d a little room to relax, so I add 1ms to 25ms), then set m2 to 25kbit. Now NTP packets are guaranteed to send within 26ms. Hell of an improvement over 500ms, and I still have only 25kbit allocated of my 600kbit upload. NTP packets are now allocated the delay of a 480kbit connection but the bandwidth of a 25kbit connection. This is delay and bandwidth decoupled.
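Those numbers can be sanity-checked with plain arithmetic (nothing HFSC-specific here):

```python
PACKET_BITS = 1500 * 8  # one full-size packet: 12,000 bits (12kbit)

# Worst-case time to serialize one such packet at the steady
# m2 rate of 25kbit:
delay_m2 = PACKET_BITS / 25_000   # 0.48s, the "approximately 500ms"

# The same packet at the m1 burst rate of 480kbit:
delay_m1 = PACKET_BITS / 480_000  # 0.025s = 25ms, hence d = 26ms

print(delay_m2, delay_m1)  # 0.48 0.025
```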
Delay is measured as the time between the last bit being received and the last bit being transmitted.
Here are some links that helped me.
http://man7.org/linux/man-pages/man7/tc-hfsc.7.html
http://www.sonycsl.co.jp/person/kjc/software/TIPS.txt
http://serverfault.com/questions/105014/does-anyone-really-understand-how-hfsc-scheduling-in-linux-bsd-works
http://linux-ip.net/articles/hfsc.en/
and any texts I could find from the HFSC authors. The papers include lots of non-academic-level information, so do not be afraid to read them.
Please post any questions or corrections. :)
-
HFSC does not affect how quickly a packet will be serialized, but will affect when a packet will be sent. An issue you can get on low bandwidth connections as your example is that a 1500MTU is quite large relative to the time slices the scheduler is targeting.
fq_codel mentions this same issue: when determining which bucket to dequeue from next, connections below 10Mb need to increase their target latency to accommodate large packets. 100Mb+ is optimal for 1500-byte MTUs and the default 5ms target.
Realtime is useful for any traffic you feel should have crazy low jitter, but should not use linkshare at all. If your traffic makes use of link share at all, then realtime is not a good fit, just use linkshare. This kind of goes with link utilization info that many upstream providers have been talking about over the years. A link is considered 100% at 80%. This is because prior to 80% utilization, the buffers are primarily empty. As you get past 80%, buffers start to grow. Many hardware QoS implementations on even high end managed switches also use this logic. If the port is below 80% utilization, QoS is disabled.
This same idea applies to realtime in HFSC. If you are at or below 80%, there should be roughly "0" dequeue latency, even if the connection is at 100%. The remaining 20% that is above the 80% is your link share and is subject to increasing amounts of jitter as you approach 100%, but realtime should be nearly unaffected. "Zero" latency is relative to the quantum of time that HFSC is targeting.
I have a relatively stable ping to Google, like fractional milliseconds of variation when averaged over 30+ seconds. With HFSC on pfSense, I can be at 100% utilization and not see a difference. The measurements I was taking at the time were made with hrping, which gives the jitter within a single standard deviation. When my upload was at 100%, the jitter was "identical" to the tenths position (0.1ms). I probably get crazy good results because I have a 1Gb connection that is rate-limited to 100Mb. This means my NIC can put packets on the line really fast relative to my bandwidth. If I was trying to move 1Gb over my 1Gb link, it probably wouldn't be as stable, but I'm sure it would still be "great".
-
If my 600kbit upload were 80% utilized with a 480kbit backlog and then received a 1500-byte (12kbit) packet, I would have an 800ms delay added to the best-case delay of 20ms (12kbit @ 600kbit) without QoS. I do not agree with your statement that "Many hardware QoS implementations on even high end managed switches also use this logic. If the port is below 80% utilization, QoS is disabled."
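For reference, the arithmetic behind those figures (assuming the queued backlog drains at the full 600kbit line rate; plain arithmetic, not an HFSC model):

```python
LINE_RATE = 600_000      # 600kbit upload
BACKLOG_BITS = 480_000   # 480kbit already queued (80% of one second's worth)
PACKET_BITS = 1500 * 8   # the newly arrived 12kbit packet

queue_delay = BACKLOG_BITS / LINE_RATE  # 0.8s waiting behind the backlog
serialize = PACKET_BITS / LINE_RATE     # 0.02s best case for the packet itself
total = queue_delay + serialize         # ~0.82s worst case without QoS
```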
Have you read the HFSC paper(s)?
HFSC is all about delay improvements over previous queueing algorithms by decoupling bandwidth and delay. It says this in the introduction to the paper. It is not really a point that can be argued with, because they have the mathematical proofs and ran simulations to back up their claims.
-
"If my 600kbit upload is 80% utilized with a 480kbit backlog" - you mean 100% utilized? A backlog indicates that packets are coming in faster than they're going out, which means your interface is at 100%. And your packets are not actually transferred at 12kbit/s; they're transferred at full line rate.
"Decoupling bandwidth and delay" just means latency is kept stable while bandwidth is honored through advanced scheduling, including noting how large the head packet is in each queue, because dequeuing large packets takes longer than dequeuing smaller ones.
It took me a bit to find a speedtest server that I didn't get my full 100Mb to, but I found some in Europe that gave me around 80Mb. My queue sizes were pretty much 0 the entire time, with a few blips into the teens. My upload did manage to reach 100 for a brief moment during the TCP ramp-up phase, which then backed off and stabilized with a 0 queue. My point being that if you're below 80% utilization, your queue should be pretty much empty the entire time.