Traffic Shaping using HFSC



  • After experimenting with different types of traffic shaping under pfSense, I've found the wizard always follows the same template (using the selected scheduler). That isn't necessarily a problem; however, the WAN and LAN queues follow slightly different structures, which can make it tricky to understand precisely how the queues work, and it doesn't always achieve the best results.

    Rather than using the wizard, we've configured HFSC manually, which has given us far better results. We can see that once the links become saturated, packets start dropping on the relevant queue, with the assigned bandwidths/upper limits and real-time bandwidth being respected. That still doesn't mean our traffic shape is perfect, though.

    I thought I would share my experience of HFSC and see if anyone can confirm whether this is the best approach, or offer any suggestions. Perhaps we could then make an article available to help other users manually configure HFSC on pfSense?

    Building the Traffic Shape

    1. Set the bandwidth on the WAN and LAN interfaces to roughly 95% of the available bandwidth for that link (e.g. 95 Mbit/s on a 100 Mbit/s link), due to inefficiencies in ALTQ.
    2. Create a new parent queue 'qInternet' for WAN and LAN
    • Enable 'Explicit Congestion Notification'
    • Bandwidth: as configured on the parent (WAN or LAN) interface in step 1
    • Upper Limit m2 (this is the normal rate): as configured on the parent (WAN or LAN) interface in step 1
    • Link Share (this is bandwidth available to other queues): set the same as the upper limit
    3. Create a new queue 'qACK' under 'qInternet' for WAN and LAN
    • Enable 'Explicit Congestion Notification'
    • Bandwidth: 15%
    • Link Share m2: 15%
    4. Create a new queue 'qDefault' under 'qInternet' for WAN and LAN
    • Queue Limit: 500. This increases the maximum queue/buffer size, which will help guarantee packet delivery; only add this if you will have a separate queue for real-time traffic.
    • Default Queue
    • Enable 'Explicit Congestion Notification'
    • Bandwidth: 10%. This sets the bandwidth for this queue; don't think of it as a limit, since during congestion bandwidth will be borrowed automatically from other queues using link share.
    5. Create a new queue under 'qInternet' for WAN and LAN for each traffic type you would like to segregate
    • Queue Limit shouldn't be changed for real-time traffic; it can be safely increased to 500 for non-real-time traffic.
    • Enable 'Explicit Congestion Notification'
    • Bandwidth should be set to the rate you'd like to shape this queue to. Keep in mind that during congestion, bandwidth will be borrowed automatically from other queues using link share if additional bandwidth is available; if other queues also need bandwidth, the shape is used to ensure each queue gets its fair share. For low-priority traffic such as P2P, use a low value such as 1% to 5%; for high-priority traffic, reserve enough bandwidth that those services won't be affected during congestion (once you have a rough idea of how much traffic those services need).
    • Use 'Upper Limit m2' to specify the maximum bandwidth for this queue and 'Link Share m2' to specify bandwidth which can be shared with other queues.
    6. If you have real-time traffic such as VoIP, create a new queue 'qVoIP' under 'qInternet' for WAN and LAN
    • Enable 'Explicit Congestion Notification'
    • Bandwidth should be set to the amount of real-time traffic you would like to reserve/guarantee (e.g. 4 Mbit/s); other services won't be able to use/borrow this bandwidth.
    • Real Time m2: configure this to the same value as your bandwidth (e.g. 4 Mbit/s)
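
    For reference, the hierarchy built in the steps above can be sketched in raw pf.conf/ALTQ syntax. This is only an illustration of the shape pfSense generates for you from the GUI: the interface name em0 and the example qP2P/qVoIP queues and numbers are placeholders, not a configuration to paste in.

    ```
    # Parent: 95% of a 100 Mbit/s link, capped and shared at the same rate
    altq on em0 hfsc bandwidth 95Mb queue { qInternet }
    queue qInternet bandwidth 95Mb hfsc(ecn, linkshare 95Mb, upperlimit 95Mb) { qACK, qDefault, qP2P, qVoIP }
    queue qACK     bandwidth 15% hfsc(ecn, linkshare 15%)
    queue qDefault bandwidth 10% qlimit 500 hfsc(ecn, default)
    queue qP2P     bandwidth 5%  qlimit 500 hfsc(ecn, linkshare 5%, upperlimit 30%)
    queue qVoIP    bandwidth 4Mb hfsc(ecn, realtime 4Mb)
    ```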

    Building the inbound Firewall Rules

    1. Add an inbound 'Floating' rule for each of your queues (except the default) as created in 'Building the Traffic Shape' steps 5 and 6.
    2. Set the protocols (if in doubt, select TCP/UDP).
    3. Decide whether you want to identify traffic for this queue based on the source IPs/ports (where it's coming from) or the destination IPs/ports (where it's going to). If you have a range of IPs and ports, configure Aliases and use those in the firewall rule.
    4. For non-real-time traffic, set the 'Ackqueue' to 'qACK' and the queue to the one associated with this rule; for real-time traffic, don't specify an 'Ackqueue', just set the queue to 'qVoIP'.

    Building the outbound Firewall Rules

    1. Add an outbound 'Floating' rule for each of your queues (except the default) as created in 'Building the Traffic Shape' steps 5 and 6.
    2. Set the protocols (if in doubt, select TCP/UDP).
    3. Decide whether you want to identify traffic for this queue based on the source IPs/ports (where it's coming from) or the destination IPs/ports (where it's going to). If you have a range of IPs and ports, configure Aliases and use those in the firewall rule.
    4. For non-real-time traffic, set the 'Ackqueue' to 'qACK' and the queue to the one associated with this rule; for real-time traffic, don't specify an 'Ackqueue', just set the queue to 'qVoIP'.
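
    In raw pf terms, the floating rules described above boil down to match rules of roughly this shape (a sketch only; the $wan macro, the VoIP_Hosts alias and the qHTTPS queue are hypothetical names, and pfSense builds the real rules from the GUI):

    ```
    # Real-time traffic: queue only, no Ackqueue
    match in  on $wan proto udp from <VoIP_Hosts> queue qVoIP
    match out on $wan proto udp to   <VoIP_Hosts> queue qVoIP
    # Non-real-time traffic: main queue plus qACK for empty ACKs
    match in  on $wan proto tcp from port 443 queue (qHTTPS, qACK)
    match out on $wan proto tcp to   port 443 queue (qHTTPS, qACK)
    ```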

    Finally, create some constant pings to various destinations to check latency, and monitor your queues. Then generate some artificial traffic to saturate your links, watch where packets are being dropped, and tweak the above bandwidths/upper limits and real-time bandwidth on your queues as required.



  • HFSC's m1 & d params don't work like that. It's not a "burst". It allows you to define the per-packet latency & averaged bandwidth separately. To properly use it you need to do some calculations based on the particular queue's packet size and allocated bandwidth. See this thread: https://forum.pfsense.org/index.php?topic=89367.0
    Edit: Most don't need to use m1 & d, so I'd just stick to m2. HFSC is complex enough without trying to use the "decoupling of bandwidth & delay" feature those params offer. Additionally, unless you're using m1 & d, I'd also avoid using real-time and use only link-share & upper-limit. Keep it simple(r). :)
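
    To make the "latency, not burst" point concrete, the calculation that thread describes is back-of-envelope arithmetic like the following (illustrative numbers, nothing pfSense-specific): m1 must be high enough to serialize one packet of the queue's typical size within the window d, after which the curve falls back to the average rate m2.

    ```python
    # Minimum m1 (bits/s) so that one packet of the given size is fully
    # transmitted within the target delay d -- the "decoupling of
    # bandwidth & delay" that m1 & d provide.
    def m1_for_latency(packet_bytes: int, target_delay_s: float) -> float:
        return packet_bytes * 8 / target_delay_s

    # Example: bounding a full-size 1500-byte packet's delay to 5 ms
    # requires m1 >= 2.4 Mbit/s, regardless of the queue's average rate m2.
    print(m1_for_latency(1500, 0.005))  # 2400000.0
    ```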

    Also, you can forgo qlimit tuning by enabling Codel on latency-sensitive queues. The qlimit is kinda inaccurate, since it only defines the number of packets but not the size of the packets. Codel tunes queue size more intelligently. I'd only use it on TCP though.



  • You're right, we won't need m1 and d. I've just read another article that mentions m1 and d in HFSC, http://linux-tc-notes.sourceforge.net/tc/doc/sch_hfsc.txt. It looks like the burst rate m1 is only used when the link becomes backlogged, and only for the time defined by d. Effectively, all this really does is determine which packets get sent first when two queues are ready to send packets at the same time. In most cases this probably isn't needed.

    The real-time option is the only way to ensure that packets are 'guaranteed'. The way I understand it, the real-time queue always gets picked up first, which is exactly what we want: we don't want to shape or queue this traffic, as that can impact calls. Call signalling (SIP) and quality (RTP) are of the highest importance. There is a similar feature in Cisco IOS, 'Low Latency Queuing'.
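
    To pick a sensible reservation, you can estimate the per-call bandwidth and divide. A rough sketch, assuming G.711 at 20 ms packetization with 40 bytes of RTP/UDP/IP headers (link-layer overhead is ignored here, so pad the result in practice):

    ```python
    # Per-call IP-layer bandwidth for G.711: 160-byte payload plus 40-byte
    # RTP/UDP/IP headers, 50 packets per second in each direction.
    def g711_call_bps(payload_bytes: int = 160, header_bytes: int = 40,
                      packets_per_s: int = 50) -> int:
        return (payload_bytes + header_bytes) * 8 * packets_per_s

    reservation_bps = 4_000_000            # the 4 Mbit/s real-time example
    per_call = g711_call_bps()             # 80000 bits/s per direction
    print(reservation_bps // per_call)     # 50 concurrent calls per direction
    ```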



  • @shaunjstokes:

    You're right, we won't need m1 and d. I've just read another article that mentions m1 and d in HFSC, http://linux-tc-notes.sourceforge.net/tc/doc/sch_hfsc.txt. It looks like the burst rate m1 is only used when the link becomes backlogged, and only for the time defined by d. Effectively, all this really does is determine which packets get sent first when two queues are ready to send packets at the same time. In most cases this probably isn't needed.

    m1 & d literally only allow the "decoupling of bandwidth & delay". Here's a good quote from the text you linked:
      The term "burst" hides more than it reveals.  Yes, it specifies
      bursts, but it is not used to control bursts of data.  It is used
      to control latency.  This is not some minor quibble about naming
      conventions.  Wrapping your head around this is fundamental to
      understanding HFSC.

    The real-time option is the only way to ensure that packets are 'guaranteed'. The way I understand it, the real-time queue always gets picked up first, which is exactly what we want: we don't want to shape or queue this traffic, as that can impact calls. Call signalling (SIP) and quality (RTP) are of the highest importance. There is a similar feature in Cisco IOS, 'Low Latency Queuing'.

    Real-time does have priority over link-share, but another important feature is that the numbers you give it are absolute (and therefore predictable, a necessity when using m1 & d) unlike link-share whose values are dynamic.

    An interesting note about HFSC is that the original implementation had only an "SC" parameter which set both link-share & real-time simultaneously to the same value. Later, people split it apart (not sure why…) and added upper-limit.



  • Thanks for this clear guide and explanation of setting up traffic shaping.  I do have a few questions.

    • As you have defined the use of floating rules, does this approach eliminate the need for queue assignment on the individual VLAN interfaces?  Especially when each VLAN is dedicated to the traffic of one queue?

    • Related to a multi-WAN environment where load balancing is being used, what needs to be considered beyond the differences in bandwidth of each link?  For example, we have three: one 40 Mb and two 10 Mb.

    • In your floating rules, are they match or pass?

    • Again, in the floating rules, what interfaces are selected for in and out?  Is it correct to assume WANs for in and LANs for out?



    • As you have defined the use of floating rules, does this approach eliminate the need for queue assignment on the individual VLAN interfaces?  Especially when each VLAN is dedicated to the traffic of one queue?
      I believe that's correct, although I have yet to test queues with VLANs myself. I would recommend generating some traffic for each of your queues and monitoring their status to ensure traffic is being sent to the correct queue and that they are working as expected.

    • Related to a multi-WAN environment where load balancing is being used, what needs to be considered beyond the differences in bandwidth of each link?  For example, we have three: one 40 Mb and two 10 Mb.
      I'm not familiar with multi-WAN environments on pfSense, but I would expect there to be a separate queue for each interface, as you said, with its individual bandwidth. I would follow the same naming convention for your queues and floating rules, then again just generate some traffic for each of your queues and monitor their status to ensure traffic is being sent to the correct queue and that they work as expected.

    • In your floating rules, are they match or pass?
      Match

    • Again, in the floating rules, what interfaces are selected for in and out?  Is it correct to assume WANs for in and LANs for out?
      You shouldn't need to select an interface, but hypothetically I believe it will match the traffic as it comes into an interface, add it to the correct queue, then transmit on the outbound interface.



  • @shaunjstokes:

    After experimenting with different types of traffic shaping under pfSense, I've found the wizard always follows the same template (using the selected scheduler). That isn't necessarily a problem; however, the WAN and LAN queues follow slightly different structures, which can make it tricky to understand precisely how the queues work, and it doesn't always achieve the best results.

    So,

    WAN
        qInternet
          qACK
          qDefault

    LAN
        qInternet
          qACK
          qDefault

    versus

    WAN
        qInternet
          qACK
          qDefault

    LAN
        qLink
        qInternet
          qACK
    I am curious about this. Why does the template use different hierarchies for positioning of the default queue? Are there advantages to this organization? Disadvantages?

    Thanks

    [Edit: fixed indentation issue]



  • I was really hoping someone would explain this.



  • I think qInternet and qLink are only needed if you have multiple LANs.