HFSC explained - decoupled bandwidth and delay - Q&A - Ask anything
-
I recently moved back to HFSC from CBQ and so far it works well; in my case, Nullity's theory holds true. I deal in VoIP only, but the difference is that where I live the ISP blocks VoIP, so we set up a UDP OpenVPN tunnel and route only VoIP calls through it. After doing packet captures on pfSense I realized the VoIP audio packets were 279 bytes instead of 160 bytes, given the OpenVPN overhead. Based on this I set my VoIP queue as Nullity described in his VoIP post, and my call jitter went down by a few ms on a saturated line (my line speed is 12Mb down and 3Mb up, though it goes to 12.5/3.5).
What I wanted to know is: if we use these realtime values, do we still need to use link-share? In all the discussion above I got confused about whether that still holds true in a saturated situation, or whether realtime alone is enough.
-
In the original HFSC implementation, there was no separate real-time & link-share (or upper-limit) parameter. There was only a single "service curve" parameter that simultaneously set both real-time & link-share to the same values.
So, set them both to the same values (unless you have a good reason to do otherwise). Simultaneously using link-share & real-time should not hurt, since real-time has priority anyway.
Also, remember that most pfSense queues already have link-share's m2 param set to the queue's "Bandwidth" value automatically…
-
OK, thanks. So shall I set the bandwidth, link-share and realtime to the same values, or can bandwidth be set higher than realtime? If we use link-share, then that overrides bandwidth, right?
The other thing I noticed: suppose the line speed is 12.5Mbps down and we set the root queue to 12Mbps. The effective speed I get is around 11Mbps, or at most 11.5Mbps; it goes nowhere near the 12Mbps mark set on the root queue. If I change to CBQ, it very accurately reaches the full 12Mbps. (I tested this with just one LAN client connected, maxing out the line with a speed test as well as with torrents.)
The other issue is that pfSense doesn't allow setting a root queue bandwidth in decimals, so if the line is 12.5Mbps I can't set 12.25Mbps; I have to convert to Kbps and use that instead.
-
I wouldn't use realtime; it's non-intuitive and very easy to use wrongly. Link-share is good enough.
-
Well, I don't understand what's wrong with using it. All it does is give that queue a minimum value all the time, compared to link-share, which gives the queue its set bandwidth only when it's backlogged. In the case of VoIP you need real-time bandwidth available as soon as the call starts, so the signaling isn't delayed, which would in turn increase call setup and RTP time. Even if there is no traffic for the queue that has real-time set, that bandwidth isn't lost and is still available for other queues.
In VoIP we at least know the packet size and the interval it's sent at, so I guess it's straightforward to set a proper value for real-time.
-
Real-time's values are absolute (and limited to ~80% of total bandwidth). This is important when you are precisely allocating known traffic types like VOIP.
Link-share's values are not absolute, they are proportional (and can use 100% of total bandwidth, unlike real-time). This means accurate, completely predictable allocation is not possible with link-share. Link-share, as the name suggests, is meant only to share bandwidth. See this post for a clear example of link-share's proportional nature. (Pay attention to the first test, where my WAN bandwidth was correctly configured to 640Kbit.)
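That proportional behaviour can be sketched as follows. This is a hypothetical illustration, not pf's actual code; the queue names and m2 values are made up, with the 640Kbit total matching the test mentioned above:

```python
# Hypothetical illustration: link-share splits the available bandwidth
# among *backlogged* queues in proportion to each queue's m2 value,
# rather than guaranteeing an absolute rate the way real-time does.

def linkshare_split(total_kbit, m2_kbit, backlogged):
    """Divide total_kbit among the backlogged queues, proportional
    to their configured m2 (link-share) values."""
    active = {q: m2_kbit[q] for q in backlogged}
    weight_sum = sum(active.values())
    return {q: total_kbit * w / weight_sum for q, w in active.items()}

m2 = {"qVoIP": 448, "qDefault": 448}  # equal m2 values (assumed numbers)

# Both queues backlogged on a 640Kbit link: each gets half, not its m2.
print(linkshare_split(640, m2, ["qVoIP", "qDefault"]))
# Only one queue backlogged: it gets the whole link.
print(linkshare_split(640, m2, ["qVoIP"]))
```

Note how the same m2 yields different actual rates depending on which siblings are active; that is why link-share alone cannot give a completely predictable allocation.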
I would use real-time to improve/guarantee latency.
I would use link-share to control bandwidth or to make latency worse. For example, you could allocate 5Mbit (m2) but also set m1=0Kbit & d=500, which would mean that traffic is allocated 5Mbit but packets can be delayed for up to 500ms, allowing other traffic to have better latency. The traffic will not be delayed unless there is other traffic.
-
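The m1=0Kbit, d=500, m2=5Mbit example above can be sketched as a two-piece service curve: m1 applies for the first d milliseconds of a backlog period, m2 afterwards. This is a simplified model, not the actual ALTQ implementation:

```python
# Sketch of a two-piece HFSC service curve: m1 applies for the first
# d milliseconds of a backlog period, m2 afterwards.

def service_kbit(t_ms, m1_kbit, d_ms, m2_kbit):
    """Cumulative service (in kilobits) promised by time t_ms."""
    if t_ms <= d_ms:
        return m1_kbit * t_ms / 1000.0
    return m1_kbit * d_ms / 1000.0 + m2_kbit * (t_ms - d_ms) / 1000.0

# m1=0, d=500ms: no service is promised during the first 500ms...
print(service_kbit(500, 0, 500, 5000))   # 0.0 kilobits
# ...then service accrues at the full m2 rate of 5Mbit/s.
print(service_kbit(1500, 0, 500, 5000))  # 5000.0 kilobits after 1s at m2
```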
OK, I understand realtime now. Basically it can be used in cases where you know the type of traffic and its bitrate, so you can reduce the latency on it.
One thing I didn't understand regarding the other link you gave me: do we set link-share on the root queue, or just the bandwidth?
Below is my current setup. Can you tell me if it's fine, or simply point out what's wrong?
WAN - bandwidth 100Mb
-qInternet - bandwidth 3000Kb - UL 3000Kb - LS 3000Kb - codel
--qACK - bandwidth 30% - LS 30% - codel - priority 6
--qOthersDefault - bandwidth 10% - LS 10% - codel - priority 4
--qP2P - bandwidth 5% - LS 5% - codel - priority 1 - default
--qVoIP - bandwidth 45% - RT 447Kb/5/224Kb - LS 45% - codel - priority 7
--qOthersHigh - bandwidth 10% - LS 10% - codel - priority 5
LAN - bandwidth 100Mb
-qInternet - bandwidth 12000Kb - UL 12000Kb - LS 12000Kb - codel
--qACK - bandwidth 15% - LS 15% - codel - priority 6
--qOthersDefault - bandwidth 50% - LS 50% - codel - priority 4
--qP2P - bandwidth 5% - LS 5% - codel - priority 1 - default
--qVoIP - bandwidth 20% - RT 447Kb/5/224Kb - LS 20% - codel - priority 7
--qOthersHigh - bandwidth 10% - LS 10% - codel - priority 5
-
- Realtime always comes from the root queue, and can be greater than the parent queue's available bandwidth and even its upper limit, allowing you to starve your parent queue.
- Realtime counts bandwidth consumed above your realtime allocation negatively against you and will lower your bandwidth, violating the "minimum" expectation. Rule of thumb: NEVER allow a queue with realtime to exceed its allotted realtime bandwidth.
- Link-share already gives me "immediate" bandwidth. I currently cannot measure a difference between my link being saturated or idle, to within an accuracy of 0.01ms and 0.001% loss, and I only use link-share.
Link-share has 99% of the benefit and none of the dangers. The primary benefit of realtime seems to be that it makes it simpler not to have to think about managing parent queues, except that simplicity comes at the cost of possibly harming the parent queues.
They added realtime for a reason, but without knowing more about the reason and implementation, I cannot empirically measure a benefit over link-share. There may be some benefit for very slow links where the MTU is relatively large compared to the bandwidth.
-
Well, I don't quite agree with your theory that real-time takes from the root instead of the parent queue, because if that were the case the parent queue would hold no importance. In all the tests I've done so far, connecting to a VoIP server almost 250ms away on a different continent, real-time doesn't seem to borrow from the root queue and starve the parent queue when I saturate the line. The limit on the parent queue comes into effect, and the root queue doesn't exceed the upper limit set on the parent of the queue I put real-time on. In fact, after setting real-time on my VoIP queue I find the calls connect more quickly, jitter is reduced, and with active torrents saturating the line I saw no quality drops in VoIP; the VoIP experience actually improved.
The main thing is to know the proper bitrate and set proper values in real-time if you use it on any queue, and so far nothing I've read says real-time borrows from the root queue, going over the limit that the parent queue has set.
-
Thanks for bumping this thread. (It should be a sticky, next to ermal's post, which is something like 7 years old now.)
Anyway, is it a limitation of the GUI that there is only one interface per HFSC queue? Would that solve the issue of splitting bandwidth using UPPERLIMIT in multiple-LAN scenarios? It really knocks the flexibility of link-share around when bandwidth is carved up on a per-interface level.
-
Well I don't quite agree with you on the theory that real-time would take from root instead of the parent queue…
It's not debatable, it's a fact.
queue root_igb0 on igb0 bandwidth 99Mb priority 0 {qACK, qUnclassified, qClassified}
queue qACK on igb0 bandwidth 19.80Mb qlimit 1024
queue qUnclassified on igb0 bandwidth 29.70Mb {qUDP, qDefault}
queue qUDP on igb0 bandwidth 13.07Mb qlimit 1024 hfsc( codel linkshare(16.34Mb 5 13.07Mb) )
queue qDefault on igb0 bandwidth 13.07Mb qlimit 1024 hfsc( codel default )
queue qClassified on igb0 bandwidth 1Mb {qNormal, qHigh}
queue qNormal on igb0 bandwidth 440Kb qlimit 1024 hfsc( codel )
queue qHigh on igb0 bandwidth 440Kb qlimit 1024 hfsc( codel realtime 40Mb linkshare(550Kb 5 440Kb) )
Notice the parent queue qClassified is assigned 1Mb of bandwidth and the child queue qHigh is assigned 440Kb of linkshare and 40Mb of realtime. This is a perfectly valid configuration. And if qHigh decided to pull down 40Mb a second and qClassified only has 1Mb of bandwidth, how much bandwidth is left over for qNormal? None.
Realtime is ALWAYS guaranteed to be available, assuming the queue has not gone over the provisioned amount of realtime, and all bandwidth the realtime consumes counts towards the parent queue(s).
And of course your realtime didn't starve the parent queue in your simple test because realtime cannot consume more than 80% of the bandwidth. That means there's always at least 20% free for link share. As long as there is some free bandwidth and you don't have an upper limit on the parent queue that is being exceeded by the realtime child queue, the sibling queues will still have some bandwidth to work with.
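The starvation arithmetic from the qClassified example above, in code. The numbers come straight from the config shown; the simple subtraction is an illustration of the argument, not of pf's internal accounting:

```python
# Realtime is honoured from the root, so a child's realtime can exceed
# its parent's bandwidth and leave the sibling queue with nothing.
parent_bw_kbit = 1000          # qClassified's bandwidth (1Mb)
qhigh_realtime_kbit = 40000    # qHigh's realtime allocation (40Mb)

# If qHigh actually pulls its full realtime, the parent's budget is
# exhausted many times over and nothing is left for qNormal:
left_for_qnormal = max(0, parent_bw_kbit - qhigh_realtime_kbit)
print(left_for_qnormal)  # 0
```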
-
Whoops I feel like my question above may have also been irrelevant to the specific topic of this thread. (discussing decoupled bw/delay)
Perhaps a more direct question about the thread topic would be on the ping test by Nullity (thanks for that hands on example)…
How would CoDel affect that test? If the goal of CoDel is to maintain a 5ms buffer length, aren't the packets in that test essentially held for 10ms (the d parameter) due to the 0Kb bandwidth of the m1 parameter?
Would CoDel just drop all the packets? And if not, why? (Unfortunately I don't have access to a pfSense lab environment where I can run interesting tests like these.)
-
CoDel is designed to improve delay but keep all other aspects unchanged (like bandwidth). CoDel keeps the queue size low, but at least 1 packet needs to be queued before CoDel drops other incoming packets.
Artificially adding delay with HFSC (or ipfw, once CoDel & fq_codel are added: http://caia.swin.edu.au/freebsd/aqm/) might affect CoDel's queue-delay calculation, or maybe not; I don't know the implementation details. Regardless, CoDel should not cause a problem, by design, but the easiest answer is to just run a test and see. :)
I really need to set up a (virtual?) lab too…
-
CoDel has an extremely large internal buffer, but the buffering algorithm does not do a hard cut-off. When the latency gets greater than 10ms, it will drop one packet, not all packets. It will then check one interval later to see if the buffer has gotten below the threshold. If not, it drops another packet.
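The behaviour described above can be sketched roughly as follows. This is a heavily simplified model (real CoDel uses a control law that shrinks the drop interval over time); the 5ms target is an assumed default, and each sample stands in for the sojourn time observed at the end of one interval:

```python
# Simplified CoDel sketch: while the queue delay stays above the
# target, drop ONE packet per interval, not the whole queue.

def codel_drops(delays_ms, target_ms=5.0):
    """Count drops over a sequence of per-interval queue-delay samples."""
    drops = 0
    above = False
    for delay in delays_ms:
        if delay > target_ms:
            if above:
                drops += 1   # still above target one interval later: drop one
            above = True     # start (or continue) the above-target episode
        else:
            above = False    # delay recovered; reset
    return drops

# Delay above target for four consecutive intervals, then recovered:
print(codel_drops([12, 11, 9, 8, 3]))  # 3 drops (none in the first interval)
```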
When playing around, I was able to get Codel to cause multi-second ping times, but you really have to game it.
-
Advanced real-time VOIP configuration
Say you want to efficiently improve your VOIP setup. You have 7 VOIP lines and a 5Mbit upload. Each VOIP packet is 160 bytes (G.711), with an average bitrate of 64Kbit/s per line (a 160byte packet sent every 20ms). We want to improve our worst-case delay (which also improves worst-case jitter and overall call quality) from 20ms to 5ms, so we calculate the bitrate needed to achieve that:
160 bytes × 8 = 1280 bits
1280 bits × (1000ms ÷ 5ms) = 256000 bits/s
So, to send a 160 byte packet within 5ms we need to allocate 256Kbit/s. This gives us our m1 (256Kb) and our d (5).
Now we need to calculate our maximum average bitrate:
7 lines × 64Kbit/s = 448Kbit/s
Just to be safe, we allocate bandwidth for an extra line, so
8 lines × 64Kbit/s = 512Kbit/s
This gives us our m2 (512Kb).
To make sure that your m1 (the per packet delay) can always be fulfilled by your connection, make sure that the m1 (256Kb) multiplied by the maximum number of simultaneous sessions (7 or 8) is less than your maximum upload. 2048Kbit (256Kb × 8) is less than 5000Kbit, good.
Our finalized configuration is:
m1=256Kb
d=5
m2=512Kb
This configuration will guarantee a 7.4ms (5ms + MTU transmission delay) worst-case delay for each packet, with a limit of 8 simultaneous calls (512Kbit/s). We get low-delay, low-jitter calls as though we had allocated 2048Kbit/s of bandwidth, but we actually only allocated 512Kbit/s.
I may be misunderstanding the m1 value. It would seem to me that with m1 set to 256Kb, and packets taking ~7.5ms to send, if I had 8 active lines and by chance each 20ms packet happened to arrive at the same time, I would have a potential 60ms (8 × 7.5) delay as the queue becomes backlogged. While not disastrous for VoIP, if this scales up I would begin to see the delays in the call. Perhaps my understanding of VoIP is lacking.
Sorry about the zombie thread…
-
I think it's simpler to just use the single-packet case. If you have one 160-byte (1280-bit) packet every 20ms, that's 64Kbit/s. But at 64Kbit/s, HFSC is allowed to schedule that packet in any manner, as long as it completes before the next packet arrives 20ms later. This means your absolute worst case is 20ms.
If you want the packet to be sent out faster, say in 5ms, then you need "5ms" of bandwidth for scheduling reasons. It takes 256Kbit/s to send a 1280-bit packet in 5ms. But you don't need 256Kb average, just 256Kb of burst. So you tell HFSC: 64Kb average, 256Kb burst for 5ms.
Or something along these lines.
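The arithmetic from the quoted VOIP post, worked in code. All inputs come from that example (G.711: 160-byte payload every 20ms, 7 lines plus one spare, 5ms target per-packet delay, 5Mbit uplink):

```python
# Derive the m1/d/m2 service-curve parameters from the VOIP example.
PACKET_BYTES = 160        # G.711 payload per packet
PACKET_INTERVAL_MS = 20   # one packet every 20ms
TARGET_DELAY_MS = 5       # desired worst-case per-packet delay
LINES = 7                 # concurrent VOIP lines
SPARE_LINES = 1           # headroom for one extra line
UPLINK_BPS = 5_000_000    # 5Mbit upload

packet_bits = PACKET_BYTES * 8                             # 1280 bits
per_line_bps = packet_bits * (1000 // PACKET_INTERVAL_MS)  # 64000 b/s

m1_bps = packet_bits * (1000 // TARGET_DELAY_MS)  # burst rate: 256000 b/s
d_ms = TARGET_DELAY_MS                            # 5
m2_bps = (LINES + SPARE_LINES) * per_line_bps     # average rate: 512000 b/s

print(m1_bps, d_ms, m2_bps)  # 256000 5 512000

# Sanity check from the post: m1 times the maximum number of
# simultaneous lines must fit within the uplink.
assert m1_bps * (LINES + SPARE_LINES) <= UPLINK_BPS  # 2048000 <= 5000000
```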
-
The scenario was for 7 lines, not 8. It could handle 8, but that'd be 100% usage… I don't know what standard operating procedure is, but allowing for no overhead seems dangerous.
There will be no problematic backlog. The scenario above can handle 7 simultaneous lines while guaranteeing a per-packet latency of ~7.5ms for every VOIP packet, because neither the internet connection nor the queue is overused at any point, even during 7 simultaneous VOIP calls.
-
Thanks for the response! It sounds like m1 is per packet. Is that correct? So if pfSense received 2 packets within 2ms, each packet would be allocated m1 and I would see a burst of 256Kb/s × 2 = 512Kb/s?
You also talk about the connection not being overused. If I have a 5Mb/s upload (rated for 6Mb/s), 4Mb/s for link-share in other queues (4Mb/s upper limit), and 1Mb/s for qVoIP, I should never have to worry about oversaturation.
Note: Not exact numbers, but the same idea.
-
Yeah, exactly. As I understand it, m1/d are per packet. If m1/d were per queue, then the per-packet latency would depend on how many packets were queued, which would make latency unpredictable and practically break HFSC's latency guarantees.
There are a number of ways to do it, but that setup seems fine. You should stress-test it to confirm proper functionality, regardless.
-
It's kind of both. The scheduling is done at the queue level, but in discrete steps of packets. The service curve is just the priority: whichever curve is "greater" at any given moment is scheduled next. The m1+d modify the curve to make it look like it is greater. If sending the packet would push the queue beyond its limit, it won't send the packet yet. Of course, if there is spare bandwidth and no other packets, it'll just send the packet anyway.
I just think of HFSC as a dynamic priority-based queue where the priority is adjusted in real time per packet processed, such that all queue configurations are respected. Of course, my simplification breaks down in extreme situations, like absurdly low bandwidths, absurdly large MTUs, or absurdly low target latencies, but it's a good rule of thumb.
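The per-packet view described above can be sketched as a deadline computation: under a two-piece curve, a packet's deadline is the time at which the curve has delivered its bits. This is a simplification of HFSC's actual deadline bookkeeping, using the m1=256Kb, d=5, m2=64Kb values from the VOIP discussion:

```python
# Sketch: compute a per-packet deadline from a two-piece service curve.

def deadline_ms(packet_bits, m1_bps, d_ms, m2_bps):
    """Time (ms from the start of a backlog period) by which the
    two-piece curve (m1 until d, then m2) has delivered packet_bits."""
    m1_bits_by_d = m1_bps * d_ms / 1000.0
    if packet_bits <= m1_bits_by_d:
        return packet_bits / m1_bps * 1000.0        # served entirely by m1
    return d_ms + (packet_bits - m1_bits_by_d) / m2_bps * 1000.0

# A 1280-bit VOIP packet under m1=256Kb, d=5, m2=64Kb gets a 5ms deadline:
print(deadline_ms(1280, 256000, 5, 64000))  # 5.0
```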