HFSC explained - decoupled bandwidth and delay - Q&A - Ask anything
-
You could over-allocate more bandwidth to improve delay, but this is inefficient and wastes precious bandwidth.
Yes and no. For fixed-bandwidth traffic like VoIP, it does not increase its bandwidth to fill the allocation, and any unused bandwidth will be shared with the other queues. Even if you set the bandwidth to 256Kb and VoIP used it all, if it did not have that bandwidth, a non-linear service curve would do nothing to help because it would be bandwidth-starved.
A non-linear service curve used to decouple delay and bandwidth is only useful if the flows in that queue are using no more than the allocated amount of bandwidth. In theory, you could just give that queue 256Kb. In practice, this opens you up to that queue potentially using more bandwidth than expected, which could harm the bandwidth of the other queues. This really is a management tool that says "I will give you only this amount of bandwidth, but I will make sure your latency stays good, as long as you don't try to go over".
This really is a management tool that says "I don't trust you to manage yourself, and if you don't manage yourself, you're going to have a bad time, but if you do, you'll have good latency".
The only reason I point this out is that it may be less useful for home users than business/power users. Much simpler to just tell a home user to give VoIP enough bandwidth with a linear service curve since all unused bandwidth gets shared anyway. Not to say this is how it should be explained every time; just be mindful of your current audience and whether they're really able to properly micromanage their traffic.
You are confusing real-time, which does not use more than the allocated bandwidth, and link-share, which does use extra/unused bandwidth. (Though, they can both be used simultaneously in the same queue.)
About "potentially using more bandwidth than expected"; in my post just above yours, I point out how you can assure a 15ms improvement in latency while assuring average bandwidth will not be above 512Kbit/sec, while never "bursting" above 2048Kbit/sec. If you disagree, please be precise in the ways my example is incorrect.
It's much more simple to just say "do not use HFSC", but I prefer the route of attempting to understand something new. I will leave it up to those viewing this thread, regarding whether or not they find it to be a good use of their time. :)
But yes, HFSC, just like pfSense on a dual-Xeon PC, is probably overkill for most home-users… but we like overkill around here. :D
-
I dragged this over from another thread
I feel fairly confident that the d (duration), for the purpose of latency-sensitive traffic, should be set to your target worst-case latency. m1 should be thought of not as bandwidth, but as the total size in bits of the packets serviced during that duration. m2 would be set to the average amount of bandwidth consumed.
Incorrect. m1 and m2 define bandwidth.
This is where I'm a little confused. While m1 and m2 define bandwidth, the over-all average may not *exceed m2, even though m1 is larger. And the way the one paper calculated what bandwidth to use for m1 was to take the desired burst amount, which was 160 bytes, convert it to bits (1280), and divide by d, which was 5ms, resulting in 256Kb, which they used as their "bandwidth".
*Exceed: I'm not sure whether there is a hard or soft limit, such that it may or may not temporarily exceed m2, as long as the average is approximately held within some bound related to m1 and d.
Average bandwidth and short-term "burst" bandwidth (which is used to define the per-packet transmission delay of the virtual-circuit/queue) are different things.
Think of it like a graph, for example;
(Assume the "graph" shows a 1-second time-span.)
This -> [||||_] is m1/d. It can, when the packet arrives, peak to 128Kbps quickly and get the packet out quickly.
This -> [–---------] is the averaged bitrate of 64Kbps.
So, it peaks to 128Kbps, but then it sends nothing for a while, until the average dips low enough for another packet, transmitting at 128Kbps, to be sent, while keeping the average at or below 64Kbps.
I will try to find an actual graphing program so I can show this more intuitively.
Edit: Changed "*Kb" to "*Kbps" for clarity.
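To put rough numbers on the graph description above, here is a quick back-of-the-envelope sketch (my own illustration, in Python):

# One 160-byte packet sent every 20 ms, with a 128Kbps "peak" and a 64Kbps average.
pkt_bits = 160 * 8                         # 1280 bits per packet

peak_kbit = 128                            # the short burst rate (the [||||_] part)
tx_ms = pkt_bits / peak_kbit               # bits / (kbit/s) = ms
print(f"packet leaves in {tx_ms:.0f} ms at {peak_kbit}Kbps")   # 10 ms

avg_kbit = pkt_bits / 20                   # 1280 bits every 20 ms = 64 bits/ms
print(f"long-term average: {avg_kbit:.0f}Kbps")                # 64 Kbps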
-
i'm thinking out loud here, so add thoughts if you want.
"Decoupled delay and bandwidth"
There seems to be two different but related ideas of this concept.
-
HFSC m1 and d settings. HFSC is so good at providing bandwidth that it can accurately replicate a synchronous link of a given rate. Using m1 and d can decouple bandwidth and delay by allowing a low-bandwidth queue to experience the reduced delays of a high-bandwidth queue, assuming it does not try to over-consume bandwidth.
-
Some traffic shapers add additional delay to the lower-bandwidth queues because their algorithms, much like weighted round robin, characteristically process the higher-bandwidth queues first. HFSC does not process higher-bandwidth queues first; it is constantly interleaving among the queues, trying to maintain the service curves.
To me it seems that HFSC has two types of decouplings of delay and bandwidth.
Decoupled bandwidth and delay allocation (point #1) is a different subject than Fair Queueing (point #2). Though, that is partly what HFSC is exactly about: the combination of real-time scheduling with priority and Fair Queueing. This is where things get confusing and overly mathematical, and I agree with you that, for almost everyone, it is not something that needs to be known to properly use HFSC.
If you want a good paper to read, check out http://en.wikipedia.org/wiki/Generalized_processor_sharing
HFSC, or any queueing algorithm, is somewhat based on Generalized Processor Sharing, BUT with the added complexity of needing to deal with packets, instead of the idealized "fluid" (infinitesimally small packets). Each queue with HFSC is a virtual-circuit, but HFSC only has a single pipe that it can send things out of, so choosing how to interleave the packets is… complex. Check out http://www.cs.cmu.edu/~hzhang/HFSC/TALK/sld014.htm and the next dozen slides for more information about how "fairness" and worst-case delay interact.
Or you can use HFSC to decouple bandwidth and delay and assume it is the "fairest" thing out there. :)
-
You could over-allocate more bandwidth to improve delay, but this is inefficient and wastes precious bandwidth.
Yes and no. For fixed-bandwidth traffic like VoIP, it does not increase its bandwidth to fill the allocation, and any unused bandwidth will be shared with the other queues. Even if you set the bandwidth to 256Kb and VoIP used it all, if it did not have that bandwidth, a non-linear service curve would do nothing to help because it would be bandwidth-starved.
A non-linear service curve used to decouple delay and bandwidth is only useful if the flows in that queue are using no more than the allocated amount of bandwidth. In theory, you could just give that queue 256Kb. In practice, this opens you up to that queue potentially using more bandwidth than expected, which could harm the bandwidth of the other queues. This really is a management tool that says "I will give you only this amount of bandwidth, but I will make sure your latency stays good, as long as you don't try to go over".
This really is a management tool that says "I don't trust you to manage yourself, and if you don't manage yourself, you're going to have a bad time, but if you do, you'll have good latency".
The only reason I point this out is that it may be less useful for home users than business/power users. Much simpler to just tell a home user to give VoIP enough bandwidth with a linear service curve since all unused bandwidth gets shared anyway. Not to say this is how it should be explained every time; just be mindful of your current audience and whether they're really able to properly micromanage their traffic.
You are confusing real-time, which does not use more than the allocated bandwidth, and link-share, which does use extra/unused bandwidth. (Though, they can both be used simultaneously in the same queue.)
About "potentially using more bandwidth than expected"; in my post just above yours, I point out how you can assure a 15ms improvement in latency while assuring average bandwidth will not be above 512Kbit/sec, while never "bursting" above 2048Kbit/sec. If you disagree, please be precise in the ways my example is incorrect.
It's much more simple to just say "do not use HFSC", but I prefer the route of attempting to understand something new. I will leave it up to those viewing this thread, regarding whether or not they find it to be a good use of their time. :)
But yes, HFSC, just like pfSense on a dual-Xeon PC, is probably overkill for most home-users… but we like overkill around here. :D
Linkshare cannot use more bandwidth than it has allocated if the link is saturated, and that's the only time any of these discussions even matter. No point in traffic shaping if you have enough bandwidth.
You can't assure a 15ms reduction in latency while keeping it at 512Kb/s if it's trying to use more bandwidth. "Burst" only works if you're not trying to use more than your assigned amount.
What I was going after is if you need 2048Kb to maintain latency, it would be simpler to just tell a regular home user to set it to that amount for m2 instead of configuring m1 and d. It is even more important if using realtime because if you don't have enough realtime and you start using link share, you get punished for not having enough bandwidth allocated.
Don't think I'm trying to say that m1+d are bad, they're just easy to mess up. An example is if someone was like, well, Mumble only uses 128Kb/s, so I'll set it to 128Kb for VoIP and the burst to 256Kb, or whatever the burst should be; all of that configuration goes down the drain the instant someone else comes over to their house and also tries using Mumble at the same time. Better off just telling them to set it to 256Kb. The worst case for giving a queue more bandwidth than it can use isn't much of anything, while not giving enough can be detrimental.
To reiterate, m1+d is only useful when you've given it enough bandwidth in the first place, but the bandwidth is so small that it affects the delay of the packets. Once you have enough bandwidth, the delay is no longer an issue.
Mumble with a 128Kb/s stream and 10ms per 160 byte packet. Similar to the HFSC example, but twice as much bandwidth, which also means 1/2 the delay. Instead of 20ms delays, now we're at 10ms delays, for one stream. If you want to have enough bandwidth for 4 streams, you need 512Kb/s, which now means you have a 2.5ms delay. Is it really worth configuring m1+d over 2.5ms?
This example doesn't work as simply for a few streams because there is a good chance all 4 people are on the same Murmur server at the same time, meaning they'll all be getting packets at roughly the same time, increasing the chance of packets arriving in a single burst. As you start to approach a larger business and you have 100 people, most will not be talking at the same time or in the same group at the same time. Very little synchronizing.
I still think m1+d is primarily a tool for managing low bandwidth or very special situations.
It's late and I know I'm missing a bit of something in my numbers, but they are ballpark and delay should approach 0ms as the bandwidth goes up. Even one of the HFSC papers shows the maximum 160byte packet delay as a simple function of packet-size/bandwidth, with the 160byte packet @ 64Kb/s having just under 20ms of measured latency during their torture test.
Again, I'm not trying to downplay m1+d, but just be careful of your audience if trying to help a home user.
-
Linkshare cannot use more bandwidth than it has allocated if the link is saturated, and that's the only time any of these discussions even matter. No point in traffic shaping if you have enough bandwidth.
I need to make a post in here specifically about link-share, but to put it plainly, you are incorrect again. Link-share uses "virtual time", which means all link-share queues get traffic allocated based on a ratio. Link-share only keeps the ratios correct, not any actual hard-set bitrate, just a ratio. The following example is taken from this post https://forum.pfsense.org/index.php?topic=90512.msg505122#msg505122
pfTop: Up Queue 1-6/6, View: queue, Cache: 10000   16:55:57
QUEUE BW SCH PR PKTS BYTES DROP_P DROP_B QLEN BORR SUSP P/S B/S
root_pppoe0 640K hfsc 0 0 0 0 0 0 0 0
qBulk 1000 hfsc 214 76697 0 0 1 1 695
qACK 500K hfsc 1697 94153 0 0 0 3 211
qNTP 25000 hfsc 0 0 0 0 0 0 0
qTEST1 **1000** hfsc 1285 1500K 19 7521 38 9 **11K**
qTEST2 **6000** hfsc 3067 3792K 0 0 35 52 **67K**
Notice how the queue is using ~88x (convert bytes to bits) more than is allocated. (1Kb vs 11KB and 6Kb vs 67KB)
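A quick sketch (Python, my own arithmetic) of what that snapshot shows:

# Link-share ignores the absolute "Bandwidth" numbers and only preserves the
# ratio between the active queues.
alloc_kbit = {"qTEST1": 1, "qTEST2": 6}               # configured Bandwidth
observed_Bps = {"qTEST1": 11_000, "qTEST2": 67_000}   # pfTop B/S column (~11K, ~67K)

for q in alloc_kbit:
    used_kbit = observed_Bps[q] * 8 / 1000
    print(f"{q}: allocated {alloc_kbit[q]}Kb, using ~{used_kbit:.0f}Kb "
          f"(~{used_kbit / alloc_kbit[q]:.0f}x over)")

# The throughput ratio still tracks the 1:6 allocation ratio.
print(observed_Bps["qTEST2"] / observed_Bps["qTEST1"])   # ~6.1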
You can't assure a 15ms reduction in latency while keeping it at 512Kb/s if it's trying to use more bandwidth. "Burst" only works if you're not trying to use more than your assigned amount.
You are incorrect. The whole point of the decoupling of bandwidth and delay is to offer the delay (m1 & d) of a larger bandwidth allocation while still only allocating a smaller, long-term average bitrate with m2. My example shows that, at worst, the "burst" will be using 2048Kbit/sec, which is below the link's 5Mbit/sec maximum. Meaning the delay of a 256Kbit/sec connection (5ms) is guaranteed, for up to 8 simultaneous VOIP sessions.
I will make a post later including and explaining a quoted example from the HFSC paper that will prove my assertions. (I hope, lol)
What I was going after is if you need 2048Kb to maintain latency, it would be simpler to just tell a regular home user to set it to that amount for m2 instead of configuring m1 and d. It is even more important if using realtime because if you don't have enough realtime and you start using link share, you get punished for not having enough bandwidth allocated.
Don't think I'm trying to say that m1+d are bad, they're just easy to mess up. An example is if someone was like, well, Mumble only uses 128Kb/s, so I'll set it to 128Kb for VoIP and the burst to 256Kb, or whatever the burst should be; all of that configuration goes down the drain the instant someone else comes over to their house and also tries using Mumble at the same time. Better off just telling them to set it to 256Kb. The worst case for giving a queue more bandwidth than it can use isn't much of anything, while not giving enough can be detrimental.
To reiterate, m1+d is only useful when you've given it enough bandwidth in the first place, but the bandwidth is so small that it affects the delay of the packets. Once you have enough bandwidth, the delay is no longer an issue.
Mumble with a 128Kb/s stream and 10ms per 160 byte packet. Similar to the HFSC example, but twice as much bandwidth, which also means 1/2 the delay. Instead of 20ms delays, now we're at 10ms delays, for one stream. If you want to have enough bandwidth for 4 streams, you need 512Kb/s, which now means you have a 2.5ms delay. Is it really worth configuring m1+d over 2.5ms?
This example doesn't work as simply for a few streams because there is a good chance all 4 people are on the same Murmur server at the same time, meaning they'll all be getting packets at roughly the same time, increasing the chance of packets arriving in a single burst. As you start to approach a larger business and you have 100 people, most will not be talking at the same time or in the same group at the same time. Very little synchronizing.
I still think m1+d is primarily a tool for managing low bandwidth or very special situations.
It's late and I know I'm missing a bit of something in my numbers, but they are ballpark and delay should approach 0ms as the bandwidth goes up. Even one of the HFSC papers shows the maximum 160byte packet delay as a simple function of packet-size/bandwidth, with the 160byte packet @ 64Kb/s having just under 20ms of measured latency during their torture test.
Again, I'm not trying to downplay m1+d, but just be careful of your audience if trying to help a home user.
Although the conversation of when to use HFSC and its ability to decouple bandwidth and delay could be interesting, I would like to postpone it until we have understood how to properly use HFSC. Talking about when to use it when you have no idea how to use it seems like a bad idea.
Like I keep saying, I am trying to show people how HFSC works. It is up to them to figure out whether they want/need it.
-
You can't assure a 15ms reduction in latency while keeping it at 512Kb/s if it's trying to use more bandwidth. "Burst" only works if you're not trying to use more than your assigned amount.
My example hinges on the fact you are only using the 7 lines (but I've allocated for 8, just to be safe). If you try to use more bandwidth than you have allocated, then yes, obviously things will not work. Just as if I try to guarantee 20Mbit of a 5Mbit connection. If you misuse something, well… it may not work; that is obvious.
I will show-case an example from the HFSC paper. Hopefully this will clear a few things up.
From page 12 of the SIGCOM97.pdf paper in my OP:
Consider the two-level class hierarchy shown in Figure 10. The value under each class represents the bandwidth guaranteed to that class. In our experiment, the audio session sends 160 byte packets every 20 ms, while the video session sends 8 KB packets every 33 ms. All the other sessions send 4 KB packets and the FTP session is continuously backlogged.
To demonstrate H-FSC’s ability to ensure low delay for real-time connections, we target for a 5 ms delay for the audio session, and a 10 ms delay for the video session. To achieve these objectives, we assign to the audio session the service curve Sa = (u-max=160 bytes, d-max=5 ms, r=64 Kbps), and to the video session the service curve Sv=(u-max=8 KB, d-max=10 ms, r=2 Mbps). Also, in order to pass the admission control test, we assign to the FTP session the service curve SFTP=(u-max=4 KB, d-max=16.25 ms, r=5 Mbps). The service curves of all the other sessions and classes
are linear.

Building upon my quote from this post https://forum.pfsense.org/index.php?topic=89367.msg504184#msg504184, we can convert to the following:
Real-time audio queue:
m1=256Kb [ u-max × 8 × (1000ms ÷ 5ms) ]
d=5
m2=64Kb

Real-time video queue:
m1=6.6Mb (Approximately. I am unsure whether KB is kilobytes or kibibytes. I think KB was still 1024 bytes back in ~1997.)
d=10
m2=2Mb

FTP queue:
m1=2Mb (Approximately. See my note above.)
d=16.25
m2=5Mb

You can see, by the examples in the paper, that you can set delay (m1 & d) completely separately from the average bandwidth (m2). The audio queue is guaranteed to send packets at 256Kb (5ms, an improvement over 20ms) as long as the average bitrate for the queue does not exceed 64Kb (m2). The video queue is guaranteed to send packets at 6.6Mb (10ms, an improvement over ~33ms) as long as the average bitrate for the queue does not exceed 2Mb (m2).
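For reference, here is a small sketch (my own Python, assuming KB means 1024 bytes) of how the paper's (u-max, d-max, r) values map onto the m1/d/m2 values above:

def to_pfsense(umax_bytes, dmax_ms, r_kbit):
    # m1 is u-max expressed as the bitrate needed to send it within d-max
    m1_kbit = umax_bytes * 8 / dmax_ms     # bits / ms = kbit/s
    return round(m1_kbit), dmax_ms, r_kbit

print(to_pfsense(160, 5, 64))              # (256, 5, 64)        audio queue
print(to_pfsense(8 * 1024, 10, 2000))      # (6554, 10, 2000)    video queue, ~6.6Mb
print(to_pfsense(4 * 1024, 16.25, 5000))   # (2016, 16.25, 5000) FTP queue, ~2Mb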
To better understand the FTP queue's increased delay configuration, the following quote from Kenjiro Cho (author of ALTQ) is useful;
In the original HFSC paper, a single service curve is used for both real-time scheduling and link-sharing scheduling. We have extended HFSC to have independent service curves for real-time and link-sharing.
In my opinion, only link-share should be used to directly increase worst-case delay, and real-time should be used to directly decrease worst-case delay.
-
Mumble with a 128Kb/s stream and 10ms per 160 byte packet. Similar to the HFSC example, but twice as much bandwidth, which also means 1/2 the delay. Instead of 20ms delays, now we're at 10ms delays, for one stream. If you want to have enough bandwidth for 4 streams, you need 512Kb/s, which now means you have a 2.5ms delay. Is it really worth configuring m1+d over 2.5ms?
This example doesn't work as simply for a few streams because there is a good chance all 4 people are on the same Murmur server at the same time, meaning they'll all be getting packets at roughly the same time, increasing the chance of packets arriving in a single burst. As you start to approach a larger business and you have 100 people, most will not be talking at the same time or in the same group at the same time. Very little synchronizing.
I still think m1+d is primarily a tool for managing low bandwidth or very special situations.
2.5ms is best-case delay, but your worst-case delay is still 10ms.
If 512Kb is enough bandwidth for 4 servers, you have not decreased the worst-case/guaranteed delay. If all 4 sessions were active simultaneously, the worst-case delay would be exactly the same as if there was 1 session active with 128Kb allocated. Four 160 byte packets @ 512Kbit/sec = 10ms. One 160 byte packet @ 128Kb = 10ms.
An important feature of HFSC, or any QoS implementation, is guaranteed delay bounds. QoS relies on worst-case guarantees. "Best-effort" is rather useless for real-time services.
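Put as arithmetic (a sketch, same numbers as above):

# Worst-case transmission delay = bits that may be queued at once / allocated bitrate.
def delay_ms(pkt_bytes, simultaneous_pkts, alloc_kbit):
    return simultaneous_pkts * pkt_bytes * 8 / alloc_kbit   # bits / (kbit/s) = ms

print(delay_ms(160, 1, 128))   # 10.0 ms: one stream with 128Kb allocated
print(delay_ms(160, 4, 512))   # 10.0 ms: four streams bursting together with 512Kb
print(delay_ms(160, 1, 512))   # 2.5 ms: the best case, only one packet waiting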
It's late and I know I'm missing a bit of something in my numbers, but they are ballpark and delay should approach 0ms as the bandwidth goes up. Even one of the HFSC papers shows the maximum 160byte packet delay as a simple function of packet-size/bandwidth, with the 160byte packet @ 64Kb/s having just under 20ms of measured latency during their torture test.
Again, I'm not trying to downplay m1+d, but just be careful of your audience if trying to help a home user.
Please cite your source for that. I believe you are mistaken.
I am looking at the HFSC paper and the graph for the audio session shows ~1ms delays. Here is a (poor quality) picture from the slideshow at the author's website:
-
Here is a quick and easy "hands-on" example of using HFSC's decoupled delay and bandwidth allocation. We will use upper-limit to add a 10ms delay to ping replies. Just ping pfSense from your LAN, before and after, to see results.
In the traffic-shaper, on your LAN interface, create an HFSC leaf queue called "qPing". Configure qPing as follows:
Bandwidth=1Kb
Upper-limit.m1=0Kb
Upper-limit.d=10
Upper-limit.m2=100Kb

Create a firewall rule that assigns all LAN Network->LAN Address ICMP Echo/Reply (ping/pong) into "qPing".
Before:
PING pfsense.wlan (192.168.1.1) 56(84) bytes of data.
64 bytes from pfsense.wlan (192.168.1.1): icmp_seq=1 ttl=64 time=0.225 ms
64 bytes from pfsense.wlan (192.168.1.1): icmp_seq=2 ttl=64 time=0.200 ms

After:
PING pfsense.wlan (192.168.1.1) 56(84) bytes of data.
64 bytes from pfsense.wlan (192.168.1.1): icmp_seq=1 ttl=64 time=11.3 ms
64 bytes from pfsense.wlan (192.168.1.1): icmp_seq=2 ttl=64 time=11.9 ms

According to our "qPing" configuration, packets will be delayed an additional 10ms and the queue's total bandwidth will be held at or below 100Kbit/sec.
I hope that this shows how (a packet's) delay is a separate function from bandwidth. Delay can be increased, or decreased (limited by your link's max bitrate), without affecting throughput. Bandwidth and delay are "decoupled". :)
PS - qPing's "Bandwidth" is set to 1Kb because it actually sets link-share's m2. Refer to my earlier posts about how link-share does not use absolute bitrates, only ratios. Short explanation; the number is unimportant unless you are saturating the queue's parent/root queue.
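To see where the extra ~10ms comes from, here is a sketch of the upper-limit service curve as a function (my own Python, not pfSense code):

# qPing's upper-limit curve: m1=0Kb, d=10, m2=100Kb. An upper-limit curve caps
# how many bits the queue may have sent t ms after it becomes backlogged.
def allowed_bits(t_ms, m1_kbit=0, d_ms=10, m2_kbit=100):
    if t_ms <= d_ms:
        return m1_kbit * t_ms                        # kbit/s * ms = bits
    return m1_kbit * d_ms + m2_kbit * (t_ms - d_ms)

# With m1=0, nothing may leave the queue during the first 10 ms, which is the
# ~10 ms increase seen in the pings above.
print(allowed_bits(5))    # 0 bits
print(allowed_bits(10))   # 0 bits
print(allowed_bits(15))   # 500 bits: the 100Kb m2 slope only starts after d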
-
Ok. I'll bite.
Let's shape a site-to-site OpenVPN. I have one over which lots of different traffic gets sent and received. VoIP, file transfers, etc.
I realize it's pretty hard to shape the traffic inside the tunnel, so I want to start by shaping the OpenVPN connection itself.
I just ran a packet capture and when I am sending a file using scp, the maximum UDP packet sent is 1433 bytes, which is 11464 bits.
I have a 10Mbit/sec upload that is pretty reliable at that rate. That means it'll take .001156 seconds to send one packet, or 11.56ms
If I wanted to guarantee 15ms delay up to 2Mbit/sec average would I set:
m1: 11464
d: 11.56
m2: 20% (or 2,000,000)

What about linkshare/bandwidth settings? When do they come into play if at all? After the 2Mbit/sec real-time is exceeded?
Should I just round up the packet size to 1500?
If I wanted to make sure this service curve was ALWAYS in effect, could I put a 2Mbit upperlimit in place as well, effectively limiting my OpenVPN to 2Mbits upload but with all traffic always in the real-time queue?
-
Interesting. I had a qVPN set at:
Bandwidth 5%
real-time - - 5%
link-share - - 5%

--- 172.22.81.8 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 48.403/50.989/68.843/2.543 ms

I changed that to:
Bandwidth 10%
real-time 11.5Kb 12 10%
link-share - - 10%

--- 172.22.81.8 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 56.007/58.560/66.571/1.727 ms

--- 172.22.81.8 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 56.491/59.047/71.021/2.572 ms

Delay increased by about 8ms. But my phone reported 12ms jitter throughout an upload speed test.
Note that you can't just leave the bandwidth blank. It puts something in there by default and the rules won't load. I think it uses the interface speed or something.
-
Ok. I'll bite.
Let's shape a site-to-site OpenVPN. I have one over which lots of different traffic gets sent and received. VoIP, file transfers, etc.
I realize it's pretty hard to shape the traffic inside the tunnel, so I want to start by shaping the OpenVPN connection itself.
I just ran a packet capture and when I am sending a file using scp, the maximum UDP packet sent is 1433 bytes, which is 11464 bits.
I have a 10Mbit/sec upload that is pretty reliable at that rate. That means it'll take .001156 seconds to send one packet, or 11.56ms
If I wanted to guarantee 15ms delay up to 2Mbit/sec average would I set:
m1: 11464
d: 11.56
m2: 20% (or 2,000,000)

m1 needs to be a bitrate and d needs to be your worst-case delay. Also, I think 0.001156 seconds is ~1.16ms. :P
If you want your OpenVPN to always get good response, you could give it the maximum real-time allocation of 80%. 8Mbit can send a 1433 byte packet in 1.4ms, so your config should be:
m1=8Mb
d=2 (round up from 1.4)
m2=2Mb

What about linkshare/bandwidth settings? When do they come into play if at all? After the 2Mbit/sec real-time is exceeded?
You could, depending on your setup, either allocate your OpenVPN queue with real-time, using your current configuration, or with link-share you could set all other queues to have an allowed increase in worst-case delay.
Imagine if a packet had a leaf queue real-time allocation of 10Mbit, a leaf queue link-share/"Bandwidth" allocation of 20Mbit, and a root queue "Bandwidth" (usually link-rate) of 50Mbit. This would mean the packet will get 10Mbit guaranteed, from real-time, and using the ratio of the 20Mbit link-share to additionally allocate all unused bandwidth. So, if there was a competing link-share queue with 10Mbit allocated, our original packet would have 10Mbit (guaranteed by real-time) plus the additional bandwidth, calculated by splitting the excess bandwidth proportionally between our 2 active link-share queues. Our packet would get a total bandwidth of 10Mbit plus ~67% of 40Mbit, so… like ~36.8Mbit for our packet.
Edit#2: Here is a more intuitive explanation of the above example. Imagine 2 packets were queued simultaneously, with one being assigned to qPacket and the other being assigned to qOtherPkt. This would mirror the above example.
WAN Bandwidth=**50Mb**
– qPacket: Bandwidth/Link-share.m2=**20Mb**, Real-time.m2=**10Mb**
– qOtherPkt: Bandwidth/Link-share.m2=**10Mb**
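A worked version of that arithmetic (a sketch; qPacket and qOtherPkt are just the hypothetical queues from the example):

root_kbit = 50_000                                     # root "Bandwidth"
realtime  = {"qPacket": 10_000, "qOtherPkt": 0}        # guaranteed floors
linkshare = {"qPacket": 20_000, "qOtherPkt": 10_000}   # sharing ratio, 2:1

excess = root_kbit - sum(realtime.values())            # 40 Mbit left to distribute
share_total = sum(linkshare.values())
total = {q: realtime[q] + excess * linkshare[q] / share_total for q in linkshare}
print(total)   # qPacket ~36667 (the "~36.8Mbit" above), qOtherPkt ~13333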
Should I just round up the packet size to 1500?
Just for the sake of correctness and avoiding unknown consequences, I say no.
I am still trying to figure out the impact of over-sizing this value. It may just cause more CPU usage or it might cause … well, I dunno. :)

Edit: Notice that, simply because of the limited granularity of milliseconds, the configuration I calculated above had to be rounded up to 2ms, and an 8Mbit connection can send 16000bits/2000bytes in 2ms, so… maybe it does not matter. I was hoping that the u-max (actual packet size, not bitrate) would act like admission control and put oversized packets into a different queue, but I have not experienced this in any of my simulations. I hope it proves to be true though.
If I wanted to make sure this service curve was ALWAYS in effect, could I put a 2Mbit upperlimit in place as well, effectively limiting my OpenVPN to 2Mbits upload but with all traffic always in the real-time queue?
Yes.
In theory, real-time allocation defines both the minimum and maximum bandwidth, but I have not been able to allocate real-time without also having link-share functionality in the queue. This is somewhat tolerable though, since we have upper-limit.

Let me know if any of this works. :)
-
Interesting. I had a qVPN set at:
Bandwidth 5%
real-time - - 5%
link-share - - 5%

--- 172.22.81.8 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 48.403/50.989/68.843/2.543 ms

I changed that to:
Bandwidth 10%
real-time 11.5Kb 12 10%
link-share - - 10%

--- 172.22.81.8 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 56.007/58.560/66.571/1.727 ms

--- 172.22.81.8 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 56.491/59.047/71.021/2.572 ms

Delay increased by about 8ms. But my phone reported 12ms jitter throughout an upload speed test.
Note that you can't just leave the bandwidth blank. It puts something in there by default and the rules won't load. I think it uses the interface speed or something.
Your results are somewhat expected since you essentially told the queue to limit the bitrate of a packet to 11.5Kb for 12ms. See if the configuration in my above post improves your delay.
-
Yeah, being off by a place value doesn't help :/
-
Yeah, being off by a place value doesn't help :/
I blame the metric system. :)
…
Note that you can't just leave the bandwidth blank. It puts something in there by default and the rules won't load. I think it uses the interface speed or something.

Though, I think you can put 1Kb (arbitrary number), so it will not throw errors, then put 0Kb into link-share's m2. I tested this and did not get the expected results. Here is a quote by the author of ALTQ to explain what my expected results were;
It is also possible to set either of the service curves to be 0.
When the real-time service curve is 0, a class receives only excess bandwidth.
When the link-sharing service curve is 0, a class cannot receive excess bandwidth. Note that 0 link-sharing makes the class non-work conserving.

I need to test this more. Whether setting link-share to 0 works as expected or not is unimportant considering that we have upper-limit to accomplish practically the same thing.
-
Here is a graph that shows how a service curve works. This graph shows a service curve (blue color) with the following values;
m1=256Kb (slope of first segment)
d=5 (x-projection of where the slopes meet, aka the maximum transmission delay)
m2=64Kb (slope of second segment)

This is the audio session example from the HFSC paper. The audio stream sends a single 160 byte packet every 20 ms (equivalent to 64Kbps).
The blue lines are the service curve (m1, d, m2).
The red lines show the audio packets. (Arrival/departure time and amount of data transmitted.)
The black lines show the time where no data is being transmitted.
The orange lines denote 20ms time-spans, starting from the arrival of the first audio packet at 5ms.
The x-axis is time and the y-axis is data transmitted.

In the graph, the first audio packet arrives at 5ms. You can see that the transmission time of a 160 byte audio packet is 5ms (160 bytes @ 256Kbps), while the long-term average is 64Kbps. Notice how the graphing of the audio packets perfectly follows the service curve.
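The same curve can be written as a tiny function (a sketch in Python, using the same m1/d/m2 as the graph):

# Minimum bits guaranteed to be served t ms into a backlog, for the two-piece curve.
def service_bits(t_ms, m1_kbit=256, d_ms=5, m2_kbit=64):
    if t_ms <= d_ms:
        return m1_kbit * t_ms                        # kbit/s * ms = bits
    return m1_kbit * d_ms + m2_kbit * (t_ms - d_ms)

print(service_bits(5))    # 1280 bits: one 160-byte audio packet within 5 ms
print(service_bits(25))   # 2560 bits: the long-term 64Kbps slope (one packet per 20 ms)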
-
How to use link-share.
Link-share works in ratios because its primary function is to "share" bandwidth between other link-share allocations. Link-share achieves this sharing ability by using "virtual time" for all link-share queues, which allows bitrates to increase or decrease while the proportions stay the same. Real-time uses actual time. You can see my post here for an example of link-share's proportional throughput sharing. Do realize that you cannot guarantee absolute bitrates with link-share (unless there are no real-time allocations), but you can guarantee a proportional percentage of the unused bandwidth.

Example: If I give 600Kb to queue-A and 300Kb to queue-B, and a real-time queue needs to preempt, the virtual time may be increased from 1 second to 2 seconds (because real-time forced its way in), causing the 600Kb of queue-A and the 300Kb of queue-B to drop to 300Kb and 150Kb, respectively, for the given time-span. The proportions of the unused bandwidth are kept intact (2:1).
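Here is that example in code (my own sketch, same numbers):

# Link-share only preserves ratios, not absolute bitrates.
def share(allocations_kbit, available_kbit):
    total = sum(allocations_kbit.values())
    return {q: available_kbit * a / total for q, a in allocations_kbit.items()}

queues = {"queue-A": 600, "queue-B": 300}
print(share(queues, 900))   # {'queue-A': 600.0, 'queue-B': 300.0}
print(share(queues, 450))   # real-time preempted half: {'queue-A': 300.0, 'queue-B': 150.0}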
Because of link-share's non-absolute number "problem", I think it is nonsense to use link-share to decrease delay. Although, I do think it makes sense to use link-share to increase worst-case delay. For example, m1=0Kb, d=50, m2=<arbitrary bitrate> would give a queue an increased worst-case delay of 50ms (since it is link-share, it will still get the best delay it can without preempting higher-priority queues), while simultaneously allowing whatever bandwidth ratio is set by m2. This means other link-share (and real-time, obviously) allocations would be able to preempt it.
I need to do some rather complex testing (for my level of networking skills) to prove whether or not link-share's m1/d can be depended on to decrease delay, in relation to other link-share queues.
PS - Link-share may be able to decrease delay by setting m1 higher than m2 (ex. m1=256Kb, d=5, m2=64Kb), but the problem that confuses me is that link-share's m2 (and maybe m1?) is assuredly dynamic (unknown), which makes the relationship to m1/d very confusing, since their ratios may change whenever a higher priority queue needs throughput. This is why I think link-share should only be used to increase delay (0Kb cannot be proportioned any other way than 0, as far as I know).
Help Me!
I need to make a more concise and intuitive wiki entry about HFSC, and if you guys could give me some pointers on what I need to include, what to explain more clearly, or anything else (does my graph make sense or just confuse?), I would like to hear them. What do I need to include? What should I leave out, or put into a separate "HFSC details" section? Do you want to see more real-world graphs, stats, etc? What popular traffic-shaping examples should I include?

I have climbed so deep into the HFSC rabbit-hole that it is hard to see out. :P
:D
-
Interesting. I had a qVPN set at:
Bandwidth 5%
real-time - - 5%
link-share - - 5%

--- 172.22.81.8 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 48.403/50.989/68.843/2.543 ms

I changed that to:
Bandwidth 10%
real-time 11.5Kb 12 10%
link-share - - 10%

--- 172.22.81.8 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 56.007/58.560/66.571/1.727 ms

--- 172.22.81.8 ping statistics ---
100 packets transmitted, 100 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 56.491/59.047/71.021/2.572 ms

Delay increased by about 8ms. But my phone reported 12ms jitter throughout an upload speed test.
Note that you can't just leave the bandwidth blank. It puts something in there by default and the rules won't load. I think it uses the interface speed or something.
Did you ever try this out? I remember it calculating to ~2ms delay vs ~6ms.
I prefer to look at it like you dropped your latency by 66%, instead of simply saying you only improved your latency by an unnoticeable 4ms… ;)
I bet gamers would be excited to drop their ping by a few %. Anyone know of any good articles on the general type of traffic that modern FPSs have? (Low bandwidth, small packet, high PPS UDP?) I found a few articles and papers, but most were old.
-
Not really. I'm currently running this on WAN:
queue qVPN on em2 bandwidth 15% hfsc ( realtime (30%, 4, 15%) )
It seems to have improved my remote desktop but I don't really have a way to measure it at the moment.
-
Mumble with a 128Kb/s stream and 10ms per 160 byte packet. Similar to the HFSC example, but twice as much bandwidth, which also means 1/2 the delay. Instead of 20ms delays, now we're at 10ms delays, for one stream. If you want to have enough bandwidth for 4 streams, you need 512Kb/s, which now means you have a 2.5ms delay. Is it really worth configuring m1+d over 2.5ms?
This example doesn't work as simply for a few streams because there is a good chance all 4 people are on the same Murmur server at the same time, meaning they'll all be getting packets at roughly the same time, increasing the chance of packets arriving in a single burst. As you start to approach a larger business and you have 100 people, most will not be talking at the same time or in the same group at the same time. Very little synchronizing.
I still think m1+d is primarily a tool for managing low bandwidth or very special situations.
2.5ms is best-case delay, but your worst-case delay is still 10ms.
If 512Kb is enough bandwidth for 4 servers, you have not decreased the worst-case/guaranteed delay. If all 4 sessions were active simultaneously, the worst-case delay would be exactly the same as if there was 1 session active with 128Kb allocated. Four 160 byte packets @ 512Kbit/sec = 10ms. One 160 byte packet @ 128Kb = 10ms.
An important feature of HFSC, or any QoS implementation, is guaranteed delay bounds. QoS relies on worst-case guarantees. "Best-effort" is rather useless for real-time services.
It's late and I know I'm missing a bit of something in my numbers, but they are ballpark and delay should approach 0ms as the bandwidth goes up. Even one of the HFSC papers shows the maximum 160byte packet delay as a simple function of packet-size/bandwidth, with the 160byte packet @ 64Kb/s having just under 20ms of measured latency during their torture test.
Again, I'm not trying to downplay m1+d, but just be careful of your audience if trying to help a home user.
Please cite your source for that. I believe you are mistaken.
I am looking at the HFSC paper and the graph for the audio session shows ~1ms delays. Here is a (poor quality) picture from the slideshow at the author's website:
Please cite your source for that. I believe you are mistaken.
Real-time audio queue:
m1=256Kb [ u-max × 8 × (1000ms ÷ 5ms) ]
d=5
m2=64Kb

You even used this as your example. This is what I was talking about when I said
Even one of the HFSC papers shows the maximum 160byte packet delay as a simple function of packet-size/bandwidth, with the 160byte packet @ 64Kb/s having just under 20ms of measured latency during their torture test.
Even your linked image shows the same thing, 1280bit(160byte) packet with a graph that shows a 20ms max, which is the time it takes a 64Kb/s link to send 160bytes, then a second graph that shows a 5ms max, which is the time it takes to send 160bytes down a 256Kb link. Yes, most of the time it's around 1ms, but the guaranteed max is 5ms, and there is a data point or two near the 5ms border.
If 512Kb is enough bandwidth for 4 servers, you have not decreased the worst-case/guaranteed delay. If all 4 sessions were active simultaneously, the worst-case delay would be exactly the same as if there was 1 session active with 128Kb allocated. Four 160 byte packets @ 512Kbit/sec = 10ms. One 160 byte packet @ 128Kb = 10ms.
Yes and no. The maximum delay is based on the maximum amount of consecutive data that must be sent at the "same time". In the case of a single 64Kb stream sending 160byte packets, the individual packets must be sent atomically. As your bandwidth increases, the number of packets increases, but the size remains the same. Yes, your worst case remains the same, but the chance of hitting the worst case has been reduced by the number of streams.
Say you started off with a 50% chance of greater than 5ms of delay with 1 stream, but now you have 4 streams and 4x the bandwidth, so now your chance is 1/4 of 50%. At some point that chance of the worst case becomes statistically insignificant. Of course there can be pathological cases that traffic is sometimes biased towards, and an absolute guarantee is nice to have, but it does come at the expense of requiring very intimate knowledge of your traffic patterns and bandwidth requirements, which are ever changing. But given a certain set of assumptions, you can know what your maximum delay will be.
It's rare to have exactly the amount of bandwidth you need; you either have too much or too little. With too much, delay should not be an issue; with too little, even HFSC can't help you. This is not a good argument for technicality reasons, but it is a good argument for practicality reasons. I only say this because I don't want people to focus too much on micromanaging their bandwidth, when it's easier and safer to have a safety buffer.
This is why I personally went with a 45%/30% for my high priority queue. I should have plenty of raw bandwidth, but in case crap hits the fan, it can support a bit of burst.
-
Please cite your source for that. I believe you are mistaken.
Real-time audio queue:
m1=256Kb [ u-max × 8 × (1000ms ÷ 5ms) ]
d=5
m2=64Kb

You even used this as your example. This is what I was talking about when I said
Even one of the HFSC papers shows the maximum 160byte packet delay as a simple function of packet-size/bandwidth, with the 160byte packet @ 64Kb/s having just under 20ms of measured latency during their torture test.
Even your linked image shows the same thing, 1280bit(160byte) packet with a graph that shows a 20ms max, which is the time it takes a 64Kb/s link to send 160bytes, then a second graph that shows a 5ms max, which is the time it takes to send 160bytes down a 256Kb link. Yes, most of the time it's around 1ms, but the guaranteed max is 5ms, and there is a data point or two near the 5ms border.
If 512Kb is enough bandwidth for 4 servers, you have not decreased the worst-case/guaranteed delay. If all 4 sessions were active simultaneously, the worst-case delay would be exactly the same as if there was 1 session active with 128Kb allocated. Four 160 byte packets @ 512Kbit/sec = 10ms. One 160 byte packet @ 128Kb = 10ms.
Yes and no. The maximum delay is based on the maximum amount of consecutive data that must be sent at the "same time". In the case of a single 64Kb stream sending 160byte packets, the individual packets must be sent atomically. As your bandwidth increases, the number of packets increases, but the size remains the same. Yes, your worst case remains the same, but the chance of hitting the worst case has been reduced by the number of streams.
Say you started off with a 50% chance of greater than 5ms of delay with 1 stream, but now you have 4 streams and 4x the bandwidth, so now your chance is 1/4 of 50%. At some point that chance of the worst case becomes statistically insignificant. Of course there can be pathological cases that traffic is sometimes biased towards, and an absolute guarantee is nice to have, but it does come at the expense of requiring very intimate knowledge of your traffic patterns and bandwidth requirements, which are ever changing. But given a certain set of assumptions, you can know what your maximum delay will be.
It's rare to have exactly the amount of bandwidth you need; you either have too much or too little. With too much, delay should not be an issue; with too little, even HFSC can't help you. This is not a good argument for technicality reasons, but it is a good argument for practicality reasons. I only say this because I don't want people to focus too much on micromanaging their bandwidth, when it's easier and safer to have a safety buffer.
This is why I personally went with a 45%/30% for my high priority queue. I should have plenty of raw bandwidth, but in case crap hits the fan, it can support a bit of burst.
You are advising people in a thread dedicated to understanding a QoS algorithm, and you say "best effort" is good enough. By definition, "best effort" cannot support QoS, so you may be arguing to the wrong crowd. :( Then you follow up with "with too little [bandwidth], even HFSC can't help you.", like HFSC is at fault for being incapable of sending 20mbit through a 10mbit pipe… You need to spend more time on your posts.
Meh... I am tired of correcting your premature assumptions. Let us just pretend that I corrected a bunch of stuff in your post and move on.
Yeah, there are some unrealistic expectations of QoS/HFSC, but is that related to the topic of this thread? I say no, but as long as the posts are helpful, I have no problem. I know you are capable of making a better quality of post, but you chose not to, which is why I am angry.
If you have a simple, concise posit with some sources or detailed examples to support it, please share it. Heck, even just ask a question once or twice if you could use some help. I think I have corrected 40+ statements that were factually incorrect and got like 3 questions from you. I have made mistakes myself, but they were not for lack of effort (well, maybe sometimes...).
-
i recently moved back to HFSC from CBQ and so far it works well; in my case what Nullity says holds true. I deal in voip only, but the difference is where i live the isp blocks voip, so we set up a udp openvpn tunnel and route voip calls only through it. after doing packet captures on pfsense i realized the voip audio packets, instead of 160 bytes, were 279 bytes for me considering the openvpn overhead, so based on this i set my voip queue as Nullity described in his voip post and i noticed my call jitter went down by a few ms on a saturated line (my line speed is 12Mb down and 3Mb up, but it goes to 12.5/3.5).
what i wanted to know is, if we use these realtime values, do we still need to use link share? because in all the argument above i got confused whether that holds true in a saturated situation or realtime
-
i recently moved back to HFSC from CBQ and so far it works well; in my case what Nullity says holds true. I deal in voip only, but the difference is where i live the isp blocks voip, so we set up a udp openvpn tunnel and route voip calls only through it. after doing packet captures on pfsense i realized the voip audio packets, instead of 160 bytes, were 279 bytes for me considering the openvpn overhead, so based on this i set my voip queue as Nullity described in his voip post and i noticed my call jitter went down by a few ms on a saturated line (my line speed is 12Mb down and 3Mb up, but it goes to 12.5/3.5).
what i wanted to know is, if we use these realtime values, do we still need to use link share? because in all the argument above i got confused whether that holds true in a saturated situation or realtime
In the original HFSC implementation, there was no separate real-time & link-share (or upper-limit) parameter. There was only a single "service curve" parameter that simultaneously set both real-time & link-share to the same values.
So, set them both to the same values (unless you know a good reason to do otherwise). Simultaneously using link-share & real-time should not hurt since real-time has priority anyway.
Also, remember that most pfSense queues already have link-share's m2 param set to the queue's "Bandwidth" value automatically…
-
ok thanks. so shall i set the bandwidth, link share and realtime to the same, or can bandwidth be set higher than realtime? if we use link share then that overrides bandwidth, right?
the other thing i noticed is that if the line speed is 12.5Mbps down and we set the root queue to 12Mbps, then the effective speed i get is around 11Mbps or at most 11.5Mbps; it goes nowhere near the 12Mbps mark set on the root queue. now if i change to CBQ then it very accurately touches the full 12Mbps (this i tested with just one lan client connected and maxed out the line with speedtest as well as with torrents)
the other issue is pfsense doesnt allow setting a root queue bandwidth in decimals so if line is 12.5Mbps, i cant set 12.25Mbps so i have to convert to Kbps and use that
-
I wouldn't use realtime, it's non-intuitive and very easy to use it wrong. Link-share is good enough.
-
Well I don't understand what's wrong in using it; all it does is give that queue a minimum value all the time, compared to link share which would give that queue the set bandwidth when it's backlogged, and in the case of voip you need to have real-time bandwidth available as soon as the call starts so the signaling isn't delayed, which would in turn increase call setup and RTP time. Even if there is no traffic for that queue which has real-time set, that bandwidth isn't lost and is still available for other queues.
In voip we at least know the packet size and what interval it's sent so I guess it's straight forward to set a proper value for real-time
-
Well I don't understand what's wrong in using it; all it does is give that queue a minimum value all the time, compared to link share which would give that queue the set bandwidth when it's backlogged, and in the case of voip you need to have real-time bandwidth available as soon as the call starts so the signaling isn't delayed, which would in turn increase call setup and RTP time. Even if there is no traffic for that queue which has real-time set, that bandwidth isn't lost and is still available for other queues.
In voip we at least know the packet size and what interval it's sent so I guess it's straight forward to set a proper value for real-time
Real-time's values are absolute (and limited to ~80% of total bandwidth). This is important when you are precisely allocating known traffic types like VOIP.
Link-share's values are not absolute, they are proportional (and can use 100% of total bandwidth, unlike real-time). This means accurate, completely predictable allocation is not possible with link-share. Link-share, as the name suggests, is meant only to share bandwidth. See this post for a clear example of link-share's proportional nature. (Pay attention to the first test, where my WAN bandwidth was correctly configured to 640Kbit.)
I would use real-time to improve/guarantee latency.
I would use link-share to control bandwidth or to make latency worse. For example, you could allocate 5Mbit (m2) but also set m1=0Kbit & d=500, which would mean that traffic is allocated 5Mbit but packets can be delayed for up to 500ms, allowing other traffic to have better latency. The traffic will not be delayed unless there is other traffic.
-
ok i understood realtime now, basically it can be used in cases where u know the type of traffic and its bitrate, so you can then reduce the latency on it.
one thing i didnt understand regarding the other link u gave me is do we set linkshare on the root queue or just the bandwidth
below is my current setup, can u tell me if its fine or simply point out whats wrong
WAN - bandwidth 100Mb
-qInternet - bandwidth 3000Kb - UL 3000Kb - LS 3000Kb - codel
–qACK - bandwidth 30% - LS 30% - codel - priority 6
--qOthersDefault - bandwidth 10% - LS 10% - codel - priority 4
--qP2P - bandwidth 5% - LS 5% - codel - priority 1 - default
--qVoIP - bandwidth 45% - RT 447Kb/5/224Kb - LS 45% - codel - priority 7
--qOthersHigh - bandwidth 10% - LS 10% - codel - priority 5

LAN - bandwidth 100Mb
-qInternet - bandwidth 12000Kb - UL 12000Kb - LS 12000Kb - codel
--qACK - bandwidth 15% - LS 15% - codel - priority 6
--qOthersDefault - bandwidth 50% - LS 50% - codel - priority 4
--qP2P - bandwidth 5% - LS 5% - codel - priority 1 - default
--qVoIP - bandwidth 20% - RT 447Kb/5/224Kb - LS 20% - codel - priority 7
--qOthersHigh - bandwidth 10% - LS 10% - codel - priority 5
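(For reference, a quick sketch of where the RT 447Kb/5/224Kb values presumably come from, assuming the 279-byte OpenVPN-wrapped VoIP packets mentioned earlier, one packet every 20ms per call, sized for 2 simultaneous calls:)

pkt_bits = 279 * 8                 # 2232 bits per encapsulated VoIP packet

m1_kbit = pkt_bits / 5             # bits / ms = kbit/s; send one packet within 5 ms
m2_kbit = 2 * (pkt_bits / 20)      # long-term rate of 2 simultaneous calls

print(round(m1_kbit))              # ~446 -> the 447Kb m1
print(round(m2_kbit))              # ~223 -> the 224Kb m2
-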
Well I don't understand what's wrong in using it; all it does is give that queue a minimum value all the time, compared to link share which would give that queue the set bandwidth when it's backlogged, and in the case of voip you need to have real-time bandwidth available as soon as the call starts so the signaling isn't delayed, which would in turn increase call setup and RTP time. Even if there is no traffic for that queue which has real-time set, that bandwidth isn't lost and is still available for other queues.
In voip we at least know the packet size and what interval it's sent so I guess it's straight forward to set a proper value for real-time
- Realtime always comes from the root queue, and can be greater than the parent queue's available bandwidth and even upperlimit, allowing you to starve your parent queue
- Realtime counts bandwidth consumed above your real-time negatively against you and will lower your bandwidth, violating the "minimum" expectation. Rule of thumb: NEVER allow a queue with realtime to exceed its allotted realtime bandwidth
- Link-share already gives me "immediate" bandwidth. I currently cannot measure a difference between my link being saturated or idle to within an accuracy of 0.01ms and 0.001% loss, and I only use linkshare.
Link-share has 99% of the benefit and none of the dangers. The primary benefit of realtime seems to be that it makes it simpler to not have to think about managing parent queues, except that simplicity comes at the cost of possibly harming the parent queues.
They added realtime for a reason, but without knowing more about the reason and implementation, I cannot empirically measure a benefit over linkshare. There may be some benefit for very slow links where the MTU is relatively large compared to the bandwidth
-
Well, I don't quite agree with you on the theory that real-time would take from the root instead of the parent queue, because if that were the case then the parent queue would hold no importance. In all the tests I did so far, connecting to a voip server which is almost 250ms away on a different continent, it doesn't seem real-time borrows from the root queue and starves the parent queue when I saturated the line. The limit on the parent queue comes into effect and the root queue doesn't exceed the traffic set as the upper limit on the parent of the queue which I set real-time on. In fact, after setting real-time on my voip queue I find the calls connect more quickly, jitter is reduced, and with active torrents saturating the line I didn't see any quality drops in voip; in fact the voip experience improved.
The main thing is to know the proper bitrate and set proper values in real-time if you use that on any queue, and so far in what i've read, nowhere does it say real-time would borrow from the root queue, going over the limit that the parent queue has set.
-
Thanks for bumping this thread up. (it should be a sticky next to ermal's post which is something like 7 years old now)
Anyway, is it a limitation of the GUI to have only one interface per HFSC queue? This would solve the issue of splitting bandwidth using UPPERLIMIT on multiple LAN scenarios? Really knocks the flexibility of link share around when bandwidth is carved up on a per interface level.
-
Well I don't quite agree with you on the theory that real-time would take from root instead of the parent queue..
It's not debatable, it's a fact.
queue root_igb0 on igb0 bandwidth 99Mb priority 0 {qACK, qUnclassified, qClassified}
queue qACK on igb0 bandwidth 19.80Mb qlimit 1024
queue qUnclassified on igb0 bandwidth 29.70Mb {qUDP, qDefault}
queue qUDP on igb0 bandwidth 13.07Mb qlimit 1024 hfsc( codel linkshare(16.34Mb 5 13.07Mb) )
queue qDefault on igb0 bandwidth 13.07Mb qlimit 1024 hfsc( codel default )
queue qClassified on igb0 bandwidth 1Mb {qNormal, qHigh}
queue qNormal on igb0 bandwidth 440Kb qlimit 1024 hfsc( codel )
queue qHigh on igb0 bandwidth 440Kb qlimit 1024 hfsc( codel realtime 40Mb linkshare(550Kb 5 440Kb) )

Notice the parent queue qClassified is assigned 1Mb of bandwidth and the child queue qHigh is assigned 440Kb of linkshare and 40Mb of realtime. This is a perfectly valid configuration. And if qHigh decided to pull down 40Mb a sec and qClassified only has 1Mb of bandwidth, how much bandwidth is left over for qNormal? None.
Realtime is ALWAYS guaranteed to be available, assuming the queue has not gone over the provisioned amount of realtime, and all bandwidth the realtime consumes counts towards the parent queue(s).
And of course your realtime didn't starve the parent queue in your simple test because realtime cannot consume more than 80% of the bandwidth. That means there's always at least 20% free for link share. As long as there is some free bandwidth and you don't have an upper limit on the parent queue that is being exceeded by the realtime child queue, the sibling queues will still have some bandwidth to work with.
-
Whoops I feel like my question above may have also been irrelevant to the specific topic of this thread. (discussing decoupled bw/delay)
Perhaps a more direct question about the thread topic would be on the ping test by Nullity (thanks for that hands on example)…
How would CoDel affect that test, given that the goal of CoDel is to maintain a 5ms buffer length and the packets in that test are essentially held for 10ms (the d parameter) due to the 0Kb bandwidth of the m1 parameter?
Would CODEL just drop all the packets? And if not, why? (Unfortunately I don't have access to a pfsense lab environment where I can run interesting tests like these.)
-
Whoops I feel like my question above may have also been irrelevant to the specific topic of this thread. (discussing decoupled bw/delay)
Perhaps a more direct question about the thread topic would be on the ping test by Nullity (thanks for that hands on example)…
How would CoDel affect that test, given that the goal of CoDel is to maintain a 5ms buffer length and the packets in that test are essentially held for 10ms (the d parameter) due to the 0Kb bandwidth of the m1 parameter?
Would CODEL just drop all the packets? And if not, why? (Unfortunately I don't have access to a pfsense lab environment where I can run interesting tests like these.)
CoDel is designed to improve delay but keep all other aspects unchanged (like bandwidth). CoDel keeps the queue size low, but at least 1 packet needs to be queued before CoDel drops other incoming packets.
Artificially adding delay with HFSC (or ipfw once CoDel & fq_codel are added: http://caia.swin.edu.au/freebsd/aqm/) might affect CoDel's queue delay calculation, or maybe not, since I don't know the implementation details. Regardless, CoDel should not cause a problem, by design, but the easiest answer is to just run a test and see. :)
I really need to set up a (virtual?) lab too…
-
CoDel has an extremely large internal buffer, but the buffering algorithm does not do a hard cut-off. When the latency gets greater than 10ms, it will drop one packet, not all packets. It will then check one interval later to see if the buffer has gotten below the threshold. If not, it drops another packet.
When playing around, I was able to get Codel to cause multi-second ping times, but you really have to game it.
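For anyone curious, here is a rough sketch of the control law as I understand it from the published CoDel algorithm (parameter names and defaults are my assumptions, and the ALTQ/pfSense implementation may differ): nothing is dropped until the queue delay has stayed above the target for a whole interval, and then it drops one packet at a time, at a gradually increasing rate, until the delay comes back down.

import math

TARGET_MS = 5.0      # acceptable standing queue delay (assumed default)
INTERVAL_MS = 100.0  # delay must stay above target this long before dropping

def codel_step(state, now_ms, sojourn_ms):
    # Called at each dequeue; returns True if this one packet should be dropped.
    if sojourn_ms < TARGET_MS:
        state.update(first_above=None, dropping=False)   # delay is fine, back off
        return False
    if state["first_above"] is None:
        state["first_above"] = now_ms
    if not state["dropping"]:
        if now_ms - state["first_above"] < INTERVAL_MS:
            return False                                  # not sustained long enough yet
        state.update(dropping=True, count=0, next_drop=now_ms)
    if now_ms < state["next_drop"]:
        return False                                      # wait for the next drop slot
    state["count"] += 1
    state["next_drop"] = now_ms + INTERVAL_MS / math.sqrt(state["count"])
    return True                                           # drop exactly one packet, never the whole queue

state = {"first_above": None, "dropping": False, "count": 0, "next_drop": 0.0}

That one-packet-per-interval behaviour is also why you can game it into multi-second ping times if you push hard enough.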
-
Advanced real-time VOIP configuration
Say you want to efficiently improve your VOIP setup. You have 7 VOIP lines and a 5Mbit upload. Each VOIP packet is 160 bytes (G.711), with an average bitrate of 64Kbit/s per line (a 160byte packet sent every 20ms). We want to improve our worst-case delay (which also improves worst-case jitter and overall call quality) from 20ms to 5ms, so we calculate the bitrate needed to achieve that:
160 bytes × 8 = 1280 bits
1280 bits × (1000ms ÷ 5ms) = 256000 bits/s
So, to send a 160 byte packet within 5ms we need to allocate 256Kbit/s. This gives us our m1 (256Kb) and our d (5).
Now we need to calculate our maximum average bitrate:
7 lines × 64Kbit/s = 448Kbit/s
Just to be safe, we allocate bandwidth for an extra line, so
8 lines × 64Kbit/s = 512Kbit/s
This gives us our m2 (512Kb).
To make sure that your m1 (the per packet delay) can always be fulfilled by your connection, make sure that the m1 (256Kb) multiplied by the maximum number of simultaneous sessions (7 or 8) is less than your maximum upload. 2048Kbit (256Kb × 8) is less than 5000Kbit, good.
Our finalized configuration is:
m1=256Kb
d=5
m2=512Kb
This configuration will guarantee a 7.4ms (5ms + MTU transmission delay) worst-case delay for each packet, with a limit of 8 simultaneous calls (512Kbit/s). You get low-delay, low-jitter calls as though you had allocated 2048Kbit/s of bandwidth, but you actually only allocated 512Kbit/s.
I may be misunderstanding the m1 value. It seems to me that with m1 set to 256Kb, a packet takes ~7.5ms to send, so if I had 8 active lines and by chance every line's 20ms packet arrived at the same time, I would see a potential 80ms (8 × 7.5) delay as the queue becomes backlogged. While not disastrous for VoIP, if this scales up I would begin to see the delays in the call. Perhaps my understanding of VoIP is lacking.
Sorry about the zombie thread…
-
I think it's simpler to just use the single-packet case. If you have one 160byte (1280bit) packet every 20ms, that's 64Kbit/s. But at 64Kb/s, HFSC is allowed to schedule that packet in any manner, as long as it completes before the next packet arrives 20ms later. This means your absolute worst case is 20ms.
If you want this packet to be sent out faster, say within 5ms, then you need "5ms" of bandwidth for scheduling reasons. That takes 256Kb of bandwidth to send a 1280bit packet in 5ms. But you don't need 256Kb average, just 256Kb of burst. So you tell HFSC: 64Kb average, 256Kb burst for 5ms.
Or something along these lines.
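Putting the single-packet case into numbers (just re-doing the arithmetic above in a couple of lines):

PACKET_BITS = 160 * 8            # one G.711 packet, 1280 bits

for rate_kbps in (64, 256):      # m2 (average) vs m1 (burst); Kbit/s == bits/ms
    ms = PACKET_BITS / rate_kbps
    print(f"at {rate_kbps} Kbit/s the packet takes {ms:.0f} ms to transmit")

That is 20ms at the 64Kb average and 5ms at the 256Kb burst, which is the whole point of the non-linear curve.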
-
Advanced real-time VOIP configuration
Say you want to efficiently improve your VOIP setup. You have 7 VOIP lines and a 5Mbit upload. Each VOIP packet is 160 bytes (G.711), with an average bitrate of 64Kbit/s per line (a 160byte packet sent every 20ms). We want to improve our worst-case delay (which also improves worst-case jitter and overall call quality) from 20ms to 5ms, so we calculate the bitrate needed to achieve that:
160 bytes × 8 = 1280 bits
1280 bits × (1000ms ÷ 5ms) = 256000 bits/s
So, to send a 160 byte packet within 5ms we need to allocate 256Kbit/s. This gives us our m1 (256Kb) and our d (5).
Now we need to calculate our maximum average bitrate:
7 lines × 64Kbit/s = 448Kbit/s
Just to be safe, we allocate bandwidth for an extra line, so
8 lines × 64Kbit/s = 512Kbit/s
This gives us our m2 (512Kb).
To make sure that your m1 (the per packet delay) can always be fulfilled by your connection, make sure that the m1 (256Kb) multiplied by the maximum number of simultaneous sessions (7 or 8) is less than your maximum upload. 2048Kbit (256Kb × 8) is less than 5000Kbit, good.
Our finalized configuration is:
m1=256Kb
d=5
m2=512Kb
This configuration will guarantee a 7.4ms (5ms + MTU transmission delay) worst-case delay for each packet, with a limit of 8 simultaneous calls (512Kbit/s). You get low-delay, low-jitter calls as though you had allocated 2048Kbit/s of bandwidth, but you actually only allocated 512Kbit/s.
I may be misunderstanding the m1 value. It seems to me that with m1 set to 256Kb, a packet takes ~7.5ms to send, so if I had 8 active lines and by chance every line's 20ms packet arrived at the same time, I would see a potential 80ms (8 × 7.5) delay as the queue becomes backlogged. While not disastrous for VoIP, if this scales up I would begin to see the delays in the call. Perhaps my understanding of VoIP is lacking.
Sorry about the zombie thread…
The scenario was for 7 lines, not 8. It could handle 8, but that'd be 100% usage… I dunno what standard operating procedure is, but allowing for no overhead seems dangerous.
There will be no problematic backlog. The scenario above can handle 7 simultaneous lines while guaranteeing a per-packet latency of ~7.5ms for every VOIP packet, because neither the internet connection nor the queue is overused at any point, even during 7 simultaneous VOIP calls.
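For anyone following along, the whole feasibility check from the example boils down to a few lines (same numbers as the post above, nothing new):

LINES       = 8         # 7 active lines plus one spare
PACKET_BITS = 160 * 8   # one G.711 packet
PACKET_MS   = 20        # each line sends one packet every 20ms
TARGET_MS   = 5         # desired worst-case per-packet delay
UPLOAD_KBPS = 5000      # the 5Mbit upload

m1 = PACKET_BITS / TARGET_MS            # 1280 / 5 = 256 Kbit/s burst
m2 = LINES * (PACKET_BITS / PACKET_MS)  # 8 x 64   = 512 Kbit/s average
print(f"m1={m1:.0f}Kb  d={TARGET_MS}  m2={m2:.0f}Kb")
print("burst feasible:", LINES * m1 <= UPLOAD_KBPS)   # 2048 <= 5000 -> True

As long as that last check holds, the link can keep up with all the lines bursting at once, which is why there is no problematic backlog.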
-
The scenario was for 7 lines, not 8. It could handle 8, but that'd be 100% usage… I dunno what standard operating procedure is, but allowing for no overhead seems dangerous.
There will be no problematic backlog. The scenario above can handle 7 simultaneous lines while guaranteeing a per-packet latency of ~7.5ms for every VOIP packet, because neither the internet connection nor the queue is overused at any point, even during 7 simultaneous VOIP calls.
Thanks for the response! It sounds like the m1 is per packet. Is that correct? So if pfSense received 2 packets within 2ms each packet would be allocated m1 and I would see a burst of 256Kb/s x 2 = 512Kb/s?
You also talk about the connection not being overused. If I have a 5Mb/s upload (rated for 6Mb/s), 4Mb/s for linkshare in other queues (4Mb/s upper limit), and 1Mb/s for the qVoIP, I should never have to worry about oversaturation.
Note: Not exact numbers but the same idea.
-
Thanks for the response! It sounds like the m1 is per packet. Is that correct? So if pfSense received 2 packets within 2ms each packet would be allocated m1 and I would see a burst of 256Kb/s x 2 = 512Kb/s?
Yeah, exactly. As I understand it, m1/d are per packet. If m1/d were per queue, then the per-packet latency would be dependent on how many packets were queued, which would make latency unpredictable and practically break HFSC's latency guarantees.
You also talk about the connection not being overused. If I have a 5Mb/s upload (rated for 6Mb/s), 4Mb/s for linkshare in other queues (4Mb/s upper limit), and 1Mb/s for the qVoIP, I should never have to worry about oversaturation.
Note: Not exact numbers but the same idea.
There are a number of ways to do it, but that setup seems fine. You should stress-test it to confirm proper functionality regardless.
-
It's kind of both. The scheduling is done at the queue level, but it happens in discrete steps of packets. The service curve is effectively the priority: whichever curve is "greater" at any given moment is scheduled next. The m1 and d modify the curve to make it look greater early on. If sending the packet would push the queue beyond its limit, it won't send the packet yet. Of course, if there is spare bandwidth and no other packets, it'll just send the packet anyway.
I just think of HFSC as a dynamic priority based queue where the priority is adjusted in real-time per packet processed, such that all queue configurations are respected. Of course my simplification breaks down in extreme situations, like absurdly low bandwidths, absurdly large MTUs, or absurdly low target latencies, but it's a good rule of thumb.
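If it helps to see the "curve" part of HFSC concretely, here is a small sketch of the two-piece service curve idea (my own simplification; real HFSC tracks per-session virtual times and resets curves between backlogged periods, which this ignores). A concave m1/d/m2 curve front-loads the deadline for the first bits of a backlog, which is exactly the decoupling trick discussed in this thread:

def deadline_ms(cum_bits, m1_kbps, d_ms, m2_kbps):
    # Invert the two-piece curve S(t) = m1*t for t <= d, else m1*d + m2*(t - d):
    # the deadline is the earliest time by which cum_bits must have been served.
    burst_bits = m1_kbps * d_ms           # bits the m1 segment allows
    if cum_bits <= burst_bits:
        return cum_bits / m1_kbps         # served at the burst rate
    return d_ms + (cum_bits - burst_bits) / m2_kbps

print(deadline_ms(1280, 256, 5, 64))   # first 1280-bit packet: 5.0 ms
print(deadline_ms(2560, 256, 5, 64))   # a second back-to-back packet: 25.0 ms

The first packet of a burst gets the aggressive 5ms deadline, but sustained traffic is still held to the 64Kb average, i.e. low delay without extra long-term bandwidth.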