Traffic Shaping broke my LAN - topic has deviated
-
The only time the traffic-shaper has broken my LAN is when I would forget to change the LAN queue bandwidth limit from (K?)bit/s to Mbit/Gbit… self-inflicted DoS. :)
For me, it actually affected other VLANs on the same physical interface.
-
I just had it happen again when I tried this setup:
LAN - HFSC 95Mbit
--pInternet: Upper Limit 95Mb
----qHigh: LS: 49.42% 5ms 30.55%, Queue: 4096 + Codel
----qNormal: LS: 30.55%, Queue: 4096 + Codel
----qDefault: LS: 15.27%, Queue: 4096 + Codel
----qACK: LS: 0.01% RT 20%
After I hit save, I lost LAN access.
This is my current setup, and it works just fine:
LAN - HFSC 95Mbit
--qHigh: LS: 49.42% 5ms 30.55%, Queue: 4096 + Codel
--qNormal: LS: 30.55%, Queue: 4096 + Codel
--qDefault: LS: 15.27%, Queue: 4096 + Codel
--qACK: LS: 0.01% RT 20%, Queue: 4096 FIFO
I would like to do more testing, but my wife has hit her limit.
My next plan is to back up my config and do a clean install some time after 2.2.1. I also plan to repartition my SSDs to only be 20GB instead of the full 120GB, so that GEOM doesn't rewrite the full 120GB every time it gets mad.
In case someone is wondering, I'm using the 80:20 rule, such that qHigh and qNormal are 40% each and qDefault is the 20%. I then scaled them down so qHigh can have a 1.618x (golden ratio) burst for 5ms, which is about 11,000 bytes' worth. Plenty for a short burst of small packets.
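Spelled out in rough Python (the golden-ratio multiplier and 5ms figure are from above; the exact rounding is mine):

```python
# The posted link-share values, as percent of the 95 Mbit root.
m2 = {"qHigh": 30.55, "qNormal": 30.55, "qDefault": 15.27, "qACK": 0.01}

# qHigh's m1 is its m2 scaled by the golden ratio:
m1_high = m2["qHigh"] * 1.618
print(m1_high)                 # ~49.43 -- posted as 49.42%

# Bytes the extra 5 ms of bandwidth is worth on a 95 Mbit link:
link_bps = 95_000_000
extra_bps = (m1_high - m2["qHigh"]) / 100 * link_bps
print(extra_bps * 0.005 / 8)   # ~11,210 bytes -- "about 11,000"
```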
-
The only strange thing I see is the overly precise percentages.
Maybe try whole numbers for the percentages?
-
In case someone is wondering, I'm using the 80:20 rule, such that qHigh and qNormal are 40% each and qDefault is the 20%. I then scaled them down so qHigh can have a 1.618x (golden ratio) burst for 5ms, which is about 11,000 bytes' worth. Plenty for a short burst of small packets.
If you are only using HFSC for its limiting properties (since it cannot prioritize incoming packets that have already arrived), why not switch to limiters?
The burst for qHigh is unneeded. You will only see imperceptible, sub-millisecond improvements. (~50Mbit vs ~30Mbit)
Isn't a 5ms burst of ~50Mbit equal to ~31,250 bytes, not 11,000 bytes?
Edit: I think the burst is copied to every stream in the queue (guaranteeing each stream's delay), but the m2 is applied to the entire queue's average bandwidth. This is the only way it makes sense to me. If the m1/d params were applied to the entire queue's delay, then delay would be dependent on the number of streams, which would preclude any guarantee of delay.
I mention that because the byte-size calculated from m1 and d should never be larger than the MTU, per my previous paragraph. I am still reading the papers referenced by the HFSC paper to find a definitive answer.
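For reference, the arithmetic behind my question (my numbers; I take m1 as 49.42% of the 95Mbit root):

```python
# If a burst ran at the full m1 rate for the whole 5 ms window:
m1_bps = 0.4942 * 95_000_000      # qHigh's m1, ~47 Mbit/s ("~50Mbit")
print(m1_bps * 0.005 / 8)         # ~29,343 bytes
print(50_000_000 * 0.005 / 8)     # = 31,250 bytes at a flat 50 Mbit/s
```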
-
I updated my posts; those percentages were in reference to link-share, and only the one upper limit is used, on the parent queue. My plan was to make the root 1Gb again.
"since it cannot prioritize incoming packets that have already arrived" - Actually, the only packets you can prioritize are ones that have already arrived. I want HFSC to artificially delay my bursty YouTube and Netflix streams and let my High priority stuff through.
The 5ms burst would be on the difference between the nominal and burst rates, so (0.4942 − 0.3055) × 95,000,000 × 0.005 ÷ 8 = 11,204.0625 bytes, which is almost 20× the average packet size (576 bytes). HFSC does everything on a curve, so I'm not sure if burst works exactly that way, but I figure it's close.
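In code, sizing the burst from the difference between the two rates:

```python
# Burst sized from the *difference* between m1 and m2, not m1 alone:
link_bps = 95_000_000
extra_bps = (0.4942 - 0.3055) * link_bps   # bandwidth above the nominal share
burst_bytes = extra_bps * 0.005 / 8        # 5 ms worth, in bytes
print(burst_bytes)                         # ~11,204.06
print(burst_bytes / 576)                   # ~19.5 average-sized packets
```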
-
"since it cannot prioritize incoming packets that have already arrived" - Actually, the only packets you can prioritize are ones that have already arrived. I want HFSC to artificially delay my bursty YouTube and Netflix streams and let my High priority stuff through.
~~Incoming WAN traffic will be shaped when leaving the LAN. A 100Mbit WAN link cannot saturate your 1Gbit LAN. Since QoS only serves a purpose on a congested link, QoS is useless on your LAN for incoming WAN traffic.
The only thing HFSC is accomplishing is rate limiting.
You could create limiters and allocate 40/40/20 of your incoming bandwidth and it would accomplish practically the same thing.~~
Edit: Disregard all this. I do not fully understand how incoming traffic-shaping works.
-
The 5ms burst would be on the difference between the nominal and burst rates, so (0.4942 − 0.3055) × 95,000,000 × 0.005 ÷ 8 = 11,204.0625 bytes, which is almost 20× the average packet size (576 bytes). HFSC does everything on a curve, so I'm not sure if burst works exactly that way, but I figure it's close.
HFSC's m1 and d parameters achieve the same goal as the d-max and u-max parameters, which are the parameters used in the HFSC paper. The u-max parameter is the largest unit of work (aka packet size) and d-max is the guaranteed delay (aka milliseconds to fully transmit).
The paper mentions an example for an audio stream with 160-byte packets:
u-max = 160 bytes
d-max = 5 ms
r = 64 Kbps (r is m2)
Translating this to m1/d/m2 notation (160 bytes in 5 ms is 256 Kbps):
m1 = 256 Kbps
d = 5 ms
m2 = 64 Kbps
My point is, m1/d seems to be meant for a single packet-size only, not multiple packets. Bursting 30Kbyte seems unneeded.
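A small helper makes the translation mechanical (my own sketch; the function name is made up):

```python
def to_m1_d_m2(u_max_bytes, d_max_ms, r_bps):
    """Translate the paper's (u-max, d-max, r) into (m1, d, m2): m1 is
    the rate needed to move u-max bytes within d-max ms; m2 is just r."""
    m1_bps = u_max_bytes * 8 / (d_max_ms / 1000)
    return m1_bps, d_max_ms, r_bps

# The paper's audio example: 160-byte packets, 5 ms delay, 64 Kbps.
print(to_m1_d_m2(160, 5, 64_000))  # (256000.0, 5, 64000) -> 256Kbps/5ms/64Kbps
```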
Be cautious with HFSC. Misconfigurations are rampant.
-
"since it cannot prioritize incoming packets that have already arrived" - Actually, the only packets you can prioritize are ones that have already arrived. I want HFSC to artificially delay my bursty YouTube and Netflix streams and let my High priority stuff through.
Incoming WAN traffic will be shaped when leaving the LAN. A 100Mbit WAN link cannot saturate your 1Gbit LAN. Since QoS only serves a purpose on a congested link, QoS is useless on your LAN for incoming WAN traffic.
The only thing HFSC is accomplishing is rate limiting.
You could create limiters and allocate 40/40/20 of your incoming bandwidth and it would accomplish practically the same thing.
My ISP actually bursts 1Gb/s for tens of milliseconds regularly. This was more of an issue before my ISP did AQM, but it can cause packet-loss issues if I let the burst through. Google sets the receive window too large if I let the burst through at full rate.
The 5ms burst would be on the difference between the nominal and burst rates, so (0.4942 − 0.3055) × 95,000,000 × 0.005 ÷ 8 = 11,204.0625 bytes, which is almost 20× the average packet size (576 bytes). HFSC does everything on a curve, so I'm not sure if burst works exactly that way, but I figure it's close.
HFSC's m1 and d parameters achieve the same goal as the d-max and u-max parameters, which are the parameters used in the HFSC paper. The u-max parameter is the largest unit of work (aka packet size) and d-max is the guaranteed delay (aka milliseconds to fully transmit).
The paper mentions an example for an audio stream with 160-byte packets:
u-max = 160 bytes
d-max = 5 ms
r = 64 Kbps (r is m2)
Translating this to m1/d/m2 notation (160 bytes in 5 ms is 256 Kbps):
m1 = 256 Kbps
d = 5 ms
m2 = 64 Kbps
My point is, m1/d seems to be meant for a single packet-size only, not multiple packets. Bursting 30Kbyte seems unneeded.
Be cautious with HFSC. Misconfigurations are rampant.
You can also use the burst to mimic speedboost. My old ISP used to give a 100Mb "boost" for about 8 seconds, then return to 30Mb. That would be similar to a 100Mb 8000ms 30Mb HFSC setting.
A "burst" is really just a temporary change in ratios. It is more useful for smaller that tend to have strong bursting characteristics.
-
You can also use the burst to mimic speedboost. My old ISP used to give a 100Mb "boost" for about 8 seconds, then return to 30Mb. That would be similar to a 100Mb 8000ms 30Mb HFSC setting.
A "burst" is really just a temporary change in ratios. It is more useful for smaller flows that tend to have strong bursting characteristics.
The HFSC paper never directly mentions using m1/d this way. Have you verified that HFSC can be used this way?
It might be a better idea to use upper-limit to achieve a "speedboost". (Though remember, HFSC created the 2-part service curve to explicitly decouple bandwidth and delay, not to achieve a "speedboost". Your 8-second speedboost might be possible, but it may also be a misuse that will cause unforeseen problems.)
My criticism is an attempt to make sure you are configuring HFSC cautiously and avoiding potentially breaking HFSC/traffic-shaping by misconfiguring it.
-
m1 is a bandwidth setting, exactly the same as m2. Sum(max(m1, m2)) across the child queues cannot be greater than 100% of the parent queue.
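Checking my link-share percentages against that rule (a quick sketch; I assume the rule applies per curve type, so qACK's RT 20% doesn't count against link-share):

```python
# Link-share (m1, m2) pairs from my config, as percent of the root;
# m1 = 0 where no burst is configured.
queues = {
    "qHigh":    (49.42, 30.55),
    "qNormal":  (0.00,  30.55),
    "qDefault": (0.00,  15.27),
    "qACK":     (0.00,   0.01),
}
total = sum(max(pair) for pair in queues.values())
print(total, "<= 100:", total <= 100)   # ~95.25 <= 100: True
```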
-
m1 is a bandwidth setting, exactly the same as m2. Sum(max(m1, m2)) across the child queues cannot be greater than 100% of the parent queue.
Yeah, they are both bandwidths, but m1 and d define delay while m2 defines average bitrate. Perhaps they can be used in other ways, but that may cause undefined results. I would configure HFSC conservatively and not make guesses.
Since HFSC is primarily meant to guarantee per-packet delay, the calculated byte-size of "m1 × (d ÷ 1000) ÷ 8" should not greatly exceed the interface's MTU. (I hope I got that math right.)
I am still unclear how m1/d interact with link-share, since link-share makes no guarantees about bandwidth or delay. Like I mentioned earlier, if you must attempt the "speedboost", maybe use upper-limit by setting m1=50Mb, d=5, and m2=30Mb.
Maybe switch to a more conservative configuration and try setting just m2, and see if it fixes your broken LAN?
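Here is that rule of thumb as a quick script (my own sketch; note the division by 8 to get bytes, and the 1500-byte MTU is just an assumed typical value):

```python
def m1_d_bytes(m1_bps, d_ms):
    """Bytes m1 can move within d: m1 * (d / 1000) / 8."""
    return m1_bps * (d_ms / 1000) / 8

MTU = 1500  # assumed typical Ethernet MTU

for m1_bps, d_ms in [(256_000, 5), (50_000_000, 5)]:
    b = m1_d_bytes(m1_bps, d_ms)
    print(m1_bps, "bps over", d_ms, "ms ->", b, "bytes",
          "(exceeds MTU)" if b > MTU else "(fits in one packet)")
# 256 Kbps over 5 ms -> 160 bytes; 50 Mbps over 5 ms -> 31,250 bytes
```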
-
I would like to do more testing, but with one machine, I can't afford to let the wife get angry. I'm leaving it alone for now.
I try not to care what HFSC does to individual packets; I only really care what it does on average. If I set a burst from 30% to 50% for 5ms, to me that just means that for 5ms the ratio of bandwidth between queues will be different. Bursts are just a way to temporarily modify bandwidth with little change to the overall average.
-
I would like to do more testing, but with one machine, I can't afford to let the wife get angry. I'm leaving it alone for now.
I try not to care what HFSC does to individual packets; I only really care what it does on average. If I set a burst from 30% to 50% for 5ms, to me that just means that for 5ms the ratio of bandwidth between queues will be different. Bursts are just a way to temporarily modify bandwidth with little change to the overall average.
Yeah… I use my internet gateway for testing purposes too. :(
You may be right about using bursts as speedboosts (increased average bitrate for all packets transmitted during a 5ms time-span, for example), but since the HFSC paper never explicitly uses it like that (m1/d is only used to modify delay, while m2 is the average), I think we need to do some testing to prove/disprove your theory.
It would be great if HFSC's m1/d could be used to modify the initial average bitrate of a connection/session, but I am obviously skeptical.
A quick thought experiment... your theory seems impossible for UDP, since all individual UDP packets are their own individual transmission sessions (connectionless). If you employed m1/d like in your speedboost theory, all UDP packets could send at 50Mbit for the first 5ms but must stay below a 30Mbit average. Since each individual UDP packet is a new session, and each new session gets an initial 50Mbit "speedboost" for 5ms, your average UDP bitrate could reach 50Mbit, because all UDP packets are transmitted during the "speedboost". The average bitrate could stay above the m2-configured average of 30Mbit forever. Your "speedboost" theory breaks HFSC by rendering m2 non-functional.
The only way m1 and d function without breaking UDP sessions with HFSC is if m1 and d only modify individual packet transmission delay, and not the initial average session bitrate like in your "speedboost" theory.
I am pretty sure this applies to TCP as well, but I am not sure. The HFSC paper focuses on generic packet delay, never mentioning "TCP", and only mentions "UDP" once, during a simulation of link-share's bandwidth-sharing capabilities.
It seems like HFSC's m1/d params only control the transmission delay of generic packets/frames, and nothing else.
Any criticisms are welcome. These conversations help me understand HFSC.
-
UDP " stateless" as in the state must be handled by the application, not the protocol, but even PFSense has a notion of a UDP "State". UDP doesn't "burst". The most common usage of UDP is a flow of packets that are sent at standard intervals or event based. Most UDP flows do not "burst" in the sense of TCP trying to send data, UDP just blindly sends data with no concern for congestion.
When UDP "bursts", it does so only in the sense that many packets so happened to arrived at the "same time", in some sense of the term "same time'.
And HFSC doesn't burst based on new sessions, it bursts based on the current usage of the queue. If a queue has 30Mb allocated, but only 15Mb is being used, but suddenly a bunch of traffic arrives "at the same time", HFSC will temporarily allow a higher amount of bandwidth through that is above the 30Mb. I think the burst is inverse related to the current usage of the queue, such that if the queue is at 100% usage, the burst will be 0% of the burst amount, but if you're at 50% usage, the amount of burst will be 50% of the assigned amount. Without knowing the exactness of how the burst works, I cannot know how it will function at the micro time scale.
The best way to think about HFSC is not in bandwidth, but in ratios. Actual packet scheduling is based on the current ratios and packet ordering is not part of the algorithm. HFSC decouples bandwidth and latency such that if all queues are at 100% utilization, regardless of the amount of bandwidth assigned, they will all experience the same latency. While latency for all queues are equal, the ordering of the packets are not guaranteed, unless you use the priority option, but from the sounds of it, it makes little difference.
-
Connectionless != Stateless.
-
I read a little about burst, and it seems it's more "bucket"-like, in that "credit" for the burst accumulates in some relation to the lack of usage of the entire allocated bandwidth. So 50% usage would not mean you can only use 50% of the burst, but the rate at which credit for the burst accumulates may be reduced by 50%. This rate may or may not be linear and in direct relation to the volume of the burst.
So I don't know if this example would be correct: If you have a 10Mb burst for 10ms, once you've used the burst, you need to not use 10Mb for another 10ms in order to get the full burst again.
It sounds more like you save bandwidth now in order to burst later, rather than use a burst now and get penalized in the future. My guess is it's volume-based and applied over a smooth curve, as HFSC tends to have a smoothing effect. So my current guess is that if you have a 10Mb burst for 10ms, but you have only "saved" 5Mb of credit since the last burst, you will get that 5Mb spread smoothly over the same 10ms burst.
From the information I have seen, "burst" just changes the allocated bandwidth but does not affect when a packet leaves. It sounds the same as temporarily changing the m2 to the m1 for a duration.
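That "credit" description is essentially a token bucket. A minimal sketch of that classic mechanism (it illustrates the bucket idea only; I am not claiming this is HFSC's internal algorithm):

```python
class TokenBucket:
    """Classic token bucket: credit accrues at `rate` bytes/sec while
    idle, capped at `burst` bytes; a packet sends only if enough
    credit has accumulated."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0

    def try_send(self, nbytes, now):
        # Accrue credit for the time elapsed since the last check.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

# 10 Mbit/s average with a 12,500-byte (10 Mbit over 10 ms) burst.
tb = TokenBucket(rate=10e6 / 8, burst=12_500)
print(tb.try_send(12_500, now=0.000))  # True: full credit at start
print(tb.try_send(12_500, now=0.005))  # False: only ~6,250 rebuilt
print(tb.try_send(12_500, now=0.010))  # True: credit fully rebuilt
```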
-
UDP " stateless" as in the state must be handled by the application, not the protocol, but even PFSense has a notion of a UDP "State". UDP doesn't "burst". The most common usage of UDP is a flow of packets that are sent at standard intervals or event based. Most UDP flows do not "burst" in the sense of TCP trying to send data, UDP just blindly sends data with no concern for congestion.
When UDP "bursts", it does so only in the sense that many packets so happened to arrived at the "same time", in some sense of the term "same time'.
And HFSC doesn't burst based on new sessions, it bursts based on the current usage of the queue. If a queue has 30Mb allocated, but only 15Mb is being used, but suddenly a bunch of traffic arrives "at the same time", HFSC will temporarily allow a higher amount of bandwidth through that is above the 30Mb. I think the burst is inverse related to the current usage of the queue, such that if the queue is at 100% usage, the burst will be 0% of the burst amount, but if you're at 50% usage, the amount of burst will be 50% of the assigned amount. Without knowing the exactness of how the burst works, I cannot know how it will function at the micro time scale.
The best way to think about HFSC is not in bandwidth, but in ratios. Actual packet scheduling is based on the current ratios and packet ordering is not part of the algorithm. HFSC decouples bandwidth and latency such that if all queues are at 100% utilization, regardless of the amount of bandwidth assigned, they will all experience the same latency. While latency for all queues are equal, the ordering of the packets are not guaranteed, unless you use the priority option, but from the sounds of it, it makes little difference.
I think it is important to note that the HFSC authors never referred to m1/d as a "burst". They only called it one thing: a way of creating a 2-part service curve to decouple bandwidth and delay allocation. I am not sure where the confusing "burst" explanation came from.
Here is a quote about priority from the first sentence of the paper:
In this paper, we study hierarchical resource management models and algorithms that support both link-sharing and guaranteed realtime services with priority (decoupled delay and bandwidth allocation).
They say that because m1/d control delay (not "speedboost"), which decides the ordering of the packets. The service curve with earliest deadline (lowest delay, without getting overly technical) will be sent first, as mentioned below.
HFSC does have an algorithm for packet ordering; it is based on SCED (Service Curve Earliest Deadline first). See the following paper for its origins: http://www-afs.secure-endpoints.com/afs/ece/u/anoopr/OldFiles/742/Scheduling%20for%20quality%20of%20service%20guarantees%20via%20service%20curves.pdf
I think HFSC modified it to support both real and virtual times, but I am not completely sure… still trying to absorb a couple dozen academic papers. :-\
Only link-share works with ratios, because it uses virtual time. Real-time, obviously, works with real time, and that is why it can guarantee delay and bandwidth when link-share does not.
-
I read a little about burst, and it seems it's more "bucket"-like, in that "credit" for the burst accumulates in some relation to the lack of usage of the entire allocated bandwidth. So 50% usage would not mean you can only use 50% of the burst, but the rate at which credit for the burst accumulates may be reduced by 50%. This rate may or may not be linear and in direct relation to the volume of the burst.
So I don't know if this example would be correct: If you have a 10Mb burst for 10ms, once you've used the burst, you need to not use 10Mb for another 10ms in order to get the full burst again.
It sounds more like you save bandwidth now in order to burst later, rather than use a burst now and get penalized in the future. My guess is it's volume-based and applied over a smooth curve, as HFSC tends to have a smoothing effect. So my current guess is that if you have a 10Mb burst for 10ms, but you have only "saved" 5Mb of credit since the last burst, you will get that 5Mb spread smoothly over the same 10ms burst.
From the information I have seen, "burst" just changes the allocated bandwidth but does not affect when a packet leaves. It sounds the same as temporarily changing the m2 to the m1 for a duration.
"Burst" is a misnomer. m1/d only control a packet's delay. A token bucket is something different which can actually achieve a "burst" as you perceive it.
Expanding on your HFSC example, if
m1 = 10Mb
d = 10ms
m2 = 5Mb
then we could translate this to u-max/d-max:
u-max = 12,500 bytes [ m1 × (d ÷ 1000) ÷ 8 = u-max ]
d-max = 10ms
r = 5Mb (aka m2)
This means a packet/frame with a maximum size of 12,500 bytes (wayyy oversized) is guaranteed to be transmitted within 10ms, but the average bitrate cannot go over 5Mb. Once the packet is "burst", no other packets can be sent until the long-term average has fallen low enough that another u-max-sized packet can be transmitted at "burst" speed without raising the long-term average above the m2 bitrate. (I am unsure how long "long-term" is.)
That is how the HFSC paper explains it.
We should probably stop calling m1/d a "burst" because it is, at best, an over-simplification, and more likely plain wrong because m1 can be set to 0. If m1=0, there is no "burst", only added delay.
-
I've read similar examples about burst. So it sounds like "bursting" allows you to acquire a certain amount of "bandwidth debt", up to the amount set in the burst. Using a 1500-byte packet on a 400Kb link is kind of an extreme example, but this hypothetical gets used many times. In that case, the 1500-byte packet cannot be dequeued until enough "bandwidth credits" have accumulated, but a burst value can be set which lets the queue take on some "bandwidth debt", allowing it to dequeue much sooner.
This situation is a bit unnatural, except for people with slow links, where the MTU is extremely large relative to the bandwidth, plus the example uses a 1500-byte packet and claims it to be VoIP. 1500 bytes on a 400Kb link is 30ms, and that's ignoring how much bandwidth the example queue is given. In a more real-world example, you have a 1Gb link and 200-byte VoIP packets, which gives you 0.0016ms. Even if you set aside 10Mb for the VoIP queue, you're still talking about an average 0.16ms queue time at saturation.
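Spelling out that serialization arithmetic:

```python
def serialization_ms(packet_bytes, link_bps):
    """Milliseconds to clock one packet onto the wire."""
    return packet_bytes * 8 / link_bps * 1000

print(serialization_ms(1500, 400_000))       # 30.0   -- 1500B on a 400Kb link
print(serialization_ms(200, 1_000_000_000))  # 0.0016 -- 200B VoIP on 1Gb
print(serialization_ms(200, 10_000_000))     # 0.16   -- 200B in a 10Mb queue
```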
My argument is that while thinking about individual large packets on slow links is important for showing how the algorithm works, in the real world you're more likely to be thinking about statistical averages when configuring burst values. If I had a 10Gb link and needed to set aside 2Gb for VoIP, I wouldn't be trying to figure out my burst value for a 0.0008ms duration for a 200-byte packet; I'd be more concerned about what is going on at a more macro level, in the 1ms+ range, relative to an 800ns packet delay. What is the threshold for VoIP delay before it becomes perceptible? 5ms? 10ms? 20ms? Those are the bursts I would be focusing on, or some fraction of them. Maybe the optimal burst length is 50% of your target delay: if I want VoIP dequeued before a 10ms delay, maybe I want a 5ms burst target.
In the end, it all comes down to bandwidth management. Assuming HFSC does a good job interleaving packets in a fine-grained fashion, all you really need to do is ask "do I have enough bandwidth to handle this traffic?". The general rule is that below 80% saturation, delay is not an issue. If I want to be able to handle 8Mb/s of VoIP, I really just need a bare minimum of 10Mb or more set aside. Given enough traffic flows, you don't get bursts of data; utilization becomes smooth and predictable. When you don't have enough bandwidth, and bursty traffic comprises a substantial portion of it, you'll need to know the "shape" of the traffic you're trying to work with.
Individual flows or groups of flows have shapes when there is no congestion. Those are the shapes you want HFSC to mimic when controlling delay. The other elephant in the room is that controlling your delay via bursts will always come at the expense of other queues. If VoIP is so important, why not just give that queue enough bandwidth in the first place? If you can't give it enough bandwidth, then no matter what you do, you will get delays, because you can't shove 10Mb of data down a 5Mb pipe without something giving.
Bursts seem more useful when someone has a highly specialized situation with a very specific traffic pattern, or is working with incredibly slow bandwidth rates where the packet size is large relative to the bandwidth.
-
…
If VoIP is so important, why not just give that queue enough bandwidth in the first place? If you can't give it enough bandwidth, then no matter what you do, you will get delays, because you can't shove 10Mb of data down a 5Mb pipe without something giving.
Bursts seem more useful when someone has a highly specialized situation with a very specific traffic pattern, or is working with incredibly slow bandwidth rates where the packet size is large relative to the bandwidth.
HFSC does not "burst" in the way you think it does. Browse through the HFSC paper. It is only 16 pages, iirc.
VoIP is one of the major reasons why HFSC is useful. Using the example from the HFSC paper, you can guarantee a 64Kb audio session (a 160-byte packet sent every 20ms) a delay of 5ms (160 bytes @ 256Kb) instead of 20ms (160 bytes @ 64Kb). I showed the u-max/d-max and m1/d/m2 notation of this exact setup in one of my previous posts.
You give the VoIP session the delay of a 256Kb connection (5ms) while only allocating an average bitrate of 64Kb, leaving more bandwidth available for applications that are not delay-sensitive. VoIP is common today; I would not call it a "highly specialized situation".
Anyway, I doubt an HFSC misconfiguration could cause your problem, but removing all m1/d entries and using only m2 might be worth trying.
Hell if I know… :)