Traffic Shaping broke my LAN - topic has deviated
-
It seemed to work until I tried poking around in PFSense's web interface, then things would break: no response on the LAN port. I finally logged in after a reboot and removed all traffic shaping on the LAN, and I can now view the RRD without crashing my LAN port. Dear lord!
edit: I placed just a default queue and enabled codel for the time being for my LAN. My WAN's shaping seems to be just fine.
-
It seemed to work until I tried poking around in PFSense's web interface, then things would break, no response on the LAN port
What kind of poking? Are you repeatedly adding and removing shapers?
-
Just changing to pages other than the dashboard: RRD, System Logs, Traffic Shaper, etc. I was not making any changes during this time.
edit: Clicking on the RRD was a near instant loss of connection, but going to the Traffic Shaping gave me a few seconds before completely dying. I used this small window later on to remove all shaping on the LAN interface.
-
I would take a copy of the current config then roll back to the backup/snapshot I took before messing with my shaper settings.
Then maybe diff the working config with the new-and-improved config to see where I screwed it up.
But it sounds like you might have some other coincidental failure of some kind.
-
The only time the traffic-shaper has broken my LAN is when I would forget to change the LAN queue bandwidth limit from (k?)bit/sec to mbit/gbit… self-inflicted DoS. :)
-
The only time the traffic-shaper has broken my LAN is when I would forget to change the LAN queue bandwidth limit from (k?)bit/sec to mbit/gbit… self-inflicted DoS. :)
For me, it actually affected other VLANs on the same physical interface.
-
I just had it happen again when I tried this setup
LAN - HFSC 95Mbit
–pInternet: Upper Limit 95Mb
----qHigh: LS: 49.42% 5ms 30.55% Queue 4069+Codel
----qNormal: LS: 30.55% Queue 4069+Codel
----qDefault: LS: 15.27% Queue 4069+Codel
----qACK: LS 0.01% RT 20%
After I hit save, I lost LAN access.
This is my current setup and it works just fine
LAN - HFSC 95Mbit
--qHigh: LS: 49.42% 5ms 30.55% Queue 4069+Codel
--qNormal: LS: 30.55% Queue 4069+Codel
--qDefault: LS: 15.27% Queue 4069+Codel
--qACK: LS 0.01% RT 20% Queue: 4096 FIFO
I would like to do more testing, but my wife has hit her limit.
My next plan is to backup my config and do a clean install, some time after 2.2.1. I also plan to repartition my SSDs to only be 20GB in size instead of the full 120GB, that way GEOM doesn't re-write the full 120GB every time it gets mad.
In case someone is wondering, I'm using the 80:20 rule, such that High and normal are 40% each and default is the 20%. But then I scaled them down so qHigh can have a 1.618x burst for 5ms, golden ratio, which is about 11,000 bytes worth. Plenty for a short burst of small packets.
-
The only strange thing I see is the overly precise percentages.
Maybe try whole numbers for the percentages?
-
In case someone is wondering, I'm using the 80:20 rule, such that High and normal are 40% each and default is the 20%. But then I scaled them down so qHigh can have a 1.618x burst for 5ms, golden ratio, which is about 11,000 bytes worth. Plenty for a short burst of small packets.
If you are only using HFSC for its limiting properties (since it cannot prioritize incoming packets that have already arrived), why not switch to limiters?
The burst for qHigh is unneeded. You will only see imperceptible, sub-millisecond improvements. (~50Mbit vs ~30Mbit)
Isn't a 5ms burst of ~50Mbit equal to ~31,250 bytes, not 11,000 bytes?
Edit: I think the burst is copied to every stream in the queue (guaranteeing each stream's delay), but the m2 is applied to the entire queue's average bandwidth. This is the only way it makes sense to me. If the m1/d params were applied to the entire queue's delay, then delay would be dependent on the number of streams, which would preclude any guarantee of delay.
I mention that because m1 and d should never be calculated to be larger than the MTU, according to my previous paragraph. I am still reading the papers referenced in the HFSC paper to find a definitive answer.
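To make the ~31,250-byte figure above concrete, here is a quick sketch of the arithmetic (assuming the burst means the full m1 rate sustained for the whole d window):

```python
# Quick arithmetic behind the ~31,250-byte reading above:
# bytes moved at a given bitrate over a given duration.

def burst_bytes(rate_bps, duration_s):
    """Bytes transferable at rate_bps (bits/sec) over duration_s seconds."""
    return rate_bps * duration_s / 8

# ~50 Mbit sustained for the full 5 ms window:
print(burst_bytes(50_000_000, 0.005))  # 31250.0 bytes
```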
-
I updated my posts; those percentages were in reference to linkshare, and only the one upper bound is used by the parent queue. My plan was to make the root 1Gb again.
"since it cannot prioritize incoming packets that have already arrived" - Actually, the only packets you can prioritize are ones that have already arrived. I want HFSC to artificially delay my bursty YouTube and Netflix streams and let my High priority stuff through.
The 5ms burst would be on the difference between the nominal and burst rates, so (0.4942 - 0.3055) × 95,000,000 × 0.005 / 8 = 11,204.0625 bytes, which is almost 20× the average packet size (576 bytes). HFSC does everything on a curve, so I'm not sure if burst works exactly that way, but I figure it's close.
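Working that formula through as a quick sketch (the shares, 95Mbit root, and 576-byte average packet are taken from the post above; the golden-ratio check is the relation described earlier in the thread):

```python
# Checking the burst math above; a sketch of the stated formula, not pfSense code.

link_bps = 95_000_000            # 95 Mbit root queue
m1_share, m2_share = 0.4942, 0.3055
avg_packet = 576                 # bytes

# Extra bytes available during the 5 ms burst window (delta rate only):
extra_bytes = (m1_share - m2_share) * link_bps * 0.005 / 8
print(extra_bytes)               # ~11204.06 bytes

# ...which is roughly 19.5 average-sized packets:
print(extra_bytes / avg_packet)

# And the burst share is the nominal share scaled by the golden ratio:
print(m2_share * 1.618)          # ~0.4943, close to the 49.42% used above
```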
-
"since it cannot prioritize incoming packets that have already arrived" - Actually, the only packets you can prioritize are ones that have already arrived. I want HFSC to artificially delay my bursty YouTube and Netflix streams and let my High priority stuff through.
~~Incoming WAN traffic will be shaped when leaving the LAN. A 100Mbit WAN link cannot saturate your 1Gbit LAN. Since QoS only serves a purpose on a congested link, QoS is useless on your LAN for incoming WAN traffic.
The only thing HFSC is accomplishing is rate limiting.
You could create limiters and allocate 40/40/20 of your incoming bandwidth and it would accomplish practically the same thing.~~
Edit: Disregard all this. I do not fully understand how incoming traffic-shaping works.
-
The 5ms burst would be on the difference between the nominal and burst rates, so (0.4942 - 0.3055) × 95,000,000 × 0.005 / 8 = 11,204.0625 bytes, which is almost 20× the average packet size (576 bytes). HFSC does everything on a curve, so I'm not sure if burst works exactly that way, but I figure it's close.
HFSC's m1 and d parameters achieve the same goal as the d-max and u-max parameters, which are the parameters used in the HFSC paper. The u-max parameter is the largest unit of work (aka packet size) and d-max is the guaranteed delay (aka milliseconds to fully transmit).
The paper mentions an example for an audio stream with 160byte packets;
u-max=160 bytes
d-max=5 ms
r=64 Kbps (r is m2)
Translating this to m1/d/m2 notation gives (160 bytes in 5 ms is 256 Kbps):
m1=256 Kbps
d=5 ms
m2=64 Kbps
My point is, m1/d seems to be single packet-size only, not multiple. Bursting 30Kbyte seems unneeded.
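The translation above can be sketched as a one-liner (assumption: m1 is simply the rate needed to move u-max bytes within d-max milliseconds):

```python
# Sketch of the u-max/d-max -> m1 translation from the HFSC paper's
# audio-stream example: m1 is the rate that moves u_max bytes in d_max ms.

def umax_dmax_to_m1_bps(u_max_bytes, d_max_ms):
    return u_max_bytes * 8 * 1000 / d_max_ms  # bits per second

# 160-byte packets with a 5 ms guaranteed delay:
print(umax_dmax_to_m1_bps(160, 5))  # 256000.0 -> 256 Kbps, as above
```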
Be cautious with HFSC. Misconfigurations are rampant.
-
"since it cannot prioritize incoming packets that have already arrived" - Actually, the only packets you can prioritize are ones that have already arrived. I want HFSC to artificially delay my bursty YouTube and Netflix streams and let my High priority stuff through.
Incoming WAN traffic will be shaped when leaving the LAN. A 100Mbit WAN link cannot saturate your 1Gbit LAN. Since QoS only serves a purpose on a congested link, QoS is useless on your LAN for incoming WAN traffic.
The only thing HFSC is accomplishing is rate limiting.
You could create limiters and allocate 40/40/20 of your incoming bandwidth and it would accomplish practically the same thing.
My ISP actually bursts 1Gb/s for tens of milliseconds regularly. This was more of an issue before my ISP did AQM, but it can cause packet-loss issues if I let the burst through. Google sets the receive window too large if I let the burst through at full rate.
The 5ms burst would be on the difference between the nominal and burst rates, so (0.4942 - 0.3055) × 95,000,000 × 0.005 / 8 = 11,204.0625 bytes, which is almost 20× the average packet size (576 bytes). HFSC does everything on a curve, so I'm not sure if burst works exactly that way, but I figure it's close.
HFSC's m1 and d parameters achieve the same goal as the d-max and u-max parameters, which are the parameters used in the HFSC paper. The u-max parameter is the largest unit of work (aka packet size) and d-max is the guaranteed delay (aka milliseconds to fully transmit).
The paper mentions an example for an audio stream with 160byte packets;
u-max=160 bytes
d-max=5 ms
r=64 Kbps (r is m2)
Translating this to m1/d/m2 notation gives (160 bytes in 5 ms is 256 Kbps):
m1=256 Kbps
d=5 ms
m2=64 Kbps
My point is, m1/d seems to be single packet-size only, not multiple. Bursting 30Kbyte seems unneeded.
Be cautious with HFSC. Misconfigurations are rampant.
You can also use the burst to mimic speedboost. My old ISP used to give a 100Mb "boost" for about 8 seconds, then return back to 30Mb. That would be similar to a 100Mb 8000ms 30Mb HFSC setting.
A "burst" is really just a temporary change in ratios. It is more useful for smaller flows that tend to have strong bursting characteristics.
-
You can also use the burst to mimic speedboost. My old ISP used to give a 100Mb "boost" for about 8 seconds, then return back to 30Mb. That would be similar to 100Mb 8000ms 30Mb HFSC setting.
A "burst" is really just a temporary change in ratios. It is more useful for smaller flows that tend to have strong bursting characteristics.
The HFSC paper never directly mentions using m1/d this way. Have you verified that HFSC can be used this way?
It might be a better idea to use upper-limit to achieve a "speedboost". (Though, remember, HFSC created the 2-part service-curve to explicitly decouple bandwidth and delay, not to achieve a "speedboost". Your 8-second speedboost might be possible, but it may also be a misuse that will cause unforeseen problems.)
My criticism is an attempt to make sure you are configuring HFSC cautiously and avoiding potentially breaking HFSC/traffic-shaping by misconfiguring it.
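For intuition, the two-piece service curve behind the m1/d/m2 notation can be sketched as follows. This is a simplified model of the curve itself (service accrues at m1 for the first d milliseconds, then at m2), not pfSense's actual scheduler:

```python
# Rough sketch of a two-piece HFSC service curve S(t):
# service accrues at m1 for the first d ms, then at m2 afterwards.

def service_bits(t_s, m1_bps, d_ms, m2_bps):
    """Cumulative bits of service guaranteed by time t_s (seconds)."""
    d_s = d_ms / 1000
    if t_s <= d_s:
        return m1_bps * t_s
    return m1_bps * d_s + m2_bps * (t_s - d_s)

# The "speedboost"-style curve discussed above: 100Mb for 8 s, then 30Mb.
print(service_bits(8, 100e6, 8000, 30e6))   # 800000000.0 bits by t=8s
print(service_bits(10, 100e6, 8000, 30e6))  # 860000000.0 bits by t=10s
```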
-
M1 is a bandwidth setting, exactly the same as M2. Sum(max(m1,m2)) cannot be greater than 100% of the parent queue.
-
M1 is a bandwidth setting, exactly the same as M2. Sum(max(m1,m2)) cannot be greater than 100% of the parent queue.
Yeah, they are both bandwidths, but m1 and d define delay while m2 defines average bitrate. Perhaps they can be used in other ways, but that may cause undefined results. I would configure HFSC conservatively and not make guesses.
Since HFSC is primarily meant to guarantee per-packet delay, the calculated byte-size of "m1 × (d ÷ 1000) ÷ 8" should not greatly exceed the interface's MTU. (I hope I got that math right.)
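As a sketch of that sanity check (assuming m1 is in bits per second and d in milliseconds):

```python
# Bytes implied by transmitting at m1 for d milliseconds; per the argument
# above, this should not greatly exceed the interface MTU.

def m1_d_bytes(m1_bps, d_ms):
    return m1_bps * (d_ms / 1000) / 8

# The HFSC paper's audio example: 256 Kbps for 5 ms.
print(m1_d_bytes(256_000, 5))  # 160.0 bytes, well under a 1500-byte MTU
```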
I am still unclear how m1/d interact with link-share, since link-share makes no guarantees about bandwidth or delay. Like I mentioned earlier, if you must attempt the "speedboost", maybe use upper-limit by setting m1=50Mb, d=5, and m2=30Mb.
Maybe switch to a more conservative configuration and try just setting m2 and see if it fixes your broken LAN?
-
I would like to do more testing, but with one machine, I can't afford for the wife to get angry. I'm leaving it alone for now.
I try not to care what HFSC does to individual packets; I only really care what it does on average. If I set a burst from 30% to 50% for 5ms, to me that just means that for 5ms, the ratio of bandwidth between queues will be different. Bursts are just a way to temporarily modify bandwidth with little change to the overall average.
-
I would like to do more testing, but with one machine, I can't afford for the wife to get angry. I'm leaving it alone for now.
I try not to care what HFSC does to individual packets; I only really care what it does on average. If I set a burst from 30% to 50% for 5ms, to me that just means that for 5ms, the ratio of bandwidth between queues will be different. Bursts are just a way to temporarily modify bandwidth with little change to the overall average.
Yeah… I use my internet gateway for testing purposes too. :(
You may be right about using bursts as speedboosts (increased average bitrate for all packets transmitted during a 5ms time-span, for example), but since the HFSC paper never explicitly uses it like that (m1/d is only used to modify delay, while m2 is the average), I think we need to do some testing to prove/disprove your theory.
It would be great if HFSC's m1/d could be used to modify the initial average bitrate of a connection/session, but I am obviously skeptical.
A quick thought experiment... your theory seems impossible for UDP since all individual UDP packets are their own individual transmission sessions (connectionless). If you employed m1/d like in your speedboost theory, all UDP packets could send at 50Mbit for the first 5ms but must stay below a 30Mbit average. Since each individual UDP packet is a new session, and each new session gets an initial 50Mbit "speedboost" for 5ms, your average UDP bitrate could reach 50Mbit, because all UDP packets are transmitted during the "speedboost". The average bitrate could forever be above the m2-configured average bitrate of 30Mbit. Your "speedboost" theory breaks HFSC by causing m2 to be non-functional.
The only way m1 and d function without breaking UDP sessions with HFSC is if m1 and d only modify individual packet transmission delay, and not the initial average session bitrate like in your "speedboost" theory.
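The thought experiment above can be put into a toy model (purely illustrative; it models the hypothetical "each packet is a fresh session riding the m1 phase" reading, not real HFSC behavior):

```python
# Toy model of the UDP thought experiment: if every UDP packet starts a
# new "session" and each new session transmits at m1 for its first d ms,
# then every packet rides the m1 phase and the long-run average equals m1,
# so the m2 average never applies. Hypothetical reading, not real HFSC.

M1, M2, D = 50e6, 30e6, 0.005  # 50Mbit boost, 30Mbit average, 5 ms

def packet_rate(session_age_s):
    """Rate a session sees under the 'speedboost' reading of m1/d/m2."""
    return M1 if session_age_s < D else M2

# Every UDP packet is a brand-new session, so its age is always ~0:
rates = [packet_rate(0.0) for _ in range(1000)]
avg = sum(rates) / len(rates)
print(avg)  # 50000000.0 -> the average sits at m1; m2 is never enforced
```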
I suspect this applies to TCP as well, but I am not sure. The HFSC paper focuses on generic packet delay, never mentioning "TCP", and only mentions "UDP" once during a simulation of link-share's bandwidth-sharing capabilities.
It seems like HFSC's m1/d params only control the transmission delay of generic packets/frames, and nothing else.
Any criticisms are welcome. These conversations help me understand HFSC.
-
UDP is "stateless" in the sense that state must be handled by the application, not the protocol, but even PFSense has a notion of a UDP "state". UDP doesn't "burst". The most common usage of UDP is a flow of packets sent at standard intervals or event-based. Most UDP flows do not "burst" in the sense of TCP trying to send data; UDP just blindly sends data with no concern for congestion.
When UDP "bursts", it does so only in the sense that many packets happened to arrive at the "same time", in some sense of that term.
And HFSC doesn't burst based on new sessions; it bursts based on the current usage of the queue. If a queue has 30Mb allocated but only 15Mb is being used, and suddenly a bunch of traffic arrives "at the same time", HFSC will temporarily allow a higher amount of bandwidth through, above the 30Mb. I think the burst is inversely related to the current usage of the queue, such that if the queue is at 100% usage, the burst will be 0% of the burst amount, but if it's at 50% usage, the burst will be 50% of the assigned amount. Without knowing exactly how the burst works, I can't know how it will function at the micro time scale.
The best way to think about HFSC is not in bandwidth, but in ratios. Actual packet scheduling is based on the current ratios, and packet ordering is not part of the algorithm. HFSC decouples bandwidth and latency such that if all queues are at 100% utilization, regardless of the amount of bandwidth assigned, they will all experience the same latency. While latency for all queues is equal, the ordering of the packets is not guaranteed, unless you use the priority option, but from the sounds of it, it makes little difference.
-
Connectionless != Stateless.