Modding Sideout's Lan Party config for home use



  • First of all thanks to Sideout for the excellent lan party config.  This is just a minor tweak of it for home use.

    I'm a total pfSense newbie, but I needed to make the leap to traffic shaping because at my house the family uses 4-5 computers/devices at a time all night, with Netflix running non-stop alongside gaming, web surfing, and the occasional random file download or torrent.  I just couldn't keep the latency, buffering and packet loss under control on the gaming computers with the mostly-broken QoS on my ASUS router, and my ISP (Frontier) certainly isn't getting it right when I do nothing.

    Being able to read through Sideout's rules and xml files let me work through the pfSense traffic shaping guides and mostly figure out how they apply to my actual needs.

    The goal is to have traffic shaping for a busy home network that prioritizes gaming absolute first, streaming second, and everything else third.

    My current config is attached for review and in case it helps anyone else.

    If you see something wrong, or know how to make it better, speak up.  That's what this is all about.

    Here are the changes made so far:

    1. Skipped using Sideout's system xml completely.  It's not critical for home use or home-sized LAN party use as long as you have a default pfSense install.  If I'm wrong and there is something critical I missed in that system xml, let me know.
    2. Removed the limits for TCP file transfers and the block for Torrents.
    3. Set the up and down speeds for my internet connection in the WAN queue and qInternet queue.

    // The next steps make the rules friendlier for general home use, letting the connection get closer to full usage for web traffic and downloads while still keeping ACKs and gaming packets the absolute highest priority.

    1. Raised the m2 linkshare of qDefault to 10% and its m2 upperlimit to 40%.
    2. Set the m2 linkshare of qHTTP in both WAN and LAN to 20%.
    3. Raised the m2 linkshare of qCatchAll to 25% and its m2 upperlimit to 65%.

    I think that was all the changes.
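
    For reference, the queue tweaks above would correspond to ALTQ rules along these lines (a hand-written sketch, not the XML pfSense actually generates; em0 and the qGames/qACK figures are placeholders, and only the qHTTP/qDefault/qCatchAll numbers come from the steps above):

```
# Sketch only - pfSense builds the real ALTQ rules from the GUI settings.
altq on em0 bandwidth 45Mb hfsc queue { qACK, qGames, qHTTP, qDefault, qCatchAll }
queue qGames    bandwidth 30% priority 7 hfsc(realtime 25%)                 # placeholder
queue qACK      bandwidth 15% priority 6 hfsc(linkshare 15%)                # placeholder
queue qHTTP     bandwidth 20% hfsc(linkshare 20%)                           # step 2
queue qDefault  bandwidth 10% hfsc(default linkshare 10%, upperlimit 40%)   # step 1
queue qCatchAll bandwidth 25% hfsc(linkshare 25%, upperlimit 65%)           # step 3
```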

    One thing that's been really nice is the ability to watch the Queues under the Status menu with a 5-second real-time refresh and see the packets per second and bandwidth usage of each queue, and check which queues are dropping packets.  The queue breakdown this setup uses gives you a pretty good realtime picture of what is happening on the network at a glance.

    PS - I may still need to make a specific queue for torrents depending on how that works out long term.  I haven't pushed this config with a ton of torrenting yet.  It may need more specific control.

    Also, if you are new like me, remember to set your up and down speeds below the worst you get during the busy part of the evening.  Run speed tests during the worst part of the night, subtract about 10% from the up and down of your worst result and use that as a starting point.  In my case, I pay for 50/50.  In reality, a typical evening is 50/25.  To make sure I stay inside of the worst results, I use 45/20 right now for my traffic shaper settings.  You can play around with getting some of it back later, but if you don't set it low enough, traffic shaping won't work at all when you really need it and there's no point to any of this.
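
    If you want to sanity-check those numbers, the arithmetic is trivial (a sketch; the 10% headroom is just the rule of thumb above, and `shaper_limits` is a made-up helper name):

```python
def shaper_limits(worst_down, worst_up, headroom=0.10):
    """Suggested traffic-shaper (down, up) limits in Mb/s.

    Start from the worst peak-hour speed-test result and shave
    off ~10% so the shaper, not the ISP's buffer, owns the queue.
    """
    return worst_down * (1 - headroom), worst_up * (1 - headroom)

# Worst evening result on a "50/50" line was 50 down / 25 up:
down, up = shaper_limits(50, 25)
print(down, up)   # 45.0 22.5 -> rounding the upload down further (45/20) is safer
```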
    pfsense_trafficshape_gaming_centric.zip



  • I'd recommend enabling Codel on the child queues.



  • @Harvy66:

    I'd recommend enabling Codel on the child queues.

    That or just the bulk/default queue. Depends on the setup.



  • Well, any queue where the total bandwidth used is greater than the available bandwidth.

    If I remember correctly, you found out the hard way that Codel does not like really low-bandwidth situations. I think around 1Mb/s is where it really starts to get bad; ideally stay above 5Mb/s.
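
    A quick way to see why low bandwidth hurts CoDel: below roughly 2.4Mb/s, a single full-size packet already takes longer to serialize than CoDel's default 5ms sojourn-time target, so even a one-packet queue looks "bloated" to it. A sketch of the serialization math (not a CoDel implementation; the 5ms target is CoDel's documented default):

```python
MTU_BITS = 1500 * 8          # one full-size Ethernet frame
CODEL_TARGET_MS = 5.0        # CoDel's default sojourn-time target

def serialization_ms(link_bps):
    """Milliseconds needed to put one MTU-sized packet on the wire."""
    return MTU_BITS / link_bps * 1000

for mbps in (0.65, 1, 5, 100):
    ms = serialization_ms(mbps * 1_000_000)
    verdict = "over CoDel's target" if ms > CODEL_TARGET_MS else "fine"
    print(f"{mbps:>6} Mb/s: {ms:6.2f} ms/packet ({verdict})")
```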



  • @Harvy66:

    Well, any queue where the total bandwidth used is greater than the available bandwidth.

    If I remember correctly, you found out the hard way that Codel does not like really low-bandwidth situations. I think around 1Mb/s is where it really starts to get bad; ideally stay above 5Mb/s.

    The official bufferbloat site says problems were experienced with low-bandwidth (~3mbit?) connections, but Codel seems to work reliably with my ~650Kbit connection, which surprises me. My queues obviously have even lower bandwidths than my link's 650Kbit, but only the default/bulk queue has Codel enabled. It usually has a queue depth of 1-4 packets, increasing my ping from ~10ms to <45ms during full-saturation uploads.

    Whether the above is a good or bad example of Codel performance, I dunno. The mixture of Codel and HFSC is mostly undocumented, so maybe HFSC improves Codel's low-BW performance?



  • And so we move directly from known territory into the unknown experimental. :)

    I first enabled Codel on the parent queues, but that didn't really tell me anything, since doing so destroys all the child queues.  Codel is a "set it and forget it" thing, so there would be almost nothing I could report back here other than "it seems to be better/worse."

    So I restored the config and this time enabled Codel on each child queue.  All of them for now, to avoid confusion.

    My bufferbloat test on dslreports http://www.dslreports.com/speedtest# is now A or A+… (it was a D before with only HFSC), even WITH Netflix running in the background during the test.  Which is nothing short of amazing.

    I opened up the queue monitor under Status and watched the queues during the tests... it appears that the queue rules are still working.

    The problem is, I can't find any documentation on doing this, or on combining HFSC and Codel like this.  My guess is that the queues are still prioritized and Codel does its thing after the general priorities are applied, but I don't have the knowledge to confirm it.

    Speed tests still appear to show bandwidth limits applied...

    So far, it seems a bit too good to be true.  But I've only had it set like this for a few hours.  I need to do some more reading on Codel, but that still won't really tell me what pfSense is doing with this arrangement.



  • HFSC schedules when packets get dequeued and put on the line; CoDel just decides which packets to drop. I've also noticed that HFSC "smooths" out bandwidth on its own. Without it, I get very tall sharp spikes and a 99.5Mb/s average. With HFSC set to 99Mb/s, I get small rounded spikes and an average of about 98.5Mb/s.

    I also love Codel, and it's not even the best one. Without it, I get about 30ms-40ms of bufferbloat, about an A, on dslreports' speedtest. With Codel, it's nearly 0ms.

    Here are three images: one with HFSC, one with HFSC+Codel, and one with no shaping.








  • The smoothing is most likely caused by the rate-limiting of the Token Bucket Regulator that ALTQ uses with all queueing disciplines, rather than HFSC itself. Reference: http://www.iijlab.net/~kjc/software/TIPS.txt (author of ALTQ)

    @Harvy66, Could you post a pic of your results testing a Codel-only configuration ("CODELQ") rather than Codel as a sub-discipline of HFSC?



  • I did several tests. You can see the settings in the file names. The images I linked before were older; since then, my ISP has changed routes a bit. The first hop to Level 3 periodically changes. All tests had nearly identical bandwidth; it's just the bloat that varied a lot. All tests were 32 streams down, 24 streams up.

    Edit: I wonder if I somehow didn't enable FairQ on the LAN interface for that one test; the bloat looks like the unshaped run. Meh. Anyway, the upstream looks excellent, and it only cares about bandwidth, just like CodelQ.












  • I've got a couple more basic questions:

    let's take the qGAMES queue on the WAN side as an example.

    bandwidth:  37.5% (this is a percentage of the bandwidth specified in the parent queue… in this case the WAN)

    service curve (sc): real time m2 set to 25%.

    The question - is that 25% of the 37.5% specified for this child queue only?  Or is it 25% of the parent WAN queue?  Anyone know for sure?


    The second question is more tricky.

    Firewall rules.  Let's say we are dealing with an ordinary floating rule to forward ports for a game.

    Under advanced features, Ackqueue/Queue

    We set the right side, Queue, to whatever priority queue we want.  Usually the highest-priority qGames queue, which has a priority of 7 (highest).

    The question - what does the left side, "Ackqueue", actually do?  Most games are UDP and don't have TCP ACK packets as such.  But if a game did use TCP, could an extra ACK queue be made at priority 7, rather than using the other general qACK queue at priority 6?  That would allow TCP ACK packets for certain ports to be higher priority than others, if it works that way.  Or am I totally misunderstanding what this setting does?



  • @Advil000:

    I've got a couple more basic questions:

    let's take the qGAMES queue on the WAN side as an example.

    bandwidth:  37.5% (this is a percentage of the bandwidth specified in the parent queue… in this case the WAN)

    service curve (sc): real time m2 set to 25%.

    The question - is that 25% of the 37.5% specified for this child queue only?  Or is it 25% of the parent WAN queue?  Anyone know for sure?


    The second question is more tricky.

    Firewall rules.  Let's say we are dealing with an ordinary floating rule to forward ports for a game.

    Under advanced features, Ackqueue/Queue

    We set the right side, Queue, to whatever priority queue we want.  Usually the highest-priority qGames queue, which has a priority of 7 (highest).

    The question - what does the left side, "Ackqueue", actually do?  Most games are UDP and don't have TCP ACK packets as such.  But if a game did use TCP, could an extra ACK queue be made at priority 7, rather than using the other general qACK queue at priority 6?  That would allow TCP ACK packets for certain ports to be higher priority than others, if it works that way.  Or am I totally misunderstanding what this setting does?

    If you want to know the finer details of HFSC, refer to the official white-paper or my HFSC Explained thread, which has a good selection of links & examples.

    IIRC, link-share takes from the parent queue while real-time takes from the root or link. I do not fully understand what you are asking though.

    HFSC has no numeric prioritization. The priority parameter in the GUI is a left-over remnant of other schedulers.

    Priority queuing is different from HFSC. The pfSense wiki/FAQ explains the problems with priority queueing (tldr; uncontrollable starvation of lower priorities).

    The ACK queue thing is a built-in feature of "pf", which intelligently prioritizes related ACK packets. UDP, I assume, just stays in the normal queue.
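
    To make that concrete, the GUI's Ackqueue/Queue pair maps to pf's two-argument queue syntax (a sketch with made-up interface and ports; the queue names are just the ones from this thread):

```
# A floating rule with Queue = qHTTP and Ackqueue = qACK roughly becomes:
pass out on em0 proto tcp from any to any port 80 queue (qHTTP, qACK)
# pf puts normal packets in the first queue; empty TCP ACKs (and packets
# marked with a low-delay ToS) go to the second. A UDP game rule has no
# ACKs to split off, so it would normally name just one queue:
pass out on em0 proto udp from any to any port 27015 queue qGames
```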



  • What Nullity said.

    I prefer not to even use realtime because you actually get punished if you go over the bandwidth allotment. You can also have strange issues: realtime always takes precedence for scheduling, so a child queue with more realtime bandwidth than its parent can cause the parent queue to get starved.

    The main usefulness of realtime seems to be if you want a queue to be a child queue and you want its realtime bandwidth to count against a parent queue. In my limited testing, it gave no benefit and just complicated things. I saw no difference between a queue having realtime or linkshare. The bandwidth was always properly distributed and my pings stayed low.

    Example

    root 100Mb
    -qDefault 99.9Mb
    -qICMP 64Kb

    Even while maxing out my connection, my ICMP pings to the Internet had less than 0.25ms of jitter and 0% loss, exactly the same as if my connection was 100% idle. No realtime required. And if I removed the qICMP, suddenly my pings had a measurable increase in jitter and minor loss.
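
    In ALTQ terms, that example hierarchy would look roughly like this (a sketch; em0 and the exact syntax are illustrative, not a tested config):

```
altq on em0 bandwidth 100Mb hfsc queue { qDefault, qICMP }
queue qDefault bandwidth 99.9Mb hfsc(default linkshare 99.9Mb)
queue qICMP    bandwidth 64Kb   hfsc(linkshare 64Kb)
```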

    You do need to start to worry about more advanced HFSC settings if you have a low bandwidth, sub-5Mb/s connection, because of the ratio of the MTU to the bandwidth.



  • The original HFSC implementation only had a single parameter that was simultaneously setting both real-time and link-share to the same value. Later, link-share & real-time were split apart, allowing different values for each (and further confusing non-academians).

    So, setting ls & rt to the same value is technically fine. I think rt only applies to leaf queues, so it would make sense to only use it there. Also, ls cannot be over-used, while rt can.

    Honestly, just use CBQ. The superman who programmed ALTQ even says so. HFSC is usually not worth the effort.



  • I think we need an optional simplified HFSC interface that only exposes linkshare, nothing else.

    The one thing that I don't like about CBQ is it couples bandwidth and delay. If a queue has less bandwidth, it gets a higher delay, even when the link is otherwise idle, and worse when under load. HFSC does not have this issue. CBQ uses strict priority scheduling while HFSC uses service curves.

    "…because classes are selected using service curves instead of static priorities".

    " However, an in-depth analysis of CBQ showed several problems, most noticeably that link-sharing and delay are not well decoupled (high priority sessions may  also get more bandwidth)"

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.663&rep=rep1&type=pdf



  • With HFSC, unless you use m1 & d (which nobody does, or if they do, they do it wrong), bandwidth & delay are coupled, just like CBQ.

    HFSC is marginally better, but most people go wild and over-configure it into a mess.

    We really need to start an evidence-based traffic-shaping best practices thread… Too much god-damn theory flying around here.



  • @Nullity:

    With HFSC, unless you use m1 & d (which nobody does, or if they do, they do it wrong), bandwidth & delay are coupled, just like CBQ.

    HFSC is marginally better, but most people go wild and over-configure it into a mess.

    We really need to start an evidence-based traffic-shaping best practices thread… Too much god-damn theory flying around here.

    They're two distinctly different types of coupling.

    1. Coupling caused by the inherent limits of bandwidth. If you only have 1Mb of bandwidth, there is a hard limit on how low your latency can get under load (read: laws of nature).
    2. Coupling caused by scheduling: additional latency on top of #1 because the algorithm has extra overhead.

    P.S. When load is low, #1 asymptotically approaches the link-rate delay. As load increases, delay quickly approaches the natural bandwidth delay.

    HFSC has very low #2, CBQ has a lot of #2.

    I read of an issue with CBQ and priority based schedulers in general. If a low priority flow is suddenly starved of bandwidth because a higher priority flow kicks in, the low priority flow gets a backlog of packets. If the higher priority flow suddenly drops out, CBQ will suddenly burst all of those backlogged packets. HFSC on the other hand uses a service curve and ramps up the packets.

    In simple testing, CBQ has been known to increase the likelihood of lower-priority flows seeing increased packet loss along their paths because of this burstiness.

    edit: HFSC allows you to decouple latency and bandwidth in #1, which is the harder problem.



  • I just thought of a simple math example of how HFSC is inherently algorithmically "decoupled".

    With 64Kb/s of bandwidth assigned to ICMP, it should take 8ms on average to send 64 bytes of data. When I ping during download and upload saturation, my pings are still 0.014ms (exactly the same as if all shaping were disabled and the link idle). My ping is about 1/550th of the expected value. That assumes the 64Kb queue is otherwise idle; if I do a ping flood, it maxes out at 64Kb/s and suddenly the ping and jitter skyrocket.
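
    Spelled out (the 0.014ms figure is the measurement above, not something this math predicts):

```python
ICMP_BYTES = 64        # rough size of one ping packet
QUEUE_BPS = 64_000     # bandwidth assigned to qICMP (64Kb/s)

# Naive expectation: each ping is "paid for" at the queue's own rate.
expected_ms = ICMP_BYTES * 8 / QUEUE_BPS * 1000
print(expected_ms)                    # 8.0 (ms)

# Measured ping under full load was ~0.014 ms:
print(round(expected_ms / 0.014))     # 571, roughly the "1/550th" above
```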



  • This is the wrong thread for this back and forth, Harvy66. Please take your opinions to my HFSC thread or another one.

    I would have hoped my numerous exhaustively researched posts regarding HFSC would hinder you from arguing with me at every turn. Seriously…  :o

    What is "decoupled" is bandwidth & delay only. This is defined precisely in the HFSC paper. What you talk about is something different.



  • You're the one who started it. /semi-sarc I showed documented proof that CBQ is worse than HFSC when it comes to delay and bandwidth coupling. I was using the CBQ definition of "coupling" or "decoupling", not your HFSC version. Remember, words have different meanings in different contexts, even extremely similar contexts with extremely similar usages. Contextual nuances are important.

    I do concede that CBQ is easier to use (fewer options), and I will agree that if even simple HFSC settings are too much, CBQ is good enough.

    P.S. I am just saying what I think is true, but you may also do the same.
    P.P.S. Nullity has properly corrected me on several occasions, which forced me to do more digging and correct myself. And I thank him for that.



  • @Harvy66:

    You're the one who started it. /semi-sarc I showed documented proof that CBQ is worse than HFSC when it comes to delay and bandwidth coupling. I was using the CBQ definition of "coupling" or "decoupling", not your HFSC version. Remember, words have different meanings in different contexts, even extremely similar contexts with extremely similar usages. Contextual nuances are important.

    I do concede that CBQ is easier to use (fewer options), and I will agree that if even simple HFSC settings are too much, CBQ is good enough.

    P.S. I am just saying what I think is true, but you may also do the same.
    P.P.S. Nullity has properly corrected me on several occasions, which forced me to do more digging and correct myself. And I thank him for that.

    CBQ, in any implementation prior to HFSC, had no mention of "decoupling" or "coupling". There is no “CBQ definition of 'coupling' or 'decoupling'”, as you put it, as CBQ is wholly unaware. Post a link to any paper that implemented any CBQ algorithm with an understanding of decoupling bw & delay. If you cannot find one, please edit your posts to remove the misinformation.

    No 30-page anecdotes. Link or stfu.

