Traffic Shaping Worse Than Baseline?

KOM

Maybe I'm too much of a pragmatist, but if you're getting perfect performance under load without a shaper, then why are you playing with the shaper in the first place?

CaptainElmo

Even with perfect baseline performance I figured it couldn't hurt to reorder packets based on priority anyway. Now it's become a mission of understanding why doing this is having such unexpected results. While I may not need shaping now it could become an issue in the future and I would rather struggle with understanding now when it's not a critical issue.

KOM

I wish you well. I tried to wrap my head around shaping when I first started using pfSense. I gave up and got something working that's serviceable. Harvy is one of, if not the, best shaping guys here so I'm sure that you will eventually get to the bottom of it.

CaptainElmo

Thanks KOM. At this point I'm just hoping to make sense of the results I'm seeing. Based on these results I would be uncomfortable applying traffic shaping when it eventually becomes necessary.

My LAN is a gigabit network, my pfSense box is mid-range (Netgate's new 4860), and my WAN link is a gigabit fiber link which the ISP is limiting to 100 Mbps.

Gigabit LAN -> pfSense with enough horses -> Gigabit WAN with 100 Mbps limiter

In this case it makes sense to me that the raw components with default FIFO packet ordering would work just fine.

What doesn't make sense to me is why adding PRIQ shaping to pfSense would induce a huge spike in latency between LAN and WAN. pfSense runs about 50% CPU and 25% memory in that situation, so it wouldn't seem to be a hardware limitation causing it and the wires on both ends are still the same.

Are there any tests I can run to help narrow this down more?

Harvy66

@CaptainElmo:

What doesn't make sense to me is why adding PRIQ shaping to pfSense would induce a huge spike in latency between LAN and WAN. pfSense runs about 50% CPU and 25% memory in that situation, so it wouldn't seem to be a hardware limitation causing it and the wires on both ends are still the same.

CPU jumps to 50% when PRIQ is enabled and only 100Mb/100Mb connection?

Since PRIQ does not seem to work, want to give HFSC a try? Nothing crazy, just a basic setup. Stick with your current queues as they are, just need to make some HFSC related changes.

Also, do you have your bandwidth limit set on your LAN interface also? You can shape your download, it's just not as effective as shaping your upload, but it does still help. You may also need to waste a bit more bandwidth to gain more control.

Derelict

When I'm having trouble with shaping I always set way below expected throughput - 25 - 50% of expected. That way I KNOW I am the one doing the queuing and dropping. When you have the shaper working how you like it, you can just increase it until you start seeing ISP drops, then back it off.
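A minimal sketch of that tuning loop, assuming you can answer "do I see ISP drops at this limit?" by running a load test. The `sees_isp_drops` callable and the 5 Mbps step size are hypothetical stand-ins for that manual check, not anything pfSense provides.

```python
def tune_shaper_limit(expected_mbps, sees_isp_drops, start_fraction=0.5, step_mbps=5.0):
    """Start well below the expected throughput, raise the limit step by step,
    and stop at the last value that did not show ISP drops.

    sees_isp_drops(limit_mbps) is a hypothetical callback: in practice it is
    you running a saturation test at that limit and checking for loss.
    """
    limit = expected_mbps * start_fraction
    while limit + step_mbps <= expected_mbps and not sees_isp_drops(limit + step_mbps):
        limit += step_mbps
    return limit


# Example with a made-up ISP that starts dropping above 92 Mbps.
final_limit = tune_shaper_limit(100, lambda mbps: mbps > 92)
```

Starting from 50% of a 100 Mbps link, this example settles on 90 Mbps, the last step below the point where the pretend ISP begins dropping.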

CaptainElmo

@Harvy66:

CPU jumps to 50% when PRIQ is enabled and only 100Mb/100Mb connection?

Yes - it does this without shaping enabled as well. I also have Suricata running and I assumed it was the culprit here. In any case the CPU doesn't seem to be a limiting factor in this particular test scenario.

@Harvy66:

Since PRIQ does not seem to work, want to give HFSC a try?

Yes, I'm willing to try anything. If you'll lead the way I'll gladly follow. I'm not familiar with the HFSC settings so I'll need some guidance there.

@Harvy66:

Also, do you have your bandwidth limit set on your LAN interface also?

Aha - the bandwidth was set to 10 Mbps on the LAN interface. I bumped that to 990 Mbps and the latency spikes are now down to only 1,000 ms or so.

@Harvy66:

You can shape your download, it's just not as effective as shaping your upload, but it does still help. You may also need to waste a bit more bandwidth to gain more control.

Roger that. When the time comes that shaping is a necessity this will be an acceptable trade-off for more control. Ideally I would like to gain the necessary understanding now rather than later when I'm under pressure to keep everything running smoothly.

Harvy66

Since you're not actually doing any download shaping, just don't set up anything on your LAN.

On your WAN, set the scheduler to HFSC, then go to each of the queues and, instead of giving them a "priority", use the Bandwidth field. Don't worry about the three lower fields; leave those unchecked/unset. HFSC is all about shaping bandwidth, not priorities. Figure out how much bandwidth you need, or just give your "high priority" queues more bandwidth. Your bandwidth can't be greater than the interface's total bandwidth. I prefer to use percentages instead of actual bandwidth figures.

You can just throw ballpark figures at your low bandwidth queues. VOIP should be quite low compared to your 100Mb connection, so I wouldn't worry about giving it too much bandwidth. Unused bandwidth will get fairly distributed among the other queues.

Some examples
ICMP: Bandwidth 1%, Codel Active Queue
ACK: Bandwidth 20%, Codel Active Queue
VOIP: Bandwidth 20%, Codel Active Queue
Default: Bandwidth 59%, Codel Active Queue

Of course, you had some other queues, so you decide how you want the bandwidth distributed. Remember, the only question to concern yourself with is: if your connection is maxed out and all of your queues are also maxed out, how would you want your bandwidth distributed?
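To see what those percentages mean in absolute terms, here is a small sketch that converts them to Mbps on the 100 Mbps WAN from this thread. The check that the shares add up to no more than 100% is a conservative reading of the "can't be greater than the interface's total bandwidth" rule, not a statement of exactly what pfSense validates.

```python
LINK_MBPS = 100  # WAN rate from this thread

# Example shares from the post above.
queue_percent = {"ICMP": 1, "ACK": 20, "VOIP": 20, "Default": 59}


def queue_rates(link_mbps, percents):
    """Convert per-queue percentage shares into Mbps on the given link."""
    total = sum(percents.values())
    if total > 100:
        raise ValueError(f"queue shares add up to {total}%, more than the link")
    return {name: link_mbps * pct / 100 for name, pct in percents.items()}


rates = queue_rates(LINK_MBPS, queue_percent)
for name, mbps in rates.items():
    print(f"{name}: {mbps} Mbps guaranteed when everything is saturated")
```

These are the shares each queue gets only when every queue is maxed out; as noted above, unused bandwidth is redistributed to the others.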

Another general rule of thumb: treat 80% as 100%. If you need 1Mb of bandwidth, then give a queue at least 1.25Mb of bandwidth.
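The arithmetic behind that rule, as a quick sketch: divide what you actually need by 0.8. The function name and the 0.8 default are just illustration of the rule of thumb as stated.

```python
def queue_bandwidth_for(needed_mbps, usable_fraction=0.8):
    """Bandwidth to assign so that `needed_mbps` is only 80% of the queue."""
    return needed_mbps / usable_fraction


voip_queue = queue_bandwidth_for(1.0)  # 1 Mb needed, so assign 1.25 Mb
```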

You may want to disable Suricata during performance testing if you're having performance issues. Always reduce the number of variables when attempting to debug.

I chose 20% for ACK only because I've seen it recommended by a trustworthy source. I don't quite agree with it, since ACKs on my network have a 20:1 ratio, which would put them at only 5%, but I'm not concerned because unused bandwidth gets distributed.

Nullity

You are having trouble with traffic-shaping because you do not need it, as KOM said, I think.
HFSC is for people with numerous, assumedly life-or-death services that all need to get their guaranteed service levels (bandwidth and/or latency guarantees). Most people really only have VIP traffic, Penalized traffic, and Bulk traffic, which can be handled just fine with PRIQ or CBQ (or FAIRQ).

Traffic-shaping is rarely a good idea unless you need to FIX a PROBLEM. If you are just looking to get web-pages loaded faster, other avenues are much easier and more effective. Also, you cannot prioritize traffic without resultingly deprioritizing other traffic, so… Unless you NEED something, let it go for now.

Harvy has some goodish hints and theories, but without packet/traffic captures to show before and after, be wary.
Only by monitoring your traffic, will you know where the problems are, if there are any...
With my ADSL 15mbit I allocated 300Kb for ACKs because that is what I observed with pftop and other monitors. Each person will have their own ACK ratio.
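For comparison, the two ACK figures mentioned in this thread work out as follows. This sketch assumes "15mbit" means 15,000 Kb/s of data and "300Kb" means 300 Kb/s of ACKs; the function name is made up for illustration.

```python
def ack_percent_of_data(data_kbps, ack_kbps):
    """ACK bandwidth expressed as a percentage of the data rate it acknowledges."""
    return 100 * ack_kbps / data_kbps


observed = ack_percent_of_data(15_000, 300)  # 2.0, i.e. a 50:1 data-to-ACK ratio
twenty_to_one = 100 / 20                     # 5.0, the 20:1 ratio quoted earlier
```

Both are well under a flat 20% allocation, which is why measuring your own ratio is worth the effort.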

Unless you have a precise goal and a solid grasp of internetworking, so that you know what to expect, there's only going to be confusion and haphazard trial and error. That is what happened to me...

So, if you have no reason for traffic-shaping, just don't. Read up first, if you must. For example, The Book of PF, Network Flow Analysis, Practical Packet Analysis, and of course Computer Networks by Tanenbaum. Knowing how to fix a network is different from bumbling around hoping you will learn, while screwing up your network and thinking you are actually improving it.

If you are actually just interested in fixing the bufferbloat, I would find a Linux-based router OS, as Linux has been where every new scheduling algorithm has been implemented, for like 10+ years, and it has the modern CoDel with Fair-Queueing. The CoDel people are now working on Cake, which is quite a bit more ambitious than CoDel.

P.S. With CoDel enabled on my WAN traffic, I see 0 to 4 packets in the queue, regardless. Users do not control the queue depth, CoDel does by constantly measuring the packet sojourn timing.
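A heavily simplified sketch of that idea, for intuition only. It assumes CoDel's commonly cited defaults (5 ms target, 100 ms interval) and leaves out details of the real algorithm, such as how the drop count is reused between dropping episodes. The class and method names are invented here; this is not the pfSense or ALTQ implementation.

```python
import math

TARGET = 0.005    # seconds of acceptable sojourn time (assumed default)
INTERVAL = 0.100  # seconds the delay must stay above TARGET before dropping


class CoDelSketch:
    """Decide per packet whether to drop, based on how long it sat in the queue."""

    def __init__(self):
        self.first_above = None  # deadline set when delay first exceeds TARGET
        self.dropping = False
        self.count = 0
        self.drop_next = 0.0

    def should_drop(self, now, sojourn):
        if sojourn < TARGET:
            # Queue delay is fine again: leave the dropping state.
            self.first_above = None
            self.dropping = False
            return False
        if self.first_above is None:
            # Delay just went above target: give it one interval to recover.
            self.first_above = now + INTERVAL
            return False
        if not self.dropping:
            if now >= self.first_above:
                # Delay stayed high for a full interval: start dropping.
                self.dropping = True
                self.count = 1
                self.drop_next = now + INTERVAL / math.sqrt(self.count)
                return True
            return False
        if now >= self.drop_next:
            # Still too slow: drop again, with shrinking gaps between drops.
            self.count += 1
            self.drop_next += INTERVAL / math.sqrt(self.count)
            return True
        return False
```

The point of the sketch is that the trigger is time spent in the queue, not queue length, so there is no depth for the user to configure.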

CaptainElmo

HFSC appears to be doing the trick! Initial tests show all parameters at expected values during testing - even with Suricata running. For thoroughness I disabled Suricata for a couple tests - with and without shaping - and the CPU stayed below 10% in all cases without Suricata's overhead while still hitting 50% in all cases with Suricata enabled.

CaptainElmo

Nullity - thank you for the excellent feedback. I agree with you wholeheartedly.

I do not plan to use traffic shaping until it is actually needed, but before reaching that point I need to get my brain around how it works in pfSense. During this practice I discovered that simple PRIQ queues induce huge unacceptable spikes in latency for reasons which are still unexplained. Harvy's suggestion to use HFSC seems to work though, so now when I need traffic shaping I will better understand the appropriate way to implement it without doing more harm than good to my network.

Thank you again for all of the excellent advice. The more I interact with the pfSense ecosystem the more impressed I am.

Harvy66

@Nullity:

HFSC is for people with numerous, assumedly life-or-death services that all need to get their guaranteed service levels (bandwidth and/or latency guarantees). Most people really only have VIP traffic, Penalized traffic, and Bulk traffic, which can be handled just fine with PRIQ or CBQ (or FAIRQ).

Even without all of the special features of HFSC, even if all you use is the bandwidth fields, HFSC is still superior to PRIQ or CBQ when it comes to fairly distributing bandwidth while maintaining tight scheduling. It's like comparing an electric motor to a combustion engine: even if you don't enable regenerative braking, it's still better.

@Nullity:

Traffic-shaping is rarely a good idea unless you need to FIX a PROBLEM.

You can't go wrong with "don't fix what ain't broken". But it's also good to learn new stuff that may become relevant, as long as you're not harming production while testing.

@Nullity:

Harvy has some goodish hints and theories, but without packet/traffic captures to show before and after, be wary.

Nullity makes a good point. I am not 100% confident, but I have had very good results on my network and I am confident enough with my reasoning to give ideas but not say "this is how you fix it". I think they should work most of the time, but I do not have a comprehensive background in these issues. I have good intentions, but take what I say with a grain of salt.

@Nullity:

Unless you have a precise goal and a solid grasp of internetworking, so that you know what to expect, there's only going to be confusion and haphazard trial and error. That is what happened to me…

I think what he's getting at is that it's a great way to learn on your own network. Empirical evidence is important when debugging problems, and solving specific problems is better than taking guesses, unless something you're doing is a general recommendation. One thing that may be worth trying and should be dead simple is using FAIRQ. I haven't used it myself, so I have no idea of its jitter characteristics or its fairness under high loads with many flows, but in theory it should be "set your interface bandwidth", and that's all. fq_codel or Cake would be a future replacement for FAIRQ, whenever those get implemented.

I still have no idea why PRIQ was giving such an issue with performance. In theory, VOIP should have been perfect since it was the highest priority.
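For reference, this is the behaviour strict priority queueing is supposed to give, as a toy sketch. It assumes a higher number means higher priority and uses made-up priority values and packet labels; it models only the dequeue order, not pfSense's actual PRIQ code or any buffering around it.

```python
import heapq
from itertools import count


class StrictPriorityQueue:
    """Always dequeue the highest-priority packet; FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def enqueue(self, priority, packet):
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


q = StrictPriorityQueue()
for prio, pkt in [(1, "bulk-1"), (7, "voip-1"), (1, "bulk-2"), (7, "voip-2")]:
    q.enqueue(prio, pkt)
order = [q.dequeue() for _ in range(4)]
```

In this model the VOIP packets leave before any bulk packet regardless of arrival order, which is why the observed latency is surprising.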

I am curious as to what kind of loads you have tested and what kind of jitter, loss, latency, bandwidth you're seeing. On my network, I get pretty much exactly what I expect, but I have a strange situation of very low pings, low loss, and my ISP uses an AQM. Yeah, boo hoo, I have great Internet.

CaptainElmo

@Harvy66:

I still have no idea why PRIQ was giving such an issue with performance. In theory, VOIP should have been perfect since it was the highest priority.

The main issue is that PRIQ is inducing huge latency spikes between LAN and WAN. Even though the VOIP packets are being dropped on the wire first they are still subject to the high latency of getting there in the first place. I still have no idea why PRIQ is causing these latency spikes.

@Harvy66:

I am curious as to what kind of loads you have tested and what kind of jitter, loss, latency, bandwidth you're seeing.

I do what I can to fill up the WAN pipe from the LAN interface - mostly simultaneous speed tests and FTP transfers to/from a dedicated server on the other end of the WAN pipe. The pfSense traffic graph shows constant saturation on the WAN interface during these tests.

When the WAN pipe is saturated PRIQ is inducing latency of 2-4 seconds with up to 20% packet loss according to the apinger stats. While I don't fully trust the apinger service, I am able to confirm noticeable (to the ear) VOIP issues coinciding with the high latency being reported by apinger.
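As a rough sanity check on those numbers: if all of the added delay were packets waiting in a queue that drains at the full 100 Mbps, the backlog would have to be very large. That "if" is an assumption, and a queue draining at a lower rate would need proportionally less data to produce the same delay.

```python
def backlog_megabytes(delay_s, drain_rate_mbps):
    """Data that must be queued to cause delay_s of waiting at the given drain rate."""
    return delay_s * drain_rate_mbps / 8  # megabits to megabytes


low = backlog_megabytes(2, 100)   # 25.0 MB for 2 seconds of delay
high = backlog_megabytes(4, 100)  # 50.0 MB for 4 seconds of delay
```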

The WAN pipe is a dedicated gigabit fiber circuit which the ISP limits to 100/100 Mbps. If I saturate it without traffic shaping it remains rock solid with no movement in latency and not a single lost packet. No matter what I try I am not able to make the WAN link blink at all. If I'm not mistaken that means the WAN link itself can be ruled out as the source of high latency and packet loss.

CaptainElmo

Is any part of the PRIQ queue processing offloaded in a manner which HFSC is not? Could there be a situation where I am hitting processing limits of an offloaded resource which are not reported as part of the main CPU statistics?

Harvy66

I don't think the shapers use any offloading, but they make use of certain driver features.

Nullity

@CaptainElmo:

Is any part of the PRIQ queue processing offloaded in a manner which HFSC is not? Could there be a situation where I am hitting processing limits of an offloaded resource which are not reported as part of the main CPU statistics?

The CPU needed for any sched algo will be minimal. Elegance and efficiency are perhaps more important than actual scheduling capability (Stochastic Fair Scheduling, for example). HFSC, perhaps the most complex and CPU intensive, was capable of 80,000+ packets per second on a 200 MHz Pentium Pro.
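To put that figure next to the link in this thread, here is the packet rate a saturated 100 Mbps link implies. The sketch ignores Ethernet framing overhead and assumes every packet is the same size, so treat the results as ballpark only.

```python
def packets_per_second(link_mbps, packet_bytes):
    """Packets per second needed to fill the link with packets of one size."""
    return link_mbps * 1_000_000 / (packet_bytes * 8)


full_size = packets_per_second(100, 1500)  # roughly 8,333 pps
```

With full-size 1500-byte packets that is about a tenth of the 80,000 pps quoted above; smaller packets push the rate up accordingly.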