Traffic Shaping Worse Than Baseline?



  • Hello everyone.

    I'm having a situation which does not make sense to me, and I'm hoping someone can explain what I'm missing. I'm using pfSense 2.2.4.

    My baseline (traffic shaping disabled):
    100 Mbps symmetric WAN pipe
    0.5 ms latency from pfSense box to WAN gateway
    Speed test to a known server at the other end shows 99/99 Mbps consistently.

    At baseline I can saturate the pipe with a speed test and VOIP still works flawlessly. The latency remains rock solid at 0.5 ms.

    So I figured adding some PRIQ shaping couldn't hurt, but the results I'm getting are actually worse and I can't figure out why.

    Queues (simple PRIQ):
    LAN: Priority 4, Codel Active Queue
    ICMP: Priority 5, Codel Active Queue
    ACK: Priority 6, Codel Active Queue
    VOIP: Priority 7, Codel Active Queue

    With the LAN queue set to default buffer length (50) a speed test will result in some dropped packets from the LAN queue and latency that jumps into the hundreds of ms. VOIP performance is usable albeit noticeably worse. No queue other than LAN shows any dropped packets.

    Bump the LAN queue buffer length up to 5000 and things get even worse. There are no more dropped packets in the LAN queue but latency jumps over 1,000 ms. VOIP performance is awful.

    Drop the LAN queue buffer down to 5 and things swing the other way - latency stays under 10 ms but the LAN queue has a lot of dropped packets and the throughput drops to around 80 Mbps.

    These results are all similar when using Random Early Detection instead of Codel Active Queue.

    What doesn't make sense to me is why the complete lack of traffic shaping works so much better in my situation than simple prioritization shaping. Is this an expected outcome? Have I missed something somewhere?

    I can leave traffic shaping disabled since things appear to be working fine without it, but I would at least like to understand why my traffic shaping configuration is producing noticeably worse results than no shaping at all.

    Thanks in advance.
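The buffer-length results above follow a simple rule of arithmetic: a packet arriving at the tail of a full FIFO queue of N packets, drained at rate R, waits about N × packet size / R. The sketch below assumes every queued packet is a full-size 1500-byte packet and that the queue drains at a constant configured rate, so its figures are rough upper bounds rather than measurements.

```python
def max_queue_delay_ms(queue_len_pkts: int, drain_rate_mbps: float,
                       pkt_bytes: int = 1500) -> float:
    """Worst-case wait (ms) for a packet arriving at the tail of a full FIFO.

    Assumes every queued packet is pkt_bytes long and the queue drains at a
    constant drain_rate_mbps.
    """
    queued_bits = queue_len_pkts * pkt_bytes * 8
    return queued_bits / (drain_rate_mbps * 1e6) * 1000.0


if __name__ == "__main__":
    for qlen in (5, 50, 5000):
        print(f"{qlen:>5} packets at 100 Mbps: "
              f"{max_queue_delay_ms(qlen, 100):8.2f} ms")
```

At 100 Mbps this gives about 0.6 ms for 5 packets, 6 ms for the default 50, and 600 ms for 5000. Delays well beyond those figures suggest the queue is draining at much less than 100 Mbps.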



  • For me, changing the queue length of CoDel as a child discipline makes no difference.

    I use HFSC+CoDel and everything works perfectly. I get my full bandwidth and my latency stays low. Either something is misconfigured or PRIQ doesn't work well for your traffic patterns.

    I do find it strange that your highest priority queue (VoIP) is having issues. PRIQ is strict priority and should always give VoIP up to all of your bandwidth. Are you able to test packet drops or jitter across your VoIP queue? Have you checked your Queue Stats?

    And yes, a queue size of 5 is way too small. I get back-to-back 1Gb line rate bursts from my ISP even though I have a 100/100 connection.

    Perhaps post some screenshots of your interface and queue configurations.

    Stuff like this
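On the point about 1 Gb line-rate bursts arriving on a 100/100 connection: while a back-to-back burst of B packets arrives at rate R_in and drains at R_out, the backlog peaks at roughly B × (1 − R_out / R_in) packets. A sketch, assuming the queue starts empty and all packets are the same size. The burst sizes used below are illustrative; the post does not say how long the ISP's bursts are.

```python
import math


def peak_backlog_pkts(burst_pkts: int, in_rate_mbps: float,
                      out_rate_mbps: float) -> int:
    """Peak queue depth (packets) while a back-to-back burst arrives.

    Assumes the queue starts empty, packets are equal-sized, and the queue
    drains continuously at out_rate_mbps while the burst arrives at
    in_rate_mbps.
    """
    if out_rate_mbps >= in_rate_mbps:
        return 0  # the queue drains as fast as the burst arrives
    drained = burst_pkts * out_rate_mbps / in_rate_mbps
    return math.ceil(burst_pkts - drained)
```

For example, a 100-packet burst at 1000 Mbps into a 100 Mbps queue leaves about 90 packets waiting, far more than a 5-packet queue can hold without dropping.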




  • I tried CBQ but it didn't appear to make any difference over PRIQ. All I'm really wanting to do is ensure higher priority packets are put on the wire before lower priority packets, so PRIQ seems like the right choice for me….no?

    I can confirm that there were no packet drops from the VOIP queue and that the VOIP queue is indeed capturing the correct VOIP packets. My VOIP equipment has an adaptive jitter buffer that smooths jitter up to about 600ms, but above 600ms any jitter can become very noticeable. I call into my PBX from an outside line and listen to the on-hold music to confirm if I can hear any noticeable jitter on the line while doing speed tests across the WAN link, and the noticeable jitter corresponds directly with the increased latency between pfSense and the WAN link. Once the latency spikes I don't think it matters that the VOIP packets are being prioritized to the front of the line - the induced latency itself is causing the jitter.

    Attached are screenshots of my PRIQ config. All of the queues are configured the same with the exception of the name and priority, and qDefault obviously has the Default queue option checked (qDefault is priority 1). I can confirm that all packets are ending up in the correct queues as expected.

    Am I correct that PRIQ should just be reordering the packets put on the wire based on queue priority? If so, why would doing this result in such severe spikes in latency? Remember, the unshaped FIFO traffic causes zero movement in latency when the WAN link is saturated. This has me stumped for an explanation.
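A toy model of what a strict-priority scheduler such as PRIQ does at dequeue time: always serve the highest-priority non-empty queue (higher number = higher priority, as in the config above). It only reorders what is already queued inside the shaper; it cannot shorten a wait that happens somewhere else in the path. The class and method names are hypothetical, it uses plain tail drop rather than CoDel or RED, and it is not ALTQ's code.

```python
from collections import deque


class StrictPriorityScheduler:
    """Toy strict-priority scheduler: higher number = higher priority."""

    def __init__(self, priorities, limit=50):
        self.limit = limit  # per-queue packet limit (tail drop when full)
        self.queues = {p: deque() for p in priorities}
        self.drops = {p: 0 for p in priorities}

    def enqueue(self, prio, pkt):
        q = self.queues[prio]
        if len(q) >= self.limit:
            self.drops[prio] += 1  # queue full: drop the arriving packet
            return False
        q.append(pkt)
        return True

    def dequeue(self):
        # Scan from highest to lowest priority; the first non-empty queue wins.
        for prio in sorted(self.queues, reverse=True):
            if self.queues[prio]:
                return self.queues[prio].popleft()
        return None
```

With bulk packets already waiting in priority 4, a VoIP packet enqueued at priority 7 is returned by the very next `dequeue()` call, but only drops in the LAN queue change when that queue's limit is hit.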






  • You have your full bandwidth defined for your root queues.  Shapers require less, like 90-95% of your real-world bandwidth.  I don't think that's your problem, but I thought I would mention it.
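The reason for setting the shaper below the real bandwidth can be shown with a simple fluid model: a standing queue builds at whichever hop is slowest, and only a queue inside pfSense can be prioritized. The rates below are illustrative; the 99 Mbps figure is taken from the speed test in the first post, and the 1000 Mbps offered load stands in for a gigabit LAN.

```python
def queue_growth_mbps(offered_mbps, shaper_mbps, link_mbps):
    """Fluid-model growth rate of the backlog at each hop (Mbit per second).

    Returns (at_shaper, downstream). The standing queue builds at whichever
    hop is slower; only a queue at the shaper can be reordered by priority.
    """
    at_shaper = max(offered_mbps - shaper_mbps, 0.0)
    leaving_shaper = min(offered_mbps, shaper_mbps)
    downstream = max(leaving_shaper - link_mbps, 0.0)
    return at_shaper, downstream
```

With the shaper at 100 Mbps and a real bottleneck of 99 Mbps, the downstream backlog still grows by about 1 Mbit every second, so a queue eventually forms at the ISP where pfSense cannot reorder it. At 95 Mbps the downstream growth is zero.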



  • Ok - I reduced the full bandwidth on the root level to 90 Mbps and did some more tests. Latency now spikes up to almost 4 seconds (!) and the upload throughput is reduced to just under 90 Mbps (as expected since the root level total bandwidth is now set to 90 Mbps?).

    With no shaping and FIFO packet ordering I still get 99/99 throughput and 0.5 ms latency.

    This makes no sense to me.



  • I manage a corporate network with VoIP phones and all the mustard & relish you can handle.  I use PRIQ.  My phone performance is good in that quality is good and nobody is complaining.  I see lots of drops in my non-critical queues.  I don't use Codel.  How are you measuring?



  • I'm watching the queue stats on the Status->Queues page and watching other system stats (apinger latency, CPU/memory usage, etc) on the main dashboard page. I call into my PBX from an outside line and listen to the on-hold music while performing tests to see if there is any noticeable jitter.

    Noticeable jitter does correspond to the latency spikes being reported by the apinger service when running speed tests on the WAN link.

    With traffic shaping disabled I get seemingly perfect results under stress. With PRIQ traffic shaping enabled the latency spikes under stress and causes noticeable jitter on the VOIP lines.

    I'm sure I must be doing something wrong because this makes no sense. I just can't figure out where I've gone wrong…



  • Maybe I'm too much of a pragmatist, but if you're getting perfect performance under load without a shaper, then why are you playing with the shaper in the first place?



  • Even with perfect baseline performance I figured it couldn't hurt to reorder packets based on priority anyway. Now it's become a mission of understanding why doing this is having such unexpected results. While I may not need shaping now it could become an issue in the future and I would rather struggle with understanding now when it's not a critical issue.



  • I wish you well.  I tried to wrap my head around shaping when I first started using pfSense.  I gave up and got something working that's serviceable.  Harvey is one of, if not the, best shaping guys here so I'm sure that you will eventually get to the bottom of it.



  • Thanks KOM. At this point I'm just hoping to make sense of the results I'm seeing. Based on these results I would be uncomfortable applying traffic shaping when it eventually becomes necessary.

    My LAN is a gigabit network, my pfSense box is mid-range (Netgate's new 4860), and my WAN link is a gigabit fiber link which the ISP is limiting to 100 Mbps.

    Gigabit LAN -> pfSense with enough horses -> Gigabit WAN with 100 Mbps limiter

    In this case it makes sense to me that the raw components with default FIFO packet ordering would work just fine.

    What doesn't make sense to me is why adding PRIQ shaping to pfSense would induce a huge spike in latency between LAN and WAN. pfSense runs about 50% CPU and 25% memory in that situation, so it wouldn't seem to be a hardware limitation causing it and the wires on both ends are still the same.

    Are there any tests I can run to help narrow this down more?



  • @CaptainElmo:

    What doesn't make sense to me is why adding PRIQ shaping to pfSense would induce a huge spike in latency between LAN and WAN. pfSense runs about 50% CPU and 25% memory in that situation, so it wouldn't seem to be a hardware limitation causing it and the wires on both ends are still the same.

    CPU jumps to 50% when PRIQ is enabled and only 100Mb/100Mb connection?

    Since PRIQ does not seem to work, want to give HFSC a try? Nothing crazy, just a basic setup. Stick with your current queues as they are, just need to make some HFSC related changes.

    Also, do you have your bandwidth limit set on your LAN interface also? You can shape your download, it's just not as effective as shaping your upload, but it does still help. You may also need to waste a bit more bandwidth to gain more control.


  • Netgate

    When I'm having trouble with shaping I always set the bandwidth way below expected throughput, to 25-50% of expected. That way I KNOW I am the one doing the queuing and dropping. When you have the shaper working how you like it, you can just increase it until you start seeing ISP drops, then back it off.
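The procedure above (start low so pfSense is definitely the bottleneck, then raise the rate until drops appear upstream) amounts to a simple step search. In this sketch, `isp_drops_seen` is a hypothetical hook: in practice it would be a manual load test at that shaper rate while watching Status->Queues and the gateway latency. The step size and starting fraction are assumptions.

```python
from typing import Callable


def find_shaper_rate(expected_mbps: float,
                     isp_drops_seen: Callable[[float], bool],
                     start_fraction: float = 0.5,
                     step_mbps: float = 5.0) -> float:
    """Return the highest tested rate that did not cause upstream drops.

    Starts at start_fraction of the expected bandwidth and raises the rate
    in step_mbps increments. Stops at the first rate where
    isp_drops_seen(rate) reports loss or latency beyond the shaper, and
    returns the last good rate (i.e. backs off one step).
    """
    rate = expected_mbps * start_fraction
    while rate + step_mbps <= expected_mbps and not isp_drops_seen(rate + step_mbps):
        rate += step_mbps
    return rate
```

If the upstream link starts dropping above 92 Mbps, for example, the search settles at 90 Mbps.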



  • @Harvy66:

    CPU jumps to 50% when PRIQ is enabled and only 100Mb/100Mb connection?

    Yes - it does this without shaping enabled as well. I also have Suricata running and I assumed it was the culprit here. In any case the CPU doesn't seem to be a limiting factor in this particular test scenario.

    @Harvy66:

    Since PRIQ does not seem to work, want to give HFSC a try?

    Yes, I'm willing to try anything. If you'll lead the way I'll gladly follow. I'm not familiar with the HFSC settings so I'll need some guidance there.

    @Harvy66:

    Also, do you have your bandwidth limit set on your LAN interface also?

    Aha - the bandwidth was set to 10 Mbps on the LAN interface. I bumped that to 990 Mbps and the latency spikes are now down to only 1,000 ms or so.

    @Harvy66:

    You can shape your download, it's just not as effective as shaping your upload, but it does still help. You may also need to waste a bit more bandwidth to gain more control.

    Roger that. When the time comes that shaping is a necessity this will be an acceptable trade-off for more control. Ideally I would like to gain the necessary understanding now rather than later when I'm under pressure to keep everything running smoothly.
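A 10 Mbps bandwidth on the LAN interface means download traffic arriving from the 100 Mbps WAN is queued on the LAN side and drained at only 10 Mbps, which could account for some of the delay seen by traffic headed to LAN hosts. A sketch of how fast such a queue fills and how much delay it then holds, assuming constant rates and 1500-byte packets:

```python
def fill_time_and_delay(queue_len_pkts, arrival_mbps, drain_mbps,
                        pkt_bytes=1500):
    """Seconds until a FIFO overflows, and the standing delay once it is full.

    Assumes constant-rate arrival and drain with equal-sized packets.
    Returns (None, 0.0) when arrivals never outpace the drain rate.
    """
    if arrival_mbps <= drain_mbps:
        return None, 0.0
    queue_bits = queue_len_pkts * pkt_bytes * 8
    fill_s = queue_bits / ((arrival_mbps - drain_mbps) * 1e6)
    delay_s = queue_bits / (drain_mbps * 1e6)
    return fill_s, delay_s
```

With a 5000-packet queue fed at 100 Mbps and drained at 10 Mbps, the queue fills in about 0.67 s and then holds about 6 s of delay. With the LAN bandwidth raised to 990 Mbps, arrivals from a 100 Mbps WAN never outpace the drain rate in this model, so that queue should stay close to empty.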



  • Since you're not actually doing any download shaping, just don't setup anything on your LAN.

    On your WAN, set the scheduler to HFSC, then go to each of the queues and, instead of using "priority", use the Bandwidth field. Don't worry about the three lower fields; leave those unchecked/unset. HFSC is all about shaping bandwidth, not priorities. Figure out how much bandwidth you need, or just give your "high priority" queues more bandwidth. Your total queue bandwidth can't be greater than the interface's total bandwidth. I prefer to use percentages instead of actual bandwidth figures.

    You can just throw ballpark figures at your low bandwidth queues. VOIP should be quite low compared to your 100Mb connection, so I wouldn't worry about giving it too much bandwidth. Unused bandwidth will get fairly distributed among the other queues.

    Some examples
    ICMP: Bandwidth 1%, Codel Active Queue
    ACK: Bandwidth 20%, Codel Active Queue
    VOIP: Bandwidth 20%, Codel Active Queue
    Default: Bandwidth 59%, Codel Active Queue

    Of course you have some other queues; you decide how you want the bandwidth distributed. Just ask yourself: if your connection is maxed out and all of your queues are also maxed out, how would you want your bandwidth distributed?

    Another general rule of thumb. 80% is 100%. If you need 1Mb of bandwidth, then give a queue at least 1.25Mb of bandwidth.

    You may want to disable Suricata during performance testing if you're having performance issues. Always reduce the number of variables when attempting to debug.

    I chose 20% for ACK only because I've seen it recommended by a trustworthy source. I don't quite agree with it, since ACKs on my network have a 20:1 ratio, which would put them at only 5%, but I'm not concerned because unused bandwidth gets distributed.
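The rules of thumb in this post (queue shares must fit inside the interface bandwidth, unused bandwidth goes to the other queues, and 80% is 100%) can be sketched as follows. This is a simplified weighted max-min (water-filling) model with made-up demand figures; it is not how HFSC's service curves are actually computed.

```python
def with_headroom(needed_mbps, usable_fraction=0.8):
    """'80% is 100%': size a queue so its real need fits in ~80% of it."""
    return needed_mbps / usable_fraction


def allocate(interface_mbps, shares_pct, demand_mbps):
    """Split interface bandwidth among busy queues by configured share.

    Bandwidth is offered in proportion to each queue's percentage; whatever
    a queue doesn't need is re-offered to the queues that still want more.
    Illustrates redistribution of unused bandwidth only, not HFSC itself.
    """
    if sum(shares_pct.values()) > 100:
        raise ValueError("queue shares exceed the interface bandwidth")
    alloc = {q: 0.0 for q in shares_pct}
    spare = float(interface_mbps)
    hungry = {q for q in shares_pct if demand_mbps[q] > 0}
    while spare > 1e-9 and hungry:
        total_share = sum(shares_pct[q] for q in hungry)
        if total_share == 0:
            break
        given = 0.0
        for q in list(hungry):
            offer = spare * shares_pct[q] / total_share
            take = min(offer, demand_mbps[q] - alloc[q])
            alloc[q] += take
            given += take
            if demand_mbps[q] - alloc[q] <= 1e-9:
                hungry.discard(q)  # satisfied; its leftover goes to others
        spare -= given
    return alloc
```

Using the example shares above (ICMP 1, ACK 20, VOIP 20, Default 59) on a 95 Mbps interface, with hypothetical demands of 0.1, 3, 1 and 1000 Mbps, ICMP, ACK and VOIP get everything they ask for and Default ends up with about 90.9 Mbps. `with_headroom(1.0)` returns 1.25, matching the 1 Mb → 1.25 Mb rule.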



  • You are having trouble with traffic-shaping because you do not need it, as KOM said, I think.
    HFSC is for people with numerous, assumedly life-or-death services that all need to get their guaranteed service levels (bandwidth and/or latency guarantees). Most people really only have VIP traffic, penalized traffic, and bulk traffic, which can be handled just fine with PRIQ or CBQ (or FAIRQ).

    Traffic-shaping is rarely a good idea unless you need to FIX a PROBLEM. If you are just looking to get web-pages loaded faster, other avenues are much easier and more effective. Also, you cannot prioritize traffic without resultingly deprioritizing other traffic, so… Unless you NEED something, let it go for now.

    Harvy has some goodish hints and theories, but without packet/traffic captures to show before and after, be wary.
    Only by monitoring your traffic, will you know where the problems are, if there are any...
    With my 15 Mbit ADSL line I allocated 300 Kb for ACKs because that is what I observed with pftop and other monitors. Each person will have their own ACK ratio.

    Unless you have a precise goal and a solid grasp of internetworking, so that you know what to expect, there's only going to be confusion and haphazard trial and error. That is what happened to me...

    So, if you have no reason for traffic-shaping, just don't. Read up first, if you must: for example, The Book of PF, Network Flow Analysis, Practical Packet Analysis, and of course Computer Networks by Tanenbaum. Knowing how to fix a network is different from bumbling around hoping you will learn, while screwing up your network and thinking you are actually improving it.

    If you are actually just interested in fixing the bufferbloat, I would find a Linux-based router OS, as Linux has been where every new scheduling algorithm has been implemented, for like 10+ years, and it has the modern CoDel with Fair-Queueing. The CoDel people are now working on Cake, which is quite a bit more ambitious than CoDel.

    P.S. With CoDel enabled on my WAN traffic, I see 0 to 4 packets in the queue, regardless. Users do not control the queue depth; CoDel does, by constantly measuring each packet's sojourn time.
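How CoDel controls queue depth by sojourn time can be sketched: every packet is timestamped on enqueue, dropping starts only once the sojourn time has stayed above a target (5 ms) for a whole interval (100 ms), and further drops come at interval/√count. This is a simplified Python rendering of the pseudocode in RFC 8289 that omits the MAXPACKET check and the count-reuse hysteresis; it is not the ALTQ code pfSense uses.

```python
import math
from collections import deque

TARGET = 0.005    # 5 ms acceptable standing delay (RFC 8289 default)
INTERVAL = 0.100  # 100 ms window (RFC 8289 default)


class CoDelQueue:
    """Simplified CoDel after RFC 8289. Times are in seconds."""

    def __init__(self):
        self.q = deque()             # (enqueue_time, packet)
        self.first_above_time = 0.0  # 0.0 means "not currently above target"
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0
        self.dropped = []

    def enqueue(self, now, pkt):
        self.q.append((now, pkt))  # timestamp every packet on arrival

    def _do_dequeue(self, now):
        """Pop one packet and report whether CoDel may drop it."""
        if not self.q:
            self.first_above_time = 0.0
            return None, False
        tstamp, pkt = self.q.popleft()
        sojourn = now - tstamp
        ok_to_drop = False
        if sojourn < TARGET:
            self.first_above_time = 0.0  # delay is acceptable again
        elif self.first_above_time == 0.0:
            self.first_above_time = now + INTERVAL  # start the clock
        elif now >= self.first_above_time:
            ok_to_drop = True  # above target for a full interval
        return pkt, ok_to_drop

    def _control_law(self, t):
        return t + INTERVAL / math.sqrt(self.count)

    def dequeue(self, now):
        pkt, ok_to_drop = self._do_dequeue(now)
        if pkt is None:
            self.dropping = False
            return None
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False
            while self.dropping and now >= self.drop_next:
                self.dropped.append(pkt)
                self.count += 1
                pkt, ok_to_drop = self._do_dequeue(now)
                if not ok_to_drop:
                    self.dropping = False
                else:
                    self.drop_next = self._control_law(self.drop_next)
        elif ok_to_drop:
            self.dropped.append(pkt)  # first drop of a new dropping state
            pkt, _ = self._do_dequeue(now)
            self.dropping = True
            self.count = 1
            self.drop_next = self._control_law(now)
        return pkt
```

The queue limit never enters into it: a packet is only dropped when the delay it actually experienced has stayed too high for too long.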



  • HFSC appears to be doing the trick! Initial tests show all parameters at expected values during testing - even with Suricata running. For thoroughness I disabled Suricata for a couple tests - with and without shaping - and the CPU stayed below 10% in all cases without Suricata's overhead while still hitting 50% in all cases with Suricata enabled.



  • Nullity - thank you for the excellent feedback. I agree with you wholeheartedly.

    I do not plan to use traffic shaping until it is actually needed, but before reaching that point I need to get my brain around how it works in pfSense. During this practice I discovered that simple PRIQ queues induce huge unacceptable spikes in latency for reasons which are still unexplained. Harvey's suggestion to use HFSC seems to work though, so now when I need traffic shaping I will better understand the appropriate way to implement it without doing more harm than good to my network.

    Thank you again for all of the excellent advice. The more I interact with the pfSense ecosystem the more impressed I am.



  • @Nullity:

    HFSC is for people with numerous, assumedly life-or-death services that all need to get their guaranteed service levels (bandwidth and/or latency guarantees). Most people really only have VIP traffic, penalized traffic, and bulk traffic, which can be handled just fine with PRIQ or CBQ (or FAIRQ).

    Even without all of the special features of HFSC, even if all you use is the bandwidth fields, HFSC is still superior to PRIQ or CBQ when it comes to fairly distributing bandwidth while maintaining tight scheduling. It's like comparing an electric motor to a combustion engine: even if you don't enable regenerative braking, it's still better.

    @Nullity:

    Traffic-shaping is rarely a good idea unless you need to FIX a PROBLEM.

    You can't go wrong with "don't fix what ain't broken". But it's also good to learn new stuff that may become relevant, as long as you're not harming production while testing.

    @Nullity:

    Harvy has some goodish hints and theories, but without packet/traffic captures to show before and after, be wary.

    Nullity makes a good point. I am not 100% confident, but I have had very good results on my network and I am confident enough with my reasoning to give ideas but not say "this is how you fix it". I think they should work most of the time, but I do not have a comprehensive background in these issues. I have good intentions, but take what I say with a grain of salt.

    @Nullity:

    Unless you have a precise goal and a solid grasp of internetworking, so that you know what to expect, there's only going to be confusion and haphazard trial and error. That is what happened to me…

    I think what he's getting at is that it's a great way to learn on your own network. Empirical evidence is important when debugging, and solving specific problems is better than taking guesses, unless what you're doing is a general recommendation. One thing that may be worth trying, and should be dead simple, is FAIRQ. I haven't used it myself, so I have no idea of its jitter characteristics or fairness under high load with many flows, but in theory it should be "set your interface bandwidth", and that's all. fq_Codel or Cake would be a future replacement for FAIRQ, whenever those get implemented.

    I still have no idea why PRIQ was giving such an issue with performance. In theory, VOIP should have been perfect since it was the highest priority.

    I am curious as to what kind of loads you have tested and what kind of jitter, loss, latency, bandwidth you're seeing. On my network, I get pretty much exactly what I expect, but I have a strange situation of very low pings, low loss, and my ISP uses an AQM. Yeah, boo hoo, I have great Internet.
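The idea behind flow-fair schedulers such as FAIRQ (and fq_codel) is to keep each flow in its own bucket and serve the buckets in turn, so one bulk transfer cannot build a queue in front of a VoIP flow. Below is a generic deficit round robin sketch of that idea, not ALTQ's FAIRQ implementation; the class name, quantum, and flow keys are illustrative.

```python
from collections import OrderedDict, deque


class DRRFlowQueue:
    """Per-flow deficit round robin: each flow earns `quantum` bytes per round."""

    def __init__(self, quantum=1514):
        self.quantum = quantum
        self.flows = OrderedDict()  # flow key -> deque of (size, pkt)
        self.deficit = {}

    def enqueue(self, flow, size, pkt):
        if flow not in self.flows:
            self.flows[flow] = deque()
            self.deficit[flow] = 0
        self.flows[flow].append((size, pkt))

    def dequeue(self):
        while self.flows:
            flow, q = next(iter(self.flows.items()))
            size, pkt = q[0]
            if self.deficit[flow] >= size:
                q.popleft()
                self.deficit[flow] -= size
                if not q:  # flow went idle: forget it and its credit
                    del self.flows[flow]
                    del self.deficit[flow]
                return pkt
            # Not enough credit: top it up and send the flow to the back.
            self.deficit[flow] += self.quantum
            self.flows.move_to_end(flow)
        return None
```

With five 1500-byte bulk packets queued ahead of one 200-byte VoIP packet, a FIFO would send the VoIP packet sixth; here it goes out second, right after the first bulk packet.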



  • @Harvy66:

    I still have no idea why PRIQ was giving such an issue with performance. In theory, VOIP should have been perfect since it was the highest priority.

    The main issue is that PRIQ is inducing huge latency spikes between LAN and WAN. Even though the VOIP packets are being put on the wire first, they are still subject to the high latency of getting there in the first place. I still have no idea why PRIQ is causing these latency spikes.

    @Harvy66:

    I am curious as to what kind of loads you have tested and what kind of jitter, loss, latency, bandwidth you're seeing.

    I do what I can to fill up the WAN pipe from the LAN interface - mostly simultaneous speed tests and FTP transfers to/from a dedicated server on the other end of the WAN pipe. The pfSense traffic graph shows constant saturation on the WAN interface during these tests.

    When the WAN pipe is saturated PRIQ is inducing latency of 2-4 seconds with up to 20% packet loss according to the apinger stats. While I don't fully trust the apinger service, I am able to confirm noticeable (to the ear) VOIP issues coinciding with the high latency being reported by apinger.

    The WAN pipe is a dedicated gigabit fiber circuit which the ISP limits to 100/100 Mbps. If I saturate it without traffic shaping it remains rock solid with no movement in latency and not a single lost packet. No matter what I try I am not able to make the WAN link blink at all. If I'm not mistaken that means the WAN link itself can be ruled out as the source of high latency and packet loss.
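To put numbers on tests like these rather than relying on the ear alone, average latency, loss, and jitter can be summarized from a series of ping results. In this sketch, `None` marks a lost probe, and jitter uses the smoothed estimator from RFC 3550 (J += (|D| − J) / 16), applied here to the change in RTT between consecutive replies as an approximation. Any sample values fed to it should come from your own measurements.

```python
from statistics import mean


def summarize_pings(rtts_ms):
    """Return (avg RTT ms, loss %, jitter ms) for a list of ping RTTs.

    A None entry is a probe that got no reply. Jitter follows the RFC 3550
    running estimate J += (|D| - J) / 16, with D the change in RTT between
    consecutive replies.
    """
    if not rtts_ms:
        raise ValueError("need at least one probe")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    jitter = 0.0
    for prev, cur in zip(replies, replies[1:]):
        jitter += (abs(cur - prev) - jitter) / 16.0
    avg = mean(replies) if replies else float("nan")
    return avg, loss_pct, jitter
```

Running it over unshaped and shaped test runs side by side makes the difference between them easy to compare.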



  • Is any part of the PRIQ queue processing offloaded in a manner that HFSC's is not? Could I be hitting processing limits of offloaded resources that are not reported as part of the main CPU statistics?



  • I don't think the shapers use any offloading, but they do make use of certain driver features.



  • @CaptainElmo:

    Is any part of the PRIQ queue processing offloaded in a manner that HFSC's is not? Could I be hitting processing limits of offloaded resources that are not reported as part of the main CPU statistics?

    The CPU needed for any scheduling algorithm will be minimal. Elegance and efficiency are perhaps more important than actual scheduling capability (Stochastic Fair Scheduling, for example). HFSC, perhaps the most complex and CPU-intensive, was capable of 80,000+ packets per second on a 200 MHz Pentium Pro.
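The per-packet budget makes that figure concrete. Taking the 80,000 pps on 200 MHz number above at face value, and assuming standard Ethernet framing overhead (8 bytes of preamble/SFD plus a 12-byte inter-frame gap per frame), the sketch compares it with the packet rates a 100 Mbps link can actually carry.

```python
ETH_OVERHEAD = 8 + 12  # preamble/SFD + inter-frame gap, bytes per frame


def line_rate_pps(link_mbps, frame_bytes):
    """Max Ethernet frames per second for a given frame size (incl. FCS)."""
    return link_mbps * 1e6 / ((frame_bytes + ETH_OVERHEAD) * 8)


def cycles_per_packet(cpu_hz, pps):
    """CPU cycles available per packet at a given packet rate."""
    return cpu_hz / pps
```

At 100 Mbps, full-size 1518-byte frames come to roughly 8,100 pps and minimum 64-byte frames to roughly 148,800 pps, while 200 MHz at 80,000 pps works out to about 2,500 cycles per packet. Full-size traffic at 100 Mbps needs only a tenth of that old figure; only a flood of minimum-size packets would exceed it.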