Traffic Shaping Performance issues

mattlach

Hey all,

So I am brand new to traffic shaping, but wanted to give it a try in order to better manage traffic on my network.

I went through and filled out the wizard in the traffic shaper, and didn't modify anything else manually.

The connection normally benchmarks at ~152Mbit down and 160Mbit up on Speedtest.net, so, as recommended in many guides, I set the limits to 97% of this on the first page of the wizard.

Following this, I set Vonage for VoIP and reserved 90kbit both up and down for it. I also added deprioritization for peer-to-peer networks, enabling all of them, and prioritized game traffic, adding all of them.

I am familiar with the concept of QoS, and understand that you need to set the local limit lower than your total bandwidth in order to be able to control the traffic. I was expecting a slight speed test drop, in line with the 97% values I entered in the wizard, but my speed tests with the traffic shaper enabled are abysmal. I get about 70Mbit/s down and 102Mbit/s up, much lower than the 97%*152Mbit and 97%*160Mbit I expected.
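For reference, here is the arithmetic behind those expectations as a quick sketch. The function name is only illustrative; the figures are the speed test numbers quoted above.

```python
def shaper_limit(measured_mbit, fraction=0.97):
    """Return the shaper limit as a fraction of the measured line rate."""
    return measured_mbit * fraction


down_limit = shaper_limit(152)  # measured download, Mbit/s
up_limit = shaper_limit(160)    # measured upload, Mbit/s

print(f"expected down: {down_limit:.2f} Mbit/s, observed: ~70 Mbit/s")
print(f"expected up:   {up_limit:.2f} Mbit/s, observed: ~102 Mbit/s")
```

So the observed results are roughly half of the expected download ceiling and about two thirds of the expected upload ceiling, far more than a 3% haircut.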

Just to make sure I wasn't CPU limited (which I didn't expect, as it's a Haswell dual-core with a base clock of 2.9GHz and a turbo clock of up to 3.6GHz), I signed on via SSH and looked at the overall CPU usage in top while performing the speed test. The CPU idle figure never dropped below 87%, which suggests to me I never went above ~13% CPU usage in total.

Then I thought maybe PowerD was to blame, so I tried disabling it and rerunning the test, but it made no difference.

The system is built around a Supermicro mini-ITX server board with dual Intel gigabit network adapters (i210at and i217). One of them uses the igb driver, and the other uses the em driver (though I haven't been able to figure out which is which). Not knowing if there would be any advantage to the configuration either way, I randomly set em0 as my WAN and igb0 as my LAN.

Could the adapters be to blame? Could toggling various adapter offloads on and off make a difference here? Hardware offload of QoS (if they support it) is great but not if the hardware offload is too slow, and the CPU can do it faster, right?

Or am I going about this all wrong? Are my poor speed test results just the traffic shaper doing its job, detecting a bandwidth hog and slowing it down? That can't be the case though, right? I had thought the way traffic shaping was supposed to work was to allow even a low-priority category to use all of the bandwidth at times when no one else needs it (and I intentionally disconnected everything else using bandwidth for the test).

I intentionally over dimensioned this system because I wanted to run QoS on it and I am a bit disappointed right now. Does anyone have any thoughts about what I might try?

Much obliged,
Matt

Nullity

Which version(s) of pfSense?

mattlach

2.3-RELEASE (amd64) 
built on Mon Apr 11 18:10:34 CDT 2016 
FreeBSD 10.3-RELEASE 

The system is on the latest version.

Harvy66

The default buffer sizes are only 50 when you enable shaping. These are way too small for most 10Mb+ networks. Even when I had a 50Mb connection, my bandwidth was only about 30Mb/s with a buffer of 50, and it was up and down and all over. Be warned, too large a buffer will add bloat. I recommend just enabling CoDel, which has a large buffer, but fights bloat. /guessing at the issue
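A rough way to see why 50 is small at higher speeds is to work out how long a full queue takes to drain. This is only a sketch: it assumes the limit is counted in packets and that every packet is a full 1500 bytes, and the function name is made up for illustration.

```python
def queue_drain_ms(packets, link_mbit, packet_bytes=1500):
    """Time in milliseconds to drain a full queue at the given link rate."""
    bits = packets * packet_bytes * 8
    return bits / (link_mbit * 1_000_000) * 1000


for rate in (10, 50, 150):
    ms = queue_drain_ms(50, rate)
    print(f"{rate:>4} Mbit/s: a 50-packet queue holds {ms:.1f} ms of traffic")
```

Under those assumptions, the same 50 packets that hold 60 ms of traffic at 10 Mbit/s hold only 4 ms at 150 Mbit/s, so even short bursts can overflow the queue and cause drops.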

mattlach

Thank you, I appreciate this suggestion, and will try it when I get home.

Which buffer are you talking about though? Is this a buffer specific to the traffic shaping queue? Where do I change it?

Also, what is CoDel, and where is that setting? I did some googling but the results were unsatisfactory. Do you have any links to good reading material on what CoDel is and how it works? Is it just an alternative to HFSC?

Thanks again,
Matt

mattlach

@Harvy66:

Be warned, too large of a buffer will add bloat.

Also,

What do you mean by bloat in this context? Are we just talking about added RAM use?

My pfSense box has 2GB of RAM, only because I wanted to take advantage of the dual channel performance, so I needed two sticks of RAM, and it turns out you just can't buy anything smaller than a 1GB stick these days :p

With 2GB of RAM, I've never seen pfSense use more than ~5%. I think I can probably live with a little RAM bloat if that's the extent of it.

Nullity

He means bufferbloat: latency caused by oversized network buffers.

Regarding your original question, if you put in 150Mbit then you should get practically 150Mbit. If you do not, then something is wrong. Honestly, I usually only see this strange behavior when someone is new to pfSense, so I assume misconfiguration, but I could be wrong. Double-check your settings and perhaps show pictures of your queues. Make sure to reset states between queue/firewall changes.

Plenty of us use pfSense, and when we set up a 10Mbit queue, that queue moves 10Mbit. Long-time users would quickly recognize if the values were skewed.

mattlach

Thank you. I'll do some poking around and post some screenshots.

While I am not new to pfSense at all (been using it since ~2010) I AM new to traffic shaping. I tried to set it up once a few years back, but it got complicated and I gave up.

While I understand the basics, I've always gotten myself confused by how the firewall rules work, and how that interacts with queues etc. I have no problem blocking or opening certain ports and setting up port forwards, but any more complicated than that, and I've quickly become confused and I've never found a good write-up that explains the whole thing adequately.

I'll happily accept recommendations for further reading :)

When I built this router box last week, I simply saved my old configuration from the pfSense install I had running as a guest on my ESXi server. Hopefully there are no old settings in there screwing things up. Maybe I should try a clean install. It doesn't take THAT long to set up my port forwards and static DHCP leases…

Nullity

The official pfSense wiki is good. Otherwise, there's the official pfSense book, "The book of pf", and of course Google.

Firewall rules match packets and then assign them to a queue of your choosing. Then you set up the queue to have whatever traffic-shaping characteristics you want.

Harvy66

Bufferbloat is why people see high latency during congestion. Get rid of the bloat and you get rid of the latency.

Example: say you set your buffer to hold a gigabit of packets, but you only have a 1Mb connection. If someone were to send you a bunch of data faster than 1Mb/s, your buffer would eventually fill up. Once your buffer is full, it will take at least 1,000 seconds to empty, and that's assuming no new data comes in. Now what happens if your buffer is nearly full and someone sends a ping packet? The ping has to sit at the back of the line and wait 1,000 seconds before you see it. Now you have 1,000 seconds of latency.
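The same arithmetic as a tiny sketch (the function name is just for illustration):

```python
def worst_case_delay_s(buffer_bits, link_bits_per_s):
    """Seconds a packet waits if it arrives at the back of a full buffer."""
    return buffer_bits / link_bits_per_s


# One gigabit of buffered data draining over a 1 Mbit/s link.
delay = worst_case_delay_s(1_000_000_000, 1_000_000)
print(f"{delay:.0f} seconds of added latency")
```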

You want a large enough buffer to absorb bursty traffic, but you want a small buffer to handle sustained traffic. It's very difficult to balance. CoDel doesn't use data-sized buffers but time-sized buffers. The default is 5ms. Once your buffer has 5ms of data, a packet will get dropped. The most likely packet to get dropped is a packet associated with a greedy flow, signaling it to back off. It's not guaranteed, but CoDel is biased towards dropping packets from high-bandwidth flows and unlikely to drop small packets from low-bandwidth flows, like VoIP, ping, and games.
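To make the "time-sized buffer" idea concrete, here is a heavily simplified sketch of a CoDel-style queue. It is not the code pfSense runs. The 100 ms interval is taken from the published CoDel description rather than from this thread, the class and method names are made up, and it only shows the time-based drop decision, not any per-flow behavior. Times are in milliseconds.

```python
import math
from collections import deque

TARGET_MS = 5      # acceptable standing queue delay (the 5 ms mentioned above)
INTERVAL_MS = 100  # assumed window: delay must stay above target this long


class SimpleCoDel:
    """Toy CoDel-style queue: drops on time spent queued, not on queue length."""

    def __init__(self):
        self.queue = deque()     # entries are (enqueue_time_ms, packet)
        self.first_above = None  # time at which dropping may begin
        self.dropping = False
        self.drop_next = 0.0     # time of the next scheduled drop
        self.count = 0           # drops in the current dropping episode
        self.dropped = 0         # total drops, kept for inspection

    def enqueue(self, packet, now):
        self.queue.append((now, packet))

    def _pop(self, now):
        """Pop one packet; report whether delay has been too high for too long."""
        if not self.queue:
            self.first_above = None
            return None, False
        enqueued_at, packet = self.queue.popleft()
        if now - enqueued_at < TARGET_MS:
            self.first_above = None               # delay is fine, reset the timer
            return packet, False
        if self.first_above is None:
            self.first_above = now + INTERVAL_MS  # start the timer
            return packet, False
        return packet, now >= self.first_above

    def dequeue(self, now):
        packet, ok_to_drop = self._pop(now)
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False             # delay recovered, stop dropping
            while self.dropping and packet is not None and now >= self.drop_next:
                self.dropped += 1                 # discard this packet, fetch the next
                self.count += 1
                packet, ok_to_drop = self._pop(now)
                if ok_to_drop:
                    self.drop_next += INTERVAL_MS / math.sqrt(self.count)
                else:
                    self.dropping = False
        elif ok_to_drop:
            self.dropped += 1                     # first drop of a new episode
            packet, _ = self._pop(now)
            self.dropping = True
            self.count = 1
            self.drop_next = now + INTERVAL_MS / math.sqrt(self.count)
        return packet


q = SimpleCoDel()
for i in range(100):
    q.enqueue(i, now=0)                           # a 100-packet burst at t = 0
delivered = [q.dequeue(now=t) for t in range(10, 120, 10)]
print(delivered, "dropped:", q.dropped)
```

In this run the queue drains one packet every 10 ms, so every packet waits longer than 5 ms. Nothing is dropped until that has been true for a full 100 ms; the first drop happens on the dequeue at 110 ms. A queue whose packets wait less than 5 ms never drops anything, no matter how long it runs.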

To set your queue size, edit your traffic shaper queue in the "Queue Limit" setting. When not set, it's 50. I think the queue size is meaningless when CoDel is selected.

[Attachment: QueueSizeExample.png]

mattlach

Thank you very much for the explanation.

What is CoDel, and how do I set it? I can't seem to find it anywhere in the traffic shaper menus.

mattlach

Alright, so I did the test I said I was going to do above.

I reset pfsense to factory settings to make sure nothing old was hiding in the config.

Ran bandwidth test without traffic shaping, got my full ~160/160

Again, did the HFSC traffic shaper wizard, and wound up with the same results as before, very slow.

So I went back again to a saved config I had from before doing the traffic shaper wizard, and instead went to the "by interface" screen and added codelq there for both LAN and WAN. It was confusing though, because this thread suggested no bandwidth needed to be entered for codelq, as its operation is bandwidth independent and it adjusts on its own, but the screen would not let me apply the settings without entering a bandwidth.

It was unclear to me which bandwidth this should be though. The wizard gave me an "upstream" and a "downstream" bandwidth. Here it is just one per interface. The tooltip said that "this is usually the interface bandwidth", so I set both to 1 Gbit/s. Was this the right thing to do, or should I have chosen something closer to my external bandwidth (or just under, like with HFSC), and in that case which do I enter where, upstream and downstream wise?

Anyway, I did a bandwidth test after applying codelq, and it appears as if I have my full bandwidth, but how do I know it is actually working and doing its magic in the background?

Again, appreciate all your help.

–Matt

Nullity

The answers to your questions are out there. Searching this forum & Google can answer everything you are asking.

Sorry for the callousness, but I spent ~9 months researching traffic-shaping (mostly relating to pfSense) and I know for a fact that you can find the answers if you search & read. If, after exhaustively researching traffic-shaping, you still have an unsolved problem, please ask. I (and very likely others) will be interested in solving your problem.

Otherwise, much smarter and more educated people have already answered your questions. You simply need to find those answers. ermal's posts in this forum are a favorite resource of mine, but most of his topics may be too advanced. Easy solutions are rare. :)

Harvy66

Shaping is bloody simple, but everything you read out there makes it seem hard for some reason. I'll see about creating some screenshots some time this weekend.

There are only four things you need to do with HFSC:

1. Set your interface bandwidth
2. Create your queues
3. Set your queue minimum bandwidths
4. Assign traffic to your queues

#1 seems to be the #1 reason for issues. I have no idea how this part is so hard. If you only have 10Mb of bandwidth, set your interface to only have 9Mb.
#2 seems to be the next issue. Stop creating complicated hierarchical queues; just create three: high, normal, and idle.
#3 I can't explain this, same as #1. Stop thinking about priorities and blah blah blah. How much minimum bandwidth do you want for this queue?
#4 The wizard does create these for you. Unlike normal firewall rules, for floating rules the last match wins.
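A minimal sketch of what "last match wins" means for step #4. The queue names, ports, and rule structure below are hypothetical examples, not what the wizard actually generates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    queue: str
    proto: Optional[str] = None     # None matches any protocol
    dst_port: Optional[int] = None  # None matches any port

    def matches(self, proto, dst_port):
        return (self.proto in (None, proto)
                and self.dst_port in (None, dst_port))


def classify(rules, proto, dst_port, default="qNormal"):
    """Return the queue named by the LAST rule that matches the packet."""
    chosen = default
    for rule in rules:
        if rule.matches(proto, dst_port):
            chosen = rule.queue  # later matches override earlier ones
    return chosen


rules = [
    Rule(queue="qNormal"),                             # catch-all goes first
    Rule(queue="qHigh", proto="udp", dst_port=5060),   # hypothetical VoIP rule
    Rule(queue="qIdle", proto="tcp", dst_port=6881),   # hypothetical P2P rule
]

print(classify(rules, "udp", 5060))
print(classify(rules, "tcp", 443))
print(classify(rules, "tcp", 6881))
```

With this ordering the catch-all has to come first. If it were placed last, it would override every more specific rule above it.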

P.S. Don't use realtime, it's confusing, seems to have little benefit in most cases, and is easy to mess up

-flo-

@Harvy66:

Set your interface bandwidth
[…]
#1 seems to be the #1 reason for issue. I have no idea how this part is so hard.

I do.

I worked quite a while on this until I found out that the selection 'kbit/s' in the bandwidth setting was plain wrong. It should have been 'bit/s'. (Don't know whether this has been fixed in 2.3 but the bug is still in 2.2.6.)

If as a consequence of this bug I set the bandwidth limit to effectively 10 bit/s (intending it to be 10 kbit/s) the performance was of course completely off …

-flo-

Harvy66

Here is my setup. I'm only showing the WAN because the LAN is mostly identical, just with slightly lower interface bandwidth. Some of my settings, like queue size when using CoDel, are probably not correct, and I have funny percentages, but this is exactly what I use. Take it, leave it, learn from it. Do whatever.

[Attachment: WAN.png]
[Attachment: WAN-qACK.png]
[Attachment: WAN-qClassified-Parent.png]
[Attachment: WAN-qClassified-qHigh.png]
[Attachment: WAN-qClassified-qNormal.png]
[Attachment: WAN-qUnclassified-Parent.png]
[Attachment: WAN-qUnclassified-qDefault.png]
[Attachment: WAN-qUnclassified-qUDP.png]

Nullity

@pf3000:

@-flo-:

I worked quite a while on this until I found out that the selection 'kbit/s' in the bandwidth setting was plain wrong. It should have been 'bit/s'. (Don't know whether this has been fixed in 2.3 but the bug is still in 2.2.6.)
If as a consequence of this bug I set the bandwidth limit to effectively 10 bit/s (intending it to be 10 kbit/s) the performance was of course completely off …

-flo-

Thanks for pointing that out. So it's actually bit labeled as Kbit, huh? I was confused when the speed test showed 10% of my actual download after setting the bandwidth in Kbit. I settled for Mbit and it's fine. However, setting up the bandwidth in Kbit via the wizard is okay. Editing it to Mbit afterwards and then back to Kbit messes it up for me. It's like that on both 2.3 and 2.3.1.

Hmm… I have used pfSense since 2.1.x and kbit has always been kbit. With 2.3 I am using kbit with no problems.

I extensively tested 2.2.x using mostly "kbit" and never experienced this bug. Are we sure it even exists? Has it been submitted to the pfSense bug tracker?

-flo-

@Nullity:

I extensively tested 2.2.x using mostly "kbit" and never experienced this bug. Are we sure it even exists?

I already reported this nearly one year ago, see this thread: Noob guide to Traffic Shaping. You already commented on that thread back then.

At least one user confirmed this problem, others didn't.

@Nullity:

Has it been submitted to the pfSense bug tracker?

I didn't, should I?

-flo-

Nullity

@-flo-:

@Nullity:

Has it been submitted to the pfSense bug tracker?

I didn't, should I?

-flo-

Absolutely.

You would need to share some detailed, repeatable steps that consistently show that "kbit" is actually "bit", then someone will either confirm the bug (and fix it) or have further questions.

It might just be easier to post the info here and someone more experienced will submit a bug.

pf3000

It's my mistake; I couldn't duplicate the issue. I knew that the traffic shaper wizard always worked, and my internet upload speed is 20% of my download.

While editing the shaper, I may have plugged the up/down bandwidth numbers into LAN/WAN instead of the other way round.
I also had "Disable hardware large receive offload" unchecked, which hinders my real speed test results.

Hence I got a speed test result of about 10% of the download speed, leading me to think that kbits might be bits. Sorry for the confusion :D
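For anyone else unsure which number goes on which interface, here is a small sketch. It assumes the queues shape traffic as it leaves an interface, so uploads are shaped on WAN and downloads on LAN. The function name and the 97% fraction are illustrative, and the speeds are the ones from the first post.

```python
def interface_bandwidths(download_mbit, upload_mbit, fraction=0.97):
    """Map measured speeds to per-interface shaper bandwidths.

    Assumption: queues act on traffic leaving an interface, so
    uploads leave via WAN and downloads leave via LAN.
    """
    return {
        "WAN": round(upload_mbit * fraction, 2),
        "LAN": round(download_mbit * fraction, 2),
    }


limits = interface_bandwidths(download_mbit=152, upload_mbit=160)
print(limits)
```

On an asymmetric line, swapping the two caps the download at roughly the upload speed, which matches the kind of result described above.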