Interesting Queue limit issue
-
I have an interesting "issue", you might say. I have a 50/50 connection that is rate-limited upstream by a Cisco core router. The problem is that this router doesn't start limiting my rate immediately, which allows short bursts of packets through before it begins shaping down to my limit.
These are incredibly short bursts. If my connection is completely idle and I start streaming a YouTube video, I get a nearly perfect vertical line on my bandwidth graph, going from 0 Mb/s to almost 200 Mb/s in one sample. The actual burst lasts well under a second, but the average over that sample is 200 Mb/s, meaning the router is effectively letting traffic through at about 1 Gb/s for a small window. Once my connection is actively transferring data, this is not an issue.
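Rough math on the burst size, assuming a one-second sampling window on my graph and a 1 Gb/s line rate (both of which fit what I'm seeing):

    200 Mb/s average over a 1 s sample = 200 Mb let through in that sample
    200 Mb at a 1 Gb/s line rate       = a burst lasting roughly 200 ms
    200 Mb / 8                         = ~25 MB arriving before shaping kicks in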
I finally decided to enable HFSC shaping to help with this. TCP was causing minor packet loss and jitter from time to time, because my side would just ACK all of that data and the other side would be like "ohh, that was quick, send more". My previous way to "solve" this was to let BitTorrent just keep running; as long as a transfer keeps the connection relatively busy, their shaping algorithm works fine.
HFSC is working quite nicely. Both my upload and download no longer show that massive spike, and my pings are unaffected. Great!.. well, not entirely. Minor issue: the microbursting still seems to be causing problems. See that "qOthersLow" queue in the pfTop output below? That is my HTTP/HTTPS traffic, and it still drops packets from time to time even with a queue length of 1,000. It was originally set to 50, but Google Plus would have trouble loading because too many packets were being dropped; the burst of packets was so great it flooded the queue, and I actually got TCP connection issues. I upped it to 250, which helped, but I still sometimes got a "Connection issue" message on parts of the web page. So I set it to 1,000, and it seems a few still get dropped.
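For reference, the relevant part of the shaper looks something like this in pf.conf/ALTQ terms. This is a hand-written sketch of the tree on my LAN interface (igb1), where the download drops happen, not a dump of what pfSense actually generates:

    # sketch of the LAN-side (igb1) ALTQ tree; downloads are shaped here
    altq on igb1 hfsc bandwidth 1Gb queue { qLink, qInternet }
    queue qLink     on igb1 bandwidth 200Mb hfsc ( default )
    queue qInternet on igb1 bandwidth 48Mb hfsc ( linkshare 48Mb upperlimit 48Mb ) \
        { qACK, qP2P, qGames, qOthersHigh, qOthersLow, qDefault }
    queue qACK        on igb1 bandwidth 9600Kb hfsc ( linkshare 9600Kb )
    queue qP2P        on igb1 bandwidth 2400Kb hfsc ( linkshare 2400Kb )
    queue qGames      on igb1 bandwidth 9600Kb hfsc ( linkshare 9600Kb )
    queue qOthersHigh on igb1 bandwidth 4800Kb hfsc ( linkshare 4800Kb )
    # qlimit is the knob I keep raising; packets past it get dropped
    queue qOthersLow  on igb1 bandwidth 4800Kb qlimit 1000 hfsc ( linkshare 4800Kb )
    queue qDefault    on igb1 bandwidth 4800Kb hfsc ( linkshare 4800Kb )

The qlimit on qOthersLow is the part in question: during a burst, anything beyond those 1,000 slots is what shows up in the DROP_P/DROP_B columns.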
I see two possible ways to alleviate this problem: raise the queue length beyond 1,000, or force my WAN port to 100/100 instead of letting it auto-negotiate to 1G/1G.
The con to reducing my link speed is that it would increase jitter for all of my traffic, so I'm leaning towards just making the queue longer. Any recommendations on queue length?
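For sizing, some back-of-the-envelope math, assuming full-size 1500-byte packets and the ~25 MB worst-case burst estimated above:

    1,000 packets x 1500 B = ~1.5 MB of buffer, well short of a ~25 MB burst
    25 MB / 1500 B         = ~16,600 packets to absorb the whole thing

In practice the sender should back off after the first drops, so presumably nothing close to 16k is actually needed; I'm just trying to get a feel for the right order of magnitude.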
If you're wondering why my up is limited to 50 and my down is 48, it's because I actually get about 51 on my up and 49 on my down when doing speed tests and looking at my real-time graphs.
pfTop: Up Queue 1-16/16, View: queue

QUEUE              BW SCH  PR   PKTS  BYTES DROP_P DROP_B QLEN BORR SUSP P/S B/S
root_igb0         50M hfsc  0      0      0      0      0    0
 qACK             10M hfsc      122K  7232K      0      0    0
 qDefault       5000K hfsc     55915    20M      0      0    0
 qP2P           2500K hfsc     1178K  1247M      0      0   25
 qGames           10M hfsc     17661  1789K      0      0    0
 qOthersHigh    5000K hfsc     18042  1448K      0      0    0
 qOthersLow     2500K hfsc     73967    10M      0      0    0
root_igb1       1000M hfsc  0      0      0      0      0    0
 qLink           200M hfsc         0      0      0      0    0
 qInternet        48M hfsc         0      0      0      0    0
  qACK          9600K hfsc     24983  1339K      0      0    0
  qP2P          2400K hfsc      645K    65M      0      0    0
  qGames        9600K hfsc     28185  6329K      0      0    0
  qOthersHigh   4800K hfsc      9909   716K      0      0    0
  qOthersLow    4800K hfsc      537K   306M   1045  1446K    0
  qDefault      4800K hfsc     69099    14M      0      0    0
Thanks for any thoughts or ideas. Either way, my Internet is still great, but dropping packets at sub-saturation levels of bandwidth is a bit annoying, especially when it sometimes causes issues with some web pages.
-
I just set my receive queues to 2.5k. This is pretty much an issue limited to traffic that can burst in quickly. Because interactive streams, like games, are on their own separate queues with reserved bandwidth, it seems to affect nothing outside its own queue. Because pfSense rate-limits the burst down to 48 Mb/s in a much smoother fashion than the Cisco does, my machine cannot ACK data it has not yet received, so the other side backs down. It seems it's not so much the burst itself causing issues, but that my machine would normally ACK all of the data in that burst as quickly as it came in, signalling to the other side that I'm ready to receive more, when it really needs to back off before the Cisco clamps down hard.
This is mostly me just theorizing, but I am seeing much better results.
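To put rough numbers on the ACK-clock idea, assuming ~1500-byte segments and delayed ACKs (one ACK per two segments):

    48 Mb/s / (1500 B x 8) = ~4,000 segments/s actually delivered to my machine
    ~4,000 / 2             = ~2,000 ACKs/s heading back to the sender

So no matter how hard the far end bursts, the ACKs it gets back only clock out about 48 Mb/s worth of new data.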
I did find that I need to limit my P2P queue size. During the ramp-up of a heavily seeded torrent, like Fedora, the hundreds of sending endpoints would still peak at over 50 Mb/s on my WAN interface before leveling off, even though pfSense was making sure I only got 48 Mb/s. So while a large queue to soak up the burst from a single sender works fine, a large queue for many senders all ramping up at the same time can cause issues.
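The delay math suggests why, assuming full-size packets and the 2,400 Kb/s qP2P rate from the pfTop output above:

    1,000 packets x 1500 B x 8 = 12 Mb sitting in the queue when it's full
    12 Mb / 2.4 Mb/s           = ~5 s of buffering delay at that queue's rate

A queue that deep on a class that narrow means the loss signal reaches each sender seconds late, so hundreds of them happily keep ramping.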
P2P also has a lot less burst than Google services; I don't really see 1 Gb/s micro-bursting from torrents. If I remember correctly, Google uses a custom TCP setup where they purposefully burst the first X bytes at or near full line rate, to make better use of available bandwidth, and let network buffers worry about the bursts. The "problem" is that between my ISP and Google sits Level 3, with no congestion, so that 1 Gb/s burst rides right on through 8 hops and 250 miles.
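For what it's worth, even plain slow start ramps frighteningly fast from a large initial window. A rough sketch, assuming the 10-segment initial congestion window Google pushed for (RFC 6928), ~1460-byte segments, a ~15 ms RTT, and no loss along the way:

    RTT 1:   10 segments  = ~15 KB
    RTT 2:   20 segments  = ~29 KB
    ...
    RTT 8: 1280 segments  = ~1.9 MB in a single round trip

1.9 MB per 15 ms round trip is right around 1 Gb/s, i.e. roughly eight round trips (~120 ms) to hit line rate if nothing on the path pushes back, which lines up with the sub-second bursts I'm seeing.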