Interesting Queue limit issue
-
I have an interesting "issue", you might say. I have a 50/50 connection that is rate-limited upstream by a Cisco core router. The problem is that this router doesn't start limiting my rate immediately, which allows short bursts of packets through before it begins shaping down to my limit.
These are incredibly short bursts. If my connection is completely idle and I start streaming a YouTube video, I get a nearly perfect vertical line on my bandwidth graph, going from 0 Mb/s to almost 200 Mb/s in one sample. The actual burst lasts well under a second, but the average over that sample is 200 Mb/s, meaning the router is effectively letting traffic through at about 1 Gb/s for a small window. Once my connection is actively transferring data, this is not an issue.
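Rough math on the burst size, assuming a one-second sampling window on my graph and a 1 Gb/s line rate (both of which fit what I'm seeing):

    200 Mb/s average over a 1 s sample = 200 Mb let through in that sample
    200 Mb at a 1 Gb/s line rate       = a burst lasting roughly 200 ms
    200 Mb / 8                         = ~25 MB arriving before shaping kicks in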
I finally decided to enable HFSC shaping to help with this. TCP was causing minor packet loss and jitter from time to time, because my side would just ACK all of that data and the other side would be like "ohh, that was quick, send more". My previous way to "solve" this was to let BitTorrent just keep running; as long as a transfer keeps the connection relatively busy, their shaping algorithm works fine.
HFSC is working quite nicely. Both my upload and download no longer show that massive spike, and my pings are unaffected. Great!.. well, not entirely. Minor issue: the microbursting still seems to be causing problems. See that "qOthersLow" queue in the pfTop output below? That is my HTTP/HTTPS traffic, and it still drops packets from time to time even with a queue length of 1,000. It was originally set to 50, but Google Plus would have trouble loading because too many packets were being dropped; the burst of packets was so great it flooded the queue, and I actually got TCP connection issues. I upped it to 250, which helped, but I still sometimes got a "Connection issue" message on parts of the web page. So I set it to 1,000, and it seems a few still get dropped.
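For reference, the relevant part of the shaper looks something like this in pf.conf/ALTQ terms. This is a hand-written sketch of the tree on my LAN interface (igb1), where the download drops happen, not a dump of what pfSense actually generates:

    # sketch of the LAN-side (igb1) ALTQ tree; downloads are shaped here
    altq on igb1 hfsc bandwidth 1Gb queue { qLink, qInternet }
    queue qLink     on igb1 bandwidth 200Mb hfsc ( default )
    queue qInternet on igb1 bandwidth 48Mb hfsc ( linkshare 48Mb upperlimit 48Mb ) \
        { qACK, qP2P, qGames, qOthersHigh, qOthersLow, qDefault }
    queue qACK        on igb1 bandwidth 9600Kb hfsc ( linkshare 9600Kb )
    queue qP2P        on igb1 bandwidth 2400Kb hfsc ( linkshare 2400Kb )
    queue qGames      on igb1 bandwidth 9600Kb hfsc ( linkshare 9600Kb )
    queue qOthersHigh on igb1 bandwidth 4800Kb hfsc ( linkshare 4800Kb )
    # qlimit is the knob I keep raising; packets past it get dropped
    queue qOthersLow  on igb1 bandwidth 4800Kb qlimit 1000 hfsc ( linkshare 4800Kb )
    queue qDefault    on igb1 bandwidth 4800Kb hfsc ( linkshare 4800Kb )

The qlimit on qOthersLow is the part in question: during a burst, anything beyond those 1,000 slots is what shows up in the DROP_P/DROP_B columns.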
I see two possible ways to alleviate this problem: raise the queue length beyond 1,000, or force my WAN port to 100/100 instead of letting it auto-negotiate to 1G/1G.
The con to reducing my link speed is that it would increase jitter for all of my traffic, so I'm leaning towards just making the queue longer. Any recommendations on queue length?
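For sizing, some back-of-the-envelope math, assuming full-size 1500-byte packets and the ~25 MB worst-case burst estimated above:

    1,000 packets x 1500 B = ~1.5 MB of buffer, well short of a ~25 MB burst
    25 MB / 1500 B         = ~16,600 packets to absorb the whole thing

In practice the sender should back off after the first drops, so presumably nothing close to 16k is actually needed; I'm just trying to get a feel for the right order of magnitude.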
If you're wondering why my up is limited to 50 and my down is 48, it's because I actually get about 51 on my up and 49 on my down when doing speed tests and looking at my real-time graphs.
pfTop: Up Queue 1-16/16, View: queue

QUEUE              BW SCH  PR   PKTS  BYTES DROP_P DROP_B QLEN BORR SUSP P/S B/S
root_igb0         50M hfsc  0      0      0      0      0    0
 qACK             10M hfsc      122K  7232K      0      0    0
 qDefault       5000K hfsc     55915    20M      0      0    0
 qP2P           2500K hfsc     1178K  1247M      0      0   25
 qGames           10M hfsc     17661  1789K      0      0    0
 qOthersHigh    5000K hfsc     18042  1448K      0      0    0
 qOthersLow     2500K hfsc     73967    10M      0      0    0
root_igb1       1000M hfsc  0      0      0      0      0    0
 qLink           200M hfsc         0      0      0      0    0
 qInternet        48M hfsc         0      0      0      0    0
  qACK          9600K hfsc     24983  1339K      0      0    0
  qP2P          2400K hfsc      645K    65M      0      0    0
  qGames        9600K hfsc     28185  6329K      0      0    0
  qOthersHigh   4800K hfsc      9909   716K      0      0    0
  qOthersLow    4800K hfsc      537K   306M   1045  1446K    0
  qDefault      4800K hfsc     69099    14M      0      0    0
Thanks for any thoughts or ideas. Either way, my Internet is still great, but dropping packets at sub-saturation levels of bandwidth is a bit annoying, especially when it sometimes causes issues with some web pages.
-
I just set my receive queues to 2.5k. This is pretty much an issue limited to traffic that can burst in quickly. Because interactive streams, like games, are on their own separate queues with reserved bandwidth, it seems to affect nothing outside its own queue. Because pfSense rate-limits the burst down to 48 Mb/s in a much smoother fashion than the Cisco does, my machine cannot ACK data it has not yet received, so the other side backs down. It seems it's not so much the burst itself causing issues, but that my machine would normally ACK all of the data in that burst as quickly as it came in, signalling to the other side that I'm ready to receive more, when it really needs to back off before the Cisco clamps down hard.
This is mostly me just theorizing, but I am seeing much better results.
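To put rough numbers on the ACK-clock idea, assuming ~1500-byte segments and delayed ACKs (one ACK per two segments):

    48 Mb/s / (1500 B x 8) = ~4,000 segments/s actually delivered to my machine
    ~4,000 / 2             = ~2,000 ACKs/s heading back to the sender

So no matter how hard the far end bursts, the ACKs it gets back only clock out about 48 Mb/s worth of new data.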
I did find that I need to limit my P2P queue size. During the ramp-up of a heavily seeded torrent, like Fedora, the hundreds of sending endpoints would still peak at over 50 Mb/s on my WAN interface before leveling off, even though pfSense was making sure I only got 48 Mb/s. So while a large queue to soak up the burst from a single sender works fine, a large queue for many senders all ramping up at the same time can cause issues.
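The delay math suggests why, assuming full-size packets and the 2,400 Kb/s qP2P rate from the pfTop output above:

    1,000 packets x 1500 B x 8 = 12 Mb sitting in the queue when it's full
    12 Mb / 2.4 Mb/s           = ~5 s of buffering delay at that queue's rate

A queue that deep on a class that narrow means the loss signal reaches each sender seconds late, so hundreds of them happily keep ramping.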
P2P also has a lot less burst than Google services; I don't really see 1 Gb/s micro-bursting from torrents. If I remember correctly, Google uses a custom TCP setup where they purposefully burst the first X bytes at or near full line rate, to make better use of available bandwidth, and let network buffers worry about the bursts. The "problem" is that between my ISP and Google sits Level 3, with no congestion, so that 1 Gb/s burst rides right on through 8 hops and 250 miles.
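For what it's worth, even plain slow start ramps frighteningly fast from a large initial window. A rough sketch, assuming the 10-segment initial congestion window Google pushed for (RFC 6928), ~1460-byte segments, a ~15 ms RTT, and no loss along the way:

    RTT 1:   10 segments  = ~15 KB
    RTT 2:   20 segments  = ~29 KB
    ...
    RTT 8: 1280 segments  = ~1.9 MB in a single round trip

1.9 MB per 15 ms round trip is right around 1 Gb/s, i.e. roughly eight round trips (~120 ms) to hit line rate if nothing on the path pushes back, which lines up with the sub-second bursts I'm seeing.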