Proportionate increase in bandwidth usage and ping

JenovaMantooth

After searching the pfSense forums up and down, I sometimes run into posts that sound similar, but they are never resolved and I'm not sure they are exactly my issue. The closest I've found is this post:
https://forum.pfsense.org/index.php?topic=143116.msg780424

I have a home network that has almost become SOHO. 5 PCs under constant use, a multitude of IoT devices like Alexa and lights. Several Raspberry Pis running silly things like web servers and uTorrent.
A simple DD-WRT router as a gateway is seemingly overloaded, with GUI lag, as I've added one dumb switch after another to accommodate the wired connections. Not to mention the 20+ wireless devices.

As plainly put as possible: on a new install of 2.4.2 with an HP NC360T in an older AMD A4-5300B SFF, my ping over the WAN lags in proportion to the load. Running a simple ping google.com -t seems to accurately capture the issue I see with applications like Discord and Steam games where ping is critical.

At idle the ping looks like this:
Reply from 172.217.1.206: bytes=32 time=51ms TTL=53
Reply from 172.217.1.206: bytes=32 time=52ms TTL=53
Reply from 172.217.1.206: bytes=32 time=50ms TTL=53
Reply from 172.217.1.206: bytes=32 time=51ms TTL=53
[repeat]

When a FireTV is streaming a program the ping will look like this…
Reply from 172.217.1.206: bytes=32 time=51ms TTL=53
Reply from 172.217.1.206: bytes=32 time=54ms TTL=53
Reply from 172.217.1.206: bytes=32 time=50ms TTL=53
Reply from 172.217.1.206: bytes=32 time=252ms TTL=53
Reply from 172.217.1.206: bytes=32 time=51ms TTL=53
[repeat]

With the above example of a streaming program, the FireTV seems to fill its buffer and then grab a pulse of data periodically to refill. 65Mbps down and 7Mbps up is what my cable WAN looks like.

If I do something that really hammers the connection, like a speed test, I get something like this…
Reply from 172.217.1.206: bytes=32 time=51ms TTL=53
Reply from 172.217.1.206: bytes=32 time=54ms TTL=53
Reply from 172.217.1.206: bytes=32 time=50ms TTL=53
[ Idle, then download test starts ]
Reply from 172.217.1.206: bytes=32 time=98ms TTL=53
Reply from 172.217.1.206: bytes=32 time=78ms TTL=53
Reply from 172.217.1.206: bytes=32 time=88ms TTL=53
Reply from 172.217.1.206: bytes=32 time=92ms TTL=53
[ Upload test starts ]
Reply from 172.217.1.206: bytes=32 time=128ms TTL=53
Reply from 172.217.1.206: bytes=32 time=156ms TTL=53
Reply from 172.217.1.206: bytes=32 time=168ms TTL=53
Reply from 172.217.1.206: bytes=32 time=134ms TTL=53
Request timed out.
Reply from 172.217.1.206: bytes=32 time=175ms TTL=53
Reply from 172.217.1.206: bytes=32 time=137ms TTL=53
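
For anyone who would rather compare numbers than eyeball the cmd window, here is a minimal Python sketch that summarizes pasted Windows ping output. It uses the idle and upload-test samples from this post. The function name summarize_ping and the choice of median and max as the summary are just illustrative assumptions, not anything pfSense provides.

```python
import re
import statistics

# Matches the latency field in Windows ping output, e.g. "time=51ms" or "time<1ms".
TIME_RE = re.compile(r"time[=<](\d+)ms")

def summarize_ping(lines):
    """Count replies and timeouts and report median/max latency in ms."""
    times = []
    timeouts = 0
    for line in lines:
        match = TIME_RE.search(line)
        if match:
            times.append(int(match.group(1)))
        elif "Request timed out" in line:
            timeouts += 1
    return {
        "replies": len(times),
        "timeouts": timeouts,
        "median_ms": statistics.median(times) if times else None,
        "max_ms": max(times) if times else None,
    }

# Samples copied from the post above.
idle_log = """\
Reply from 172.217.1.206: bytes=32 time=51ms TTL=53
Reply from 172.217.1.206: bytes=32 time=52ms TTL=53
Reply from 172.217.1.206: bytes=32 time=50ms TTL=53
Reply from 172.217.1.206: bytes=32 time=51ms TTL=53
"""

upload_log = """\
Reply from 172.217.1.206: bytes=32 time=128ms TTL=53
Reply from 172.217.1.206: bytes=32 time=156ms TTL=53
Reply from 172.217.1.206: bytes=32 time=168ms TTL=53
Reply from 172.217.1.206: bytes=32 time=134ms TTL=53
Request timed out.
Reply from 172.217.1.206: bytes=32 time=175ms TTL=53
Reply from 172.217.1.206: bytes=32 time=137ms TTL=53
"""

idle_stats = summarize_ping(idle_log.splitlines())
upload_stats = summarize_ping(upload_log.splitlines())

# Latency added under load, relative to the idle baseline.
added_ms = upload_stats["median_ms"] - idle_stats["median_ms"]

print("idle:  ", idle_stats)
print("upload:", upload_stats)
print("added latency under upload (median, ms):", added_ms)
```

Comparing the median under load against the idle median gives a single "added latency" figure that is easier to track across configuration changes than individual spikes.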

I have tried a variety of em(4)-related troubleshooting options, such as disabling flow control and toggling checksum offload, to no avail. By contrast, a DD-WRT router operating as a gateway does not seem to have this kind of issue. I am totally open to trying anything at this point. I would very much prefer the power and flexibility of pfSense, but I am tired of spending weekends trying to figure this out, waiting for new versions, and ultimately going back to plugging in my DD-WRT router. What information can I provide to help solve this issue and be rid of DD-WRT as a gateway?

corvey

Looks like saturation to me. Try limiting the bandwidth so it never maxes out.

JenovaMantooth

@corvey:

Looks like saturation to me. Try limiting the bandwidth so it never maxes out.

That was my initial impression, but I am not yet competent enough with traffic shaping to have ruled it out, or to share anything useful. Half the reason I lamented using DD-WRT is that these smaller routers get overloaded trying to do any reasonable QoS, let alone anything else one might want to do on the gateway. You have offered your help and it is greatly appreciated. I will take it as that. Allow me to beat myself over the head with traffic shaping for a bit until I feel worthy of holding that conversation, and I will return.

corvey

Ok, the IP you're testing with is obviously Google, and it is normal to see spikes there from time to time. Open up several cmd prompts and ping several different random locations at the same time, like your ISP, other places, etc. Come back with your results. You will see a pattern when you compare the terminal windows, and you may also be able to judge whether or not the NIC/driver is bad just by analyzing the results of pinging multiple places at once.

If, for example, you see "Request timed out" across the board in all terminals pinging different locations, then you have a problem with the NIC, the driver, or pfSense itself (assuming you otherwise have full internet access). If you see different ping times for different sites, maybe a spike or two on one particular address, or a couple of timeouts at one address but not at the others at the same time, with consistent fluctuation under load, then it's probably normal. Pings can change under congestion and load.
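
The comparison described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a diagnostic tool: the function name classify_timeouts, the host labels, and the sample values are all hypothetical, and it assumes the pings were started together so that position i in each list is roughly the same moment.

```python
def classify_timeouts(samples):
    """samples maps a target name to a list of latencies in ms.

    None stands for "Request timed out". Index i in every list is assumed
    to be the same round of pings, taken at about the same time.
    """
    hosts = list(samples)
    rounds = min(len(values) for values in samples.values())
    across_the_board = 0  # rounds where every target timed out
    isolated = 0          # rounds where only some targets timed out
    for i in range(rounds):
        lost = [host for host in hosts if samples[host][i] is None]
        if len(lost) == len(hosts):
            across_the_board += 1
        elif lost:
            isolated += 1
    if across_the_board:
        verdict = "suspect the local NIC, driver, or pfSense"
    elif isolated:
        verdict = "probably normal loss at one destination or path"
    else:
        verdict = "no loss seen"
    return across_the_board, isolated, verdict

# Hypothetical readings from three terminals pinging at the same time.
example = {
    "google":      [51, 54, None, 252, 51],
    "isp-gateway": [12, 13, 12, 14, 12],
    "other-site":  [80, 82, 81, 85, 80],
}

result = classify_timeouts(example)
print(result)
```

Only the one target loses a packet in this made-up data, so the sketch lands on the "probably normal" side. A round where every list holds None at the same index is what would point at the local box.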

Another possible reason is the ISP itself. I live out in the sticks with crappy DSL service, bonded in fact, but it works well for what it is when it's working. My ping is 30 to that IP you posted, and over the past 30 minutes I saw "Request timed out" 3 times in a row in that terminal and it spiked a couple of times, but not in the other terminals.

I think you're beating yourself up over nothing to be honest. However, as I said before, throttling the bandwidth will solve those severe saturation spikes.
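
To put rough numbers on the throttling idea, here is a back-of-the-envelope Python sketch using the 65Mbps down / 7Mbps up figures from the first post. The 10% headroom and the 128 KiB backlog are assumptions picked purely for illustration, and the function names are made up; the right values depend on the line and the modem.

```python
def limiter_rate_mbps(measured_mbps, headroom=0.10):
    """Rate to set on a limiter so the link itself never quite fills.

    headroom is the fraction held back; 0.10 is an assumed starting point.
    """
    return measured_mbps * (1.0 - headroom)

def drain_time_ms(backlog_bytes, link_mbps):
    """Time for a queued backlog to drain at the given link rate.

    A ping packet stuck behind that backlog waits about this long.
    """
    bits = backlog_bytes * 8
    return bits / (link_mbps * 1_000_000) * 1000

down_limit = limiter_rate_mbps(65)
up_limit = limiter_rate_mbps(7)

# Assumed example: 128 KiB sitting in an upstream buffer on a 7Mbps uplink.
extra_delay_ms = drain_time_ms(128 * 1024, 7)

print(f"download limiter: {down_limit:.1f} Mbps")
print(f"upload limiter:   {up_limit:.1f} Mbps")
print(f"delay behind a 128 KiB backlog at 7 Mbps: {extra_delay_ms:.0f} ms")
```

The second function shows why the upload test hurts most: on a slow uplink even a modest buffer takes on the order of 150 ms to drain, which is in the same range as the extra latency in the upload ping sample.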

SammyWoo

Obviously the first thing to check is that CPU load NEVER maxes out. After that, there is never enough Internet bandwidth: some app(s) will always try to grab the whole thing for itself, which is why one must always have a sensible QoS (traffic shaping) scheme, and there is some art to configuring the latter. So you can never have a completely free highway for the sports car to flex its muscles, but with a good QoS configuration you can minimize braking and maintain reasonable speed MOST of the time for all apps.

JenovaMantooth

Let me start by thanking everyone who responded to this post. This community and the ability to take my gateway beyond the realm of Broadcom chips is the reason I switched from DD-WRT to pfSense. Let me just say that I have been on a journey into the mouth of madness and back and have finally narrowed down what the problem is…

For anyone that is scanning past and doesn't want to hear my tale of newbie pfSense muckery let me just tell you what it was. A FireTV Ethernet adapter. Or cable. I'm going to drop another cable tomorrow to rule out the cable.

SammyWoo, good advice and I love the analogy. I use "cars on a highway" to explain all this stuff to my son quite often. In this case, the best way to describe this would be sitting at a highway onramp traffic meter with 5 out of 6 lanes completely unoccupied. In fact, in my recent experimentation to solve this issue I even made it worse. Let's say 64ms, 64ms, 64ms, 500ms, 200ms, 64ms, etc. No joke. To respond to your post, I would say that CPU on my dual-core 3.4GHz never goes above 10%, even with squid running in transparent proxy mode. And my WAN is 60/6Mbps with nowhere near full network utilization, although I would take steps to ensure that later with limiters and things.

Corvey, also good advice. I made sure to test several different points on the WAN and even from different points on my LAN. Eventually I did lose all faith and resigned myself to "as good as it gets" until I managed to take my problems from an annoyance to FUBAR.

Here's the long of it then.

Bufferbloat and QoS
First I assumed, like everyone else, that it was bufferbloat. Several things bothered me about this though. Watching the throughput, my FireTV would choke down no more than 1/5 of my bandwidth, and despite having 50+ devices on my network I can say that the whole thing got nowhere near the point of saturating the WAN. I tried the multiwan wizard… PRIQ, HFSC, made very little dent. In fact it seemed to have even added +10ms to my pings overall. I even tried manually configuring special HFSC settings from several posts regarding poor gaming performance. Nothing seemed to work, and from what I know of QoS with other gateway/firewall devices and distros, the weird lagging spike just made no sense to me. I always had some idea it was the FireTV, because I could pause it and the spikes would lessen, or even flatten out, but I was thrown off track when I would run ping tests and throughput tests that would cause similar, although less severe, ping increases. Spoiler alert: I would later attribute most of the ping increases from throughput tests to the PRIQ and HFSC queuing disciplines.

NIC Troubleshooting
As I mentioned before I tried EVERYTHING. I can tell you I scanned hundreds of posts on this forum, easily. I tested EVERYTHING and nothing made any noticeable difference.

Everything Else Under the Hood
So when you're about ready to plug in old reliable and turn your spare PC back into a ZoneMinder server, you get desperate. I tried rate limiting the FireTV. Limited it as low as 3Mbps and STILL no improvement. I tried dual WAN with a wifi connection and saw not even the slightest noticeable difference, even with the FireTV locked to the wifi connection. NAT reflection modes, static routes, IPv6, no IPv6, SLAAC, RAM disks, and I even modified any system tunable that I had the slightest understanding of. So basically I was firing in the dark. Bear in mind I must have reinstalled pfSense many dozens of times in the past couple of months.

The straw that finally broke the camel's back was SAMBA. I run a real frankenstein of a network share on this LAN. I had an old Intel dual core with 4 SATA connections, threw in a SATA adapter, and FUSE'd together about 8 different leftover drives to make a single 2TB drive. The FUSE is on top of some RAID as well, so I pride myself on its ridiculous speed and likely just as ridiculous waste of space and energy compared to a single 2TB drive. Anyway, I digress… Another problem I had, which I had ASSUMED was unrelated, was poor throughput on the SAMBA share. I mean super poor. Like 1-3Mbps on a gigabit network connection. It was so bad it affected streaming video playback. I would mess with things while also trying to diagnose my pfSense issue (as I set up both at almost the same time) and occasionally I would "fix" it back to 90MB/s. Later the problem would return and I would stress about it and flail about like a moron in much the same way I have with this pfSense issue.

Let me finally reward anyone who has read this far as much as I can. I had decided to try one last thing I had avoided, but knew was great from my QoS work on DD-WRT. FQ_CODEL. By far the superior QoS discipline, it is mindbogglingly absent from pfSense. HFSC seems to be the best scheduler in my opinion, so I left that as is. As I ran over the predominant forum post regarding hacking in FQ_CODEL usage on pfSense I figured "Hey, why not check to see if it's in the development branch for 2.4.4!". Also like a moron I decided to just upgrade to the alpha development branch and promptly destroyed the system. Not only was it not in there, but I couldn't set up more than one limiter because of some strange GUI bug.

I reinstalled back to the stable 2.4.3 and noted that, without my having done ANYTHING with the SAMBA share that was troubling me, it was back to a solid throughput. ??? Additionally, without any mucking about, the baseline ping went from 50-60ms to 20-30ms. :) The ping spikes were still there, so I managed to put together FQ_CODEL from the post. Still spiking, although DSL Reports did upgrade my bufferbloat from C to A despite occasionally jumping to 500ms during the test :o

So I yanked the FireTV ethernet cable and the spiking stopped, as I expected it would. Similar to when I just hit pause on it. But as I angrily ranted about it to my wife, I explained that the issue would happen if I hammered the WAN with a speed test, and ran a speed test… And it didn't. ??? Sooooo, maybe FQ_CODEL was doing its job then. Too bad it didn't help with the FireTV. I configured the FireTV to connect via my already congested wifi, and no spiking. Like maybe 5ms difference under load from either the FireTV or any of the computers. So maybe I'll try a different ethernet adapter, or replace a cable, or just improve my wireless network.

All of my problems seem to have been solved. Everyone is playing nicely. My son is no longer complaining that Discord is dropping messages and lagging. My network share is fast enough to host linux user home folders, maybe even Steam games. FireTV trips up on 4K, but that's due to my wifi radio dipping to 10Mbps from time to time. Long story there… It's an old E3000 and I have a lot of neighbors that have apparently covered their radios with aluminum foil. But otherwise everything is great. In fact, with FQ_CODEL I'm hitting solid green bars online and my son is irritatingly "360 no scoping" people with much greater and much more annoying frequency. Frankly, since I terminated my own Cat5e and just kinda tossed it under a carpet, I'm going to rule that out first, but let me tell you that I have noticed 0 packet errors or loss across the network, so it's probably just that garbage FireTV ethernet adapter. Frankly I'm really surprised that one malfunctioning device could have such a dramatic and penetrating effect on the network. I mean, what could cause that? Some sort of whackado multicasting? I have discount LED strips and light switches that don't even have instructions in English that don't cause problems like this, but then again, it is ethernet.

Moral of the story is to deal with one problem at a time. You'd think at my age I would have learned that by now. That, and FQ_CODEL is INDISPENSABLE and absolutely must be in the next stable release in my opinion. Luckily, being totally new to pfSense, I can say that my foray into this issue has gotten me pretty well versed in the product. I suppose you have to learn sometime.

Harvy66

I think it's "180 no scope". Memories.