bufferbloat with fq_codel after update to 23.01
-
Edit:
The observations below are wrong. As stated in a later post in this thread, I observe the same behaviour in the previous version (22.05).
Original description:
Hi,
I wonder if I am the only one observing worse fq_codel performance after upgrading to 23.01.
Before upgrade
With 22.05 I had an almost standard fq_codel configuration for my 600/30 Mbps connection:
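For reference, the GUI limiters correspond roughly to the dnctl setup sketched below. The pipe numbers, bandwidth values and fq_codel parameters are illustrative (typical defaults), not copied from my configuration:

# Download limiter: a pipe set a bit below the 600 Mbps line rate, with an
# fq_codel scheduler attached (values are illustrative)
dnctl pipe 1 config bw 555Mbit/s
dnctl sched 1 config pipe 1 type fq_codel target 5ms interval 100ms quantum 1514 limit 10240 flows 1024

# Upload limiter, a bit below the 30 Mbps line rate
dnctl pipe 2 config bw 28Mbit/s
dnctl sched 2 config pipe 2 type fq_codel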
Each time I ran the bufferbloat test I got something like the result below (at most a +/- 3 ms difference depending on the day). I have been getting results like that for months.
After upgrade
Now when I check bufferbloat on 23.01 (same configuration) the test looks the same for the first half of the download test, but after a few seconds it gradually gets worse. An example of one of the better results I got is below.
The change isn't big, but on 23.01 I have never got anything better than 40 ms for the "Download Active" result.
Additional notes
- I'm using Proxmox, and for the test above:
- I restored the backup with 22.05 and ran the tests
- I restored the backup with 23.01 and ran the second test
- so all of it was run multiple times within 10 minutes
- before each test I restarted pfSense (it doesn't change the results, but I wanted to be sure I was doing it properly)
- I'm using passthrough for the NICs, and pciconf -lv reports them as igc devices with device name 'Ethernet Controller I225-V' (an example of what that output looks like is sketched below)
- CPU: Intel(R) Celeron(R) J4125 @ 2.00GHz
- In case it is relevant, I'm using pfBlocker
If additional info is needed please let me know.
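For completeness, the relevant pciconf -lv entry looks roughly like the sketch below; the PCI address, revision and IDs are typical values for an I225-V rather than copied from my system:

igc0@pci0:0:16:0:  class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x15f3 subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller I225-V'
    class      = network
    subclass   = ethernet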
-
@tomashk said in bufferbloat with fq_codel after update to 23.01:
In case it is needed I'm using pfblocker
If you have Wildcard Blocking (TLD) enabled ensure it isn't chewing up CPU during the test:
https://redmine.pfsense.org/issues/13884
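A quick way to check that during a test run, assuming console or SSH access (which process names show up busy is a guess on my part; unbound or a pfb/php process would be the likely suspects):

# Watch per-thread CPU usage while the bufferbloat test runs and see what,
# if anything, climbs to the top of the list.
top -aSH -s 1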
-
@steveits said in bufferbloat with fq_codel after update to 23.01:
If you have Wildcard Blocking (TLD) enabled ensure it isn't chewing up CPU during the test:
https://redmine.pfsense.org/issues/13884
I don't have it enabled.
-
Also, I wanted to rule out any additional factor that might make those observations invalid. For example, do you know of other sites that test bufferbloat? I used https://www.waveform.com/tools/bufferbloat, but maybe I should check with something else. I don't know whether http://www.dslreports.com/ is good, because for me it only offers one server.
-
@tomashk I did a check of my install against Waveform, and I do not see a degradation with 23.01.
First thing I would do is to check CPU utilization during the test and see if you are seeing high CPU utilization. If so, what's using up the CPU?
For reference, I am on a 6100 (Atom). My connection is a bit slower than yours at 300/30; however, I see peak CPU utilization of under 20%. My test results are here if you want to see them.
FWIW, depending upon how your ISP implements its own rate limiting, your limiter setting of 555Mb is probably a bit high. Even in your prior results, you have a significant bump in latency. You might try lowering the limiter rate to get that under control. My guess would be somewhere around the 510-525Mb range.
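If it helps while experimenting, the rate can also be changed from a shell without reapplying the GUI config each time. A rough sketch (the pipe number is an assumption, check the output of the first command, and note that reconfiguring a pipe this way may reset its other parameters until the limiter is reapplied from the GUI):

# Find the pipe numbers and current bandwidth settings
dnctl pipe show

# Temporarily try a lower download rate for a test run
dnctl pipe 1 config bw 520Mbit/s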
-
@dennypage Thank you for your suggestion. I'll check it out later. And just for reference - are your settings for fq_codel similar or completely different?
Also, I think it would be great if someone could provide a good way to profile limiters like this, because right now I'm just changing the settings at random and hoping for the best :)
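For now, the closest thing I have to profiling is polling the limiter state from a shell while a test runs (the same data the Diagnostics > Limiter Info page shows); a rough sketch:

# Refresh the limiter and queue counters once a second during a test; the
# queue lengths and drop counters show whether fq_codel is actually engaging.
while true; do
    clear
    date
    dnctl pipe show
    dnctl queue show
    sleep 1
done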
-
@tomashk said in bufferbloat with fq_codel after update to 23.01:
@dennypage And just for reference - are your settings for fq_codel similar or completely different?
From what I can see in your status snapshot, they are the same except for the bandwidth settings.
-
@dennypage Looks like I'll have to do a bit more research. Even with the limiter at 555Mbps the CPU was below 50%. I tried lowering the speed, but the funny thing is that even when I changed it to 450Mbps for download, the results were maybe 5-10ms better. Obviously something is wrong, but I suspect there is something special about my configuration :). I'll probably come back to this once I've had a bit more time to observe it, or maybe compare its behavior with 22.05.
-
@tomashk said in bufferbloat with fq_codel after update to 23.01:
@dennypage Even with the limiter at 555Mbps the CPU was below 50%. I tried lowering the speed, but the funny thing is that even when I changed it to 450Mbps for download, the results were maybe 5-10ms better.
What is your hardware? Approaching 50% seems pretty high.
When you lowered to 450Mb, what was your throughput?
-
@dennypage So I found a temporary solution. I set it to 550Mbps and changed the settings for the VM with pfSense: I had 2 cores assigned and changed it to 4. Now I'm getting:
So that's acceptable to me while I'm playing with the settings. For the moment I have to stop, because other people will be using the network.
I have an Intel(R) Celeron(R) J4125, so only 4 cores, but this Proxmox host only runs this pfSense VM and a container with the UniFi controller, so it will do for now.
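For anyone trying the same thing from the Proxmox CLI instead of the web UI, the change amounts to something like this (VM ID 100 is just a placeholder):

# Give the pfSense VM all 4 cores of the J4125
qm set 100 --cores 4
# (setting the CPU type to 'host' is also worth checking: qm set 100 --cpu host)
# The change takes effect after a full stop/start of the VM
qm shutdown 100 && qm start 100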
-
@tomashk I'm glad you found a solution.
-
It seems I was wrong. After giving it 4 cores, it just works a little better and gets an A once in a while - maybe once or twice per 20 tests. On the dashboard, the maximum CPU usage I saw was between 20 and 25%.
top -HaSP
doesn't show anything working hard either. I guess at this point I should ask for investigation tips. This could be anything now:
- some bug in the limiter implementation
- something between the new kernel and proxmox
- Neighbour made a voodoo doll to influence my router ;)
You never know.
Of course I'll share if I learn something useful and I'm grateful for any suggestions.
I'm also going to look at version 22.05 a bit more closely. It may be that the same problem exists there and I haven't investigated it well enough.
-
@tomashk This test site is highly dependent on your ISP, peering, and maybe the time of day.
A better test would be to ping something nearby while using speedtest.net at the same time.
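Something along these lines from a LAN client, for example (9.9.9.9 is just a placeholder target; the first hop past your ISP from a traceroute is an even better "something near"):

# Start the ping, kick off a speedtest.net run about 10 seconds in, and watch
# how far the round-trip times rise while the link is saturated.
ping -c 90 9.9.9.9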
-
@bob-dig Thanks for the suggestion. I hadn't thought of it that way.
I guess I was right to take another look at the older version (22.05): after a short test, I found a similar problem there as well. Since 22.05 had worked fine for me for months (in terms of bufferbloat), I assume something has changed recently. So I will focus on what seems more likely:
- my ISP has changed something (it's a DOCSIS connection)
- something has changed in Proxmox
- my configuration (pfSense or Proxmox) is not very good
So I have a lot to analyze. Sorry for the initial wrong guess; I should have checked version 22.05 more thoroughly and not assumed that because it worked before, it would work now.
But I'll try that later, because now I'd be disturbing others on the same network.
-
Got this...
-
@tomashk One thing to bring forward, then: when you lowered to 450Mb, what was your throughput?
The reason that I asked was to confirm that your limiter assignment rules were actually being hit.
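One way to sanity-check it from a shell, beyond the dashboard graphs: the generated pf rules that send traffic into a limiter contain dnpipe, and the verbose rule listing includes per-rule packet counters. A rough sketch (note the counters, run a test, and confirm they increase):

# Show the limiter assignment rules together with their evaluation and
# packet counters; run before and after a speed test and compare.
pfctl -vv -sr | grep -A3 dnpipe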
-
@dennypage I'll test it again when I get back, but I'm pretty sure the limiter was being used. When I was testing I usually had the dashboard open, and sometimes also the Limiter Info page. The limiter info showed the usual increases in packets, drops, etc. when the test started. The dashboard showed traffic graphs:
- for LAN the rate matched what was set for the limiter, +/- 5 Mbps
- for WAN I saw about 5 to 10 Mbps more traffic than the LAN output
Since I've been wrong a few times already, I'll check again later and post whether I remembered correctly.
-
@tomashk, I am using floating rules to perform limiter assignment, with no ackqueue. FWIW, I also have a floating rule just before those to exclude ICMP echo request/reply from the limiter rules.
-
You're not the only one. I am seeing terrible bufferbloat performance after upgrading. This and some other CPU-related issues have caused me to revert to 22.06 (thx ZFS!).