pfSense 2.5.2 periodic HUGE lag spikes
-
Suffering from periodic lag/ping spikes with my pfSense install at home. It's running on a Core i5 2500T with 4GB RAM & 250GB SSD.
Lag while gaming can jump above 800ms every 1-2 minutes for around 10-20 seconds, then come back down. Pinging 8.8.8.8 during a spike shows the ping time jumping to 30ms.
My NICs:
bge0@pci0:2:0:0: class=0x020000 card=0x167714e4 chip=0x167714e4 rev=0x21 hdr=0x00
    vendor = 'Broadcom Inc. and subsidiaries'
    device = 'NetXtreme BCM5751 Gigabit Ethernet PCI Express'
    class = network
    subclass = ethernet
re0@pci0:3:0:0: class=0x020000 card=0x307c17aa chip=0x816810ec rev=0x06 hdr=0x00
    vendor = 'Realtek Semiconductor Co., Ltd.'
    device = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
bge0 is my LAN, re0 is my WAN.
My set up is like this:
Internet < TP-Link Archer (Modem Mode) < pfSense (Authenticating with PPPoE) < Switch/AP
When I first installed this I was suffering from internet disconnects, so I installed the alternative re0 driver and that solved it. It worked fine all day. Today I tried using the internet and this is happening. The internet speed test returns OK; it's just the latency that spikes.
PowerD is disabled, hardware offloading is disabled, no third party packages are installed.
Mbuf usage and state table size are sat at 0%. Memory usage at 9%.
This is my loader.conf.local
if_re_load="YES"
if_re_name="/boot/modules/if_re.ko"
kern.ipc.nmbclusters="1000000"
hw.bge.tso_enable=0
hw.pci.enable_msix=0
machdep.hyperthreading_allowed="0"
Internet spike. Only lasts 30 seconds
Ping test taken at the time of the "spike".
Ping statistics for 8.8.8.8:
    Packets: Sent = 1003, Received = 997, Lost = 6 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 7ms, Maximum = 68ms, Average = 13ms
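The "0% loss" in that summary is a whole-number figure, so it hides the 6 lost packets. A minimal Python sketch (the function name `loss_percent` is made up for illustration) recomputes the loss from the Sent/Received counts reported above:

```python
def loss_percent(sent: int, received: int) -> float:
    """Return packet loss as a percentage of packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


# Counts taken from the ping summary in this post.
print(f"{loss_percent(1003, 997):.2f}% loss")
```

That works out to roughly 0.6% loss, which is small but not zero.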
-
I see the same on my Netgate 6100 with 4 WLANs load balancing. The lag shows in devices connected to the LAN interface. For these devices it looks like the internet connection is lagging, but in the pfSense monitor I don't see any movement on any of the 4 WLANs.
After the lag is over, all WLANs continue as nothing happened.
I also found that the longer the 6100 is running, the more often these lags appear. If I reboot the 6100 I do not see any lag for a day or so.
-
@muenchris I've actually just rebuilt my pfsense box from factory defaults with Intel NICs and instantly got the same issue, but I may have found the fix. I had to apply traffic shaping rules.
I will still need to test this over a few days, but it's potentially fixed.
Here's how I did it:
First, run a speedtest and get your best-case Download/Upload speeds. In my case it was 40Mbit/4Mbit.
Head to Firewall > Traffic Shaper > Limiters. Create a new limiter and name it something like "InternetDownload". Tick the "Enable" box and leave all the values as default except for the following:
- Bandwidth: The value you got from your speedtest
- Queue Management Algorithm: CoDel
- Scheduler: FQ_CODEL
- Queue length: 1000
- ECN: Enabled
After saving, click on the Limiter you just created, scroll to the bottom and click "Add new Queue", name it InternetDownloadQueue, and fill out the following:
- Enabled: ticked
- Queue management algorithm: CoDel
- ECN: enabled
Save that and repeat the same steps, but this time for your UPLOAD (limiter and queue). This time, knock about 10-20% off your upload speed when filling in the bandwidth.
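As a quick worked example of the numbers that go into the two limiters, here is a small Python sketch. The function name and the `upload_margin` parameter are illustrative only; the 40/4 Mbit figures are the speedtest results mentioned above.

```python
def limiter_bandwidths(download_mbit, upload_mbit, upload_margin=0.15):
    """Return (download, upload) limiter values in Mbit/s.

    Download is used as measured; upload is reduced by upload_margin
    (a fraction between 0 and 1, e.g. 0.10 to 0.20).
    """
    if not 0 <= upload_margin < 1:
        raise ValueError("upload_margin must be in [0, 1)")
    return download_mbit, upload_mbit * (1 - upload_margin)


# Speedtest result from this post: 40 Mbit/s down, 4 Mbit/s up.
for margin in (0.10, 0.20):
    down, up = limiter_bandwidths(40, 4, margin)
    print(f"margin {margin:.0%}: download {down} Mbit/s, upload {up:.1f} Mbit/s")
```

For a 4 Mbit/s upload that gives an upload limiter somewhere between 3.2 and 3.6 Mbit/s.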
Head over to Firewall > Rules > Floating and add a new floating rule with the following info:
- Action: Pass
- Quick: Enabled
- Direction: Out
- Address Family: IPv4 (usually)
- Protocol: Any
Advanced Options
- Gateway: Select your WAN gateway
- In/Out pipe: InternetUploadQueue / InternetDownloadQueue (Yes, it looks like it's in reverse order; it's not).
Hit save and reload your configuration. Test a game/ping while running a speed test and the latency shouldn't get knocked.
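One way to put a number on that test is to collect ping times once while the line is idle and once while a speed test is running, then compare them. The Python sketch below is only an illustration: the function name is hypothetical and the sample values are made up, not measurements from this thread.

```python
from statistics import median


def latency_under_load(idle_ms, loaded_ms):
    """Compare round-trip times (ms) measured idle vs. during a speed test."""
    idle = median(idle_ms)
    loaded = median(loaded_ms)
    return {
        "idle_median_ms": idle,
        "loaded_median_ms": loaded,
        "increase_ms": loaded - idle,
        "worst_loaded_ms": max(loaded_ms),
    }


# Made-up samples for illustration; paste in your own ping times.
idle = [8, 9, 7, 10, 8]
loaded = [12, 15, 11, 14, 13]
print(latency_under_load(idle, loaded))
```

If the shaper is doing its job, the increase under load should stay small instead of jumping by hundreds of milliseconds.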
-
@abtekk interesting. I thought the traffic shaper would limit connections from a client to one WLAN gateway (like a smart-queue in Unifi).
With my 4 WLANs I can download Steam or Xbox updates across all WLANs. Will this still work after the shaper is active?
-
@muenchris Yes, in fact you won't be shaping your download speed at all, you just still need the rule to make the firewall happy. The only thing you'll be shaping is your upload speed (by up to 20%), which is just to stop the overload that seems to cause this. Let me know how you get on.
-
@abtekk Thanks, I will try and test for a couple of days then let you know
-
@abtekk Quick question: I did not select any of my interfaces in the firewall rule. Do I have to select all interfaces that I use in my load-balancer gateway group or keep them all unselected?
-
@muenchris said in pfSense 2.5.2 periodic HUGE lag spikes:
If I reboot the 6100 I do not see any lag for a day or so
A day seems fast for my suggestion but check to see if it's running out of memory. pcscd has a memory leak.
OTOH if the Internet pipe is full then that will back up everything and traffic shaping can help that a lot.
-
@steveits My network setup is quite complicated. Behind the pfSense/6100 there is a Unifi UDM Pro that manages my main network that has 5 sub-networks (IoT, Cameras, Media, Gaming and Business). All firewall rules are managed by the UDM Pro - the pfSense is "only" load balancing. This results in my triple-NAT setup.
Also, my WLANs are all LTE routers with inconsistent internet speeds (they vary a lot by time of day and day of week).
Especially the (isolated) IoT network creates a lot of "mini-Connections" to their respective clouds.
If there is a memory leak in one of the pfSense services it might be caused by the IoT network (over 150 devices).
Is there something (like a specific service) I can "flush/restart" periodically? Otherwise I would have to restart the complete Netgate every night.
-
According to this thread:
https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4
Codel has a bug. Use "taildrop" instead. You'll get same result
-
@magikmark Thanks, I will check it out.
-
@muenchris said in pfSense 2.5.2 periodic HUGE lag spikes:
something (like a specific service) I can "flush/restart" periodically
The pcscd service, as mentioned :) If you aren't using IPSec you can just stop it, though it will start when pfSense boots. Otherwise if you follow into that bug report there is a patch to disable it properly. Not saying this is your issue, but it's generally an issue on all installs eventually.
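To check whether a process such as pcscd is actually growing, you can capture `ps -axo rss,comm` output at two points in time and compare the resident memory. The Python sketch below only parses captured text; the function name is hypothetical and the two samples are invented for illustration, not taken from a real 6100.

```python
def rss_kib(ps_output: str, name: str) -> int:
    """Sum the RSS column (as printed by ps, typically in KiB) for a command name."""
    total = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header and any line that is not "<number> <command>".
        if len(parts) == 2 and parts[0].isdigit() and parts[1].strip() == name:
            total += int(parts[0])
    return total


# Invented samples standing in for two captures taken hours apart.
sample_early = "  RSS COMMAND\n 4200 pcscd\n 9100 php-fpm\n"
sample_late = "  RSS COMMAND\n61800 pcscd\n 9300 php-fpm\n"

growth = rss_kib(sample_late, "pcscd") - rss_kib(sample_early, "pcscd")
print(f"pcscd RSS grew by {growth} KiB between samples")
```

A figure that keeps climbing between captures points at a leak; a flat figure suggests looking elsewhere.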
-
@magikmark said in pfSense 2.5.2 periodic HUGE lag spikes:
Codel has a bug. Use "taildrop" instead. You'll get same result
First I've heard of that. I don't see anything recent in that thread detailing it. A lot of people are running that.
Is there a bug report?
Steve
-
The thread is quite long. Here is the exact post. I think it's not Codel itself but pfsense:
https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/770
You get this error:
"config_aqm Unable to configure flowset, flowset busy!"
-
@magikmark said in pfSense 2.5.2 periodic HUGE lag spikes:
https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/770
Ah, OK. That's not a bug it's a feature.
I've never hit that but it looks like you would only ever hit it if trying to re-configure an existing pipe that is actively in use.
Steve