[Solved] High latency and low CPU utilization after 2.1->2.1.3 upgrade
-
Just some more information I've discovered:
I have several machines pinging the pfsense box at the moment, and it seems the pings only spike on the machines that are pushing traffic…
-
What NICs do you have on that box?
New drivers for Intel NICs went in between 2.1 and 2.1.3. If you are using Intel NICs and you have any tuning options in loader.conf.local, try removing them.
Steve
-
I've got a D-Link NIC for LAN side and Elitegroup (onboard) for WAN. I do not have a loader.conf.local, and there are no NIC based tuning arguments in loader.conf…
Thanks for the response.
-
Ok, what drivers are those NICs using? D-Link and Elitegroup could be almost anything.
Copy and paste the output of:
pciconf -lv | grep 20000
Steve
-
re0@pci0:2:0:0: class=0x020000 card=0x26511019 chip=0x816810ec rev=0x03 hdr=0x00
vr0@pci0:3:1:0: class=0x020000 card=0x14051186 chip=0x31061106 rev=0x8b hdr=0x00
Thanks
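For anyone reading along: the chip= value packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits, which is how the actual silicon behind a board brand can be identified. A minimal shell sketch of that split, where decode_chip is a hypothetical helper name and not part of pciconf:

```shell
# decode_chip is a hypothetical helper, not a pciconf feature.
# It splits a "chip=0xDDDDVVVV" value into vendor (low 16 bits)
# and device (high 16 bits).
decode_chip() {
    value=$(( $1 ))
    printf 'vendor=0x%04x device=0x%04x\n' \
        $(( value & 0xffff )) $(( (value >> 16) & 0xffff ))
}

decode_chip 0x816810ec   # re0: prints vendor=0x10ec device=0x8168
decode_chip 0x31061106   # vr0: prints vendor=0x1106 device=0x3106
```

Vendor 0x10ec is Realtek and 0x1106 is VIA, which matches the re and vr driver names in the output.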
-
Hmm, no changes to those drivers as far as I'm aware.
Go to System: Advanced: Networking: and make sure you have all hardware offloading options disabled. Not that it should have changed since 2.1.
Are you using traffic shaping? Are you using VLANs?
Steve
-
We aren't using traffic shaping at the moment, and there are no VLANs set on the pfSense box. There is a VLAN set on our VoIP system, but that is only so it bypasses the firewall, as our last system didn't have VLAN capabilities and we haven't gotten around to changing it yet (if we ever bother to), so I'm hoping that is a non-issue.
Two of the three offloading options are disabled. The only one that wasn't is "Hardware Checksum Offloading". I disabled it, and rebooted the firewall.
There was no change in the problems, unfortunately.
I'm considering backing up my Sarg logs and rolling back to my Clonezilla image and attempting the upgrade again, or starting with a fresh install, importing the current config, and reinstalling the Dansguardian and squid packages. I'm just trying to avoid doing another all-nighter. My hard disk is 300 GB, and I allocated the whole thing. Even though only 35 GB is used, due to the filesystem type, Clonezilla takes 2.5 hours to run…
-
Clonezilla takes 2.5 hours to run…
Ouch!
A fresh install and config restore is certainly an option. You might want to set up a very simple WAN and LAN config to test the throughput before you restore the config though, just to make sure it's not a problem with the config file.
I assume your VoIP VLAN completely bypasses pfSense then? One user recently had some latency issues with VLANs. There's no chance of tagged packets ending up at the pfSense NICs?
When you say you're not currently running traffic shaping do you mean you have done previously? Were you using rates close to the restriction you're seeing? Just speculation but maybe you have some rogue config options somewhere that were interpreted by the upgrade code incorrectly. It might be worth having a manual read through of the config file.
One thing you could try is downloading some data directly to the pfSense machine to check if the restriction is WAN or LAN side. E.g.
[2.1.3-RELEASE][root@pfsense.fire.box]/root(1): fetch -o /dev/null http://download.thinkbroadband.com/10MB.zip
/dev/null 100% of 10 MB 2067 kBps
Thinkbroadband is a good site for me but you might want to use something more local to you.
You should get >1Mbps wherever you are though.
Edit: Not this thread! ::)
Steve
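To compare the rate fetch reports against a line speed quoted in Mbps, the kBps figure needs converting from bytes to bits. A rough sketch, assuming fetch's "kBps" means 1000 bytes per second (if it means 1024, the result is about 2% higher); kbps_to_mbit is a hypothetical helper name:

```shell
# kbps_to_mbit is a hypothetical helper. It converts kilobytes per second
# to megabits per second, assuming 1 kB = 1000 bytes.
kbps_to_mbit() {
    awk -v k="$1" 'BEGIN { printf "%.1f\n", k * 8 / 1000 }'
}

kbps_to_mbit 2067   # the rate from the fetch run above; prints 16.5
```

So the 2067 kBps in the example run works out to roughly 16.5 Mbps under that assumption.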
-
A fresh install and config restore is certainly an option. You might want to set up a very simple WAN and LAN config to test the throughput before you restore the config though, just to make sure it's not a problem with the config file.
That was my thinking too, however seeing it in writing gives me the idea to shut down all my packages and do a default config just to see if the symptoms remain on my current install. Thanks :)
I assume your VoIP VLAN completely bypasses pfSense then? One user recently had some latency issues with VLANs. There's no chance of tagged packets ending up at the pfSense NICs?
In theory, that's how it was supposed to be working, but I wasn't the one who set up the switches. Only the port on the switch that goes straight to the internet-facing MikroTik is set with that VLAN PVID; however, all ports are tagged with the VLAN ID. I'm still not 100% on all the VLAN technologies. All VoIP traffic comes in over a single port, so maybe I need to just tag the two ports that are needed… I'll dig a bit deeper on that, regardless of whether that's the cause or not. I'd prefer to have it cleaned up, not to mention I need to learn this stuff.
When you say you're not currently running traffic shaping do you mean you have done previously? Were you using rates close to the restriction you're seeing? Just speculation but maybe you have some rogue config options somewhere that were interpreted by the upgrade code incorrectly. It might be worth having a manual read through of the config file.
Traffic shaping was done on our last firewall, which was Smoothwall. It was decommissioned in January. I'll take a closer look through the config, thanks.
One thing you could try is downloading some data directly to the pfSense machine to check if the restriction is WAN or LAN side. E.g.
[2.1.3-RELEASE][root@pfsense.fire.box]/root(1): fetch -o /dev/null http://download.thinkbroadband.com/10MB.zip
/dev/null 100% of 10 MB 2067 kBps
Thinkbroadband is a good site for me but you might want to use something more local to you.
WAN side is acting normally. Our line is very poor, but latency is absolutely solid. We haven't had full throughput in several weeks. I tried fetching a couple of files from the shell while watching iftop and pinging external DNS servers, and saw no irregularities.
Thanks again for all the input. Will keep you posted.
-
I installed a fresh copy of 2.1.3 on identical hardware today, and started restoring config pieces one at a time. The symptoms of high ping times started with the import of the captive portal configuration.
I deleted our only zone, and made a fresh one, leaving out the "Enable per-user bandwidth restriction" option. On 2.1.0 that option wasn't really working properly, and I had just left it.
I went back to the upgraded machine that I started this thread about, took the check mark out of "Enable per-user bandwidth restriction" and now our ping times are always as expected!
I haven't noticed our CPU usage get as high as it used to, but hey, that might actually be a good thing if everything is performing well.
I'm not going to mark this as solved just yet. Gonna keep an eye on things for a day or two first.
Thanks to all for their help!
Edit: I forgot to mention this even affected those with their MAC addresses added to passthru…
-
Well deduced. Methodical testing FTW. ;D
Steve
-
Everything's been running smooth. Marked the subject as solved.
Thanks again!