Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5
-
@xpxp2002 I tried upgrading this afternoon and saw the same thing: pfctl at 100% CPU. I'm also using Hyper-V on Server 2019 and had to revert to a snapshot. I've been using pfSense under Hyper-V for years and this is the first time I've had to revert an upgrade.
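For anyone who wants to confirm the pfctl symptom from a shell rather than by eye, here is a minimal sketch in Python. It assumes you have captured the output of `ps -axo pid,pcpu,comm` as text; the function name `busy_processes`, the 90% threshold, and the sample output are made up for illustration and are not taken from a real system.

```python
def busy_processes(ps_output, name="pfctl", threshold=90.0):
    """Return (pid, cpu) pairs for processes called `name` at or above `threshold` percent CPU.

    `ps_output` is expected to be the text printed by `ps -axo pid,pcpu,comm`.
    """
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # ignore blank or malformed rows
        pid, pcpu, comm = parts
        try:
            cpu = float(pcpu)
        except ValueError:
            continue  # ignore rows whose CPU column is not a number
        if comm.strip() == name and cpu >= threshold:
            hits.append((int(pid), cpu))
    return hits


# Hypothetical sample, only to show the expected input shape.
sample = """  PID  %CPU COMMAND
  120   0.0 syslogd
 4242 100.0 pfctl
  345   1.2 php-fpm
"""

print(busy_processes(sample))
```

Running it a few times while the firewall is "configuring" would show whether pfctl stays pegged or only spikes briefly.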
-
@swinn Perhaps there is something about these virtualized instances that is a problem, as @muppet suggested. Looking through the release notes, I don’t see anything specifically calling out virtualization that seems like it would cause this. Unless simply going to FreeBSD 11 is the issue.
I’ve also run pfSense on Server 2016 for years without issues prior to this. Ran into an issue with Server 2019 and receive segment coalescing causing weird packet drop issues when I first went to 2019, but once I disabled RSC the issue went away. But this is the first time a pfSense upgrade didn’t go smoothly for me either.
-
We haven't had a big FreeBSD jump, though: only 11.2-p10 to 11.3.
I can't find any release notes either that say something major or odd has changed with virtualization in 11.3.
-
I’ve been using the snapshot to test individual settings and packages that seem like possible culprits. As I mentioned before, the bootup hangs on the first “configuring firewall...” so I’ve tried removing settings and packages that I expect would be initialized when the filter rule load is occurring, then performed the upgrade.
So far, I’ve ruled out queues/limiters, pfBlocker-NG-devel, and Service Watchdog.
-
The only packages I have installed are:
- Avahi
- OpenVPN-Export
Oh and I have fq_codel configured.
-
@muppet I also have both of those. I’m out of time for testing but either one of those could be the culprit.
My first thought goes to Avahi trying to come up on an interface where it isn’t supported, but it could also be the OVPN export package struggling with the new version of OVPN.
-
Maybe Avahi could cause problems; that I could understand.
OpenVPN export isn't even called until you visit that page.
-
I upgraded six (6) pfSense production servers at the same time from 2.4.4_p3, and I had connectivity problems. The ping time is very high, above 7,000 ms.
I tried upgrading my pfSense server at home from 2.4.4_p3, but in this case I took a snapshot on VMware first, and the problem is the same. The ping time is very high and browsing has a lot of problems.
I restored the snapshot, and everything returned to normal.
On all servers I have these packages installed:
Open-VM-Tools
openvpn-client-export
squid
snort
zabbix-agent4
I tried reinstalling all packages, but the problem persists.
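To put numbers on the ping times described above, a small sketch like the following could pull the round-trip times out of saved `ping` output and list the slow replies. The helper name `slow_replies`, the 1000 ms limit, and the sample lines (using a documentation address) are assumptions for illustration, not measurements from this thread.

```python
import re

# Matches the "time=12.3 ms" field that ping prints for each reply.
TIME_RE = re.compile(r"time[=<]([\d.]+)\s*ms")


def slow_replies(ping_output, limit_ms=1000.0):
    """Return the round-trip times (in ms) that exceed `limit_ms`."""
    times = [float(m.group(1)) for m in TIME_RE.finditer(ping_output)]
    return [t for t in times if t > limit_ms]


# Hypothetical sample output, only to show the expected input shape.
sample = """64 bytes from 192.0.2.1: icmp_seq=0 ttl=64 time=0.412 ms
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=7021.337 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=7150.004 ms
"""

print(slow_replies(sample))
```

Comparing the result before and after restoring the snapshot gives a simple way to show the difference between 2.4.4_p3 and 2.4.5 on the same VM.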
-
Same trouble after upgrading from 2.4.4 to 2.4.5 on Hyper-V (Windows Server 2019): 100% CPU usage by the pfctl process, a long boot, and pfSense runs with spikes and hangs.
It seems that 2.4.5 is not compatible with Hyper-V on Windows Server 2019.
Maybe this is related:
https://forum.netgate.com/topic/149595/2-4-5-a-20200110-1421-and-earlier-high-cpu-usage-from-pfctl/8
-
I'm having the same problem. I'm running 2.4.4-p3 on Server 2016 with Hyper-V. I tried upgrading my 2nd CARP node to 2.4.5 yesterday, but it pegged the CPU and never became stable. I reverted to the snapshot, shut that node down, and tried to upgrade my 1st CARP node, but hit the same problem. I've reverted both nodes to their snapshots.
pfSense on Hyper-V has been rock solid up until now and all previous upgrades have been flawless.
If I have time, I'll try installing a 2.4.5 VM from scratch to see if the problem occurs there too.
-
I did a clean reinstall, restoring the config from the upgraded system. The first boot was fast, then all the packages were restored (installed); after that the system got stuck at boot and lagged afterwards.
Then I found the source of the problem: pfBlockerNG! When it's disabled, everything works well; after enabling pfBlockerNG the system lags completely.
-
@Gektor This is interesting. I had pfBlockerNG-devel installed on 2.4.4-p3. One of my earlier tests was to roll back to 2.4.4-p3, uninstall that package, then upgrade; and my system was still slow. Did you simply disable it, or uninstall the package?
I will try this later today when I have an outage window.
-
Mine is pfBlockerNG version 2.1.4_21; with this setting everything works well:
I disabled all GeoIP lists but kept DNSBL enabled, then enabled pfBlockerNG, and for now there are no problems with pfSense 2.4.5 on Hyper-V. The system goes "crazy" when GeoIP lists are enabled in pfBlockerNG.
I have made a post about it; maybe it will be helpful:
https://forum.netgate.com/topic/151726/pfblockerng-2-1-4_21-totally-lag-system-after-pfsense-upgrade-from-2-4-4-to-2-4-5
-
@Gektor I deleted all the installed packages:
Open-VM-Tools
openvpn-client-export
squid
snort
zabbix-agent4
and I disabled the non-priority OpenVPN links, and system connectivity was restored.
-
@gusfersa On another production server with the same installed packages, I only disabled the OpenVPN link to another pfSense 2.4.5 server, and system connectivity was restored.
-
I've noticed something similar in terms of memory usage, but in my case cpu nice dropped in half and otherwise everything else seems status quo.
I'm not however noticing any latency outages or anything of that nature, but i've got plenty of free RAM so maybe that's the difference.
-
@digitalgimpus said in Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5:
I've noticed something similar in terms of memory usage, but in my case cpu nice dropped in half and otherwise everything else seems status quo.
I'm not however noticing any latency outages or anything of that nature, but i've got plenty of free RAM so maybe that's the difference.
Same here, memory utilization spikes up from <20% before upgrade to 2.4.5 (w/all the same settings and packages) to 65-80% after upgrade.
Miscreant isolated to pfBlockerNG-devel (when uninstalled, memory use goes back to <20%). Running on Netgate amd64 hardware, 8 GB RAM.
@BBcan177 any ideas on this, did this come up in the extensive testing done for 2.4.5? Any setting that could be tweaked (memory, feeds) or is this something that will require some coding/patching?
-
In quick testing here, it appears related to the pfBlocker "maxmind GeoIP settings". Either deleting the key or checking the box "disable maxmind csv database updates" makes the pfBlocker pages respond near-instantly again and gets rid of the long boot hang. I'm assuming that hang is what breaks everything else and causes flapping in a loop as it keeps trying to reload, which would explain the high latency and other problems.
I haven't tested further than that and cannot guarantee that's the only issue at hand. This was tested on a minimally configured VM with nearly no traffic, but it slows things way down in many functions.
-
@t41k2m3
You are running on a physical machine and it looks like you are not experiencing any issues other than higher memory usage. That can be attributed to how many entries are in DNSBL, especially with TLD enabled. I assume it was the same as before but you didn't notice it. DNSBL in Unbound will create a pointer in memory for each domain and it can eat memory. Nothing I can do about that. The upcoming Unbound python integration will make a significant improvement in memory usage, though.
-
@taz3146
Are you in a virtualized environment as the others in this thread? There seems to be some issue with pfctl (which is used to create and update the IP aliases for the firewall rules) and with some virtualization software.
I have tested with VMware ESXi and can't reproduce these issues. I sent a message to the devs to see if they have any other guidance. Alternatively, set up a physical box with the same configuration and see if the problem exists without virtualization. Then we can at least narrow down the issue.
The deselection of settings in the IP tab should have no effect on anything. When you save that page it just writes settings to the config.xml and nothing else. You probably have something else happening in the background.
Would also suggest that everyone review the system.log and the pfblockerng.log for any other clues.
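As a starting point for that log review, here is a minimal sketch that filters log text for lines mentioning a few keywords. It assumes the log contents have already been exported as plain text; the function name `matching_lines`, the keyword list, and the placeholder sample lines are hypothetical and do not reproduce real pfSense log output.

```python
def matching_lines(log_text, keywords=("pfctl", "pfblockerng", "error")):
    """Return the lines of `log_text` that contain any keyword, case-insensitively."""
    wanted = [k.lower() for k in keywords]
    return [
        line
        for line in log_text.splitlines()
        if any(k in line.lower() for k in wanted)
    ]


# Placeholder lines, only to show the expected input shape.
sample = """placeholder line one mentioning pfctl
placeholder line two about something unrelated
placeholder line three with an Error in it
"""

for line in matching_lines(sample):
    print(line)
```

Adjust the keywords to whatever looks suspicious in your own logs; the list above is only a guess at what might be relevant here.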