2.4.5.a.20200110.1421 and earlier: High CPU usage from pfctl

Gektor

@Magma82
I have now been running pfSense 2.4.5 with 2 CPU cores under Hyper-V Server 2019 for almost 2 days, with no high CPU usage at all after disabling the pfBlockerNG GeoIP lists.

provels

@Gektor Check out this thread. Worked for me.
https://www.reddit.com/r/pfBlockerNG/comments/fqjdc5/pfblockerngdevel_downloading_lists_but_not_able/flqzkgp/

KasselA

I am seeing this too. Running Hyper-V on a Dell R720. The VM had 4 CPUs assigned. No packages installed. CPU on the VM would spike to 100% for a few minutes at a time, then drop to normal briefly (no more than 30 seconds or so), then go back to 100%.

Following recommendations earlier in the thread, I dropped the VM down to 1 CPU, and as far as I can tell everything operates normally again. Because it's not a busy firewall, running on 1 CPU is no real issue for me, so I don't urgently need to roll back to 2.4.4. I'll stay where I am until a patch is issued.

Sounds like the problem is specific to multiple CPUs on Hyper-V only.

ThEWbA

I am seeing this too. We are running QEMU 4.1.1, kernel 5.3 (KVM) and CPU emulation Skylake-Client.

The problem started with 2.4.4, and an upgrade to 2.4.5 did not solve it.
Workaround is to downgrade to one (1) core.

Magma82

Netgate devs, CPU performance under Hyper-V is definitely broken in 2.4.5. Are there any Hyper-V integration tools or libraries that might be missing from the OS build?

Boot: with 4 cores assigned, CPU sits at a constant 100% from an early stage of the boot process, and the firewall is barely accessible once booted.

OpenVPN: CPU usage is also considerably higher under load (as if the CPU isn't optimised for the VM).

I have reverted to 2.4.4, and it's rock solid, uses under 3% CPU, and just feels a lot more optimised.

Hardware
stable 2.4.4-RELEASE-p3 (amd64)
Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz
4 CPUs: 1 package(s) x 4 core(s)
AES-NI CPU Crypto: Yes (active)
Hardware crypto AES-CBC,AES-XTS,AES-GCM,AES-ICM

Auror990

Same issue, Server 2019 and Hyper-V, no packages installed on custom HW (Ryzen 2700) after upgrade. Pegs CPU upon boot and is basically unusable.

I set the VM to 1 virtual processor to get it working, but that is sub-optimal for OpenVPN clients. I also experimented with assigning just 2 virtual processors; it runs sluggishly.

~~Will look to revert to 2.4.4-p3 snapshot in the near future.~~

Edit: since I had nothing to lose and this is in a test lab, I bumped up to 2.5.0 development (2.5.0.a.20200403.1017). 2.5.0 does not seem to have the Hyper-V CPU issue.

Cool_Corona

It's the same in a VM on vSphere. I run 32 cores on a test system, and they all go to almost 100% shortly after boot.

I noticed that the server's fans started spinning a lot harder, so I looked in the hypervisor, and sure enough: almost 100% and not handling traffic at all....

I was running 2.4.4-p3 with no issues until Suricata wouldn't start. Then I had to upgrade, and it died....

kiokoman

I made a clean install on my ESXi host with 4 CPUs,
and upgraded from 2.4.4-p3 to 2.4.5 on another server running QEMU/KVM with 4 Westmere CPUs.
Both have Suricata installed, and I never had such a problem. I'm also unable to reproduce it in my test lab, so it must be down to some setting.

timboau 0

Same problem here too: Hyper-V 2016, pfSense 2.4.5
5GB RAM
4 CPUs
pfBlockerNG

It sits for ages on 'firewall' and also on DHCPv6 during boot; once up, it is really sluggish, with dropped packets galore.

Dropped back to single CPU and all ok on 2.4.5

slim2016

Same problem, pfSense 2.4.4 installed on VMware ESXi. I have Suricata, pfBlockerNG, Squid, SquidGuard and Lightsquid installed. After upgrading to 2.4.5 the latency went haywire. However, I've managed to resolve my problem: I reduced 8 vCPUs to 1 vCPU, then did the upgrade to 2.4.5. Everything worked fine except that Suricata wouldn't start, so I did a forced package reinstall. Everything worked fine after that; then I added another 3 vCPUs, and it's been working fine ever since.

Uncle_Bacon

Same problem here but with a Proxmox VM on pfSense 2.4.5.
2 CPU, 2 core
8GB RAM
NUMA disabled

High CPU on "/sbin/pfctl -o basic -f /tmp/rules.debug" effectively killed my networks and VLANS, and both incoming WAN connections. pfSense would often crash and reboot automatically, which produces a crash report.

Dropping to 1 CPU, 1 core fixes it but it's running hard due to my network. 2.4.4_3 ran just peachy!

slim2016

@Uncle_Bacon Have you tried adding cpu later (after the upgrade)? I noticed that maximum vcpu is 4 before it starts going crazy.

Cool_Corona

@slim2016 said in 2.4.5.a.20200110.1421 and earlier: High CPU usage from pfctl:

@Uncle_Bacon Have you tried adding cpu later (after the upgrade)? I noticed that maximum vcpu is 4 before it starts going crazy.

I have upped it to 8 so far and it runs pretty stable. I haven't noticed a crash report yet.

timboau 0

It doesn't work properly with more than one vCPU (in my experience)

slim2016

@Cool_Corona You are right. I've just added a total of 8 vCPUs and gave it time to settle down after a boot; it seems to stabilise itself after a short while.

timboau 0

@slim2016 The point is it's completely unstable with more than one CPU (when it doesn't work), including dropped packets.

It isn't acceptable to simply 'wait for it' to settle down. Also, boot times with multiple CPUs are orders of magnitude slower than they should be, which again is not acceptable for a firewall.

If the root cause isn't determined, are you happy for the firewall to randomly drop packets and generally die?

It's not happening for everyone, but it is a bug and it needs to be resolved.

The silence from Netgate is deafening. I understand it's not happening on Netgate hardware. Does anyone have a subscription on a virtual machine that Netgate can address?

slim2016

@timboau-0 I was responding to Cool_Corona

Cool_Corona

@timboau-0 said in 2.4.5.a.20200110.1421 and earlier: High CPU usage from pfctl:

@slim2016 The point is it's completely unstable with more than one CPU (when it doesn't work), including dropped packets.

It isn't acceptable to simply 'wait for it' to settle down. Also, boot times with multiple CPUs are orders of magnitude slower than they should be, which again is not acceptable for a firewall.

If the root cause isn't determined, are you happy for the firewall to randomly drop packets and generally die?

It's not happening for everyone, but it is a bug and it needs to be resolved.

The silence from Netgate is deafening. I understand it's not happening on Netgate hardware. Does anyone have a subscription on a virtual machine that Netgate can address?

It's happening on Netgate hardware as well. Unlike the VMs, those boxes don't have the workaround of reducing the number of cores.

Reducing it to 1 core and getting it up and running stably is no problem. Then add cores as you like.

Yes, the boot time is quicker with 1 core than with 8 cores.

Yes, I would like it to be resolved as well. I think it's a BSD issue and therefore needs to be forwarded upstream into the BSD ecosystem.

I am running 8 cores as of now and no issues so far.

Uncle_Bacon

@slim2016 I haven't tried that. Unfortunately my backups don't run as deep as they should so I have no 2.4.4 backup. I am going to try a fresh install and restore config from 2.4.5 to see if that helps. Thank you for the suggestion. It's nice to have the ability to create/re-create as many instances of it that I want. I'll post back.

slim2016

@Uncle_Bacon I haven't used Proxmox for many years, and when I did it was only for a short while. With ESXi you just create a snapshot before you upgrade or update, and if something goes wrong you just restore the snapshot.