PfBlockerNG high CPU

lunaticfringe80

@Cool_Corona The DNS issue was a dumb oversight. Unbound was set to go out on a VPN interface with a domain name which of course won't work if it can't resolve the name. I appreciate the helpful reply, though.

The real issue here is the pfblockerng performance, which doesn't seem to be working for everyone.

Cool_Corona

@lunaticfringe80 said in PfBlockerNG high CPU:

@Cool_Corona The DNS issue was a dumb oversight. Unbound was set to go out on a VPN interface with a domain name which of course won't work if it can't resolve the name. I appreciate the helpful reply, though.

The real issue here is the pfblockerng performance, which doesn't seem to be working for everyone.

No worries. I get that, but we need to eliminate a lot of things first to get to the problem.

What packages are you running, and are you using the pfBlockerNG devel or the standard package?

lunaticfringe80

@Cool_Corona

I've tried both so far. Currently testing the devel package, and the problem has just re-emerged.

Same as before

What I did was uncheck the box to keep settings, completely uninstall pfblockerng-devel, and reinstall pfblockerng-devel. I ran the wizard, then went into the feeds and disabled all but one feed per group, leaving the top one in each. I set the update frequency to once a day and then entered my MaxMind key. Then I ensured pfblockerng and DNSBL were both enabled. It was fine for about 5 hours and then it began again. There's correlating state table usage as well:

Last 6h

Now here it is over the past 5 days. The first group is from the standard pfblockerng package, second group is from the devel package.

last 5d

It just keeps repeating that pattern until I restart it, then it comes back within minutes to hours.

I've tested it with and without suricata, snort, squid, bandwidthd, darkstat, apcupsd, and clamd running. I've run it with none of those, and with variations of different ones running. No difference in result. The only other things consistently running have been openvpn and telegraf for monitoring.

Cool_Corona

Have you tested LAN for malware/spyware??

Have you noticed anything suspicious in the states table???

lunaticfringe80

@Cool_Corona

There are only two machines running Windows. One is my primary desktop, and the problem happens while it is completely powered down. It gets regular scans and I'm cautious online, so I'm pretty confident it's clean. The other is a work laptop that goes through the corporate VPN before I even log in, and these events happen while it is offline too.

Everything else runs Linux and I'm pretty confident they are not compromised.

There's a lot of IoT on the network. The two Nvidia Shields are pretty chatty and were at the top of my pihole block counts in my previous setup, but if those cause this problem it would kinda defeat the purpose. Stopping devices like that from phoning home is part of why I'm sinkholing DNS to begin with.

I did not realize I could directly monitor the states table, I see it in the diagnostics menu so I will watch it during these events.
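For watching the states table during a burst, a rough sketch like the following could tally states per source address. It assumes the text comes from `pfctl -ss` and that lines look like the samples in the comment. The regular expression, the function names, and the sample addresses are illustrative assumptions, not something taken from this thread.

```python
import re
from collections import Counter

# Assumed line shapes of `pfctl -ss` output (illustrative only):
#   all tcp 192.168.1.50:51514 -> 198.51.100.20:443   ESTABLISHED:ESTABLISHED
#   all udp 203.0.113.7:40112 (192.168.1.50:40112) -> 198.51.100.53:53   MULTIPLE:SINGLE
#   all udp 192.168.1.1:53 <- 192.168.1.60:5353   SINGLE:MULTIPLE
STATE_LINE = re.compile(
    r"^\S+\s+(?P<proto>\S+)\s+(?P<left>\S+)"
    r"(?:\s+\((?P<orig>\S+)\))?\s+(?P<dir>->|<-)\s+(?P<right>\S+)"
)


def strip_port(endpoint):
    """Drop the port: 'a.b.c.d:port' for IPv4, 'addr[port]' for IPv6."""
    if endpoint.endswith("]"):
        return endpoint.split("[", 1)[0]
    return endpoint.rsplit(":", 1)[0]


def count_states_by_source(pfctl_output):
    """Count state entries per source address; unparsable lines are skipped."""
    counts = Counter()
    for line in pfctl_output.splitlines():
        match = STATE_LINE.match(line.strip())
        if not match:
            continue
        if match.group("dir") == "->":
            # With NAT, the pre-translation address is assumed to be in parentheses.
            source = match.group("orig") or match.group("left")
        else:
            # For '<-' lines the source is assumed to be on the right-hand side.
            source = match.group("right")
        counts[strip_port(source)] += 1
    return counts
```

Calling `count_states_by_source(text).most_common(10)` on a capture taken during a burst would show which clients hold the most states at that moment.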

Gertjan

Interesting images.

The states are not created by pfBlockerNG. These are created by network devices that start to do their thing, like nightly backups.

Just a theory:
I've seen iPhones and iPads that, while they are Wi-Fi connected, charging, on standby, and have Apple's iCloud storage activated, hit the network really hard. That is, I've seen it on this forum. Never on my own networks, although I have the same equipment and configuration.
Also: one could be using a feed that lists Apple's cloud services as DNSBL entries (you see the upcoming issue?), and unbound / pfBlockerNG start to do what they are asked to do. This can be seen as a huge misconfiguration.
Remember what really happened to HAL in the movie 2001? Feed a system contradictory information and you bring it down.

Apple has many iCloud servers predefined in its OS, and if one is unreachable, it will try another one, leaving an aborted state behind. This adds up quickly.
This is an Apple example; it could be any device type or service.

You have the exact moment when it happens, which should help to determine which device is doing what at that moment.
If pfBlockerNG is not active, do you see the same state table bursts?

lunaticfringe80

When DNSBL is disabled, the state table bursts do not happen. They only happen when it is enabled and when the correlating CPU bursts happen.

No Apple devices at all on my network, but there are several Amazon Echos.

Cool_Corona

@lunaticfringe80 said in PfBlockerNG high CPU:

When DNSBL is disabled, the state table bursts do not happen. They only happen when it is enabled and when the correlating CPU bursts happen.

No Apple devices at all on my network, but there are several Amazon Echos.

Around 24hrs between the bursts?? It ends at 2100hrs and begins again at 2100hrs. Then it runs for quite some time and happens again 24hrs later??

Is that about correct?

And can you disable EVERYTHING other than the PCs on the network?? And then turn the IoT devices back on one by one??

lunaticfringe80

Almost exactly 1 hour between bursts. Not always exactly on the hour, sometimes a few minutes after. Looking at the previous test over the past 5 days those were starting at about 36mins after the hour. At first I thought it was the default hourly feed updates, but I have since made a point to change the updates to daily to rule that out.
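To put numbers on the spacing between bursts, a small sketch like this could be run over exported monitoring samples. It assumes the data is available as `(timestamp, state_count)` pairs; the function names and the threshold value are hypothetical choices, not part of any tool mentioned here.

```python
from datetime import datetime


def burst_onsets(samples, threshold):
    """Return the timestamps at which the value first rises above threshold."""
    onsets = []
    above = False
    for timestamp, value in samples:
        if value > threshold and not above:
            onsets.append(timestamp)
        above = value > threshold
    return onsets


def onset_gaps_minutes(onsets):
    """Minutes between consecutive burst onsets."""
    return [(b - a).total_seconds() / 60 for a, b in zip(onsets, onsets[1:])]


def onset_minutes_past_hour(onsets):
    """Minute-of-hour for each onset, to spot a fixed schedule."""
    return [timestamp.minute for timestamp in onsets]
```

Gaps that cluster around 60 and a near-constant minute-of-hour would point at something scheduled hourly rather than at random client traffic.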

Cool_Corona

@lunaticfringe80 said in PfBlockerNG high CPU:

Almost exactly 1 hour between bursts. Not always exactly on the hour, sometimes a few minutes after. Looking at the previous test over the past 5 days those were starting at about 36mins after the hour. At first I thought it was the default hourly feed updates, but I have since made a point to change the updates to daily to rule that out.

It's a 24hr timespan on the graph.... and states skyrocket at exactly 2200hrs.

Pretty suspicious to the point where there is definitely a backup running....

lunaticfringe80

@Cool_Corona said in PfBlockerNG high CPU:

It's a 24hr timespan on the graph.... and states skyrocket at exactly 2200hrs.

Pretty suspicious to the point where there is definitely a backup running....

The first graph image is a 6 hour timespan. The blue mark on the top graph marks when I did the complete reinstall of pfblockerng-devel.

The second graph image is a 5 day timespan of the same metrics. pfblockerng was disabled at the point where the previous streak of bursts ended, and stayed disabled up to the second mark on the CPU graph. That second mark matches the mark on the first graph image, indicating where pfblockerng was enabled again.

The first mark on that second graph image was just me marking when I turned on all my systems to begin work that morning just so it could be seen when regular activity begins.

lunaticfringe80

Here's a fresh capture to show you the trend is repeating like it did before:

6hr

The baby spike in between the first two is actually a small download over openvpn and is unrelated to the issue.

Just before I captured this I completely stopped all docker containers on my network to rule out any running services causing this. No backups were running. I even rebooted my Home Assistant RPi3 with no effect as you can see on the graph.

Only thing left to rule out on the network are the IoT devices and Nvidia Shields. I'll have to do that tomorrow.

Cool_Corona

@lunaticfringe80 said in PfBlockerNG high CPU:

Here's a fresh capture to show you the trend is repeating like it did before:

The baby spike in between the first two is actually a small download over openvpn and is unrelated to the issue.

Just before I captured this I completely stopped all docker containers on my network to rule out any running services causing this. No backups were running. I even rebooted my Home Assistant RPi3 with no effect as you can see on the graph.

Only thing left to rule out on the network are the IoT devices and Nvidia Shields. I'll have to do that tomorrow.

There is a spike in CPU at 1800hrs every day.

Again, the state table spikes come every hour starting at 2200hrs.

Shutdown everything on the LAN and start with 1 thing at a time....

Gertjan

@lunaticfringe80 said in PfBlockerNG high CPU:

but there are several Amazon Echos.

I'm pretty confident now that you have devices that "do their thing" at certain times.
And
you activated a feed (one or more) in pfBlockerNG-devel that blocks these devices. That is, it blocks the addresses / IPs these devices are trying to access.

Easy solution: stop using these devices, or stop using pfBlockerNG-devel.

Or continue, and correct things. You've got all the tools at your disposal.
When the issue happens, go to the pfBlockerNG-devel page. The most recent IPs / DNSBL entries listed there are the ones your device(s) are trying to use to do what they have to do ... and they have a whole list of addresses to use / test, because hosts like "amazon" do not have one host or IP, but thousands of them. Your device(s) are testing them all. They all get blocked. Your state table and processor usage explode.

At the same moment, you instructed pfBlockerNG-devel (who else is choosing the feeds?) that these addresses / IPs are 'forbidden', because they are on one (or more) of the lists.

The issue can now be solved in a couple of seconds.
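As a rough illustration of that check, the sketch below tallies which clients keep asking for blocked names. It assumes you can export DNS queries as `(client_ip, domain)` pairs and load the active feed into a set. Treating a blocklist entry as also covering its subdomains is an assumption of this sketch, and all names in it are hypothetical.

```python
from collections import Counter


def normalize(domain):
    """Lower-case a domain and drop any trailing dot."""
    return domain.lower().rstrip(".")


def is_blocked(domain, blocklist):
    """True if the domain or any of its parent domains is on the blocklist."""
    labels = normalize(domain).split(".")
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def tally_blocked(queries, blocklist):
    """Count blocked lookups per client and per domain."""
    per_client = Counter()
    per_domain = Counter()
    for client, domain in queries:
        if is_blocked(domain, blocklist):
            per_client[client] += 1
            per_domain[normalize(domain)] += 1
    return per_client, per_domain
```

A client that dominates `per_client` during a burst, together with the names at the top of `per_domain`, would show which device is retrying and which feed entries it keeps running into.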

lunaticfringe80

@Cool_Corona said in PfBlockerNG high CPU:

There is a spike in CPU at 1800hrs every day.

Again the state table spikes comes every hour starting at 2200hrs.

Shutdown everything on the LAN and start with 1 thing at a time....

There are no spikes at 1800 in the previous days. They also do not start at 2200. Here are closer views of those days. Keep in mind pfblockerng was not running the entire time. The spikes start and stop only because I am enabling and disabling pfblockerng. I end up disabling pfblockerng because name resolution eventually fails completely until it is disabled or restarted.

14th

15th

lunaticfringe80

@Gertjan said in PfBlockerNG high CPU:

@lunaticfringe80 said in PfBlockerNG high CPU:

but there are several Amazon Echos.

I'm pretty confident now that you have devices that "do their thing" at certain times.
And
you activated a feed (one or more) in pfBlockerNG-devel that blocks these devices. That is, it blocks the addresses / IPs these devices are trying to access.

Easy solution: stop using these devices, or stop using pfBlockerNG-devel.

Or continue, and correct things. You've got all the tools at your disposal.
When the issue happens, go to the pfBlockerNG-devel page. The most recent IPs / DNSBL entries listed there are the ones your device(s) are trying to use to do what they have to do ... and they have a whole list of addresses to use / test, because hosts like "amazon" do not have one host or IP, but thousands of them. Your device(s) are testing them all. They all get blocked. Your state table and processor usage explode.

At the same moment, you instructed pfBlockerNG-devel (who else is choosing the feeds?) that these addresses / IPs are 'forbidden', because they are on one (or more) of the lists.

The issue can now be solved in a couple of seconds.

This would be incredibly disappointing if it's the case. I was previously running Pihole on a Raspberry Pi Zero and it was blocking these devices from phoning home while keeping load at around 0.25 on that weak device. It had 990,000 domains in its blocklist and did not struggle at all.

Replacing pihole was my primary motivation for putting pfsense, rather than other available firewall distros, on this Odyssey X86 to begin with because I was told by many that pfblockerng was the superior solution. If it really can't even keep up with a few IoT devices without eventually causing an outage, then it's back to the drawing board for me.

I'll flip some breakers tomorrow to confirm this is the case before giving up completely. I appreciate everyone's help.

Cool_Corona

Have you configured "kill states" in the rules??

lunaticfringe80

I know I did enable kill states previously, along with floating rules, but with this current test I did not. I've just checked and confirmed that neither of those is enabled right now.

Gertjan

@lunaticfringe80 said in PfBlockerNG high CPU:

The spikes start and stop only because I am enabling and disabling pfblockerng. I end up disabling pfblockerng because name resolution eventually fails completely until it is disabled or restarted.

As I mentioned earlier : pfBlockerNG-devel does nothing** when it runs.
pfBlockerNG-devel prepares files that are used by unbound, the Resolver.

Try it yourself : leave pfBlockerNG-devel activated, but remove all feeds.
I bet : No more issues ^^

** it actually does collect statistics info in the background all the time.
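To make the "prepares files for unbound" idea concrete, here is a minimal sketch of turning a list of domains into unbound `local-data` lines that answer with a sinkhole address. It only illustrates the concept and is not pfBlockerNG-devel's actual generator. The function name, the 60 second TTL, and the `10.10.10.1` sinkhole address are assumptions.

```python
def to_unbound_local_data(domains, sinkhole_ip="10.10.10.1", ttl=60):
    """Build unbound local-data lines pointing each domain at a sinkhole IP.

    Domains are lower-cased, de-duplicated, and sorted; blank entries are dropped.
    """
    cleaned = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    return [
        f'local-data: "{name}. {ttl} IN A {sinkhole_ip}"'
        for name in sorted(cleaned)
    ]
```

Every name in the feeds becomes one such record, so the resolver answers those lookups locally instead of forwarding them. With all feeds removed, the list is empty and nothing is left to match.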

@lunaticfringe80 said in PfBlockerNG high CPU:

This would be incredibly disappointing if it's the case

Wait ...
You are aware of the fact that most of these feeds are created by automated tools, and are free to use?
Creating them takes big resources that do cost $$.
And imagine this one:
Me being a smart-ass, I manage to get a host name like *.windowsupdate.microsoft.com onto one of the feeds that you are using.
What do you think will happen? Apart from the serious Internet buzz I just created because thousands of PCs can't update any more.
All because I managed to get a URL on a list.
And you are "disappointed"? Don't be. Do what should always be done. After each setting change, be patient. Check what the results are. What do you expect to see? What do you see? This process goes on for entire days, even weeks. Humans :)