PfBlockerNG high CPU
-
Not all the feeds were enabled; not even half of them are. The wizard, which comes up automatically the first time you open pfblockerng after install, enables just those few.
To rule out whether it's the total number of blocked domains, or just the volume of blocked requests at the time, I will see if I can block only the aforementioned worst offender after I confirm that allowing that domain makes the performance impact disappear.
The problem has just resurfaced after completely uninstalling and reinstalling pfblockerng-devel for the previous demonstration. I expect the next spike to happen in about 15 minutes. I will report my findings.
-
Adding that domain to the whitelist during a spike was enough to return performance to normal. So I think that's enough to safely say the spikes result from blocking queries. The second part, whether the overall performance impact depends on the total size of the blocklist, I was not able to determine for 2 reasons.
1. I couldn't find where to simply add a single domain to the blocklist. Please advise the simplest way to do that.
2. Once I allowed the device to reach the domain, it stopped continually trying, which makes sense. It's only trying over and over because it fails, obviously.
If someone can help with #1, I can rule out the second part when the device decides to phone in metrics again.
edit: I figured out where to add a custom blocklist. Now we wait for the Shield to do its thing.
-
Can confirm, no difference in performance with a blocklist of 1, versus the ~160,000 domains the wizard sets it up with.
The first blue mark before 0200 is when I whitelisted the troublesome domain causing no further performance issues.
The second blue mark is after I had done a fresh reinstall of pfblockerng-devel, waited for the spikes to come, then I whitelisted that same domain in the middle of a spike causing it to stop immediately.
After that I did a fresh reinstall of pfblockerng-devel, but this time opted out of the wizard and added just a single domain to a custom blocklist. At about 1700 you can see the spike returns and performance impact is no different.
Here you can see both the scale of the queries and that there's only 1 domain:
Can you guys confirm that this is in line with what is to be expected for DNS sinkholing performance from pfblockerng? If this is a bug I'd be happy to continue testing and providing any helpful data I can. If this is working as intended, I'll just move on to another solution and quit bothering you guys.
-
You probably have a device that tests whether it has an 'Internet' connection, and if so, it tries to resolve a host name, like "update.device-brand.tld".
If that fails, because the answer it received was "10.10.10.1" (the default internal pfBlockerNG built-in web server), it will fail to connect to what it wants: the device's update server at "update.device-brand.tld".
It could be stupid and retry, using the same host name, or another host name from an internal, built-in list. Let's face it, some appliances coded in "the east" are not coded like appliances coded in "the west" (from where I'm based ^^).
@lunaticfringe80 said in PfBlockerNG high CPU:
then I whitelisted that same domain in the middle of a spike causing it to stop immediately.
Just to be sure: you do see Firewall > pfBlockerNG > Alerts, and on that page, somewhat lower, DNSBL getting populated with blocked domain names?
@lunaticfringe80 said in PfBlockerNG high CPU:
If this is working as intended, I'll just move on to another solution and quit bothering you guys.
You're kidding?
This is a 'free' forum: no one has to reply here. Those who do are all looking for the same thing: it's not about you being right or wrong, or us. It's just all of us trying to understand what happens. If an issue can be found, it can be corrected and everybody will benefit from it.
-
I'm not at home these days, but I can confirm that I have an Nvidia Shield as well.
Regarding IoT devices I have a very easy setup. Isolated network with dns 1.1.1.1 configured in the dhcp server. Therefore the IoT devices should not be filtered by pfblockerng. Openly speaking I don't really care if they phone home and send metrics.
Pfblockerng is used only to protect my "normal devices" and to avoid seeing annoying ads.
Devices: approx 10 IoT and 10 "normal devices"
When I'm back from my vacation I will come back to the topic. There are a few interesting hints I want to check. Thank you guys!
-
@Gertjan said in PfBlockerNG high CPU:
You probably have a device that tests whether it has an 'Internet' connection, and if so, it tries to resolve a host name, like "update.device-brand.tld".
If that fails, because the answer it received was "10.10.10.1" (the default internal pfBlockerNG built-in web server), it will fail to connect to what it wants: the device's update server at "update.device-brand.tld".
It could be stupid and retry, using the same host name, or another host name from an internal, built-in list. Let's face it, some appliances coded in "the east" are not coded like appliances coded in "the west" (from where I'm based ^^).
The device continues to function without issue while the domain is sinkholed. Updates work and it's able to reach all my streaming services and play content fine.
Just to be sure: you do see Firewall > pfBlockerNG > Alerts, and on that page, somewhat lower, DNSBL getting populated with blocked domain names?
I definitely see entries on the alerts page. Here's an example:
I found an acceptable workaround yesterday that may indicate where the problem is. Rather than set the sinkhole IP to an actual address, I copied what Pihole does and set it to 0.0.0.0 instead. This completely fixed the performance problem. There were no spikes in CPU usage or state table usage at all. This also made all the logging and beautiful alerts and stats in pfblockerng stop working, but the DNS sinkhole continued to function as usual. I was actually able to set the VIP to a proper address and then force reload dnsbl to get the above image just minutes ago.
This is an acceptable workaround for the time being, but I'd still like to figure out the problem. The fact that @Frosch1482 confirmed they also have an Nvidia Shield indicates this will be a problem for owners of that device. I mistakenly said it was my 2015 model previously, but I've since realized it's my newer 2019 Pro model at fault here.
How does pfblockerng handle that VIP address? Since all the logging stops when I set it to 0.0.0.0, I assume it is either listening on the VIP address to receive this data or watching for requests to route to that non-existent private network. So it's not logging DNS queries like pihole does, but actual packets that are destined for that VIP address, right? The performance issue has to be somewhere in that process, but since I've only been using pfsense for less than a month, I'm at the limit of my knowledge.
-
Just in case you haven't seen this, you might give it a look as to how to setup pfBlockerNG with DNSBL: https://linuxincluded.com/block-ads-malvertising-on-pfsense-using-pfblockerng-dnsbl/
-
Hello!
@lunaticfringe80 Are your pfb dnsbl vip and network devices in the same subnet? The default vip is 10.10.10.1 and it looks like shield1 is at 10.10.10.32. It might not be an issue, but the pfb dnsbl vip instructions do say, "This address should be in an Isolated Range that is not used in your Network."
John
-
@serbus said in PfBlockerNG high CPU:
Hello!
@lunaticfringe80 Are your pfb dnsbl vip and network devices in the same subnet? The default vip is 10.10.10.1 and it looks like shield1 is at 10.10.10.32. It might not be an issue, but the pfb dnsbl vip instructions do say, "This address should be in an Isolated Range that is not used in your Network."
John
My network is 10.10.10.0/25 so I've been using something like 10.222.222.1 for the VIP address. I mentioned in posts above that I changed it from default, but this thread has gotten long so I can't blame you for not seeing that.
For good measure I've just tested it with a completely different private address range, using 172.31.250.250, and it's the same performance problem as when I have it set to 10.222.222.1.
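The "isolated range" requirement quoted above can be checked mechanically. Here is a minimal Python sketch using the standard ipaddress module and the addresses mentioned in this thread; the function name vip_is_isolated is made up for illustration and is not part of pfBlockerNG.

```python
import ipaddress

def vip_is_isolated(vip, local_networks):
    """Return True if the VIP lies outside every listed local network."""
    address = ipaddress.ip_address(vip)
    return all(address not in ipaddress.ip_network(net) for net in local_networks)

# LAN and candidate VIPs taken from the posts above.
lan = ["10.10.10.0/25"]
for candidate in ["10.10.10.1", "10.222.222.1", "172.31.250.250"]:
    print(candidate, "isolated" if vip_is_isolated(candidate, lan) else "overlaps LAN")
```

Under this check the default 10.10.10.1 would collide with a 10.10.10.0/25 LAN, while the two addresses actually tested do not.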
-
Hello!
Oh, sorry I missed that...
I am not an expert on any of this, but what you are seeing is not normal for any of my networks that are running similar setups with pfb and 10-20 devices. I am running the straight, stock pfb dnsbl feed config. There are normally 600-1200 states active throughout the day. That spike to 14-15k states just doesn't look right. The States Summary table should show you total state counts for each local/remote ip. Do you have a bunch of IPs with hundreds of states open, or a couple of IPs with thousands? Maybe a single rogue device opening almost all of them? That is where I would start looking...
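If you would rather count from a text dump than from the GUI, the per-IP tally can be sketched like this. It is only a rough sketch: it assumes IPv4 lines shaped like simplified `pfctl -ss` output without NAT annotations, and the sample lines are invented for illustration, not taken from the affected firewall.

```python
import re
from collections import Counter

# Matches "<left endpoint> -> <right endpoint>" or the "<-" form.
ENDPOINTS = re.compile(r"(\S+)\s+(->|<-)\s+(\S+)")

def states_per_source(lines):
    """Count states per source IP; the source sits left of '->' or right of '<-'."""
    counts = Counter()
    for line in lines:
        match = ENDPOINTS.search(line)
        if match is None:
            continue  # skip lines that do not look like a state entry
        left, arrow, right = match.groups()
        source = left if arrow == "->" else right
        counts[source.rsplit(":", 1)[0]] += 1  # drop the port
    return counts

# Invented sample in the assumed format.
sample = [
    "all tcp 10.10.10.32:50001 -> 172.31.250.250:443       SYN_SENT:CLOSED",
    "all tcp 10.10.10.32:50002 -> 172.31.250.250:443       SYN_SENT:CLOSED",
    "all tcp 172.31.250.250:443 <- 10.10.10.32:50003       CLOSED:SYN_SENT",
    "all udp 10.10.10.20:40000 -> 10.10.10.1:53       SINGLE:MULTIPLE",
]
for host, count in states_per_source(sample).most_common():
    print(host, count)
```

A single address holding nearly all the states would point at one rogue device rather than general load.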
John
-
@serbus said in PfBlockerNG high CPU:
Maybe a single rogue device opening almost all of them? That is where I would start looking...
I've confirmed it is a 2019 Nvidia Shield sending an incredible amount of repeated traffic to a blocked domain. The OP of this whole thread already confirmed they also have a Shield, so I'm confident we know the source.
At this point the question is why pfblockerng is unable to handle sinkholing this device's requests without such a major performance impact? Pihole was able to handle the same task on far weaker hardware.
Pihole uses 0.0.0.0 as the query response. Setting the VIP in the DNSBL settings to 0.0.0.0 solves the performance problem, but disables the logging and stats entirely. Pihole only logs the DNS queries, so its stats are unaffected by using 0.0.0.0.
Different systems handle traffic to 0.0.0.0 differently:
As you can see, Linux responds like it's localhost and Windows acts like it doesn't exist.
This tells me that the problem is somewhere in how the system handles the packets it receives on the VIP address, since when you set the DNSBL VIP to 0.0.0.0 it only receives the DNS queries and no additional related traffic as it would if it were directing that sinkholed domain traffic on a VIP it was listening on.
-
Hello!
Yeah, I don't know enough about the design or innards of pihole vs pfb/pfsense to answer. Personally, I wouldn't be too concerned about a device running at those cpu and state levels for brief periods. I would be more worried about the crazy device making all those connections.
You can create a host override in the dns resolver for api.amplitude.com -> 0.0.0.0
This would return 0.0.0.0 but also 10.10.10.1 (or whatever).
You could also do a dns resolver custom option of:
server:
local-zone: "amplitude.com" always_nxdomain
...to just nuke the request.
Not sure how the shield would respond to either of those options, but they might shut it up and let you keep the pfb dnsbl running on a workable vip.
John
-
@serbus said in PfBlockerNG high CPU:
Yeah, I don't know enough about the design or innards of pihole vs pfb/pfsense to answer. Personally, I wouldn't be too concerned about a device running at those cpu and state levels for brief periods. I would be more worried about the crazy device making all those connections.
The problem is that if I let it continue for several days, eventually name resolution completely fails until I restart pfblockerng or pfsense entirely. I'm certainly not going to get rid of my Nvidia Shield just because pfblockerng can't handle sinkholing its attempts to phone home metrics. The only comparable device is Apple TV and there's no way I'm going there. I would just go back to using Pihole like I have for years.
@serbus said in PfBlockerNG high CPU:
You can create a host override in the dns resolver for api.amplitude.com -> 0.0.0.0
This would return 0.0.0.0 but also 10.10.10.1 (or whatever).
You could also do a dns resolver custom option of:
server:
local-zone: "amplitude.com" always_nxdomain
...to just nuke the request.
Not sure how the shield would respond to either of those options, but they might shut it up and let you keep the pfb dnsbl running on a workable vip.
John
I like this idea, but it doesn't work. It actually results in two addresses when you look up the domain, 0.0.0.0 and the DNSBL VIP address. The performance issue returns as well.
-
Hello!
The always_nxdomain approach should return "domain doesn't exist". Maybe it is cached somewhere.
Or...
server:
ratelimit-for-domain: amplitude.com 1
...might slow it down enough.
John
-
@lunaticfringe80 said in PfBlockerNG high CPU:
Nvidia Shield
Just looking at the Nvidia Shield. How is it connecting to your pfSense? I see that it is capable of connecting via Wi-Fi and hardwired via network cable. I assume you're using it via network cable. Is it using a USB network interface? What kind of network interface card are you using in your pfSense? Maybe you have mentioned it, but what kind of hardware are you running pfSense on: CPU, memory, network interfaces, etc.?
I'm just trying to see if maybe there is something else that hasn't been hashed over that could be causing your issue?
-
@jdeloach said in PfBlockerNG high CPU:
@lunaticfringe80 said in PfBlockerNG high CPU:
Nvidia Shield
Just looking at the Nvidia Shield. How is it connecting to your pfSense? I see that it is capable of connecting via Wi-Fi and hardwired via network cable. I assume you're using it via network cable. What kind of network interface card are you using in your pfSense? Maybe you have mentioned it, but what kind of hardware are you running pfSense on: CPU, memory, network interfaces, etc.?
I'm just trying to see if maybe there is something else that hasn't been hashed over that could be causing your issue?
I have the Shield wired. pfSense is on an Odyssey X86 with Intel NICs and Celeron J4105. Specs here.
-
Example of the DNSBL main file: /var/unbound/pfb_dnsbl.conf
local-zone: "film" "transparent"
local-data: "adservice.google.film 60 IN A 10.10.10.1"
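To see which host names such a file maps to the sinkhole address, the local-data entries can be pulled out with a short script. This is a minimal sketch that assumes every entry follows the A-record shape shown above; the function name sinkholed_hosts is made up for illustration.

```python
import re

# Captures the host name and the answer address of a local-data A record.
LOCAL_DATA = re.compile(r'local-data:\s*"([^\s"]+)\s+\d+\s+IN\s+A\s+([^\s"]+)"')

def sinkholed_hosts(conf_text):
    """Map each host name found in local-data A records to its answer address."""
    return {host: address for host, address in LOCAL_DATA.findall(conf_text)}

# The example entry from the post above.
example = '''
local-zone: "film" "transparent"
local-data: "adservice.google.film 60 IN A 10.10.10.1"
'''
print(sinkholed_hosts(example))
```

On the real firewall you would read the text from /var/unbound/pfb_dnsbl.conf instead of the inline string.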
What we know :
and consider this
So, in my case, the IP 10.10.10.1 has two ports being "served" by a web server : 8081 and 8443.
The idea is understandable: when you look at a web page using a web browser, all URLs on that page that load advertising from remote web sites get pointed to 10.10.10.1, and port 80 gets redirected to 8081 (443 to 8443, which is nice, but won't work). The ad sections of the page are then sent not by the ad server's web site but by our internal pfBlockerNG web server that lives at 10.10.10.1.
Now consider some code on some devices : it isn't loading web pages, but other data, using other ports.
Just think about it : look at api.amplitude.com [ edit have a look at https://amplitude.com .... is this Cambridge Analytica 2 ?? ]
That's a URL, probably blacklisted as "amplitude.com", that gets blocked, or rather redirected, to our 10.10.10.1.
I'm 99.99% sure that it is not requesting things on port 80 or port 443. So our NAT rules don't kick in.
It hammers 10.10.10.1 with all the available processor power and bandwidth because it didn't get a 'valid' answer. The main issue is that the code is written right out of a pile of bull sh*t, and a first step from you guys would be: kill the app on the device that is doing this (and problem solved).
My idea: what about a pfSense firewall rule that blocks (or rejects) any ports on 10.10.10.1, except our 8081 and 8443?
This way there won't be 15k states because some low-budget program goes totally haywire.
I bet the Pi-hole does just that.
The 10.10.10.1 web server only makes sense for http requests ... not some unknown API traffic.
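The rule idea can be modeled as a small decision function. This is a toy model in Python, not pf syntax and not pfBlockerNG's actual rule set. It assumes the port forwards mentioned earlier (80 to 8081, 443 to 8443) are applied before the filter rule is evaluated, and all names in it are made up.

```python
DNSBL_VIP = "10.10.10.1"
PORT_FORWARDS = {80: 8081, 443: 8443}  # redirects described in the post
WEB_SERVER_PORTS = {8081, 8443}        # ports the DNSBL web server listens on

def filter_decision(dst_ip, dst_port):
    """Return 'pass' or 'block' for VIP traffic, 'not-handled' for anything else."""
    if dst_ip != DNSBL_VIP:
        return "not-handled"  # this rule only concerns the sinkhole address
    translated = PORT_FORWARDS.get(dst_port, dst_port)  # assumed: redirect first
    return "pass" if translated in WEB_SERVER_PORTS else "block"

for port in (80, 443, 5228):
    print(port, filter_decision(DNSBL_VIP, port))
```

In this model only traffic to other ports is cut off; anything a device sends to 80 or 443 still reaches the web server.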
edit: can someone tcpdump this box/device? I'm really starting to get curious about what these 'boxes' pump in and out ...
edit again :
A rule like this :
I chose the floating tab, because the other 2 rules that handle 10.10.10.1 access are already there.
I added a third rule, blocking all traffic going to 10.10.10.1
-
I've checked the state tables again to confirm the destination port and it's always 443. It looks like this, but there are pages and pages of it repeated (172.31.250.250 is my DNSBL VIP):
I like your idea, so I thought perhaps I could use the same strategy to simply block that device entirely, and gain back DNSBL logging for the rest of my network, but that doesn't work either. What am I doing wrong? This is what I have:
Another thing that's been bothering me is the disparity between the blocked packets and the total queries as shown here:
Is it normal for it to have so many more packets than queries? It appears to be throwing that percentage off since clearly 100% of my queries have not been blocked. I'm wondering if somehow the traffic is being duplicated or increased in some manner causing the unusual impact to performance?
-
I couldn't get the floating rule to work, but I was able to use the port forward rules to essentially change the sinkhole address just for the Shield like this:
This is just an improved bandaid, though. I just get DNSBL logging back for the rest of my network. I still think the disparity between blocked packets and total queries is part of the core issue.
-
@lunaticfringe80 said in PfBlockerNG high CPU:
I've checked the state tables again to confirm the destination port and it's always 443.
This means the pfBlockerNG-devel's web server throws out the 'blocked page' on every request.
Because the Shield device is stupid enough to retry again and again, there is actually just one real solution: have it repaired/updated/moved.
A web browser wouldn't do that when it loads a page with advertising content.
Your idea to sinkhole it seems good!