V 3.2.0 with pfsense 23.01 RC 20230202

jimp

OK, that may be a different problem. The patch definitely helps with some things (on Redmine, the reporter said it fixed MaxMind downloads), but there may be something else going on with the code for wildcard blocking.

Juggernaut

@jimp FYI: I updated to 23.01 RC last night and am having the same issue when wildcard blocking is checked. When I uncheck it, all is well.

amviewer

Having the same issue on 23.01-RC.

I removed the package with "keep settings" unchecked and reinstalled it, and it just hangs on the update.

greenflash

I am also facing severe problems with pfsense 23.01-RC and pfblocker 3.2.0.

For me, the pfBlocker cron job also hangs on the HTTP 200 message.
In fact, it doesn't hang completely; it just takes a very long time to process a blocklist before continuing with the next one. Previously (on pfsense 22.05 and pfblocker 3.1.0), processing one block list took about 20 seconds on my system, but on pfsense 23.01-RC and pfblocker 3.2.0, the same list takes over 30 minutes.

Then I thought, I will just let the cronjob run until it has finished even if it takes hours, as it can just run in the background.

The next morning I saw that unbound had been killed because of out-of-memory errors.
After examining the logs, I found that the whole cronjob took about 16 hours instead of the usual less than 10 minutes.

After the OOM errors, I couldn't get unbound to restart on 23.01, so I reverted to 22.05.

Edit1: I have 8GB RAM in my system (on 22.05 it used between 3 and 5GB). I have since upgraded to 16GB, but that has not helped with the unbound OOM errors either.

Note: I do NOT have Wildcard Blocking (TLD) enabled.

pfsjap

@greenflash Have you applied the patch above by jimp? I also did/do not have wildcard tld blocking enabled, but still had the slow reload problem. After the patch feeds get processed in a timely manner.

greenflash

@pfsjap Not yet, but I will try that.
But I highly doubt that it will fix my out-of-memory errors.

The out-of-memory errors could be the same as those described here: https://forum.netgate.com/topic/177559/23-01-r-20230202-1645-2500-mb-laundry

amviewer

Ran into the same problem with the memory usage.
I'm not sure whether the other issues I had with 23.01 came from pfBlockerNG, but IPv6 had some weird issues which I couldn't resolve.
For now I rolled back to 22.05.

greenflash

@pfsjap I am now back on 23.01-RC1 and have applied the patch from jimp successfully.
This has solved my issues with the long reloading (at least to a certain point).
The patch has indeed reduced the reload time from ~16 hours (23.01 without the patch) to around 2.5 hours (147 minutes exactly).
While this is a significant improvement, the reload times are still far slower than on 22.05, where a full reload normally takes less than 20 minutes.

But the much worse issue for me still persists:
After the full reload of all the block lists, unbound gets restarted, but then gets killed again because it has used all of the available memory.

So for me the memory issues could be tracked down to unbound, see:

[23.01-RC][fabian@***]/home/fabian: top
last pid: 44352;  load averages:  1.71,  0.87,  0.80                                            up 0+03:01:07  12:01:55
69 processes:  2 running, 62 sleeping, 5 waiting
CPU: 24.3% user,  0.0% nice,  2.8% system,  0.4% interrupt, 72.5% idle
Mem: 9845M Active, 1156K Inact, 3833M Laundry, 1662M Wired, 337M Free
ARC: 1121M Total, 713M MFU, 384M MRU, 1312K Anon, 15M Header, 8207K Other
     1035M Compressed, 6602M Uncompressed, 6.38:1 Ratio
Swap: 2048M Total, 1251M Used, 796M Free, 61% Inuse, 153M Out

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
39855 unbound       1 135    0    14G    13G CPU1     1   1:58  99.88% unbound
  916 root          1  68    0   170M    27M accept   0   1:08   0.62% php-fpm
 2584 root          8  20    0   968M   187M nanslp   3  13:46   0.50% suricata
14741 root          9  20    0   980M   208M nanslp   3  13:48   0.48% suricata
32505 root          1  20    0    31M  4240K kqread   2   0:01   0.06% nginx
82817 fabian        1  20    0    14M  3020K CPU0     0   0:00   0.04% top

(top just before unbound gets killed because of oom)
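For anyone who wants to track this over time rather than eyeballing top, here is a minimal Python sketch that parses process lines like the ones above and reports the process with the largest resident size. The helper names (`size_to_bytes`, `parse_top_line`, `heaviest`) are made up for illustration, and the column positions are assumed to match the header shown in this particular top output (PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND); other top configurations may differ.

```python
# Hypothetical helpers, not part of pfSense or pfBlockerNG.
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def size_to_bytes(text):
    """Convert a top-style size such as '4240K' or '13G' to bytes."""
    if text[-1].isdigit():
        return int(text)  # bare number, assumed to already be bytes
    return int(float(text[:-1]) * UNITS[text[-1].upper()])


def parse_top_line(line):
    """Parse one process line; assumes the 12-column layout quoted above."""
    fields = line.split()
    return {
        "pid": int(fields[0]),
        "user": fields[1],
        "size": size_to_bytes(fields[5]),  # SIZE column (virtual size)
        "res": size_to_bytes(fields[6]),   # RES column (resident memory)
        "command": fields[11],
    }


def heaviest(lines):
    """Return the parsed process with the largest resident memory."""
    return max((parse_top_line(l) for l in lines), key=lambda p: p["res"])


if __name__ == "__main__":
    sample = [
        "39855 unbound       1 135    0    14G    13G CPU1     1   1:58  99.88% unbound",
        "  916 root          1  68    0   170M    27M accept   0   1:08   0.62% php-fpm",
        " 2584 root          8  20    0   968M   187M nanslp   3  13:46   0.50% suricata",
    ]
    top_proc = heaviest(sample)
    print(top_proc["command"], top_proc["res"] // 1024**2, "MiB resident")
```

Fed the lines from the output above, this picks out unbound, which is the only process anywhere near the installed 16GB.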

pfsjap

@greenflash Is your DNSBL Mode set to Unbound Mode? Changing to Unbound Python Mode may help.

I have 1GB RAM on Netgate 1100 with 260MB set to RAM disk. Not that many feeds, though.

greenflash

@pfsjap currently it is already set to Unbound Python Mode. I do have 16GB of RAM installed, on 22.05 the whole system never used more than 5 GB at maximum.

I have not changed any of my block list feeds since the upgrade. Therefore there has to be some kind of issue with memory management (or even a memory leak) involving pfblocker and unbound on 23.01.

pfsjap

@greenflash Ok, hopefully you'll get help with this.

cmcdonald

@greenflash I worked on Unbound quite a bit over the past month tracking down memory-related issues.

What does your DNSBL setup look like?

greenflash

@cmcdonald said in V 3.2.0 with pfsense 23.01 RC 20230202:

What does your DNSBL setup look like?

Do you mean this settings page?


Or this one:


emikaadeo

@tcw said in V 3.2.0 with pfsense 23.01 RC 20230202:

No change. Confirmed the patch applied. Updated to 23.01.r.20230202.1645 from 23.01.r.20230202.0019 yesterday and confirmed successful pfBlockerNG force reload all, before and after the update, and before and after applying the patch, with success as long as Wildcard Blocking (TLD) is unselected.

The "TLD finalize.." step seemed to take just a couple of seconds on 22.05 with my hardware, so I don't believe it's an issue of my not waiting long enough (especially now since the patch seems to have corrected a typo to enforce timeout in 15 seconds).

Let me know how else I may be able to help.

Finally got time to upgrade to 23.01-RC and can confirm that with the Wildcard Blocking (TLD) feature enabled, the update/reload process hangs on "TLD finalize..."
There's a Redmine ticket for this issue: https://redmine.pfsense.org/issues/13884

tcw

@jimp's patch just got applied to an updated pfBlockerNG v. 3.2.0_1. (Thanks!) That appears to have been the only change.

Gertjan

Using the latest pfSense RC :

23.01-RC (amd64)
built on Wed Feb 08 06:11:39 UTC 2023

TLD Whitelist selected.
I'm here :

UPDATE PROCESS START [ v3.2.0_1 ] [ 02/8/23 11:13:01 ]

===[  DNSBL Process  ]================================================

Loading DNSBL Statistics... completed
Missing DNSBL stats and/or Unbound DNSBL files - Rebuilding

Loading DNSBL SafeSearch...  enabled
Loading DNSBL Whitelist... completed
Blacklist database(s) ... exists.

[ StevenBlack_ADs ]		 Downloading update .. 200 OK.
 Whitelist: 15.taboola.com|aax-eu.amazon-adsystem.com|adsafeprotected.com|am-match.taboola.com| ..... snipped
 Orig.    Unique     # Dups     # White    # TOP1M    Final                
 ----------------------------------------------------------------------
 177888   177888     0          97         0          177791               
 ----------------------------------------------------------------------

------------------------------------------------------------------------
Assembling DNSBL database...... completed [ 02/8/23 11:13:13 ]
TLD:
TLD analysis.. completed [ 02/8/23 11:13:17 ]
TLD finalize..

and I understand why :

The /tmp/dnsbl_tld_remove file, the list of TLDs to remove, is 37000+ lines.
The /var/unbound/pfb_py_data.txt.raw file is 133608 lines.

[edit]
From what I make of this: each of the 37000+ lines is checked (grepped) against every line in the 133608-line file.
So, 37000 times 133608 'greps' have to be executed.
That's huge ....

And I have only one dnsbl feed - with "133608" dnsbl entries.
[end edit]
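To put a number on that: 37000 x 133608 is roughly 4.9 billion line comparisons in the worst case. The Python sketch below contrasts that pairwise pattern with a single hash-set lookup per entry. It is only an illustration of the scaling, not pfBlockerNG's actual code (which uses grep driven from PHP). It assumes both files hold one bare domain per line and that matching is exact; the real pfb_py_data.txt.raw format is not shown in this thread and is probably different. The function names are made up.

```python
# Illustrative only: hypothetical names, simplified one-domain-per-line data.

def remove_pairwise(entries, to_remove):
    """One full pass over the data for every line of the removal list."""
    kept = list(entries)
    comparisons = 0
    for domain in to_remove:
        comparisons += len(kept)  # every remaining entry is compared once
        kept = [e for e in kept if e != domain]
    return kept, comparisons


def remove_with_set(entries, to_remove):
    """Build a set once, then do one hash lookup per entry."""
    removal = set(to_remove)
    return [e for e in entries if e not in removal]


# Worst-case comparison count for the file sizes quoted above.
WORST_CASE = 37_000 * 133_608

if __name__ == "__main__":
    entries = ["a.com", "b.com", "c.com", "d.com"]
    to_remove = ["b.com", "x.com"]
    kept, comparisons = remove_pairwise(entries, to_remove)
    print(kept, comparisons)
    print(remove_with_set(entries, to_remove))
    print(f"worst case: {WORST_CASE:,} comparisons")
```

Both functions return the same result; the difference is that the pairwise version does work proportional to the product of the two file sizes, while the set version does work proportional to their sum.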

I copied both files to /root/ and repeated the command 'on the command line'.
This command is great to max out one core, 100 %, and it will take minutes if not hours to complete.

pfblockerng-devel does this with PHP handling the return (output). That will make things even worse.

49 degrees and rising. Off to the kitchen, looking for some eggs.

I guess not using (unchecking) Wildcard Blocking (TLD) is the best option right now.

emikaadeo

@gertjan said in V 3.2.0 with pfsense 23.01 RC 20230202:

I guess not using TLD Whitelisting is the best option right now.

I'm not using TLD Whitelist
My DNSBL Mode is set to "Unbound python mode" and as pfBlockerNG states: "TLD Whitelist is not utilized for Unbound python mode! Use DNSBL Whitelist instead."
The main problem is when Wildcard Blocking (TLD) is enabled.

Gertjan

@emikaadeo
You're right :

That's the one :

petrt3522

I believe mine is now having the same issue with my upgrade to 23.01-Final. I manually did a reload and it's at 20 minutes, stuck on "TLD finalize."

I did have an error: on its first boot I got a banner about this extensive error: https://pastebin.com/aj8q4Mjw. Other than that, it seems to work fine and appears to be passing traffic across 2 VLANs and 1 WAN.

4NVXr3wHBnQYsHwE

This happened to me today as well and likewise disabling Wildcard Blocking (TLD) worked around it. grep was stuck at 100% CPU utilization for several minutes otherwise.