https://oisd.nl
-
@totowentsouth said in https://oisd.nl:
At some point in the near-ish future, I'll update my routers to pfSense 23.09. Will these upcoming changes work in the next/latest version of pfSense?
They should, yes.
Additional remarks:
- Is "Wildcard blocking (TLD)" relevant anymore? FWIW, I have abandoned use of it going forward.
Only for traditional hosts-style lists
- I noticed the DNSBL update procedure now executes a step described as "Removing redundant DNSBL entries". My unsolicited feedback is that it is good to see pruning of redundant entries, and unfortunate that the process is as slow as it is, lengthening the DNSBL update process further. I am biased toward speedy updates because I now have an operational C implementation of the Python script to deduplicate the collective set. I realize the tools used for removing redundant entries are the limiting factor; it is what it is.
Ah, yes. It's a somewhat slow step, indeed. The silver lining is that the DNS resolution is still up while that runs, and while it does take extra processing, it's not supposed to be hitting too hard on the performance of the firewall while it's running.
But it's done this way right now because 1. adding a custom Python script or some custom native binary (compiled or otherwise) would add more complexity to the package and 2. without zero-downtime reloads, de-duplicating in the script initialization would prolong the downtime.
Since I'll be implementing zero-downtime reloads, we can do everything inside the Python initialization/reload code and not have to worry about downtime at all. The only time it would go down is on a pfBlockerNG version upgrade or when manually restarting Unbound.
Since zero-downtime reloads are essential for my use-case anyway, I'd already have to implement it, so I'm killing two birds with one stone. And it's arguably easy to do anyway (I can probably do it the next time I have more than a couple free hours, next week or so).
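To make the idea concrete, here is a minimal sketch of a zero-downtime reload. It is not pfBlockerNG's actual code, and the names (`Blocklist`, `reload`, `is_blocked`) are hypothetical. The new set is normalized and de-duplicated off to the side while lookups keep using the old one, and only the final reference swap is visible to readers.

```python
import threading


class Blocklist:
    """Holds the active set of blocked domains (hypothetical helper)."""

    def __init__(self, domains=()):
        self._active = frozenset(domains)
        self._reload_lock = threading.Lock()  # serializes reloads only

    def is_blocked(self, name):
        # Readers grab the current reference once and never take a lock.
        active = self._active
        return name.lower().rstrip(".") in active

    def reload(self, raw_entries):
        with self._reload_lock:
            # Slow part: normalize and de-duplicate while the old set still serves.
            fresh = frozenset(
                e.strip().lower().rstrip(".") for e in raw_entries if e.strip()
            )
            # Fast part: a single reference assignment replaces the old set.
            self._active = fresh
        return len(fresh)


bl = Blocklist(["ads.example.com"])
count = bl.reload(["Tracker.Example.net", "tracker.example.net.", "", "ads.example.org"])
```

Because the expensive work happens before the swap, how long de-duplication takes no longer matters for availability, which is the point made above.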
-
I stumbled upon this thread while looking for documentation regarding the pre/post script. I'm on 2.6.0.
Am I correct in assuming scripts defined in the "Advanced Tuneables" section do not get executed? Is there a way to enable this functionality?
Gr,
C
-
@andrebrait said in https://oisd.nl:
@totowentsouth said in https://oisd.nl:
Yes. The current deduplication does an admirable job at deduplication despite its limitations. With the introduction of ABP style matching, the limitations of the existing deduplication are more bothersome to me albeit NOT problematic for pfBlockerNG.
Honestly, I ended up finding out it's not that bad for Python mode. It's not optimal, but it doesn't have to be.
I'd like to remove Unbound mode first and then try to improve it. DNSBL code is too full of branches right now, because of both modes being present.
FWIW #1, I have been running 57d776e635e08c7925c46a83b6b8ebbdd9f64d4c for a week or so (along with pfb_dnsbl_prune.py as a final dedup process) on both of my pfSense machines and have not observed issues. That I haven't seen an issue does not carry a lot of weight since I've not had time to poke and observe output.
FWIW #2, I have just updated both pfSense boxes to 3bea1f276e7fd557e285c999cde11202690ea81f. All is well.
I am very excited for the official release of this regex/ABP style matching. I will continue to run the 'next' versions until its release.
There's a small bug in the TLD Analysis step, I think, but I'll be fixing it tomorrow. Thanks a lot for testing it!
FWIW #3, I pushed the pfb_dnsbl_prune.py script to a separate repo https://github.com/babilon/dedup-domains/tree/main. I have also synchronized the script to my fork of FreeBSD to then easily make a system patch. I intend the license to match the pfBlockerNG's license in case anything of it may continue to be inspirational or useful.
Thanks. I'll check it out when I have time.
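For readers following along, a rough sketch of what such a pruning pass can look like. This is not the algorithm in pfb_dnsbl_prune.py. It assumes that blocking a domain also blocks all of its subdomains (wildcard-style matching), so any entry whose parent domain is also listed is redundant. The function name `prune_redundant` is made up for illustration.

```python
def prune_redundant(domains):
    """Drop exact duplicates and any domain whose parent domain is also listed."""
    unique = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    kept = []
    for name in sorted(unique):
        labels = name.split(".")
        # Candidate parents: every proper suffix, e.g. a.b.c -> b.c, c
        parents = (".".join(labels[i:]) for i in range(1, len(labels)))
        if not any(p in unique for p in parents):
            kept.append(name)
    return kept


pruned = prune_redundant([
    "example.com",
    "ads.example.com",
    "Ads.Example.com.",
    "track.ads.example.com",
    "example.org",
])
# pruned == ["example.com", "example.org"]
```

Each name costs one set lookup per label, so the pass stays roughly linear in the size of the combined list.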
Question #1: In each IP Group, within the "Advanced Tuneables" section, are two list boxes. One is labeled "Pre-process Script" and the second is labeled "Post-process Script". The code that runs the selected script appears to be in pfblockerng.inc:
    // IP v4/6 Advanced Tunable - (Post Script processing)
    if ($pfb_script_post && file_exists("{$pfb_script_post}")) {
        pfb_logger("\nExecuting post-script: {$list['script_pre']}\n", 1);
        $file_org_esc = escapeshellarg("{$file_dwn}.orig");
        exec("{$pfb_script_post} {$file_org_esc} {$list['vtype']} {$elog}");
    }
In each DNSBL Group, within the "Advanced Tuneables" section, are two list boxes with the same set of labels as is seen on the IP Group page. However, AFAICT, the code that would run the DNSBL's pre- and post- scripts is absent!
(Q1) Do you know if this is a known issue? A search for "advanced tuneables" resulted in one match: #12882. This is not the issue I am looking for. I considered utilizing this functionality to run pfb_dnsbl_prune.py until I realized it's per group list. This leads to Question 2:
I had never heard of or stumbled upon it. The pruning step should be integrated into the pipeline at a deeper level. It should be easier once a huge chunk of the DNSBL Unbound mode code is removed.
Question #2: What are your thoughts on a feature that allows users to optionally run a Pre- and/or Post- entire-DNSBL-process script?
I am thinking of a long-term solution for running scripts like pfb_dnsbl_prune.py that act on the collective input/output of the DNSBL process: a solution that does not involve injecting changes into the core pfBlockerNG code in order to execute.
It could be interesting and it would allow for a ton of flexibility, but that should likely come after removing the Unbound mode. Letting power users tap into your stuff to make it more powerful is almost always a good idea. We can think of it kinda like its own plugin system.
With Python, the sky is the limit.
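As a rough illustration of the whole-process hook idea, and not anything that exists in pfBlockerNG today, the sketch below wraps a build step with an optional pre-script and post-script. Everything here is hypothetical: the names `run_hook` and `update_dnsbl`, the assumption that hooks are Python scripts run with the current interpreter, and the convention of passing the output file as the only argument.

```python
import os
import subprocess
import sys
import tempfile


def run_hook(script, *args, timeout=300):
    """Run an optional user script; return its exit code, or None if unset or missing."""
    if not script or not os.path.isfile(script):
        return None
    # Arguments are passed as a list, so no shell quoting is involved.
    result = subprocess.run([sys.executable, script, *args], timeout=timeout)
    return result.returncode


def update_dnsbl(build_step, pre_script=None, post_script=None, output="dnsbl.txt"):
    """Hypothetical whole-process wrapper: pre-hook, build, post-hook."""
    codes = {"pre": run_hook(pre_script, output)}
    build_step(output)
    codes["post"] = run_hook(post_script, output)
    return codes


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "dnsbl.txt")
    post = os.path.join(tmp, "post.py")
    with open(post, "w") as fh:
        # Stand-in post-script: appends a marker line to the finished list.
        fh.write("import sys\nopen(sys.argv[1], 'a').write('pruned\\n')\n")

    def build(path):
        with open(path, "w") as fh:
            fh.write("example.com\n")

    codes = update_dnsbl(build, post_script=post, output=out)
    with open(out) as fh:
        final = fh.read()
```

A hook that is not configured is simply skipped, so the default behaviour stays unchanged for users who never touch the feature.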
-
@cukal The code to execute the pre/post scripts for "Advanced Tuneables" is in /usr/local/pkg/pfblockerng/pfblockerng.inc and AFAICT exists ONLY for IPv4 / IPv6 lists. AFAICT, the expected "exec" call for the pre/post DNSBL scripts is absent. By intention or mistake, IDK. I pondered and puzzled over the same thing. The wording on the web page suggests this feature is intended to be operational.
@cukal said in https://oisd.nl:
Is there a way to enable this functionality?
With a bit of coding, yes. As things stand now, AFAICT, no. Having a pre-script for DNSBL would be helpful to clean up input formats that are currently foreign to pfblockerng.
The "exec" call for the pre-script in pfblockerng.inc is in this block of code:
    // IPv4/6 Advanced Tunable - (Pre Script processing)
    if ($pfb_script_pre && file_exists("{$pfb_script_pre}")) {
        pfb_logger("\nExecuting pre-script: {$list['script_pre']}\n", 1);
        $file_dwn_esc = escapeshellarg("{$file_dwn}.orig");
        exec("{$pfb_script_pre} {$file_dwn_esc} {$list['vtype']} {$elog}");
    }
And the "exec" call for the post-script in pfblockerng.inc is in this block of code:
    // IP v4/6 Advanced Tunable - (Post Script processing)
    if ($pfb_script_post && file_exists("{$pfb_script_post}")) {
        pfb_logger("\nExecuting post-script: {$list['script_pre']}\n", 1);
        $file_org_esc = escapeshellarg("{$file_dwn}.orig");
        exec("{$pfb_script_post} {$file_org_esc} {$list['vtype']} {$elog}");
    }
-
@andrebrait said in https://oisd.nl:
Only for traditional hosts-style lists
I realized after reading your reply that I had asked and you answered this question already.
I can see its use in those cases.
@andrebrait said in https://oisd.nl:
But it's done this way right now because 1. adding a custom Python script or some custom native binary (compiled or otherwise) would add more complexity to the package and 2. without zero-downtime reloads, de-duplicating in the script initialization would prolong the downtime.
Makes sense.
@andrebrait said in https://oisd.nl:
Since zero-downtime reloads are essential for my use-case anyway, I'd already have to implement it, so I'm killing two birds with one stone. And it's arguably easy to do anyway (I can probably do it the next time I have more than a couple free hours, next week or so).
I'll give it a whirl when it is ready.
-
@totowentsouth I integrated your fix.
Could you pull the latest pfblockerng-adblock branch (or pfblockerng-next) and check that it's been fixed?
-
@andrebrait Patch looks good and no lines in py_error.log while running pfblockerng-next at 69f3d0455363411179a763a3c39e03b0b027b4a0.
Thanks!
-
@totowentsouth New code in the branch https://github.com/andrebrait/FreeBSD-ports/tree/pfblockerng-adblock
If you're willing to test it, it would be more than appreciated!
-
@andrebrait Yep. I have installed this on my secondary pfSense box. So far, 1 wrinkle and zero problems to report. I will exercise it over the coming days.
FWIW, I created a patch from 6b9d2aa2b78193bd8ce83d0c0e0793f157d3ed77..4683d6825a55667677803bda8444d14eb30ddf71
I removed hunk #38 for pfblockerng.inc from this patch due to a conflict. AFAICT, the change is already in 3.2.0_7 ?? -- afterwards, the patch applied cleanly, so all is well.
Hunk #38 of net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc:
    @@ -10732,7 +10962,7 @@ function pfblockerng_php_pre_deinstall_command() {
         if (config_path_enabled('system','earlyshellcmd')) {
             $a_earlyshellcmd = config_get_path('system/earlyshellcmd', '');
             if (preg_grep("/pfblockerng.sh aliastables/", $a_earlyshellcmd)) {
    -            config_set_path('system','earlyshellcmd',
    +            config_set_path('system/earlyshellcmd',
                     preg_grep("/pfblockerng.sh aliastables/", $a_earlyshellcmd, PREG_GREP_INVERT));
             }
         }
The wrinkle I've encountered is with the switch to use fontawesome 6 (a999ce5a96e22ab54317e7079b1871e1661f7218). I am wrestling with fonts on my machine and have some improvements. Will pfSense deliver these fonts if the host/browser does not have them installed?
-
@totowentsouth said in https://oisd.nl:
@andrebrait Yep. I have installed this on my secondary pfSense box. So far, 1 wrinkle and zero problems to report. I will exercise it over the coming days.
FWIW, I created a patch from 6b9d2aa2b78193bd8ce83d0c0e0793f157d3ed77..4683d6825a55667677803bda8444d14eb30ddf71
I removed hunk #38 for pfblockerng.inc from this patch due to a conflict. AFAICT, the change is already in 3.2.0_7 ?? -- afterwards, the patch applied cleanly, so all is well.
Hunk #38 of net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc:
    @@ -10732,7 +10962,7 @@ function pfblockerng_php_pre_deinstall_command() {
         if (config_path_enabled('system','earlyshellcmd')) {
             $a_earlyshellcmd = config_get_path('system/earlyshellcmd', '');
             if (preg_grep("/pfblockerng.sh aliastables/", $a_earlyshellcmd)) {
    -            config_set_path('system','earlyshellcmd',
    +            config_set_path('system/earlyshellcmd',
                     preg_grep("/pfblockerng.sh aliastables/", $a_earlyshellcmd, PREG_GREP_INVERT));
             }
         }
Zero idea about it. Perhaps a recent change from the upstream devel branch that merged cleanly for me but not for you for some reason?
The wrinkle I've encountered is with the switch to use fontawesome 6 (a999ce5a96e22ab54317e7079b1871e1661f7218). I am wrestling with fonts on my machine and have some improvements. Will pfSense deliver these fonts if the host/browser does not have them installed?
Yes, same for me. I have no idea what their plans are.
I'll update the branch later and create a version that's based off of the upstream main branch to see if I observe any differences regarding that.
-
@andrebrait I noticed pfblockerng is blocking many innocuous domains after adding the EasyPrivacy list. Examples of domains I would expect to resolve include starbucks.com, sendcloud.net, substack.com, and substackcdn.com.
https://easylist.to/easylist/easyprivacy.txt
In EasyPrivacy list, the entries for these domains are:
    .sendcloud.net/track/
    .starbucks.com/a/
    .substack.com/o/$image
    .substackcdn.com/open?$image
A peek at what the final lists contain for two of the blocked domains:
    /var/db/pfblockerng: grep -nR sendcloud dnsbl/
    dnsbl/hagezi_pro_plus.txt:43088:,sctrack.sendcloud.net,,2,hagezi_pro_plus,DNSBL_hagezi_pro_plus,1
    dnsbl/hagezi_pro_plus.txt:49793:,track.sendcloud.org,,2,hagezi_pro_plus,DNSBL_hagezi_pro_plus,1
    dnsbl/hagezi_pro_plus.txt:51974:,tracking.sendcloud.sc,,2,hagezi_pro_plus,DNSBL_hagezi_pro_plus,1
    dnsbl/EasyList_Privacy.txt:4012:,sendcloud.net,,0,EasyList_Privacy,DNSBL_EasyList,0

    /var/db/pfblockerng: grep -nR substack\.com dnsbl/
    dnsbl/OISD_big.txt:480:,0x00000000000.substack.com,,2,OISD_big,DNSBL_Collections,1
    dnsbl/OISD_big.txt:66539:,etharticles.substack.com,,2,OISD_big,DNSBL_Collections,1
    dnsbl/OISD_big.txt:146726:,publicationgroup.substack.com,,2,OISD_big,DNSBL_Collections,1
    dnsbl/OISD_big.txt:178172:,teamproject.substack.com,,2,OISD_big,DNSBL_Collections,1
    dnsbl/OISD_big.txt:184591:,tradestrategy.substack.com,,2,OISD_big,DNSBL_Collections,1
    dnsbl/OISD_big.txt:188726:,uniproject.substack.com,,2,OISD_big,DNSBL_Collections,1
    dnsbl/OISD_big.txt:202727:,web3projects.substack.com,,2,OISD_big,DNSBL_Collections,1
    dnsbl/OISD_big.txt:203007:,webpublic.substack.com,,2,OISD_big,DNSBL_Collections,1
    dnsbl/StevenBlack_hosts.txt:7301:,email.mg1.substack.com,,2,StevenBlack_hosts,DNSBL_Collections,0
    dnsbl/EasyList_Privacy.txt:4337:,substack.com,,0,EasyList_Privacy,DNSBL_EasyList,0
EasyPrivacy seems to be one of the few that lists entries in this manner and with domains that one would expect to resolve.
BTW, this issue appears to exist in the pfblockerng-next branch as well. An entry in EasyList_France that follows the same pattern as those above is in the final output. This box is running pfblockerng-next code:
    [23.09.1-RELEASE][admin@pfSense.localdomain]/var/db/pfblockerng: grep -nR 468\.60\.gif dnsbl
    dnsbl/C_EasyList_France.txt:228:,468.60.gif,,0,C_EasyList_France,DNSBL_EasyList,0
My pihole loads the same EasyList_France and, although it shows ".468.60.gif" in the results for "Search Adlists", when I run dig @<pihole.ip> 468.60.gif, pihole reports the query as blocked by an external source, i.e. pfblockerng. After removing EasyList_France from pfblockerng, dig @<pihole.ip> 468.60.gif returns an answer, so it was indeed pfblockerng blocking the lookup even though pihole listed the entry for some reason.
Let me know if you need more information. I have my main computer behind the latest pfblockerng code and am loading it with lists to give it a thorough workout (at least as far as the adblocking goes).
-
@andrebrait I made this change to my pfblockerng install:
Edited to include exclusion of entries beginning with a hyphen.
    diff --git a/net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc b/net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc
    index c332706eba77..eca18a486157 100644
    --- a/net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc
    +++ b/net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc
    @@ -8730,6 +8730,11 @@ function sync_package_pfblockerng($cron='') {
     if (!$liteparser) {
         $lite = FALSE;
    +    # entries that start with a period are probably ABP style.
    +    $beginswith = substr($line, 0, 1);
    +    if ($beginswith == '.' || $beginswith == '-') {
    +        continue;
    +    }
         if (strpos($line, '.') !== FALSE && ctype_alnum(str_replace('.', '', $line))) {
             $lite = TRUE;
There are also entries such as _c.gif, _adobe_analytics.js, and _stat.php which, viewed as domains, have a TLD that is unregistered as of today, AFAICT.
-
Inside the next if block, leading and trailing periods are pruned from the line:
    // Remove leading/trailing dots
    $line = trim(trim($line), '.');
-
@totowentsouth thanks for looking into it.
As far as I can tell, it's an actual bug. It should not parse entries that start with a period. Those entries in EasyList lingo mean other things, not domain names.
The intended behavior is to only parse entries that start with || or @@|| (those are exclusions) and end with ^ (or ^| as sometimes that happens).
The entries you listed should have been skipped and ignored. Those are URL patterns, and DNSBL shouldn't use them.
I'll look into it as soon as possible. Thanks for the detailed information and the code snippets :)
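A minimal sketch of that intended rule, not the package's actual parser. It accepts only lines of the form `||domain^` or `@@||domain^`, optionally followed by `|`, and skips everything else. The function name `parse_domain_rule` is hypothetical, and skipping rules that carry `$` options is a simplifying assumption that may differ from what pfBlockerNG does.

```python
import re

# ||domain^ or @@||domain^, optionally followed by a trailing |
_RULE = re.compile(
    r"^(@@)?\|\|([a-z0-9](?:[a-z0-9.-]*[a-z0-9])?)\^\|?$",
    re.IGNORECASE,
)


def parse_domain_rule(line):
    """Return (domain, is_exception) for a pure domain rule, else None."""
    match = _RULE.match(line.strip())
    if match is None:
        return None
    return match.group(2).lower(), match.group(1) is not None


samples = [
    "||ads.example.com^",
    "@@||cdn.example.com^|",
    ".sendcloud.net/track/",
    ".substack.com/o/$image",
]
parsed = [parse_domain_rule(s) for s in samples]
```

With this shape of check, the URL patterns reported above never reach the domain list because they fail the anchored match outright.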
-
@totowentsouth said in https://oisd.nl:
Inside the next if block, leading and trailing periods are pruned from the line:
    // Remove leading/trailing dots
    $line = trim(trim($line), '.');
That still shouldn't let those entries be parsed. They have forward slashes in them, for example, so the fact it still manages to parse them is quite weird. Those should be skipped :/
I'll take a look later this weekend.
EDIT: yup, never mind. Found it:
    // If '/' character found, remove characters after '/'
    if (strpos($line, '/') !== FALSE) {
        $line = strstr($line, '/', TRUE);
    }
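To see why those two steps together let the URL patterns through, here is a small Python rendering of them applied to the EasyPrivacy entries reported earlier in the thread. It is only an illustration. The function name `old_normalize` is made up, and the order in which the package applies the two steps is an assumption (for these inputs either order gives the same result).

```python
def old_normalize(line):
    """Python rendering of the two PHP steps quoted in this thread."""
    # strstr($line, '/', TRUE): keep only what precedes the first '/'
    if "/" in line:
        line = line.split("/", 1)[0]
    # trim(trim($line), '.'): strip whitespace, then leading/trailing dots
    return line.strip().strip(".")


results = {entry: old_normalize(entry) for entry in (
    ".sendcloud.net/track/",
    ".starbucks.com/a/",
    ".substack.com/o/$image",
    ".substackcdn.com/open?$image",
)}
# Every URL pattern collapses to a bare, perfectly valid-looking domain:
# sendcloud.net, starbucks.com, substack.com, substackcdn.com
```

Once the path and the leading dot are gone, nothing distinguishes the result from a legitimate hosts-style entry, which matches the grep output above.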
-
@andrebrait That makes sense. Thank you for the explanation.
I noticed too that the EasyPrivacy list and a few other "easylist" styled lists begin with entries that pfblockerng considers "typical host feed format", i.e., $easylist remains set to its initial value of FALSE. After an entry following the "easylist" style is read, $easylist is set to TRUE. For the remaining lines in the file, execution of the block beginning with if (!$easylist) { is then skipped.
Is the intent to process the list in "normal" mode until discovery of an "easylist" style entry?
Is the intent to support lists containing a mixture of styles? If the entries in EasyPrivacy are shuffled such that a raw domain entry comes after an "easylist" style entry, might that throw a wrench in processing since, as is currently the case, $easylist is TRUE after the first "easylist" style entry is found?
-
@totowentsouth well, I'd say that it's unusual for lists to contain both things, so I assume that's why the code works the way it does, but I think it'd be safe to do it on a per-line basis because the EasyList syntax for a domain name is always going to start with || or @@|| and end with ^ or ^|.
So if a line matches that, we parse it as EasyList. Otherwise, we don't.
I guess this would likely be safer and likely more correct. And either way, it should ignore those entries, especially given they have a /.
I think the original intent there was to trim // comments at the end, or some lists which contained example.com/ for some reason. Either way, there are better ways to do that. I'm gonna check it out and fix it.
Could you provide a link to the list files? Or do you mean the EasyPrivacy URL that is in the pfBlockerNG feeds tab?
-
@andrebrait Yes, the EasyPrivacy URL https://easylist.to/easylist/easyprivacy.txt in the pfBlockerNG feeds tab is the same. I create groups and provide the URLs in lieu of using the feeds tab.
-
@totowentsouth I split the check to determine whether it's an EasyList and the parsing. Now there's a first pass through the file for checking for the EasyList headers and entries before moving on to the actual parsing (which I also refined).
I checked and the offending entries are not ending up in the file anymore. Let me know if you can reproduce the fix.
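For anyone curious what a detect-then-parse split can look like, here is a rough two-pass sketch. It is not the code in the branch. The function names, the header check for `[Adblock Plus`, and the simplified hosts-style fallback are all assumptions made for illustration.

```python
def looks_like_easylist(lines):
    """First pass: does the file carry an Adblock Plus header or any ||domain^ rule?"""
    for line in lines:
        text = line.strip()
        if text.startswith("[Adblock Plus"):
            return True
        if text.startswith(("||", "@@||")) and text.endswith(("^", "^|")):
            return True
    return False


def parse_feed(lines):
    """Second pass: pick the parser once, based on the whole file."""
    easylist = looks_like_easylist(lines)
    domains = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith(("!", "#", "[")):
            continue  # comments and headers
        if easylist:
            if text.startswith("||") and text.endswith(("^", "^|")):
                domains.append(text[2:].rstrip("|").rstrip("^").lower())
            # Everything else (URL patterns, exceptions, options) is skipped.
        else:
            # Hosts-style: "0.0.0.0 host" or a bare host name.
            domains.append(text.split()[-1].lower())
    return domains


abp_feed = [
    "[Adblock Plus 2.0]",
    "! Title: demo",
    ".sendcloud.net/track/",
    "||tracker.example.com^",
    "@@||ok.example.com^|",
    "||ads.example.net^|",
]
hosts_feed = ["# comment", "0.0.0.0 ads.example.org", "tracker.example.org"]
abp_domains = parse_feed(abp_feed)
hosts_domains = parse_feed(hosts_feed)
```

Deciding the mode up front means the position of the first EasyList-style line in the file no longer changes how the earlier lines are treated.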
-
@andrebrait I updated my patch to include 4da5a631ae8d82a109fa7880429eff63c4cfa46f and all is well when using the EasyPrivacy list. Thanks!
-
@totowentsouth I gave it some polishing, cleaned up the commit history and produced the pfblockerng-adblock-clean branch (now on 7c3a4eaef2c714c9d97466ec2430e7e867cfd414).
Could you give it a last go so I have someone else test it?