https://oisd.nl
-
@totowentsouth thanks for looking into it.
As far as I can tell, it's an actual bug. It should not parse entries that start with a period. Those entries in EasyList lingo mean other things, not domain names.
The intended behavior is to only parse entries that start with || or @@|| (those are exclusions) and end with ^ (or ^| as sometimes that happens).
The entries you listed should have been skipped and ignored. Those are URL patterns, and DNSBL shouldn't use them.
I'll look into it as soon as possible. Thanks for the detailed information and the code snippets :)
-
@totowentsouth said in https://oisd.nl:
Inside the next if block, leading and trailing periods are pruned from the line:
// Remove leading/trailing dots $line = trim(trim($line), '.');
That still shouldn't let those entries be parsed. They have forward slashes in them, for example, so the fact it still manages to parse them is quite weird. Those should be skipped :/
I'll take a look later this weekend.
EDIT: yup, never mind. Found it:
// If '/' character found, remove characters after '/' if (strpos($line, '/') !== FALSE) { $line = strstr($line, '/', TRUE); }
-
@andrebrait That makes sense. Thank you for the explaination.
I noticed too that the EasyPrivacy list and a few other "easylist" styled lists begin with entries that pfblockerng considers "typical host feed format", i.e.,
$easylist
remains set to its initial value ofFALSE
. After an entry following the "easylist" style is read,$easylist
is set toTRUE
. For the remaining lines in the file, execution of the block beginning withif (!$easylist) {
is then skipped.Is the intent to process the list in "normal" mode until discovery of an "easylist" style entry?
Is the intent to support lists containing a mixture of styles? If the entries in EasyPrivacy are shuffled such that a raw domain entry is after an "easylist" style, then might that throw a wrench in processing since as is currently the case$easylist
isTRUE
after the first "easylist" style entry is found? -
@totowentsouth well, I'd say that it's unusual that lists contain both things, so I assume that's why the code works the way it does, but I think it'd be safe to do it on a per-line basis because EasyList syntax for a domain name are always going to start with || or @@|| and end with ^ or ^|.
So if a line matches that, we parse it as EasyList. Otherwise, we don't.
I guess this would likely be safer and likely more correct. And either way, it should ignore those entries, especially given they have a /.
I think the original intent there was to trim // comments at the end, or some lists which contained example.com/ for some reason. Either way, there are better ways to do that. I'm gonna check it out and fix it.
Could you provide a link to the lists files? Or do you mean the EasyPrivacy URL that is the pfBlockerNG feeds tab?
-
@andrebrait Yes, the EasyPrivacy URL https://easylist.to/easylist/easyprivacy.txt in the pfBlockerNG feeds tab is the same. I create groups and provide the URLs in lieu of using the feeds tab.
-
@totowentsouth I split the check to determine whether it's an EasyList and the parsing. Now there's a first pass through the file for checking for the EasyList headers and entries before moving on to the actual parsing (which I also refined).
I checked and the offending entries are not ending up in the file anymore. Let me know if you can reproduce the fix.
-
@andrebrait I updated my patch to include 4da5a631ae8d82a109fa7880429eff63c4cfa46f and all is well when using the EasyPrivacy list. Thanks!
-
@totowentsouth I gave it some polishing, cleaned up the commit history and produced the pfblockerng-adblock-clean branch (now on 7c3a4eaef2c714c9d97466ec2430e7e867cfd414) .
Could you give it a last go so I have someone else test it? -
@andrebrait I updated a pfSense box to 7c3a4eaef2c714c9d97466ec2430e7e867cfd414.
I think the extraction of IP addresses in DNSBL is no longer extracting and storing those IPs...This particular pfSense install was using pfblockerng-next -- i.e. before pfblockerng-adblock. FWIW, I uninstalled pfblockerng and removed orphaned files. Then I installed pfblockerng-devel and applied a patch to install 7c3a4. I have yet to try pfblockerng-adblock.In particular, DNSBLIP_v4.txt is absent and original/DNSBL_v4.orig has only one entry 127.1.7.7.
Here is an example of a list that includes domains and IPv4:
https://malware-filter.gitlab.io/malware-filter/phishing-filter.txt
I will do more testing and verification in the next day or so.Edit & Update: https://malware-filter.gitlab.io/malware-filter/phishing-filter-agh.txt is their adblock style. After switching to this list, the IPs are extracted. All is well now.
-
This post is deleted! -
@andrebrait I began a solution for automated test coverage of pfBlockerNG's DNSBL and IP list consolidation. The setup is a little involved and undocumented. I'll flush some documentation for it over the next few days. It is on github at babilon/pfblockerng-tests. I'm now able to trivially run a suite of tests against changes to pfBlockerNG.
-
@andrebrait Functionally, everything appears well. I noticed these duplicate calls to shell functions:
diff --git a/net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc b/net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc index df3dc385c5f2..03e9990d64cd 100644 --- a/net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc +++ b/net/pfSense-pkg-pfBlockerNG-devel/files/usr/local/pkg/pfblockerng/pfblockerng.inc @@ -9119,8 +9119,6 @@ function sync_package_pfblockerng($cron='') { // Consolidate all exclusions exec("{$pfb['script']} dnsbl_py_assemble_exclusions_file unused unused unused {$elog}"); - exec("{$pfb['script']} dnsbl_py_assemble_redundants_file unused unused unused {$elog}"); - // Process Whitelists foreach ($postprocess_dnsbl as $header_esc) { @@ -9139,8 +9137,6 @@ function sync_package_pfblockerng($cron='') { exec("{$pfb['script']} dnsbl_py_remove_redundant {$header_esc} unused unused {$elog}"); } - exec("{$pfb['script']} dnsbl_py_cleanup_exclusions_file unused unused unused {$elog}"); - exec("{$pfb['script']} dnsbl_py_cleanup_redundants_file unused unused unused {$elog}"); } --
-
@totowentsouth the function names are slightly different. One set assembles/removes the master exclusions file and the other assembles/removed the master "might make other entries redundant" file.
Because EasyLists can also contain exclusions, in order to minimize the processed lists as much as possible, I've added a post-processing step to process all files and remove block entries that would be nullified by exclusions, as well as a step to remove redundant entries (e.g. mail.google.com becomes redundant if a wildcard rule for google.com exists).
The old logic already did that a bit, but in a different manner.
Or am I missing what you're referring to?
-
@andrebrait my bad on the duplication claim. I shoulda tried <shift># and I'd have seen the difference.
All is well. I retract my previous claims of issues. Sorry for any inconviences.
I've applied the latest to all my pfSense boxes BTW.