Update to 3.2.0_7 breaks DNSBL
-
@py said in Update to 3.2.0_7 breaks DNSBL:
PFB_FILTER - 6 | pfb_daemon_dnsbl_index [ 12/6/23 03:16:41 ] Failed validation [ - ]
^^ in the error.log and dnsbl_parsed_error (screen image) time stamps don't match.
in the pfblockerng.log find this section in the latest processing
===[ DNSBL Domain/IP Counts ]what is the number for total ?
and what do the numbers under this section look like
pfSense Table Statsand then at very end what is the timestamp here ?
UPDATE PROCESS ENDED [ 12/6/23 10:30:53 ]Can you share your complete DNS resolver settings, General and Advance.
also the from the DNSBL config page. (just the sections DNSBL, DNSBL Webserver Config, and DNSBL Configuration)Thanks
-
@jrey Those timestamps were not intended to match.
===[ DNSBL Domain/IP Counts ] ===================================
3632259 total
pfSense Table Stats
table-entries hard limit 400000
Table Usage Count 36389UPDATE PROCESS ENDED [ 12/6/23 04:34:48 ]
DNS Resolver settings:
DNSBL Settings:
Thanks!
-
@jrey said in Update to 3.2.0_7 breaks DNSBL:
on the top list, the 10.10.10.1 is not even a DNS server -- that is the DNSBL web page. Don't select that in your list. (I've actually created a patch on my system to just remove that from the list for testing, so in cases where "ALL" is selected it can't possibly be used. )
I don't suppose you'd care to share that patch? I have Unbound set to ALL so it's grabbing the 10.10.10.1 entry as well.
- draco
-
Of course I could.
but a counter question to this statement
I have Unbound set to ALL
might be:
Why? -
if you want to before starting, you can clear pfblockerng.log and error.log on the
firewall > pfBlockerNG > Logs (select the log to display and there is a a trash can on the right) will just make it cleaning to see new entriesFirewall > pfBlockerNG > DNSBL
Change DNSBL Mode to "Unbound Python Mode"
for Options that appear after doing thatSelect (Check Enable) DNS Reply Logging, DNSBL Blocking, HSTS Mode
Save DNSBL SettingsThen DNS Resolver config
Enable Python Mode (if not already check)
Module order is Pre Validator
Python Module Script is pfb_unbound
SAVE (button at bottom)
(then scroll back up top page) Apply ChangesThen Firewall > pfBlockerNG -> Update
Force -> Reload -> DNSBLProcess start time ?
Process End time?
new errors in either of the above mentioned log files ?test and report the changes you see.
-
@jrey Applied and saved changes to DNSBL as specified. However, in DNS Resolver, the python module script is not available until after a force reload of DNSBL. So I did the force reload and checked DNS Resolver and those settings are as specified.
Force Reload start time: 9:16 am.
end time: 9:25
(MUCH faster than before!)No errors in error.log or pfblockerng.log.
There are new errors in dnsbl_parsed_error.log, examples below:- 12/7/23 09:21:32,Turkey_High_Risk,denizbank-cepsubem,denizbank-cepsubem
12/7/23 09:21:33,Turkey_High_Risk,gettask,gettask
12/7/23 09:21:33,Turkey_High_Risk,gettasks,gettasks
12/7/23 09:21:33,Turkey_High_Risk,gettasksize,gettasksize
12/7/23 09:21:33,Turkey_High_Risk,getupdates,getupdates
12/7/23 09:21:33,Turkey_High_Risk,godpore2,godpore2
12/7/23 09:21:33,Turkey_High_Risk,http,http
12/7/23 09:21:34,Turkey_High_Risk,instances,instances
12/7/23 09:21:34,Turkey_High_Risk,jqdownload,jqdownload
12/7/23 09:21:34,Turkey_High_Risk,merkezbankasi-com,merkezbankasi-com
12/7/23 09:21:35,Turkey_High_Risk,mobile-activity-site,mobile-activity-site
12/7/23 09:21:35,Turkey_High_Risk,p27dokhp,p27dokhp
12/7/23 09:21:35,Turkey_High_Risk,p27dokhpz2n7n,p27dokhpz2n7n
12/7/23 09:21:35,Turkey_High_Risk,pm,pm*
dns_reply.log is now continuously logging new entries as expected.
DNSBL Alerts GUI and dnsbl.log now in sync.
Hopefully this fixes it.
- 12/7/23 09:21:32,Turkey_High_Risk,denizbank-cepsubem,denizbank-cepsubem
-
@py said in Update to 3.2.0_7 breaks DNSBL:
the python module script is not available until after a force reload of DNSBL
Okay - was't sure, I've flipped back and forth on the test box, I couldn't remember when they appears.
new errors in dnsbl_parsed_error.log
I don't specifically use that list. However all those names are being rejected from the list because they are not really complete. So for example
godpore.0pe.kr godpore2 godpore2.0pe.ke godpore2.0pe.kr
the second entry in that block is rejected.
Others are the same
gettask gettasks gettasksize
so just names, not domain names and therefore rejected by the list parser.
if I added "bob" to the list it would be rejected. but "bob.com" would not.Nothing wrong here. But there is a lot of bloat in that particular list.
Should be good.
-
@jrey Yes, that made a big difference, thank you.
However, I rebooted to test that everything would work and unbound seems to freeze. Every line in dns_reply.log had "servfail" in it, and the DNS Resolver status page seemed frozen in that the only thing changing was TTL and all the other columns had the same values. And the DNS Resolver system log was not showing anything happening. And my network had no internet access.
Restarting unbound fixes it. Rebooted again to confirm, and again restarting unbound fixes it.
I had this problem before and it might have been one of the reasons I stopped using the python module. At that time I installed Cron to put in a script to restart unbound at one minute after a reboot, which worked as I recall. That was at least 2-3 pfsense versions ago.
If you think that's what I should do here I will make it so.
I very much appreciate all the time and effort here, it's been a great help.
-
I would not switch back. I would take a look for messages related to unbound restarting and find out why.
start with System Logs -> System -> DNS Resolver
(filter on process unbound)edit:
the servfail is most likely telling you that there is no upstream server responding to requests for names that are either not available or not cached.. but could also be several other thingsneed to narrow it down.
Client's talk to unbound (DNSBL) - then who is unbound talking to thru 03_PP_VPN_WAN if it can't satisfy the request? That's the outbound interface you have selected.
-
@jrey Filtering for the process 'unbound' just gives me what it's already showing. I've tried
'error'
'fail' and
'stop'
with no entries showing for those. What else can I filter for? -
This post is deleted! -
So there is also a stop right about the middle of that sequence.
Are you sure that you are not just seeing the "static" IP's being added and then it restarts.
You could test this by turning off the Static DHCP "Register DHCP static mappings in the DNS Resolver" That would/should only happen at boot for the static ones.my Resolver hasn't restarted since Nov 30, so restarts are normally rare.
is the restart (at the top) you ? or the system restarting it when then process is done. That said how many static addresses are being added?
-
@jrey I'll have to test that later tonight, but I note the dhcp service starting is the same time as the unbound service issuing the "control cmd: reload", then stopping and restarting:
(Dec 7 14:46:27 dhcpd 22906 Server starting service.)EDIT: The DNS log will remain static and as shown until I restart the service.
Those logs are pristine immediately after a reboot, so nothing in there is done by me.
I have 9 static addresses.
-
@jrey I deleted my post with the previous logs so I'm putting the sanitized versions here:
py_error.txt
DNS Resolver log.txt
DHCP log.txtAnd the latest DHCP log here:
DHCP log 2.txt
The only thing changed for this log was to disable:
"Register DHCP static mappings in the DNS Resolver".It is also very rare for my resolver to restart.
Thanks,
-
@py said in Update to 3.2.0_7 breaks DNSBL:
py_error.txt
My theory : your py_error.txt file is clear enough.
While updating, pfBlockerng creates the two master DNSBL files from all your selected DNSBL feeds.
One of the DNSVL source files you've chosen to uses has a fckd up format, and worse, pfBlockerng didn't detect it (wonder who it was).
This will choke the entire internal DNS handling for unbound, as the two master files pfb_py_data.txt and pfb_py_zone.txt have a syntax error.Easy to repair : remove all your DNSBL, do a full reload, and check if the error are gone.
You probably have to purge the py_error.txt file yourself.Note : this log file should always stay empty. If there are lines in this fie, consider your system unusable and react right away (by undoing your last step, or de activating DNSBL etc).
@py said in Update to 3.2.0_7 breaks DNSBL:
"Register DHCP static mappings in the DNS Resolver"
Will, just before unbound start, create a file /var/unbound/host_entries.conf, and this file get uses by unbound also.
You can keep "Register DHCP static mappings in the DNS Resolver" at any time.
Small reserve : entering invalid stuff as a static host name, or something like that, can have effects also. -
This post is deleted! -
is there a specific reason your upstream DNS has to be over the VPN? For a test you could add 01_WAN on the Outgoing Network Interfaces Resolver Setting and see if it changes way it loads / works. (so select both 01_WAN and 03_PP_VPN_WAN)
It doesn't apply to my case because of the upstream DNS servers being used. I seem to recall from somewhere that once the VPN WAN is up traffic will use that based on the priority of the interface. I'll see if I can find that reference for you @Gertjan might know. I seem to recall it may have been in a netgate OpenVPN video.
Still a good test.
-
As to the question of whether unbound was starting before the VPN was up, I just looked at the logs (duh) and it's clear the VPN is up and initialized at least several seconds before unbound. Still, I wonder if maybe unbound is getting a temporary bogus route from somewhere?
As to DNS going over the WAN, that would be DNS leak and would defeat the purpose of the VPN. I understand that may also create significant vulnerabilities in the VPN. I also understand most if not all VPN protocols and providers go to some length to lock down and prevent DNS leaks before and after the VPN is initialized. I wouldn't want to do anything to defeat that.
@Gertjan I deleted my first reply which I'm restating here and adding addition troubleshooting information:
Thank you, but the py_error.log usually doesn't log an error. I downloaded it yesterday so I could post it here, and I deleted it from DNSBL after. It hasn't logged an error since then, and I keep close eye on it.
Considering your advice and as a precaution, I spent a lot of time this morning and did go through all of my feeds and actually looked at what was being downloaded, and yep, some looked borked. Those were deleted and so were any that were duplicates of other feeds. I then cleared all downloaded lists by following the instructions in pfBlockerNG > General, and then did a reload.
py_error.log still has no entries.
Unbound still becomes unresponsive after a reboot.
A while back I had a realtech NIC on my network which would give out 2 MAC addresses and obtain 2 IPs, one of which was for network "management". I assigned the bogus MAC a static mapping to an IP outside of the IP range so it couldn't communicate, and also created a firewall blocking rule for it. Since that NIC is gone, today based on your suggestion I deleted the static mapping and firewall rule for it. I don't have any other screwy static mappings.
Unbound still becomes unresponsive after a reboot.
Next in DNS Resolver > Advanced I selected Log Level 4, then in the DNS Resolver system log I increased the GUI log entries to 2000 and the Rotation Size to 1024000, then rebooted and immediately saved the log, posted below:
I'm not an expert, but the lines from this log i think may be of interest are (in reverse order):
Dec 8 10:37:28 unbound 32597 [32597:19] debug: return error response SERVFAIL
Dec 8 10:37:28 unbound 32597 [32597:1] debug: Failed to get a delegation, giving up
Dec 8 10:37:28 unbound 32597 [32597:19] debug: opened UDP if=0 port=20158
Dec 8 10:37:28 unbound 32597 [32597:19] notice: send failed: Can't assign requested address
Dec 8 10:37:28 unbound 32597 [32597:19] debug: opened UDP if=0 port=32918
Dec 8 10:37:28 unbound 32597 [32597:19] notice: send failed: Can't assign requested address
Dec 8 10:37:28 unbound 32597 [32597:19] debug: query response was timeout
(DISREGARD, as stated above, the logs make it clear the VPN is initialized before unbound.): This makes me suspect that unbound is trying to connect and timing out before the VPN is up. If that's the case, is there some way to check that?
If it is timing out before the VPN is up, is there a way to delay unbound, or not have it start until the VPN is up?I also did as you suggested and deleted all the DNSBL feeds and reloaded, then rebooted, no change.
I then deleted all the IPv4 feeds, reloaded, rebooted, no change.
I also captured the last DNS Resolver log but it looks about the same as the one above.
SOOOO after all this, I think we've been at this for a week now and I need to move on. I'm again considering wiping all the DNSBL settings and reinstalling, especially now that I've already deleted all the feeds. I don't want to redo those more than once.
Thank You,
-
@py said in Update to 3.2.0_7 breaks DNSBL:
unbound is getting a temporary bogus route from somewhere?
not likely, but if it is -- it's your VPN
so the latest log you attached accounts for 1 second of traffic. it contains 74 "notice" records. You might want to look at those.
It might not be a big enough sample either.Can't tell you anything about your VPN provider and what they do or do not do.
in the log
Your going to see a lot of records mention ?.ROOT-SERVERS.NET. where the ? is a letter. Those are root DNS Servers - then in near proximity to those you are going to see. entries likeOne example:
notice: remote address is ip4 199.7.83.42 port 53 (len 16)
notice: send failed: Can't assign requested addressthat address is registered to ICANN-ARIN. you can look up who that is.
you're going to see lines like
debug: dnssec status: not expectedreinstalling and using the same settings is a path to the same results.
You could try 01-WAN as a TEST and see if the results change. No one is suggesting that you leave it that way.
But if it works, then your VPN DNS traffic is likely the problem and you focus on fixing that.You're never going to get a response when
debug: query response was timeoutit is unlikely that the resolver is hung it just can't get an answer from anything upstream.
I said above:
Client's talk to unbound (DNSBL) - then who is unbound talking to thru 03_PP_VPN_WAN if it can't satisfy the request?
you have now said:
and I need to move on
your choice of course.
-
@jrey @Gertjan
I wanted to provide update to these issues so you know what happened and in case it might help someone else. I left this thread partly because I didn't want to take up anyone else's time on what was becoming an increasingly complex and consuming problem.It turns out this was 2 main problems:
- After encountering what seemed like a cascading number of failures and problems, including a boot loop, I was able to verify bad hardware. Consequently the MB, CPU and RAM were replaced with new. The SSD and NIC PCI card did not seem affected.
- There was indeed a misconfiguration with my account at the VPN provider. Specifically, any filtering at the VPN level was not compatible with DNSSEC. This included any filtering, including of malware, tracking or social sites. This filtering was turned off.
Some notes:
Before wiping out the SSD and in the old installation of pfSense, I did not use any backup files to transfer settings because it seemed likely to transfer misconfigurations as well. Instead I took screens shots of every page I may have modified from a default installation and used those to transfer most of my setup to the new default install of pfsense on the new hardware. Almost every setting that was to be changed from default was questioned and re-researched before doing so. In this process I came across VPN setup instructions from a competing provider that had more specific and complete instructions which included the warning that VPN filtering was not compatible with DNSSEC. Also, I was able to identify and eliminate a few crazy settings made in an apparently futile attempt to address the problems.Yeah, WAY more than a couple of hours, LOL. But the bad hardware was a far bigger problem than I expected.
Everything now works as expected and DNS resolution, and everything else, is faster than it's ever been. In pfBlockerNG, error.log and py_error.log remain empty after a week or two of continuous use.
I want to thank @jrey and @Gertjan for your help which I have tried to heed. I do have fewer pfBlockerNG feeds thanks to @Gertjan .
And thanks and gratitude to @BBcan177 for creating such an awesome package.
Best Regards,