Update to 3.2.0_7 breaks DNSBL
-
@Gertjan said in Update to 3.2.0_7 breaks DNSBL:
the message from lighttpd while (re) starting is not a problem. It's just a warning.
the cipher message in the file, is exactly what I said, not fatal don't worry about it. That wasn't the message causing other folks to have issues. The "other" messages if they are there "are not logged" so you can view them in a log file, and that's a problem. They do cause problems. By checking, the OP, has simply ruled those potential issues out.
I agree, with the "Don't." - in regards to uninstalling/reinstalling.
I disagree somewhat with this: "If there is an issue, its not the code or the scripts."
There are literally 100s of issues, and unless there is an investigation, nothing will ever change.For example, the one line of code, that changed in pfBlockerNG _7 is being perceived by several as the cause of their "new" problem. Somewhat true, but not because of that line of code, as the scope of when it is executed is very limited.
There is an issue, because of something else that gets exposed when the package is updated. (but the pfb code has changed). Specifically on a system, where the underlying system versions have changed. I can recreate this "issue" all day long (in a Break/fix test cycle). Does it have anything to do specifically with the change in pfBlockerNG _7, yes, but also no.
Yes, but only because the update was applied to an underlying system that has changed. No because the code in pfBlockerNG that is causing these issues, hasn't changed, but is being impacted by what has changed under the hood.Does it impact logging? Very possible, however there is not enough data to make that determination conclusively yet. When "broken", logging stops /delays as described. But there are also "other" clues when it is "broken", the OP in this case doesn't seem to have any of those "other" clues. The lack of other clues, of course may or may not be a result of settings.
-
@jrey @Gertjan Okay, so I'm not reinstalling pfBlockerNG.
Thanks for telling me about the dns_reply.log because mine is empty and stays that way.
Hardware is an Intel Xeon E5-4667v4 with 18 cores on an ASRock X99 Taichi, and the NICs are Intel PRO/1000 on a PCI card.
error.log logged 4 of these since I cleared the log yesterday:
PFB_FILTER - 6 | pfb_daemon_dnsbl_index [ 12/6/23 03:16:41 ] Failed validation [ - ]dnsbl.log and unified.log frequently log a block that is not reflected in the GUI "Alerts" tab, and they never log a block from one of my workstations, even though the ping itself is blocked.
Updates are set at 04:15, so the entries from the dnsbl_parsed_error.log must be happening during the update:
I don't see any errors in the VPN log and it seems normal to me.
The DNS Resolver log frequently shows THROWAWAY responses which I don't recall seeing before this incident:
I tried python mode for DNSBL about a year ago but ran into trouble with that and my research showed those problems were inherent in the python implementation so I stopped using it since. I've long since forgotten the specifics.
-
@py said in Update to 3.2.0_7 breaks DNSBL:
PFB_FILTER - 6 | pfb_daemon_dnsbl_index [ 12/6/23 03:16:41 ] Failed validation [ - ]
^^ in the error.log and dnsbl_parsed_error (screen image) time stamps don't match.
in the pfblockerng.log find this section in the latest processing
===[ DNSBL Domain/IP Counts ]what is the number for total ?
and what do the numbers under this section look like
pfSense Table Statsand then at very end what is the timestamp here ?
UPDATE PROCESS ENDED [ 12/6/23 10:30:53 ]Can you share your complete DNS resolver settings, General and Advance.
also the from the DNSBL config page. (just the sections DNSBL, DNSBL Webserver Config, and DNSBL Configuration)Thanks
-
@jrey Those timestamps were not intended to match.
===[ DNSBL Domain/IP Counts ] ===================================
3632259 total
pfSense Table Stats
table-entries hard limit 400000
Table Usage Count 36389UPDATE PROCESS ENDED [ 12/6/23 04:34:48 ]
DNS Resolver settings:
DNSBL Settings:
Thanks!
-
@jrey said in Update to 3.2.0_7 breaks DNSBL:
on the top list, the 10.10.10.1 is not even a DNS server -- that is the DNSBL web page. Don't select that in your list. (I've actually created a patch on my system to just remove that from the list for testing, so in cases where "ALL" is selected it can't possibly be used. )
I don't suppose you'd care to share that patch? I have Unbound set to ALL so it's grabbing the 10.10.10.1 entry as well.
- draco
-
Of course I could.
but a counter question to this statement
I have Unbound set to ALL
might be:
Why? -
if you want to before starting, you can clear pfblockerng.log and error.log on the
firewall > pfBlockerNG > Logs (select the log to display and there is a a trash can on the right) will just make it cleaning to see new entriesFirewall > pfBlockerNG > DNSBL
Change DNSBL Mode to "Unbound Python Mode"
for Options that appear after doing thatSelect (Check Enable) DNS Reply Logging, DNSBL Blocking, HSTS Mode
Save DNSBL SettingsThen DNS Resolver config
Enable Python Mode (if not already check)
Module order is Pre Validator
Python Module Script is pfb_unbound
SAVE (button at bottom)
(then scroll back up top page) Apply ChangesThen Firewall > pfBlockerNG -> Update
Force -> Reload -> DNSBLProcess start time ?
Process End time?
new errors in either of the above mentioned log files ?test and report the changes you see.
-
@jrey Applied and saved changes to DNSBL as specified. However, in DNS Resolver, the python module script is not available until after a force reload of DNSBL. So I did the force reload and checked DNS Resolver and those settings are as specified.
Force Reload start time: 9:16 am.
end time: 9:25
(MUCH faster than before!)No errors in error.log or pfblockerng.log.
There are new errors in dnsbl_parsed_error.log, examples below:- 12/7/23 09:21:32,Turkey_High_Risk,denizbank-cepsubem,denizbank-cepsubem
12/7/23 09:21:33,Turkey_High_Risk,gettask,gettask
12/7/23 09:21:33,Turkey_High_Risk,gettasks,gettasks
12/7/23 09:21:33,Turkey_High_Risk,gettasksize,gettasksize
12/7/23 09:21:33,Turkey_High_Risk,getupdates,getupdates
12/7/23 09:21:33,Turkey_High_Risk,godpore2,godpore2
12/7/23 09:21:33,Turkey_High_Risk,http,http
12/7/23 09:21:34,Turkey_High_Risk,instances,instances
12/7/23 09:21:34,Turkey_High_Risk,jqdownload,jqdownload
12/7/23 09:21:34,Turkey_High_Risk,merkezbankasi-com,merkezbankasi-com
12/7/23 09:21:35,Turkey_High_Risk,mobile-activity-site,mobile-activity-site
12/7/23 09:21:35,Turkey_High_Risk,p27dokhp,p27dokhp
12/7/23 09:21:35,Turkey_High_Risk,p27dokhpz2n7n,p27dokhpz2n7n
12/7/23 09:21:35,Turkey_High_Risk,pm,pm*
dns_reply.log is now continuously logging new entries as expected.
DNSBL Alerts GUI and dnsbl.log now in sync.
Hopefully this fixes it.
- 12/7/23 09:21:32,Turkey_High_Risk,denizbank-cepsubem,denizbank-cepsubem
-
@py said in Update to 3.2.0_7 breaks DNSBL:
the python module script is not available until after a force reload of DNSBL
Okay - was't sure, I've flipped back and forth on the test box, I couldn't remember when they appears.
new errors in dnsbl_parsed_error.log
I don't specifically use that list. However all those names are being rejected from the list because they are not really complete. So for example
godpore.0pe.kr godpore2 godpore2.0pe.ke godpore2.0pe.kr
the second entry in that block is rejected.
Others are the same
gettask gettasks gettasksize
so just names, not domain names and therefore rejected by the list parser.
if I added "bob" to the list it would be rejected. but "bob.com" would not.Nothing wrong here. But there is a lot of bloat in that particular list.
Should be good.
-
@jrey Yes, that made a big difference, thank you.
However, I rebooted to test that everything would work and unbound seems to freeze. Every line in dns_reply.log had "servfail" in it, and the DNS Resolver status page seemed frozen in that the only thing changing was TTL and all the other columns had the same values. And the DNS Resolver system log was not showing anything happening. And my network had no internet access.
Restarting unbound fixes it. Rebooted again to confirm, and again restarting unbound fixes it.
I had this problem before and it might have been one of the reasons I stopped using the python module. At that time I installed Cron to put in a script to restart unbound at one minute after a reboot, which worked as I recall. That was at least 2-3 pfsense versions ago.
If you think that's what I should do here I will make it so.
I very much appreciate all the time and effort here, it's been a great help.
-
I would not switch back. I would take a look for messages related to unbound restarting and find out why.
start with System Logs -> System -> DNS Resolver
(filter on process unbound)edit:
the servfail is most likely telling you that there is no upstream server responding to requests for names that are either not available or not cached.. but could also be several other thingsneed to narrow it down.
Client's talk to unbound (DNSBL) - then who is unbound talking to thru 03_PP_VPN_WAN if it can't satisfy the request? That's the outbound interface you have selected.
-
@jrey Filtering for the process 'unbound' just gives me what it's already showing. I've tried
'error'
'fail' and
'stop'
with no entries showing for those. What else can I filter for? -
This post is deleted! -
So there is also a stop right about the middle of that sequence.
Are you sure that you are not just seeing the "static" IP's being added and then it restarts.
You could test this by turning off the Static DHCP "Register DHCP static mappings in the DNS Resolver" That would/should only happen at boot for the static ones.my Resolver hasn't restarted since Nov 30, so restarts are normally rare.
is the restart (at the top) you ? or the system restarting it when then process is done. That said how many static addresses are being added?
-
@jrey I'll have to test that later tonight, but I note the dhcp service starting is the same time as the unbound service issuing the "control cmd: reload", then stopping and restarting:
(Dec 7 14:46:27 dhcpd 22906 Server starting service.)EDIT: The DNS log will remain static and as shown until I restart the service.
Those logs are pristine immediately after a reboot, so nothing in there is done by me.
I have 9 static addresses.
-
@jrey I deleted my post with the previous logs so I'm putting the sanitized versions here:
py_error.txt
DNS Resolver log.txt
DHCP log.txtAnd the latest DHCP log here:
DHCP log 2.txt
The only thing changed for this log was to disable:
"Register DHCP static mappings in the DNS Resolver".It is also very rare for my resolver to restart.
Thanks,
-
@py said in Update to 3.2.0_7 breaks DNSBL:
py_error.txt
My theory : your py_error.txt file is clear enough.
While updating, pfBlockerng creates the two master DNSBL files from all your selected DNSBL feeds.
One of the DNSVL source files you've chosen to uses has a fckd up format, and worse, pfBlockerng didn't detect it (wonder who it was).
This will choke the entire internal DNS handling for unbound, as the two master files pfb_py_data.txt and pfb_py_zone.txt have a syntax error.Easy to repair : remove all your DNSBL, do a full reload, and check if the error are gone.
You probably have to purge the py_error.txt file yourself.Note : this log file should always stay empty. If there are lines in this fie, consider your system unusable and react right away (by undoing your last step, or de activating DNSBL etc).
@py said in Update to 3.2.0_7 breaks DNSBL:
"Register DHCP static mappings in the DNS Resolver"
Will, just before unbound start, create a file /var/unbound/host_entries.conf, and this file get uses by unbound also.
You can keep "Register DHCP static mappings in the DNS Resolver" at any time.
Small reserve : entering invalid stuff as a static host name, or something like that, can have effects also. -
This post is deleted! -
is there a specific reason your upstream DNS has to be over the VPN? For a test you could add 01_WAN on the Outgoing Network Interfaces Resolver Setting and see if it changes way it loads / works. (so select both 01_WAN and 03_PP_VPN_WAN)
It doesn't apply to my case because of the upstream DNS servers being used. I seem to recall from somewhere that once the VPN WAN is up traffic will use that based on the priority of the interface. I'll see if I can find that reference for you @Gertjan might know. I seem to recall it may have been in a netgate OpenVPN video.
Still a good test.
-
As to the question of whether unbound was starting before the VPN was up, I just looked at the logs (duh) and it's clear the VPN is up and initialized at least several seconds before unbound. Still, I wonder if maybe unbound is getting a temporary bogus route from somewhere?
As to DNS going over the WAN, that would be DNS leak and would defeat the purpose of the VPN. I understand that may also create significant vulnerabilities in the VPN. I also understand most if not all VPN protocols and providers go to some length to lock down and prevent DNS leaks before and after the VPN is initialized. I wouldn't want to do anything to defeat that.
@Gertjan I deleted my first reply which I'm restating here and adding addition troubleshooting information:
Thank you, but the py_error.log usually doesn't log an error. I downloaded it yesterday so I could post it here, and I deleted it from DNSBL after. It hasn't logged an error since then, and I keep close eye on it.
Considering your advice and as a precaution, I spent a lot of time this morning and did go through all of my feeds and actually looked at what was being downloaded, and yep, some looked borked. Those were deleted and so were any that were duplicates of other feeds. I then cleared all downloaded lists by following the instructions in pfBlockerNG > General, and then did a reload.
py_error.log still has no entries.
Unbound still becomes unresponsive after a reboot.
A while back I had a realtech NIC on my network which would give out 2 MAC addresses and obtain 2 IPs, one of which was for network "management". I assigned the bogus MAC a static mapping to an IP outside of the IP range so it couldn't communicate, and also created a firewall blocking rule for it. Since that NIC is gone, today based on your suggestion I deleted the static mapping and firewall rule for it. I don't have any other screwy static mappings.
Unbound still becomes unresponsive after a reboot.
Next in DNS Resolver > Advanced I selected Log Level 4, then in the DNS Resolver system log I increased the GUI log entries to 2000 and the Rotation Size to 1024000, then rebooted and immediately saved the log, posted below:
I'm not an expert, but the lines from this log i think may be of interest are (in reverse order):
Dec 8 10:37:28 unbound 32597 [32597:19] debug: return error response SERVFAIL
Dec 8 10:37:28 unbound 32597 [32597:1] debug: Failed to get a delegation, giving up
Dec 8 10:37:28 unbound 32597 [32597:19] debug: opened UDP if=0 port=20158
Dec 8 10:37:28 unbound 32597 [32597:19] notice: send failed: Can't assign requested address
Dec 8 10:37:28 unbound 32597 [32597:19] debug: opened UDP if=0 port=32918
Dec 8 10:37:28 unbound 32597 [32597:19] notice: send failed: Can't assign requested address
Dec 8 10:37:28 unbound 32597 [32597:19] debug: query response was timeout
(DISREGARD, as stated above, the logs make it clear the VPN is initialized before unbound.): This makes me suspect that unbound is trying to connect and timing out before the VPN is up. If that's the case, is there some way to check that?
If it is timing out before the VPN is up, is there a way to delay unbound, or not have it start until the VPN is up?I also did as you suggested and deleted all the DNSBL feeds and reloaded, then rebooted, no change.
I then deleted all the IPv4 feeds, reloaded, rebooted, no change.
I also captured the last DNS Resolver log but it looks about the same as the one above.
SOOOO after all this, I think we've been at this for a week now and I need to move on. I'm again considering wiping all the DNSBL settings and reinstalling, especially now that I've already deleted all the feeds. I don't want to redo those more than once.
Thank You,