Null blocking SERVFAIL
-
With increased log level and servfail logging, the unbound logs show the following:
debug: mesh_run: python module exit state is module_error
error: SERVFAIL <www.googletagmanager.com. A IN>: misc failure
-
@fenichelar said in Null blocking SERVFAIL:
Restarting unbound
You've set unbound logging to the max, so chances are your history is now gone.
But when you run:
grep 'start' /var/log/resolver.log
what do you see?
Does unbound restart often?
While it restarts, it can't handle DNS requests. That would explain the SERVFAIL.
-
@Gertjan No, unbound isn't restarting often. The issue is intermittently impacting only certain domains. Most domains resolve without issue.
Unbound is returning SERVFAIL because the python module is erroring, but it isn't clear why. I am going to try to add some logging to the pfblocker python script, specifically before each of the module_error exit points, to see if I can work backwards towards the cause of the issue.
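A minimal sketch of what that logging could look like, assuming a small helper is called at each exit point instead of setting the error state directly. The helper name `fail_with_log` and its `where`/`detail` arguments are hypothetical and not part of pfb_unbound.py. `MODULE_ERROR`, `log_info` and the query state object are replaced by stand-ins here so the sketch runs outside Unbound; inside the real module those names come from Unbound's Python module environment.

```python
# Stand-ins so this sketch runs outside Unbound (assumptions, not the real objects).
MODULE_ERROR = "module_error"


def log_info(message):
    print("info: " + message)


class FakeQstate:
    """Hypothetical minimal replacement for Unbound's query state."""

    def __init__(self):
        self.ext_state = {}


def fail_with_log(qstate, mod_id, where, detail):
    """Log why the module is about to bail out, then set the error state."""
    log_info("[pfb debug] module_error at {}: {}".format(where, detail))
    qstate.ext_state[mod_id] = MODULE_ERROR
    return True


# Example: what one instrumented exit point would log.
qstate = FakeQstate()
fail_with_log(qstate, 0, "set_return_msg", "q_type_str=AAAA b_ip=0.0.0.0")
```

With one distinct `where` label per exit point, the unbound log shows which branch produced the module_error and with which values.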
-
Plan B, for the moment: use Unbound mode?!
What are your pfSense and pfBlockerNG versions? I could find (nearly?) identical Python errors, but they were from 4+ years ago.
What are your main resolver settings? Default?
-
I have experienced the same thing when running pfBlockerNG with IPv6 DNSBL enabled along with null blocking. What I believe is occurring is that IPv4-sourced requests are null blocked properly (0.0.0.0 for A, :: for AAAA), but IPv6-sourced requests are sent to the DNSBL webserver, which then returns SERVFAIL due to a certificate issue. I posted about this in the Reddit pfBlockerNG thread within the past month, but have not received a response.
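For anyone who wants to compare the answers over IPv4 and IPv6 without a browser in the way, here is a minimal sketch using only the Python standard library. It sends one UDP query and reads just the response header, which is enough to tell NOERROR from SERVFAIL. The function names are made up for this example, and the resolver addresses you pass in are whatever your pfSense LAN addresses are.

```python
import socket
import struct

QTYPE = {"A": 1, "AAAA": 28}
RCODE = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype, query_id=0x1234):
    """Build a minimal DNS query: one question, recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QTYPE followed by QCLASS 1 (IN)
    question = labels + b"\x00" + struct.pack("!HH", QTYPE[qtype], 1)
    return header + question


def parse_header(response):
    """Return (rcode name, answer count) from the 12-byte DNS header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return RCODE.get(rcode, str(rcode)), ancount


def query(server, name, qtype, timeout=3.0):
    """Send one UDP query to an IPv4 or IPv6 resolver address."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, 53))
        response, _ = sock.recvfrom(4096)
    return parse_header(response)
```

Calling `query(server, "www.googletagmanager.com", "AAAA")` once with the IPv4 address of the resolver and once with its IPv6 address shows whether the SERVFAIL depends on the transport or on the query type.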
-
@tibere86 said in Null blocking SERVFAIL:
but IPv6 sourced requests are sent to the DNSBL webserver which then returns SERVFAIL
If a LAN device, say a browser, wants to visit www.google.com, it will do a DNS lookup first.
Let's imagine both IPv6 and IPv4 are available for IP traffic: your pfSense, and your ISP, are fully dual stack, so they can route IPv4 and IPv6.
Your browser, or more probably the DNS forwarder in your device, will send the request to the default DNS.
This can be over IPv4 or IPv6.
It will ask for an A and an AAAA record if IPv6 is available. It's normal to see your device asking for an AAAA record even when there is no IPv6 gateway (so the AAAA answer can't be used - don't ask me why ^^).
If "www.google.com" was listed in a DNSBL 'v4', then 10.10.10.1 or 0.0.0.0 will be returned by pfBlockerNG.
If "www.google.com" was listed in a DNSBL 'v6', then ::10.10.10.1 or :: will be returned by pfBlockerNG.
In both cases, the answer will be this local IP (10.10.10.1 or 0.0.0.0, or the IPv6 counterpart), or the resolved A (IPv4) or AAAA (IPv6) record, but not SERVFAIL, as that means there was an error somewhere, for example the resolver (unbound) couldn't resolve the DNS request.
Reasons could be: uplink (WAN) down, or unbound wasn't running ...
@tibere86 said in Null blocking SERVFAIL:
DNSBL webserver
If you fully understood what https is, you wouldn't use the "DNSBL webserver" or 10.10.10.1
This DNSBL webserver was nice to have when everything was http (http can be redirected).
https cannot be redirected.
So: use:
and call it a day.
If this (example) shows up:
the browser was authorized to show insecure pages.
That is, TLS was still used, but the browser was asking for this cert:
or it got back:
Normally, a browser shouldn't show anything - or just this :
as everybody knows an important security issue happened. They will bail out. They will be safe.
If a web page was showing up, like this one:
then their browser was accepting https redirection.
That's a major security risk (for this user, his device, your network ...).
-
@Gertjan Thanks for the reply and information. Can you confirm that even with NULL BLOCK enabled, DNSBL v6 returns :: for sites on the block list?
This is not what's occurring for me. When I have NULL BLOCK enabled, DNSBL v6 returns SERVFAIL instead of ::
-
@tibere86 said in Null blocking SERVFAIL:
When I have NULL BLOCK enabled, DNSBL v6 returns SERVFAIL instead of ::
Hummmm.
I just tested with "006.freecounters.co.uk" which is part of DNSBL_ADs_Basic.
When I test with
I also saw a "ServFail".
Then I started to "play" with the python script ... and now I see :
(adding log lines in what I think are interesting places), like on line 1614:
log_info("{} Blocked {} Returned IP {} {}".format(q_name, b_ip, q_type_str, q_type))
I saw :
<30>1 2025-01-29T17:56:09.484605+01:00 pfSense.hf.tld unbound 13346 - - [13346:0] info: 006.freecounters.co.uk Blocked :: AAAA 28
... But then humm again.
If unbound returns "::" then what can the browser do with this?
To be useful, it should be an IPv6 GUA of pfSense .....
And &@/{#, as I'm in null blocking mode ...
Done. Back to Web server mode.
Now I see :
info: 006.freecounters.co.uk Blocked ::10.10.10.1 Returned IP AAAA 28
But https://[::10.10.10.1]:443 doesn't really work.
Also : https://serverfault.com/questions/698369/what-is-the-ipv6-equivalent-of-0-0-0-0-0
-
Okay, so I have made some progress diagnosing the bug. The issue is occurring here:
    msg = DNSMessage(qstate.qinfo.qname_str, q_type, RR_CLASS_IN, PKT_QR | PKT_RA)
    msg.answer.append("{}. 60 IN {} {}" .format(q_name, q_type_str, b_ip))
    msg.set_return_msg(qstate)
    if msg is None or not msg.set_return_msg(qstate):
        qstate.ext_state[id] = MODULE_ERROR
        return True
https://github.com/pfsense/FreeBSD-ports/blob/devel/net/pfSense-pkg-pfBlockerNG/files/usr/local/pkg/pfblockerng/pfb_unbound.py#L1617-L1623
Specifically, there is a mismatch between q_type_str and b_ip.
If the first DNS lookup is for type A, then b_ip will be 0.0.0.0. If an AAAA lookup then comes in, b_ip is still 0.0.0.0, which is wrong.
If the first DNS lookup is for type AAAA, then b_ip will be "::". If an A lookup then comes in, b_ip is still "::", which is wrong.
This is why the issue is inconsistent and restarting Unbound seems to resolve it.
More detail to come.
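To make the mismatch concrete, here is a minimal sketch that imitates the behaviour described above. It is not the pfb_unbound.py code: the dictionary `blocked_cache` and the function names are hypothetical stand-ins for the cache of previously blocked domain details, and the domain name is a placeholder.

```python
import ipaddress

# Hypothetical stand-in for the cache of previously blocked domain details.
blocked_cache = {}


def null_block_ip(q_type_str):
    """Null-block address for the query type that triggered the block."""
    return "::" if q_type_str == "AAAA" else "0.0.0.0"


def buggy_lookup(q_name, q_type_str):
    """Return (q_type_str, b_ip); b_ip is fixed by the first query only."""
    if q_name not in blocked_cache:
        blocked_cache[q_name] = null_block_ip(q_type_str)
    return q_type_str, blocked_cache[q_name]


def rdata_matches(q_type_str, b_ip):
    """An A record needs an IPv4 address, an AAAA record an IPv6 address."""
    wanted_version = 4 if q_type_str == "A" else 6
    return ipaddress.ip_address(b_ip).version == wanted_version


for q_type_str in ("A", "AAAA"):
    _, b_ip = buggy_lookup("ads.example", q_type_str)
    print(q_type_str, b_ip, "ok" if rdata_matches(q_type_str, b_ip) else "MISMATCH")
```

The first query (A) gets a matching 0.0.0.0, while the second query (AAAA) is handed the cached 0.0.0.0 and is flagged as a mismatch. Swapping the order of the two queries flips which type fails.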
-
I haven't thoroughly tested yet, but here is my first pass at a fix:
https://github.com/pfsense/FreeBSD-ports/pull/1407/files
- Instead of saving the answer in the previously blocked domain details dictionary, a boolean for null blocking is saved. Then based on the query type, the appropriate answer is returned.
- I also fixed an issue where SERVFAIL was being returned because the answer (an IP address) was not valid for the query type. An answer is now only included for A, AAAA, or ANY queries.
- In addition, instead of ANY queries defaulting to A, they now default to both A and AAAA.
- Lastly, I increased the TTL from 60 (1 minute) to 3600 (1 hour) to reduce DNS load.
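The list above can be sketched roughly as follows. This is an illustration of the idea only, not the code from the pull request: the names `remember_block`, `block_answers` and `blocked_cache` are hypothetical, the record string format follows the snippet quoted earlier in the thread, and the webserver addresses are the 10.10.10.1 and ::10.10.10.1 values mentioned above.

```python
NULL_BLOCK = {"A": "0.0.0.0", "AAAA": "::"}
WEBSERVER = {"A": "10.10.10.1", "AAAA": "::10.10.10.1"}
BLOCK_TTL = 3600  # was 60

# Hypothetical cache: domain -> True if null blocking applies, else False.
blocked_cache = {}


def remember_block(q_name, null_block):
    """Store only the blocking mode, not a ready-made answer."""
    blocked_cache[q_name] = bool(null_block)


def block_answers(q_name, q_type_str):
    """Build answer records that match the type of the current query."""
    table = NULL_BLOCK if blocked_cache[q_name] else WEBSERVER
    if q_type_str == "ANY":
        types = ("A", "AAAA")      # ANY now yields both record types
    elif q_type_str in table:
        types = (q_type_str,)      # A or AAAA
    else:
        types = ()                 # other types get no address answer
    return [
        "{}. {} IN {} {}".format(q_name, BLOCK_TTL, t, table[t]) for t in types
    ]


remember_block("ads.example", True)
print(block_answers("ads.example", "A"))
print(block_answers("ads.example", "AAAA"))
```

Because the address is chosen at answer time, the order in which A and AAAA queries arrive no longer matters.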
-
I copy-pasted your diff, and will test it for a while.
@fenichelar said in Null blocking SERVFAIL:
Lastly, I increased the TTL from 60 (1 minute) to 3600 (1 hour) to reduce DNS load.
If I get it right, this is where the DNS reply "here is a 0.0.0.0 for you" is created for the requesting device, if the host name was found in a blocking list (or regex, etc).
With the low TTL (60), the client device would retry one minute later. With the TTL now at 3600, and if the client device respects it, then yeah, way fewer DNS requests should happen.
That is, if the client accepts 'NO' or 0.0.0.0 as an answer, and it respects the TTL.
-
@Gertjan Exactly, the intention is to increase the amount of time that the client caches the block response; either null blocking ("0.0.0.0" / "::") or webserver blocking.
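As a rough illustration of the effect, here is a small simulation under simplifying assumptions: a single client looks up the same blocked name every 10 seconds for a day, and it caches the answer for exactly the TTL. The function name and the numbers chosen are only for this example.

```python
def resolver_queries(ttl, lookup_interval, duration):
    """Count lookups that miss the client cache and reach the resolver."""
    expires_at = None
    misses = 0
    for now in range(0, duration, lookup_interval):
        if expires_at is None or now >= expires_at:
            misses += 1               # cache empty or expired: ask the resolver
            expires_at = now + ttl    # cache the block answer for `ttl` seconds
    return misses


DAY = 24 * 60 * 60
print(resolver_queries(ttl=60, lookup_interval=10, duration=DAY))    # 1440
print(resolver_queries(ttl=3600, lookup_interval=10, duration=DAY))  # 24
```

Clients that ignore the TTL, or that do not cache at all, will not see this reduction.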
-
I have this, which shows what my pfSense unbound is doing. Let's see what changes. I've been running it for 12 hours now.