DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times
-
@RickyBaker Those are to be expected... 10.10.10.6 is asking for the PTR (reverse) record of 172.17.0.1. Is that an IP on your network? It looks like a default Docker network to me.
But yeah, that would normally return NX because your local DNS doesn't have it set up. The PTR for, say, your pfSense IP should answer, though.
Example:
; <<>> DiG 9.16.50 <<>> -x 192.168.9.253
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58768
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;253.9.168.192.in-addr.arpa. IN PTR
;; ANSWER SECTION:
253.9.168.192.in-addr.arpa. 3600 IN PTR sg4860.home.arpa.
;; Query time: 9 msec
;; SERVER: 192.168.3.10#53(192.168.3.10)
;; WHEN: Fri May 10 11:43:43 Central Daylight Time 2024
;; MSG SIZE rcvd: 81
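To see which PTR name a reverse lookup actually asks for (like 253.9.168.192.in-addr.arpa above), and whether an address sits in Docker's usual default bridge range, here is a small sketch using Python's standard ipaddress module. The 172.17.0.0/16 range is an assumption about a stock Docker install, not something confirmed for this Unraid box.

```python
import ipaddress

# Assumption: Docker's stock default bridge network is 172.17.0.0/16.
DOCKER_DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def describe_ptr_query(addr: str) -> dict:
    """Return the in-addr.arpa name a PTR lookup for `addr` asks for, plus some context."""
    ip = ipaddress.ip_address(addr)
    return {
        "ptr_name": ip.reverse_pointer,  # octets reversed, e.g. 1.0.17.172.in-addr.arpa
        "private": ip.is_private,  # RFC 1918 and other non-global space
        "docker_default_bridge": ip in DOCKER_DEFAULT_BRIDGE,
    }


if __name__ == "__main__":
    for addr in ("172.17.0.1", "192.168.9.253"):
        print(addr, describe_ptr_query(addr))
```

A resolver only has PTR answers for addresses it knows about (host overrides, DHCP registrations), so an unregistered Docker bridge address coming back NX is harmless.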
Your logs shouldn't be behind. Maybe the time is off on pfSense? If so, you could also be having issues with DNSSEC validation.
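Why a wrong clock can break DNSSEC: every RRSIG carries an inception time and an expiration time, and a validator rejects the signature when its own clock falls outside that window. A simplified Python sketch; real validators compare RFC 1982 serial-number timestamps and may tolerate some skew, and the dates below are made up for illustration.

```python
from datetime import datetime, timezone


def rrsig_window_ok(inception: datetime, expiration: datetime, clock: datetime) -> bool:
    """Simplified RRSIG validity check: is the local clock inside [inception, expiration]?"""
    return inception <= clock <= expiration


# Hypothetical signature validity window.
inception = datetime(2024, 5, 1, tzinfo=timezone.utc)
expiration = datetime(2024, 5, 31, tzinfo=timezone.utc)

correct_clock = datetime(2024, 5, 10, 16, 43, tzinfo=timezone.utc)
skewed_clock = datetime(2023, 5, 10, 16, 43, tzinfo=timezone.utc)  # a year behind

print(rrsig_window_ok(inception, expiration, correct_clock))  # True
print(rrsig_window_ok(inception, expiration, skewed_clock))  # False: looks "not yet valid"
```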
Here I queried something I was pretty sure wouldn't be asked for often or already be in cache, www.msn.com, because I don't go there. 192.168.3.10 is my Pi-hole IP, so the client (my dig command) asked Pi-hole, which then asked Unbound on pfSense to look it up.
You can see the time in the Unbound log matches the time on my client: 11:48:02.
And I can see that query in my Pi-hole as well, at the same time.
If the logs are delayed or the time in them is off, then yeah, you've got other things going on. Do you have log compression set up?
What I'd definitely like to validate is whether your browser is even asking Unbound for anything. If you go to something like www.lsjdfldsjdflsjgibberishwhatever.com in your browser, you should see that query in Unbound along with an NX response, like in my previous screenshot.
Normally, by the time you do it in your browser and then open the resolver log in the pfSense GUI, that entry should be listed. There certainly shouldn't be minutes of delay before it's in your log; fractions of a second, maybe a second. And even if it shows up in the log late, the timestamp should be pretty freaking exact.
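For the gibberish-name test, you can generate a name that is practically guaranteed not to be cached anywhere instead of mashing the keyboard. A minimal Python sketch; the www./.com shape just mirrors the example name, and any random label would do.

```python
import secrets


def cache_buster_name(suffix: str = "com") -> str:
    """Build a random, almost certainly nonexistent hostname to force a fresh lookup."""
    # 8 random bytes -> 16 hex characters in the middle label.
    return f"www.{secrets.token_hex(8)}.{suffix}"


if __name__ == "__main__":
    print(cache_buster_name())
```

Type the printed name into the browser, then look for that exact name (and an NX reply) in the Unbound log. If it never shows up, the browser isn't asking Unbound.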
-
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
@RickyBaker Those are to be expected... 10.10.10.6 is asking for the PTR (reverse) record of 172.17.0.1. Is that an IP on your network? It looks like a default Docker network to me.
I can't say off the top of my head, but 10.10.10.6 is my Unraid server and that's where the Docker containers live, so it seems likely. There are also a lot of wpad.localdomain queries from the computer I'm connected with over VPN. Here are a couple of other ones:
10.10.10.12 is a hardwired PC I'm running Wireshark on to capture any shenanigans.
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
Your logs shouldn't be behind. Maybe the time is off on pfSense? If so, you could also be having issues with DNSSEC validation.
I even SSH'ed into /var/log, and it hasn't been updated since 11:11, but other logs have been:
Very odd indeed. My pfSense time is correct, but I did notice that the last log update seems to coincide with the last config change:
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
If the logs are delayed or the time in them is off, then yeah, you've got other things going on. Do you have log compression set up?
I just removed the compression per @Gertjan's helpful suggestions.
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
Normally, by the time you do it in your browser and then open the resolver log in the pfSense GUI, that entry should be listed. There certainly shouldn't be minutes of delay before it's in your log; fractions of a second, maybe a second. And even if it shows up in the log late, the timestamp should be pretty freaking exact.
It's less than ideal that I'm doing this over VPN. I'll run these tests in a couple of hours, the minute I get home. But I'm not seeing that for some reason.
-
@RickyBaker said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
hasn't been updated since 11:11 but other logs have been:
Are you sure anything is even asking Unbound anything? If clients are set to query Unbound and you have Unbound set to log queries, then there should be log entries being made, IF!! anything is asking Unbound anything.
Can we see a simple output from nslookup? It's on pretty much anything other than a phone, and it will show you which DNS server you're talking to.
$ nslookup
Default Server: pi.hole
Address: 192.168.3.10
If you set debug, you can get all kinds of info: the response, who was asked, etc.
$ nslookup
Default Server: pi.hole
Address: 192.168.3.10
> set debug
> www.msn.com
Server: pi.hole
Address: 192.168.3.10
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 2, rcode = NXDOMAIN
        header flags: response, auth. answer, want recursion, recursion avail.
        questions = 1, answers = 0, authority records = 0, additional = 0
    QUESTIONS:
        www.msn.com.home.arpa, type = A, class = IN
------------
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 3, rcode = NXDOMAIN
        header flags: response, auth. answer, want recursion, recursion avail.
        questions = 1, answers = 0, authority records = 0, additional = 0
    QUESTIONS:
        www.msn.com.home.arpa, type = AAAA, class = IN
------------
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 4, rcode = NOERROR
        header flags: response, want recursion, recursion avail.
        questions = 1, answers = 3, authority records = 0, additional = 0
    QUESTIONS:
        www.msn.com, type = A, class = IN
    ANSWERS:
    -> www.msn.com
        canonical name = www-msn-com.a-0003.a-msedge.net
        ttl = 14906 (4 hours 8 mins 26 secs)
    -> www-msn-com.a-0003.a-msedge.net
        canonical name = a-0003.a-msedge.net
        ttl = 30 (30 secs)
    -> a-0003.a-msedge.net
        internet address = 204.79.197.203
        ttl = 30 (30 secs)
------------
Non-authoritative answer:
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 5, rcode = NOERROR
        header flags: response, want recursion, recursion avail.
        questions = 1, answers = 2, authority records = 1, additional = 0
    QUESTIONS:
        www.msn.com, type = AAAA, class = IN
    ANSWERS:
    -> www.msn.com
        canonical name = www-msn-com.a-0003.a-msedge.net
        ttl = 14906 (4 hours 8 mins 26 secs)
    -> www-msn-com.a-0003.a-msedge.net
        canonical name = a-0003.a-msedge.net
        ttl = 3600 (1 hour)
    AUTHORITY RECORDS:
    -> a-msedge.net
        ttl = 3600 (1 hour)
        primary name server = ns1.a-msedge.net
        responsible mail addr = msnhst.microsoft.com
        serial = 2016092901
        refresh = 1800 (30 mins)
        retry = 900 (15 mins)
        expire = 2419200 (28 days)
        default TTL = 240 (4 mins)
------------
Name: a-0003.a-msedge.net
Address: 204.79.197.203
Aliases: www.msn.com
         www-msn-com.a-0003.a-msedge.net
>
For all we know, the clients you're having issues with aren't even talking to Unbound on pfSense, and the problem is with whatever name server they are talking to.
On a typical network with a few devices, DNS is busy answering queries all the time. Shoot, even when nobody is actually using a device, there are quite often DNS queries. If clients are asking Unbound and you have it set to log queries, you should see the log file grow every minute or so, or at least whenever there is a query.
If you have ls show seconds, you should see the log change every time something is written:
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log: ls -D %H:%M:%S -l resolver.log
-rw------- 1 root wheel 734731 13:48:50 resolver.log
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log: ls -D %H:%M:%S -l resolver.log
-rw------- 1 root wheel 735595 13:48:59 resolver.log
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log: ls -D %H:%M:%S -l resolver.log
-rw------- 1 root wheel 735822 13:49:02 resolver.log
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log: ls -D %H:%M:%S -l resolver.log
-rw------- 1 root wheel 736254 13:49:08 resolver.log
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log: ls -D %H:%M:%S -l resolver.log
-rw------- 1 root wheel 736479 13:49:10 resolver.log
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log: ls -D %H:%M:%S -l resolver.log
-rw------- 1 root wheel 736479 13:49:10 resolver.log
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log: ls -D %H:%M:%S -l resolver.log
-rw------- 1 root wheel 736479 13:49:10 resolver.log
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log: ls -D %H:%M:%S -l resolver.log
-rw------- 1 root wheel 736479 13:49:10 resolver.log
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log: ls -D %H:%M:%S -l resolver.log
-rw------- 1 root wheel 736713 13:49:21 resolver.log
[23.09.1-RELEASE][admin@sg4860.home.arpa]/var/log:
If it's not changing, then nothing being logged makes the most sense!
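The repeated ls check can be automated by looking at the file's modification time. A minimal Python sketch, assuming the log is at /var/log/resolver.log as in the listing and that 5 minutes with no writes counts as "stale" (both are assumptions you can change); run it wherever Python is available and can see the log.

```python
import os
import time
from typing import Optional

RESOLVER_LOG = "/var/log/resolver.log"  # path taken from the listing above


def seconds_since_write(path: str, now: Optional[float] = None) -> float:
    """How long ago the file was last modified, in seconds."""
    current = time.time() if now is None else now
    return current - os.stat(path).st_mtime


def looks_stale(path: str = RESOLVER_LOG, max_idle: float = 300.0, now: Optional[float] = None) -> bool:
    """True if nothing has been written to `path` for longer than `max_idle` seconds."""
    return seconds_since_write(path, now) > max_idle


if __name__ == "__main__" and os.path.exists(RESOLVER_LOG):
    print(f"last write {seconds_since_write(RESOLVER_LOG):.0f}s ago, stale={looks_stale()}")
```

On a busy network with query logging on, a "stale" result is a strong hint that nothing is asking Unbound, or that logging itself is stuck.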
-
@johnpoz the queries I was doing were just searches in the browser. I'll run the sample dig commands you posted earlier when I get home and am not on VPN.
-
@RickyBaker and browsers these days LOVE to use DoH and not even ask your local DNS. If your issues were in the browser, it's quite possible it was talking to whatever it uses as its default DoH (DNS over HTTPS) server. Browsers love to switch to this without any user intervention at all. You know, the browser people looking out for their idiot users who are too stupid to decide what DNS they want to use.
And if they are using our browser, then clearly we should point them to our dns for their own good without telling them we are doing so, or even asking them if we should.
What browser are you using?
-
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
If you set debug, you can get all kinds of info: the response, who was asked, etc.
seems bad
When I attempted it on my Unraid server, the command wasn't found. When I did it on pfSense itself and on my Plex server, nslookup just seemed to hang waiting for more input.
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
What browser are you using?
Chrome, but I can't imagine that's any better.
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
If the logs are delayed or the time in them is off
My logs were still stuck at 11:11:03 (the last config update). I restarted the service and they are updating again.
Mine doesn't seem to have an answer section like yours.
And this I believe is the corresponding failure in the log:
The previous failure in the log was from the browser. I'm kind of swimming in all the different steps. Was this helpful? What have I discovered about my devices and their use of DNS?
-
@RickyBaker all of those are failing. See the SERVFAIL? So no, it's never going to work.
Is 10.10.10.1 your actual IP, or are you pointing to the pfBlocker VIP?
In your nslookup debug you never even asked for just www.msn.com; you only asked for www.msn.com.localdomain.
Put a . on the end of the name in your nslookup. See how mine first tried the search domain, home.arpa, and then dropped it and did my actual query? Yours never did that.
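Roughly what the nslookup runs show, as a simplified Python model: without a trailing dot, the configured search suffix is tried first and the bare name last; with a trailing dot, the name is treated as fully qualified and sent as-is. This mirrors the Windows nslookup behavior in the debug output; other stub resolvers (for example glibc with its ndots rule) order things differently.

```python
from typing import List


def candidate_queries(name: str, search_suffixes: List[str]) -> List[str]:
    """Simplified model of the names a stub resolver like nslookup will try, in order."""
    if name.endswith("."):
        # Trailing dot = fully qualified: no search suffix is appended.
        return [name]
    # Otherwise try each search suffix first, then the name by itself.
    return [f"{name}.{suffix}" for suffix in search_suffixes] + [name]


print(candidate_queries("www.msn.com", ["home.arpa"]))
# ['www.msn.com.home.arpa', 'www.msn.com']
print(candidate_queries("www.msn.com.", ["localdomain"]))
# ['www.msn.com.']
```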
What is asking for an HTTPS record vs just an A record? See where the query from 10.10.10.10 is doing both an A record query and an HTTPS query?
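For context on those HTTPS lookups: HTTPS is its own record type (type 65, RFC 9460), and Chrome, as far as I know, asks for it alongside A/AAAA to learn things like HTTP/3 support. On the wire the two queries differ only in the QTYPE field. A minimal Python sketch that builds raw query bytes in RFC 1035 format; the query ID of 0 is arbitrary.

```python
import struct

# Numeric QTYPE values from the IANA DNS parameters registry.
QTYPES = {"A": 1, "PTR": 12, "AAAA": 28, "HTTPS": 65}
QCLASS_IN = 1


def build_query(name: str, qtype: str, query_id: int = 0) -> bytes:
    """Encode a single-question DNS query (RFC 1035 wire format) with RD set."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, ended by the zero-length root label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPES[qtype], QCLASS_IN)


a_query = build_query("www.msn.com", "A")
https_query = build_query("www.msn.com", "HTTPS")
# Identical except for the 2-byte QTYPE just before QCLASS.
print(a_query[:-4] == https_query[:-4], a_query[-4:].hex(), https_query[-4:].hex())
# True 00010001 00410001
```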
You might want to add these two options: the first makes it easier to see what is a query and what is a reply, and the servfail one might give some info on why it failed.
log-tag-queryreply: yes
log-servfail: yes
Add those to what you already have in your custom options box, then save and apply. This can give you more info.
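With log-tag-queryreply enabled, query and reply lines should be tagged so they're easy to tell apart. A Python sketch for pulling SERVFAIL replies out of resolver.log; the exact line layout and the sample lines are assumptions modeled on typical Unbound output, so check them against your own log and adjust the regex.

```python
import re
from typing import Dict, Iterable, List

# Assumed layout: "... query: <client> <name> <type> <class>"
#             and "... reply: <client> <name> <type> <class> <rcode> ..."
LINE_RE = re.compile(
    r"\b(?P<kind>query|reply): (?P<client>\S+) (?P<name>\S+) (?P<qtype>\S+) (?P<qclass>\S+)"
    r"(?: (?P<rcode>[A-Z]+))?"
)


def parse_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per tagged query/reply line; other lines are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            records.append({k: v for k, v in m.groupdict().items() if v is not None})
    return records


def servfails(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Only the replies whose rcode is SERVFAIL."""
    return [r for r in parse_lines(lines) if r["kind"] == "reply" and r.get("rcode") == "SERVFAIL"]


# Hypothetical sample lines, not copied from a real log.
sample = [
    "May 13 10:38:02 pfSense unbound[123]: [123:0] query: 10.10.10.10 www.msn.com. A IN",
    "May 13 10:38:02 pfSense unbound[123]: [123:0] reply: 10.10.10.10 www.msn.com. A IN SERVFAIL 0.000000 0 31",
    "May 13 10:38:03 pfSense unbound[123]: [123:0] info: start of service",
]
for r in servfails(sample):
    print(r["client"], r["name"], r["qtype"], r["rcode"])
```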
So you're not behind a VPN here, and pfSense has no VPN client connection? Look in your debug output at what you're actually asking for: www.msn.com.localdomain is never going to resolve unless you created that record locally.
And didn't we go over 127.0.0.53? You need to know exactly who that is asking. If you're going to do a dig, do a directed query with @ipaddress.
-
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
So no, it's never going to work.
But it DOES work sometimes! That's why it's so infuriating.
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
Is 10.10.10.1 your actual IP, or are you pointing to the pfBlocker VIP?
I don't have pfBlocker installed. 10.10.10.1 is the IP address of my pfSense router.
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
In your nslookup debug you never even asked for just www.msn.com; you only asked for www.msn.com.localdomain.
I definitely did not intend to ask for www.msn.com.localdomain, and I definitely did not type the word localdomain when I was running the sample you suggested. I merely ran the samples you suggested and pointed my browser at www.msn.com. I'm guessing the HTTPS request is a browser feature that forces HTTPS, but that's just conjecture.
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
So you're not behind a VPN here, and pfSense has no VPN client connection? Look in your debug output at what you're actually asking for: www.msn.com.localdomain is never going to resolve unless you created that record locally.
I am not behind a VPN here (intentionally at least) and I have not created a record for msn.com locally (intentionally at least).
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
And didn't we go over 127.0.0.53? You need to know exactly who that is asking. If you're going to do a dig, do a directed query with @ipaddress.
Yes, I knew there was a detail I forgot in that troubleshooting.
I'm not 100% sure of the middle one, and I have no idea what 127.0.0.53 is. Is there another test I should run to get more color?
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
log-tag-queryreply: yes
log-servfail: yes
-
@RickyBaker well, let's see what happens with logging of SERVFAIL details, because Unbound is clearly running: it resolved your pfsense.localdomain name from 10.10.10.1 when you did your nslookup.
Another thing I notice: on your SERVFAIL you're not getting the EDE (Extended DNS Error) back.
You should be able to enable that with ede: yes in your custom options box.
See here
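EDE (Extended DNS Errors, RFC 8914) attaches a numeric info code, plus optional text, to an error response, which is what turns a bare SERVFAIL into something actionable. A Python sketch that pulls the code out of dig output and names it from the RFC 8914 table; codes registered with IANA after that RFC are reported as "Unknown" here.

```python
import re
from typing import Optional, Tuple

# Extended DNS Error info codes as listed in RFC 8914.
EDE_CODES = {
    0: "Other", 1: "Unsupported DNSKEY Algorithm", 2: "Unsupported DS Digest Type",
    3: "Stale Answer", 4: "Forged Answer", 5: "DNSSEC Indeterminate", 6: "DNSSEC Bogus",
    7: "Signature Expired", 8: "Signature Not Yet Valid", 9: "DNSKEY Missing",
    10: "RRSIGs Missing", 11: "No Zone Key Bit Set", 12: "NSEC Missing",
    13: "Cached Error", 14: "Not Ready", 15: "Blocked", 16: "Censored",
    17: "Filtered", 18: "Prohibited", 19: "Stale NXDOMAIN Answer",
    20: "Not Authoritative", 21: "Not Supported", 22: "No Reachable Authority",
    23: "Network Error", 24: "Invalid Data",
}

EDE_RE = re.compile(r"EDE: (\d+)")


def ede_from_dig(output: str) -> Optional[Tuple[int, str]]:
    """Return (info_code, name) from dig output, or None if no EDE option is present."""
    m = EDE_RE.search(output)
    if m is None:
        return None
    code = int(m.group(1))
    return code, EDE_CODES.get(code, "Unknown")


print(ede_from_dig("; EDE: 9 (DNSKEY Missing): (No DNSKEY matches DS RRs of dnssec-failed.org)"))
# (9, 'DNSKEY Missing')
```

Codes 7 and 8 (signature expired / not yet valid) are the ones to watch for if the pfSense clock is ever off.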
-
127.0.0.53
Your screenshot shows Ubuntu, that’s the local DNS resolver.
https://unix.stackexchange.com/questions/612416/why-does-etc-resolv-conf-point-at-127-0-0-53
-
@SteveITS said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
Your screenshot shows Ubuntu, that’s the local DNS resolver.
Does this mean that my Plex server isn't using pfSense for DNS resolution?
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
well, let's see what happens with logging of SERVFAIL details.
To be clear, you want me to simply rerun those dig/nslookup sample tests you listed earlier, right?
DNSKEY Missing? Also, it apparently took way longer to complete.
"Unfortunately" I was not experiencing an outage at this time.
-
@RickyBaker there you go, some actually useful info.
So you're having some sort of issue with DNSSEC. I would expect that query to fail, though: that FQDN is a test FQDN for making sure DNSSEC is working. But we are now seeing the SERVFAIL reason.
So now, when normal queries fail, we might get to the bottom of why you're getting SERVFAIL instead of an answer to what you asked for.
-
@RickyBaker said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
Does this mean that my Plex server isn't using pfSense for DNS resolution?
No, what it means is it's asking the local cache at 127.0.0.53, and your command shows that points to 10.10.10.1.
Clearly went over this already like 6 days ago...
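On Ubuntu with systemd-resolved, /etc/resolv.conf normally lists only the local stub at 127.0.0.53, and the real upstream (10.10.10.1 here) is what that stub forwards to; `resolvectl status` shows it, as does /run/systemd/resolve/resolv.conf. A small Python sketch that reads the nameserver lines from either file; the sample contents are illustrative, not copied from the Plex server.

```python
from typing import List

SYSTEMD_STUB = "127.0.0.53"


def nameservers(resolv_conf: str) -> List[str]:
    """Return the addresses on 'nameserver' lines of resolv.conf-style text."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def goes_through_stub(resolv_conf: str) -> bool:
    """True if the system resolver is pointed at the systemd-resolved stub."""
    return SYSTEMD_STUB in nameservers(resolv_conf)


# Illustrative contents of /etc/resolv.conf and /run/systemd/resolve/resolv.conf.
stub_view = "nameserver 127.0.0.53\noptions edns0 trust-ad\nsearch localdomain\n"
upstream_view = "# generated by systemd-resolved\nnameserver 10.10.10.1\nsearch localdomain\n"
print(nameservers(stub_view), goes_through_stub(stub_view))
print(nameservers(upstream_view), goes_through_stub(upstream_view))
```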
-
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
So you're having some sort of issue with DNSSEC.
When looking up dnssec-failed.org, what would you expect?
https://www.internetsociety.org/resources/deploy360/2013/dnssec-test-sites/
-
@Gertjan exactly - like I said ;)
-
First, I'd like to apologize again for my lack of knowledge. I promise I'm not trying to be difficult or annoying. This is all foreign terminology and concepts to me, but I'm trying my best, and I can't quantify how much I appreciate the time you're taking.
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
So you're having some sort of issue with DNSSEC. I would expect that query to fail, though: that FQDN is a test FQDN for making sure DNSSEC is working. But we are now seeing the SERVFAIL reason.
So now, when normal queries fail, we might get to the bottom of why you're getting SERVFAIL instead of an answer to what you asked for.
What do you mean by a normal query? How is this NOT a normal query? (ducks:)) What's the next step you'd like to see to further clarify?
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
No, what it means is it's asking the local cache at 127.0.0.53, and your command shows that points to 10.10.10.1.
Clearly went over this already like 6 days ago...
Ahh, that makes sense; sorry I missed that earlier. So does this mean I should constantly be trying new websites I never visit, to avoid it falling back to the local cache? Or is that a fundamental misunderstanding of the steps?
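A rough idea of what any caching resolver (the 127.0.0.53 stub, Pi-hole, Unbound) does with the TTLs in the nslookup output, e.g. ttl = 30 (30 secs): an answer is reused until its TTL runs out, and the next request after that triggers a fresh lookup upstream. A toy Python model with time passed in explicitly so it is easy to follow; it is not how any of those resolvers is actually implemented.

```python
from typing import Dict, Optional, Tuple


class TTLCache:
    """Toy DNS-style cache: entries expire `ttl` seconds after they are stored."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, answer: str, ttl: float, now: float) -> None:
        self._entries[name] = (answer, now + ttl)

    def get(self, name: str, now: float) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None  # cache miss: a real resolver would now ask upstream
        answer, expires_at = entry
        if now >= expires_at:
            del self._entries[name]  # expired: treat it as a miss
            return None
        return answer


cache = TTLCache()
cache.put("a-0003.a-msedge.net", "204.79.197.203", ttl=30, now=0)
print(cache.get("a-0003.a-msedge.net", now=10))  # 204.79.197.203 (served from cache)
print(cache.get("a-0003.a-msedge.net", now=31))  # None (expired, must look up again)
```

So even everyday sites get re-resolved upstream fairly often; a brand-new name just guarantees that a particular test actually reaches Unbound.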
@Gertjan said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
When looking up dnssec-failed.org, what would you expect?
Thank you for the links. They somehow moved me closer to AND farther away from understanding. I have AT&T fiber, so why did it hit a Comcast-run DNSSEC test site? Is going to this website something built into the dig command? Also, correct me if I'm wrong, but I believe y'all had me re-enable DNSSEC just because it was good practice. I can see how this failure is symptomatic of my greater problems, but it's odd to me that what's manifesting is something I've been told is optional best practice, not required.
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
that FQDN is a test FQDN for making sure DNSSEC is working. But we are now seeing the SERVFAIL reason.
All of this leaves me a little lost as to next steps. I keep going back to this line. I know what FQDN stands for, but this collection of words together just doesn't make sense to me, and I believe it's the key to understanding what I need to do next. As always, thanks for everything; further guidance would be greatly appreciated.
-
dnssec-failed.org
Just for reference, I see SERVFAIL for it via Google and others.
>dig dnssec-failed.org @8.8.8.8
; <<>> DiG 9.16.44 <<>> dnssec-failed.org @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 64906
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 9 (DNSKEY Missing): (No DNSKEY matches DS RRs of dnssec-failed.org)
;; QUESTION SECTION:
;dnssec-failed.org. IN A
;; Query time: 120 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon May 13 10:38:02 Central Daylight Time 2024
;; MSG SIZE rcvd: 97
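The status field in dig's header is the response code (RCODE). The classic values from RFC 1035 are worth keeping straight, since NXDOMAIN ("that name does not exist") and SERVFAIL ("I couldn't get you an answer") point at very different problems. A small Python sketch that reads it out of dig output:

```python
import re
from typing import Optional, Tuple

# Base RCODEs defined in RFC 1035, by the names dig prints.
RCODES = {"NOERROR": 0, "FORMERR": 1, "SERVFAIL": 2, "NXDOMAIN": 3, "NOTIMP": 4, "REFUSED": 5}

STATUS_RE = re.compile(r"status: ([A-Z]+)")


def dig_status(output: str) -> Tuple[str, Optional[int]]:
    """Return (status name, numeric RCODE or None if not in the table) from dig output."""
    m = STATUS_RE.search(output)
    if m is None:
        raise ValueError("no 'status:' field in dig output")
    name = m.group(1)
    return name, RCODES.get(name)


print(dig_status(";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 64906"))
# ('SERVFAIL', 2)
```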
https://bluecatnetworks.com/blog/the-top-four-dns-response-codes-and-what-they-mean/
"a SERVFAIL is the DNS server telling you, “Hey, I can’t give you the answer for that query.”"
-
@SteveITS well yeah, forwarding and trying to do DNSSEC is going to be problematic. But dnssec-failed.org should always fail. It's meant to fail, as a way to validate that your DNSSEC is working.
So yeah, if you query any NS that is doing DNSSEC (Google, Quad9, etc.), it would fail. But if you query some NS that isn't doing DNSSEC, then it would pass.
Example:
; <<>> DiG 9.16.50 <<>> @8.8.8.8 dnssec-failed.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 3602
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 9 (DNSKEY Missing): (No DNSKEY matches DS RRs of dnssec-failed.org)
;; QUESTION SECTION:
;dnssec-failed.org. IN A
;; Query time: 95 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon May 13 10:54:20 Central Daylight Time 2024
;; MSG SIZE rcvd: 97
But if you ask something that isn't doing DNSSEC:
$ dig @4.2.2.2 dnssec-failed.org
; <<>> DiG 9.16.50 <<>> @4.2.2.2 dnssec-failed.org
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39041
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 8192
;; QUESTION SECTION:
;dnssec-failed.org. IN A
;; ANSWER SECTION:
dnssec-failed.org. 300 IN A 96.99.227.255
;; Query time: 52 msec
;; SERVER: 4.2.2.2#53(4.2.2.2)
;; WHEN: Mon May 13 10:55:08 Central Daylight Time 2024
;; MSG SIZE rcvd: 62
This is another example of why it makes no sense to check the DNSSEC option if you're forwarding. Either where you forward to is already doing DNSSEC (most of the major players do; some offer different IPs you can query that don't, but pretty much all of them do DNSSEC), or it isn't, in which case asking for it in the Unbound settings isn't going to do anything other than, more than likely, cause failures.
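The reasoning in the last few posts, boiled down to a deliberately simplified Python model (it ignores caching, partial DNSSEC support and many other real-world details, and the parameter names are made up for illustration): a bogus zone like dnssec-failed.org gets SERVFAIL from any validating resolver in the path, while a local validator forwarding to an upstream that doesn't return DNSSEC records can end up failing even good, signed names.

```python
def expected_status(
    zone_signed: bool,
    zone_bogus: bool,
    upstream_validates: bool,
    upstream_returns_dnssec_records: bool,
    local_validates: bool,
) -> str:
    """Toy model of the RCODE a client sees when Unbound forwards to an upstream."""
    if zone_bogus and upstream_validates:
        return "SERVFAIL"  # upstream refuses to hand out bogus data
    if local_validates:
        if zone_signed and not upstream_returns_dnssec_records:
            return "SERVFAIL"  # signatures missing: local chain of trust can't be built
        if zone_bogus:
            return "SERVFAIL"  # local validator catches it instead
    return "NOERROR"


# dnssec-failed.org via a validating upstream (like the 8.8.8.8 result):
print(expected_status(True, True, True, True, local_validates=False))  # SERVFAIL
# dnssec-failed.org via a non-validating upstream (like the 4.2.2.2 result):
print(expected_status(True, True, False, True, local_validates=False))  # NOERROR
```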
-
@RickyBaker said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
All of this leaves me a little lost as to next steps.
The next step is to wait until it fails again. You were seeing SERVFAIL, but we didn't know why. Now that you have enabled logging of SERVFAIL details, the next time you have a problem we can hopefully see why, and then address that.
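While waiting for the next failure, it helps to have an exact timestamp to match against the resolver log. A minimal Python sketch of a watcher; the names, the 30-second interval, and using the OS resolver via socket.getaddrinfo are all assumptions. On a client that goes through 127.0.0.53 or a browser's DoH, this tests a different path than Unbound directly, so a directed dig @10.10.10.1 is still the better confirmation.

```python
import socket
import time
from datetime import datetime
from typing import Callable, Iterable, List, Tuple


def resolves(name: str) -> bool:
    """True if the OS resolver returns any address for `name`."""
    try:
        socket.getaddrinfo(name, 443)
        return True
    except socket.gaierror:
        return False


def watch(
    names: Iterable[str],
    rounds: int = 1,
    interval: float = 30.0,
    probe: Callable[[str], bool] = resolves,
    clock: Callable[[], datetime] = datetime.now,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, str]]:
    """Probe each name `rounds` times and return (timestamp, name) for every failure."""
    names = list(names)
    failures = []
    for i in range(rounds):
        for name in names:
            if not probe(name):
                failures.append((clock().isoformat(timespec="seconds"), name))
        if i < rounds - 1:
            sleep(interval)  # no sleep after the last round
    return failures


# Example (not run here): watch(["www.msn.com"], rounds=120) checks every 30 s for an hour.
```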
Also, have you updated to 2.7.2 yet? To be honest, this should be your next step.
-
@johnpoz said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
Also, have you updated to 2.7.2 yet? To be honest, this should be your next step.
No, I have not, but I can prioritize it. I know it SHOULD be easy and smooth, but I'm so nervous, especially with it not updating by itself.
@SteveITS said in DNS_PROBE_FINISHED_NXDOMAIN sporadically for anywhere from 30secs to 10min. works flawlessly at all other times:
https://bluecatnetworks.com/blog/the-top-four-dns-response-codes-and-what-they-mean/
Thanks, this is a very useful article.