Slow DNS after 22.05
-
@mikymike82 can you reiterate what your solution was? I've scrolled back, but there's a lot of fluff in here so I can't seem to find it.
-
@johnpoz To answer your question (from my experience): it's not just "facebook.com", it's everything, from apps to normal websites; random websites not resolving and apps not working. So it's not a specific client, website, browser, etc. Mobile, desktop, laptops, narrowcasting: everything that's trying to resolve an address.
Again, my "solution" seems to resolve the problem at hand, but it's not "normal" behaviour in my opinion.
-
@tentpiglet Well, where did it fail? www.facebook.com is a CNAME:
;; ANSWER SECTION:
www.facebook.com. 30 IN CNAME star-mini.c10r.facebook.com.
star-mini.c10r.facebook.com. 30 IN A 157.240.18.35
With a 30 second TTL, etc. Where did you go ask after that? Are you running qname minimisation in strict mode? An NX is a specific response from a NS. It's not a timeout or a SERVFAIL; it's a specific response saying "hey, what you're asking for doesn't exist".
Even a typo with 4 w's (wwww) returns an answer, not an NX:
;; QUESTION SECTION:
;wwww.facebook.com. IN A
;; ANSWER SECTION:
wwww.facebook.com. 3600 IN CNAME star.facebook.com.
star.facebook.com. 3600 IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 3600 IN A 157.240.18.15
If I ask for some gibberish, then I get back NX, from the AUTHORITATIVE SOA for that domain:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 33101
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;lsjfdsldjf.facebook.com. IN A
;; AUTHORITY SECTION:
facebook.com. 3600 IN SOA a.ns.facebook.com. dns.facebook.com. 220198737 14400 1800 604800 300
So to troubleshoot an NX you are getting, we need to know the specifics. It's not an "oops, unbound is not running currently", or "I'm trying to ask 1.2.3.4 but they are not answering".
-
total.num.queries=77
total.num.queries_ip_ratelimited=0
total.num.cachehits=22
total.num.cachemiss=55
total.num.prefetch=0
total.num.expired=0
total.num.recursivereplies=55
total.requestlist.avg=0.272727
total.requestlist.max=4
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=0.127636
total.recursion.time.median=0.0505173
total.tcpusage=0
What do you make of this? Unbound has restarted again...
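A quick way to read stats like these is to turn the counters into a cache hit rate. The following is a minimal sketch, assuming only the `key=value` format shown in the output above; the function name `cache_hit_ratio` is made up for illustration and is not part of unbound or pfSense.

```shell
#!/bin/sh
# Compute the cache hit percentage from "key=value" stats lines on stdin.
# Uses total.num.cachehits and total.num.queries as printed in this thread.
cache_hit_ratio() {
    awk -F= '
        $1 == "total.num.queries"   { queries = $2 }
        $1 == "total.num.cachehits" { hits = $2 }
        END {
            if (queries > 0)
                printf "%.1f\n", 100 * hits / queries
            else
                print "n/a"   # no queries counted yet, e.g. right after a restart
        }
    '
}

# Usage on the firewall (command as quoted elsewhere in this thread):
# unbound-control -c /var/unbound/unbound.conf stats_noreset | cache_hit_ratio
```

For the numbers posted above (77 queries, 22 cache hits) this prints 28.6. A freshly restarted unbound has an empty cache, so a low figure right after a restart is expected rather than a fault in itself.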
-
@cool_corona said in Slow DNS after 22.05:
Unbound has restarted again...
Yup, it's going to be horrible as a caching resolver if it restarts every few seconds.
-
@mikymike82 said in Slow DNS after 22.05:
@Kempain @tentpiglet have you tried my suggestion for resolving the issue (although maybe a temporary resolution)? As stated, I've been running for 1.5 weeks without problems at this point.
I'm actually not keen on forwarding my DNS unnecessarily but I can understand how it would resolve the issue since it would then bypass unbound.
I was planning on doing a fresh install until I heard you're still experiencing issues after doing that.
I still might, to rule it out for me too though.
I've been monitoring unbound and it's been going strong for 13371 seconds now!
Memory usage of unbound doesn't seem to be particularly high, although it is increasing slowly, which is probably to be expected.
I just set logging to level 5 to try and capture more info, although I'm not sure the differences between 4 and 5 will help, as they seem to be related to identifying which client is having an issue, and I know I'm experiencing it across device types.
This has meant that unbound has been restarted, so I'll see how it goes...
-
unbound-control -c /var/unbound/unbound.conf stats_noreset | grep total
total.num.queries=159
total.num.queries_ip_ratelimited=0
total.num.cachehits=61
total.num.cachemiss=98
total.num.prefetch=0
total.num.expired=0
total.num.recursivereplies=98
total.requestlist.avg=1.11224
total.requestlist.max=31
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=0
total.requestlist.current.user=0
total.recursion.time.avg=1.462659
total.recursion.time.median=1.03385
total.tcpusage=0
Unbound has only just restarted so take those with a pinch of salt.
Interesting that @tentpiglet mentioned 'DNS_PROBE_FINISHED_NXDOMAIN' errors in the browser, because I've also been seeing those at the same time as the DNS issues, so I believe they are in some way related.
-
@kempain Same here, but in my production environment I can only troubleshoot so much. I'm very curious whether you can find anything else, rather than my workaround.
-
Mine is at home, fortunately, not in a corporate environment, although I do have the wife's complaints to contend with.
I have the image on USB ready to go with my backup in conf, so I should be able to get back up and running pretty quickly once I finally decide to bite the bullet and do a re-install.
I'm just a bit wary of blowing out my settings because I'm using HAProxy and a bunch of certs for internal services.
I'm relatively new to pfSense, so I don't want to F it up and spend all night fixing it.
-
Could it be an issue with cache?
-
Doing more nslookups from a client during the issue, I noticed a few things that seem pretty consistent.
My initial requests to pfSense seem to time out despite my client knowing the IP of pfSense.
If I keep placing requests, I eventually get a response, usually with only the IPv6 address first.
Then, after more timeouts, the next response has both IPv6 and IPv4.
After it does fully resolve, subsequent requests seem OK for a while.
It seems like it sometimes takes a few tries to resolve some un-cached addresses.
Unbound is not restarting at this time, as I can see it's been running for a while now.
nslookup youtu.be
Server: pfsense.localdomain
Address: 10.x.x.x
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
*** Request to pfsense.localdomain timed-out

nslookup youtu.be
Server: pfsense.localdomain
Address: 10.x.x.x
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
Name: youtu.be
Address: 2a00:1450:4009:81e::200e

nslookup youtu.be
Server: pfsense.localdomain
Address: 10.x.x.x
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
Name: youtu.be
Address: 2a00:1450:4009:81e::200e
  142.250.180.14

version: 1.15.0
verbosity: 5
threads: 4
modules: 2 [ validator iterator ]
uptime: 19646 seconds
options: control(ssl)
unbound (pid 96286) is running...

total.num.queries=10304
total.num.queries_ip_ratelimited=0
total.num.cachehits=2806
total.num.cachemiss=7498
total.num.prefetch=0
total.num.expired=0
total.num.recursivereplies=7497
total.requestlist.avg=3.25433
total.requestlist.max=39
total.requestlist.overwritten=0
total.requestlist.exceeded=0
total.requestlist.current.all=3
total.requestlist.current.user=1
total.recursion.time.avg=9.417250
total.recursion.time.median=0.423254
total.tcpusage=0
-
IMHO, this says to me:
nslookup tries to contact a first DNS server and, after the timeout, decides it can't.
A next DNS server is tried. It can't reach that one either.
A third one is tried (pfSense?) and this time there is an answer.
What do you have here (Dashboard, System Information)?
You have unbound running with maximum log detail?
OK for debugging, but think about putting that back to the default as soon as possible.
Max log detail will overflow the (small) max log file size, so the log will get rotated often, which means even more system and disk resources used.
Run on the command line:
grep 'start' /var/log/resolver.log
and try the settings I showed above: under Services > DNS Resolver > General Settings, remove the check from:
DHCP Registration
OpenVPN Clients
These two should be unchecked if you use pfBlockerNG-devel anyway.
Wait a day or so and run the command again.
unbound will also restart on interface events, like a WAN that changes its IP, or some other interface going down and up. These events can be seen in the main system log.
@tentpiglet said in Slow DNS after 22.05:
My wife was also reporting random disconnects from an online game she was playing,
This might be a red flag.
Game playing involves no DNS interaction.
It's her PC/device against the game server. If this connection gets interrupted, then the issue is: you have a bad connection.
It could be local, like the Wi-Fi being plain bad. Easy to test: that issue goes away as soon as you roll out a cable.
Or worse, your ISP uplink isn't as good as you think it is.
A bad uplink would also explain unreachable remote DNS servers.
-
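The "run grep, wait a day, run it again" check suggested above is easier to compare if the matches are counted instead of listed. A minimal sketch, assuming the log path quoted in this thread; the function name `count_resolver_starts` is hypothetical, and the search string is simply the one from the grep command above.

```shell
#!/bin/sh
# Count lines containing 'start' in the resolver log.
# The default path is the one quoted in this thread; pass another file to override.
count_resolver_starts() {
    # grep -c prints the number of matching lines (and exits 1 if there are none)
    grep -c 'start' "${1:-/var/log/resolver.log}"
}

# Usage: note the number, wait a day, run it again and compare.
# count_resolver_starts
```

Because the log is rotated, a count that drops between two runs only means the file was rotated in the meantime, not that restarts stopped.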
@gertjan said in Slow DNS after 22.05:
What do you have here ( Dashboard System information ) :
Thanks for the reply @Gertjan
I have pfSense itself (127.0.0.1) and 2 remote name servers from quad9 listed, although my understanding is that those aren't actually used anyway because I have it set to use local ignore remote. Interestingly I don't have the IPv6 listing that you have but I may have disabled something there?
I don't see any unbound restarts (start of service messages) when checking the logs:
Unbound has been up for about 15 hours now:
I've removed the external DNS now just to rule that out so it should just be using local now.
From what you said, it sounds like it's failing on all DNS servers if it's trying one after the other, as you can see there are occasions it times out twice and also three times. Not sure why it would bomb out after 2 attempts if it is running through the list of DNS servers though because there are 3 and the 2 remote are supposed to be ignored anyway from my understanding.
What I find strange is it seems to be having issues contacting pfSense itself like pfSense isn't responding at all (or DNS on pfSense at least).
-
@gertjan said in Slow DNS after 22.05:
DHCP Registration
OpenVPN Clients
Forgot to mention: these were already disabled, and always have been, on my box.
-
@kempain said in Slow DNS after 22.05:
total.recursion.time.avg=1.462659
total.recursion.time.median=1.03385
Your avg to resolve something is 1.5 seconds? Well, that is going to be problematic for sure. Clients normally time out at 2 seconds.
Then you have this
total.recursion.time.avg=9.417250
Yeah you have a problem with talking to the NSs for sure..
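The comparison made here (average recursion time against the 2 second client timeout) can be sketched as a small check over the same stats output. This is only an illustration: the function name `check_recursion_time` is made up, and the WARN level at half the timeout is an arbitrary assumption, not an unbound or pfSense recommendation.

```shell
#!/bin/sh
# Classify total.recursion.time.avg against a client timeout in seconds.
# Reads "key=value" stats lines on stdin and prints "<STATUS> <avg>".
check_recursion_time() {
    timeout_s="${1:-2}"   # 2 s is the client timeout mentioned in this thread
    awk -F= -v limit="$timeout_s" '
        $1 == "total.recursion.time.avg" {
            avg = $2 + 0
            if (avg >= limit + 0)      status = "SLOW"
            else if (avg >= limit / 2) status = "WARN"   # arbitrary warning level
            else                       status = "OK"
            print status, $2
        }
    '
}

# Usage on the firewall:
# unbound-control -c /var/unbound/unbound.conf stats_noreset | check_recursion_time 2
```

With the two averages quoted above, 1.462659 comes out as WARN and 9.417250 as SLOW. Keep in mind the median can look fine while a handful of very slow lookups drag the average up, so both values are worth reading together.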
-
@kempain
OK, good info.
Unbound has been running for 15 hours already.
So no 'very frequent' restarts.
Still, strange.
unbound should be listening on port 53 of every LAN interface.
Run this
sockstat | grep 'unbound'
to check.
Can you also check the device where you ran nslookup?
Start with
ipconfig /all
You should have a line with:
Serveurs DNS. . . . . . . . . . . . . : 192.168.1.1 2001:470:1f14:5d0:2::1
where 192.168.1.1 is the default pfSense IP. The IPv6 address is my IPv6 LAN IP, as I'm using both IPv4 and IPv6 without issues.
Or was this :
nslookup youtu.be
Server: pfsense.localdomain
Address: 10.x.x.x
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
*** Request to pfsense.localdomain timed-out
executed on pfSense ? In that case, you can see that nslookup is using 127.0.0.1 and 9.9.9.9 and 149.112.112.112 and two of them are 'unreachable'.
If nslookup was run on pfSense and it couldn't reach unbound on 127.0.0.1, then I would ditch the entire system, as this is close to impossible.
Do you have :
?
I know, other choices are available, like:
but I would kill DNS on my system and internal networks. I'll leave it up to you to understand why.
And drop this one:
to a lower level, like "1".
@kempain said in Slow DNS after 22.05:
Forgot to mention these were already disabled and have always been on my box.
Yeah, I get that ;)
Your unbound isn't restarting every minute or so; it restarted 15 hours ago, and that's OK ;)
-
@gertjan said in Slow DNS after 22.05:
Game playing involves no DNS interaction.
Sorry for spamming replies.
I do seem to get connection issues with all services, not just web requests. This also includes games, usually when connecting to game servers/lobbies, which I assume must have to do some kind of lookup?
-
@kempain You cannot expect to run a good resolver if your avg time to resolve is 9 seconds. You can't; it's just not going to work.
-
@gertjan said in Slow DNS after 22.05:
sockstat | grep 'unbound'
Potential problem here? I see port 953 for local.
@gertjan said in Slow DNS after 22.05:
Serveurs DNS. . . . . . . . . . . . . : 192.168.1.1
2001:470:1f14:5d0:2::1
Yup, I get back the pfSense IP only (no IPv6) when running this on my client.
@gertjan said in Slow DNS after 22.05:
to a lower level, like "1"
I'll lower the logging level just had it jacked up to capture as much as possible when troubleshooting.
@johnpoz said in Slow DNS after 22.05:
@kempain You cannot expect to run a good resolver if your avg time to resolve is 9 seconds. You can't; it's just not going to work.
Thanks @johnpoz
Any ideas how I begin to troubleshoot this issue? I'm running on a Netgate SG-5100 so should be ample resources for what I'm doing with it which is pretty basic stuff.
I've removed those external NS now so let's see if that improves things.
I have DNSSEC support enabled but I think we determined that's ok earlier in the thread as long as you're not forwarding DNS?
-
@kempain I would do a dig +trace on your pfSense box to see where in the chain it is having issues talking to the name servers.
[22.05-RELEASE][admin@sg4860.local.lan]/root: dig www.google.com +trace

; <<>> DiG 9.16.26 <<>> www.google.com +trace
;; global options: +cmd
. 26136 IN NS d.root-servers.net.
. 26136 IN NS e.root-servers.net.
. 26136 IN NS f.root-servers.net.
. 26136 IN NS g.root-servers.net.
. 26136 IN NS h.root-servers.net.
. 26136 IN NS a.root-servers.net.
. 26136 IN NS i.root-servers.net.
. 26136 IN NS j.root-servers.net.
. 26136 IN NS k.root-servers.net.
. 26136 IN NS l.root-servers.net.
. 26136 IN NS m.root-servers.net.
. 26136 IN NS b.root-servers.net.
. 26136 IN NS c.root-servers.net.
. 26136 IN RRSIG NS 8 0 518400 20220822170000 20220809160000 20826 . flMR2MvFJojKfO7Ys2tb+EWssvMaUJSejiq3L4Y5Sy288C16sZOrhvdF GWY0+gtUl5C40kBOnkVdqYDgBfSjbpmPn+NWbkJZMTqwjmz2QN+28MsR bplhWzeu8HghrmCLZkBrmLoTe1e1DXNWbvQbQfmLkBSQmwqntw1QX3GZ UleYopxJngO6hdEQOXbsrcdnfSVUOgxf4wwbS9XZugf1xpaSbKIPCBFd 1xGKWFr9MbcIzogRIDKH4RPZAt8aRIkwZMh53ZaZ2Zo+9n63MdI5N6zB WJlNpJqmFxhE8wVEegqm++2lQSsqYmDikIpCxn/+P8FXDr+vc6erU9jT OP9sFw==
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms

com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 86400 IN DS 30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com. 86400 IN RRSIG DS 8 1 86400 20220823050000 20220810040000 20826 . clI/vPSk2t2LVd1WtfHI8VklYtaUPAOK/8Sr30o0VjGp7xZXZ5EhlVRz YarbopAZ+8yOIwwbIl82ByVZFf/qJEprOKW6TiG8goIoPBG6jghCoglV p4IqNUVqqwpJpAxNKbOA0cOeOr6qTwqugtJnU5J7TGG4QBi63KjYBoin gyYxaKkV/fvK9njoqxbn2hzaKB08mLlYj/9TKxS285UaxbMxYfPDJejQ FA+33Y+KjJlGhdYrCFy/o/JW+YKrfmLrMs2C3+6XGUFFDGSN9WUCF59j LMuQ0ZVAmZQkodwaKM0L1dojQtJU73fEnRkm/1ZhhiDYkQv/HPFHM3ur f3bWww==
;; Received 1174 bytes from 2001:500:2d::d#53(d.root-servers.net) in 28 ms

google.com. 172800 IN NS ns2.google.com.
google.com. 172800 IN NS ns1.google.com.
google.com. 172800 IN NS ns3.google.com.
google.com. 172800 IN NS ns4.google.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q2D6NI4I7EQH8NA30NS61O48UL8G5 NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20220815042401 20220808031401 32298 com. If06bSVXL7llnV+iyjrWR5yStSfeLZzeCKgsfVsqNQKKnP35dCsaGffz dRWOIGK+WzMKxVCiJZLiNyG5iYR9RybjH9jXMhDyYqro3M8eplcZtHnd DN0XXqhP/UDjMThDkJHxFERmmzraaU1wQLMcse/uOLMzmZVmOalWkbC+ TtXIfd8f4KB+h/M6X23C9UZ7oyYI78gqwS3Rq0fKInDxXA==
S84BKCIBC38P58340AKVNFN5KR9O59QC.com. 86400 IN NSEC3 1 1 0 - S84BUO64GQCVN69RJFUO6LVC7FSLUNJ5 NS DS RRSIG
S84BKCIBC38P58340AKVNFN5KR9O59QC.com. 86400 IN RRSIG NSEC3 8 2 86400 20220816051414 20220809040414 32298 com. BI5XFgxbzwBugDjaV04ygejNaUOMRGmTaJvdufqfRnJPaDEHLnqVpw7p 8UdjQnXLbtO4Fns2BpPOTD9DSSaWGRjxeZxlb6Rwxw1n4RGmJGe9QyaI f/zBXzn69uRXpgeRP6FFBmUuFCb7OQTBqoReLat+3fwKkebSv7epenW1 SvO4dXilsSZzTAUN8RvIdz9SgkBe+QxG8TiioAFYuTqdkw==
;; Received 840 bytes from 192.5.6.30#53(a.gtld-servers.net) in 13 ms

www.google.com. 300 IN A 142.251.32.4
;; Received 59 bytes from 216.239.34.10#53(ns2.google.com) in 22 ms

[22.05-RELEASE][admin@sg4860.local.lan]/root:
Notice the times to get answers from the different NS in the chain - they are only a few ms..
Here with almost 80k queries my avg is less than 0.1 of a second.
total.recursion.time.avg=0.088754
total.recursion.time.median=0.0410133
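To compare hop timings without scrolling through the whole trace, the `;; Received ... in N ms` lines can be pulled out on their own. A minimal sketch, assuming dig prints those lines in the layout shown in the trace above; the function name `trace_hop_times` is made up for illustration.

```shell
#!/bin/sh
# Print "<ms> ms <server>" for every hop in `dig +trace` output read from stdin.
trace_hop_times() {
    awk '
        # Layout: ;; Received <bytes> bytes from <addr>#53(<name>) in <ms> ms
        $1 == ";;" && $2 == "Received" {
            print $8, "ms", $6
        }
    '
}

# Usage on the pfSense box:
# dig www.google.com +trace | trace_hop_times
```

A healthy trace shows a few tens of milliseconds per hop, as in the example above. A hop that takes hundreds of milliseconds, or a trace that stalls before printing the next `Received` line, points at the level of the chain (root, TLD, or the domain's own name servers) that is slow or unreachable.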