Slow DNS after 22.05
-
During my morning routine of going to my different sites, reading the morning news, viewing threads and looking for stuff, I didn't really have any major issues in general. But I did run into something odd when trying to go to support.apple.com
This is a horrible dns setup if you ask me..
;; QUESTION SECTION:
;support.apple.com. IN A

;; ANSWER SECTION:
support.apple.com. 3589 IN CNAME prod-support.apple-support.akadns.net.
prod-support.apple-support.akadns.net. 3589 IN CNAME support-china.apple-support.akadns.net.
support-china.apple-support.akadns.net. 3589 IN CNAME support.apple.com.edgekey.net.
support.apple.com.edgekey.net. 21589 IN CNAME e2063.e9.akamaiedge.net.
e2063.e9.akamaiedge.net. 3589 IN A 23.218.161.64
that is a lot of cnames in the chain..
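To make the length of that chain concrete, here is a rough Python sketch that walks the CNAME records from the answer section above until it reaches the A record. The function name cname_chain and the simple whitespace parsing are my own assumptions for illustration; it only handles plain "owner ttl class type rdata" lines like the ones dig printed here.

```python
def cname_chain(answer_lines, qname):
    """Follow CNAME records from qname until an A record (or a dead end)."""
    cnames = {}
    addresses = {}
    for line in answer_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        owner, _ttl, _cls, rtype, rdata = fields[:5]
        if rtype == "CNAME":
            cnames[owner] = rdata
        elif rtype == "A":
            addresses.setdefault(owner, []).append(rdata)

    chain = [qname]
    seen = {qname}
    while chain[-1] in cnames:
        target = cnames[chain[-1]]
        if target in seen:
            break  # guard against CNAME loops
        chain.append(target)
        seen.add(target)
    return chain, addresses.get(chain[-1], [])


# Answer section copied from the dig output in this post.
ANSWER = """\
support.apple.com. 3589 IN CNAME prod-support.apple-support.akadns.net.
prod-support.apple-support.akadns.net. 3589 IN CNAME support-china.apple-support.akadns.net.
support-china.apple-support.akadns.net. 3589 IN CNAME support.apple.com.edgekey.net.
support.apple.com.edgekey.net. 21589 IN CNAME e2063.e9.akamaiedge.net.
e2063.e9.akamaiedge.net. 3589 IN A 23.218.161.64
""".splitlines()

chain, ips = cname_chain(ANSWER, "support.apple.com.")
print(len(chain) - 1, "CNAME hops ->", ips)
```

Each hop is one more name a resolver starting from a cold cache may have to chase before it can hand back an address.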
And a +trace was showing me a failure to get addresses for the apple.com NS:
couldn't get address for 'a.ns.apple.com': not found
couldn't get address for 'b.ns.apple.com': not found
couldn't get address for 'c.ns.apple.com': not found
couldn't get address for 'd.ns.apple.com': not found
dig: couldn't get address for 'a.ns.apple.com': no more
[22.05-RELEASE][admin@sg4860.local.lan]/root:
And i was getting a SERVFAIL
$ dig support.apple.com @192.168.9.253

; <<>> DiG 9.16.30 <<>> support.apple.com @192.168.9.253
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 61133
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;support.apple.com. IN A

;; Query time: 1 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Thu Aug 11 07:15:19 Central Daylight Time 2022
;; MSG SIZE rcvd: 46
I flipped back on the do-ip6: no setting - and no issues with that site..
Was this some issue with unbound and IPv6? I don't think so - because the dig +trace would have zero to do with unbound..
And doing another +trace it seems to be working fine - and notice those were obtained via IPv6 transport
apple.com. 172800 IN NS a.ns.apple.com.
apple.com. 172800 IN NS b.ns.apple.com.
apple.com. 172800 IN NS c.ns.apple.com.
apple.com. 172800 IN NS d.ns.apple.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q2D6NI4I7EQH8NA30NS61O48UL8G5 NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20220815042401 20220808031401 32298 com. If06bSVXL7llnV+iyjrWR5yStSfeLZzeCKgsfVsqNQKKnP35dCsaGffz dRWOIGK+WzMKxVCiJZLiNyG5iYR9RybjH9jXMhDyYqro3M8eplcZtHnd DN0XXqhP/UDjMThDkJHxFERmmzraaU1wQLMcse/uOLMzmZVmOalWkbC+ TtXIfd8f4KB+h/M6X23C9UZ7oyYI78gqwS3Rq0fKInDxXA==
S0NU5UHI013IVS6TTUSM4FC3OQJB2EJF.com. 86400 IN NSEC3 1 1 0 - S0NU9MFCV1Q1NSBMAPQMU0H7MHN47JDT NS DS RRSIG
S0NU5UHI013IVS6TTUSM4FC3OQJB2EJF.com. 86400 IN RRSIG NSEC3 8 2 86400 20220818054103 20220811043103 32298 com. S3RZuwhflaFxPtxa1gjp5aRyT31BpAfWm9hdOXwx+JSGmHTyGfDaSCMa coQlorU8pjmvpGdQ4SnQKWkj81DdjIo7aCu9BeKRGG7wDkulttlovYcp vrg2qmHQZIEIP5pdjk8Ht15AyzZIi9vxcx6tg51WzkMV2SApd/AUFMzM 13iULU+d5+tll6yK6iGyD9ZT17SOTGOsZXFroo0CehBOCw==
;; Received 838 bytes from 2001:502:1ca1::30#53(e.gtld-servers.net) in 34 ms
What does this tell us - not sure as of yet. But clearly I had an issue with a +trace pulling info, and this was just for the 1st cname in the long chain required to get to the actual IP for support.apple.com
And unbound is not involved in doing a dig +trace, other than supplying the roots when the client asks loopback.
I will leave do-ip6: no set for now and see if I run into any more issues, and then I will flip it back off to allow ipv6 transport and see if I see problems again.
My advice for anyone currently having issues with dns is to set do-ip6: no for unbound - does this make it better?
-
Is the unbound ipv6 setting "do-ip6" or "do-ipv6"?
I've seen in these forums that some say "do-ipv6" and others "do-ip6", and google search results mostly show "do-ip6"...
-
Maybe both work but I have
-
@mvikman it is
My bad if I was saying do-ipv6, I guess it's habit, since ipv6 is the common representation of it.
And seems @Gertjan has it listed as do-ipv6 in his post.
https://nlnetlabs.nl/documentation/unbound/unbound.conf/
I will go through my posts and make sure I fix any of my typos - great catch
-
Nice catch !
You should not automatically believe what people propose here, me included.
After all, this is unbound land, not really related to pfSense. Settings, whether mentioned in the pfSense doc or not, could influence your security, so you should do the new thing : fact checking for yourself.
For example : the FreeBSD official man pages : unbound.conf
Or go to the source : the doc from the author : https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html
Real dingos would check the code and compile from source.
The image I've shown above isn't dangerous. unbound would simply bail out on the syntax error.
So it is
server:
do-ip6: no
and check that
edit !
@johnpoz said in Slow DNS after 22.05:
And seems @Gertjan has it listed as do-ipv6 in his post.
Yeah, it was looking just fine, wasn't it ?
I was cutting corners, wanted to do it fast, and didn't really test (apply) it.
As it looks so simple and innocent.
I've tested it now.
This is what I got :
and yes, if parameters are not recognized at the pre config test, then it will show a message.
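To illustrate the kind of typo that pre-check catches, here is a small Python sketch that scans a block of custom options for option names it does not know and suggests the closest known one. It is not how unbound does it: the real validation is unbound's own unbound-checkconf tool. KNOWN_OPTIONS is a tiny hand-picked subset of real option names and find_unknown_options is a made-up helper, both assumptions for the example.

```python
import difflib

# Tiny illustrative subset of unbound.conf option names; the real list is far longer.
KNOWN_OPTIONS = ["do-ip4", "do-ip6", "prefer-ip6", "prefetch", "serve-expired"]


def find_unknown_options(custom_options):
    """Return (line number, option name, closest known name or None) for unknown options."""
    problems = []
    for lineno, raw in enumerate(custom_options.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key = line.split(":", 1)[0].strip()
        if key == "server" or key in KNOWN_OPTIONS:
            continue  # clause header or recognised option
        close = difflib.get_close_matches(key, KNOWN_OPTIONS, n=1)
        problems.append((lineno, key, close[0] if close else None))
    return problems


print(find_unknown_options("server:\ndo-ipv6: no\n"))
print(find_unknown_options("server:\ndo-ip6: no\n"))
```

The first call flags do-ipv6 on line 2 and suggests do-ip6; the second returns an empty list.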
pfSense uses the unbound config test tools to check first. Nice.
-
@gertjan yeah it would scream at you if something was wrong in your config options text box.
I think I have edited all my mistakes of using the wrong setting name in the actual post text.
As a check - if you look at the resolver status item in the Status menu and you see IPv6 in the cache, then the setting is not there. See my examples above showing the IPv6 address in the resolver status..
Now that I have put it back to no, no IPv6 is listed there.. And I have yet to see any issues in the browser with sites, etc. But then again, I only found that 1 issue when I was allowing use of my HE tunnel.. But different users go to different sites, etc. So if there is an issue with transport over ipv6, it might be so pronounced for some users that to them dns is just borked, or for others it could be 1 out of 100 sites or something, and a simple refresh of the page and it works so they don't even notice it, etc.
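If you want to do that check on a list of addresses rather than by eye, something like this Python sketch would do. The ipv6_servers helper is made up for the example, and the sample list just reuses addresses that appear earlier in this thread, so treat it as an illustration rather than a real export of the resolver status page.

```python
import ipaddress


def ipv6_servers(addresses):
    """Return the entries that parse as IPv6 addresses."""
    found = []
    for text in addresses:
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            continue  # not an IP address at all, ignore it
        if addr.version == 6:
            found.append(text)
    return found


# Sample values taken from earlier posts in this thread.
sample = ["192.168.9.253", "2001:502:1ca1::30", "23.218.161.64"]
print(ipv6_servers(sample))
```

An empty result would match what I see with do-ip6: no set; any entry in the result means unbound is still talking to name servers over IPv6.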
My recommended settings for resolving would be to
turn on prefetch
serve 0
These can help reduce any sort of first query delay when a resolver has to start from scratch and work all the way down from roots. Should they be required? No. Are they default? I don't think so. But prefetch allows something that is close to expiring in the cache to get a refresh before someone asks for it, instead of the cache being completely empty and the name having to be resolved from scratch again.
Same thing with serve 0 ttl - this allows a client asking for whatever.something.com to get the last entry that was cached for it, and then in the background unbound will resolve it again.
This can also reduce any sort of delay with initial resolve all the way from roots.
Keep in mind even a full resolve all the way from roots shouldn't take very long - but these settings should help if for whatever reason there are delays in a full resolve: long cname chains, connectivity issues with specific NS in the path to get to the specific fqdn, etc.
I have been using these for years, and have not seen any problems with them.
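For anyone who wants the two settings spelled out, here is a toy Python model of what happens to a lookup for a name that is already in the cache. It is a simplification and not unbound's actual code: the cache_decision function is hypothetical, and treating the last 10 percent of the TTL as the prefetch window is an assumption for the sketch.

```python
def cache_decision(age, ttl, prefetch=True, serve_expired=True):
    """Classify a lookup for a cached record that is `age` seconds old."""
    remaining = ttl - age
    if remaining > 0:
        # Assumed prefetch window: the last 10 percent of the original TTL.
        if prefetch and remaining <= ttl * 0.10:
            return "answer from cache, refresh in background"
        return "answer from cache"
    if serve_expired:
        return "answer stale entry with TTL 0, refresh in background"
    return "full resolve before answering"


for age in (100, 3500, 4000):
    print(age, "->", cache_decision(age, ttl=3600))
print(4000, "->", cache_decision(4000, ttl=3600, serve_expired=False))
```

Only the last case makes the client wait for a full resolve from the roots, which is the delay these two settings are meant to hide.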
-
Isn't disabling IPv6 in unbound just masking the problem? It may be a great work-around, but it seems a little short-sighted.
This behavior changed between 22.01 and 22.05 due to unbound itself changing. Since pfSense defaults to it instead of BIND, that dependency is clear and the problem needs to be escalated properly for review.
Yes, we're in an IPv6 transition period and yes, today, you MIGHT be able to disable it and walk away, but I would consider it a bit of a head-in-the-sand response.
What is the proper channel for escalating observations to the unbound team?
-
@lohphat said in Slow DNS after 22.05:
What is the proper channel for escalating observations to the unbound team?
Hello!
Isn't the fix for the do-ip6 workaround (as derived from https://github.com/NLnetLabs/unbound/issues/670) already in the pfSense development snapshots?
John
-
@lohphat said in Slow DNS after 22.05:
Isn't disabling IPv6 in unbound just masking the problem
See the "670" link below.
Short answer : yes, of course, it's a sledge hammer solution.
But read the bug thread over at unbound : it looks like buffers fill up to the max, and outstanding requests get aborted. Reading the proposed patch also makes me think the code wasn't resilient enough. The issue might even be a bad DNS record / bad DNS setup on the DNS server side. And the result is : unbound bails out.
The "670" thread mentioned other variables you can set to bigger values so cache memory becomes bigger.
IPv6 isn't always well implemented. Or it's less tested. And there is more to test.
@serbus said in Slow DNS after 22.05:
Isn't the fix for the do-ip6 workaround (as derived from https://github.com/NLnetLabs/unbound/issues/670) already in the pfSense development snapshots?
If that unbound version can be back ported to 12.3 or whatever FreeBSD it has to use.
Netgate isn't using the latest and greatest, as the latest also has less-known bugs ^^
@johnpoz said in Slow DNS after 22.05:
I only found that 1 issue when I was allowing use of my HE tunnel..
In the past, I had way more issues using he.net.
The web site of my own ISP, Netflix, and yes, Apple (the cloud stuff) : when I was using IPv6 (using he.net) pages wouldn't load, stalled, or CSS was broken.
So I used pfBlockerng-devel with the No-AAAA option, so I could list sites that I wanted to talk to using IPv4 only.
That's now over :
My Netflix works well over the he.net ipv6 tunnel. More and more sites are IPv6 without any IPv4 pollution.
And remember : the IPv6 of he.net can be considered as a VPN access.
These are the access points : https://tunnelbroker.net/status.php. I took the closest to me of course, but I could take a tunnel server in the states, and then I would have a US IPv6 ?
No need to recall : some sites don't like to be accessed using VPN !? :)
@johnpoz said in Slow DNS after 22.05:
turn on prefetch
serve 0
Yeah, reading the description made me think : these are too good to be true !
I've checked these from day 1.
-
@johnpoz said in Slow DNS after 22.05:
My recommended settings for resolving would be to
turn on prefetch
serve 0
Would you recommend these instead of 'do-ip6: no' or as well as?
Happy to play around with settings and see what impact it has.
Strangely lost internet last night and it didn't come back on its own. I had to bounce pfSense and all my other networking gear to get it back for some reason. I did notice the dpinger gateway monitoring service had died.
Suspect all my kit was overheating because it's pretty hot here in the UK at the moment and my makeshift comms cabinet has limited airflow. One of the things on my very long list to sort out! For now I've strung a load of computer fans together and chucked them in there, which seems to be doing the job
-
@gertjan This is the reply I was hoping for. Thank you.
It's clear there IS a problem with unbound running out of resources and thus is affecting 22.05. My hunch that all (or close to all of) the reports of broken DNS had IPv6 enabled as a common symptom has been corroborated.
The fact that there's been a significant and open unbound bug since April, and that this known problem wasn't somehow included as a "known issue" in the pfSense release notes, is of minor concern. We all know there are open issues in all products in our modern software driven world. But we shouldn't have to make upgrade decisions blindly.
Could this be a wake-up call that upon pfSense releases there's an inclusion of known open issues from the upstream BSD or component (e.g. unbound) bug trackers, so that we can make a better informed decision to apply the update or not?
-
Okay all,
After all the comments and useful information provided, we can come to the conclusion that the essence lies within IPv6, DNS (unbound) and 22.05.
With that in mind I reversed my previous temporary solution of enabling query forwarding (essentially forwarding all my requests to an upstream provider). Essentially back to basics.
With a lot of settings, changes, and some headache involved because my provider isn't that informative when it comes to having your own firewall/router, I configured IPv6 to function completely. Testing this with different websites like https://ipv6-test.com/ and https://test-ipv6.com/ I could confirm that my configuration was a success.
I can fully confirm that with a non-working IPv6 configuration, or a provider that doesn't support it, you should look at eliminating IPv6 from your current config as suggested in this forum post.
Ok... I must say I have only been running for an hour... but all seems fine now.
So my suggestion would be to check if your provider supports IPv6; if so, check your settings and use the test websites to see if you're resolving ok and your config is as expected.... if yes, you are probably in the clear and smiling; if not... then:
Option 1: start all over again with your IPv6 config as I did several times (TEST TEST TEST)
Option 2: just follow the instructions for disabling IPv6 in the resolver and wait for your provider to fully support IPv6 as they should.
Basically: standard pfSense configuration with a good ISP IPv6 config.
-
@johnpoz said in Slow DNS after 22.05:
@pcol-it-admin said in Slow DNS after 22.05:
said that they had "stock" pfSense DNS resolver settings
I find this is rarely the case to be honest..
I run pfSense in a proxmox VM on a Dell workstation. These are my DNS settings, which exhibit the issue.
As previously indicated, I believe the only two "non-stock" options are the ones related to dhcp.
My problem is 100% reproducible. If I use the pfSense resolver, after a random period of time I start to get NXDOMAIN errors in Chrome for common websites. Hitting refresh/reload a couple of times will clear the error and the page will load. It's not because I have fat fingers and am typing facebook.c0m rather than facebook.com.
This is less of an issue for me as I simply spun up a pi-hole lxc on my proxmox server and redirected all my dns inquiries to the pi-hole (which has no issues w/ resolution), but obviously not everyone has this option.
-
@tentpiglet
Do some more testing with this option removed :
as it is perfectly normal to see NXDOMAIN popping up once in a while : unbound is restarting because of DHCP lease activity.
Add some DHCP MAC static leases for devices that you always want to have the same IP, like printers, servers, NAS etc.
-
@gertjan
My tests and conclusions, with all your and others' help, made IPv6 the conclusive problem of this build when you have not correctly configured IPv6.
Ruining your setup with other settings may make reverting back a lot harder for some.
-
@tentpiglet
Did you read my post?
Can you verify that your IPv6 setup is correct?
You can check in advance by forwarding all your dns requests in the resolver to your provider's dns servers.
When you have a working IPv6 connection you can probably revert to the basic configuration.
If not…. then just use the do-ip6: no option in the resolver
-
@lohphat said in Slow DNS after 22.05:
The fact that there's been a significant and open unbound bug since April
.....
...upon pfSense releases that there's an inclusion of known open issues
I can only say : it looks like what's described there.
For me : it's an OpenBSD thing. If it was an 'any' BSD bug, then why specify OpenBSD ?
The bug was also closed back in April 2022.
Also : I'm using IPv6, and do not have any issues whatsoever.
What should Netgate have to do : list every closed bug from an external package in the past as "maybe not solved yet" ? That would be thousands of entries.
I saw you posted to the bug report @unbound.
You should do what has been asked many times over there : you should add complete (very detailed) unbound logs, so the author can see what's up and confirm what happened.
Right now, they (the author) will say : use the unbound version with the merged solution included, and that's not possible right now.
All this IMHO of course.
-
@mihaifpopa said in Slow DNS after 22.05:
Anyone else experiencing this?
This has been an amazing post... I got my issues fixed with the contributions of everyone, and in that process I got to learn how to debug dns unbound issues and get IPv6 working in my lab.
@Gertjan's contributions have been great - made me want to start looking at Server Monitoring with Munin.
-
Can you verify that your IPv6 setup is correct?
ip6 is functioning. My wan has a 2001: address and clients on my network have a 2601: address. I can ping 2001:4860:4860::8888 from any of my network clients.
-
@tentpiglet did you try the tests on the suggested websites as well…. This will sometimes give you a bit more insight.