Major DNS Bug 23.01 with Quad9 on SSL
-
Mmm, I assume you are seeing the same intermittent behaviour as other users? It's not failing for every query with that configuration?
-
@jimp I just re-checked that single setting. DNS appears to continue to work. Curious to see whether it starts to fail again at some point.
-
@isotope1842 said in Major DNS Bug 23.01 with Quad9 on SSL:
@jimp I just re-checked that single setting. DNS appears to continue to work. Curious to see whether it starts to fail again at some point.
Since you hit it once it's likely to fail again at some point, but nobody has yet been able to pinpoint exactly when/why it happens.
I've been periodically checking my lab systems and they all just keep resolving no matter what I do. But they are lab systems so the load is considerably lower than it would be in a live environment.
-
@isotope1842 There are a few threads on this topic, or variations thereof, and in another one someone posted their problem seemed likely to happen when opening a group/folder of bookmarks/favorites at once...implying a higher number of simultaneous requests might trigger it.
I was also unable to replicate my issue by simply (re)checking the DNSSEC option, but I left it off as recommended.
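For anyone who wants to test the "many simultaneous requests" theory instead of waiting for it to happen, here is a minimal sketch, not a proven reproducer. It fires a batch of lookups at once through the system resolver (which is assumed to point at the pfSense box) and reports which names failed and how long each took. The function names and the `resolver` parameter are made up for illustration; the parameter only exists so the lookup function can be swapped out.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def resolve_once(hostname, resolver=socket.getaddrinfo):
    """Resolve one name; return (hostname, ok, seconds_taken)."""
    start = time.monotonic()
    try:
        resolver(hostname, 443)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return hostname, ok, time.monotonic() - start


def burst_lookup(hostnames, workers=32, resolver=socket.getaddrinfo):
    """Submit all lookups at roughly the same time; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda h: resolve_once(h, resolver), hostnames))


def failed_names(results):
    """Return the hostnames whose lookup raised an error."""
    return [name for name, ok, _ in results if not ok]
```

Calling `failed_names(burst_lookup(names))` with a few dozen distinct hostnames roughly mimics opening a bookmark folder. Names already cached by the client or by Unbound never reach the forwarder, so use names that have not been looked up recently.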
-
After playing around a lot more - it might be an issue with Quad9's TLS DNS limiting responses more than anything.
While there is nothing helpful in the pfsense logs, Quad9 just appears to stop replying and then start responding again - almost as if there is a limit being imposed by Quad9 on requests - but of course pfsense must have some role as it never occurred before 23.01
This network is a very high traffic network, so maybe others that see the same thing manage high traffic networks as well - either way the only long term solution has been doing TLS DNS through Cloudflare
As another point - DHCP lease registrations is definitely not fixed as claimed in 23.01, unbound still likes to reboot too much to consider enabling it - as such at the most I am still only registering the clients that need it via static mapping.
-
Hmm, that would explain why we haven't been able to replicate it in test setups without the load a production box has.
-
There were memory leaks and a segfault that were fixed. There is still ongoing work to eliminate the unbound reloads associated with lease registration events. I'm hopeful it will make it in 23.05.
-
@cmcdonald said in Major DNS Bug 23.01 with Quad9 on SSL:
There were memory leaks and a segfault that were fixed. There is still ongoing work to eliminate the unbound reloads associated with lease registration events. I'm hopeful it will make it in 23.05.
That is great to hear! Hopefully that is finally solved.
-
@nononono said in Major DNS Bug 23.01 with Quad9 on SSL:
DHCP lease registrations is definitely not fixed as claimed in 23.01
Where is that stated? What it says is a "crash" was fixed when unbound restarting
https://redmine.pfsense.org/issues/11316?#note-79
Where does it say that unbound is not going to restart on DHCP registrations? Unbound restarting every time there is something going on with DHCP is OK for some users, but if you have clients renewing every few minutes then you're still going to have issues with unbound and DHCP - even if they fixed some crash that could happen.
-
@johnpoz said in Major DNS Bug 23.01 with Quad9 on SSL:
What it says is a "crash" was fixed when unbound restarting
It may be referring to https://docs.netgate.com/pfsense/en/latest/releases/23-01.html "A long-standing difficult-to-reproduce crash in Unbound...It is now safe again to enable DHCP registration alongside Unbound Python mode in pfBlockerNG." ...which is talking about pfBlocker vs DHCP, not that DHCP registration won't restart Unbound anymore. (I've seen others post about this sentence as well...)
-
@steveits said in Major DNS Bug 23.01 with Quad9 on SSL:
someone posted their problem seemed likely to happen when opening a group/folder of bookmarks/favorites at once...implying a higher number of simultaneous requests might trigger it.
Playing around with various settings, and I don't know if this has anything to do with the quoted comment, but I raised incoming and outgoing tcp buffers in resolver adv settings from default 10 to 20. I believe (or imagine) that it has reduced the frequency of unbound 'hangs' on my system.
-
Just thought I would comment on this as I am seeing the same symptoms. It only started happening recently. I thought it was load related, but not much has changed on my end except upgrading pfSense and associated packages.
I have turned this parameter off for now, as I need the stability: Use SSL/TLS for outgoing DNS Queries to Forwarding Servers.
Summary:
- Using Cloudflare SSL/TLS on 23.01 (ipv4 / ipv6 resolvers)
- Build: 23.01-RELEASE (amd64) built on Fri Feb 10 20:06:33 UTC 2023
- System Patches package with recommended fixes applied including a manual patch for redmine #13851
- Machine Core(TM) i5-8365U, 16GB Ram, 128GB SSD
- Internet Connection 100/20
- Running a number of wireguard gateways (3 servers, both ipv4 and 6)
- Running a number of packages including...
According to PFBlockerNG graph, my dns reply stats are around 10-20K per hour during the busiest times.
Over the last week or two, Uptime Kuma randomly alerted me to DNS issues at the same times my family complained they couldn't access any websites or their game would fail, stating there was no internet. This would happen randomly and would take a couple of minutes to clear up again.
Fortunately I had graylog setup and unbound in debug mode for the most recent errors above.
Looking at the logs I note the following messages around the time where DNS briefly cut out...
Looks like pfSense is not processing DNS requests quickly enough for me. The sample above occurred whilst the internet link wasn't even loaded, and neither was my machine.
I have turned this setting off and will monitor... Use SSL/TLS for outgoing DNS Queries to Forwarding Servers
I haven't done any further debugging and I hope this helps someone out, if not I apologise for this long post.
-
@joedan said in Major DNS Bug 23.01 with Quad9 on SSL:
10-20K per hour during the busiest times.
In your home? how many devices / users - you have a lot of shit banging its head trying to resolve stuff you're blocking.. That seems really high for a home..
That is what, like 240K to 480K queries in 24 hours.. I see like 50k in 24 hours, and that is with 28 active devices doing DNS.. Many of them banging their heads looking for shit that is blocked.
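To check the arithmetic above: a small sketch, assuming the quoted 10-20K per hour peak rate were sustained for all 24 hours. It almost certainly isn't, so the daily figures are an upper bound. The function name is made up for illustration.

```python
HOURS_PER_DAY = 24
SECONDS_PER_HOUR = 3600


def per_day(queries_per_hour):
    """Scale an hourly query count to a full day at a constant rate."""
    return queries_per_hour * HOURS_PER_DAY


low, high = per_day(10_000), per_day(20_000)
print(low, high)  # 240000 480000

# The same peak rates expressed as queries per second.
print(10_000 / SECONDS_PER_HOUR, 20_000 / SECONDS_PER_HOUR)
```

Even the high end works out to only a handful of queries per second on average, so if bursts are the trigger they would have to be much denser than the hourly average suggests.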
-
I also had occasional dns failures when using quad9 dns. I simply turned off forwarding mode and have had no issues with root servers doing the work.
I find this to be non-ideal, but at least functional.
-
Just reporting in here that I have the same issues with 23.01 and Quad9 DNS when using the Unbound DNS Resolver.
My settings on 22.05 were as follows. I had run them since 22.05 was released with no internet or DNS resolution issues.
- pfSense Unbound DNS Resolver enabled
- Using forwarding to Quad9 DNS servers
- DNSSEC unticked
- Use SSL/TLS for outgoing DNS Queries to Forwarding Servers TICKED
Upgraded to 23.01 this past weekend (2nd April 2023). No other settings have changed on the LAN network and no other firmware has been upgraded on any devices on the network, so I am confident that the only change was the upgrade of pfSense from 22.05 to 23.01.
Everything worked fine yesterday (3rd April) until this morning, when I started getting multiple DNS resolution issues and could not reach websites.
I have now changed the following single setting.
- Use SSL/TLS for outgoing DNS Queries to Forwarding Servers UNTICKED
Websites are now resolving correctly.
I will continue to monitor DNS resolution over the coming days to see if the issue reappears and report back.
-
@johnpoz said in Major DNS Bug 23.01 with Quad9 on SSL:
In your home? how many devices / users - you have a lot of shit banging its head trying to resolve stuff you're blocking.. That seems really high for a home..
Yes I have around 35 devices, a few that are quite bad including a Swann Security DVR that is taking up 25% of that DNS traffic calling out dropbox domains even though I don't have that function setup.
Just wanted to loop back and advise that I did resolve my issue. However, it involved going back about 6 backup iterations and manually reapplying all my config changes one at a time. I am now back on DNS over TLS (Cloudflare) with 23.01 and all recommended patches applied. I also moved ISPs during that time of panic, so unfortunately I have nothing further to contribute here: I was under pressure to get things working again quickly and made significant changes in a short time.
I was never using Quad9 though which appears to be an ongoing issue with other posters.
-
@joedan said in Major DNS Bug 23.01 with Quad9 on SSL:
a few that are quite bad including a Swann Security DVR that is taking up 25% of that DNS traffic calling out dropbox domains even though I don't have that function setup
Introduction : read this first.
If a device asks every x seconds to resolve an identical host name, then, at first, your uplink isn't used: pfSense will just serve the same info (edit: from the local DNS cache, as long as the TTL permits) to the device again, and again. You're just burning some watt hours at your place.
But if the TTL is short, like mere minutes or less, then, yeah, the WAN gate is opened, and you start to activate a whole lot more software/hardware on both sides.
The thing is, as I said in the other thread: if you were looking at the WAN gate of Quad9 (because, let's imagine, you work over there), what would you do if you saw that some IP was clearly misbehaving? => You throttle?! Because "just scale the system up once more" without any cash flow in return will kill your job.
So, on your home site: your entire DNS just got delayed. And that's of course a 'pfSense DNS bug'.
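A toy sketch of the caching behaviour described above, to make the TTL point concrete. This is not how Unbound is implemented; `TtlCache`, `upstream` and the injectable `clock` are invented for illustration, and the upstream function is assumed to return an `(answer, ttl_seconds)` pair.

```python
import time


class TtlCache:
    """Toy DNS cache: answers repeat queries locally until the TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (answer, expires_at)
        self.upstream_queries = 0  # how often the forwarder was actually asked

    def resolve(self, name, upstream):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: served locally, WAN untouched
        answer, ttl = upstream(name)  # expired or unknown: ask the forwarder
        self.upstream_queries += 1
        self._entries[name] = (answer, now + ttl)
        return answer
```

With a 300 second TTL, a device asking every 10 seconds costs the forwarder one query per 5 minutes. With a 10 second TTL the same device costs it roughly one query per request, which is the kind of pattern an upstream provider might notice.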
-
I agree this could be Quad9 throttling responses. But still it would have been the same in 22.05.
-
@skogs said in Major DNS Bug 23.01 with Quad9 on SSL:
I find this to be non-ideal, but at least functional.
Not wanting to start a big baseline debate, but why is letting a resolver do the resolving non-ideal? Letting your box do DNS resolving on its own, without any upstream forwarder that can be censored, is a much more resilient setup (as we see) than having a single point of failure like a big DNS forwarder that throttles or censors DNS lookups (or, in the case of Quad9, is legally forced to do so). Just to try to hide your DNS queries from your ISP?
Just asking because I'm interested.
-
@jegr said in Major DNS Bug 23.01 with Quad9 on SSL:
Just to try to hide your DNS queries from your ISP?
hahah - which isn't really happening anyway. So I do a dns query for www.amazon.com, which I want to hide from my isp ;)
So my ISP doesn't see the DNS query, but they see me go to IP address 1.2.3.4, which pretty obviously is amazon.com - and here is the big one.. They still see what you are asking for - it's just not via DNS..
Until such time that ESNI (which is dead - long live ECH) is everywhere, you are just sending your SNI in the clear in your handshake - so who exactly are you hiding your DNS query from? And for what purpose?