Random DNS Resolver failure with Quad9 over SSL
-
@bmeeks said in Random DNS Resolver failure with Quad9 over SSL:
more current than the daemon running in
Your post rang a bell...there was actually this, but it was fixed in 2.7.0/23.09:
https://redmine.pfsense.org/issues/14056
FWIW I've been using SSL at home to Quad9 without noticeable issues on 24.03 and now 24.11.
-
@SteveITS said in Random DNS Resolver failure with Quad9 over SSL:
@bmeeks said in Random DNS Resolver failure with Quad9 over SSL:
more current than the daemon running in
Your post rang a bell...there was actually this, but it was fixed in 2.7.0/23.09:
https://redmine.pfsense.org/issues/14056
FWIW I've been using SSL at home to Quad9 without noticeable issues on 24.03 and now 24.11.
Yeah, I didn't go back and research the history, but I do recall some early bugs in the DNS Resolver that got fixed with later releases. My comment was intended to remind folks there are multiple versions of some of these "problematic" binaries out there (unbound and kea), and something that works just fine on a recent 24.11 Plus installation may indeed behave differently on an older version.
-
@bmeeks said in Random DNS Resolver failure with Quad9 over SSL:
got fixed with later releases
Wait ... no one here is using older, buggy versions, right?
I presume @digitalgimpus uses 2.7.2. I've been using 2.7.2 for half a year or so, and did some Quad9 testing. It worked just fine IIRC, and I'm using pfSense with a load of hotel clients behind it, so if there was an issue they would have told me about it. After all, if a free service doesn't work, that would be considered inadmissible (here in Europe), and they would have asked for a refund immediately.
-
So here's how I have things configured, I don't think there's anything particularly unique going on here.
I've tried just the IPv4 hosts, I've tried IPv6 only, I've tried both.
-
Unbound does time out from time to time. Use Service Watchdog to restart it and you will not see any more glitches.
-
@nattygreg I have watchdog running. It sees the process as still up when this happens.
There are other times where unbound crashes or whatever and watchdog restarts successfully. If unbound was crashing it would be less problematic. The problem here is a prolonged dns outage because watchdog doesn't know to do anything.
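One way to close that gap would be a check that asks the resolver an actual question instead of looking at the process table. A minimal sketch, assuming unbound listens on 127.0.0.1 port 53 and that Python 3 is available on the box; the function names, the test host name and the 3 second timeout are my own illustrative choices, not anything that ships with pfSense or Service Watchdog:

```python
import socket
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (type A, class IN, recursion desired)."""
    # Header: ID, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def response_ok(reply: bytes, query_id: int = 0x1234) -> bool:
    """True if the reply matches our ID, is a response, and has RCODE 0."""
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F  # 0 = NOERROR, 2 = SERVFAIL, ...
    return reply_id == query_id and is_response and rcode == 0


def resolver_answers(server: str = "127.0.0.1", port: int = 53,
                     hostname: str = "example.com",
                     timeout: float = 3.0) -> bool:
    """Send one UDP query and report whether a usable answer came back."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(hostname), (server, port))
            reply, _ = sock.recvfrom(4096)
    except OSError:  # covers timeouts and refused connections
        return False
    return response_ok(reply)
```

Something like `resolver_answers()` could be called from a scheduled job, and a `False` result used as the trigger for a restart. Keep in mind that a name that is still in unbound's cache can answer even while forwarding is broken, so this is only a rough indicator.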
-
What is your pfSense version ?
Then here is a solution: do not do this (uncheck "Register DHCP leases in the DNS resolver"),
as every time a lease comes in on any LAN type interface, unbound gets restarted.
If you have many LAN devices, and/or devices that are using wifi, this can create an "x times a minute" unbound restart, and as unbound needs anything from 5 to xx seconds to restart, you will get the impression unbound isn't running at all. And correct, it's restarting all the time.
And the moment it is running, the service watchdog finds it wasn't running a second ago, and adds its own dose of restarts.
This issue is well known and very old, and a solution will be coming very soon now: pfSense 2.8.0 (pfSense Plus already has the solution).
-
@keyser I agree with you here and I'm glad I'm not the only one. When I use Quad9, DNS breaks. No rhyme or reason. It can be a week or two after I make the change. All external DNS resolution fails. At first it was weird: certain sites wouldn't resolve, such as anything related to Microsoft. OK, strange, but then after some time everything external stopped resolving. This is only with Quad9.
I have since switched to Cloudflare - no issues.
-
@Gertjan said in Random DNS Resolver failure with Quad9 over SSL:
as every time a lease comes in on any LAN type interface, unbound gets restarted.
If you have many LAN devices, and/or devices that are using wifi, this can create an "x times a minute" unbound restart, and as unbound needs anything from 5 to xx seconds to restart, you will get the impression unbound isn't running at all. And correct, it's restarting all the time.
And the moment it is running, the service watchdog finds it wasn't running a second ago, and adds its own dose of restarts.
This issue is well known and very old, and a solution will be coming very soon now: pfSense 2.8.0 (pfSense Plus already has the solution).
I don't think this is the problem here.
If I use my ISP's DNS, or Google or CloudFlare, this isn't an issue. Only Quad9 requires a manual restart.
If this was the culprit, it should happen with all upstream providers regardless.
-
@digitalgimpus said in Random DNS Resolver failure with Quad9 over SSL:
I don't think this is the problem here
No need to think ^^ Fact check.
[25.03-BETA][root@pfSense.bhf.tld]/root: grep "start" /var/log/resolver.log
....
<30>1 2025-02-26T09:46:43.449085+01:00 pfSense.bhf.tld unbound 22263 - - [22263:0] info: start of service (unbound 1.22.0).
<30>1 2025-02-26T10:02:58.437287+01:00 pfSense.bhf.tld unbound 44152 - - [44152:0] info: start of service (unbound 1.22.0).
<30>1 2025-02-26T15:19:51.097535+01:00 pfSense.bhf.tld unbound 10684 - - [10684:0] info: start of service (unbound 1.22.0).
<30>1 2025-03-03T00:15:23.627116+01:00 pfSense.bhf.tld unbound 65579 - - [65579:0] info: start of service (unbound 1.22.0).
If your resolver (unbound) restarts a couple of times a day, you'll be ok.
Several times per hour or even more : that's less optimal, or plain bad. Read again what has been said above ...
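To turn that grep into a number per hour, something like the following could be used. It is only a sketch: it assumes the log line format shown in the output above, and the threshold of 3 restarts per hour is an arbitrary value I picked for illustration, not an official limit.

```python
import re
from collections import Counter

# Matches the ISO timestamp at the start of a log line and the
# "start of service" message that unbound writes on every (re)start.
START_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2})T(\d{2}):\d{2}:\d{2}\S*\s.*info: start of service"
)


def restarts_per_hour(lines):
    """Count 'start of service' entries, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        match = START_RE.search(line)
        if match:
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return counts


def busy_hours(counts, threshold=3):
    """Return the hours with at least `threshold` restarts, oldest first."""
    return sorted(hour for hour, n in counts.items() if n >= threshold)
```

Feed it the lines of /var/log/resolver.log (for example `restarts_per_hour(open("/var/log/resolver.log"))`) and look at what `busy_hours()` returns. An empty list means the restart rate is not the problem.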
That said, using the watchdog together with "Register DHCP leases in the DNS resolver" introduces race conditions. Many have tried and they all lost. (See the forum: hundreds or more posts about this subject.)
-
@Gertjan did you patch unbound? There are patches for that, which will stop unbound from restarting every time it gives out a lease. Do that first. Then set it up in watchdog. Yes, it will restart a few times per day; I know because it says connection loss when I’m streaming. It also depends: sometimes unbound stops and will not restart, therefore throwing everyone off the network.
Use the patches: you can download the patch package under packages, and then it will show recommended patches. Just install them all.
-
Read ... please.
I'm using 25.03-BETA, which is the latest and greatest.
All known patches are included in that version of pfSense. Maybe not the ones discovered after 5 February 2025.
To stop unbound from restarting when new leases come in, or when leases are renewed, uncheck "Register DHCP leases in the DNS resolver".
After all, by default, that option is not checked (by Netgate).
This situation has been known since ... can't remember, 2012 ?!!
@nattygreg said in Random DNS Resolver failure with Quad9 over SSL:
because it says connection loss when I’m streaming
When the resolver restarts this will not influence or even break any connections already established.
After all, the resolver (unbound) handles DNS, which exists for us humans. Your TV, phone, pad, PC, etc. talk to IP addresses on the network, not "host names". Only when a connection has to be created with a host name, for example "www.youtube.com", is that "www.youtube.com" translated once into an IP address. pfSense's unbound and your device will then keep that resolved host name for a while (cached).
I see no reason why streaming stops when unbound restarts.
I fired up a YouTube and a Netflix stream on my PC, and stopped unbound on pfSense for half a minute. Nothing stopped ...
And even when Netflix or YouTube needs to resolve an advertising server host name, it will wait a bit before everything comes crashing down.
@nattygreg said in Random DNS Resolver failure with Quad9 over SSL:
sometimes unbound stops and will not restart therefore throwing everyone off the network
Nobody goes off the network; actually, the network works just fine.
Only resolving, aka DNS, doesn't work anymore. So, use the ancient method: use IP addresses and things work very well.
I know, that's tedious, people don't use numbers any more, and with IPv6 it's close to impossible.
@nattygreg said in Random DNS Resolver failure with Quad9 over SSL:
Use the patches you can download patch in packages and then it will show recommended patches just install all.
So: nope, no patches exist for me.
I did create my own patches, they are listed at the top of the page.
IMHO, the issue "Random DNS Resolver failure with Quad9 over SSL" can't be resolved with a patch.
-
@Gertjan you’re right. I don’t have this problem, since I only connect to Quad9 over TLS. I read that you said SSL, but from what I have read (I could be wrong) DNS connections are over TLS or HTTPS, and (again, I could be wrong) the ports are 53, 853 and 443. If you are able to connect by SSL then maybe, and I said maybe, you are using the wrong method to connect to Quad9.
-
@digitalgimpus I had that same issue. When it happens, look at the DNS status: under ping it would say zero. Remove that DNS server and use another one.
-
@Gertjan said in Random DNS Resolver failure with Quad9 over SSL:
@digitalgimpus said in Random DNS Resolver failure with Quad9 over SSL:
I don't think this is the problem here
No need to think ^^ Fact check.
Double checked. No restarts. No evidence of restarts.
Which isn't surprising. If it failed to come back up, watchdog would catch that and I'd see emails at a minimum.
-
@digitalgimpus said in Random DNS Resolver failure with Quad9 over SSL:
Double checked. No restarts. No evidence of restarts.
So all your LAN(s) device(s) have a static IP setup ? You don't use DHCP anywhere ?
This (the "Register DHCP leases in the DNS resolver" option being checked)
means that on every (new) lease (renewal) the resolver (unbound) gets restarted.
-
@Gertjan I'm well aware what it means.
And once again: If this was a problem on restart, that would be obvious and consistent in the logs. It would also be predictable and self remedying. It obviously isn't that.
In fact, the fix for the problem is to restart, which is why it's perplexing you think that's the problem.
-
@digitalgimpus:
Might your issue be related to this problem discovered by the OPNsense users?
https://forum.opnsense.org/index.php?topic=44414.0
I have not investigated this, and the failure reported over there seems to be more immediate and permanent (as in not random), but there still might be some relation. It seemed to only be impacting users over there attempting to use Quad9 with TLS (DoT).
Also found a related thread on a different forum here with a potential solution: https://discourse.pi-hole.net/t/cant-get-quad-9-dot-to-work-using-pi-hole-and-unbound/75683/3.
Another GitHub issue posted on the NLnetLabs unbound repo: https://github.com/NLnetLabs/unbound/issues/1247. This one sounds related as well.
And one more from the unbound GitHub repo issues list (this one closed, but the fix is not in pfSense yet): https://github.com/NLnetLabs/unbound/issues/1202.
So, it looks like unbound may have some internal TLS issues that seem to really manifest themselves with Quad9's servers when using DoT. A possible short-term solution is to disable use of Quad9 and try to duplicate what you desire using Cloudflare until unbound's Quad9 TLS issues are resolved and the patched binary is pulled into pfSense.
-
@bmeeks The first two look like something with the cert bundle included with the distro vs application to me, which would explain losing connectivity all at once, they likely updated a cert and the root cert wasn't in their bundle. That doesn't seem to be the case here. A restart wouldn't fix a problem like that.
The second two seem potentially more related, though not sure exactly how to verify if that's the case offhand.
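One thing that might help narrow it down is to talk DoT to Quad9 directly, bypassing unbound, at the moment resolution is failing. If a direct query still works while unbound is stuck, that points at unbound's TLS handling rather than at Quad9 or the path to it. A rough sketch, assuming Python 3 on any host behind the firewall; 9.9.9.9, port 853 and the TLS name dns.quad9.net are Quad9's published DoT settings, while the function names and the 5 second timeout are just illustrative:

```python
import socket
import ssl
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (type A, class IN, recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP/TLS prefixes each message with a 2-byte length."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        data += chunk
    return data


def probe_dot(server_ip: str = "9.9.9.9", tls_name: str = "dns.quad9.net",
              hostname: str = "example.com", timeout: float = 5.0) -> int:
    """Send one DoT query; return the RCODE, or raise on TLS/socket errors."""
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((server_ip, 853), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=tls_name) as tls:
            tls.sendall(frame_for_tcp(build_query(hostname)))
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            reply = recv_exact(tls, length)
    if len(reply) < 12:
        raise ValueError("reply shorter than a DNS header")
    return struct.unpack("!H", reply[2:4])[0] & 0x000F  # 0 = NOERROR
```

Calling `probe_dot()` while unbound is in the failed state and getting 0 back would suggest Quad9 itself is reachable and answering over TLS. An exception (certificate error, timeout, reset) would point the other way.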
-
@digitalgimpus said in Random DNS Resolver failure with Quad9 over SSL:
The second two seem potentially more related, though not sure exactly how to verify if that's the case offhand.
I agree. If the problem is something in the unbound binary, then you are stuck until a subsequent pfSense update comes out with a new unbound package bundled within. The binary is not fixable via any sort of patch in pfSense. Binary portions of packages are compiled against the base OS kernel and get more or less "locked" to the pfSense kernel version and thus they must both be updated together.
Once in a blue moon you can get by with manually installing an updated binary package, but the chances are high that doing so can wreck the pfSense installation by overwriting shared system libraries with newer versions that might be required for the updated binary package.