Unbound dns resolver stops resolving every few days after 22.05 upgrade
-
The DNS service breaks every 1-2 days since I upgraded from 22.01 to 22.05.
It simply stops resolving queries that are not cached (recently visited sites still resolve, possibly from the client cache).
The dashboard shows that unbound is running (status OK), but lookups just time out.
22.01 ran fine for months with no unbound problems.
After upgrading to 22.05 (with an additional reboot), it took about 2 days before resolving stopped.
Restarting the unbound service (no need to reboot pfSense) solved the problem, but it returned after a few days.
I had lots of problems with unbound crashing outright in CE 2.5. That was fixed, and now there are problems again. This time unbound seems to report a good status (running/OK) when it stops resolving, so a watchdog service that just restarts a crashed service does not help.
If there are any logs or anything else that would help with troubleshooting, please ask.
I am attaching the latest unbound log file; I had the problem that day, so the log possibly captured it.
unboundlogs.txt -
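Since the status page can say "running" while lookups time out, a health check has to send a real query instead of looking at the process. Below is a minimal Python sketch of that idea, not anything pfSense or the watchdog package ships. The function names, the 3 second timeout and the server address in the usage note are assumptions for illustration.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query in RFC 1035 wire format (default type A)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def resolver_answers(server: str, name: str, timeout: float = 3.0) -> bool:
    """Return True if `server` sends a reply with a matching ID within `timeout`."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return False
    # Any reply counts as a sign of life; compare only the 16-bit query ID.
    return len(reply) >= 2 and reply[:2] == query[:2]
```

Usage would look like `resolver_answers("192.168.1.1", "example.com")`, where the address is a placeholder for the firewall's LAN IP. Because cached names kept resolving during the outages described above, the probe is only meaningful with a name that is unlikely to be in the cache.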
@vaidas said in Unbound dns resolver stops resolving every few days after 22.05 upgrade:
so watchdog service just to restart
The watchdog service is like dynamite.
It never repairs anything; it makes things even messier, so you can't see the original mess.
My advice: stop using it.
unbound is big and can be slow to start.
The watchdog service is impatient and stops a starting unbound in order to restart it: you've just created your own mess.
See here :
....
Jul 5 14:33:22 router unbound[55240]: [55240:0] notice: Restart of unbound 1.15.0.
Jul 5 14:33:22 router unbound[55240]: [55240:0] notice: init module 0: validator
Jul 5 14:33:22 router unbound[55240]: [55240:0] notice: init module 1: iterator
Jul 5 14:33:22 router unbound[55240]: [55240:0] info: start of service (unbound 1.15.0).
Jul 5 14:33:23 router unbound[55240]: [55240:0] info: generate keytag query _ta-4f66. NULL IN
Jul 5 14:36:41 router unbound[55240]: [55240:0] info: service stopped (unbound 1.15.0).
...
The watchdog, or some other signal, stopped unbound 3 minutes after it started.
Did an interface go down and up?
Look in the other logs around "Jul 5 14:36:41" to find out what happened. unbound was asked to stop by some system event.
Things get worse further along :
Jul 5 14:44:15 router unbound[55240]: [55240:0] info: start of service (unbound 1.15.0).
Jul 5 14:44:17 router unbound[55240]: [55240:0] info: service stopped (unbound 1.15.0).
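To see how long unbound actually stayed up between restarts, the "start of service" and "service stopped" lines can be paired up. The following is a rough Python sketch that assumes one log entry per line in the syslog layout shown above. The log has no year, so `year=2022` is an assumption, and `service_runs` is a made-up helper name.

```python
import re
from datetime import datetime

MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches the timestamp and the two unbound events of interest.
EVENT_RE = re.compile(
    r"(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r".*?unbound.*?info: (?P<event>start of service|service stopped)"
)


def service_runs(lines, year=2022):
    """Yield (start, stop, seconds) for each start/stop pair found in `lines`."""
    started = None
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        hour, minute, second = map(int, m["time"].split(":"))
        ts = datetime(year, MONTHS[m["mon"]], int(m["day"]), hour, minute, second)
        if m["event"] == "start of service":
            started = ts
        elif started is not None:
            yield started, ts, (ts - started).total_seconds()
            started = None
```

Fed the four start/stop lines quoted in this post, it reports one run of 199 seconds and one of 2 seconds, which matches the "3 minutes" and "things get worse" observations.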
My unbound on 22.05 restarts once in a while, every 2 or 3 days or so, as I'm also using pfBlockerNG-devel. Other reasons might exist.
You've disabled this one (registering DHCP leases in the DNS Resolver), right ?
It restarts unbound for every DHCP lease created or renewed. -
@gertjan No, I have "register DHCP leases in DNS resolver" enabled, as I need this service in my network. That's probably the biggest reason why I use a local DNS resolver.
I manually restarted unbound once in that log, because it was not resolving.
The other restarts are due to DHCP.
This bug has been unresolved for a long time: https://redmine.pfsense.org/issues/5413
But even with the restarts, resolving worked in 22.01 and many previous versions (2.5 was a mess).
I don't use pfBlockerNG or other DNS-related plugins. The only plugins I have are nut for my UPS and the OpenVPN export utility. I also installed watchdog back in version 2.5, because unbound was always crashing in that version. It only monitors unbound, and I can stop it just to test. Whenever watchdog restarts a crashed service it should notify me, and there were no notifications today about any restarts.
The same config worked fine in 22.01. -
@vaidas said in Unbound dns resolver stops resolving every few days after 22.05 upgrade:
No I have register DHCP leases in DNS resolver as I need this service in my network. that's probably biggest reason why I use local DNS resolver.
I have manually restarted unbound once in that log, because it was not resolving.
other restarts are due to dhcp.
This is a very old and well-known subject.
Please understand : with every incoming DHCP lease, unbound gets restarted.
If you have many devices on your LAN, the unbound restart frequency gets higher and higher.
Major issues arrive as soon as a (big) switch loses power and comes back up : all attached devices request a new lease at nearly the same time. Unbound gets hammered, and that is, I suspect, where it gets slammed brain-dead : you have your issue explained.
Btw : I have about 40 devices on my LAN myself that I need to know by host name and IP. I've solved the issue by creating several static DHCP leases.
I don't care about the DHCP leases, the host names or the IPs on my captive portal networks.
Only server-type devices need a name and a fixed IPv4.
This one :
stays checked, as these static leases are imported by unbound upon startup and they never change.
The unbound restart issues were thus solved for me many years ago.
So : now you can change your opinion :
@vaidas said in Unbound dns resolver stops resolving every few days after 22.05 upgrade:
No I have register DHCP leases in DNS resolver as I need
and solve the issue ;)
-
@gertjan Then why even offer this option if it breaks everything? I spin hosts/servers up and down every day that I need resolved by name; adding static records and then removing them just for that would be a chore. But let's end this.
The question is then why it worked in 22.01 for months with no problems, and after upgrading to 22.05 it breaks every few days. (I haven't changed any config or the network size.)
If the only option you suggest is to disable that setting, then thank you, I got it. I will wait; maybe there are other people who have other valid solutions. -
@vaidas said in Unbound dns resolver stops resolving every few days after 22.05 upgrade:
Will wait maybe
Check the forum; you will find hundreds if not thousands of posts about this subject.
Check the pfSense Redmine; proposals and bug reports have been made. Some are years old.
Check this one : https://redmine.pfsense.org/issues/5413
Yours : https://redmine.pfsense.org/issues/13337 fits right into the first one, which is already 6 years old.
edit : and I saw you found that one ^^
For the last several ( ! ) years many people (like, a lot) have asked about this issue.
I'm not saying my proposal is 'the' solution. It's 'a' workaround.
Btw : start thinking about what needs to be done when IPv6 isn't optional any more.
@vaidas said in Unbound dns resolver stops resolving every few days after 22.05 upgrade:
Question is then why it has been working in 22.01 for months no problems, upgraded to 22.05 and it breaks every few days.
Probably pure luck ? Dunno.
Nothing changed in 22.05 that was different in 22.01.
For me 22.01 and 22.05 are not showing any differences about DHCP/DNS. And I have the graphs to confirm this, you saw them. -
@gertjan Well, thanks for trying to help. I will need to change my workflows then and try the workaround, or maybe migrate to the Windows Server DNS/DHCP stack, as it would be nice to also have it with AD :) I don't want to say this, but Windows DNS seems to be more stable nowadays :)
Maybe there is hope: they seem to have finally fixed the multiple-consoles-on-the-same-network problem after many years. Even though I don't own any consoles, I saw many people wanted it fixed.
Hey, at least I got a great bug number, 13337.
I was hit hard by this:
https://redmine.pfsense.org/issues/11316
That's why I used watchdog; at least in that case unbound would die, so watchdog would be useful. -
I disabled the "register DHCP leases in DNS" setting and the outages are still happening.
The logs don't even show an unbound restart.
These are the only 2 records in the unbound log today, when the outage happened:
Jul 8 20:56:06 unbound 28376 [28376:0] info: generate keytag query _ta-4f66. NULL IN
Jul 8 09:11:23 unbound 28376 [28376:0] info: generate keytag query _ta-4f66. NULL IN
What is happening ?
Is there a way to download a 22.05 image for a reinstall? Maybe that would solve the problem. I am seriously considering rolling back to 22.01. -
Possibly related to
https://forum.netgate.com/topic/173148/slow-dns-after-22-05/11
Now at least DNS resolving recovers after ~15 min (with DNS registration of DHCP leases disabled), but 15-minute outages are still annoying.
I just get a DNS timeout or "IP address not found" in the browser. -
@vaidas said in Unbound dns resolver stops resolving every few days after 22.05 upgrade:
query _ta-4f66.
Or 20326 decimal.
Nothing special : '20326' is the mother of all DNSSEC keys at this moment. It's re-fetched regularly. A correct fetch is the start of good DNSSEC functionality.
So these messages are sometimes the only sign of life that unbound gives while it's humming along.
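The "4f66" in the log line is simply the key tag written in hexadecimal, which is how these `_ta-` trust anchor labels are encoded (RFC 8145). A tiny Python sketch of the conversion, with `key_tags` being a made-up helper name:

```python
def key_tags(label: str) -> list[int]:
    """Convert a '_ta-xxxx[-yyyy...]' query label into decimal DNSSEC key tags."""
    prefix = "_ta-"
    label = label.rstrip(".")
    if not label.startswith(prefix):
        raise ValueError(f"not a trust anchor key tag label: {label!r}")
    # Each hyphen-separated field is one 16-bit key tag in hex.
    return [int(field, 16) for field in label[len(prefix):].split("-")]


print(key_tags("_ta-4f66."))  # [20326]
```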
-
Strange, but it seems that the problem mostly went away after unchecking (disabling) the DNSSEC setting. I'm still testing, but for more than a day I haven't seen any problems.
Even re-enabling "register DHCP leases" did not cause any noticeable problems.
-
@vaidas said in Unbound dns resolver stops resolving every few days after 22.05 upgrade:
unchecking(disabling) DNSSEC setting.
DNSSEC is an extension of DNS. It's a complicated thing, but 'on the wire' you'll just find some more requests. There are the usual A, AAAA, PTR, MX and CNAME. Added to that, there will be some DS, NSEC and DNSKEY.
These are just other UDP or TCP packets addressed to the same DNS name servers unbound was already talking to.
If these records are unknown on the domain name server, no issue : unbound proceeds without DNSSEC checking and without any time lost.
Most, if not all, TLDs (com, org, net, etc.) are DNSSEC signed.
The top-level dot "." is signed; that's the 4f66 key you see in the unbound logs.
If a web site owner took the time to sign their domain name, like this one, a domain name I own/rent, then the entire chain will be OK and DNSSEC will work.
DNS will work nearly as fast with DNSSEC as without it.
It should not make DNS slower or worse or anything like that. If it does, there is an underlying access or DNS problem.
-
I own a Netgate 6100 and have been having the same issue. DNS resolving went to shit after the 22.05 update. Until then it was working fine. Most of the time it wouldn't resolve until after a restart. I have had to resort to 4G a few times :-(
I have been fiddling with a few settings, but I think these last 2 have made it better for me:
Untick Enable DNSSEC Support
And on the outgoing interfaces I reduced it to only use WAN ( removed my VPN outgoing interfaces ).
I will be changing everything back to what I was using before the update, but I want to re-enable each option slowly to try to single out the one that broke it for me.
I will keep an eye here and on this other thread, https://forum.netgate.com/topic/173148/slow-dns-after-22-05
to see if anyone has managed to single out the main issue...
Kind of a big one for Netgate... not sure how they managed to screw this one.
-
@pajinha said in Unbound dns resolver stops resolving every few days after 22.05 upgrade:
not sure how they managed to screw this one.
The forum mentions a couple of 'DNS' issues since 22.05.
But what is a couple ?
22.05 has been downloaded and installed many thousands of times (I can't tell exactly, but I'm pretty sure).
@pajinha said in Unbound dns resolver stops resolving every few days after 22.05 upgrade:
( removed my VPN outgoing interfaces )
If your DNS also goes over this VPN and the VPN is bad (this can happen, they are not all equal and perfect), then, yeah, DNS looks bad.
Because your uplink is bad.
DNS is mostly UDP, and these packets can get lost. unbound won't hammer away, and will return a SERVFAIL.
TCP retransmits lost segments and is far more resilient.
For now, my DNS on 22.05, using default settings and no VPN, is working as before. And don't take my word for it, see for yourself.
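When checking for yourself, it helps to tell a SERVFAIL apart from no answer at all. The response code sits in the low 4 bits of the flags word in the 12-byte DNS header (RFC 1035), and the TC bit tells a client to retry over TCP. A small Python sketch that decodes both from a raw reply; `describe_reply` is a made-up name and the input is assumed to be the bytes of a UDP reply captured by whatever tool you use:

```python
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def describe_reply(reply: bytes) -> tuple[str, bool]:
    """Return (rcode name, truncated flag) from a raw DNS reply."""
    if len(reply) < 12:
        raise ValueError("shorter than a DNS header")
    _query_id, flags = struct.unpack("!HH", reply[:4])
    rcode = flags & 0x000F            # low 4 bits: response code
    truncated = bool(flags & 0x0200)  # TC bit: answer did not fit, retry over TCP
    return RCODES.get(rcode, f"RCODE{rcode}"), truncated
```

A reply of `("SERVFAIL", False)` means unbound is alive but could not get an answer upstream, which points at the uplink or VPN. No reply at all within the timeout is the "running but not resolving" situation from the first post.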