Unbound seems to be restarting frequently
-
CiscoKid85 is on to something. I had Google DNS configured under general settings (8.8.8.8/8.8.4.4). When I cleared those entries, saved, and redid the test, DHCP did NOT cause unbound to restart. Even stranger is that when I put the entries back in, the test no longer caused unbound to restart. However, when I ran my previous tests again, it started happening after I re-enabled "Register DHCP leases in the DNS Resolver". Here is the sequence:
New DHCP causes unbound restart
+Removed 8.8.8.8/8.8.4.4 from general settings DNS
New DHCP does NOT cause unbound restart
+Added 8.8.8.8/8.8.4.4 back to general settings
New DHCP does NOT cause unbound restart
+Rebooted pfSense
New DHCP does NOT cause unbound restart
+Unchecked "Register DHCP leases in the DNS Resolver"
New DHCP does NOT cause unbound restart
+Checked "Register DHCP leases in the DNS Resolver"
New DHCP causes unbound restart
+Removed 8.8.8.8/8.8.4.4 from general settings DNS
New DHCP does NOT cause unbound restart
+Unchecked "Register DHCP leases in the DNS Resolver"
New DHCP does NOT cause unbound restart
+Checked "Register DHCP leases in the DNS Resolver"
New DHCP causes unbound restart
Confusing, right? If I had to guess, there is a bug related to enabling "Register DHCP leases in the DNS Resolver" that causes unbound to restart when dhcpd issues leases, and that making other changes to the DNS system somehow fixes it. There is another thread on the forum about unbound restarts where people seem to have resolved it by playing with similar settings, so it makes sense that this is only happening in certain difficult to reproduce scenarios. Also, unless you're digging in the logs or experiencing DNS outages, most people wouldn't even notice this is happening. I'd be interested to see if anyone else can produce similar findings by enabling and disabling the "Register DHCP leases in the DNS Resolver" setting.
-
One more data point as a direct continuation from the sequence above:
+Left general settings exactly the same (dns servers are blank) and hit save
New DHCP does NOT cause unbound restart
My best guess of the bug at this point:
- Enabling "Register DHCP leases in the DNS Resolver" causes it
- Something on the general settings page save sequence fixes it (logs indicate the save kicks off a dhcpd and unbound restart, possibly other actions)
-
Even more craziness when I cleared out some of the old DHCP leases from my testing earlier:
+Deleted leases using the delete button in DHCP Leases page (this seems to trigger a dhcpd and unbound restart with each delete request)
New DHCP causes unbound restart
+Left general settings exactly the same (dns servers are blank) and hit save
New DHCP does NOT cause unbound restart
-
Well, dhcp doesn't seem to be restarting mine. But now that I take a closer look, it does seem to have restarted a few times when it seems odd that it did. It must not be happening often enough for me to notice. Last time I looked at the log I didn't see any craziness there, but now there are more restarts than you would think should be there.
If it's a combination of things, and something removes the issue, like a save or no DNS servers in general settings, then sure, it makes sense that fewer people would see it: only those with that specific combination of settings.
Will keep an eye on it. I have not noticed any issue with resolving anything, but it does seem to have been restarting more than it should.
Mar 3 04:15:10 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 04:03:19 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 03:25:24 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 03:05:42 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 01:33:22 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 01:10:28 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 00:36:41 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 00:12:51 unbound: [26324:0] notice: Restart of unbound 1.5.1.
If I look in the dhcp log there is lots of dhcp stuff going on with renews and such at 2 in the morning, but no restart that matches up to then. I don't see any dhcp traffic that matches up with these restart times.
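For anyone who wants to compare their own restart frequency, here is a minimal sketch that pulls the "Restart of unbound" lines out of a log and reports the gaps between them. It uses the eight log lines quoted above as sample input. The function names and the `year` default are assumptions for illustration (the syslog lines carry no year), and the regular expression only covers the line format shown in this post.

```python
import re
from datetime import datetime

# Matches lines like: "Mar 3 04:15:10 unbound: [26324:0] notice: Restart of unbound 1.5.1."
RESTART_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+unbound: .*Restart of unbound"
)

def restart_times(lines, year=2015):
    """Return sorted timestamps of unbound restart notices.

    The year is an assumption: the log lines do not include one.
    """
    times = []
    for line in lines:
        m = RESTART_RE.match(line.strip())
        if m:
            mon, day, clock = m.groups()
            times.append(
                datetime.strptime(f"{year} {mon} {day} {clock}", "%Y %b %d %H:%M:%S")
            )
    return sorted(times)

def gaps_in_seconds(times):
    """Seconds elapsed between each pair of consecutive restarts."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

SAMPLE = """\
Mar 3 04:15:10 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 04:03:19 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 03:25:24 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 03:05:42 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 01:33:22 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 01:10:28 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 00:36:41 unbound: [26324:0] notice: Restart of unbound 1.5.1.
Mar 3 00:12:51 unbound: [26324:0] notice: Restart of unbound 1.5.1.
"""

times = restart_times(SAMPLE.splitlines())
gaps = gaps_in_seconds(times)
print(f"{len(times)} restarts, gaps in minutes: {[g // 60 for g in gaps]}")
```

On this sample the gaps range from roughly 12 to 92 minutes, so the restarts do not follow an obvious fixed schedule.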
-
Hmmm - Strange. Mine is also often showing "notice: Restart of unbound" now that I take a closer look, but I'm not noticing any performance issues.
-
When I upgraded to the current 2.2 release from the previous version I switched to using unbound, but I've been finding that it stops working every now and then - suddenly nothing on the network resolves. Manually restarting unbound fixes this … until the next time. I've just switched back to dnsmasq on my work system and rebooted in the hope that this will fix it.
I'm seeing this both on my work system and on my home firewall too. Looking at the status display it seems that unbound is still working - it doesn't show up as stopped, it just doesn't work. Sorry if these notes aren't very helpful but there does seem to be an issue here.
Both firewalls are pretty much vanilla systems, different hardware (DELL and Netgate) with similar configurations. The only non-standard thing about them both is that I have two WAN connections on each machine - other than that they are pretty boring configurations.
-
edmund, that is basically the same behavior I see. Unbound restarts frequently but it generally doesn't affect anything; only occasionally does it stop resolving. When that happens, the service shows as running but it just doesn't resolve properly. My nagios monitor reports it as "DNS CRITICAL - 0.129 seconds response time (No ANSWER SECTION found)" when it happens. A manual restart via the webgui, or even just waiting for unbound to restart itself will fix it. I don't know if this is directly related to the DHCP bug or if it is just a consequence of the service restarting so frequently. I'm running pfSense in a VM on ESXi with a single WAN connection.
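For those without Nagios, the "running but not resolving" state can be probed with a small script. This is only a sketch of the idea, not what the Nagios plugin does internally: it sends one A-record query over UDP and reports a problem if the reply carries an error code or an empty answer section. The function names, the 2-second timeout, and the output strings are all assumptions made for illustration.

```python
import socket
import struct

def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def parse_header(response):
    """Return (rcode, answer_count) from the 12-byte DNS header."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return flags & 0x000F, ancount

def check_resolver(server, name, timeout=2.0):
    """Query `server` for `name` and summarize whether an answer came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return "CRITICAL: no response"
    rcode, ancount = parse_header(data)
    if rcode != 0 or ancount == 0:
        return f"CRITICAL: rcode={rcode}, answers={ancount}"
    return f"OK: {ancount} answer(s)"
```

Calling `check_resolver("192.168.1.1", "example.com")` from a cron job or a loop, with the address replaced by your own resolver's LAN IP (the one here is a placeholder), would show whether the empty-answer state lines up with the restart entries in the log. The sketch does not verify that the reply ID matches the query, which a real monitor should do.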
Another followup to my testing from yesterday: Unbound eventually resumed its restart behavior a few hours after I had "solved" it by pushing the save button on the general settings page. According to the logs, it looked like it resumed after dhcp did a routine write of the leases file to disk. I still haven't changed any settings in the system 20 hours later, and unbound is restarting on some, but not all, DHCPREQUEST events. Interestingly, in the current system state my test scenario (new MAC requesting DHCP) currently doesn't trigger an unbound restart like it did before. So hitting save under general settings isn't a perfect fix but it seems to get it into a slightly more reliable state.
This is my best understanding of the issue so far:
- Enabling "Register DHCP leases in the DNS Resolver" reliably puts unbound into a state where it restarts on brand new DHCP leases
- Pressing the save button on the general settings screen seems to stop unbound from restarting on DHCP requests for a short time
- However, unbound still manages to get into a slightly unstable state in the course of normal dhcpd activities, possibly precipitated by dhcpd writing to the leases file
-
Just checking… did you turn on "Harden Glue" and "Harden DNSSEC data"?
There have been a couple threads about Unbound ceasing to resolve if these were not enabled.
-
Just checking… did you turn on "Harden Glue" and "Harden DNSSEC data"?
No - both unchecked, my general philosophy is not to check boxes unless there's a good reason and I didn't think that either of these were relevant. So unbound was running with the defaults. They get rather upset at work if the resolver goes walkabout so I'll leave unbound disabled here and see what the home configuration is doing when I get home tonight.
-
Those defaults are already being changed for next release I believe - because they matter…
-
I enabled both the Harden Glue and Harden DNSSEC data options (it looks like these are best practices that should be enabled by default). However, this does not appear to have an effect on the unbound restart behavior. I've observed that even if unbound is acting somewhat stable and not restarting on every DHCPREQUEST, the following sequence by dhcpd appears to always trigger a restart:
Mar 4 12:25:17 dhcpd: Wrote 22 leases to leases file.
Mar 4 12:25:17 dhcpd: Wrote 0 new dynamic host decls to leases file.
Mar 4 12:25:17 dhcpd: Wrote 0 deleted host decls to leases file.
This seems to happen on a regular basis, perhaps hourly based on the logs. I'm guessing this is a routine operation of dhcpd, although I don't know if unbound's expected behavior is to also restart as part of this operation.
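To check whether lease-file writes and restarts really line up, something like the following sketch could be run over a combined system log. It looks for each dhcpd "Wrote N leases to leases file" line and lists any unbound restart notices within a short window after it. The function names, the 10-second window, and the assumed year are illustrative choices. The three dhcpd lines in the sample are the ones quoted above; the unbound line is a made-up example added only so the sketch has something to match, not an entry from a real log.

```python
import re
from datetime import datetime, timedelta

# Matches "<Mon> <day> <HH:MM:SS> <process>: <message>"
LINE_RE = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\w+): (.*)$")
LEASE_WRITE_RE = re.compile(r"Wrote \d+ leases to leases file")

def parse(line, year=2015):
    """Split a syslog-style line into (timestamp, process, message), or None.

    The year is an assumption: the log lines do not include one.
    """
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    mon, day, clock, proc, msg = m.groups()
    ts = datetime.strptime(f"{year} {mon} {day} {clock}", "%Y %b %d %H:%M:%S")
    return ts, proc, msg

def restarts_after_lease_writes(lines, window_seconds=10):
    """Pair each dhcpd lease-file write with unbound restarts that follow it."""
    writes, restarts = [], []
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, proc, msg = parsed
        if proc == "dhcpd" and LEASE_WRITE_RE.search(msg):
            writes.append(ts)
        elif proc == "unbound" and "Restart of unbound" in msg:
            restarts.append(ts)
    window = timedelta(seconds=window_seconds)
    return [
        (w, [r for r in restarts if timedelta(0) <= r - w <= window])
        for w in writes
    ]

SAMPLE = """\
Mar 4 12:25:17 dhcpd: Wrote 22 leases to leases file.
Mar 4 12:25:17 dhcpd: Wrote 0 new dynamic host decls to leases file.
Mar 4 12:25:17 dhcpd: Wrote 0 deleted host decls to leases file.
Mar 4 12:25:19 unbound: [26324:0] notice: Restart of unbound 1.5.1.
"""  # the last line is hypothetical, added for illustration only

for write_time, following in restarts_after_lease_writes(SAMPLE.splitlines()):
    print(write_time.time(), "->", [r.time().isoformat() for r in following])
```

A lease write with an empty list next to it would be a counterexample to the "always triggers a restart" observation.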
-
Those defaults are already being changed for next release I believe - because they matter…
That's interesting - so I can select Harden Glue and Harden DNSSEC data in Advanced settings without actually Enabling DNSSEC in General Settings? Seems a little odd to me…
Maybe I should also check Enable DNSSEC although it doesn't seem to be required by the interface?
-
I haven't tried without selecting DNSSEC since it doesn't make much sense, but if you can click that strange combo of buttons, I'd call that a bug. haha
Not sure if we should call it a user bug or interface bug (-;
-
I agree, it doesn't make any sense to do that, but the interface does allow it - and probably shouldn't, but I have no idea what the logic is behind the GUI. From a human interface POV there's just too many boxes to check on unbound.
-
x.y.z versions of pfSense are usually very well sorted out.
The ones with just x.y (2 digits) are usually pretty solid but still being polished a bit.
-
I haven't tried without selecting DNSSEC since it doesn't make much sense, but if you can click that strange combo of buttons, I'd call that a bug. haha
Not sure if we should call it a user bug or interface bug (-;
It doesn't make Unbound fail to function, so not really a big deal. Still, I added input validation to prevent enabling that option if DNSSEC support isn't enabled.
-
Those defaults are already being changed for next release I believe - because they matter…
The only default we're changing is hard coding harden-glue to yes. That checkbox is gone in 2.2.1, and the config.xml setting ignored if it exists.
-
From a human interface POV there's just too many boxes to check on unbound.
I agree there's a lot there. It has a lot of options. We either have a ton of boxes, or force people to use manual configuration in the advanced box which is error-prone and could break on upgrade where the checkboxes won't. We'll be putting out guidance on usage that should help clarify things.
Make sure harden glue is enabled (on 2.2 and earlier), and defaults are otherwise fine. You don't have to be pushing buttons there unless you have atypical needs (mostly very large networks).
-
I have only been observing for about 5 hours after upgrading to 2.2.1, but it appears the frequent unbound restarts triggered by DHCP may be resolved with the latest update. Unbound has not restarted since the update, even during routine DHCP events like writes to the leases file that previously triggered it. Perhaps it is something that was updated in pfSense 2.2.1 or perhaps there was a change with unbound 1.5.3, but I would suggest that anyone who has been reporting unbound instability try the new version.
-
It's still restarting as soon as you change anything in the resolver settings or change the DNS addresses from general setup. It's not restarting if you don't touch any settings after a reboot….go figure.