2.5.0.a.20210126.2350 unbound keeps stopping after upgrade
-
@thedaveca
Disable the watchdog thing.
Look at the logs : you'll probably see that unbound is started or, more precisely, restarted, which means it is stopped first (it isn't dying) and then started.
Some other process is doing that.
Your mission : what process ? Then you'll be close to the solution.
-
@gertjan without Watchdog I simply have no DNS. With it, at least DNS comes back promptly, and I can see the frequency of the failures. It is multiple times an hour, but not yet predictable.
It does seem to shut down at the request of something, but how to determine what?
-
@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:
but how to determine what?
By comparing several logs - checking what happens at the moment - or a second or so before that - when unbound is told to stop.
@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:
without Watchdog I simply have no DNS
You mean, you see :
Feb 8 08:44:37 unbound 99631:0 info: service stopped (unbound 1.x.y).
and it never starts again ? Or tries to ?
A 'stop' is always coupled with a 'start' - on my system, 3 seconds later, shown here :
Feb 8 08:44:40 unbound 79395:0 info: start of service (unbound 1.x.y).
Between the stop and start, unbound dumps 20 lines (or so) of simple statistics.
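A minimal sketch of the log comparison described here, in Python. It assumes syslog-style lines in the format of the two unbound lines quoted above, in chronological order. The `otherproc` line is a made-up placeholder, and the 5-second window and the year are arbitrary choices. It pairs each stop with the next start and lists whatever other processes logged just before the stop.

```python
import re
from datetime import datetime, timedelta

# The two unbound lines follow the format quoted in this thread.
# The "otherproc" line is a hypothetical placeholder, not real pfSense output.
LOG = """\
Feb 8 08:44:35 otherproc 1234 placeholder event
Feb 8 08:44:37 unbound 99631:0 info: service stopped (unbound 1.x.y).
Feb 8 08:44:40 unbound 79395:0 info: start of service (unbound 1.x.y).
"""

LINE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")


def parse(text, year=2021):
    """Return (timestamp, process, message) tuples; syslog lines carry no year."""
    events = []
    for raw in text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        events.append((stamp, m.group(2), m.group(3)))
    return events


def restarts(events, window=timedelta(seconds=5)):
    """For each unbound stop: (stop time, gap until next start, earlier non-unbound events)."""
    found = []
    for i, (stamp, proc, msg) in enumerate(events):
        if proc != "unbound" or "service stopped" not in msg:
            continue
        started = next(
            (s for s, p, m in events[i + 1:]
             if p == "unbound" and "start of service" in m),
            None,
        )
        suspects = [
            (s, p, m) for s, p, m in events[:i]
            if p != "unbound" and stamp - s <= window
        ]
        gap = started - stamp if started else None
        found.append((stamp, gap, suspects))
    return found


for stop, gap, suspects in restarts(parse(LOG)):
    print("stopped at", stop, "| restarted after", gap)
    for s, p, m in suspects:
        print("   just before:", s, p, m)
```

A gap of `None` would mean no start followed the stop, which is the "never starts again" case.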
-
@gertjan It doesn't manage to start again, I'll check the logs next time I'm in the office to see if it attempts to start.
Either way, constantly needlessly restarting is still an issue that needs to be resolved as this will, in the best case, cause a momentary outage and discard the entire DNS cache.
Needlessly is obviously an assumption, if there were an IP or network configuration change or something else there could be a trigger, but based on the frequency it is happening, I don't suspect that this is the situation here. If nothing else, I am MultiWAN with static IPs on all interfaces.
-
@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:
with static IPs on all interfaces.
That permits an easy test : stop the DHCP server on all interfaces.
Imagine this : if a device on a network chain-guns the (a) DHCP server with DHCP requests, you see exactly what you're seeing now : unbound getting hupped x times per minute or worse, per second.
Also : check the main log for an interface that fires constant LINK DOWN - LINK UP - .....
If you can take the network down a minute or so :
Use the console access, and check if unbound is running.
Remove the interfaces one by one, until a stable situation is reached. Put them back, until unbound starts 'stopping' again.
If unbound stops without any networks present .... well, that would be something new.
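A quick way to look for such a chain-gunning device is to count DHCP requests per client in the DHCP log. A minimal Python sketch, assuming ISC dhcpd-style lines of the form `DHCPREQUEST for <ip> from <mac> via <interface>`. The sample lines, addresses and MACs below are invented for illustration, so check the pattern against your own log.

```python
import re
from collections import Counter

# Invented sample lines in an assumed ISC dhcpd-style format
# (documentation-range IPs and MACs, not real traffic).
SAMPLE_LOG = """\
Feb 8 08:44:30 dhcpd 1111 DHCPREQUEST for 192.0.2.10 from 00:00:5e:00:53:01 via em1
Feb 8 08:44:31 dhcpd 1111 DHCPREQUEST for 192.0.2.10 from 00:00:5e:00:53:01 via em1
Feb 8 08:44:32 dhcpd 1111 DHCPREQUEST for 192.0.2.10 from 00:00:5e:00:53:01 via em1
Feb 8 08:50:02 dhcpd 1111 DHCPREQUEST for 192.0.2.20 from 00:00:5e:00:53:02 via em1
"""

# Matches the client MAC that follows "from" in request/discover lines.
REQ = re.compile(r"DHCP(?:REQUEST|DISCOVER) .*?from ([0-9a-f:]{17})", re.I)


def requests_per_client(text):
    """Count DHCP request/discover lines per client MAC address."""
    return Counter(m.group(1).lower() for m in REQ.finditer(text))


for mac, count in requests_per_client(SAMPLE_LOG).most_common():
    print(mac, count)
```

A client that sits far above the rest in this count is a candidate for the "chain-gun" device.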
-
@thedaveca Are you running DNSBL/pfBlockerNG ? There is a known bug that causes unbound to restart if you have DHCP Registration enabled in the DNS Resolver settings page.
-
@ab5g I do have DHCP registrations enabled, but I do not use DNSBL/pfBlockerNG.
-
@ab5g said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:
There is a known bug that causes unbound to restart if you have DHCP Registration enabled in the DNS Resolver settings page.
A 2.4.5-p1 issue.
Not a bug.
If you want unbound to get restarted (or reloaded) when a new lease comes in, and (see above) you added a "brain dead device" that emits a lot of DHCP-renews or DHCP-requests, then you get what you asked for.
What is the relation with pfBlockerNG ? If you set the cron update to every minute or so then yes, unbound can get kicked around also. Again : what you asked for.
On systems with 'all default settings', unbound doesn't get restarted (often).
Not to forget : this is bleeding edge technology (== beta) as it concerns 2.5.0.xxxx
-
-
Failing to start again is a bug.
-
Shutting down at all for routine updates is a bug, instead use unbound-control to update the needed record/zone/whatever.
-
What brain-dead device? I have ~100 leases active, depending on the day, sometimes including a public wifi. Losing DNS service for a short period and dumping the entire cache every time a DHCP client is assigned an IP is absolutely a bug, even if the restart worked.
And yes, it’s a beta. I run the beta to find issues, and get comfortable before I am in a position to be supporting something in front of a client. I’m comfortable with the instability risks of beta platforms.
-
-
@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:
Failing to start again is a bug.
True.
But I'd like to add : it would be a failure if no reason were logged for why it failed. It always logs one.
@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:
routine updates is a bug
as mentioned in other threads (right now) : the update tree is redone .....
@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:
What brain-dead device? I have ~100 leases active,
and you have the resolver restarted at every incoming lease ?
This option still exists in 2.5.0 :
?
If so, disable it and check.
@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:
And yes, it’s a beta. I run the beta to find issues, and get comfortable before I am in a position to be supporting something in front of a client. I’m comfortable with the instability risks of beta platforms.
Thanks for that !
-
and you have the resolver restarted at every incoming lease ?
This option still exists in 2.5.0 :
?
If so, disable it and check.
No, I have the option to register DHCP leases enabled, nothing implies any need to shut down the service. unbound-control should be used to reload the record/zone on the fly.
I’ve temporarily disabled the option to test; I won’t know until the morning.
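For reference, a minimal Python sketch of what updating a record on the fly with unbound-control could look like. `local_data` and `local_data_remove` are existing unbound-control commands. The config path, the helper names and the example host are assumptions for illustration, and this is not a claim about how pfSense itself registers leases. The sketch only builds and prints the commands; it does not run them.

```python
import shlex

# Assumed location of the resolver config on pfSense; adjust if yours differs.
UNBOUND_CONF = "/var/unbound/unbound.conf"


def fqdn(name):
    """Return the name with exactly one trailing dot."""
    return name.rstrip(".") + "."


def local_data_cmd(hostname, ip, conf=UNBOUND_CONF):
    """Build the command that adds (or replaces) an A record in the running unbound."""
    return ["unbound-control", "-c", conf, "local_data", f"{fqdn(hostname)} IN A {ip}"]


def local_data_remove_cmd(hostname, conf=UNBOUND_CONF):
    """Build the command that removes the local data for a name."""
    return ["unbound-control", "-c", conf, "local_data_remove", fqdn(hostname)]


# Hypothetical lease: host name and address are placeholders.
print(shlex.join(local_data_cmd("laptop.example.lan", "192.0.2.50")))
print(shlex.join(local_data_remove_cmd("laptop.example.lan")))
```

Run this way, a new lease would change one record in the running resolver without a stop/start and without discarding the cache.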
-
@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:
nothing implies any need to shut down the service
Check out the history of this box checked on this forum.
Hundreds of posts (issues) will argue that it's close to 'mandatory'.No service will get shut down. But DHCP clients that do not have a static MAC lease won't get their device name registered (if they even have one) into the DNS.
And unbound will thank you for that.That is, this was the "2.4.5-p1" solution.
Recently, I found out that, with 2.5.0 , which is using a newer unbound versionit will get hupped (send a signal called HUP) : and that it behaves diffreently. It shouldn't restart any more on DHCP events. So up to you to discover the reason.
The answers will be / should be in the logs.