2.5.0.a.20210126.2350 unbound keeps stopping after upgrade

Gertjan

Be aware : I'm not using 2.5.0 as I need a stable environment - for the same reasons as you mentioned.
Simplicity is also preferred over complexity.

As you can see, at 15:19:39 it receives a restart.
=> What / who triggered that restart ? Use the logs to find out.

error: duplicate forward zone . ignored.

What are your unbound settings ? Complex, or not : make them go away.

Then the 20 ( ??) threads have to get killed - and while doing so, stats are logged. But really : 20 threads ? Maybe that's normal for unbound 1.13.0, or do you really have that many cores ?

After the restart, the start shows up :

15:19:39.944719-06:00 unbound 25437 [25437:0] info: start of service (unbound 1.13.0).

only to get stopped less than 40 seconds later ?

15:20:18.451387-06:00 unbound 25437 [25437:0] info: service stopped (unbound 1.13.0).

Go to the logs to find the why part.

Now all the stats are shown.
And while doing so, it receives another restart :

15:20:18.498232-06:00 unbound 25437 [25437:0] notice: Restart of unbound 1.13.0.

So, right, very true, this is going nowhere.
Did I mention to see the other logs, so you can tell us what is happening ?

What happens in the DHCP log ?

Maybe not related, as this is 2.5.0 : 2.4.5-p1 restarts unbound on every DHCP request/renew : you only need one device on your LAN(s) that renews its lease every second and unbound gets restarted every second : your DNS will go KO / play dead.
But : with a click on a button in the right place (the unbound settings page) - and, for the why part, the help of one of the thousand times it's already been explained on this forum - it's easy to stop this from happening.

thedaveCA

For whatever it is worth, I installed 2.5.0.r.20210209.1125 last night and I'm seeing the same thing. I set up the service watchdog package and I have a dozen notes about Unbound being restarted.

Happy to pull logs or whatever if needed, I'm not really clear what else is needed at this stage.

Gertjan

@thedaveca
Disable the watchdog thing.
Look at the logs : you'll probably see that unbound is started, or more precisely : it is actually restarted, which means it's stopped first (it isn't dying or so) and then started.
Some other process is doing that.
Your mission : what process ? Then you'll be close to the solution.

thedaveCA

@gertjan without Watchdog I simply have no DNS. With it, at least DNS comes back promptly, and I can see the frequency of the failures. It is multiple times an hour, but not yet predictable.

It does seem to shut down at the request of something, but how to determine what?

Gertjan

@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:

but how to determine what?

By comparing several logs - checking what happens at the moment - or a second or so before that - when unbound is told to stop.
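
A minimal sketch of that comparison, in Python. It assumes log lines that begin with a syslog-style "Feb 8 08:44:37" timestamp, like the excerpts in this thread. The sample DHCP lines, the `events_before` helper and the 2 second window are made up for illustration, and the year is assumed because the log doesn't carry one.

```python
from datetime import datetime, timedelta

def parse_ts(line, year=2021):
    # Syslog-style prefix "Feb 8 08:44:37"; the year is not logged, so it is assumed.
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

def events_before(stop_line, other_lines, window_seconds=2):
    """Return lines from other logs stamped at most window_seconds before the stop."""
    stop_ts = parse_ts(stop_line)
    window = timedelta(seconds=window_seconds)
    return [
        line for line in other_lines
        if timedelta(0) <= stop_ts - parse_ts(line) <= window
    ]

# Hypothetical sample data, not taken from a real system.
resolver_stop = "Feb 8 08:44:37 unbound 99631:0 info: service stopped (unbound 1.x.y)."
other_logs = [
    "Feb 8 08:40:02 dhcpd DHCPACK on 192.168.1.20 to 00:11:22:33:44:55 via em1",
    "Feb 8 08:44:36 dhcpd DHCPREQUEST for 192.168.1.31 from 66:77:88:99:aa:bb via em1",
    "Feb 8 08:44:50 dhcpd DHCPACK on 192.168.1.40 to cc:dd:ee:ff:00:11 via em1",
]

suspects = events_before(resolver_stop, other_logs)
for line in suspects:
    print(line)
```

Anything that shows up in that window, every time unbound is told to stop, is a candidate for "who did it".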

@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:

without Watchdog I simply have no DNS

You mean, you see :

Feb 8 08:44:37 	unbound 	99631:0 	info: service stopped (unbound 1.x.y).

and it never starts again ? Or tries to ?

A 'stop' is always coupled with a 'start' - on my system, 3 seconds later, shown here :

Feb 8 08:44:40 	unbound 	79395:0 	info: start of service (unbound 1.x.y).

Between the stop and start, unbound dumps 20 lines (or so) simple statistics.
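
To check whether every 'stop' really is followed by a 'start', the pairing can be sketched in Python. Assumptions : the log lines look like the two excerpts above, the year is not logged so one is assumed, the third sample line is invented, and `pair_stops_with_starts` is a hypothetical helper name.

```python
from datetime import datetime

def parse_ts(line, year=2021):
    # Syslog-style prefix "Feb 8 08:44:37"; the year is assumed.
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

def pair_stops_with_starts(lines):
    """Return (stop_time, seconds_until_next_start) per stop; None if no start followed."""
    pairs = []
    pending_stop = None
    for line in lines:
        if "info: service stopped" in line:
            if pending_stop is not None:
                # Two stops in a row: the earlier one never saw a start.
                pairs.append((pending_stop, None))
            pending_stop = parse_ts(line)
        elif "info: start of service" in line and pending_stop is not None:
            gap = (parse_ts(line) - pending_stop).total_seconds()
            pairs.append((pending_stop, gap))
            pending_stop = None
    if pending_stop is not None:
        pairs.append((pending_stop, None))
    return pairs

log = [
    "Feb 8 08:44:37 \tunbound \t99631:0 \tinfo: service stopped (unbound 1.x.y).",
    "Feb 8 08:44:40 \tunbound \t79395:0 \tinfo: start of service (unbound 1.x.y).",
    # Invented line: a stop that is never followed by a start.
    "Feb 8 09:10:05 \tunbound \t79395:0 \tinfo: service stopped (unbound 1.x.y).",
]

pairs = pair_stops_with_starts(log)
for stop_ts, gap in pairs:
    print(stop_ts, "restarted after %.0f s" % gap if gap is not None else "never started again")
```

A stop with no matching start is the case that actually needs explaining; the lines logged right after it are where to look.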

thedaveCA

@gertjan It doesn't manage to start again, I'll check the logs next time I'm in the office to see if it attempts to start.

Either way, constantly needlessly restarting is still an issue that needs to be resolved as this will, in the best case, cause a momentary outage and discard the entire DNS cache.

Needlessly is obviously an assumption, if there were an IP or network configuration change or something else there could be a trigger, but based on the frequency it is happening, I don't suspect that this is the situation here. If nothing else, I am MultiWAN with static IPs on all interfaces.

Gertjan

@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:

with static IPs on all interfaces.

That permits an easy test : stop the DHCP server on all interfaces.
Imagine this : if there is a device on a network that chain-guns the (a) DHCP server with DHCP requests, you see exactly what you're seeing now : unbound getting hupped x times per minute or worse, per second.
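
A rough way to spot such a device is to count DHCP requests per MAC address in the DHCP log. The Python sketch below assumes dhcpd-style lines of the form "DHCPREQUEST for <ip> from <mac> ... via <interface>"; check that against your own log before trusting it. The sample lines and the `requests_per_mac` helper are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "... DHCPREQUEST for <ip> from <mac> [(hostname)] via <if>"
REQUEST_RE = re.compile(
    r"DHCPREQUEST for .*? from ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)

def requests_per_mac(lines):
    """Count DHCPREQUEST lines per client MAC address."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts

# Hypothetical sample data.
sample = [
    "Feb 8 08:44:35 dhcpd DHCPREQUEST for 192.168.1.31 from 66:77:88:99:aa:bb via em1",
    "Feb 8 08:44:35 dhcpd DHCPACK on 192.168.1.31 to 66:77:88:99:aa:bb via em1",
    "Feb 8 08:44:36 dhcpd DHCPREQUEST for 192.168.1.31 from 66:77:88:99:aa:bb via em1",
    "Feb 8 08:44:37 dhcpd DHCPREQUEST for 192.168.1.31 from 66:77:88:99:aa:bb via em1",
    "Feb 8 08:50:00 dhcpd DHCPREQUEST for 192.168.1.20 from 00:11:22:33:44:55 (laptop) via em1",
]

counts = requests_per_mac(sample)
noisiest = counts.most_common(1)[0]
print("noisiest client:", noisiest)
```

A client that requests far more often than its lease time would justify is the one to unplug first.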

Also : check the main log if there is an interface that fires constant LINK DOWN - LINK UP - .....

If you can take the network down a minute or so :
Use the console access, and check if unbound is running.
Remove the interfaces one by one, until a stable situation is reached. Put them back, until unbound starts 'stopping' again.

If unbound stops without any networks present .... well, that would be something new.


AB5G

@thedaveca Are you running DNSBL/pfBlockerNG ? There is a known bug that causes unbound to restart if you have DHCP Registration enabled in the DNS Resolver settings page.

thedaveCA

@ab5g I do have DHCP registrations enabled, but I do not use DNSBL/pfBlockerNG.

Gertjan

@ab5g said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:

There is a known bug that causes unbound to restart if you have DHCP Registration enabled in the DNS Resolver settings page.

A 2.4.5-p1 issue.
Not a bug.
If you want unbound to get restarted (or reloaded) when a new lease comes in, and (see above) you added a "brain dead device" that emits a lot of DHCP-renews or DHCP-requests, then you get what you asked for.

What is the relation with pfBlockerNG ? If you set the cron update to every minute or so then yes, unbound can get kicked around as well. Again : what you asked for.

On systems with 'all default settings', unbound doesn't get restarted (often).

Not to forget : this is bleeding edge technology (== beta) as it concerns 2.5.0.xxxx

thedaveCA

@gertjan

Failing to start again is a bug.
Shutting down at all for routine updates is a bug, instead use unbound-control to update the needed record/zone/whatever.
What brain-dead device? I have ~100 leases active, depending on the day, sometimes including a public wifi. Losing DNS service for a short period and dumping the entire cache every time a DHCP client is assigned an IP is absolutely a bug, even if the restart worked.

And yes, it’s a beta. I run the beta to find issues, and get comfortable before I am in a position to be supporting something in front of a client. I’m comfortable with the instability risks of beta platforms.

Gertjan

@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:

Failing to start again is a bug.

True.
But I'd like to add : it would be a real failure if no reason were logged for why it failed. There always is one.

@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:

routine updates is a bug

as mentioned in other threads (right now) : the update tree is redone .....

@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:

What brain-dead device? I have ~100 leases active,

and you have the resolver restarted at every incoming lease ?

This option still exists in 2.5.0 :

?

If so, disable it and check.

@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:

And yes, it’s a beta. I run the beta to find issues, and get comfortable before I am in a position to be supporting something in front of a client. I’m comfortable with the instability risks of beta platforms.

Thanks for that !

thedaveCA

and you have the resolver restarted at every incoming lease ?

This option still exists in 2.5.0 :

?

If so, disable it and check.

No, I have the option to register DHCP leases enabled, nothing implies any need to shut down the service. unbound-control should be used to reload the record/zone on the fly.
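
For illustration, a sketch in Python of what such an on-the-fly update could look like. `local_data` and `local_data_remove` are existing unbound-control commands; the hostname, the address, the TTL and the two helper functions are hypothetical, remote control has to be enabled in unbound for this to work, and this is not a description of what pfSense actually does with leases. The sketch only builds and prints the commands, it does not run them.

```python
import shlex

def add_record_cmd(fqdn, ipv4, ttl=3600):
    """Build the unbound-control call that adds an A record to the running server."""
    return ["unbound-control", "local_data", f"{fqdn}.", str(ttl), "IN", "A", ipv4]

def remove_record_cmd(fqdn):
    """Build the unbound-control call that removes all local data for a name."""
    return ["unbound-control", "local_data_remove", f"{fqdn}."]

# Hypothetical lease: name and address are examples only.
add_cmd = add_record_cmd("laptop.example.lan", "192.168.1.50")
remove_cmd = remove_record_cmd("laptop.example.lan")

# Printed rather than executed; subprocess.run(add_cmd, check=True) would run it.
print(shlex.join(add_cmd))
print(shlex.join(remove_cmd))
```

Done this way, the cache stays intact and there is no outage when a lease comes in.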

I’ve temporarily disabled the option to test; I won’t know until the morning.

Gertjan

@thedaveca said in 2.5.0.a.20210126.2350 unbound keeps stopping after upgrade:

nothing implies any need to shut down the service

Check out the history of this check box on this forum.
Hundreds of posts (issues) will argue that it's close to 'mandatory'.

No service will get shut down. But DHCP clients that do not have a static MAC lease won't get their device name registered (if they even have one) into the DNS.
And unbound will thank you for that.

That is, this was the "2.4.5-p1" solution.

Recently, I found out that with 2.5.0, which uses a newer unbound version, it will get hupped (sent a signal called HUP), and that it behaves differently. It shouldn't restart any more on DHCP events. So it's up to you to discover the reason.
The answers will be / should be in the logs.