unexpected unbound restarts / matching dhcpcd syslog entry
-
My unbound log (running on a remote machine, not the pfsense) shows, multiple times per day:
May 14 06:15:26 unbound[790:0] info: service stopped (unbound 1.13.1). .... notice: Restart of unbound 1.13.1.
unbound than almost always immediately stops a second time:
May 14 06:15:27 unbound[790:0] info: service stopped (unbound 1.13.1).
I've narrowed it down to a specific matching syslog entry:
The syslog entries:May 14 06:15:26 raspberrypi dhcpcd[562]: eth0: part of Router Advertisement expired May 14 07:09:04 raspberrypi dhcpcd[562]: eth0: Router Advertisement from fe80::20e:c4ff:fecf:f3cf May 14 07:09:04 raspberrypi dhcpcd[562]: eth0: Router Advertisement from fe80::20e:c4ff:fecf:f3cf May 14 14:53:25 raspberrypi dhcpcd[562]: eth0: part of Router Advertisement expired
The unbound entries:
May 14 06:15:26 unbound[790:0] info: service stopped (unbound 1.13.1). May 14 06:15:27 unbound[790:0] info: service stopped (unbound 1.13.1). May 14 14:53:25 unbound[790:0] info: service stopped (unbound 1.13.1). May 14 14:53:29 unbound[790:0] info: service stopped (unbound 1.13.1).
Notice the time match, part of Router Advertisement expired apparently triggers an unbound restart
the IPv4 address is served by pfsense (static (MAC based) entry, thus never changes).
the IPv6 address is served by pfsense (configured with 'track interface'), never changes, unless ISP resets... (didn't happen today).Not sure how to proceed from here, I don't understand why an expired route causes unbound to restart.
If the dhcp daemon is responsible for the second restart (could be), there appears to be a missing syslog entry, that would explain the unbound restart.Of course, I would prefer unbound not to restart at all, the messages make it look like an expired route causes unbound to restart, I would like to avoid that, if possible.
-
@jpgpi250 In DNS Resolver's settings, uncheck DHCP Registration:
"If this option is set, then machines that specify their hostname when requesting an IPv4 DHCP lease will be registered in the DNS Resolver so that their name can be resolved. Note that this will cause the Resolver to reload and flush its resolution cache whenever a DHCP lease is issued."
-
As I indicated, unbound is running on a remote machine. I don't have a running instance of unbound on the pfsense box.
I would however very much like to know what the effect is of checking/unchecking this checkbox on the unbound configuration file on the pfsense box. It might be a solution for the config file on the remote machine.
-
@jpgpi250 > As I indicated, unbound is running on a remote machine.
Whoops, that didn't register with me.
-
@jpgpi250 said in unexpected unbound restarts / matching dhcpcd syslog entry:
I would however very much like to know what the effect is of checking/unchecking this checkbox on the unbound configuration file on the pfsense box. It might be a solution for the config file on the remote machine.
It does what it says. It enables the registration of DHCP clients in unbound. That happens each time a client lease is renewed, maybe the lease has expired (you can extend the lease time if it's short), maybe the client joined the network (again).
Subsequently unbound has to be restarted to apply the new settings. -
@jpgpi250 said in unexpected unbound restarts / matching dhcpcd syslog entry:
May 14 06:15:26 unbound[790:0] info: service stopped (unbound 1.13.1).
It received a 'process stop' (actually : restart) signal.
Something is telling it to do so.
It could be the DHCP client process, that renews the a lease. Make the DHCP client process log more verbose, or assign a static IP and check if the restarts stop (in that case, it was the DHCP process).@jpgpi250 said in unexpected unbound restarts / matching dhcpcd syslog entry:
unbound is running on a remote machine.
Some of know know something about how unbound is used on pfSense.
None of us knows anything about how unbound is used on your system - what system ? -
@gertjan
It could be the DHCP client process, that renews the a lease.: The default lease time in pfsense (IPv4) is 7200 seconds. Ift his was the cause, unbound would restart 12 times per day. It's NOT. The DHCP renew messages in pfsense don't match the restart time.The matching message says:
May 14 06:15:26 raspberrypi dhcpcd[562]: eth0: part of Router Advertisement expired
I'm not an expert, certainly not, but this message looks more like an IPv6 event. As I indicated, I'm using IPv6 'track interface'. There isn't anything else to configure with that option (as far as I know)
unbound and remote system info:
Unbound 1.13.1, with ./configure --prefix=/usr --sysconfdir=/etc
--disable-static --enable-tfo-client --enable-tfo-server --with-libevent
--with-libhiredis --enable-cachedb --with-pidfile=/run/unbound.pid
Hardware: raspberry pi 3B
OS: latest Raspberry Pi OS Lite (=debian 10) -
I am having a very similar issue with a device on my network. Pfsense has the DNS resolver disabled and I provide the IP for two pihole devices on my network for clients to use as DNS addresses.
The two pihole devices have unbound installed, and interestingly, both devices seem to have unbound restart at the same time. Here is what I have found so far:
Pihole 1:
May 28 02:09:48 raspberrypi unbound[678]: [678:0] info: service stopped (unbound 1.9.0). May 28 02:09:48 raspberrypi unbound[678]: [678:0] info: start of service (unbound 1.9.0). May 28 07:21:21 raspberrypi unbound[678]: [678:0] info: service stopped (unbound 1.9.0). May 28 07:21:21 raspberrypi unbound[678]: [678:0] info: start of service (unbound 1.9.0). May 28 07:21:21 raspberrypi unbound[678]: [678:0] info: service stopped (unbound 1.9.0). May 28 07:21:22 raspberrypi unbound[678]: [678:0] info: start of service (unbound 1.9.0). May 28 08:22:04 raspberrypi unbound[678]: [678:0] info: service stopped (unbound 1.9.0). May 28 08:22:05 raspberrypi unbound[678]: [678:0] info: start of service (unbound 1.9.0). May 28 08:22:05 raspberrypi unbound[678]: [678:0] info: service stopped (unbound 1.9.0). May 28 08:22:05 raspberrypi unbound[678]: [678:0] info: start of service (unbound 1.9.0).
pihole 2
May 28 02:09:48 raspberrypi-2 unbound[431]: [431:0] info: service stopped (unbound 1.9.0). May 28 02:09:48 raspberrypi-2 unbound[431]: [431:0] info: start of service (unbound 1.9.0). May 28 07:21:21 raspberrypi-2 unbound[431]: [431:0] info: service stopped (unbound 1.9.0). May 28 07:21:29 raspberrypi-2 unbound[431]: [431:0] info: start of service (unbound 1.9.0). May 28 07:21:29 raspberrypi-2 unbound[431]: [431:0] info: service stopped (unbound 1.9.0). May 28 07:21:37 raspberrypi-2 unbound[431]: [431:0] info: start of service (unbound 1.9.0). May 28 08:22:04 raspberrypi-2 unbound[431]: [431:0] info: service stopped (unbound 1.9.0). May 28 08:22:12 raspberrypi-2 unbound[431]: [431:0] info: start of service (unbound 1.9.0). May 28 08:22:13 raspberrypi-2 unbound[431]: [431:0] info: service stopped (unbound 1.9.0). May 28 08:22:21 raspberrypi-2 unbound[431]: [431:0] info: start of service (unbound 1.9.0).
The results from a stacktrace of the unbound process:
08:22:05.137224 --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=13932, si_uid=0} --- 08:22:05.137545 sigreturn({mask=[]}) = 431 08:22:12.975311 rt_sigprocmask(SIG_UNBLOCK, [HUP], NULL, 8) = 0 08:22:12.975850 rt_sigprocmask(SIG_UNBLOCK, [INT], NULL, 8) = 0 08:22:12.976154 rt_sigprocmask(SIG_UNBLOCK, [QUIT], NULL, 8) = 0 08:22:12.976423 rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0 08:22:12.976719 rt_sigaction(SIGHUP, {sa_handler=0x76dc9d88, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x1a15c, sa_mask=[HUP], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:12.977263 rt_sigaction(SIGQUIT, {sa_handler=0x76dc9d88, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x1a15c, sa_mask=[QUIT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:12.977671 rt_sigaction(SIGTERM, {sa_handler=0x76dc9d88, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x1a15c, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:12.978024 rt_sigaction(SIGINT, {sa_handler=0x76dc9d88, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x1a15c, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:13.008918 rt_sigaction(SIGTERM, {sa_handler=0x1a15c, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x76dc9d88, sa_mask=~[KILL STOP RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:13.009509 rt_sigaction(SIGQUIT, {sa_handler=0x1a15c, sa_mask=[QUIT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x76dc9d88, sa_mask=~[KILL STOP RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:13.009909 rt_sigaction(SIGHUP, {sa_handler=0x1a15c, sa_mask=[HUP], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x76dc9d88, sa_mask=~[KILL STOP RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:13.010274 rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:13.010589 rt_sigaction(SIGINT, {sa_handler=0x1a15c, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x76dc9d88, sa_mask=~[KILL STOP RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:13.024148 rt_sigaction(SIGINT, {sa_handler=0x1a15c, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, NULL, 8) = 0 08:22:13.024769 rt_sigaction(SIGTERM, {sa_handler=0x1a15c, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, NULL, 8) = 0 08:22:13.025090 rt_sigaction(SIGQUIT, {sa_handler=0x1a15c, sa_mask=[QUIT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, NULL, 8) = 0 08:22:13.025377 rt_sigaction(SIGHUP, {sa_handler=0x1a15c, sa_mask=[HUP], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, NULL, 8) = 0 08:22:21.318199 rt_sigprocmask(SIG_UNBLOCK, [HUP], NULL, 8) = 0 08:22:21.318744 rt_sigprocmask(SIG_UNBLOCK, [INT], NULL, 8) = 0 08:22:21.319135 rt_sigprocmask(SIG_UNBLOCK, [QUIT], NULL, 8) = 0 08:22:21.319416 rt_sigprocmask(SIG_UNBLOCK, [TERM], NULL, 8) = 0 08:22:21.319712 rt_sigaction(SIGHUP, {sa_handler=0x76dc9d88, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x1a15c, sa_mask=[HUP], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:21.320238 rt_sigaction(SIGQUIT, {sa_handler=0x76dc9d88, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x1a15c, sa_mask=[QUIT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:21.320662 rt_sigaction(SIGTERM, {sa_handler=0x76dc9d88, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x1a15c, sa_mask=[TERM], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0 08:22:21.320996 rt_sigaction(SIGINT, {sa_handler=0x76dc9d88, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, {sa_handler=0x1a15c, sa_mask=[INT], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76556120}, 8) = 0
Trying to get info on that PID has not been fruitful. It seems to close before I can use
ps -p
It seems like before each of the restarts there is a message in the system log about part of the router advertisement expiring:
May 28 14:22:04 raspberrypi dhcpcd[411]: eth0: part of Router Advertisement expired May 28 14:22:04 raspberrypi unbound: [678:0] info: service stopped (unbound 1.9.0).
Is this an issue related to the device I am using, or is the pfsense box somehow causing the restarts on the other devices? Any help would be appreciated.
Relevant issue on the other forum
-
solved: the culprit appears to be the resolvconf script (which resolvconf)
sha1sum /usr/sbin/resolvconf 4bfee7ac4e855ae48e35ab9ac37ebb8c2d37d210 /usr/sbin/resolvconf
I haven't had an unbound stop message since I commented out
#unbound_conf=/var/cache/unbound/resolvconf_resolvers.conf
in /etc/resolvconf.conf, this on raspberry pi 3b, Raspberry Pi OS Lite, Release date: May 7th 2021.
You can read the full story here
summary of the events I noticed:
- stop message in unbound: May 14 06:15:26 unbound[790:0] info: service stopped (unbound 1.13.1)
- matching syslog error : May 14 06:15:26 raspberrypi dhcpcd[562]: eth0: part of Router Advertisement expired
- setup monitoring in screen: sudo strace -tt -ffo /tmp/trace-unbound -e trace=%signal -p "pidof unbound"
- logging script in screen (the topic also describes a method using audit):
#!/bin/bash file="/home/pi/ps-test.txt" while : do while IFS=" " read -r USER PID CPU MEM VSZ RSS TTY STAT START TIME COMMAND; do if ! grep -q "${PID}" "${file}"; then if [[ ( "${COMMAND}" != 'ps aux' ) && ( "${COMMAND}" != *"ps-test"* )]]; then echo "$(date),${PID},${COMMAND}" | sudo tee -a "${file}" fi fi done < <(ps aux) done
result in log:
Sun 30 May 18:48:15 CEST 2021,32610,/bin/sh /usr/sbin/resolvconf -a eth0.ra
commenting out the unbound entry in /etc/resolvconf.conf eliminates the unbound stops.
For the first time, since I started monitoring errors and warnings in my logs, the unbound log doesn't contain any errors / warnings.