Unbound crashes periodically with signal 11
-
Unbound stops, requires manual restart several times a day.
Version: 2.5.0.a.20210118.2350
Unbound 1.13.0
Packages: apcupsd 0.3.91_92021-01-21 04:43:35.153316+00:00 unbound 66192 [66192:0] info: 0.002048 0.004096 1 2021-01-21 04:43:35.153307+00:00 unbound 66192 [66192:0] info: lower(secs) upper(secs) recursions 2021-01-21 04:43:35.153296+00:00 unbound 66192 [66192:0] info: [25%]=0 median[50%]=0 [75%]=0 2021-01-21 04:43:35.153287+00:00 unbound 66192 [66192:0] info: histogram of recursion processing times 2021-01-21 04:43:35.153277+00:00 unbound 66192 [66192:0] info: average recursion processing time 0.002168 sec 2021-01-21 04:43:35.153262+00:00 unbound 66192 [66192:0] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0 2021-01-21 04:43:35.153246+00:00 unbound 66192 [66192:0] info: server stats for thread 3: 3 queries, 2 answers from cache, 1 recursions, 0 prefetch, 0 rejected by ip ratelimiting 2021-01-21 04:43:35.152091+00:00 unbound 66192 [66192:0] info: 0.131072 0.262144 3 2021-01-21 04:43:35.152076+00:00 unbound 66192 [66192:0] info: 0.065536 0.131072 1 2021-01-21 04:43:35.152061+00:00 unbound 66192 [66192:0] info: lower(secs) upper(secs) recursions 2021-01-21 04:43:35.152052+00:00 unbound 66192 [66192:0] info: [25%]=0.131072 median[50%]=0.174763 [75%]=0.218453 2021-01-21 04:43:35.152039+00:00 unbound 66192 [66192:0] info: histogram of recursion processing times 2021-01-21 04:43:35.152028+00:00 unbound 66192 [66192:0] info: average recursion processing time 0.175769 sec 2021-01-21 04:43:35.152014+00:00 unbound 66192 [66192:0] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0 2021-01-21 04:43:35.151997+00:00 unbound 66192 [66192:0] info: server stats for thread 2: 4 queries, 0 answers from cache, 4 recursions, 0 prefetch, 0 rejected by ip ratelimiting 2021-01-21 04:43:35.150737+00:00 unbound 66192 [66192:0] info: 0.032768 0.065536 1 2021-01-21 04:43:35.150728+00:00 unbound 66192 [66192:0] info: lower(secs) upper(secs) recursions 2021-01-21 04:43:35.150717+00:00 unbound 66192 [66192:0] info: [25%]=0 median[50%]=0 [75%]=0 2021-01-21 04:43:35.150708+00:00 unbound 66192 [66192:0] info: histogram of recursion processing times 2021-01-21 04:43:35.150698+00:00 unbound 66192 [66192:0] info: average recursion processing time 0.049706 sec 2021-01-21 04:43:35.150689+00:00 unbound 66192 [66192:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0 2021-01-21 04:43:35.150673+00:00 unbound 66192 [66192:0] info: server stats for thread 1: 2 queries, 1 answers from cache, 1 recursions, 0 prefetch, 0 rejected by ip ratelimiting 2021-01-21 04:43:35.149631+00:00 unbound 66192 [66192:0] info: 0.131072 0.262144 2 2021-01-21 04:43:35.149620+00:00 unbound 66192 [66192:0] info: 0.002048 0.004096 1 2021-01-21 04:43:35.149611+00:00 unbound 66192 [66192:0] info: lower(secs) upper(secs) recursions 2021-01-21 04:43:35.149600+00:00 unbound 66192 [66192:0] info: [25%]=0 median[50%]=0 [75%]=0 2021-01-21 04:43:35.149591+00:00 unbound 66192 [66192:0] info: histogram of recursion processing times 2021-01-21 04:43:35.149580+00:00 unbound 66192 [66192:0] info: average recursion processing time 0.145391 sec 2021-01-21 04:43:35.149567+00:00 unbound 66192 [66192:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0 2021-01-21 04:43:35.149546+00:00 unbound 66192 [66192:0] info: server stats for thread 0: 3 queries, 0 answers from cache, 3 recursions, 0 prefetch, 0 rejected by ip ratelimiting 2021-01-21 04:43:35.149002+00:00 unbound 66192 [66192:0] info: service stopped (unbound 1.13.0). 2021-01-21 04:42:07.648917+00:00 unbound 66192 [66192:0] info: start of service (unbound 1.13.0). 2021-01-21 04:42:07.642700+00:00 unbound 66192 [66192:0] notice: init module 0: iterator 2021-01-21 04:42:07.641500+00:00 unbound 66192 [66192:0] notice: Restart of unbound 1.13.0. 2021-01-21 04:42:07.640307+00:00 unbound 66192 [66192:0] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0 2021-01-21 04:42:07.640290+00:00 unbound 66192 [66192:0] info: server stats for thread 3: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting 2021-01-21 04:42:07.639234+00:00 unbound 66192 [66192:0] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0 2021-01-21 04:42:07.639218+00:00 unbound 66192 [66192:0] info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting 2021-01-21 04:42:07.638136+00:00 unbound 66192 [66192:0] info: 0.131072 0.262144 1 2021-01-21 04:42:07.638121+00:00 unbound 66192 [66192:0] info: lower(secs) upper(secs) recursions 2021-01-21 04:42:07.638106+00:00 unbound 66192 [66192:0] info: [25%]=0 median[50%]=0 [75%]=0 2021-01-21 04:42:07.638097+00:00 unbound 66192 [66192:0] info: histogram of recursion processing times 2021-01-21 04:42:07.638086+00:00 unbound 66192 [66192:0] info: average recursion processing time 0.218760 sec 2021-01-21 04:42:07.638077+00:00 unbound 66192 [66192:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0 2021-01-21 04:42:07.638060+00:00 unbound 66192 [66192:0] info: server stats for thread 1: 1 queries, 0 answers from cache, 1 recursions, 0 prefetch, 0 rejected by ip ratelimiting 2021-01-21 04:42:07.636993+00:00 unbound 66192 [66192:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0 2021-01-21 04:42:07.636974+00:00 unbound 66192 [66192:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting 2021-01-21 04:42:07.636447+00:00 unbound 66192 [66192:0] info: service stopped (unbound 1.13.0). 2021-01-21 04:41:56.104192+00:00 unbound 66192 [66192:0] info: start of service (unbound 1.13.0).
David
-
@maverick_slo
did not help: Jan 21 06:45:09 kernel pid 5252 (unbound), jid 0, uid 59: exited on signal 11 -
@n8it said in pfSense 2.50 snapshots have been dying for the past couple of days:
2021-01-21 04:43:35.149002+00:00 unbound 66192 [66192:0] info: service stopped (unbound 1.13.0).
2021-01-21 04:42:07.648917+00:00 unbound 66192 [66192:0] info: start of service (unbound 1.13.0).What/who is restarting unbound ? Check other logs, like DHCP.
How often does it restart.Btw : the unbound log sequence you showed is the normal one for stop and start : on stop it dumps some stats for every internal thread.
-
@gertjan
For me it is just this log line: Jan 21 06:45:09 kernel pid 5252 (unbound), jid 0, uid 59: exited on signal 11
Nothing in unbound logs or anywhere else.
Started few days ago. -
-
DHCP logs do show DHCP is sending a HUP to DNS daemon. It happens more than 12 times a day. Thanks for telling me to check.
code_text 2021-01-21 06:10:05.551715+00:00 dhcpd 49502 DHCPACK on 192.168.1.89 to 04:79:b7:ad:9c:7a via igb1 2021-01-21 06:10:05.551693+00:00 dhcpd 49502 DHCPREQUEST for 192.168.1.89 from 04:79:b7:ad:9c:7a via igb1 2021-01-21 06:07:14.269409+00:00 dhcpleases 35662 Sending HUP signal to dns daemon(43713) 2021-01-21 06:07:14.253979+00:00 dhcpd 49502 DHCPACK on 192.168.1.101 to ac:5f:3e:fe:91:3e (DesGS7) via igb1 2021-01-21 06:07:14.252255+00:00 dhcpd 49502 DHCPREQUEST for 192.168.1.101 from ac:5f:3e:fe:91:3e (DesGS7) via igb1 2021-01-21 06:06:52.752895+00:00 dhcpleases 35662 Sending HUP signal to dns daemon(43713) 2021-01-21 06:06:52.738449+00:00 dhcpleases 35662 Sending HUP signal to dns daemon(43713) 2021-01-21 06:06:52.721929+00:00 dhcpd 49502 Server starting service. 2021-01-21 06:06:52.721763+00:00 dhcpd 49502 Sending on Socket/fallback/fallback-net 2021-01-21 06:06:52.721705+00:00 dhcpd 49502 Sending on BPF/igb1/40:62:31:0a:d9:7b/192.168.1.0/24 2021-01-21 06:06:52.721631+00:00 dhcpd 49502 Listening on BPF/igb1/40:62:31:0a:d9:7b/192.168.1.0/24 2021-01-21 06:06:52.721562+00:00 dhcpd 49502 Sending on BPF/igb2/40:62:31:0a:d9:7c/192.168.100.0/24 2021-01-21 06:06:52.721536+00:00 dhcpd 49502 Listening on BPF/igb2/40:62:31:0a:d9:7c/192.168.100.0/24 2021-01-21 06:06:52.719470+00:00 dhcpd 49502 Wrote 24 leases to leases file. 2021-01-21 06:06:52.719236+00:00 dhcpd 49502 Wrote 0 new dynamic host decls to leases file. 2021-01-21 06:06:52.719187+00:00 dhcpd 49502 Wrote 0 deleted host decls to leases file. 2021-01-21 06:06:52.719107+00:00 dhcpd 49502 Wrote 0 class decls to leases file. 2021-01-21 06:06:52.718612+00:00 dhcpd 49502 For info, please visit https://www.isc.org/software/dhcp/ 2021-01-21 06:06:52.718602+00:00 dhcpd 49502 All rights reserved. 2021-01-21 06:06:52.718593+00:00 dhcpd 49502 Copyright 2004-2020 Internet Systems Consortium. 2021-01-21 06:06:52.718582+00:00 dhcpd 49502 Internet Systems Consortium DHCP Server 4.4.2 2021-01-21 06:06:52.718525+00:00 dhcpd 49502 PID file: /var/run/dhcpd.pid 2021-01-21 06:06:52.718493+00:00 dhcpd 49502 Database file: /var/db/dhcpd.leases 2021-01-21 06:06:52.718428+00:00 dhcpd 49502 Config file: /etc/dhcpd.conf 2021-01-21 06:06:52.716270+00:00 dhcpd 49502 For info, please visit https://www.isc.org/software/dhcp/ 2021-01-21 06:06:52.716232+00:00 dhcpd 49502 All rights reserved. 2021-01-21 06:06:52.716207+00:00 dhcpd 49502 Copyright 2004-2020 Internet Systems Consortium. 2021-01-21 06:06:52.716056+00:00 dhcpd 49502 Internet Systems Consortium DHCP Server 4.4.2 2021-01-21 06:06:51.481598+00:00 dhcpleases 35662 Could not deliver signal HUP to process 30993: No such process. 2021-01-21 06:06:51.481494+00:00 dhcpleases 35662 Sending HUP signal to dns daemon(30993) 2021-01-21 06:00:48.433505+00:00 dhcpleases 23469 Sending HUP signal to dns daemon(30993) 2021-01-21 06:00:48.419710+00:00 dhcpd 36710 Server starting service. 2021-01-21 06:00:48.419547+00:00 dhcpd 36710 Sending on Socket/fallback/fallback-net 2021-01-21 06:00:48.419432+00:00 dhcpd 36710 Sending on BPF/igb1/40:62:31:0a:d9:7b/192.168.1.0/24 2021-01-21 06:00:48.419387+00:00 dhcpd 36710 Listening on BPF/igb1/40:62:31:0a:d9:7b/192.168.1.0/24 2021-01-21 06:00:48.419220+00:00 dhcpd 36710 Sending on BPF/igb2/40:62:31:0a:d9:7c/192.168.100.0/24 2021-01-21 06:00:48.419168+00:00 dhcpd 36710 Listening on BPF/igb2/40:62:31:0a:d9:7c/192.168.100.0/24 2021-01-21 06:00:48.416935+00:00 dhcpd 36710 Wrote 24 leases to leases file. 2021-01-21 06:00:48.416727+00:00 dhcpd 36710 Wrote 0 new dynamic host decls to leases file. 2021-01-21 06:00:48.416680+00:00 dhcpd 36710 Wrote 0 deleted host decls to leases file. 2021-01-21 06:00:48.416615+00:00 dhcpd 36710 Wrote 0 class decls to leases file. 2021-01-21 06:00:48.416054+00:00 dhcpd 36710 For info, please visit https://www.isc.org/software/dhcp/ 2021-01-21 06:00:48.416044+00:00 dhcpd 36710 All rights reserved. 2021-01-21 06:00:48.416035+00:00 dhcpd 36710 Copyright 2004-2020 Internet Systems Consortium. 2021-01-21 06:00:48.416024+00:00 dhcpd 36710 Internet Systems Consortium DHCP Server 4.4.2 2021-01-21 06:00:48.415966+00:00 dhcpd 36710 PID file: /var/run/dhcpd.pid 2021-01-21 06:00:48.415930+00:00 dhcpd 36710 Database file: /var/db/dhcpd.leases 2021-01-21 06:00:48.415865+00:00 dhcpd 36710 Config file: /etc/dhcpd.conf 2021-01-21 06:00:48.413782+00:00 dhcpd 36710 For info, please visit https://www.isc.org/software/dhcp/ 2021-01-21 06:00:48.413752+00:00 dhcpd 36710 All rights reserved. 2021-01-21 06:00:48.413709+00:00 dhcpd 36710 Copyright 2004-2020 Internet Systems Consortium. 2021-01-21 06:00:48.413590+00:00 dhcpd 36710 Internet Systems Consortium DHCP Server 4.4.2
-
You have a single LAN with single W10 client and you see this :
@n8it said in pfSense 2.50 snapshots have been dying for the past couple of days:
2021-01-21 06:06:52.752895+00:00 dhcpleases 35662 Sending HUP signal to dns daemon(43713)
2021-01-21 06:06:52.738449+00:00 dhcpleases 35662 Sending HUP signal to dns daemon(43Who are these 192.168.1.101 - 192.168.1.89
Why is the dhcpd server restarting so often ?
If the LAN goes down, it's normal that clients trigger a DHCP sequence. But your LAN is virtual ... -
@gertjan well that resolved it.
Thanks for the tip now to find root cause... -
it looks like dhcp tries to restart the unbound service because of a new dhcp lease but unbound does not start again.
if I enter this
kill -1 `cat /var/run/unbound.pid`
in the comandline on my production system unbound restarts on the 2.5 snapshot unbound dies and doesn't start again.
can anyone confirm this?
-
@supermaster
Yup. -
@supermaster said in pfSense 2.50 snapshots have been dying for the past couple of days:
it looks like dhcp tries to restart the unbound
dhcp == the DHCP client, running on a WAN type interface to get a WAN IP from an upstream DHCP server, like your ISP router, or your ISP.
dhcpd is a server type app that receive DHCP requests from dhcp clients on your LAN.
I presume that your dhcp is the dhcpd process.Ins't
kill -1 `cat /var/run/unbound.pid`
just killing it ? It's not a 'restart' command.
Also, check if the PID, the integer in the file is actually the PID of the running instance of unbound.
ps ax | grep 'unbound'
to see how many unbound instances are running, and if the PID corresponds.
If the dhcpd is restarting unbound to often, set up unbound to ignore new DHCP leases - this works well with 2.4.5-p1.
-
yes i mean the dhcpd process. the problem is not that dhpcd restarts unbound too often.
if I understand the code correctly https://github.com/pfsense/FreeBSD-ports/blob/devel/sysutils/dhcpleases/files/dhcpleases.c
syslog(LOG_INFO, "Sending HUP signal to dns daemon(%u)", pidno); if (kill((pid_t)pidno, SIGHUP) < 0) syslog(LOG_ERR, "Could not deliver signal HUP to process %d: %m.", pidno)
it will send kil -HUP (kill -1 is the same as HUP) to unobund but i dont find where unbound is startet after that again.
this works in version 2.4.5 without problems because unbound just starts again when you do a kill -HUP but not in 2.5 since the update to unbound 1.13
-
I checked a few different systems here and I have no problem doing a HUP to unbound. I tried it multiple times multiple ways. It stays running and operating properly.
: ps uxaww | grep unbound unbound 36369 0.0 1.0 55904 4696 - Is 14:05 0:01.58 /usr/local/sbin/unbound -c /var/unbound/unbound.conf root 39523 0.0 0.0 10844 0 - IWs - 0:00.00 /usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.leases -d lab.jimp.pw -p /var/run/unbound.pid -u /var/unbound/dhcpleases_entries.conf -h /etc/hosts root 33981 0.0 0.4 11044 1984 0 R+ 16:00 0:00.00 grep unbound : killall -HUP unbound : ps uxaww | grep unbound unbound 36369 3.9 5.0 51116 22852 - Ss 14:05 0:01.82 /usr/local/sbin/unbound -c /var/unbound/unbound.conf root 39523 0.0 0.0 10844 0 - IWs - 0:00.00 /usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.leases -d lab.jimp.pw -p /var/run/unbound.pid -u /var/unbound/dhcpleases_entries.conf -h /etc/hosts root 34630 0.0 0.4 10988 1940 0 R+ 16:00 0:00.00 grep unbound : kill -HUP `cat /var/run/unbound.pid` : ps uxaww | grep unbound unbound 36369 2.0 4.6 51836 20828 - Ss 14:05 0:02.03 /usr/local/sbin/unbound -c /var/unbound/unbound.conf root 39523 0.0 0.0 10844 0 - IWs - 0:00.00 /usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.leases -d lab.jimp.pw -p /var/run/unbound.pid -u /var/unbound/dhcpleases_entries.conf -h /etc/hosts root 20156 0.0 0.4 11036 1956 0 R+ 16:01 0:00.00 grep unbound : kill -1 `cat /var/run/unbound.pid` : ps uxaww | grep unbound unbound 36369 2.0 5.1 53956 23404 - Ss 14:05 0:02.18 /usr/local/sbin/unbound -c /var/unbound/unbound.conf root 39523 0.0 0.0 10844 0 - IWs - 0:00.00 /usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.leases -d lab.jimp.pw -p /var/run/unbound.pid -u /var/unbound/dhcpleases_entries.conf -h /etc/hosts root 21124 0.0 0.5 11192 2064 0 S+ 16:01 0:00.00 grep unbound
Perhaps there is something specific to your unbound configuration that is making it happen (custom options? pfblocker? python module?)
Try to narrow it down more.
-
@jimp said in pfSense 2.50 snapshots have been dying for the past couple of days:
killall -HUP unbound
Hummmm. Sounds great.
For my own curiosity, I'll check upon unbound's source what's its doing with a HUP. The "2.4.5p1 version" = 1.10.1 wa simply restarting itself. This could have - now very known - consequences.This is a strong signal for me to try 2.5.0.xxxxx
And true : it's not only "dhcpleases"that can restart unbound.
If dhcpleases still exists under 2.5.0....
And if so, is it sending a HUP to unbound ?
Etc. -
@gertjan said in pfSense 2.50 snapshots have been dying for the past couple of days:
This is a strong signal for me to try 2.5.0.xxxxx
There are many things looking forward to in 2.5 ;) But I am just going to wait til release. I am going to go with clean install.. Since going to take the opportunity of moving my sg4860 to zfs..
But yeah unbound updates, able to do dhcp registrations without issues will be big for many users. I personally am looking forward to the openssl update and gui using tls 1.3.. Plus many more.. 2.5 has lots of good stuff in it..
Was just looking over the release notes again - hadn't noticed the usb tethering.. So seems be easy enough to plug my IPhone in and have network wide internet... That will be slick on power outages.. Where my network is still up via UPS.. Or when ISP goes out..
Yeah 2.5 looks very nice.. Well worth the wait ;)
-
@gertjan Did you ever figure this out? I just updated to 2.5.0-RELEASE and started having this issue (DNS was completely stable before the update). I've resorted to having Service Watchdog restart it when it goes down as a temporary measure.
I do not have "Enable registration of DHCP client names in DNS." enabled for either DHCP or DHCPv6.
-
@salander27-0 said in pfSense 2.50 snapshots have been dying for the past couple of days:
Did you ever figure this out?
Noop, sorry.
Had other occupations. -
@salander27-0 I just upgraded and I'm having the same problem
I have both "Register DHCP leases in the DNS Resolver" and "Register DHCP static mappings in the DNS Resolver" enabledEdit: Enabled watchdog, too. This is ridiculous, hope it gets fixed for real very soon!
-
@fry-kun Ah, I have both of those settings enabled too. I was thinking "Enable registration of DHCP client names in DNS" was what the above posters were referring to but I was mistaken.
The issue certainly seems to be DHCP-related. I had 4 crashes in a 10 minute span and then upped the DHCP lease time from 15 minutes to 6 hours and haven't seen any crashes yet (though I would expect a few in a few hours once the initial 6hr leases expire and get renewed). You may wish to increase your lease time as well just to help reduce the crash frequency.
-
@salander27-0 said in pfSense 2.50 snapshots have been dying for the past couple of days:
though I would expect a few in a few hours once the initial 6hr leases expire and get renewed
A lease gets renewed after half the duration of the lease.
A 15 minutes lease will get renewed after 7 minutes.
Known OS's like Windows, MAC etc are set up ask for leases lasting a day or two.Why 15 minutes ??
Although, pfSense - the DHCP server - should handle that just fine.