Unbound crashes periodically with signal 11
-
@jkv said in pfSense 2.50 snapshots have been dying for the past couple of days:
I see unbound stopping
&
@jkv said in pfSense 2.50 snapshots have been dying for the past couple of days:
General System log that unbound exited on signal 11
You see it dying.
You use Service_Watchdog to restart it, right?
@jkv said in pfSense 2.50 snapshots have been dying for the past couple of days:
pfBlockerNG-devel (3.0.0_10).
How often does pfBlockerNG-devel run its cron task? This task is logged. Does it restart unbound?
What happens when you stop Service_Watchdog, so it doesn't restart unbound?
What I'm trying to find out: if Service_Watchdog detects that unbound has stopped, it launches another instance. But unbound was actually just being stopped and restarted on the orders of pfBlockerNG-devel. So two instances are started, and one dies...
This is just a theory, as I'm not using Service_Watchdog myself.
Also, the SG-5100 is an Intel-based machine, so you and I are using the same executable / same binary. Only our config differs. I don't know anything about the ARM-based binaries, but I tend to say the Intel ones are pretty solid.
These:
@jkv said in pfSense 2.50 snapshots have been dying for the past couple of days:
have some Static DHCP entries
do nothing to unbound. The static DHCP settings (host name to IP relation) are copied into the /etc/hosts file during boot. This file is (also) read by unbound during its initial startup. These static DHCP settings rarely change, that is, only if you delete/modify/add one. Look at this file, you'll see what I mean.
(In the past) the "DHCP Registration / Register DHCP leases in the DNS Resolver" option could be problematic. The static DHCP settings were never a source of issues.
-
The cron job for pfBlockerNG-devel runs hourly, and there does not appear to be any correlation between this cron job and unbound exiting. I will do some testing with Service_Watchdog disabled to see what happens to unbound.
-
I doubt that service watchdog is the cause of the issue. It wasn't even present on my installation until I installed it so I wouldn't have to manually restart unbound after the crashes.
-
If there was a way for me to get a testing version of pfSense with Unbound 1.13.1 I would be more than happy to install that promptly and give feedback as to whether or not it is helpful at dealing with the issue.
Also, can we get the title of this forum post updated to something like "DNS Resolver/Unbound crashing on pfSense 2.5" so that we can attract the attention of anyone else searching for this issue?
-
@salander27-0 said in pfSense 2.50 snapshots have been dying for the past couple of days:
If there was a way for me to get a testing version of pfSense with Unbound 1.13.1 I would be more than happy to install that promptly and give feedback as to whether or not it is helpful at dealing with the issue.
We brought it in for snapshots (2.6.0 in the branch choices) but a new one hasn't built yet which includes it. In theory the branches are close enough at the moment you may be able to manually install the pkg archive file from the snapshot repo without much harm.
-
I edited the title of the thread to more accurately describe the issue.
It would also be helpful to know the hardware in the cases where this is happening (e.g. SG-3100, SG-5100, whitebox/custom hardware running CE, etc)
-
@jimp What is the repo URL for the snapshot repo that I can find that updated package in? I checked
pkg+https://packages-beta.netgate.com/packages/pfSense_master_amd64-core
and
pkg+https://packages-beta.netgate.com/packages/pfSense_master_amd64-pfSense_devel
and both still had unbound-1.13.0_2.
-
@jimp In my case, it's a custom Mini-ITX box I made with a Gigabyte B-150N motherboard (dual gigabit Intel NIC), and this is what the dashboard says about it:
CPU Type: Intel(R) Celeron(R) CPU G3900 @ 2.80GHz
2 CPUs: 1 package(s) x 2 core(s)
AES-NI CPU Crypto: Yes (active)
Hardware crypto: AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS
Kernel PTI: Enabled
It's been running pfSense successfully for more years than I remember.
I'm getting occasional unbound crashes, and turned on the watchdog to restart the service when it dies.
ETA: I'm running the Community Edition.
-
@salander27-0 said in Unbound crashes periodically with signal 11:
@jimp What is the repo URL for the snapshot repo that I can find that updated package in? I checked
pkg+https://packages-beta.netgate.com/packages/pfSense_master_amd64-core
and
pkg+https://packages-beta.netgate.com/packages/pfSense_master_amd64-pfSense_devel
and both still had unbound-1.13.0_2.
In my previous reply I said "but a new one hasn't built yet which includes it." Check later tonight/tomorrow AM.
-
@jimp I am running pfSense CE inside a Proxmox (6.2-10) VM on a Qotom-Q555G6-S05 (i5 7200u).
I only installed the service watchdog package after this issue started occurring as suggested earlier on this thread. In the meantime, I have reverted to a backup of my VM pre-update running pfSense 2.4.5-1.
-
@jimp Sorry, I misunderstood what you were saying. I'll check for a built package later.
Also, looks like people are posting on Reddit too.
-
This was happening to me as well. I unchecked "DHCP registration" in the DNS Resolver config and for now it has eliminated the crash.
There was an issue before with this setting triggering an "unable to HUP" type error report, but I don't recall it causing a crash.
-
I got tired of seeing delayed DNS queries (because watchdog doesn't restart the service immediately), so I'm currently running a bash loop:
while true; do /usr/local/sbin/unbound -vd -c /var/unbound/unbound.conf; done
The -v flag makes Unbound print a message while starting, so I have a record of all restarts for the last 3 hours. Here are the numbers of seconds between them (in case it helps):
249
213
1982
266
143
45
970
647
1312
4065
174
60
It doesn't seem to be consistent in my case, but I also have a number of devices on my network; maybe more devices make things noisier?
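For anyone who wants to collect the same kind of numbers, here is a minimal sketch of one way to do it. It assumes each restart is recorded as an epoch timestamp (one per line) in a file; the file name /tmp/unbound_starts.log and the helper name restart_gaps are made up for illustration and are not part of pfSense or Unbound.

```shell
# Print the gap in seconds between consecutive epoch timestamps
# read from stdin (one timestamp per line).
restart_gaps() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# Hypothetical way to record the timestamps with a restart loop like the
# one above (the log path is an assumption, pick any writable location):
#   while true; do
#       date +%s >> /tmp/unbound_starts.log
#       /usr/local/sbin/unbound -vd -c /var/unbound/unbound.conf
#   done
#
# Then, from another shell:
#   restart_gaps < /tmp/unbound_starts.log
```

Keeping the raw timestamps rather than only the gaps also makes it possible to line the restarts up against other logs later.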
-
@jimp Do you still want more configuration examples?
Here's mine:
Packages:
Acme, Avahi, RRD Summary, Service_Watchdog, Shellcmd, System_Patches

server:
chroot: /var/unbound
username: "unbound"
directory: "/var/unbound"
pidfile: "/var/run/unbound.pid"
use-syslog: yes
port: 53
verbosity: 1
hide-identity: yes
hide-version: yes
harden-glue: yes
do-ip4: yes
do-ip6: no
do-udp: yes
do-tcp: yes
do-daemonize: yes
module-config: "validator iterator"
unwanted-reply-threshold: 0
num-queries-per-thread: 512
jostle-timeout: 200
infra-host-ttl: 900
infra-cache-numhosts: 10000
outgoing-num-tcp: 10
incoming-num-tcp: 10
edns-buffer-size: 4096
cache-max-ttl: 86400
cache-min-ttl: 0
harden-dnssec-stripped: yes
msg-cache-size: 4m
rrset-cache-size: 8m
num-threads: 4
msg-cache-slabs: 4
rrset-cache-slabs: 4
infra-cache-slabs: 4
key-cache-slabs: 4
outgoing-range: 4096
auto-trust-anchor-file: /var/unbound/root.key
prefetch: no
prefetch-key: no
use-caps-for-id: no
serve-expired: no
aggressive-nsec: no
statistics-interval: 0
extended-statistics: yes
statistics-cumulative: yes
tls-cert-bundle: "/etc/ssl/cert.pem"
tls-port: 853
tls-service-pem: "/var/unbound/sslcert.crt"
tls-service-key: "/var/unbound/sslcert.key"
interface: 192.168.2.1
interface: 192.168.2.1@853
interface: 192.168.6.1
interface: 192.168.6.1@853
interface: 192.168.4.1
interface: 192.168.4.1@853
interface: 192.168.8.1
interface: 192.168.8.1@853
interface: fe80::201:2eff:fe78:9c5f%re1
interface: fe80::201:2eff:fe78:9c5f%re1@853
interface: fe80::201:2eff:fe78:9c5f%re1.6
interface: fe80::201:2eff:fe78:9c5f%re1.6@853
interface: fe80::201:2eff:fe78:9c5f%re1.4
interface: fe80::201:2eff:fe78:9c5f%re1.4@853
interface: fe80::201:2eff:fe78:9c5f%re1.8
interface: fe80::201:2eff:fe78:9c5f%re1.8@853
interface: 127.0.0.1
interface: 127.0.0.1@853
interface: ::1
interface: ::1@853
outgoing-interface: <*** REDACTED ***>
outgoing-interface: <*** REDACTED ***>
private-address: 127.0.0.0/8
private-address: 10.0.0.0/8
private-address: ::ffff:a00:0/104
private-address: 172.16.0.0/12
private-address: ::ffff:ac10:0/108
private-address: 169.254.0.0/16
private-address: ::ffff:a9fe:0/112
private-address: 192.168.0.0/16
private-address: ::ffff:c0a8:0/112
private-address: fd00::/8
private-address: fe80::/10
include: /var/unbound/access_lists.conf
include: /var/unbound/host_entries.conf
include: /var/unbound/dhcpleases_entries.conf
include: /var/unbound/domainoverrides.conf
forward-zone:
name: "."
forward-addr: 8.8.8.8
forward-addr: 8.8.4.4
server:
log-servfail: yes
private-domain: "pfsense.mydomain.com"
include: /var/unbound/remotecontrol.conf

Both "Register DHCP leases in the DNS Resolver" and "Register DHCP static mappings in the DNS Resolver" are enabled.
Lease time is default (24hrs, I think?)
-
I also had the "Register DHCP leases in the DNS Resolver" option enabled when seeing this issue, and disabling this option appears to have stopped the crash.
-
@jkv +1
No packages at all...
-
@fry-kun I suspect you're seeing crashes approximately whenever certain devices renew their DHCP lease. Since the timing of a renewal has more to do with when that device was turned on, you would expect to see somewhat random delays between such renewals.
Question for those experiencing these crashes, do you have both "Register DHCP static mappings in the DNS Resolver" enabled AND DHCP Static Mappings where at least one mapping has something filled in in the hostname field?
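If anyone wants to check the lease-renewal suspicion against their own logs, a rough sketch follows. It assumes the system log is a plain-text file containing the kernel's "exited on signal 11" message for unbound; the function name unbound_crash_lines and the log paths in the comments are assumptions and may differ on your install.

```shell
# List log lines that record unbound exiting on signal 11.
# $1: path to a plain-text system log
#     (assumed to be /var/log/system.log; adjust if yours differs).
unbound_crash_lines() {
    grep -E 'unbound.*exited on signal 11' "$1"
}

# Example, run on the firewall:
#   unbound_crash_lines /var/log/system.log
# Then compare the timestamps by eye with DHCP activity, for instance
# (log path and message text are assumptions about the DHCP server log):
#   grep DHCPACK /var/log/dhcpd.log
```

If the crash timestamps consistently fall a second or two after a DHCPACK, that would support the theory; if they don't, it is probably something else.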
-
@salander27-0 said in Unbound crashes periodically with signal 11:
Question for those experiencing these crashes, do you have both "Register DHCP static mappings in the DNS Resolver" enabled AND DHCP Static Mappings where at least one mapping has something filled in in the hostname field?
Affirmative
-
@salander27-0 Yes, same here on my system. Both were enabled, and I have quite a few manual mappings. After the upgrade to 2.5, unbound seemed to crash about every ten minutes; after disabling the registration of DHCP mappings, it stopped crashing.
My System is based on a standalone AMD GX-412TC SOC having run pfSense stable for years now.
-
@salander27-0 I can confirm this as well; I have this option set with a handful of static mappings with a hostname specified.