Unbound crashes periodically with signal 11
-
I edited the title of the thread to more accurately describe the issue.
It would also be helpful to know the hardware in the cases where this is happening (e.g. SG-3100, SG-5100, whitebox/custom hardware running CE, etc)
-
@jimp What is the repo URL for the snapshot repo that I can find that updated package in? I checked
pkg+https://packages-beta.netgate.com/packages/pfSense_master_amd64-core
and pkg+https://packages-beta.netgate.com/packages/pfSense_master_amd64-pfSense_devel
and both still had unbound-1.13.0_2.
-
@jimp In my case, it's a custom Mini-ITX box I made with a Gigabyte B-150N motherboard (dual gigabit Intel NIC), and this is what the dashboard says about it:
CPU Type: Intel(R) Celeron(R) CPU G3900 @ 2.80GHz
2 CPUs: 1 package(s) x 2 core(s)
AES-NI CPU Crypto: Yes (active)
Hardware crypto: AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS
Kernel PTI: Enabled
It's been running pfSense successfully for more years than I remember.
I'm getting occasional unbound crashes, and turned on the watchdog to restart the service when it dies.
ETA: I'm running the Community Edition.
-
@salander27-0 said in Unbound crashes periodically with signal 11:
@jimp What is the repo URL for the snapshot repo that I can find that updated package in? I checked
pkg+https://packages-beta.netgate.com/packages/pfSense_master_amd64-core
and pkg+https://packages-beta.netgate.com/packages/pfSense_master_amd64-pfSense_devel
and both still had unbound-1.13.0_2.
In my previous reply I said "but a new one hasn't built yet which includes it." Check again later tonight or tomorrow morning.
-
@jimp I am running pfSense CE inside a Proxmox (6.2-10) VM on a Qotom-Q555G6-S05 (i5 7200u).
I only installed the service watchdog package after this issue started occurring as suggested earlier on this thread. In the meantime, I have reverted to a backup of my VM pre-update running pfSense 2.4.5-1.
-
@jimp Sorry, I misunderstood what you were saying. I'll check for a built package later.
Also, looks like people are posting on Reddit too.
-
This was happening to me as well. I unchecked "DHCP registration" in the DNS Resolver config and for now it has eliminated the crash.
There was an issue before with this setting triggering an "unable to HUP" type error report, but I don't recall it causing a crash.
-
I got tired of seeing delayed DNS queries (because watchdog doesn't restart the service immediately), so I'm currently running a bash loop:
while true; do /usr/local/sbin/unbound -vd -c /var/unbound/unbound.conf; done
The -v flag makes Unbound print a message on startup, so I have a record of every restart over the last 3 hours. Here are the intervals between them in seconds (in case it helps):
249
213
1982
266
143
45
970
647
1312
4065
174
60
It doesn't seem to be consistent in my case, but I also have a number of devices on my network; maybe more devices make things noisier?
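The loop and the interval list above could be combined so the restart times are recorded automatically. This is only a sketch: the function names and the timestamp file argument are made up for illustration, while the unbound command line is the one from the post.

```sh
#!/bin/sh
# Hypothetical helper: append the current epoch time to the file given as $1
# before every (re)start of Unbound in the foreground. Runs until interrupted.
run_unbound_loop() {
    while true; do
        date +%s >> "$1"
        /usr/local/sbin/unbound -vd -c /var/unbound/unbound.conf
    done
}

# Hypothetical helper: read one epoch timestamp per line on stdin and print
# the number of seconds between each consecutive pair.
restart_intervals() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}
```

For example, `run_unbound_loop /tmp/unbound_starts.txt` in one shell and `restart_intervals < /tmp/unbound_starts.txt` in another would give a list like the one above (the file path is arbitrary).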
-
@jimp Do you still want more configuration examples?
Here's mine:
Packages:
Acme, Avahi, RRD Summary, Service_Watchdog, Shellcmd, System_Patches
server:
chroot: /var/unbound
username: "unbound"
directory: "/var/unbound"
pidfile: "/var/run/unbound.pid"
use-syslog: yes
port: 53
verbosity: 1
hide-identity: yes
hide-version: yes
harden-glue: yes
do-ip4: yes
do-ip6: no
do-udp: yes
do-tcp: yes
do-daemonize: yes
module-config: "validator iterator"
unwanted-reply-threshold: 0
num-queries-per-thread: 512
jostle-timeout: 200
infra-host-ttl: 900
infra-cache-numhosts: 10000
outgoing-num-tcp: 10
incoming-num-tcp: 10
edns-buffer-size: 4096
cache-max-ttl: 86400
cache-min-ttl: 0
harden-dnssec-stripped: yes
msg-cache-size: 4m
rrset-cache-size: 8m
num-threads: 4
msg-cache-slabs: 4
rrset-cache-slabs: 4
infra-cache-slabs: 4
key-cache-slabs: 4
outgoing-range: 4096
auto-trust-anchor-file: /var/unbound/root.key
prefetch: no
prefetch-key: no
use-caps-for-id: no
serve-expired: no
aggressive-nsec: no
statistics-interval: 0
extended-statistics: yes
statistics-cumulative: yes
tls-cert-bundle: "/etc/ssl/cert.pem"
tls-port: 853
tls-service-pem: "/var/unbound/sslcert.crt"
tls-service-key: "/var/unbound/sslcert.key"
interface: 192.168.2.1
interface: 192.168.2.1@853
interface: 192.168.6.1
interface: 192.168.6.1@853
interface: 192.168.4.1
interface: 192.168.4.1@853
interface: 192.168.8.1
interface: 192.168.8.1@853
interface: fe80::201:2eff:fe78:9c5f%re1
interface: fe80::201:2eff:fe78:9c5f%re1@853
interface: fe80::201:2eff:fe78:9c5f%re1.6
interface: fe80::201:2eff:fe78:9c5f%re1.6@853
interface: fe80::201:2eff:fe78:9c5f%re1.4
interface: fe80::201:2eff:fe78:9c5f%re1.4@853
interface: fe80::201:2eff:fe78:9c5f%re1.8
interface: fe80::201:2eff:fe78:9c5f%re1.8@853
interface: 127.0.0.1
interface: 127.0.0.1@853
interface: ::1
interface: ::1@853
outgoing-interface: <*** REDACTED ***>
outgoing-interface: <*** REDACTED ***>
private-address: 127.0.0.0/8
private-address: 10.0.0.0/8
private-address: ::ffff:a00:0/104
private-address: 172.16.0.0/12
private-address: ::ffff:ac10:0/108
private-address: 169.254.0.0/16
private-address: ::ffff:a9fe:0/112
private-address: 192.168.0.0/16
private-address: ::ffff:c0a8:0/112
private-address: fd00::/8
private-address: fe80::/10
include: /var/unbound/access_lists.conf
include: /var/unbound/host_entries.conf
include: /var/unbound/dhcpleases_entries.conf
include: /var/unbound/domainoverrides.conf
forward-zone:
name: "."
forward-addr: 8.8.8.8
forward-addr: 8.8.4.4
server:
log-servfail: yes
private-domain: "pfsense.mydomain.com"
include: /var/unbound/remotecontrol.conf
Both "Register DHCP leases in the DNS Resolver" and "Register DHCP static mappings in the DNS Resolver" are enabled.
Lease time is default (24hrs, I think?)
-
I also had the "Register DHCP leases in the DNS Resolver" option enabled when seeing this issue, and disabling this option appears to have stopped the crash.
-
@jkv +1
No packages at all...
-
@fry-kun I suspect you're seeing crashes approximately whenever certain devices renew their DHCP lease. Since the timing of this has more to do with when that device was turned on you would expect to see somewhat random delays between such renewals.
Question for those experiencing these crashes, do you have both "Register DHCP static mappings in the DNS Resolver" enabled AND DHCP Static Mappings where at least one mapping has something filled in in the hostname field?
-
@salander27-0 said in Unbound crashes periodically with signal 11:
Question for those experiencing these crashes, do you have both "Register DHCP static mappings in the DNS Resolver" enabled AND DHCP Static Mappings where at least one mapping has something filled in in the hostname field?
Affirmative
-
@salander27-0 Yes, same here on my system. Both were enabled, and I have quite a few manual mappings. After the upgrade to 2.5, unbound seemed to crash about every ten minutes; after disabling DHCP mapping registration it stopped crashing.
My system is based on a standalone AMD GX-412TC SoC that has run pfSense stably for years.
-
@salander27-0 I can confirm this as well, I have this option set with a handful of static mappings with hostname specified.
-
confirmed
-
@jimp I have successfully downloaded and installed unbound-1.13.1 from the devel repo now that the package has been built. No crashes yet, but none of the leases for my static mappings (with hostname) have expired yet, so I wouldn't expect crashes yet. I will update in a few hours once those leases start expiring.
OBSOLETE PLEASE SEE https://forum.netgate.com/post/966915
For anyone else who wishes to try unbound 1.13.1 (normal caveats about this being unsupported and at your own risk) I simplified the install command to the following:
pkg add -f https://files01.netgate.com/packages/pfSense_master_amd64-pfSense_devel/All/unbound-1.13.1.txz
You can roll back with:
pkg install -f unbound
After either of these commands you will need to restart the unbound service to pick up on the new binary.
OBSOLETE PLEASE SEE https://forum.netgate.com/post/966915
-
@salander27-0 been running 1.13.1 for ~2hrs, no crashes yet!
1.13.0 crashed way more often, as evidenced in my earlier message.
-
Alright, it's been about 2 and a half hours since I installed unbound-1.13.1 on my system. I have stress-tested it by reducing the DHCP lease time to 120 seconds and have since seen hundreds of DHCP renewals (and subsequent unbound HUPs). Were this unbound 1.13.0 I would have likely seen dozens of crashes, however unbound 1.13.1 has been completely stable in that time.
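To put a rough number on a stress test like this, one option is to count Unbound's startup messages in the resolver log. The sketch below assumes the log lines look like the "start of service (unbound 1.13.0)" line quoted elsewhere in this thread; the function name is hypothetical, and the log path in the usage note is an assumption that may differ per install.

```sh
#!/bin/sh
# Hypothetical helper: read resolver log lines on stdin and print how many
# of them are Unbound "start of service" messages.
count_unbound_starts() {
    awk '/info: start of service \(unbound/ { n++ } END { print n + 0 }'
}
```

Usage might look like `count_unbound_starts < /var/log/resolver.log`. The count likely includes clean reloads as well as restarts after a crash, so treat it as an indicator of activity rather than a crash count.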
At this point I believe the best course of action is for additional people to test to see if their issue is resolved with the updated version. If you are willing to take system backups and accept the (IMO very low) risk that you may need to reinstall pfSense completely if something goes wrong (so don't do this on your prod systems) then please follow the instructions in my previous comment to install unbound-1.13.1.
-
@salander27-0 I'm seeing results similar to what you report. DHCP registration is turned back on, and no crashes so far. 1.13.1 seems to be an upstream solution.
-
I had a stable unbound service again after flushing all DHCP leases.
SOLUTION (in my case): FLUSH DHCP Leases
Details/ Follow-up is here:
https://forum.netgate.com/topic/161092/2-5-0-dns-service-stopping-randomly/5?_=1613861976462
(Sorry if we opened a very similar thread to this topic.)
-
@salander27-0 Thanks! This was really helpful. Seems to be working perfectly now.
-
@khuynh Very well. Glad to help. Hit "like" on the solution and spread the news.
-
@salander27-0 Thanks! That fixed it. I forced short leases to cause a lot of renewals, and since I installed unbound-1.13.1, I've had no crashes going on 18 hours now.
-
@fivetoedslothbear Yeah, I'm just past 24 hours myself without any crashes.
Also, to anyone who installed 1.13.1 please continue to follow this thread as you may need to manually install the patched unbound from the stable repositories if/when a patched version is pushed.
-
@salander27-0 Been having the same issue for the last 24 hours -- will try this solution and see if it works for me. I appreciate it mate. Have a good one.
-
I see that on two 2.5.0 CE machines as well.
I have disabled "Register DHCP leases in the DNS Resolver" for now and will keep an eye on it.
-
@salander27-0: I tried this. Will report back. However, I never had "Register DHCP leases in the DNS Resolver" set.
-
Since the new version of Unbound fixes it, it's unlikely to actually be related to just that one setting (DHCP lease registration), but that is the fastest way to trigger it for some people.
I imagine others are/were hitting it as well in different ways. So there isn't a need to keep tracking potential causes now that we know the upgrade fixes it.
-
@jimp So is the plan then to push 1.13.1 to the stable repo or to try to bisect through the 1.13.1 release in order to find out which patch specifically fixes the issue and just apply that patch to the version in stable?
-
@salander27-0 said in Unbound crashes periodically with signal 11:
@jimp So is the plan then to push 1.13.1 to the stable repo or to try to bisect through the 1.13.1 release in order to find out which patch specifically fixes the issue and just apply that patch to the version in stable?
We'll bring in 1.13.1, there isn't a compelling case to do all the legwork to pick in partial changes at this point. 1.13.1 is a minor patch/bug fix release and the impact is low other than the fix for this which is highly beneficial.
-
@jimp Speak of the devil, looks like it's already been added to the stable repos.
For anyone coming into this thread now, you can run the following command to pull unbound 1.13.1 from the stable/2.5.x repo (I am unsure of how to update system packages from the UI, hopefully someone can chime in there):
pkg upgrade -f unbound
(you should see that it is installing unbound-1.13.1)
Make sure to restart unbound after this package installation.
It is probably a good idea for those who have installed the devel package to do this as well, just to ensure that your systems are not a mix of devel and stable packages.
-
There isn't a good way to do that from the GUI, but you could run
pkg upgrade -fy unbound
from Diagnostics > Command Prompt. Be sure to restart the Unbound service from Status > Services after.
From the CLI the easiest way to do both is
pkg upgrade -fy unbound; pfSsh.php playback svc restart unbound
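The one-liner above can be wrapped in a small check so that it is obvious when the upgrade did not actually deliver a newer build. This is a rough sketch rather than an official procedure: both function names are made up for illustration, and it assumes `pkg query %v unbound` reports the installed package version on your system.

```sh
#!/bin/sh
# Hypothetical helper: succeeds (returns 0) when the version string is the
# 1.13.0 build reported as crashing in this thread, e.g. "1.13.0_2".
is_affected_version() {
    case "$1" in
        1.13.0|1.13.0_*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper around the two commands quoted above. It is only
# defined here, not called; run it by hand on the firewall.
upgrade_and_restart_unbound() {
    pkg upgrade -fy unbound || return 1
    installed=$(pkg query %v unbound)
    if is_affected_version "$installed"; then
        echo "unbound is still $installed; the repo may not carry 1.13.1" >&2
        return 1
    fi
    pfSsh.php playback svc restart unbound
}
```

The version test is a plain string match on the 1.13.0 builds named in this thread, not a general version comparison.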
-
@jimp : I can confirm that the new version fixes it for us.
-
@salander27-0 fixed it for me too... thank you
-
thank you Jim.
This open and direct communication is really awesome!
-
Fixed for me too, 24hrs with no unbound restarts...
-
Hi
After installing unbound-1.13.1 my unbound is still restarting from time to time.
My system has been unstable for a long time, but I haven't had the time to dig into it. Last week I figured out that it was unbound restarting, so I updated to version 2.5, but I'm still having issues.
I have "DHCP Registration", "Static DHCP" and "OpenVPN Clients" enabled in my DNS Resolver settings.
I will try to test further and report back
Kind regards
Jens M. Kofoed
-
@salander27-0 said in Unbound crashes periodically with signal 11:
(you should see that it is installing unbound-1.13.1)
hmmm - running 21.02, figured hey why not upgrade unbound. Even though not seeing this issue.. But I don't see it updating to 1.13.1
[21.02-RELEASE][admin@sg4860.local.lan]/root: pkg upgrade -fy unbound
Updating pfSense-core repository catalogue...
pfSense-core repository is up to date.
Updating pfSense repository catalogue...
pfSense repository is up to date.
All repositories are up to date.
The following 1 package(s) will be affected (of 0 checked):
Installed packages to be REINSTALLED:
unbound-1.13.0_2 [pfSense]
Number of packages to be reinstalled: 1
1 MiB to be downloaded.
[1/1] Fetching unbound-1.13.0_2.txz: 100% 1 MiB 1.2MB/s 00:01
Checking integrity... done (0 conflicting)
[1/1] Reinstalling unbound-1.13.0_2...
===> Creating groups.
Using existing group 'unbound'.
===> Creating users
Using existing user 'unbound'.
[1/1] Extracting unbound-1.13.0_2: 100%
[21.02-RELEASE][admin@sg4860.local.lan]/root: pfSsh.php playback svc restart unbound
Attempting to issue restart to unbound service...
unbound has been restarted.
[21.02-RELEASE][admin@sg4860.local.lan]/root:
Upon restart and looking in the log, it still seems to be 1.13.0
Feb 23 08:51:43 unbound 90907 [90907:0] info: start of service (unbound 1.13.0).
edit:
Didn't seem to update to 1.13.1
[21.02-RELEASE][admin@sg4860.local.lan]/root: unbound-control -c /var/unbound/unbound.conf status
version: 1.13.0
verbosity: 1
threads: 4
modules: 2 [ validator iterator ]
uptime: 502 seconds
options: control(ssl)
unbound (pid 79734) is running...
[21.02-RELEASE][admin@sg4860.local.lan]/root:
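For anyone wanting to script the same check, the running version can be pulled out of the status output shown above. This is a minimal sketch; the function name is hypothetical, and it assumes the status output has a "version:" line as in the post.

```sh
#!/bin/sh
# Hypothetical helper: read `unbound-control status` output on stdin and
# print the value from the first "version:" line.
running_unbound_version() {
    awk '$1 == "version:" { print $2; exit }'
}
```

It would be used as `unbound-control -c /var/unbound/unbound.conf status | running_unbound_version`. Because this reports what the running daemon says rather than what the package database says, it also catches the case where a new package was installed but the service was never restarted.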
-
@jimp Sadly, this (upgrading unbound to 1.13.1) does NOT fix the issue on my system. Disabling DHCP lease registration fixed the restarts, but upgrading the package to 1.13.1 (then restarting and re-enabling DHCP lease registration) brought the old behaviour back (unbound restarting about every 10 minutes in my case).
So 1.13.1 does not seem to fully fix the problem on all systems.
UPDATE: Disabling DHCP lease registration again with unbound 1.13.1 fixes it again.