2.5.0 DNS Service stopping randomly
-
Hi there,
just did the upgrade to 2.5 and my DNS resolver is stopping always around 2-3h after manual starting.Anyone facing same issues? How can I activate the debug log for unbound - was not able to find this on the first view..
I am not very experienced with unbound - so looking forward for your help :-)Here are the latest logs:
Last 50 DNS Resolver Log Entries. (Maximum 50) Feb 19 15:22:12 unbound 73198 [73198:0] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0 Feb 19 15:22:12 unbound 73198 [73198:0] info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 15:22:12 unbound 73198 [73198:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0 Feb 19 15:22:12 unbound 73198 [73198:0] info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 15:22:12 unbound 73198 [73198:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0 Feb 19 15:22:12 unbound 73198 [73198:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 15:22:12 unbound 73198 [73198:0] info: service stopped (unbound 1.13.0). Feb 19 15:21:44 unbound 73198 [73198:0] info: start of service (unbound 1.13.0). Feb 19 15:21:44 unbound 73198 [73198:0] notice: init module 0: iterator Feb 19 15:21:44 unbound 73198 [73198:0] notice: Restart of unbound 1.13.0. Feb 19 15:21:44 unbound 73198 [73198:0] info: 0.000000 0.000001 4 Feb 19 15:21:44 unbound 73198 [73198:0] info: lower(secs) upper(secs) recursions Feb 19 15:21:44 unbound 73198 [73198:0] info: [25%]=2.5e-07 median[50%]=5e-07 [75%]=7.5e-07 Feb 19 15:21:44 unbound 73198 [73198:0] info: histogram of recursion processing times Feb 19 15:21:44 unbound 73198 [73198:0] info: average recursion processing time 0.000000 sec Feb 19 15:21:44 unbound 73198 [73198:0] info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0 Feb 19 15:21:44 unbound 73198 [73198:0] info: server stats for thread 3: 33 queries, 29 answers from cache, 4 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 15:21:44 unbound 73198 [73198:0] info: 0.000000 0.000001 7 Feb 19 15:21:44 unbound 73198 [73198:0] info: lower(secs) upper(secs) recursions Feb 19 15:21:44 unbound 73198 [73198:0] info: [25%]=2.5e-07 median[50%]=5e-07 [75%]=7.5e-07 Feb 19 15:21:44 unbound 73198 [73198:0] info: histogram of recursion processing times Feb 19 15:21:44 unbound 73198 [73198:0] info: average recursion processing time 0.000000 sec Feb 19 15:21:44 unbound 73198 [73198:0] info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0 Feb 19 15:21:44 unbound 73198 [73198:0] info: server stats for thread 2: 18 queries, 11 answers from cache, 7 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 15:21:44 unbound 73198 [73198:0] info: 0.000000 0.000001 3 Feb 19 15:21:44 unbound 73198 [73198:0] info: lower(secs) upper(secs) recursions Feb 19 15:21:44 unbound 73198 [73198:0] info: [25%]=0 median[50%]=0 [75%]=0 Feb 19 15:21:44 unbound 73198 [73198:0] info: histogram of recursion processing times Feb 19 15:21:44 unbound 73198 [73198:0] info: average recursion processing time 0.000000 sec Feb 19 15:21:44 unbound 73198 [73198:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0 Feb 19 15:21:44 unbound 73198 [73198:0] info: server stats for thread 1: 5 queries, 2 answers from cache, 3 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 15:21:44 unbound 73198 [73198:0] info: 0.000000 0.000001 16 Feb 19 15:21:44 unbound 73198 [73198:0] info: lower(secs) upper(secs) recursions Feb 19 15:21:44 unbound 73198 [73198:0] info: [25%]=2.5e-07 median[50%]=5e-07 [75%]=7.5e-07 Feb 19 15:21:44 unbound 73198 [73198:0] info: histogram of recursion processing times Feb 19 15:21:44 unbound 73198 [73198:0] info: average recursion processing time 0.000000 sec Feb 19 15:21:44 unbound 73198 [73198:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0 Feb 19 15:21:44 unbound 73198 [73198:0] info: server stats for thread 0: 66 queries, 50 answers from cache, 16 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 15:21:44 unbound 73198 [73198:0] info: service stopped (unbound 1.13.0). Feb 19 15:17:55 unbound 73198 [73198:0] info: start of service (unbound 1.13.0). Feb 19 15:17:55 unbound 73198 [73198:0] notice: init module 0: iterator Feb 19 15:17:55 unbound 73198 [73198:0] notice: Restart of unbound 1.13.0. Feb 19 15:17:55 unbound 73198 [73198:0] info: 64.000000 128.000000 11 Feb 19 15:17:55 unbound 73198 [73198:0] info: 32.000000 64.000000 3 Feb 19 15:17:55 unbound 73198 [73198:0] info: 16.000000 32.000000 1 Feb 19 15:17:55 unbound 73198 [73198:0] info: lower(secs) upper(secs) recursions Feb 19 15:17:55 unbound 73198 [73198:0] info: [25%]=61.3333 median[50%]=84.3636 [75%]=106.182 Feb 19 15:17:55 unbound 73198 [73198:0] info: histogram of recursion processing times Feb 19 15:17:55 unbound 73198 [73198:0] info: average recursion processing time 64.948829 sec Feb 19 15:17:55 unbound 73198 [73198:0] info: server stats for thread 3: requestlist max 4 avg 2.4 exceeded 0 jostled 0
-
Dear all
I have the same issue after updating to 2.5.0.
The unbound service stopps randomly after several times per hour.I use "DNS forwarder" module mostly on default settings. I have increased DNS chache to 10 MB and set particular ingoing and outgoing network interfaces. Beside that no changes to the previous pfsense version.
There isn't much in the logs... All what I can see from syslog every time it stopps is below output. The only thing I can directly relate to is the kernel message at 15:53:37. However, I see the same messages at every restart.
Feb 19 15:53:06 arpwatch 90833 bogon 0.0.0.0 70:xx:xx:xx:xx:xx Feb 19 15:53:37 kernel pid 4137 (unbound), jid 0, uid 59: exited on signal 11 Feb 19 15:54:00 php 45674 servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
As a temporary workaorund I have installed watchdog and added unbound to be restarted automatically. Further, I'll put logging to Level 5 and would report back here if I can find more information.
Any hint to resolve this issue would be welcome.
Kind regards,
Oliver -
@yis said in 2.5.0 DNS Service stopping randomly:
Hi there,
..How can I activate the debug log for unbound - was not able to find this on the first view..
you can change the log level in Services -> DNS Resolver -> Advanced Settings -> "Log Level", 5th last from the bottom.
-
I am having the same problem. For now, I am using DNS forwarder until the issue with Unbound is fixed.
-
Dear all
Since I have Loglevel 5 enabled, I now get below log entries on unbound service terminations:
Feb 19 16:36:24 php-fpm 384 /services_unbound_advanced.php: The command '/usr/sbin/arp -s '192.168.my.pfsense-server' '00:0c:xx:xx:xx:xx'' returned exit code '1', the output was 'arp: writing to routing socket: Operation not permitted' Feb 19 16:42:55 kernel pid 15151 (unbound), jid 0, uid 59: exited on signal 11 Feb 19 16:43:00 php 23895 servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
...hmm
-
FYI this is a dupe of https://forum.netgate.com/topic/160005/unbound-crashes-periodically-with-signal-11/31
-
@yis known issue 100s are experiencing. Right now only good solution is to untick "Register DHCP leases in the DNS Resolver" and unbound will stop crashing.
-
@salander27-0 said in 2.5.0 DNS Service stopping randomly:
FYI this is a dupe of https://forum.netgate.com/topic/160005/unbound-crashes-periodically-with-signal-11/31
what a worthless comment. "Here go read 50 replies and waste 20 minutes until you find the solution"...how about just give the solution and not point to some other massively long thread.
-
@ccnewb At this point there is no "solution". There are several workarounds (disabling "Register DHCP leases in the DNS Resolver" counts as a workaround, not a solution) but the exact cause is still unknown. There has been more troubleshooting and analysis in the other thread and a Netgate dev there is actively soliciting more information so the cause can be identified. There is little point in having multiple threads going about the same issue and duplicating effort to identify the cause.
Also, you should strive to be a bit more friendly when responding in this forum. pfSense-CE is community-supported software after all.
-
I had same issue when I did and update to 2.5.0... At first I didn't even noticed there is a problem... Soon after couple of my friends call me to tell they have problem with internet... Finally I saw unbound was stopped somehow and couple hours after started it sops again... Finally I put it in service watchdog and there are no problem anymore since then...
-
Thank you all for the responses. I did investigate a bit further on the ARP messages that caught my attention shortly before the unbound services crashes. To my surprise I couldn't find a device with the particular hardware address among my devices.
So I flushed ARP cache and DHCP cache. Now I have working non-crashing unbound service.
SOLUTION:
- FLUSH DHCP LEASES
Allow me to add my 10 cents about the watchdog workaround I have used too. A device directly connected to the internet should never have crashing services that are automatically restarted. It could be a huge security hole. Further, on any system, to fix a permanently restarting service by automatically starting it is just pure wrong,
I like to mention, that the DNS resolver with pfsense 2.5.0 feels much faster than it was with the previous version. Thank you, good peolple from Netgate!
-
Wow - I am off for two days and this topic was growing great!
Thank you all for your ideas and supportings.
I will check the point with DHCP FLUSH this evening and monitor this the next days -
ok - I flushed DHCP leases last night - this morning the unbound service is dead again.
Going to start debug logs now, will report back...EDIT: this is the error I found in logs..
debug: ip4 10.1.1.1 port 53 (len 16) debug: attempt to get extra 3 targets debug: servselect ip4 10.1.1.1 port 53 (len 16) debug: rtt=120000 debug: No more query targets, attempting last resort debug: configured stub or forward servers failed -- returning SERVFAIL debug: store error response in message cache debug: return error response SERVFAIL debug: mesh_run: iterator module exit state is module_finished
will try to figure out, what this means
-
@yis Take a look at this suggestion.
This new version of unbound is reportedly more stable based on the initial testing.
-
after flushing the DHCP leases I had ubound service running for about 3 days until it started to crash again. So unfortunately flushing the DHCP leases isn't - depending on your systems load - a long lasting solution (although I like the word 'flush').
I did follow the instructions given by the great and honorable member @jimp (see Posting) to update the ubound service. I'll keep monitoring but will report back only, if I face further crashes.
jimps proposed solution was:
Go to Diagnostics > Command Prompt
pkg upgrade -fy unbound; pfSsh.php playback svc restart unbound
-
@noviceiii said in 2.5.0 DNS Service stopping randomly:
after flushing the DHCP leases I had ubound service running for about 3 days until it started to crash again. So unfortunately flushing the DHCP leases isn't - depending on your systems load - a long lasting solution (although I like the word 'flush').
I did follow the instructions given by the great and honorable member @jimp (see Posting) to update the ubound service. I'll keep monitoring but will report back only, if I face further crashes.
jimps proposed solution was:
Go to Diagnostics > Command Prompt
pkg upgrade -fy unbound; pfSsh.php playback svc restart unbound
flushing dhcp is not the solution.
and I don't want to write down every single dns from dhcp-lease - so i will upgrade unbound now and will have a look on this. -
I can confirm problem as resolved after upgrading unbound.
working stable again - thank you for all you support! -
@yis said in 2.5.0 DNS Service stopping randomly:
I can confirm problem as resolved after upgrading unbound.
working stable again - thank you for all you support!Thanks for this, I am getting 100's of e-mail from Service Watchdog
Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
Browsing is often hit & miss Chrome will was unable to resolve and then once the service starts again load the page, I'll backup and see how a unbound upgrade goes for reference and anyone else trying this mine did
unbound: 1.13.0_2 -> 1.13.1 [pfSense]
-
the new unbound is not resolving the issue :/
Mar 14 13:23:40 unbound 33517 [33517:0] info: service stopped (unbound 1.13.1).
Mar 14 13:23:46 unbound 33517 [33517:0] info: service stopped (unbound 1.13.1).
Mar 14 13:29:28 unbound 33517 [33517:0] info: service stopped (unbound 1.13.1).
Mar 14 13:34:54 unbound 33517 [33517:0] info: service stopped (unbound 1.13.1).
Mar 14 13:36:11 unbound 33517 [33517:0] info: service stopped (unbound 1.13.1).
Mar 14 13:36:22 unbound 33517 [33517:0] info: service stopped (unbound 1.13.1). -
@ccnewb Yep I'm also in the same boat
I"ve uninstalled PFblocker and made sure register in DNS when DHCP lease is not checked same deal