Unbound DNS Resolver crashing randomly
-
Unbound DNS Resolver has crashed 5 times in the last 30 mins. No errors in the system logs. The service appears to be running, but I need to restart it to get clients browsing again.
Anyone else experienced this?
-
Can't say I have. Anything in the resolver log?
-
Nothing. That's the problem.
-
When it's in the crashed state, do pages not resolve ("xyz.com cannot be found") or do they load differently, like blank or what not?
-
Yup, the browser complains that sites can't be resolved due to a DNS resolution issue.
-
This is 2.2? Did you update from 2.1.5 with unbound package on that?
-
Try increasing the logging verbosity to 2 under Services: DNS Resolver: Advanced settings.
Then try attaching the resolver log to your next post after a crash (it might be good to clear the resolver log and then restart the service first … this can be done in Status: System logs: Resolver).
-
I have also noticed crashes in Unbound with no errors being logged… I have even tried increasing Log Verbosity to 5...
Wouldn't an auto-restart function be nice? Or have it revert to a previous conf file, or at least revert to forwarding mode when it crashes?
Maybe the Service Watchdog package could be used...
-
For what it's worth, I saw this on rare occasion during RC testing. In my case, the crash was usually shortly after reboot. Since GA, it's been stable for me, but I also haven't been rebooting.
-
I did install Service Watchdog after the repeated crashing. So far it looks to be fine. But the underlying issue needs to be fixed.
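Since the service reportedly stays up while lookups fail, a check on the process alone may not catch this state. Below is a minimal sketch of a probe that tests actual name resolution instead. It is not how Service Watchdog works; the function names (`failed_lookups`, `resolver_looks_dead`) and the probe hostnames are hypothetical, and it assumes the machine running it uses the pfSense box as its DNS server.

```python
import socket
from typing import Callable, Iterable, List


def default_lookup(name: str) -> None:
    """Resolve `name` through the system resolver; raises OSError on failure."""
    socket.getaddrinfo(name, None)


def failed_lookups(
    names: Iterable[str],
    lookup: Callable[[str], None] = default_lookup,
) -> List[str]:
    """Return the names that could not be resolved, in the order given."""
    failed = []
    for name in names:
        try:
            lookup(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed


def resolver_looks_dead(
    names: Iterable[str],
    lookup: Callable[[str], None] = default_lookup,
) -> bool:
    """True only if every probe name fails, so one bad domain is not enough."""
    names = list(names)
    return bool(names) and len(failed_lookups(names, lookup)) == len(names)


if __name__ == "__main__":
    # Hypothetical probe names; pick hosts you expect to always resolve.
    probes = ["www.example.com", "www.example.org"]
    print("resolver looks dead:", resolver_looks_dead(probes))
```

What to do when the probe fails (restart the service, send an email) is left out on purpose; cached answers on the client can also hide a failure for a while.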
-
To combat DNS cache poisoning issues, I'm now exclusively using unbound as my resolver, and I just saw this happen this morning.
I restarted the service and set up Service Watchdog. I also just set up email notifications, so if it happens again I'll check the logs for that time and maybe one of us will be able to get something useful from them.
-
Might it be crashing because someone outside the network is hammering it maliciously? Not sure if that's possibly the case.
I'm reading that: A denial of service flaw was found in the way BIND followed DNS delegations. A remote attacker could use a specially crafted zone containing a large number of referrals which, when looked up and processed, would cause named to use excessive amounts of memory or crash. (Yes, I know, that's BIND. Might it be happening with Unbound?)
FreeBSD Security Advisory - By causing queries to be made against a maliciously-constructed zone or against a malicious DNS server, an attacker who is able to cause specific queries to be sent to a nameserver can trick unbound(8) resolver into following an endless series of delegations, which consumes a lot of resources. (This one is Unbound.)
Anyway, just wondering if this is less of a general stability issue and more of a "someone is trying to hack my DNS" issue.
At any rate, I'd rather my DNS get periodically restarted than be misdirected. I wonder if the advanced settings might offer a way to prevent this, if that is the case, seeing as how Trel is experiencing this and it's fairly clear his DNS was previously being screwed with.
-
Try increasing the logging verbosity to 2 under Services: DNS Resolver: Advanced settings.
Then try attaching the resolver log to your next post after a crash (it might be good to clear the resolver log and then restart the service first … this can be done in Status: System logs: Resolver).
I wonder if this is what I experienced the other day as well. Loads of webpages, even some in the Google cache, were becoming unavailable.
-
Happened again..
This webpage is not available
The server at www.samsung.com can't be found, because the DNS lookup failed. DNS is the network service that translates a website's name to its Internet address. This error is most often caused by having no connection to the Internet or a misconfigured network. It can also be caused by an unresponsive DNS server or a firewall preventing Internet from accessing the network.
-
Still - No one knows anything about your settings in dns resolver or system > general….
-
Default settings. This is a clean install. Just did a plain vanilla install this morning to rule out any user entered settings killing it.
Definite issue in the resolver. Noticed this happening frequently while using eBay android app. Never saw such issues in 2.1.5
-
Cool - 64bit? Pure hardware. No VM?
-
Yup amd64 on i3
-
This issue has started to become a nuisance. Kids have started to complain about it happening every 30 mins. Sometimes twice every 15 mins. Did a clean install again but it's still the same.
No errors are logged and the service is up the whole time. The only temporary solution is to restart the service manually. Is anyone working on fixing this?
-
Since no one can reproduce it, I pretty much doubt anyone's working on it. Maybe you have some lolcats in there?! :o
-
Could be…
-
I wonder if a packet dump of DNS traffic on the WAN port is in order.
-
Others seeing similar issue as well
https://forum.pfsense.org/index.php?topic=88272.0
-
…yeah, but nobody wants to play with me anymore ;-) ...not even doktormotor :-D
-
But mine works.
And in response to the "default" install: I set things explicitly on the Services: DNS Resolver (General settings) page:
Enabled: True
Network Interfaces: LANs & Localhost
Outgoing Network Interfaces: All
All other choices are set to False.
-
I'm guessing it's not really crashed from the sounds of it (read: it's still running). This sounds like the issue in the "lolcats" thread doktornotor linked.
Go to Services>DNS Resolver, Advanced, make sure you have "Harden Glue" and "Harden DNSSEC data" both enabled.
-
@cmb:
I'm guessing it's not really crashed from the sounds of it (read: it's still running). This sounds like the issue in the "lolcats" thread doktornotor linked.
Go to Services>DNS Resolver, Advanced, make sure you have "Harden Glue" and "Harden DNSSEC data" both enabled.
Yes. This was the issue. This was the new symptom after enabling DNSSEC (without Harden Glue).
I posted this before I realized it was a symptom of the same issue when DNSSEC was turned on.
-
I'm using pfSense 2.2.2, but the problem also occurred with 2.2.
It seems that after a few days of use, Unbound starts to make the machine very unstable.
I lost the connection to the pfSense LAN interface several times during the afternoon. After I disabled Unbound, the problems stopped.
I initially used Unbound to rewrite the youtube.com domain. That worked correctly for about a week, then suddenly stopped one day. Restarting the unbound service fixed it and everything worked properly for a while, but the problem came back after a few minutes.
It became serious the next day, when I could no longer log in to the pfSense web interface. I suspected that someone had managed to break into pfSense and damage files somehow.
I reinstalled the machine and everything was OK for about seven days, but this afternoon it started again.
I lost communication with the LAN interface of my pfSense box. Unplugging and replugging the LAN network cable fixed it, but the problem soon came back.
Finally I turned off Unbound and everything has stabilized.
I suspect Unbound is somehow destabilizing the entire operating system on pfSense.
-
I am having the same issue. Unbound doesn't "crash" it just ramps up to 100% CPU and becomes unusable. Sometimes it goes away by itself, other times I have to restart the service to make it usable again.
Here is a link to the post I made: https://forum.pfsense.org/index.php?topic=93846.msg520894#msg520894
Interesting problem indeed….
-
@hda thank you for your post. I have Active Directory locally yet I still wanted to use DNS Resolver because all of my OpenVPN clients and DHCP reservations would all resolve. I had the hardest time... it would work for a day or so and then no longer resolve properly via nslookup or the host commands. I had to bounce the resolver service every time.
Seeing your post made me think that I should be more explicit and specifically pick the interface LAN & Localhost instead of All from the dropdown. I've now been up for 2-3 days with zero issues. Lesson learned that being explicit is not just for programming but for every part of life :)
Thanks again!
-
@JZng you are an all powerful necromancer, but in a good way at least.
-
@harvy66 I try brother :) didn't mean to wake the dead but what an important concept to pay forward. Leaving All in DNS Resolver > Network Interfaces made the pfSense resolver fail at random.
-
I'm getting similar issues. Looking in my DNS log, I see the following:
Jan 7 11:03:22 unbound 70295:0 fatal error: Could not read config file: /unbound.conf. Maybe try unbound -dd, it stays on the commandline to see more errors, or unbound-checkconf
Jan 7 11:03:22 unbound 70295:0 notice: Restart of unbound 1.8.1.
Jan 7 11:03:22 unbound 70295:0 info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
Jan 7 11:03:22 unbound 70295:0 info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0
Jan 7 11:03:22 unbound 70295:0 info: server stats for thread 3: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Jan 7 11:03:22 unbound 70295:0 info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
Jan 7 11:03:22 unbound 70295:0 info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0
Jan 7 11:03:22 unbound 70295:0 info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Jan 7 11:03:22 unbound 70295:0 info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
Jan 7 11:03:22 unbound 70295:0 info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
Jan 7 11:03:22 unbound 70295:0 info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimitin
I've got the following settings in my advanced dns resolver:
and when I check the DNS resolver status page, I get the message that the resolver is stopped or disabled.
Any ideas how I can resolve this, please? My wife is going mad! lol
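For anyone digging through a resolver log paste like the one in this post, the line that matters is the "fatal error" one; the stats lines are just noise from the restart. A rough sketch for pulling it out, assuming every entry starts with a syslog-style stamp such as "Jan 7 11:03:22" (the helper names `split_entries` and `fatal_entries` are made up for illustration):

```python
import re
from typing import List

# Assumption: each entry begins with a stamp like "Jan 7 11:03:22 ".
STAMP = re.compile(r"\b[A-Z][a-z]{2} {1,2}\d{1,2} \d{2}:\d{2}:\d{2} ")


def split_entries(blob: str) -> List[str]:
    """Split pasted log text into entries, one per timestamp.

    Works whether the entries are on separate lines or run together.
    Any text before the first timestamp is dropped.
    """
    starts = [m.start() for m in STAMP.finditer(blob)]
    if not starts:
        return [blob.strip()] if blob.strip() else []
    ends = starts[1:] + [len(blob)]
    return [blob[s:e].strip() for s, e in zip(starts, ends)]


def fatal_entries(blob: str) -> List[str]:
    """Return only the entries that report a fatal error."""
    return [entry for entry in split_entries(blob) if "fatal error:" in entry]


if __name__ == "__main__":
    sample = (
        "Jan 7 11:03:22 unbound 70295:0 fatal error: Could not read config "
        "file: /unbound.conf. Jan 7 11:03:22 unbound 70295:0 notice: Restart "
        "of unbound 1.8.1."
    )
    for entry in fatal_entries(sample):
        print(entry)
```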
-
Do what the log file says.
Edit:
i.e. go to console mode, choose option 8, and enter unbound-checkconf