[SOLVED] Intermittent DNS Problem 23.01
-
Hi,
After upgrading to 23.01, suddenly dns stops working randomly and comes back alive on it own after few minutes. Issue is going on since two days right after upgrade.
PS C:\Users\me> nslookup google.ca Server: firewall.example.net Address: 192.168.40.1 *** firewall.example.net can't find google.ca: Server failed PS C:\Users\me> ping google.ca Ping request could not find host google.ca. Please check the name and try again. PS C:\Users\me> ping 1.1.1.1 Pinging 1.1.1.1 with 32 bytes of data: Reply from 1.1.1.1: bytes=32 time=77ms TTL=58 Reply from 1.1.1.1: bytes=32 time=4ms TTL=58 Reply from 1.1.1.1: bytes=32 time=12ms TTL=58 Reply from 1.1.1.1: bytes=32 time=6ms TTL=58 Ping statistics for 1.1.1.1: Packets: Sent = 4, Received = 4, Lost = 0 (0% loss), Approximate round trip times in milli-seconds: Minimum = 4ms, Maximum = 77ms, Average = 24ms
DNS Logs
Feb 19 00:35:53 unbound 15568 [15568:0] info: generate keytag query _ta-4f66. NULL IN Feb 19 00:35:53 unbound 15568 [15568:0] info: start of service (unbound 1.17.1). Feb 19 00:35:53 unbound 15568 [15568:0] notice: init module 1: iterator Feb 19 00:35:53 unbound 15568 [15568:0] notice: init module 0: validator Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.262144 0.524288 1 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.131072 0.262144 7 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.065536 0.131072 3 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.032768 0.065536 6 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.016384 0.032768 3 Feb 19 00:35:49 unbound 17934 [17934:0] info: lower(secs) upper(secs) recursions Feb 19 00:35:49 unbound 17934 [17934:0] info: [25%]=0.0436907 median[50%]=0.0873813 [75%]=0.187246 Feb 19 00:35:49 unbound 17934 [17934:0] info: histogram of recursion processing times Feb 19 00:35:49 unbound 17934 [17934:0] info: average recursion processing time 0.113825 sec Feb 19 00:35:49 unbound 17934 [17934:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 20 recursion replies sent, 0 replies dropped, 0 states jostled out Feb 19 00:35:49 unbound 17934 [17934:0] info: server stats for thread 3: requestlist max 2 avg 0.1 exceeded 0 jostled 0 Feb 19 00:35:49 unbound 17934 [17934:0] info: server stats for thread 3: 26 queries, 6 answers from cache, 20 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.262144 0.524288 6 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.131072 0.262144 11 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.065536 0.131072 20 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.032768 0.065536 30 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.016384 0.032768 6 Feb 19 00:35:49 unbound 17934 [17934:0] info: lower(secs) upper(secs) recursions Feb 19 00:35:49 unbound 17934 [17934:0] info: [25%]=0.0461483 median[50%]=0.0671744 [75%]=0.126976 Feb 19 00:35:49 unbound 17934 [17934:0] info: histogram of recursion processing times Feb 19 00:35:49 unbound 17934 [17934:0] info: average recursion processing time 0.101706 sec Feb 19 00:35:49 unbound 17934 [17934:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 73 recursion replies sent, 0 replies dropped, 0 states jostled out Feb 19 00:35:49 unbound 17934 [17934:0] info: server stats for thread 2: requestlist max 2 avg 0.273973 exceeded 0 jostled 0 Feb 19 00:35:49 unbound 17934 [17934:0] info: server stats for thread 2: 103 queries, 30 answers from cache, 73 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 00:35:49 unbound 17934 [17934:0] info: 1.000000 2.000000 1 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.524288 1.000000 2 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.262144 0.524288 3 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.131072 0.262144 13 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.065536 0.131072 16 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.032768 0.065536 14 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.016384 0.032768 8 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.008192 0.016384 1 Feb 19 00:35:49 unbound 17934 [17934:0] info: lower(secs) upper(secs) recursions Feb 19 00:35:49 unbound 17934 [17934:0] info: [25%]=0.0456411 median[50%]=0.090112 [75%]=0.176443 Feb 19 00:35:49 unbound 17934 [17934:0] info: histogram of recursion processing times Feb 19 00:35:49 unbound 17934 [17934:0] info: average recursion processing time 0.142320 sec Feb 19 00:35:49 unbound 17934 [17934:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 58 recursion replies sent, 0 replies dropped, 0 states jostled out Feb 19 00:35:49 unbound 17934 [17934:0] info: server stats for thread 1: requestlist max 2 avg 0.155172 exceeded 0 jostled 0 Feb 19 00:35:49 unbound 17934 [17934:0] info: server stats for thread 1: 71 queries, 13 answers from cache, 58 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.065536 0.131072 4 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.032768 0.065536 1 Feb 19 00:35:49 unbound 17934 [17934:0] info: 0.016384 0.032768 2 Feb 19 00:35:49 unbound 17934 [17934:0] info: lower(secs) upper(secs) recursions Feb 19 00:35:49 unbound 17934 [17934:0] info: [25%]=0.03072 median[50%]=0.073728 [75%]=0.1024 Feb 19 00:35:49 unbound 17934 [17934:0] info: histogram of recursion processing times Feb 19 00:35:49 unbound 17934 [17934:0] info: average recursion processing time 0.067463 sec Feb 19 00:35:49 unbound 17934 [17934:0] info: mesh has 0 recursion states (0 with reply, 0 detached), 0 waiting replies, 7 recursion replies sent, 0 replies dropped, 0 states jostled out Feb 19 00:35:49 unbound 17934 [17934:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0 Feb 19 00:35:49 unbound 17934 [17934:0] info: server stats for thread 0: 11 queries, 4 answers from cache, 7 recursions, 0 prefetch, 0 rejected by ip ratelimiting Feb 19 00:35:49 unbound 17934 [17934:0] info: service stopped (unbound 1.17.1). Feb 19 00:35:46 unbound 17934 [17934:0] info: control cmd: stats_noreset Feb 19 00:35:41 unbound 17934 [17934:0] info: control cmd: stats_noreset Feb 19 00:35:41 unbound 17934 [17934:3] info: Verified that unsigned response is INSECURE Feb 19 00:35:41 unbound 17934 [17934:3] info: NSEC3s for the referral proved no DS. Feb 19 00:35:41 unbound 17934 [17934:3] info: resolving trafficmanager.net. DS IN Feb 19 00:35:41 unbound 17934 [17934:3] info: Verified that unsigned response is INSECURE Feb 19 00:35:41 unbound 17934 [17934:3] info: NSEC3s for the referral proved no DS. Feb 19 00:35:41 unbound 17934 [17934:3] info: resolving microsoft.com. DS IN Feb 19 00:35:41 unbound 17934 [17934:3] info: query response was ANSWER Feb 19 00:35:41 unbound 17934 [17934:3] info: reply from <.> 8.8.4.4#853 Feb 19 00:35:41 unbound 17934 [17934:3] info: response for vortex.data.microsoft.com. A IN Feb 19 00:35:41 unbound 17934 [17934:3] info: resolving vortex.data.microsoft.com. A IN Feb 19 00:35:41 unbound 17934 [17934:3] info: query response was CNAME Feb 19 00:35:41 unbound 17934 [17934:3] info: reply from <.> 8.8.4.4#853 Feb 19 00:35:41 unbound 17934 [17934:3] info: response for vortex.data.microsoft.com. A IN Feb 19 00:35:41 unbound 17934 [17934:3] info: resolving vortex.data.microsoft.com. A IN Feb 19 00:35:40 unbound 17934 [17934:2] info: Verified that unsigned response is INSECURE Feb 19 00:35:40 unbound 17934 [17934:2] info: NSEC3s for the referral proved no DS. Feb 19 00:35:40 unbound 17934 [17934:2] info: resolving azure.com. DS IN Feb 19 00:35:40 unbound 17934 [17934:2] info: Verified that unsigned response is INSECURE Feb 19 00:35:40 unbound 17934 [17934:2] info: NSEC3s for the referral proved no DS. Feb 19 00:35:40 unbound 17934 [17934:2] info: resolving trafficmanager.net. DS IN Feb 19 00:35:40 unbound 17934 [17934:2] info: Verified that unsigned response is INSECURE Feb 19 00:35:40 unbound 17934 [17934:2] info: NSEC3s for the referral proved no DS. Feb 19 00:35:40 unbound 17934 [17934:2] info: resolving microsoft.com. DS IN Feb 19 00:35:40 unbound 17934 [17934:2] info: query response was ANSWER Feb 19 00:35:40 unbound 17934 [17934:2] info: reply from <.> 8.8.4.4#853 Feb 19 00:35:40 unbound 17934 [17934:2] info: response for watson.events.data.microsoft.com. A IN Feb 19 00:35:40 unbound 17934 [17934:2] info: resolving watson.events.data.microsoft.com. A IN Feb 19 00:35:40 unbound 17934 [17934:2] info: resolving watson.events.data.microsoft.com. A IN Feb 19 00:35:40 unbound 17934 [17934:2] info: resolving watson.events.data.microsoft.com. A IN Feb 19 00:35:39 unbound 17934 [17934:2] info: Verified that unsigned response is INSECURE Feb 19 00:35:39 unbound 17934 [17934:2] info: NSEC RRset for the referral proved no DS.
Thanks in advance
-
@manjotsc Do you have DNS Resolver set to forward, with DNSSEC on? Try turning it off. I had what sounds like that issue and it seems to have fixed it up. Was working fine in earlier versions to Quad9, but: https://forum.netgate.com/topic/120105/enable-dnssec-support-and-opendns/3
And
https://docs.netgate.com/pfsense/en/latest/troubleshooting/dns.html#check-dns-serviceAlso check IPv6 is working.
-
@steveits I have DNS Resolver set to forward with DNSSec ON but with google dns instead of quad9 and I have disabled IPV6, was working fine before 23.01
-
@manjotsc Uncheck DNSSEC and I bet the issue goes away. See the links I posted.
-
@manjotsc said in Intermittent DNS Problem 23.01:
@steveits I have DNS Resolver set to forward with DNSSec ON but with google dns instead of quad9 and I have disabled IPV6, was working fine before 23.01
When you use forwarding, enabling DNSSEC in the DNS Resolver on pfSense is not needed. That's because when forwarding, whether DNSSEC is performed or not performed is totally controlled by the server you are forwarding to. That forwarding destination server does what it is configured by its admin to do. It does not follow instructions from clients. And enabling DNSSEC when forwarding seems to trip up
unbound
sometimes. -
@manjotsc Another link for you…per https://support.quad9.net/hc/en-us/articles/4433380601229-Setup-pfSense-and-DNS-over-TLS
“Disable Enable DNSSEC Support if enabled.
DNSSEC is already enforced by Quad9, and enabling DNSSEC at the forwarder level can cause false DNSSEC failures.” -
When forwarding, DNSSEC should be de activated as it makes no sense.
I actually wonder why this isn't even enforced in the GUI.But this :
@manjotsc said in Intermittent DNS Problem 23.01:
Feb 19 00:35:49 unbound 17934 [17934:0] info: service stopped (unbound 1.17.1).
doesn't look like unbound is 'tripping up'.
It was ordered 'from above' to restart
The question that pops in mind now is : who would restart unbound and why ?
A next question would be : why would my unbound restart xxx times a dayetc etc.
-
The three things that immediately come to mind when investigating frequent
unbound
restarts are:- DHCP client registrations in DNS is enabled under the DHCP Server settings tab. This restarts
unbound
each time a DHCP client renews its lease. - One of your physical interfaces is flapping (going offline and then online repeatedly). Most often this is the WAN interface. But when any interface
unbound
is listening on for requests bounces,unbound
can stop running and need a restart. - Using pfBlockerNG or pfBlockerNG-devel with the DNSBL configuration. This can result in
unbound
restarts when updating lists, although this is minimized somewhat in the latest package with changes to Python mode.
- DHCP client registrations in DNS is enabled under the DHCP Server settings tab. This restarts
-
@gertjan said in Intermittent DNS Problem 23.01:
why would my unbound restart xxx times a day
Exactly...
I updated to 23.01 yesterday morning... If I look to how long my unbound has been running
[23.01-RELEASE][admin@sg4860.local.lan]/root: unbound-control -c /var/unbound/unbound.conf status version: 1.17.1 verbosity: 1 threads: 4 modules: 2 [ validator iterator ] uptime: 91248 seconds options: control(ssl) unbound (pid 93928) is running... [23.01-RELEASE][admin@sg4860.local.lan]/root:
91k seconds, lets do the math.. 25.34666 Hours, what do you know.. As long as my pfsense has been up ;)
Uptime 1 Day 01 Hour 24 Minutes 27 Seconds
If your unbound is constantly restarting - you going to have a bad time of it..
-
@johnpoz said in Intermittent DNS Problem 23.01:
If your unbound is constantly restarting - you going to have a bad time of it..
I'll 100% agree with John here. I've never had an unscheduled
unbound
restart on my firewall.From perusing all the
unbound
/DNS trouble posts here on the forum, it has pretty much always come down to the fact the user began making changes to the out-of-the-boxunbound
/DNS Resolver setup, or they installed an add-on package that makes somewhat radical changes to the stockunbound
setup for them. Then things broke orunbound
became unstable by restarting frequently. -
@bmeeks said in Intermittent DNS Problem 23.01:
the user began making changes to the out-of-the-box unbound/DNS Resolver setup
Isn't this one enabled out of the pfSense box :
and, out of the box, you two, @bmeeks and @johnpoz, - and me - have disabled this option ;)
-
@gertjan said in Intermittent DNS Problem 23.01:
@bmeeks said in Intermittent DNS Problem 23.01:
the user began making changes to the out-of-the-box unbound/DNS Resolver setup
Isn't this one enabled out of the pfSense box :
and, out of the box, you two, @bmeeks and @johnpoz, - and me - have disabled this option ;)
No, I don't think it is enabled out-of-the-box. But then I don't use DHCP in pfSense since I have Active Directory.
-
@gertjan I believe is or was default yes.. I wish they would change that really.. At least until there is a way to not restart every time a lease is touched, etc. Might have to do a clean install of 2.6 or 2.7 again to check if in fact default.
In a small network, and not a lot of dhcp leases, etc and unbound restarting in faction of second sort of thing - other than the loss of cache prob not going to be too much of an issue.
But yeah this seems to be common issue with users having problems.
But if they did default it to off, prob just have just as many questions if not more to why they can not resolve some dhcp client, etc.
IMHO they should also default dnssec to off when the user clicks on do forwarding option. Or at least make a note on the setting that it can be problematic if forwarding, should be unchecked if forwarding.
I have made a few changes to unbound settings that is for sure. I serve 0, I set min cache to 3600 - yeah its bad practice to mess with the ttl set by the owner of the record.. Its also bad practice to set ttls of 30 seconds, or 5 minutes as well.. So screw um ;) I have yet to ever see any issue with setting the floor for ttl to 1 hour..
I also do qname minimization, I do prefetching of records, I have set to static zone vs transparent - also imho static should be the default vs transparent for the zone type. The zone type normally wouldn't matter, but I see no point in trying to resolve host.mydomain.tld when I do not have a local host.mydomain.tld - public internet sure isn't going to known anything about my local domain hosts ;) So it just keeps noise off the public internet - doing my part, just like I don't let traffic to rfc1918 out my wan via a floating rule. Would it hurt anything if did, no just pointless and again doing my part to keep noise off the internet.
But then again been working with dns for really since has been even a thing ;) so am fairly confident in my ability to troubleshoot anything that might be going on with dns.. I also fully understand exactly what these settings do and made a conscious choice to set them, etc. Other than if dhcp registration is default or not - most users would most likely have no problems if they just left the default, or at least fewer problems.
But if for whatever reason unbound is restarting X times a day/hour etc.. More than likely it will be problematic at some point for the user.
-
No that's never been enabled by default AFAIK. We're forever having to tell people to enable it because they expect to be able to connect to things with just the hostname.
The Unbound restarting situation with DHCP is.... suboptimal! It's definitely something we have on the list for 23.05.
-
@gertjan said in Intermittent DNS Problem 23.01:
Isn't this one enabled out of the pfSense box
I'm pretty certain DHCP Registration is not enabled by default. A lot of people do set it though. The "Note" there is a bit unclear as it does stop/start not reload its config.
-
-
@manjotsc Great!
I'm just going to reference this other thread on the same topic, same solution.