Sudden high latency with DNS, local resolver
-
Netgate SG-3100
pfSense 2.4.5
This just started a week or two ago, and I've exhausted my troubleshooting knowledge. It's gotten noticeable on the network, and I'm curious what I may be missing.
I'm seeing 140ms to 300ms responses to DNS queries locally, while I get low latency responses from my upstream name servers (Cloudflare).
I've flushed the cache, disabled / removed pfBlockerNG. Heck, I even rebooted. But we still get slow responses on network, and we're seeing latency issues in our Unifi controller (for our wireless).
Any ideas?
-
First thing to clarify. Do you have pfSense-2.4.5 installed, or have you updated to the more recent 2.4.5_p1? If you have not updated, do that first. There was a bug fixed in 2.4.5_p1 with regards to pf tables and latency.
If you have already updated to 2.4.5_p1, then another thing to consider is how you have the DNS Resolver configured on pfSense. Is it in resolver mode or forwarding mode? And if in resolver mode, do you also have DNSSEC enabled?
Resolver mode will query the root servers and work down through the DNS tree to find authoritative DNS servers for a domain. That is going to take some time on the first search for a domain. Subsequent requests for that same domain will be served from the cache (until the TTL value expires), and thus will come back substantially faster.
Forwarding mode will send the query to another DNS server you configure as the forwarding server, and that server will likely have most entries already cached if it's one of the big boys like Cloudflare, Google, etc. So a lookup there will be returned quickly from the forwarder's cache.
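To see the cache effect described above, you can time the same lookup twice against the resolver. This is only a rough sketch using the Python standard library: `build_query` and `time_query` are hypothetical helper names, it sends a plain A-record query over UDP port 53, and it does not parse or validate the reply.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an A record (class IN)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def time_query(server: str, name: str, timeout: float = 2.0) -> float:
    """Return the round-trip time in milliseconds for one UDP query."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, 53))
        sock.recvfrom(512)  # reply content is ignored; only timing matters here
        return (time.perf_counter() - start) * 1000.0
```

Calling `time_query` twice in a row with the LAN address of your pfSense box and the same hostname should show a slower first answer and a much faster second one if the cache is working. If both are slow, the cache is probably being emptied between queries.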
-
Thanks! Yeah, 2.4.5_p1. DNS Resolver, with 1.1.1.1 / 1.0.0.1 as roots. DNSSEC enabled. Everything was great for months...and yeah, was expecting some latency after flushing the cache. But these slow responses are for domains that are already cached (or have been accessed by other devices on network).
-
Have you confirmed that unbound is seeing cache hits? You can look through this thread to see how to access some cache stats that the unbound resolver maintains: https://forum.netgate.com/topic/157590/unbound-cache-hit-rate-is-anaemic/10
There are also some other good troubleshooting steps in that thread.
-
@bmeeks I'll check it out, thanks!
-
@bmeeks I'm going to guess this is an issue...there are a TON of these entries, just for today alone. As you can see, it's just about constant. This is just a small sample. I would think I'd only see this hourly (when pfBlockerNG updates), right? (I'm planning on changing that update to weekly)
Nov 4 13:15:03 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 13:17:13 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 13:30:53 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 13:33:40 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 13:44:59 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 13:47:46 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 13:49:15 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 13:49:25 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:01:43 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:01:53 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:02:04 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:02:15 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:02:25 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:02:36 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:11:40 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:23:54 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:24:18 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:24:49 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:45:00 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:47:47 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:49:16 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:49:27 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 15:01:43 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 15:01:55 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 15:02:05 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 15:02:16 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 15:02:27 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 15:09:44 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 15:10:10 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
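To check the "only hourly" expectation against log lines like those above, a small script can count restarts per hour. This is a sketch that assumes the syslog layout shown in this post (month, day, time, host, then `unbound:` and `start of service`); `restarts_per_hour` is a hypothetical helper name, and the sample holds only three of the lines above.

```python
import re
from collections import Counter

# Matches e.g. "Nov 4 13:15:03 ngsg3100 unbound: [32107:0] info: start of service ..."
START_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+unbound:.*start of service"
)


def restarts_per_hour(log_text: str) -> dict:
    """Count 'start of service' lines, grouped by month, day and hour."""
    counts = Counter()
    for line in log_text.splitlines():
        match = START_RE.match(line.strip())
        if match:
            month, day, hour = match.groups()
            counts[f"{month} {int(day)} {hour}:00"] += 1
    return dict(counts)


SAMPLE = """\
Nov 4 13:15:03 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 13:17:13 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
Nov 4 14:01:43 ngsg3100 unbound: [32107:0] info: start of service (unbound 1.10.1).
"""

for bucket, count in sorted(restarts_per_hour(SAMPLE).items()):
    print(bucket, count)
```

More than one restart in an hourly bucket suggests something other than an hourly pfBlockerNG update is restarting unbound.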
I'm showing cache hits...
unbound-control -c /var/unbound/unbound.conf stats_noreset | grep total.num
total.num.queries=1225
total.num.queries_ip_ratelimited=0
total.num.cachehits=758
total.num.cachemiss=467
total.num.prefetch=2
total.num.expired=0
total.num.recursivereplies=467
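For reference, the hit rate implied by those counters can be worked out as hits divided by hits plus misses. A minimal sketch, assuming the `key=value` format shown above; `parse_unbound_stats` and `cache_hit_rate` are hypothetical helper names, not part of unbound.

```python
def parse_unbound_stats(text: str) -> dict:
    """Parse whitespace-separated key=value pairs into a dict of floats."""
    stats = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # skip anything that is not key=value
        try:
            stats[key] = float(value)
        except ValueError:
            pass  # ignore non-numeric values
    return stats


def cache_hit_rate(stats: dict) -> float:
    """Return cache hits as a fraction of hits plus misses (0.0 if no data)."""
    hits = stats.get("total.num.cachehits", 0.0)
    misses = stats.get("total.num.cachemiss", 0.0)
    total = hits + misses
    return hits / total if total else 0.0


SAMPLE = """
total.num.queries=1225
total.num.cachehits=758
total.num.cachemiss=467
"""

rate = cache_hit_rate(parse_unbound_stats(SAMPLE))
print(f"cache hit rate: {rate:.1%}")
```

With the numbers posted here that comes to roughly 62%. Keep in mind the counters start again from zero every time unbound restarts, so with frequent restarts the figure only covers a short window.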
-
Two things cause a bunch of unbound restarts: installing pfBlockerNG-devel and using the DNSBL functionality; and having automatic DNS registration enabled for the DHCP server. My first suspect is the latter.
You need to turn that option off if it is enabled. You will find it under the DHCP Server settings. When that option is enabled, the DHCP service will restart unbound every time a client's DHCP lease is renewed. This is exacerbated when the client DHCP lease time is very short.
-
Under Services > DNS Resolver > General Settings, and while you are on that page, also remove the check from DNSSEC, as DNSSEC has no meaning when forwarding.
You should also check the DHCP server logs, to find out what device is chain-gunning DHCP requests.
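One way to spot such a device is to count DHCPREQUEST lines per MAC address. The sketch below assumes ISC dhcpd-style log lines; the sample lines, MAC addresses, hostnames and interface name are made up for illustration, and the exact format on your system may differ. `top_requesters` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed format: "... DHCPREQUEST for <ip> [(<server>)] from <mac> [(<host>)] via <if>"
REQUEST_RE = re.compile(
    r"DHCPREQUEST for \S+ (?:\(\S+\) )?from ((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})"
)


def top_requesters(log_text: str, limit: int = 5) -> list:
    """Return (mac, request_count) pairs, busiest clients first."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts.most_common(limit)


# Illustrative sample only, not taken from the poster's logs.
SAMPLE = """\
Nov 4 14:01:42 dhcpd: DHCPREQUEST for 192.168.1.50 from aa:bb:cc:00:00:01 (laptop) via mvneta1
Nov 4 14:01:52 dhcpd: DHCPREQUEST for 192.168.1.50 from AA:BB:CC:00:00:01 (laptop) via mvneta1
Nov 4 14:02:03 dhcpd: DHCPREQUEST for 192.168.1.51 from aa:bb:cc:00:00:02 (phone) via mvneta1
"""

for mac, count in top_requesters(SAMPLE):
    print(mac, count)
```

A client that shows up far more often than the rest, with requests only seconds apart, is the likely trigger for the restarts.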
If you have big networks - many LAN based devices - or lots of devices using Wifi that lose their radio connection often, many DHCP transactions could be normal.
-
@Gertjan Thanks! The restarts have all but ceased now after making the changes you & @bmeeks recommended. I also enabled Query Name Minimization (not strict). So far, things seem to be a bit better.
I don't know if I'm sold on disabling the DHCP leases because I do like to have the local name resolution on-network. I'll look into other alternatives.
re: DNSSEC...so I will admit, DNS is not my strong suit anymore. I last managed it on a Win2k server many moons ago. If I disable DNSSEC, am I still "protected" (quotes intentional) if I use SSL/TLS for outgoing queries (and have the appropriate entry in my custom options for port 853)?
-
@bwalkco said in Sudden high latency with DNS, local resolver:
re: DNSSEC...so I will admit, DNS is not my strong suit anymore. I last managed it on a Win2k server many moons ago. If I disable DNSSEC, am I still "protected" (quotes intentional) if I use SSL/TLS for outgoing queries (and have the appropriate entry in my custom options for port 853)?
When you enable forwarding, the upstream forwarder is totally in charge of whether DNSSEC is used or not. No matter what you send to it, it will "do its own thing" when resolving a request from you. So really no need to add the extra hassle of DNSSEC in that instance IMHO.
When you resolve locally, however, then DNSSEC is a very good thing to enable.
-
@bwalkco said in Sudden high latency with DNS, local resolver:
I don't know if I'm sold on disabling the DHCP leases because I do like to have the local name resolution on-network.
IMHO: never touch the LAN IP settings of a device. At most, change the host name, as your new PC would probably be called "DEFHT24MBBT" instead of the more logical "Office-Selling-2". The initial Windows setup will ask for this name (and will never ask a user to set up any IP settings).
Take note of the MAC address of this new device - but you could do this on pfSense, by looking at the DHCP logs.
On the DHCP-server side, using the now known MAC of the new device, assign it a static DHCP lease, with a description, a (another?) host name, etc.
This way, you have your whole network assigned in one place.
Nice advantage: these "static DHCP leases" are loaded upon boot of pfSense, or at start of the DHCP server, to be more precise.
No more DNS server (unbound) restarts needed. No more cache resets, no more DNS outages.
-
I have a different view of DHCP DNS registration. When you manage relatively small networks, then static IPs (or MAC reservations) and manual DNS records work okay. But as you scale up to hundreds and then thousands of PCs, that becomes increasingly hard to manage. Especially when those PCs are scattered around geographically. To me, workable automatic DNS registration from the DHCP client is quite useful, and I am sad that functionality is currently not workable in pfSense due to the unbound restart behavior.
I worked for years in a Fortune 500 US company with over 25,000 employees and thousands of PCs all running Windows scattered across four states in the south. We had a central internal Help Desk for support. The Help Desk connected to a user's PC via RDP (the company had its own private WAN/LAN arrangement using dedicated infrastructure it owned and some it leased). The connection was made by hostname. When our support folks imaged (installed Windows and corporate apps) a new PC prior to shipping it out, they assigned a hostname using an in-house scheme and put an icon on the desktop with that hostname. Now, no matter what corporate office or field location that PC went to, when it got its DHCP IP address for that office it would dutifully register its hostname in DNS with its IP address. Now the Help Desk could easily locate and connect to the PC by hostname. The hostname was displayed in plain sight on the user's desktop. Can you imagine being on the phone and trying to talk a typical user through finding and then telling you the IP address their workstation has so you could connect to it?
-
-