[Solved] Unbound stops resolving intermittently
-
I have a problem where unbound stops resolving DNS. I have to restart unbound, or restart both unbound and my VPN client service, to fix it. Today I had 2 of these, where I got the dreaded "internet is broke" yells from my wife. Usually I get these once every couple of weeks or so. When this happens, my work laptop, which uses my ISP gateway, works normally; only the other devices, which use the VPN gateway, fail to get online.
I am not a networking guy and everything I have set up is from how-tos that I have followed over the years. Now that we have established that I am hanging by a thin thread on the Newb scale, I would really appreciate it if someone could help me survive my wife in these times of shelter-in-place lockdowns.
I may be a networking newb, but I do know that I need to research before posting. I have read these related threads :
https://forum.netgate.com/topic/144565/unbound-stops-resolving-externally
https://forum.netgate.com/topic/92402/dns-resolver-sometimes-not-resolving-hosts
https://forum.netgate.com/topic/130800/new-pfsense-install-unbound-regularly-stops-resolving-internal-hostnames
https://forum.netgate.com/topic/150093/blocking-port-53-issues-resolving-host-names
There didn't seem to be any definite solution to my problem except in one thread, where user @johnpoz recommended unchecking DHCP Registration. I will do that right now and see if that helps.
My setup is:
- I have a VPN client gateway. All devices except 2 (Roku & work laptop) go through this gateway
- I use DNS Resolver/unbound with NO DNS servers listed under System>General setup -- this was recommended in my VPN's setup tutorial so that I am only using their own DNS server to avoid DNS leaks. Over the years, I dropped their DNS server too in favor of resolving it myself using pfsense/unbound only
- DNS Forwarder is disabled
I was blocking access to all other DNS except my pfsense until yesterday, using this firewall rule:
block IPV4 UDP LAN net * * 53(DNS) * Block all other DNS
All Firewall Rules under LAN
2 /1.37 GiB * * * LAN Address 443/80/22 * * Anti-Lockout Rule
26 /3.77 GiB IPv4 UDP * * LAN net 53 (DNS) * none Allow DNS to pfSense
0 /0 B IPv4 UDP LAN net * * 53 (DNS) * none Block All other DNS
17 /925.34 GiB IPv4 * wan_devices * * * WAN_DHCP none This rule allows devices in the wan_devices to bypass the VPN
49 /1.10 TiB IPv4 * LAN net * * * VPN_INTF none Default allow LAN to any rule
I disabled the "Block all other DNS" rule because a caddy2 server on my local network wasn't able to fetch the DNS records from my domain name that I own --- for setting up lets encrypt certs. That is another problem that I have for another thread.
System --> General setup --> DNS Server Settings
- DNS Servers -- none listed
- DNS Server override -- unchecked
- Disable DNS Forwarder -- unchecked
Unbound settings:
- Enable DNS resolver -- checked
- Network Interfaces -- All
- Outbound Network Interfaces -- VPN_INTF
- DNSSEC - checked
- DNS Query Forwarding -- unchecked
- DHCP Registration -- checked
- Static DHCP -- checked
- OpenVPN clients -- checked
- Custom options --- server:include: /var/unbound/pfb_dnsbl.*conf
Here's my DNS Resolver log: it seems that there are a few starts/restarts of unbound
May 26 15:32:20 unbound 24536:0 info: start of service (unbound 1.9.1).
May 26 15:32:20 unbound 24536:0 notice: init module 1: iterator
May 26 15:32:20 unbound 24536:0 notice: init module 0: validator
May 26 15:32:18 unbound 24536:0 notice: Restart of unbound 1.9.1.
May 26 15:32:18 unbound 24536:0 info: 0.524288 1.000000 1
May 26 15:32:18 unbound 24536:0 info: 0.262144 0.524288 3
May 26 15:32:18 unbound 24536:0 info: 0.131072 0.262144 2
May 26 15:32:18 unbound 24536:0 info: 0.065536 0.131072 12
May 26 15:32:18 unbound 24536:0 info: 0.032768 0.065536 12
May 26 15:32:18 unbound 24536:0 info: 0.016384 0.032768 11
May 26 15:32:18 unbound 24536:0 info: 0.008192 0.016384 1
May 26 15:32:18 unbound 24536:0 info: lower(secs) upper(secs) recursions
May 26 15:32:18 unbound 24536:0 info: [25%]=0.0305338 median[50%]=0.057344 [75%]=0.106496
May 26 15:32:18 unbound 24536:0 info: histogram of recursion processing times
May 26 15:32:18 unbound 24536:0 info: average recursion processing time 0.099084 sec
May 26 15:32:18 unbound 24536:0 info: server stats for thread 1: requestlist max 1 avg 0.0465116 exceeded 0 jostled 0
May 26 15:32:18 unbound 24536:0 info: server stats for thread 1: 451 queries, 409 answers from cache, 42 recursions, 1 prefetch, 0 rejected by ip ratelimiting
May 26 15:32:18 unbound 24536:0 info: 0.524288 1.000000 3
May 26 15:32:18 unbound 24536:0 info: 0.262144 0.524288 4
May 26 15:32:18 unbound 24536:0 info: 0.131072 0.262144 6
May 26 15:32:18 unbound 24536:0 info: 0.065536 0.131072 21
May 26 15:32:18 unbound 24536:0 info: 0.032768 0.065536 18
May 26 15:32:18 unbound 24536:0 info: 0.016384 0.032768 20
May 26 15:32:18 unbound 24536:0 info: 0.008192 0.016384 1
May 26 15:32:18 unbound 24536:0 info: 0.002048 0.004096 1
May 26 15:32:18 unbound 24536:0 info: 0.001024 0.002048 2
May 26 15:32:18 unbound 24536:0 info: 0.000000 0.000001 2
May 26 15:32:18 unbound 24536:0 info: lower(secs) upper(secs) recursions
May 26 15:32:18 unbound 24536:0 info: [25%]=0.0274432 median[50%]=0.0564338 [75%]=0.110787
May 26 15:32:18 unbound 24536:0 info: histogram of recursion processing times
May 26 15:32:18 unbound 24536:0 info: average recursion processing time 0.099151 sec
May 26 15:32:18 unbound 24536:0 info: server stats for thread 0: requestlist max 4 avg 0.202532 exceeded 0 jostled 0
May 26 15:32:18 unbound 24536:0 info: server stats for thread 0: 529 queries, 451 answers from cache, 78 recursions, 1 prefetch, 0 rejected by ip ratelimiting
May 26 15:32:18 unbound 24536:0 info: service stopped (unbound 1.9.1).
May 26 15:19:07 unbound 24536:1 info: generate keytag query _ta-4f66. NULL IN
May 26 15:19:07 unbound 24536:0 info: start of service (unbound 1.9.1).
May 26 15:19:07 unbound 24536:0 notice: init module 1: iterator
May 26 15:19:07 unbound 24536:0 notice: init module 0: validator
May 26 15:19:05 unbound 24536:0 notice: Restart of unbound 1.9.1.
May 26 15:19:05 unbound 24536:0 info: 2.000000 4.000000 1
May 26 15:19:05 unbound 24536:0 info: 0.524288 1.000000 1
May 26 15:19:05 unbound 24536:0 info: 0.262144 0.524288 2
May 26 15:19:05 unbound 24536:0 info: 0.131072 0.262144 3
May 26 15:19:05 unbound 24536:0 info: 0.065536 0.131072 4
May 26 15:19:05 unbound 24536:0 info: 0.032768 0.065536 5
May 26 15:19:05 unbound 24536:0 info: 0.016384 0.032768 2
May 26 15:19:05 unbound 24536:0 info: 0.008192 0.016384 2
May 26 15:19:05 unbound 24536:0 info: 0.000000 0.000001 1
May 26 15:19:05 unbound 24536:0 info: lower(secs) upper(secs) recursions
May 26 15:19:05 unbound 24536:0 info: [25%]=0.0344064 median[50%]=0.073728 [75%]=0.207531
May 26 15:19:05 unbound 24536:0 info: histogram of recursion processing times
May 26 15:19:05 unbound 24536:0 info: average recursion processing time 0.238173 sec
May 26 15:19:05 unbound 24536:0 info: server stats for thread 1: requestlist max 3 avg 0.619048 exceeded 0 jostled 0
May 26 15:19:05 unbound 24536:0 info: server stats for thread 1: 44 queries, 23 answers from cache, 21 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 26 15:19:05 unbound 24536:0 info: 0.262144 0.524288 2
May 26 15:19:05 unbound 24536:0 info: 0.131072 0.262144 1
May 26 15:19:05 unbound 24536:0 info: 0.065536 0.131072 2
May 26 15:19:05 unbound 24536:0 info: 0.032768 0.065536 1
May 26 15:19:05 unbound 24536:0 info: 0.016384 0.032768 1
May 26 15:19:05 unbound 24536:0 info: lower(secs) upper(secs) recursions
May 26 15:19:05 unbound 24536:0 info: [25%]=0.057344 median[50%]=0.114688 [75%]=0.294912
May 26 15:19:05 unbound 24536:0 info: histogram of recursion processing times
May 26 15:19:05 unbound 24536:0 info: average recursion processing time 0.160643 sec
May 26 15:19:05 unbound 24536:0 info: server stats for thread 0: requestlist max 2 avg 0.571429 exceeded 0 jostled 0
May 26 15:19:05 unbound 24536:0 info: server stats for thread 0: 22 queries, 15 answers from cache, 7 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 26 15:19:05 unbound 24536:0 info: service stopped (unbound 1.9.1).
May 26 15:18:38 unbound 24536:0 info: generate keytag query _ta-4f66. NULL IN
May 26 15:18:38 unbound 24536:0 info: start of service (unbound 1.9.1).
May 26 15:18:38 unbound 24536:0 notice: init module 1: iterator
May 26 15:18:38 unbound 24536:0 notice: init module 0: validator
May 26 15:18:36 unbound 24536:0 notice: Restart of unbound 1.9.1.
May 26 15:18:36 unbound 24536:0 info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
May 26 15:18:36 unbound 24536:0 info: server stats for thread 1: 1 queries, 1 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 26 15:18:36 unbound 24536:0 info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
May 26 15:18:36 unbound 24536:0 info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 26 15:18:36 unbound 24536:0 info: service stopped (unbound 1.9.1).
May 26 15:18:36 unbound 24536:0 info: start of service (unbound 1.9.1).
May 26 15:18:36 unbound 24536:0 notice: init module 1: iterator
May 26 15:18:36 unbound 24536:0 notice: init module 0: validator
May 26 15:18:32 unbound 23924:0 info: 0.262144 0.524288 5
May 26 15:18:32 unbound 23924:0 info: 0.131072 0.262144 1
May 26 15:18:32 unbound 23924:0 info: 0.065536 0.131072 5
May 26 15:18:32 unbound 23924:0 info: 0.032768 0.065536 1
May 26 15:18:32 unbound 23924:0 info: 0.000000 0.000001 1
May 26 15:18:32 unbound 23924:0 info: lower(secs) upper(secs) recursions
May 26 15:18:32 unbound 23924:0 info: [25%]=0.08192 median[50%]=0.124518 [75%]=0.353894
May 26 15:18:32 unbound 23924:0 info: histogram of recursion processing times
May 26 15:18:32 unbound 23924:0 info: average recursion processing time 0.172723 sec
May 26 15:18:32 unbound 23924:0 info: server stats for thread 1: requestlist max 5 avg 1.38462 exceeded 0 jostled 0
May 26 15:18:32 unbound 23924:0 info: server stats for thread 1: 31 queries, 18 answers from cache, 13 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 26 15:18:32 unbound 23924:0 info: 2.000000 4.000000 1
May 26 15:18:32 unbound 23924:0 info: 0.065536 0.131072 2
May 26 15:18:32 unbound 23924:0 info: 0.000000 0.000001 1
May 26 15:18:32 unbound 23924:0 info: lower(secs) upper(secs) recursions
May 26 15:18:32 unbound 23924:0 info: [25%]=1e-06 median[50%]=0.098304 [75%]=0.131072
May 26 15:18:32 unbound 23924:0 info: histogram of recursion processing times
May 26 15:18:32 unbound 23924:0 info: average recursion processing time 0.718361 sec
May 26 15:18:32 unbound 23924:0 info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
May 26 15:18:32 unbound 23924:0 info: server stats for thread 0: 10 queries, 6 answers from cache, 4 recursions, 0 prefetch, 0 rejected by ip ratelimiting
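For anyone wanting to see how often unbound restarts in a log like the one above, here is a minimal Python sketch. It only looks for the "Restart of unbound" notices and prints the time between them. The function names are made up for illustration, and because the log entries carry no year, the year (2020) is an assumption passed in as a parameter.

```python
import re
from datetime import datetime

# Matches entries such as:
#   May 26 15:32:18 unbound 24536:0 notice: Restart of unbound 1.9.1.
# It works whether the log has one entry per line or is pasted as one long run.
RESTART_RE = re.compile(
    r"(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) unbound \d+:\d+ notice: Restart of unbound"
)

def restart_times(log_text, year=2020):
    """Return the restart timestamps found in log_text, oldest first.

    The year is an assumption: these log entries do not include one.
    """
    times = [
        datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for stamp in RESTART_RE.findall(log_text)
    ]
    return sorted(times)

def restart_gaps(times):
    """Return the time elapsed between each pair of consecutive restarts."""
    return [later - earlier for earlier, later in zip(times, times[1:])]

# The three restart notices from the log above.
sample = (
    "May 26 15:32:18 unbound 24536:0 notice: Restart of unbound 1.9.1. "
    "May 26 15:19:05 unbound 24536:0 notice: Restart of unbound 1.9.1. "
    "May 26 15:18:36 unbound 24536:0 notice: Restart of unbound 1.9.1."
)
times = restart_times(sample)
for gap in restart_gaps(times):
    print(gap)  # prints 0:00:29, then 0:13:13
```

Short gaps between restarts are worth comparing against the DHCP lease log, since restarts that line up with new leases would point at the DHCP Registration setting.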
dpinger/Gateway logs : the first 3 lines are after I restarted unbound and the vpn client services
May 26 15:18:33 dpinger VPN_INTF $VPN_REMOTE_GATEWAY: Alarm latency 13140us stddev 1072us loss 33%
May 26 15:18:31 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr $VPN_REMOTE_GATEWAY bind_addr $VPN_REMOTE_IP identifier "VPN_INTF "
May 26 15:18:31 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr $ISP_GATEWAY bind_addr $ISP_IP identifier "WAN_DHCP "
May 26 15:17:43 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:43 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:42 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:42 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:41 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:41 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:40 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:39 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:39 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:38 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:38 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:37 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:37 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:36 dpinger VPN_INTF 10.8.3.1: sendto error: 55
May 26 15:17:36 dpinger VPN_INTF 10.8.3.1: sendto error: 55
Questions
- Can someone please help me figure this out so that it doesn't kill the internet in the house?
- Should I use only pfsense for all DNS resolution or should I add Google/Cloudflare/OpenDNS servers under System--> General Setup --> DNS Server settings in case unbound fails to resolve something?
If you need any other information, please let me know.
-
Any update on what could possibly be the issue here?
-
Hi,
Several things to test :
When WAN goes bad, VPN goes bad. The two dpinger lines clearly show that neither gateway receives ICMP (ping) answers any more. The end result will be that dpinger restarts these interfaces.
This restart will also restart many processes and packages.
What you'll see is some kind of snowball effect.
@Inxsible said in Unbound stops resolving intermittently:
OpenVPN clients -- checked
Have you set up an OpenVPN server? In any case, uncheck this.
@Inxsible said in Unbound stops resolving intermittently:
DHCP Registration -- checked
Uncheck this. If you have some devices that you address by their host name, set them up using static MAC leases.
Every new lease that comes in will restart unbound, the resolver.
Shut down your VPN. Use the WAN as the only way out. Does the issue persist? Then the issue is WAN uplink based. If not, it's VPN based.
@Inxsible said in Unbound stops resolving intermittently:
26 /3.77 GiB IPv4 UDP * * LAN net 53 (DNS) * none Allow DNS to pfSense
DNS traffic can also use TCP.
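To illustrate the TCP point: DNS over TCP uses the same message format as UDP, but each message is prefixed with a two-byte length field (RFC 1035, section 4.2.2). Below is a minimal Python sketch, standard library only, that builds a query and sends it over TCP port 53. The function names are made up for illustration, and this is a test aid under those assumptions, not how unbound itself is implemented.

```python
import socket
import struct

def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query message for `name` (qtype 1 = A record)."""
    # Header: ID, flags (only "recursion desired" set), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels, ended by a zero byte.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class 1 = IN
    return header + question

def frame_for_tcp(message):
    """Prefix a DNS message with its two-byte length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message

def recv_exact(sock, count):
    """Read exactly `count` bytes from the socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection early")
        data += chunk
    return data

def query_over_tcp(server, name, timeout=3.0):
    """Send one query to `server` on TCP port 53 and return the raw reply bytes."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(name)))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

Calling `query_over_tcp("192.168.1.1", "example.com")` from a LAN client, with your own pfSense LAN address in place of the hypothetical 192.168.1.1, is one way to check whether TCP port 53 gets through the firewall rules, since the rules quoted above only mention UDP.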
@Inxsible said in Unbound stops resolving intermittently:
I disabled the "Block all other DNS" rule because a caddy2 server on my local network wasn't able to fetch the DNS records from my domain name that I own
If that caddy2 device uses a resolver itself, then true, it won't use the local pfSense resolver. Check if that can be changed.
-
@Gertjan said in Unbound stops resolving intermittently:
When WAN goes bad, VPN goes bad. The two dpinger lines clearly show that neither gateway receives ICMP (ping) answers any more. The end result will be that dpinger restarts these interfaces.
This restart will also restart many processes and packages.
What you'll see is some kind of snowball effect.
As I mentioned, the first 3 lines of the dpinger logs were after I restarted unbound and the OpenVPN client services.
@Gertjan said in Unbound stops resolving intermittently:
@Inxsible said in Unbound stops resolving intermittently:
OpenVPN clients -- checked
Have you set up an OpenVPN server? In any case, uncheck this.
Yes, I also have a personal OpenVPN server, which is not currently being used because I am working from home. I will uncheck that option now.
@Gertjan said in Unbound stops resolving intermittently:
@Inxsible said in Unbound stops resolving intermittently:
DHCP Registration -- checked
Uncheck this. If you have some devices that you address by their host name, set them up using static MAC leases.
Every new lease that comes in will restart unbound, the resolver.
Will do. Thank you.
@Gertjan said in Unbound stops resolving intermittently:
Shut down your VPN. Use the WAN as the only way out. Does the issue persist? Then the issue is WAN uplink based. If not, it's VPN based.
@Inxsible said in Unbound stops resolving intermittently:
26 /3.77 GiB IPv4 UDP * * LAN net 53 (DNS) * none Allow DNS to pfSense
DNS traffic can also use TCP.
Ok. My issue occurs intermittently, which is why I wasn't comfortable switching off the VPN client completely. But if this persists, then maybe I will do that for a few days and see if I still lose Internet connectivity. After testing for these, if the issue is with the WAN uplink or with the VPN uplink, what would be the resolution? How would I actually fix it, if I still end up losing network connectivity?
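Since the problem is intermittent, one way to pin down when it happens is to leave a small checker running on a LAN client and compare its timestamps with the unbound and dpinger logs. This is a rough Python sketch, standard library only. The function names are made up for illustration, and it relies on the client's system resolver, so any caching on the client side can hide short outages.

```python
import socket
import time
from datetime import datetime

def check_once(name, resolve=socket.getaddrinfo):
    """Do one lookup of `name`; return (True, address) or (False, error text)."""
    try:
        results = resolve(name, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return True, results[0][4][0]

def monitor(name, interval=30.0, rounds=None,
            resolve=socket.getaddrinfo, sleep=time.sleep):
    """Look up `name` every `interval` seconds and print each failure with a timestamp.

    Runs forever when `rounds` is None; returns the number of failures otherwise.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        ok, detail = check_once(name, resolve)
        if not ok:
            failures += 1
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} FAILED {name}: {detail}")
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
    return failures
```

Running `monitor("example.com", interval=30)` on one device behind the VPN gateway and one behind the WAN gateway would show whether failures hit both paths or only the VPN one.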
@Gertjan said in Unbound stops resolving intermittently:
@Inxsible said in Unbound stops resolving intermittently:
I disabled the "Block all other DNS" rule because a caddy2 server on my local network wasn't able to fetch the DNS records from my domain name that I own
If that caddy2 device uses a resolver itself, then true, it won't use the local pfSense resolver. Check if that can be changed.
I didn't set up a resolver in the caddy2 LXC container, but I will double check.
Thanks for your help @Gertjan.
-
@Inxsible said in Unbound stops resolving intermittently:
I didn't set up a resolver in the ....
A resolver doesn't need a setup, unlike a forwarder, which has to forward to 'something': your ISP's DNS (obtained by a DHCP request on WAN), or something else, like 8.8.8.8 if you want to sell your DNS requests to them.
A resolver primes from the root of the Internet. The "famous 13" root servers: their locations / names / addresses are hard coded.
-
@Gertjan said in Unbound stops resolving intermittently:
A resolver doesn't need a setup.
What I meant was that I did not explicitly point DNS on the caddy2 container at anything other than my pfSense box, which is the default nameserver I use when creating new containers.
-
@Gertjan said in Unbound stops resolving intermittently:
@Inxsible said in Unbound stops resolving intermittently:
DHCP Registration -- checked
Uncheck this. If you have some devices that you address by their host name, set them up using static MAC leases.
Every new lease that comes in will restart unbound, the resolver.
This is almost certainly what did it for you.
Note the long-running discussion/complaint about pfSense's implementation choice to restart unbound with every new DHCP lease, instead of just reloading the local zone (the only zone the DHCP leases can affect):
https://forum.netgate.com/topic/115482/frequent-unbound-restarts
This issue is especially pronounced if you have short DHCP leases and/or use pfBlockerNG, as the former means more frequent restarts, and the long list of blacklisted domains with the latter means each restart takes longer.
In addition to the traffic missed during restart, this also flushes the cache, so subsequent resolutions of common domain names become cache misses the next time they are accessed after a restart.
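A toy Python model of that cache effect, for illustration only: it treats the cache as a plain set of names, ignores TTLs, and empties it every so many lookups to stand in for a restart. The function name and the example names are made up, and this is not how unbound's cache actually works internally.

```python
def count_cache_misses(lookups, restart_every=None):
    """Count cache misses for a sequence of looked-up names.

    If restart_every is set, the cache is emptied after every that-many
    lookups, standing in for a resolver restart.
    """
    cache = set()
    misses = 0
    for index, name in enumerate(lookups, start=1):
        if name not in cache:
            misses += 1        # would need a full recursion
            cache.add(name)
        if restart_every and index % restart_every == 0:
            cache.clear()      # a restart throws the whole cache away
    return misses

# 12 lookups spread over 3 distinct (made-up) names.
names = ["a.example", "b.example", "c.example"] * 4
print(count_cache_misses(names))                   # prints 3
print(count_cache_misses(names, restart_every=3))  # prints 12
```

Without restarts only the first lookup of each name misses; with a restart after every third lookup, every lookup becomes a miss and has to be resolved from scratch.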
-
@brad-edmondson said in Unbound stops resolving intermittently:
have short DHCP leases
I did disable the DHCP registration and also the OpenVPN clients checkboxes as suggested by @Gertjan .
In addition to that, I also updated my VPN client settings to add multiple servers -- in case my VPN provider decides to change IP addresses or if they simply decommission the server that I am connecting to.
I haven't seen any issues since then. So it was a combination of those two things that fixed it for me. Obviously if you don't use a VPN provider, then the second part wouldn't apply to you.