Occasionally DNS fails to resolve, restarting DNS Resolver fixes it
-
I have noticed a problem with DNS Resolver going stale and not updating its cache correctly. It seems to happen about once a month, but with no regular pattern that I can figure out. Restarting DNS Resolver fixes it. Oddly, there is nothing in the DNS Resolver logs to explain what is going on. Literally zero entries. It just happened to me a moment ago; look at these two log entries and their dates:
Aug 18 08:14:01 unbound 28900 [28900:0] info: start of service (unbound 1.15.0).
Sep 13 11:11:43 unbound 28900 [28900:0] info: service stopped (unbound 1.15.0).
The second entry is from when I restarted DNS Resolver.
This is obviously an easy fix, but it is annoying to hear the "Dad... the Internet is down!" calls from the kids.
Also odd: I cannot OpenVPN into pfSense while it is in this state, so I cannot restart DNS Resolver remotely. I have to be home to restart it. Simply restarting DNS Resolver fixes OpenVPN as well.
Is there any known issue here with a solution I can go charge after?
If it matters, pfSense is running on a Netgate SG-2220.
-
@scottlindner There is a long thread about DNS problems in 22.05 at https://forum.netgate.com/topic/173148/slow-dns-after-22-05/ but I have not seen those issues.
-
@scottlindner said in Occasionally DNS fails to resolve, restarting DNS Resolver fixes it:
Aug 18 08:14:01 unbound 28900 [28900:0] info: start of service (unbound 1.15.0).
Sep 13 11:11:43 unbound 28900 [28900:0] info: service stopped (unbound 1.15.0).
The start and stop here are not related.
Or was unbound really down for one month? Not only the kids would have noticed that.
It's normal for unbound to restart once in a while.
Have you installed pfBlockerNG? It will restart unbound, depending on the feed reload frequency.
Do you have unbound restarted every time a new DHCP lease comes in? (I tend to advise deactivating this behaviour.)
Is your upstream device a modem type? When your WAN IP changes, it will take down the WAN interface of pfSense, and unbound will restart.
Does any other LAN interface go down and up? unbound restarts.
And then there is the https://forum.netgate.com/topic/173148/slow-dns-after-22-05/ ....
When you find a very recent "info: service stopped (unbound 1.15.0)." entry, look at all the log files and check what the reason was. unbound doesn't stop unless it is signalled to stop.
Only the GUI can actually stop unbound. If a system event arrives that needs to be taken into account, the system (pfSense) will restart unbound, that is, stop and then start it.
One exception: OOM (Out Of Memory) events. These elect a process (a big memory user) and 'stop' it to protect the integrity of the system. This is logged ... I've seen it happen.
Btw: I'm also running 22.05 myself, on a 4100. I'm fully dual stacked. My unbound is rock solid. This doesn't mean it doesn't restart once in a while: I'm using pfBlockerNG and do a lot of tests.
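To check whether unbound really logged nothing between a start and a stop, a small script can scan the resolver log for its longest silent stretch. This is only a sketch under assumptions: the line format is taken from the two entries quoted above, the year has to be supplied because these timestamps omit it, month names are parsed with the default (English) locale, and `parse_line` and `largest_gap` are made-up helper names.

```python
import re
from datetime import datetime

# Assumed format, based on the two entries quoted in this thread:
# "Aug 18 08:14:01 unbound 28900 [28900:0] info: start of service (unbound 1.15.0)."
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) unbound \d+ \[\d+:\d+\] (\w+): (.*)$"
)


def parse_line(line, year):
    """Return (timestamp, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    # The log has no year, so the caller has to provide one.
    ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return ts, m.group(2), m.group(3)


def largest_gap(lines, year):
    """Return (gap, earlier_entry, later_entry) for the longest silence, or None."""
    entries = [e for e in (parse_line(l, year) for l in lines) if e]
    best = None
    for prev, cur in zip(entries, entries[1:]):
        gap = cur[0] - prev[0]
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best


if __name__ == "__main__":
    sample = [
        "Aug 18 08:14:01 unbound 28900 [28900:0] info: start of service (unbound 1.15.0).",
        "Sep 13 11:11:43 unbound 28900 [28900:0] info: service stopped (unbound 1.15.0).",
    ]
    result = largest_gap(sample, 2022)
    if result:
        gap, first, last = result
        print(f"longest silence: {gap} between '{first[2]}' and '{last[2]}'")
```

A long gap that ends in "service stopped" with the same PID on both sides suggests unbound was running the whole time and simply logged nothing, which is a different situation from a restart triggered by pfSense.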
-
@gertjan said in Occasionally DNS fails to resolve, restarting DNS Resolver fixes it:
The start and stop here are not related.
That's what OP is saying, that there are no log entries in the unbound log until he initiated the restart.
From the other thread, though, I believe 1.15 was the new unbound version, so this is on 22.05.
A "hammer" approach would be to install the cron package and restart unbound every "n" hours, but that isn't solving the problem.
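A slightly gentler version of the hammer would be a cron job that restarts unbound only when lookups actually fail. The sketch below rests on several assumptions: Python 3 is available on the firewall, the test hostnames are placeholders, and `pfSsh.php playback svc restart unbound` is the shell command that restarts the DNS Resolver on your pfSense version (worth verifying by hand first). `resolves` and `dns_is_down` are made-up helper names.

```python
import socket
import subprocess

# Assumption: this shell command restarts the DNS Resolver on pfSense.
RESTART_CMD = ["/usr/local/sbin/pfSsh.php", "playback", "svc", "restart", "unbound"]


def resolves(host, resolver=socket.getaddrinfo):
    """Return True if the host resolves to at least one address."""
    try:
        return len(resolver(host, None)) > 0
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def dns_is_down(hosts, resolver=socket.getaddrinfo):
    """Treat DNS as down only when every test host fails to resolve."""
    return not any(resolves(h, resolver) for h in hosts)


def main():
    hosts = ["netgate.com", "example.com"]  # placeholder test names
    if dns_is_down(hosts):
        subprocess.run(RESTART_CMD, check=False)


if __name__ == "__main__":
    main()
```

One caveat: `socket.getaddrinfo` uses the system resolver, so if the firewall itself is allowed to fall back to other DNS servers, the check can pass while LAN clients that only use unbound still fail. Like the plain timed restart, this works around the symptom rather than the cause.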
-
@steveits said in Occasionally DNS fails to resolve, restarting DNS Resolver fixes it:
@gertjan said in Occasionally DNS fails to resolve, restarting DNS Resolver fixes it:
A "hammer" approach would be to install the cron package and restart unbound every "n" hours but that isn't solving the problem.
That is what I was thinking of doing. I installed cron, but I need to learn more about using it. I saw all of the default rules and was about to delete them all, then wondered if that was just the system's cron that is already set up. I uninstalled it without knowing the answer, because I don't want to jack my firewall until I know how to use it. Lol
-
@scottlindner Yes, the cron package shows all jobs, including the defaults and those from other packages.
-
Another ref: https://forum.netgate.com/topic/174248/need-help-troubleshooting-dns-after-upgrade-to-22-05/1
-
@gertjan said in Occasionally DNS fails to resolve, restarting DNS Resolver fixes it:
@scottlindner said in Occasionally DNS fails to resolve, restarting DNS Resolver fixes it:
Aug 18 08:14:01 unbound 28900 [28900:0] info: start of service (unbound 1.15.0).
Sep 13 11:11:43 unbound 28900 [28900:0] info: service stopped (unbound 1.15.0).
The start and stop here are not related.
Or was unbound really down for one month? Not only the kids would have noticed that.
Yah... I noticed. :) Those are two back-to-back log entries, so that is exactly what it means.
It's normal for unbound to restart once in a while.
Have you installed pfBlockerNG? It will restart unbound, depending on the feed reload frequency.
I haven't. Is this the ideal approach vs a cron job to restart unbound?
Do you have unbound restarted every time a new DHCP lease comes in? (I tend to advise deactivating this behaviour.)
Is your upstream device a modem type? When your WAN IP changes, it will take down the WAN interface of pfSense, and unbound will restart.
Does any other LAN interface go down and up? unbound restarts.
And then there is the https://forum.netgate.com/topic/173148/slow-dns-after-22-05/ ....
When you find a very recent "info: service stopped (unbound 1.15.0)." entry, look at all the log files and check what the reason was. unbound doesn't stop unless it is signalled to stop.
Only the GUI can actually stop unbound. If a system event arrives that needs to be taken into account, the system (pfSense) will restart unbound, that is, stop and then start it.
I'll do that. There is no deterministic frequency to when this happens. At least no pattern I have been able to figure out.
One exception: OOM (Out Of Memory) events. These elect a process (a big memory user) and 'stop' it to protect the integrity of the system. This is logged ... I've seen it happen.
Btw: I'm also running 22.05 myself, on a 4100. I'm fully dual stacked. My unbound is rock solid. This doesn't mean it doesn't restart once in a while: I'm using pfBlockerNG and do a lot of tests.
I don't care if unbound restarts, I care when I can't resolve DNS hostnames. Just to be clear what the root of my issue is here.
Really appreciate the time you are spending to help me sort through this.
-
For the record, my system is running the latest CE 2.7.0.
Following a reboot, Unbound status will show:
All is well until TTL reaches 0, when sometimes Unbound status will show:
or
At this point, a restart of Unbound will sometimes present a full list of DNS servers and other times only a partial list.
In terms of timeline, the system was updated just prior to July 6, 2022 and everything was working as expected. The problem appeared, for me, after updating the system on or around August 29, 2022.
-
@wlp94611 If you disable "Use SSL/TLS for outgoing DNS Queries to Forwarding Servers" does the problem still happen?
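If turning TLS off is not an option, another way to look for differences is to probe each forwarder over TLS directly, independent of unbound. Below is a minimal sketch, assuming the forwarder speaks DNS over TLS on port 853 and presents a certificate valid for the given hostname. The IP and hostname in the usage section are example values only, and `build_query`, `recv_exact` and `dot_probe` are made-up helper names.

```python
import socket
import ssl
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query (recursion desired, class IN, type A by default)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply arrived")
        buf += chunk
    return buf


def dot_probe(server_ip, tls_hostname, name="example.com", timeout=5.0):
    """Send one query over TLS and return the DNS response code (0 = NOERROR)."""
    ctx = ssl.create_default_context()
    query = build_query(name)
    with socket.create_connection((server_ip, 853), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=tls_hostname) as tls:
            # DNS over TCP/TLS prefixes each message with a 2-byte length.
            tls.sendall(struct.pack("!H", len(query)) + query)
            length = struct.unpack("!H", recv_exact(tls, 2))[0]
            reply = recv_exact(tls, length)
    flags = struct.unpack("!H", reply[2:4])[0]
    return flags & 0x000F


if __name__ == "__main__":
    # Example forwarder only; substitute the servers from your own config.
    print(dot_probe("1.1.1.1", "cloudflare-dns.com"))
```

If a probe like this keeps working while unbound shows a partial or empty forwarder list, that would point at unbound rather than at the upstream servers.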
-
I didn't try that as using SSL/TLS was the whole point of my config.
What I did do is spin up a VM using CE 2.6.0, and voila, the problem disappeared.
-
@wlp94611 said in Occasionally DNS fails to resolve, restarting DNS Resolver fixes it:
using SSL/TLS was the whole point of my config
I figured, but per the other thread some people have lots of trouble and others (including me) have not seen it at all, so I was looking for differences.
2.6 is a version behind, so to speak... it's based on 22.01, so it has the older unbound version. I think that is discussed towards the end of the long thread I linked above.
-