DNS Resolver (unbound) fails after reboot unless manually restarted
-
Here's the situation, at least as I've been able to diagnose thus far.
- Rebooting (either manually or after a power outage) causes Unbound to start in a bad state.
- In pfSense 2.4.x: External DNS records could be found, but wouldn't be limited to the outgoing network interfaces required until restarting Unbound. So, connections were possible, but by leaking DNS.
- In pfSense 2.5: No external DNS records can be found, but internal records appear fine. This is better than creating a DNS leak, but makes the problem more pressing to fix. I'm still confirming the local records aren't cached somewhere on clients.
- I'm not seeing anything in the logs right now to explain what exactly the state is, but I'm cranking up the logs to run more diagnostics.
- I've limited Unbound to only access DNS servers through one of three outward OpenVPN connections (load balanced).
- Restarting Unbound manually (i.e., go to Status->Services, and restart DNS resolver) corrects the issue.
I suspect there's a race condition in which Unbound is attempting to start prior to the VPN connections coming online during the boot process. This causes something to timeout that doesn't recover automatically. However, opening up Unbound to connect to DNS servers directly through the WAN doesn't seem to resolve the issue either, so it might be more than just a race condition.
Here's a screenshot of the basic Unbound settings...
Easy enough to fix when I'm around, but if I'm not in the building, then people lose access to the internet until I'm back. And, the UPS won't keep it running through a long power outage, etc. I could probably work around the issue by forcing Unbound to restart again after reboot using a cron or something, but that feels like covering up the problem rather than solving it.
Does anybody have experience with this problem? Any ideas on how to resolve it, or other places I should look?
-
@josh-hall Same issue for me, and I work in a hospital. I opened a case with the same problem and the same diagnosis, but no solution so far.
I hope to find the solution as soon as possible, and I think Netgate must get involved to fix this critical issue.
-
@josh-hall said in DNS Resolver (unbound) fails after reboot unless manually restarted:
Does anybody have experience with this problem?
3 VPNs...
Asking for SSL/TLS (it's checked) while not forwarding: you are aware that the root, TLD, and most, if not all, other name servers do not speak SSL/TLS, right?
Registering DHCP leases into DNS... Poor unbound...
Edit: and you're also offering local SSL/TLS support. OK, why not.
-
Bumping this, as I'm having the same issue. Manual restart is needed for DNS resolver after any reboot.
-
Same here for a long time now. Using pfSense 2.5.2.
Some of my Unbound settings.
-
@ajbrown said in DNS Resolver (unbound) fails after reboot unless manually restarted:
Bumping this, as I'm having the same issue. Manual restart is needed for DNS resolver after any reboot.
As you are using a VPN as a WAN, it might be possible that the VPN isn't yet "UP" when unbound is started. So it fails to start.
@CiscoX Try this: use the default ("All") for "Outgoing Network Interfaces".
-
Thanks, I'm going to try it out and report back after next reboot :)
-
@gertjan Doesn't this approach lead to DNS leaks as unbound is no longer constrained to specific network interfaces for outbound communication? Kinda defeats the point if I don't want my ISP selling my behavioral data.
Wouldn't a better approach be for unbound to simply retry the connection after a delay until it finds that the outgoing network interfaces are UP? This way it can bind to the interface rather than failing silently. It doesn't even need to be that aggressive: retrying every 15 s over a period of 5 min would nearly always succeed. If it still fails, then unbound should simply shut down, because not all of the network interfaces were UP.
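Until unbound does this itself, the retry behaviour described above can be approximated with a small wrapper script. This is a minimal sketch, not how unbound or pfSense actually behave: it assumes a single OpenVPN client interface with the hypothetical name ovpnc1, treats "has an IPv4 address" as "UP", and reuses the pfSsh.php restart command that appears later in this thread. With three load-balanced tunnels you would repeat the check per interface.

```sh
#!/bin/sh
# Sketch: wait (up to 5 minutes) for a VPN interface to come up, then restart unbound.
# ASSUMPTIONS: "ovpnc1" is a hypothetical interface name; "up" is approximated
# as "ifconfig shows an IPv4 address on it".

IFACE="${IFACE:-ovpnc1}"     # hypothetical OpenVPN client interface
INTERVAL="${INTERVAL:-15}"   # seconds between checks
ATTEMPTS="${ATTEMPTS:-20}"   # 20 checks x 15 s = 5 minutes

# iface_has_ipv4 IFACE: succeed if IFACE exists and has an IPv4 address.
iface_has_ipv4() {
    ifconfig "$1" 2>/dev/null | grep -q 'inet '
}

# retry N DELAY CMD [ARGS...]: run CMD up to N times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
    _n=$1 _delay=$2
    shift 2
    _i=1
    while [ "$_i" -le "$_n" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$_i" -lt "$_n" ]; then
            sleep "$_delay"
        fi
        _i=$((_i + 1))
    done
    return 1
}

main() {
    if retry "$ATTEMPTS" "$INTERVAL" iface_has_ipv4 "$IFACE"; then
        /usr/local/sbin/pfSsh.php playback svc restart unbound
    else
        # Fail loudly instead of silently: leave a message in the system log.
        logger -t unbound_wait "$IFACE never came up; unbound not restarted"
        return 1
    fi
}

# Only act when invoked as "script.sh run"; otherwise just define the functions.
if [ "${1:-}" = "run" ]; then
    main
fi
```

Run it with the run argument (for example from an @reboot entry); without it, the script only defines the functions, which makes the retry logic easy to test on its own.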
In fact, it's the silent failure that bothers me the most. Why would unbound appear to be started and running if the outgoing network interface wasn't bound correctly? I know it's called unbound, but that's supposed to be tongue-in-cheek, not a design decision.
I'm most likely missing something in the underlying decision process for unbound, but this doesn't make sense to me. Maybe somebody here can explain what I'm missing.
-
If it fails to start, I'd just put unbound in the service watchdog (SW).
That way SW would keep trying to start it. I have unbound and all my VPN servers in SW.
I have seen unbound crash (not often), and then I'm saved by SW restarting it without intervention.
/Bingo
-
@bingo600 It doesn't fail to start. It fails to start correctly on the initial boot. I've got unbound on the service watchdog, which works as expected if the network goes down while everything is running (this causes OpenVPN to restart, which sometimes causes unbound to crash, then restart).
But, this isn't the pattern on initial boot. Unbound starts, but fails to attach to the outgoing networks that aren't up yet. With the service reportedly running, service watchdog can't do anything. However, at this point the service isn't running correctly. It's not bound to the outgoing networks (and doesn't attach once those networks are available).
Basically, unbound fails silently (and doesn't technically crash) in this situation. Hence all of the above observations and points I've made.
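Because unbound keeps running in this state, a check that only looks at whether the service is running (which, as described above, is all the watchdog can see) never fires. A functional check that asks the local resolver for an external name would catch the silent failure instead. Minimal sketch; assumptions: the host utility is available and exits non-zero on a failed lookup, netgate.com is an arbitrary external probe name, and two failures 30 s apart are required before restarting, so one transient upstream error does not trigger a restart.

```sh
#!/bin/sh
# Sketch: a functional DNS check for unbound, rather than "is the process running".
# ASSUMPTIONS: the "host" utility exists and exits non-zero on a failed lookup;
# netgate.com is an arbitrary external name used as a probe.

TEST_NAME="${TEST_NAME:-netgate.com}"
RESOLVER="${RESOLVER:-127.0.0.1}"
RETRY_DELAY="${RETRY_DELAY:-30}"

# dns_works: succeed only if the local resolver answers for an external name.
dns_works() {
    host "$TEST_NAME" "$RESOLVER" >/dev/null 2>&1
}

# restart_unbound: the same restart command used elsewhere in this thread.
restart_unbound() {
    /usr/local/sbin/pfSsh.php playback svc restart unbound
}

# check_and_fix CHECK FIX DELAY: run FIX only if CHECK fails twice in a row,
# DELAY seconds apart, so one transient upstream error does not cause a restart.
check_and_fix() {
    "$1" && return 0
    sleep "$3"
    "$1" && return 0
    "$2"
}

if [ "${1:-}" = "run" ]; then
    check_and_fix dns_works restart_unbound "$RETRY_DELAY"
fi
```

Scheduled every few minutes, this would also restart unbound over and over if the WAN itself is down, so it is a stopgap rather than a fix.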
-
Ahh... sorry, I missed that info.
Could this work ?
https://phoenixnap.com/kb/crontab-reboot
I mean, e.g., "Restart unbound" roughly 2 min after reboot.
I have had a similar "Boot situation" with a "Raspberry" using secure DNS.
When the Raspi boots, the clock isn't set, so it can't use secure DNS (cert errors). And it couldn't set the time from NTP, as DNS wouldn't work...
So it's a catch-22; I ended up giving it an external RTC...
-
@bingo600 I tried that, but my bash skills are admittedly limited.
In cron
/usr/home/unbound_restart.sh
#!/bin/sh
sleep 120
/usr/local/sbin/pfSsh.php playback svc restart unbound
Last time I dug into this, this solution did not work. I'm not certain if I'm triggering the unbound restart too quickly (i.e., before PHP is fully loaded or something), if the CLI doesn't fully restart the unbound service when in this odd state (but the GUI does), or if there's a better way to restart unbound from the CLI when calling from cron (I assume everything needed for PHP is in the path in this situation, but it might be a path dependency missing as well).
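One way to tell those possibilities apart is to make the cron job leave evidence behind: set PATH explicitly (cron jobs typically run with a much shorter PATH than an interactive shell), and capture the restart command's output and exit status. Sketch only; the log location and the PATH value are illustrative choices, not pfSense defaults.

```sh
#!/bin/sh
# Sketch: delayed unbound restart with an explicit PATH and full logging,
# so a failed run from cron leaves evidence behind.
# ASSUMPTIONS: the LOG location and the PATH value are illustrative, not pfSense defaults.

PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
export PATH

LOG="${LOG:-/root/unbound_restart.log}"
DELAY="${DELAY:-120}"

# log MSG...: append a timestamped line to $LOG.
log() {
    echo "$(date '+%F %T') $*" >> "$LOG"
}

# run_logged CMD [ARGS...]: run CMD with stdout and stderr appended to $LOG,
# then record its exit status and return it unchanged.
run_logged() {
    log "running: $*"
    "$@" >> "$LOG" 2>&1
    _status=$?
    log "exit status: $_status"
    return "$_status"
}

if [ "${1:-}" = "run" ]; then
    log "sleeping ${DELAY}s before restart"
    sleep "$DELAY"
    run_logged /usr/local/sbin/pfSsh.php playback svc restart unbound
fi
```

If the log shows exit status 0 but DNS is still broken, the CLI restart itself is the suspect; a "not found" error or a non-zero status points at the environment instead.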
In short, I wasn't finding anything useful in the logs, and debug logs are an epic pain to read through... so I walked away before the screaming got too bad :)
Any other ideas or approaches are very welcome. I'll probably look at this again in a few weeks when I've got time.
-
@josh-hall
Well, you could "just" kill unbound and let SW start it again.
-
@bingo600 That's a very good (and obvious) point I completely overlooked. I'll try that next time. Thanks!
-
@josh-hall said in DNS Resolver (unbound) fails after reboot unless manually restarted:
Doesn't this approach lead to DNS leaks as unbound is no longer constrained to specific network interfaces for outbound communication? Kinda defeats the point if I don't want my ISP selling my behavioral data.
Binding unbound to "All" interfaces doesn't mean it starts looking for the root servers on one of your LANs. pfSense knows that "198.41.0.4" or "a.root-servers.net" isn't reachable via LAN, as LAN only exposes a route to "192.168.1.0/24".
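A quick way to see this for yourself is to ask the routing table which interface it would pick for each destination. On FreeBSD (and so pfSense), route -n get ADDRESS prints an "interface:" line. The sketch parses that line for 198.41.0.4 (a.root-servers.net) and for 192.168.1.10, a hypothetical LAN host. It only reflects the routing table: policy routing through firewall rules with a gateway set is not visible here, so it is not a full leak test.

```sh
#!/bin/sh
# Sketch: which interface would the kernel use to reach a given address?
# On FreeBSD/pfSense, "route -n get ADDR" prints an "interface:" line.
# NOTE: policy routing (firewall rules with a gateway) is NOT reflected here.

# iface_from_route_output: read "route -n get" output on stdin, print the interface.
iface_from_route_output() {
    awk '$1 == "interface:" { print $2; exit }'
}

# route_iface ADDR: print the interface the routing table selects for ADDR.
route_iface() {
    route -n get "$1" 2>/dev/null | iface_from_route_output
}

if [ "${1:-}" = "run" ]; then
    # 198.41.0.4 is a.root-servers.net; 192.168.1.10 is a hypothetical LAN host.
    for addr in 198.41.0.4 192.168.1.10; do
        echo "$addr -> $(route_iface "$addr")"
    done
fi
```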
True, if you have a working WAN at first, and a VPN connection comes up afterwards, with pfSense using its VPN client to replace the WAN for all (or part of) the traffic, then you should take care of that situation.
In one of the Netgate "OpenVPN" videos you'll find a firewall rule that starts routing traffic over a "VPN" out as soon as that interface exists.
As an interface (VPN) is created, unbound gets restarted. The (floating?) firewall rule becomes active, and now all DNS goes over the VPN instead of the default WAN. pfSense is not using your ISP's DNS servers.
Way back in the past, our ISP routers were forwarding DNS requests to the ISP DNS, just to gain some time, and later on they invented "commercial reasons" to do so.
That's all finished now.
pfSense (unbound) uses the root name servers (https://en.wikipedia.org/wiki/Root_name_server) to resolve domain names.
-
@gertjan
Hi, it seems to work when I changed the "Network Interface" to All.
Not like you suggested :) I have now rebooted my pfSense 5 times, and every time the DNS Resolver worked like it should for me.
-
@ciscox
I wasn't suggesting anything about "Network Interfaces", as you didn't show that setting (see your image above).
"WAN" as a selected outgoing interface should work.
"All" is best, and for that reason it's the default setting.
-
@gertjan
Yeah, I know, my bad :( But I had "WAN" selected in the Network Interface, and after changing that one, I didn't have any problems with the DNS Resolver. I have no idea why, but it worked.
I tried "All" in the "outgoing network interface", but it didn't seem to work, so that's why I tried "All" in Network Interface instead :)
I'd like to thank you for pointing me in the right direction :)
-
Finally found the time to dig into this again, and have a workaround to the original problem.
The cron package doesn't actually use crontab (I assume it's a PHP-based cron-like implementation). This means the @reboot syntax wasn't working as I expected. I thought I'd tested that successfully, but either something changed between 2.4 -> 2.5, or I'm an idiot and it never worked. I'm betting on the latter in this case.
To get around this, I had to log in via console to manually install a cron job. This just calls a simple script that waits 30 seconds for everything to finalize after reboot, then restarts unbound (ensuring the devices are initialized when unbound restarts).
I created the script at /usr/home/unbound_restart.sh. You also need to make sure the script has executable permissions (via console, chmod +x /usr/home/unbound_restart.sh).
Here's the script. You can remove the poor-man's logging if you want. I was using it to diagnose some of the above issues, and figured it's worth keeping around to double-check after an upgrade (just to make sure the crontab isn't cleared or something).
#!/bin/sh
# IMPORTANT: This must be manually installed into the root crontab via terminal.
# The GUI interface appears to use a PHP based version of cron, which can't
# support @reboot. Add this line to the root crontab using `crontab -e`
#
# @reboot /usr/home/unbound_restart.sh

echo "$(date +%T) Sleeping for 30 seconds" >> /usr/home/restart.log
sleep 30
echo "$(date +%T) Restarting unbound" >> /usr/home/restart.log
/usr/local/sbin/pfSsh.php playback svc restart unbound

# This also works if you're using the service monitor. It'll just be slower
# as the monitor may not notice the service is down for a minute
#/usr/bin/killall -9 unbound
Bit of a hack, but gets the job done. Given how long this issue has persisted in pfSense, I don't expect a proper solution anytime soon.
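Since the @reboot entry lives only in root's crontab, a quick post-upgrade check can confirm it is still there. Sketch only; it assumes the entry is exactly the @reboot line from the script's header comment, and it deliberately ignores commented-out copies.

```sh
#!/bin/sh
# Sketch: after an upgrade, confirm the hand-installed @reboot entry is still present.
# ASSUMPTION: the entry matches the line given in unbound_restart.sh's header comment.

ENTRY='@reboot /usr/home/unbound_restart.sh'

# has_entry LINE: succeed if LINE appears as a whole line (ignoring leading and
# trailing whitespace) in the crontab text read from stdin.
has_entry() {
    awk -v want="$1" '
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
        $0 == want { found = 1 }
        END { exit !found }
    '
}

if [ "${1:-}" = "run" ]; then
    if crontab -l 2>/dev/null | has_entry "$ENTRY"; then
        echo "OK: @reboot entry present"
    else
        echo "MISSING: re-add with crontab -e: $ENTRY"
        exit 1
    fi
fi
```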
-
@josh-hall said in DNS Resolver (unbound) fails after reboot unless manually restarted:
The cron package doesn't actually use crontab
Look again ;)
The cron package maintains (== creates) the system file /etc/crontab.
PHP is used to create the "config file" (pfSense, the GUI, is mostly a huge FreeBSD + FreeBSD processes config file editor ;) )
Btw: what about this option:
Do not use cron, but install the package
Btw: why create something in /home/?
You log in using SSH (or the console if you have to) as admin, which has root rights. So, put everything you make yourself in root's home directory, like /root/unbound_restart.sh, and make it executable with chmod +x /root/unbound_restart.sh.
What about this solution :
Install this package. Now you have a new option in the Services menu.
Choose type "Shellcmd" and point it to your /root/unbound_restart.sh script file. At boot, this will get executed.
I'm using the Shellcmd package myself:
As you can see, the Patches package is already adding a line for itself.
This way, patches get checked when the system boots.
I create a "socket" for FreeRadius so I can read FreeRadius statistics. The socket is placed in the package folder, so it will get wiped on any FreeRadius package update.
I map the connected keyboard to the correct language; for some reason, there are only French keyboards around here.