DNS Resolver (unbound) fails after reboot unless manually restarted
-
@ajbrown said in DNS Resolver (unbound) fails after reboot unless manually restarted:
Bumping this, as I'm having the same issue. Manual restart is needed for DNS resolver after any reboot.
As you are using a VPN as a WAN, it might be possible that the VPN isn't yet "UP" when unbound is started. So it fails to start.
@CiscoX Try this: use the default for "Outgoing Network Interfaces".
-
Thanks, I'm going to try it out and report back after next reboot :)
-
@gertjan Doesn't this approach lead to DNS leaks as unbound is no longer constrained to specific network interfaces for outbound communication? Kinda defeats the point if I don't want my ISP selling my behavioral data.
Wouldn't a better approach be for unbound to simply retry the connection after a delay until it finds that the outgoing network interfaces are UP? This way it can bind to the interface rather than failing silently. It doesn't even need to be that aggressive. Retrying every 15s over a period of 5m would nearly always guarantee success. If it has still failed by then, unbound should simply shut down because the network interfaces never came UP.
In fact, it's the silent failure that bothers me the most. Why would unbound appear to be started and running if the outgoing network interface wasn't bound correctly? I know it's called unbound, but that's supposed to be tongue-in-cheek, not a design decision.
I'm most likely missing something in the underlying decision process for unbound, but this doesn't make sense to me. Maybe somebody here can explain what I'm missing.
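To make the retry idea concrete, here is a minimal sketch of it as a wrapper script rather than anything unbound does itself. The function names and the interface name `ovpnc1` are my own assumptions for illustration; the restart command is the one used elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: retry a check every INTERVAL seconds, at most ATTEMPTS times.
# Usage: retry_until ATTEMPTS INTERVAL command [args...]
retry_until() {
    attempts=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the check succeeds.
        if "$@"; then
            return 0
        fi
        # No point sleeping after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
    return 1
}

# Hypothetical check: succeeds once the named interface has an IPv4 address.
iface_has_ipv4() {
    ifconfig "$1" 2>/dev/null | grep -q 'inet '
}

# Example (commented out; ovpnc1 is an assumed interface name).
# 20 attempts, 15 s apart, is roughly the 5 minutes suggested above:
# if retry_until 20 15 iface_has_ipv4 ovpnc1; then
#     /usr/local/sbin/pfSsh.php playback svc restart unbound
# else
#     echo "interface never came up, giving up" >&2
# fi
```

The check is passed in as a command, so the same loop can wait on something other than an interface address.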
-
If it fails to start, I'd just put unbound in the Service Watchdog.
That way SW would keep trying to start it. I have unbound and all my VPN servers in SW.
I have seen (not often) unbound crash, and then I'm saved by SW restarting it without intervention.
/Bingo
-
@bingo600 It doesn't fail to start. It fails to start correctly on the initial boot. I've got unbound on the service watchdog, which works as expected if the network goes down while everything is running (this causes the openVPN to restart, which sometimes causes unbound to crash, then restart).
But, this isn't the pattern on initial boot. Unbound starts, but fails to attach to the outgoing networks that aren't up yet. With the service reportedly running, service watchdog can't do anything. However, at this point the service isn't running correctly. It's not bound to the outgoing networks (and doesn't attach once those networks are available).
Basically, unbound fails silently (and doesn't technically crash) in this situation. Hence all of the above observations and points I've made.
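Since "running" and "resolving" are different things here, a check has to ask the resolver an actual question. A rough sketch, assuming `drill` is available on the box and that its output header contains `rcode: NOERROR` on success; the function names and the lookup name are just examples of mine:

```shell
#!/bin/sh
# Sketch only: decide whether resolver output (on stdin) reports success.
# Assumes drill-style output with a header line containing "rcode: NOERROR".
answer_ok() {
    grep -q 'rcode: NOERROR'
}

# Ask the local resolver for a name and judge the reply.
# $1: name to look up (example default), $2: resolver address.
resolver_ok() {
    drill "${1:-pfsense.org}" "@${2:-127.0.0.1}" 2>/dev/null | answer_ok
}

# Example (commented out):
# resolver_ok || echo "unbound is up but not resolving" >&2
```

A timeout or SERVFAIL produces no NOERROR header, so both count as failure, which is the case the process list alone can't detect.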
-
Ahh .. sorry i missed that info.
Could this work ?
https://phoenixnap.com/kb/crontab-reboot
I mean, e.g., about 2 min after reboot, "Restart unbound".
I have had a similar "boot situation" with a Raspberry Pi using secure DNS.
When the Raspi boots the clock isn't set, and it can't use Secure DNS: cert errors. It couldn't set the time from NTP as DNS wouldn't work .....
So, a catch-22. I ended up giving it an external RTC ...
-
@bingo600 I tried that, but my bash skills are admittedly limited.
In cron
/usr/home/unbound_restart.sh
#!/bin/sh
sleep 120
/usr/local/sbin/pfSsh.php playback svc restart unbound
Last time I dug into this, this solution did not work. I'm not certain if I'm triggering the unbound restart too quickly (i.e., before PHP is fully loaded or something), if the CLI doesn't fully restart the unbound service when in this odd state (but the GUI does), or if there's a better way to restart unbound from the CLI when calling from cron (I assume everything needed for PHP is in the path in this situation, but it might be a path dependency missing as well).
In short, I wasn't finding anything useful in the logs, and debug logs are an epic pain to read through... so I walked away before the screaming got too bad :)
Any other ideas or approaches are very welcome. I'll probably look at this again in a few weeks when I've got time.
-
@josh-hall
Well, you could "just" kill unbound, and let SW start it again.
-
@bingo600 That's a very good (and obvious) point I completely overlooked. I'll try that next time. Thanks!
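For anyone else landing here, a sketch of how I read that suggestion: run some check, and only if it fails, kill unbound so the Service Watchdog brings it back. The function name and the `KILL_CMD` override are my own inventions for illustration. It uses a plain `killall` (SIGTERM) rather than `-9`, which I'm assuming is enough.

```shell
#!/bin/sh
# Sketch only. KILL_CMD can be overridden (e.g. KILL_CMD=echo) for a dry run.
KILL_CMD=${KILL_CMD:-/usr/bin/killall}

# Usage: kill_unbound_unless command [args...]
# Runs the given check; if it fails, kills unbound and leaves the restart
# to the Service Watchdog.
kill_unbound_unless() {
    if "$@"; then
        return 0
    fi
    echo "$(date +%T) check failed, killing unbound" >&2
    $KILL_CMD unbound
}

# Example (commented out; the check here is a plain DNS lookup via the
# local resolver, which is an assumption about what "working" means):
# kill_unbound_unless host pfsense.org 127.0.0.1
```

As noted for the watchdog in general, it may take a minute before it notices the service is gone.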
-
@josh-hall said in DNS Resolver (unbound) fails after reboot unless manually restarted:
Doesn't this approach lead to DNS leaks as unbound is no longer constrained to specific network interfaces for outbound communication? Kinda defeats the point if I don't want my ISP selling my behavioral data.
Binding unbound to "All" interfaces doesn't mean it starts looking for a "main root server" on one of your LANs. pfSense knows that "198.41.0.4" or "a.root-servers.net" isn't reachable on LAN, as LAN only exposes a route to "192.168.1.0/24".
True, if you have a working WAN at first - and afterwards a VPN connection comes up - as pfSense is using its VPN client to replace the WAN for all (or a part of) the traffic, then you should take care of that situation.
In one of the Netgate "OpenVPN" videos you'll find a firewall rule that starts routing traffic over a "VPN" out as soon as that interface exists.
As an interface (VPN) is created, unbound gets restarted. The (floating ?) firewall rule becomes active, and now all DNS goes over VPN instead of the default WAN.
pfSense is not using your ISP DNS servers.
Way back, in the past, our ISP routers were forwarding DNS requests to the ISP DNS. Just to gain some time, and later on they invented "commercial reasons" to do so.
That's all finished now.
pfSense (unbound) uses these https://en.wikipedia.org/wiki/Root_name_server to resolve domain names.
-
@gertjan
Hi, it seems to work when I changed the "Network Interface" to ALL
Not like you suggested :)
I have now rebooted my pfs 5 times and every time the DNS Resolver worked like it should for me.
-
@ciscox
I wasn't suggesting anything about "Network interfaces" as you didn't show that setting (see your image above).
"WAN" as a selected outgoing interface should work.
"All" is best, and for that reason the default setting.
-
@gertjan
Yeah, I know, my bad :( But I had "WAN" selected in the Network Interface; after changing that one, I didn't have any problems with DNS Resolver. I have no idea why. But it worked.
I tried "All" in the "outgoing network interface" but it didn't seem to work, so that's why I tried "All" in Network Interface instead :)
I'd like to thank you for pointing me in the right direction :)
-
Finally found the time to dig into this again, and have a workaround to the original problem.
The cron package doesn't actually use crontab (I assume it's a PHP-based cron-like implementation). This means, the @reboot syntax wasn't working as I expected. I thought I'd tested that successfully, but either something changed between 2.4 -> 2.5, or I'm an idiot and it never worked. I'm betting on the latter in this case.
To get around this, I had to log in via console to manually install a cron job. This just calls a simple script that waits 30 seconds for everything to finalize after reboot, then restarts unbound (ensuring the devices are initialized when unbound restarts).
I created the script at /usr/home/unbound_restart.sh. You also need to make sure the script has executable permissions (via console, chmod +x /usr/home/unbound_restart.sh).
Here's the script. You can remove the poor-man's logging if you want. I was using it to diagnose some of the above issues, and figured it's worth keeping around to double check after an upgrade (just to make sure the crontab isn't cleared or something).
#!/bin/sh
# IMPORTANT: This must be manually installed into the root crontab via terminal.
# The GUI interface appears to use a PHP based version of cron, which can't
# support @reboot. Add this line to the root crontab using `crontab -e`
#
# @reboot /usr/home/unbound_restart.sh

echo "$(date +%T) Sleeping for 30 seconds" >> /usr/home/restart.log
sleep 30
echo "$(date +%T) Restarting unbound" >> /usr/home/restart.log
/usr/local/sbin/pfSsh.php playback svc restart unbound

# This also works if you're using the service monitor. It'll just be slower
# as the monitor may not notice the service is down for a minute
#/usr/bin/killall -9 unbound
Bit of a hack, but gets the job done. Given how long this issue has persisted in pfSense, I don't expect a proper solution anytime soon.
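If you'd rather not edit the root crontab by hand, this sketch adds the `@reboot` line only when it's missing, so running it twice doesn't duplicate the entry. The function name is made up, and I haven't confirmed whether pfSense keeps the root crontab across upgrades, so treat it as an assumption to verify.

```shell
#!/bin/sh
# Sketch only: copy a crontab from stdin to stdout, appending LINE
# only if an identical line is not already present.
add_cron_line() {
    awk -v want="$1" '
        { print }
        $0 == want { found = 1 }
        END { if (!found) print want }
    '
}

# Example (commented out): read the current root crontab, add the entry
# if needed, and install the result ("crontab -" reads from stdin).
# crontab -l 2>/dev/null \
#     | add_cron_line '@reboot /usr/home/unbound_restart.sh' \
#     | crontab -
```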
-
@josh-hall said in DNS Resolver (unbound) fails after reboot unless manually restarted:
The cron package doesn't actually use crontab
Look again ;)
The cron package maintains (== creates) the system file /etc/crontab.
PHP is used to create the "config file" (pfSense, the GUI, is mostly a huge FreeBSD + FreeBSD processes config file editor ;) )
Btw : what about this option :
Do not use cron, but install the package
Btw : why creating something in /home/ ?
You login using SSH ( or console if you have to ) using admin, which has root rights. So, put everything you make yourself over there.
Like /root/unbound_restart.sh
and
chmod +x /root/unbound_restart.sh
What about this solution :
Install this package.
Now you have a new option in the Services menu.
Choose type "Shellcmd" and point it to your /root/unbound_restart.sh script file.
At boot, this will get executed.
I'm using the Shellcmd package myself :
As you can see, the Patches package is already adding a line for itself.
This way, patches get checked when the system boots.
I create a "socket" for FreeRadius so I can read FreeRadius statistics. The socket is placed in the package folder, so it will get wiped on any FreeRadius package update.
I map the connected keyboard to the correct language - for some reasons there are only French keyboards here around me.
-
Added my plea in https://redmine.pfsense.org/issues/13707.
-
Hi,
I now have an SG-2100 with 23.05.1 for the same setup and still the same problem.
Unbound fails to start as I have OpenVPNs as Outgoing Network Interfaces.
Still trying to get attention at https://redmine.pfsense.org/issues/13707.
-
Now testing the SG-2100 with 23.05.1 for the similar setup but with multiple Wireguards instead of multiple OpenVPNs.
Unbound starts correctly.
I am guessing that Wireguard is faster than OpenVPN starting at boot.
Thanks again.