[Solved] Unbound fails on restart after pfBlockerNG updates
-
That error if from unbound itself. Are you using the service watchdog program to restart the resolver? If so, that could cause issues.
Similar thread here:
https://forum.pfsense.org/index.php?topic=123554.0I don't think it's a package related issue, but something with Unbound itself.
-
I am not using to restarting resolver. It is pfBlockerNG that does the reload which seems to fail. Below are the logs of the cron job in pfBlockerNG. Note the: "Reloading Unbound…. Not completed."
The weird thing is that the reload starts (and fails) and: 01:15:33 but DNSBL Update Competed message comes a minute later at 01:16:48.If I restart the process manually by stopping/starting unbound, everything works fine.
pfBlockerNG Cron Logs
===[ DNSBL Process ]================================================ [ EasyList ] Downloading update .. 200 OK. Whitelist: 10fbb07a4b0.se|32d1d3b9c.se|82d914.se|adaction.se|adrotate.se|ccebba93.se|contexlink.se|d1110e4.se|emediate.se| ---------------------------------------------------------------------- Orig. Unique # Dups # White # Alexa Final ---------------------------------------------------------------------- 7312 6970 0 9 0 6961 ---------------------------------------------------------------------- [ EasyPrivacy ] Downloading update [ 02/18/17 01:15:27 ] .. 200 OK. Whitelist: clicksen.se|easyresearch.se|getstatistics.se|ip2location.com|optimizely.com|publish-int.se|research-int.se|vastpaketet.se|webstat.se|webtraffic.se| ---------------------------------------------------------------------- Orig. Unique # Dups # White # Alexa Final ---------------------------------------------------------------------- 2759 2738 21 10 0 2707 ---------------------------------------------------------------------- [ Malware_Corpus_Tracker ] Downloading update [ 02/18/17 01:15:28 ] .. 200 OK Remote timestamp missing . ---------------------------------------------------------------------- Orig. Unique # Dups # White # Alexa Final ---------------------------------------------------------------------- 140 120 0 0 0 120 ---------------------------------------------------------------------- [ MalwareDomainList ] exists. [ 02/18/17 01:15:32 ] ------------------------------------------ Assembling database... completed Validating database... completed Reloading Unbound.... Not completed. DNSBL update [ 10941 | PASSED ]... completed [ 02/18/17 01:16:48 ] ------------------------------------------
System logs:
Feb 18 01:17:45 php [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload Feb 18 01:15:25 php [pfBlockerNG] Starting cron process.
-
As per the link I posted above.
If you run this command from the pfSense shell, does it complete properly?Try the following command from the shell:
/usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf status
and also try this command:
/usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf reload
Are you using DHCP Registration in Unbound, or IPv6? Maybe post some screenshots of the unbound settings.
There is some issue in the pfSense unbound code that is causing these corner-case issues…
-
You are right. It is something with unbound and my setup.
If I run the "status" command it runs normallyversion: 1.5.9 verbosity: 1 threads: 4 modules: 2 [ validator iterator ] uptime: 48303 seconds options: control(ssl) unbound (pid 47565) is running...
Then I run "reload" and it returns "ok" immediately. If I run another "reload" shortly thereafter I get:
[1487451259] unbound-control[22739:0] error: connect: Operation timed out for 127.0.0.1
and in the unbound logs I see: ```
fatal error: Could not read config file: /unbound.confI will modify pfblockerng.inc file as to stop/start service as suggested by the thread you referenced.
-
Could you check a few things out for me please?
Does this folder exist in your box? ls -lah /var/tmp
and does this file exist? ls -lah /var/tmp/unbound_cacheSo basically it runs thru these steps to reload the new DNSBL changes into Unbound…
If Unbound is running:/usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache > /var/tmp/unbound_cache /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf reload /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf load_cache < /var/tmp/unbound_cache /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf status
Its really strange that it gives that error on multiple reload attempts…
Could you check to see if the /var/unbound/unbound.conf file is changed after each "Reload"?
or PM me a copy of the unbound.conf before each reload attempt…Another thought is that the dump/load cache might have some issue... But can't tell until you run these commands and see if you can find another clue...
And is you pause for a second or two, do you still get the error?
Thanks!
-
Could you check a few things out for me please?
Does this folder exist in your box? ls -lah /var/tmp
and does this file exist? ls -lah /var/tmp/unbound_cacheYes. The file is there with correct timestamp:
ls -al /var/tmp/unbound_cache -rw-r--r-- 1 root wheel 509698 Feb 19 01:15 /var/tmp/unbound_cache
So basically it runs thru these steps to reload the new DNSBL changes into Unbound…
If Unbound is running:/usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache > /var/tmp/unbound_cache /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf reload /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf load_cache < /var/tmp/unbound_cache /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf status
Running commands in sequence gives following:
[2.3.2-RELEASE][root@fortet.localdomain]/var/log: /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache > /var/tmp/unbound_cache [2.3.2-RELEASE][root@fortet.localdomain]/var/log: /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf reload ok [2.3.2-RELEASE][root@fortet.localdomain]/var/log: /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf load_cache < /var/tmp/unbound_cache error: SSL handshake failed [2.3.2-RELEASE][root@fortet.localdomain]/var/log: /usr/sbin/chroot -u unbound -g unbound / /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf status [1487493495] unbound-control[69667:0] error: connect: Operation timed out for 127.0.0.1
The unbound crashes on "reload" command, so the "load_cache" and "status" fail. During the reload I could see the: "fatal error: Could not read config file: /unbound.conf" in resolver log file.
Its really strange that it gives that error on multiple reload attempts…
Based on the latest tests, my explanation is that it crashes on the first reload and the second time when I run the reload it has a long timeout before it returns error.
Could you check to see if the /var/unbound/unbound.conf file is changed after each "Reload"?
or PM me a copy of the unbound.conf before each reload attempt…Strangely enough the timestamp on the unbound.conf is changed, so the file is touched on reload. I copied the conf file before reload and run diff which gave me no changes.
Another thought is that the dump/load cache might have some issue… But can't tell until you run these commands and see if you can find another clue...
Based on the above, there is something in my setup that causes unbound to crash on "reload" command. I modified the pfblockerng.inc file and replaced:
exec("{$chroot_cmd} reload 2>&1", $result, $retval); pfb_logger('.', 1);
with:
exec("{$chroot_cmd} stop"); sleep(1); pfb_logger('.', 1); mwexec("/usr/local/sbin/unbound -c {$g['unbound_chroot_path']}/unbound.conf 2>&1", $result, $retval); pfb_logger('.', 1);
This procedure worked and didn't cause unbound to crash.
Since this is clearly caused by unbound, may I suggest that you add a flag in pfBlockerNG with option to replace reload with stop/start for people experiencing this problem?
-
My problem is now solved after following advice from BBcan177.
It turns out that all includes in unbound.conf options need to point to the same folder: /var/unbound.I had following in my configuration:
My domain overrides
server:include: /usr/local/etc/unbound/se-servers.conf
All I needed to do is to change the /usr/local/etc/unbound to /var/unbound, and unbound stopped crashing on "reload".
My domain overrides
server:include: /var/unbound//se-servers.conf
Thank you very much BBcan177 for this great package!
-
Sorry, but i don't understand the solution.
My Logs are full of:
Mar 1 17:56:37 unbound 30737:0 info: start of service (unbound 1.6.0). Mar 1 17:56:33 unbound 30737:0 error: cannot chdir to directory: (No such file or directory) Mar 1 17:56:33 unbound 30737:0 info: service stopped (unbound 1.6.0). Mar 1 17:50:18 unbound 30737:0 info: start of service (unbound 1.6.0). Mar 1 17:50:14 unbound 30737:0 error: cannot chdir to directory: (No such file or directory) Mar 1 17:50:14 unbound 30737:0 info: service stopped (unbound 1.6.0). Mar 1 17:46:30 unbound 30737:0 info: start of service (unbound 1.6.0). Mar 1 17:46:25 unbound 30737:0 error: cannot chdir to directory: (No such file or directory) Mar 1 17:46:25 unbound 30737:0 info: service stopped (unbound 1.6.0).
and very often in the morning i cannot resolve any dns, because the unbound did not startup.
-
@renegade: In my particular case I had my own domain overrides imported under DNS Resolver/Custom options. The file imported was located in /usr/local/etc/unbound/ folder. The unbound was crashing each time I would issue reload command and solution was to move my files to /var/unbound folder.
Please bear in mind that the error message I was getting was different than yours:
fatal error: Could not read config file: /unbound.conf
You can try running commands listed by BBcan177 and in his post: https://forum.pfsense.org/index.php?topic=125773.msg694881#msg694881 and see if it provides more clues. After running commands check unbound log and dmesg.
You can also force restart of unblocker which worked for me (before my problem was solved). For details about that read the end my post where I describe what needs to be modified in pfblockerng.inc file: https://forum.pfsense.org/index.php?topic=125773.msg694915#msg694915
Good luck!
-
Unbound is chrooted under /var/unbound in pfSense and that means all configuration files and files referenced by the configuration files must be placed under /var/unbound.
-
tried the commands but there was no error.
this morning i had to start unbound again manually. after the 24h reconnect. -
Should be fixed at some point https://redmine.pfsense.org/issues/7326
-
I removed many feeds from DNSBL and it seems to work stable… until 2.3.3p1 came. the last two mornings same effect. I could not resolve any ip adress until i restarted bind manually by logging in and restarting it.
-
I have posted a patch in redmine for this issue:
https://redmine.pfsense.org/issues/7326