Unbound stopped and won't start



  • Well, Unbound quit today. pfSense is fine. Nothing changed on the network.
    It has been unchanged for months, and today the service just crashed; no matter what I do, it won't start and stay started.

    I switched back to the DNS Forwarder.



  • Split this to its own topic since it had nothing to do with the thread it was posted in.

    What comes up in your resolver log when it tries to start?



  • OK, I reverted the setup. Here is what it says:

    Oct 16 08:24:01 unbound: [85316:0] fatal error: failed to setup modules
    Oct 16 08:24:01 unbound: [85316:0] error: module init for module validator failed
    Oct 16 08:24:01 unbound: [85316:0] error: validator: could not apply configuration settings.
    Oct 16 08:24:01 unbound: [85316:0] error: validator: error in trustanchors config
    Oct 16 08:24:01 unbound: [85316:0] error: error reading auto-trust-anchor-file: /var/unbound/root.key
    Oct 16 08:24:01 unbound: [85316:0] error: failed to read /root.key
    Oct 16 08:24:01 unbound: [85316:0] error: failed to load trust anchor from /root.key at line 1, skipping
    Oct 16 08:24:01
    Oct 16 08:24:01 unbound: [85316:0] notice: init module 0: validator
    Oct 16 08:23:54 unbound: [58658:0] fatal error: failed to setup modules
    Oct 16 08:23:54 unbound: [58658:0] error: module init for module validator failed
    Oct 16 08:23:54 unbound: [58658:0] error: validator: could not apply configuration settings.
    Oct 16 08:23:54 unbound: [58658:0] error: validator: error in trustanchors config
    Oct 16 08:23:54 unbound: [58658:0] error: error reading auto-trust-anchor-file: /var/unbound/root.key
    Oct 16 08:23:54 unbound: [58658:0] error: failed to read /root.key
    Oct 16 08:23:54 unbound: [58658:0] error: failed to load trust anchor from /root.key at line 1, skipping
    Oct 16 08:23:54
    Oct 16 08:23:54 unbound: [58658:0] notice: init module 0: validator



  • Errrrr… found the answer here:

    https://forum.pfsense.org/index.php?topic=87357.0

    However, the idea that anything was corrupted by an upgrade seems unlikely since I didn't do any upgrades recently.

    It simply broke without having been touched. No idea why.

    I was able to fix it but still would feel better if I knew why it broke to begin with.



  • Did you make note of the contents of root.key before deleting it?



  • I'm sorry - I didn't.

    The most likely reason I can guess is that a write to the file was interrupted by a power flicker/outage.

    The UPS currently needs a battery swap.



  • I was hoping to get a lead on the root cause there. It seems this has happened to roughly half a dozen people, but none have reported what the contents of root.key were before deleting it.

    If you happen to see it again (seems unlikely), or if anyone else who comes across this thread in the future sees it, please note the contents of the file. Under Diag > Command, run:

    cat /var/unbound/root.key
    

    Or download /var/unbound/root.key from the same page or via scp. The contents should be text, so cat should suffice.


  • @cmb:

    but none have reported what the contents of root.key were before deleting it.

    You mean like this one? https://forum.pfsense.org/index.php?topic=87357.msg479617#msg479617 - there's some inetd nonsense in there.



  • CMB - I have a few of these running here and there, so if it happens again I will take a look at what's inside the file before I send it to bit heaven.

    doktornotor - Yep, those are exactly the errors it was throwing.



  • Thanks, I looked through all those threads and missed the contents in that one. It looks like the file is ending up with the contents of other files in /var/, which would indicate it wasn't fsynced by Unbound after being written out. unbound-anchor also fails to create/update root.key if it has invalid contents.

    Should be fixed.
    https://redmine.pfsense.org/issues/5334

    Also reporting this upstream to Unbound, as it should be doing that fsync and doesn't appear to be.
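    cmb's diagnosis is that root.key was written without an fsync, so a power loss (like the flicker suspected earlier) could leave the file showing data that belonged to other files. The usual POSIX remedy for this class of bug is to write to a temporary file in the same directory, fsync it, rename it over the target, then fsync the directory. The Python sketch below illustrates only that general pattern; `write_file_durably` is a hypothetical helper, and the actual pfSense and Unbound fixes may be implemented differently.

```python
import os
import tempfile


def write_file_durably(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either the old or the new file.

    Hypothetical helper illustrating the generic temp-file + fsync + rename
    pattern; not the actual pfSense or Unbound patch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename stays on one filesystem.
    # Note: mkstemp creates the file with mode 0600.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's buffer into the kernel
            os.fsync(f.fileno())  # force the file data to stable storage
        os.replace(tmp, path)     # atomic rename over the old file
    except BaseException:
        os.unlink(tmp)            # don't leave a stray temp file behind
        raise
    # fsync the directory so the rename itself survives a power loss.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

    Because the rename is atomic within one filesystem, a reader such as Unbound sees either the complete old file or the complete new one, never a half-written mix.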



  • That is cool.  I like fixed things (-:



  • Unbound fixed the missing fsync for a future release.
    https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=712

    The fsync I added should fix it in the meantime.



  • Even better…

    So I guess my bad batteries were less of a curse than I thought.

    Now I need to replace them...  From 8k miles away...


  • @cmb:

    Should be fixed.
    https://redmine.pfsense.org/issues/5334

    I cannot reproduce the original issue (ZFS on the test rigs doesn't seem to suffer from any similar "features"), but an intentionally corrupted anchors file now gets recovered just fine…
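    One way such a recovery check could work, sketched in Python under stated assumptions: treat the anchor file as valid only if every non-comment line is a root-zone (".") DS or DNSKEY record, and otherwise move it aside so a fresh one can be created. The heuristic, both function names, and the `.corrupt` suffix are hypothetical, not pfSense's actual fix. Moving the file rather than deleting it also preserves the contents cmb asked people to capture.

```python
import os


def looks_like_root_trust_anchor(text: str) -> bool:
    """Heuristic check for an Unbound auto-trust-anchor file (an assumption).

    Accepts blank lines and ';' comment/state lines; every other line must be
    a record owned by the root zone '.' whose type is DS or DNSKEY.
    """
    records = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue
        fields = stripped.split()
        # Owner must be "."; the type appears within the next few fields
        # (an optional TTL and class come first).
        if fields[0] != "." or not {"DS", "DNSKEY"} & set(fields[1:5]):
            return False  # e.g. stray inetd.conf lines from elsewhere in /var
        records += 1
    return records > 0


def quarantine_if_corrupt(path: str) -> bool:
    """Rename an invalid anchor file to <path>.corrupt; True if it was moved."""
    try:
        with open(path, encoding="utf-8") as f:
            valid = looks_like_root_trust_anchor(f.read())
    except UnicodeDecodeError:
        valid = False  # binary junk can't be a text anchor file
    if valid:
        return False
    # Keep the bad contents for diagnosis instead of deleting them.
    os.replace(path, path + ".corrupt")
    return True
```

    On pfSense the path would be /var/unbound/root.key, as in the log above; once the bad file is out of the way, restarting the service should let the anchor be recreated, as deleting it did earlier in this thread.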

