unbound services crashes with certificate/file not found error



  • Hi,

    I have been running pfSense for about a year and am mostly very happy with it. However, there is one stability issue in the unbound DNS resolver that I have been unable to fix so far, and it has resulted in network outages (i.e. the internal DNS server went down a couple of times).

    The build I am using is 2.4.4 Community, but the issue was also present in the two previous releases. I am not sure how to reproduce it.

    What happens:
    Once in a while the unbound service suddenly crashes, and there is an error about a missing certificate file at /var/unbound/test/unbound_server.pem.
    The file is actually present in the directory above (/var/unbound/). Even if I copy the certificate file manually into the test directory, it gets deleted as soon as I try to restart the service.
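
    To capture where the certificate file actually is at the moment of a failure, a small check like the following could be run from the shell. It is only a sketch: `locate_file` is a hypothetical helper, not part of pfSense or unbound, and the only paths assumed are the two directories named in the post.

```shell
# Hypothetical diagnostic helper (not shipped with pfSense or unbound).
# For each directory given, report whether the named file exists there.
# Usage: locate_file FILENAME DIR [DIR...]
locate_file() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            echo "present: $dir/$name"
        else
            echo "missing: $dir/$name"
        fi
    done
}
```

    Calling `locate_file unbound_server.pem /var/unbound /var/unbound/test` right after a crash, and again after a restart attempt, would show whether the copy in the test directory is really being removed.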

    Doing a full reboot of the firewall usually seems to fix the issue. However, this is obviously not a great way of doing things, as DNS is a rather important service in the local network. Also, Service Watchdog does not help here, because something breaks in the directory structure that can apparently only be fixed by a full OS reboot.

    Can you maybe share any hints on this issue or how to work around it?

    0_1549890189826_pfsense_issue.png

    Thanks for the help!


  • LAYER 8 Global Moderator

    I take it that entry is in the

    # SSL Configuration
    

    Section... I just checked my conf, and I have no such entry anywhere in it. And from what you posted, where respond to incoming ssl/tls is unchecked, I would think you shouldn't have such an entry, which is why your conf check is failing.

    I looked by hand, and then did a grep for it even...

    [2.4.4-RELEASE][root@sg4860.local.lan]/var/unbound: cat unbound.conf | grep server-cert-file
    [2.4.4-RELEASE][root@sg4860.local.lan]/var/unbound: cat unbound.conf | grep auto-trust-anchor-file
    auto-trust-anchor-file: /var/unbound/root.key
    [2.4.4-RELEASE][root@sg4860.local.lan]/var/unbound: 
    

    The second grep for the auto-trust was just to validate that I could find such an entry with grep that I knew was in there, etc.
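
    The same check can be wrapped up so that several directives are tested in one go. This is a rough sketch only: `check_directives` is a hypothetical helper name, and the directive names in the usage note are simply the ones that appear in this thread.

```shell
# Hypothetical helper: for each directive name, print the matching lines
# of the config file (with line numbers), or a note if it is not set.
# Usage: check_directives CONF_FILE DIRECTIVE [DIRECTIVE...]
check_directives() {
    conf=$1
    shift
    for key in "$@"; do
        # Match the directive at the start of a line, allowing indentation.
        if ! grep -n -E "^[[:space:]]*${key}:" "$conf"; then
            echo "not set: $key"
        fi
    done
}
```

    For example, `check_directives /var/unbound/unbound.conf server-cert-file ssl-service-pem ssl-service-key auto-trust-anchor-file` would print a "not set" line for each directive that is absent and the actual config line for each one that is present.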

    Did you try doing that at some point? Are you in some sort of CARP setup where that might be getting synced? I would suggest you download the unbound section of your config XML and make sure that entry is not in there, even though the GUI says it shouldn't be.

    edit: If I enable ssl listen, I then get this entry

    0_1549892959979_ssllisten.png

    # SSL Configuration
    ssl-port: 853
    ssl-service-pem: "/var/unbound/sslcert.crt"
    ssl-service-key: "/var/unbound/sslcert.key"
    

    So yeah something is off with your unbound conf for sure.. A search for that error comes up with
    https://forum.netgate.com/topic/125789/solved-dns-resolver-unbound-unable-to-start

    Maybe something went wrong with your update(s) to 2.4.4? You really should be on 2.4.4p2 - I would suggest you update to current to see if that corrects the problem, etc. You mention previous releases.. You prob want to do a clean install of 2.4.4p1 and then update to p2



  • Hi johnpoz,

    thanks for the detailed reply!

    I ran the commands you showed and also had a look at the unbound.conf: it looks identical to what you posted.

    There is no sync/replication/failover in place; it's just a single device. Also, I was a bit unclear in my original post: this system is actually running 2.4.4-RELEASE-p2.

    The only non-standard things I have in place are DNS over TLS forwarding to some external servers and pfBlockerNG.

    One thing I could try is to disable the internal DNS over TLS as I am not really using it ...

    Actually, I had the same issue again today: the unbound service went down without any configuration change. I tried to restart the service, but only a full restart of the system resolved the issue.

    0_1549979460624_error_log.png

    1_1549979460624_error_log_2.png

    Because of the timed nature of the issue, I suspect it's a problem related to pfBlockerNG list refreshes...


  • LAYER 8 Global Moderator

    @hn2323 said in unbound services crashes with certificate/file not found error:

    pfblockerNG list refreshes...

    Yeah that would be my guess as well.. While pfblocker can do some really kewl stuff.. It can also throw a wrench into the workings of unbound to be honest.. I would suggest you work with @BBcan177 to help determine if pfblocker could indeed be the root of your issues.

    It has a vast array of configuration options and features that could cause issues in specific setups.. I sent a shout out to BBcan177, he is normally very responsive... But a few days ago he mentioned in another thread that he was on vacation ;)

    I only ever fire it up to test something specific that comes up in a thread I am interested in. I don't normally have it running, so I am not going to be much help in validating whether it is or could be the problem, etc.