Unbound Resolver starts returning SERVFAIL after resolving certain hostnames


  • Netgate

    Coming from here: https://forum.pfsense.org/index.php?topic=87491.msg488407#msg488407  Credit and apologies to those over there for isolating this way to reproduce.

    New thread since this looks to me like a different issue from whatever's going on with DNS server hijacking.

    I am running Unbound in Resolver mode with DNSSEC enabled.  I can routinely tickle this by asking unbound to resolve:

    
    ns3.csof.net
    and/or
    api-nyc01.exip.org
    
    Note that the exip.org hostname uses csof.net name servers.
    
    ns3.csof.net.		600	IN	A	195.22.26.199
    api-nyc01.exip.org.	 10	IN	A	195.22.26.248
    
    

    Note that both of those are in a known hostile netblock.

    Anyway, my resolver has been running fine for days.  No problems until I asked it to resolve those two hostnames.  After doing so, apparently random domains start being returned as SERVFAIL.

    $ dig forum.pfsense.org

    ; <<>> DiG 9.8.3-P1 <<>> forum.pfsense.org
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 30471
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;forum.pfsense.org. IN A

    ;; Query time: 1781 msec
    ;; SERVER: 192.168.223.1#53(192.168.223.1)
    ;; WHEN: Mon Feb  9 17:46:41 2015
    ;; MSG SIZE  rcvd: 35

    That's one example.  This keeps happening until unbound is restarted.  I did this a couple of times, the last one with unbound at log level 5.  I haven't really looked at the logs yet.
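To check a batch of lookups for this symptom without reading each response by eye, the status field can be pulled out of the dig header. This is only a minimal sketch: the function name `dig_status` is made up for illustration, and it assumes the header line looks like the one in the output above.

```python
import re

# Matches the header line dig prints, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 30471"
STATUS_RE = re.compile(r"->>HEADER<<-.*?status:\s*([A-Z]+)")


def dig_status(dig_output):
    """Return the status from dig's header line, or None if no header is found."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None


# Header copied from the forum.pfsense.org lookup above.
SERVFAIL_SAMPLE = """\
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 30471
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
"""

if __name__ == "__main__":
    print(dig_status(SERVFAIL_SAMPLE))
```

Feeding it the output of lookups made before and after resolving the two hostnames should show which names flip to SERVFAIL.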


  • Netgate

    Without DNSSEC enabled, all I had to do was query these two domain names, and then I got this:

    gridbug:etc cjl$ dig www.pfsense.org

    ; <<>> DiG 9.8.3-P1 <<>> www.pfsense.org
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51593
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;www.pfsense.org. IN A

    ;; ANSWER SECTION:
    www.pfsense.org. 10 IN A 195.22.26.248

    ;; AUTHORITY SECTION:
    org. 172779 IN NS ns1.csof.net.
    org. 172779 IN NS ns2.csof.net.
    org. 172779 IN NS ns3.csof.net.
    org. 172779 IN NS ns4.csof.net.

    ;; Query time: 159 msec
    ;; SERVER: 192.168.223.1#53(192.168.223.1)
    ;; WHEN: Mon Feb  9 18:48:39 2015
    ;; MSG SIZE  rcvd: 129

    This looks bad.
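What makes the response above stand out is the AUTHORITY section: it claims NS records for all of org., pointing at csof.net. A rough way to spot that pattern in saved dig output is sketched below. The heuristic (flag any NS record in the authority section whose owner is a top-level domain or the root) and the function name are assumptions for illustration, not something unbound itself does.

```python
def suspicious_authority(dig_output):
    """Return (owner, nameserver) pairs from the AUTHORITY section
    whose owner name is a TLD or the root zone."""
    flagged = []
    in_authority = False
    for raw in dig_output.splitlines():
        line = raw.strip()
        if line.startswith(";;"):
            # Any ";;" line starts a new section or a footer line.
            in_authority = line.startswith(";; AUTHORITY SECTION")
            continue
        if not in_authority or not line or line.startswith(";"):
            continue
        fields = line.split()
        # Expected record shape: owner TTL class type rdata
        if len(fields) < 5 or fields[3].upper() != "NS":
            continue
        owner, target = fields[0], fields[4]
        labels = [label for label in owner.rstrip(".").split(".") if label]
        if len(labels) <= 1:
            flagged.append((owner, target))
    return flagged


# Sections copied from the www.pfsense.org response above.
POISONED_SAMPLE = """\
;; ANSWER SECTION:
www.pfsense.org. 10 IN A 195.22.26.248

;; AUTHORITY SECTION:
org. 172779 IN NS ns1.csof.net.
org. 172779 IN NS ns2.csof.net.
org. 172779 IN NS ns3.csof.net.
org. 172779 IN NS ns4.csof.net.

;; Query time: 159 msec
"""

if __name__ == "__main__":
    for owner, target in suspicious_authority(POISONED_SAMPLE):
        print(owner, "->", target)
```

A delegation for a second-level zone such as example.org. would not be flagged, so this only catches the whole-TLD takeover seen here.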



  • Do you have "harden glue" enabled on the Advanced tab of Unbound? If not, is it still replicable with that enabled?



  • @cmb:

    Do you have "harden glue" enabled on the Advanced tab of Unbound? If not, is it still replicable with that enabled?

    I had experienced the same issue as Derelict, and was able to replicate it in the same way.  I did not have 'harden glue' enabled.  After doing so, I have not been able to replicate the issue!

    Should harden-glue be enabled by default?  The documentation for unbound suggests yes (https://www.unbound.net/documentation/unbound.conf.html), but it was definitely not enabled by default on my system.



  • My settings include…

    In Services: DNS Resolver: Advanced

    Harden Glue

    Harden DNSSEC data

    Unwanted Reply Threshold (10 million)

    Prefetch Support

    Prefetch DNS Key Support

    All of those are on.  I had asked about 10 times whether they are recommended, without getting an answer.  After trying them for a couple of weeks, I'd say "Yes", definitely.

    DNSSEC is also on, and it's not in forwarder mode.  Anyway, I'd recommend trying with these settings.

    Be sure to reboot everything and clear DNS Cache on all clients after.
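For anyone who wants to confirm what actually ended up in the generated configuration, here is a small sketch that compares the text of an unbound.conf against the settings listed above. The mapping from the GUI labels to unbound.conf option names is my assumption (the option names themselves are standard unbound.conf options), and the helper names are made up for illustration.

```python
# Assumed mapping from the GUI labels above to unbound.conf options.
EXPECTED = {
    "harden-glue": "yes",                    # Harden Glue
    "harden-dnssec-stripped": "yes",         # Harden DNSSEC data
    "unwanted-reply-threshold": "10000000",  # Unwanted Reply Threshold (10 million)
    "prefetch": "yes",                       # Prefetch Support
    "prefetch-key": "yes",                   # Prefetch DNS Key Support
}


def parse_options(conf_text):
    """Collect 'name: value' pairs from unbound.conf text, ignoring comments."""
    options = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        options[key.strip()] = value.strip()
    return options


def differing_options(conf_text):
    """Return {option: actual value or None} for options that differ from EXPECTED."""
    actual = parse_options(conf_text)
    return {
        name: actual.get(name)
        for name, wanted in EXPECTED.items()
        if actual.get(name) != wanted
    }


# Hypothetical config text, used only to exercise the check.
SAMPLE_CONF = """\
server:
    harden-glue: no
    harden-dnssec-stripped: yes
    prefetch: yes   # prefetch-key is missing on purpose
    unwanted-reply-threshold: 10000000
"""

if __name__ == "__main__":
    print(differing_options(SAMPLE_CONF))
```

An empty result means every option in the list is set as described; anything else shows the option and its current value, with None for options that are not present at all.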


  • Netgate

    Harden Glue appears to correct this, but that's pretty anecdotal.



  • Judging by the DNS traffic I captured when replicating that, harden glue should fix it.  I changed the default in new configs to enabled, and we'll add config upgrade code so anyone who doesn't already have it enabled will have it turned on upon upgrade to 2.2.1.



  • Not so much anecdotal.

    People are poisoning your cache either with malicious DNS records or with man-on-the-side attacks or both.

    Those settings exist to prevent such things.  Although, IMHO, the DNS protocol is a broken piece of crap and needs to be replaced with something that both encrypts and authenticates.

    I'm sure that would introduce some latency, but my god…  It's ridiculous.  Current DNS is about as secure as FTP and equally in need of being phased out.


  • Banned

    @Derelict:

    Harden Glue appears to correct this, but that's pretty anecdotal.

    Never could reproduce this local issue… I have harden-glue: yes set everywhere.  So, sounds like a pretty good guess, I'd say.

    @cmb: Can we get harden-referral-path exposed in the GUI as well? (Probably not default on, but visible.) Also, harden-below-nxdomain.


 
