Netgate Discussion Forum

    Unbound stops resolving when Domain Overrides DNS not answering

    DHCP and DNS
    23 Posts 7 Posters 4.2k Views
    • johnpozJ
      johnpoz LAYER 8 Global Moderator
      last edited by johnpoz

      Please come back when you have some more info, and make sure you check the infra_cache when something is not working for this forwarded domain. Also check what is currently cached for that domain, etc.

      You can almost always tell when something is returned from cache, because you will normally see a less-than-round number for the TTL on the returned info.

      If you dig for it and you get back, say, 3600, good bet it was freshly resolved. If you get back a TTL of 1481 or something, yeah, that more than likely was served from cache ;)

      With your domain override you're forwarding to the authoritative NS for that domain, so it will return the full TTL rather than a counted-down value from a cache, unlike when you forward to some public resolver like Google DNS or Quad9.
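
The TTL rule of thumb above can be sketched as a small check. This is only a heuristic, assuming the authoritative TTLs are multiples of 60 seconds; the function names and the sample answer lines (including the 192.0.2.10 address) are made up for illustration, and a cached record can land on a round value by coincidence.

```python
def ttl_from_answer(answer_line):
    """Extract the TTL from a dig answer line: name, TTL, class, type, rdata."""
    return int(answer_line.split()[1])


def looks_cached(ttl, granularity=60):
    """Heuristic only: a TTL that is not a multiple of `granularity` seconds
    has probably been counting down in a cache."""
    return ttl % granularity != 0


# Hypothetical answer lines; only the TTL values 3600 and 1481 come from the post.
full_ttl = "host01.remotesite.local. 3600 IN A 192.0.2.10"
counted_down = "host01.remotesite.local. 1481 IN A 192.0.2.10"

print(looks_cached(ttl_from_answer(full_ttl)))      # False
print(looks_cached(ttl_from_answer(counted_down)))  # True
```

Running the same dig twice a few seconds apart and seeing the TTL drop is a more direct confirmation than the round-number guess.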

      An intelligent man is sometimes forced to be drunk to spend time with his fools
      If you get confused: Listen to the Music Play
      Please don't Chat/PM me for help, unless mod related
      SG-4860 24.11 | Lab VMs 2.7.2, 24.11

      • iorxI
        iorx
        last edited by

        Will do. And thanks for the troubleshooting tips. Valued as I'm not that experienced on the subject.

        Just a thought. Is an interface down/up event different for Unbound/pfSense when bringing the tunnel down/up manually versus when the connection is lost (which causes an OpenVPN reconnect)?
        Trying to figure out why my test didn't show the result I was expecting.

        • johnpozJ
          johnpoz LAYER 8 Global Moderator
          last edited by

          Yeah, an interface going down is going to be different than just a loss of connection. Any way you can pull the plug on the wire or anything? Or simulate it from the other end by killing the OpenVPN server or something?

          I would change your outbound interface on unbound to the loopback. This should get around any sort of binding issues with interfaces like a VPN one, etc.

          • J
            John41
            last edited by

            I am running 2.4.4 and have what appears to be a similar problem. This is over an IPsec tunnel. It has been this way for many versions of pfSense. When the DNS server used for forwarding goes down (probably beyond the timeout mentioned above), forwarding stops. I haven't worked through the debugging steps in this thread. However, if I add Localhost to Outgoing Network Interfaces in "DNS Resolver General Settings", the forwarded name resolution does not happen at all.

            Still investigating...

            • DerelictD
              Derelict LAYER 8 Netgate
              last edited by

              That is because sourcing traffic from the firewall can be problematic over VPNs. It can be done but you might have to make some changes. For instance, selecting an outgoing interface that makes the sourced traffic interesting to IPsec (so that it matches the traffic selector(s)) would probably fix your problem. This hack might also work:

              https://docs.netgate.com/pfsense/en/latest/vpn/ipsec/accessing-firewall-services-over-ipsec-vpns.html

              If vital infrastructure is necessary for that site to function it might be prudent to add redundancy and move it off the firewall. You could, for instance, run an authoritative slave DNS server (can you still say slave DNS server?) at that site that local users query. That way they could get work done even if the VPN was down for some reason.

              Chattanooga, Tennessee, USA
              A comprehensive network diagram is worth 10,000 words and 15 conference calls.
              DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
              Do Not Chat For Help! NO_WAN_EGRESS(TM)

              • J
                John41
                last edited by

                I will take a look at those options.

                As you propose I have been thinking of running a DNS server so I can be secondary for the zone I am currently forwarding to. This is not a critical application so in my case might not be worth the overhead.

                Thanks,

                John

                • iorxI
                  iorx
                  last edited by

                  Hi again!

                  Now I'm experiencing this with 2.4.5p1. Newly installed.
                  IPsec to main office.
                  The fix with LAN gateway and route.
                  Domain override in unbound.

                  If the connection is lost for a brief moment, long enough to make unbound time out, it stops resolving for the overridden domain.
                  I believe we came to the conclusion that unbound marks this as unreachable or something and just doesn't bother to ask again.

                  Any new ideas on how to make pfSense/unbound not give up so easily? Or is it possible for a script to detect that unbound has "tombstoned" the entries?

                  Would switching back to the DNS Forwarder be a solution, maybe?

                  • iorxI
                    iorx
                    last edited by

                    No response? This is an issue, how to go about getting some attention for it?

                    • bmeeksB
                      bmeeks @iorx
                      last edited by

                      @iorx said in Unbound stops resolving when Domain Overrides DNS not answering:

                      No response? This is an issue, how to go about getting some attention for it?

                      You can register and submit bug reports on the Redmine site here: https://redmine.pfsense.org/projects/pfsense.

                      Be prepared to fully describe in the report the actual bug and the steps required to reliably recreate the bug.

                      • johnpozJ
                        johnpoz LAYER 8 Global Moderator
                        last edited by

                        To your other question, you can ask unbound who it would ask for something:

                        unbound-control -c /var/unbound/unbound.conf lookup www.example.com
                        

                        It should list your domain override NS, and then info about that NS.

                        You could use the flush_negative command the same way to flush all negative data.
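
A minimal sketch of how a script could tie a probe to that flush, assuming the config path from the command above. The `probe` callable is hypothetical (anything that returns True when the overridden name resolves), and the thread does not confirm that flushing negative data alone restores resolution, so treat this as a starting point for testing rather than a fix.

```python
import subprocess

UNBOUND_CONF = "/var/unbound/unbound.conf"  # path taken from the command in the post


def control_argv(*command):
    """Build an unbound-control command line using the pfSense config path."""
    return ["unbound-control", "-c", UNBOUND_CONF, *command]


def check_and_flush(probe, run=subprocess.run):
    """Call probe(); if it reports failure, ask unbound to flush negative data.

    Returns True when a flush was issued, False when the probe succeeded.
    `run` is injectable so the logic can be exercised without unbound installed.
    """
    if probe():
        return False
    run(control_argv("flush_negative"), check=False)
    return True


# Example: show the commands that would be run, without executing anything.
print(" ".join(control_argv("lookup", "www.example.com")))
print(" ".join(control_argv("flush_negative")))
```

Such a script would typically run from cron on the firewall, with `probe` doing a lookup of one known host in the overridden domain against the resolver.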

                        • iorxI
                          iorx
                          last edited by

                          @johnpoz said in Unbound stops resolving when Domain Overrides DNS not answering:

                          unbound-control -c /var/unbound/unbound.conf lookup

                          Nice. I will see if I can find a way to trigger a flush when resolution stops for the overrides (to work around the problem until there is a better solution).

                          For the moment I'm testing the DNS Forwarder instead, but have experienced some weirdness there too. But the Forwarder is "dumb" isn't it? No caching? So maybe the last time it stopped working was related to IPsec; I need to check that further.

                          But I know unbound has this issue. I'll try to create a bug report with reproducible steps to trigger the problem.

                          • johnpozJ
                            johnpoz LAYER 8 Global Moderator
                            last edited by johnpoz

                            @iorx said in Unbound stops resolving when Domain Overrides DNS not answering:

                            But the Forwarder is "dumb" isn't it? No caching?

                            Not sure where you would have gotten that idea, it caches. It would really be pretty pointless if it didn't.

                            Here I enabled dnsmasq on port 5353 (so I didn't have to turn off unbound), then asked it how big its cache is

                            $ dig @192.168.9.253 -p 5353 +short chaos txt cachesize.bind
                            "10000"
                            

                            A simple way to see if something is cached or not is to look at how fast it resolves. If you get an answer in 0 or a couple of ms, vs. how long it would take to forward to where you're forwarding and back, your answer was returned from cache.

                            You can also ask, like the command above, how many hits your cache has had.

                            $ dig @192.168.9.253 -p 5353 +short chaos txt hits.bind
                            "2"
                            

                            Do a query for something a few times, and then check it again - see the number go up..

                            $ dig @192.168.9.253 -p 5353 +short chaos txt hits.bind
                            "7"
                            

                            You can ask it how many misses its had

                            $ dig @192.168.9.253 -p 5353 +short chaos txt misses.bind
                            "1"
                            

                            Keep in mind I just enabled it 30 seconds ago and have only done a query for www.google.com; I'm not actually using it, etc.

                            You can get info for cachesize.bind, insertions.bind, evictions.bind, misses.bind, hits.bind, auth.bind and servers.bind
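
The dig queries above can be collected in one pass. A rough sketch, assuming `dig` is on the PATH and dnsmasq answers on the address and port used in the post; the helper names are made up, and `servers.bind` is left out because it does not return a single number.

```python
import subprocess

# Counters that return a single number; servers.bind is skipped on purpose.
STAT_NAMES = ["cachesize", "insertions", "evictions", "misses", "hits", "auth"]


def stat_argv(server, port, name):
    """Build the dig command used in the post for one dnsmasq counter."""
    return ["dig", f"@{server}", "-p", str(port), "+short", "chaos", "txt", f"{name}.bind"]


def parse_short_txt(output):
    """Turn dig +short TXT output such as '"10000"' into an int, or None."""
    text = output.strip().strip('"')
    return int(text) if text.isdigit() else None


def read_stats(server, port, run=subprocess.run):
    """Query every counter; `run` is injectable so this can be tested offline."""
    stats = {}
    for name in STAT_NAMES:
        result = run(stat_argv(server, port, name),
                     capture_output=True, text=True, check=False)
        stats[name] = parse_short_txt(result.stdout)
    return stats


def hit_ratio(stats):
    """Fraction of lookups answered from cache, or None if nothing was counted."""
    hits, misses = stats.get("hits"), stats.get("misses")
    if hits is None or misses is None or hits + misses == 0:
        return None
    return hits / (hits + misses)


# With the numbers from the post (7 hits, 1 miss):
print(hit_ratio({"hits": 7, "misses": 1}))  # 0.875
```

Calling `read_stats("192.168.9.253", 5353)` would run the real queries; the address and port are the ones from the example above and need adjusting for another setup.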

                            There is a way you can get it to dump its cache to syslog too. You have to set it to log queries and then send it SIGUSR1:

                            -q, --log-queries
                                 Log the results of DNS queries handled by dnsmasq. Enable a full 
                                 cache dump on receipt of SIGUSR1.
                            

                            Unbound is a much more robust DNS option.

                            Check out the dnsmasq man page for other info
                            https://linux.die.net/man/8/dnsmasq

                            BTW, that it caches is right in its description ;)

                            Name
                            dnsmasq - A lightweight DHCP and caching DNS server. 
                            

                            • iorxI
                              iorx
                              last edited by

                              I understand now that I had the forwarder (dnsmasq) capabilities and function backwards. Didn't read up enough on that, my apologies.
                              Many thanks for the awesome explanation!

                              I'll go forth and try to make a reproducible lookup scenario. Going to try out both dnsmasq's and unbound's behavior on domain overrides.

                              • johnpozJ
                                johnpoz LAYER 8 Global Moderator
                                last edited by

                                A simple test I would do when you feel you're not resolving something over your VPN connection, be it IPsec or OpenVPN, is to do a direct query yourself via your favorite lookup tool (dig, host, nslookup). Do you get a response?

                                If not, then there is no possible way unbound or dnsmasq could either. If you do, then you need to figure out why unbound or dnsmasq does not. Did they lose their binding to the interface that would allow them to query down the VPN connection? What exact sort of response do you get: timeout, REFUSED, SERVFAIL, NXDOMAIN?

                                Was what you were looking for not cached? If it was cached you should have gotten a response whether or not you could talk to that other NS.
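
The comparison described above (direct query to the remote NS versus a query through the local resolver) can be written down as a small decision helper. This is a sketch only: the function names and return labels are invented here, and it assumes dig's usual header line with `status:` and its "no servers could be reached" message on timeout.

```python
import re

STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def dig_status(output):
    """Pull the response code out of dig output, or report a timeout."""
    match = STATUS_RE.search(output)
    if match:
        return match.group(1)
    if "no servers could be reached" in output:
        return "TIMEOUT"
    return "UNKNOWN"


def diagnose(direct_status, resolver_status):
    """Compare a direct query to the remote NS with one via unbound/dnsmasq."""
    if resolver_status == "NOERROR":
        return "ok"
    if direct_status != "NOERROR":
        # The remote NS does not answer directly, so the resolver cannot
        # answer either unless the record is still in its cache.
        return "check-path"
    # The remote NS answers directly but the resolver does not:
    # look at the resolver's outgoing interface and binding.
    return "check-resolver"


direct = ";; connection timed out; no servers could be reached"
via_resolver = ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4242"
print(diagnose(dig_status(direct), dig_status(via_resolver)))  # check-path
```

Note that a direct query from a LAN client and one sourced from the firewall itself can take different paths over a VPN, so it matters where the direct test is run.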

                                I am not clear enough on how routing in pfSense works with IPsec, or what interface you're binding unbound to. But the setup least likely to fail is to set unbound to only use localhost as its outbound interface. It should then use routing to get to where you set up a domain override, or for normal resolving/forwarding. If it has a route that says the IP in your domain override is reached over the VPN, it should use that.

                                If it had some binding issue with its outbound interface, which failed for some reason (say a reconnection of the VPN without a restart of unbound), then sure, it could have problems, which using localhost as the outbound interface could remedy.

                                Another option when you're doing odd stuff with VPN connections that could reconnect and affect some application's binding to an interface/IP is to move the NS off pfSense and put it on your network, so anything it tries to talk to would be routed normally, just like for any other client on your network.

                                • iorxI
                                  iorx @johnpoz
                                  last edited by iorx

                                  @johnpoz

                                  (necroposting, sorry for that. but I felt the need to follow up)

                                  To begin with, I never thanked you for educating and helping me on the subject! Thanks!

                                  This has been brewing for a while, I've gone back and forth, tested stuff and given up.

                                  Short info/summary:
                                  "remotesite.local" points to a DNS on the other side of a VPN connection. An override in Unbound.
                                  "localsite.n23" is the local network where I am.
                                  Unbound stops resolving "remotesite.local" hosts after a while. It works for a while again after restarting Unbound and then stops resolving remotesite.local again.

                                  Today, after some extreme google-fu, I realized something: the only overrides that stop resolving are those ending with .local.

                                  What led me to this conclusion was this:

                                  As one can see (logs below), at 17:18 it was able to resolve hosts at the remote site. At 17:19 it couldn't anymore. Checking the logs for Unbound, I found that it's not even trying to resolve anything on the .local domain.
                                  I googled around on the issue and found that someone had a similar problem with .local that just stopped responding.
                                  domain-overrides-stop-resolving-periodically-they-only-resume-after-the-service-has-been-restarted
                                  The solution there was to make an override for "local" that points to a DNS server. I tested that with a "local" override that points to 127.0.0.1.

                                  This was a couple of hours ago and it looks like it's working.
                                  The reason .local was used for the remote domain is ancient: it's a Windows domain created when Microsoft "best practice" was to create a local FQDN with .local at the end.

                                  Unbound log:

                                  Mar 18 17:19:24 	unbound 	52338 	[52338:3] info: validation success host01.remotesite.local. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:3] info: validator operate: query host01.remotesite.local. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:3] info: finishing processing for host01.remotesite.local. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:3] info: resolving host01.remotesite.local. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:3] info: validator operate: query host01.remotesite.local. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:2] info: validation success host01.remotesite.local.localsite.n23. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:2] info: validator operate: query host01.remotesite.local.localsite.n23. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:2] info: finishing processing for host01.remotesite.local.localsite.n23. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:2] info: resolving host01.remotesite.local.localsite.n23. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:2] info: validator operate: query host01.remotesite.local.localsite.n23. AAAA IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:0] info: validation success host01.remotesite.local.localsite.n23. A IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:0] info: validator operate: query host01.remotesite.local.localsite.n23. A IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:0] info: finishing processing for host01.remotesite.local.localsite.n23. A IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:0] info: resolving host01.remotesite.local.localsite.n23. A IN
                                  Mar 18 17:19:24 	unbound 	52338 	[52338:0] info: validator operate: query host01.remotesite.local.localsite.n23. A IN
                                  Mar 18 17:18:04 	unbound 	52338 	[52338:2] info: validation success host01.remotesite.local. A IN
                                  Mar 18 17:18:04 	unbound 	52338 	[52338:2] info: validator operate: query host01.remotesite.local. A IN
                                  Mar 18 17:18:04 	unbound 	52338 	[52338:2] info: finishing processing for host01.remotesite.local. A IN
                                  Mar 18 17:18:04 	unbound 	52338 	[52338:2] info: resolving host01.remotesite.local. A IN
                                  Mar 18 17:18:04 	unbound 	52338 	[52338:2] info: validator operate: query host01.remotesite.local. A IN 
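
To compare what unbound did for each name at each point in time, log lines like the ones above can be grouped by query name and type. A minimal sketch, assuming the line layout shown in this log (timestamp, `unbound`, PID, `[pid:thread] info:`, message, name, type, `IN`); other verbosity levels or message types would need a different pattern, and the function names are made up.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+unbound\s+\d+\s+"
    r"\[\d+:(?P<thread>\d+)\]\s+info:\s+(?P<event>.+?)\s+"
    r"(?P<qname>\S+\.)\s+(?P<qtype>[A-Z0-9]+)\s+IN\s*$"
)


def parse_line(line):
    """Return a dict with ts, thread, event, qname, qtype, or None if no match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def events_by_name(lines):
    """Group (timestamp, event) pairs by (qname, qtype), keeping input order."""
    grouped = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record:
            grouped[(record["qname"], record["qtype"])].append(
                (record["ts"], record["event"]))
    return dict(grouped)


# Two lines copied from the log above (newest first, as the log viewer shows them).
sample = [
    "Mar 18 17:19:24 \tunbound \t52338 \t[52338:3] info: resolving host01.remotesite.local. AAAA IN",
    "Mar 18 17:18:04 \tunbound \t52338 \t[52338:2] info: resolving host01.remotesite.local. A IN ",
]
for key, events in events_by_name(sample).items():
    print(key, events)
```

Grouping this way also makes it easy to spot names with the local search domain appended, such as `host01.remotesite.local.localsite.n23.`, which show up as separate keys.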
                                  
                                    • M
                                      masupilamie
                                      last edited by masupilamie

                                      Can confirm iorx's "workaround" works. It seems the TLD needs to be added as a domain override pointing to itself (127.0.0.1) when one subdomain of that TLD is used for local resolution and another subdomain is used for remote resolution via a domain override.

                                      In my case my local network uses main.lan and the remote site uses remote.lan.
                                      Adding only remote.lan as a domain override pointing to the remote site's DNS server made it work for less than a minute after flushing unbound's cache. Adding "lan" as a domain override pointing to 127.0.0.1 made DNS resolution for remote.lan stable.
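
To see which override a given name would fall under with both entries in place, here is a small sketch of most-specific-suffix matching, which is how forward and stub zones are generally selected. The remote DNS address 192.0.2.53 is a placeholder, the function names are made up, and the thread does not establish why the TLD override makes resolution stable, so this only illustrates the resulting split.

```python
def normalize(name):
    """Lower-case a DNS name and drop the trailing dot."""
    return name.rstrip(".").lower()


def matching_override(qname, overrides):
    """Return (zone, server) for the longest override suffix matching qname."""
    labels = normalize(qname).split(".")
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in overrides:
            return candidate, overrides[candidate]
    return None


overrides = {
    "remote.lan": "192.0.2.53",  # placeholder for the remote site's DNS server
    "lan": "127.0.0.1",          # the TLD override from the workaround
}

print(matching_override("host.remote.lan.", overrides))   # ('remote.lan', '192.0.2.53')
print(matching_override("printer.main.lan", overrides))   # ('lan', '127.0.0.1')
print(matching_override("www.example.com", overrides))    # None
```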

                                      [Screenshot: configured Domain Overrides]

                                      pfsense version: 2.7.2

                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.