Netgate Discussion Forum

    Unbound/DNS resolver with IPv6 unreliable finally solved

    Scheduled Pinned Locked Moved IPv6
    21 Posts 4 Posters 1.4k Views
    • S
      strandte
      last edited by

      I want to share a finding that I think could be valuable to others using pfSense/Netgate with IPv6. We have a Netgate 8300 Max running pfSense+ and a second firewall running pfSense CE. The issue is the same on both versions. We have since upgraded the pfSense CE box to the 2.8 Beta, and the issue is the same there as well.

      This problem has made DNS unreliable for us for a long time. Had we not also run a Pi-Hole DNS server for outgoing DNS resolution in parallel with the two pfSense firewalls in an HA setup, it would have been too unreliable to use. I did not know the extent of the issue until I set up monitoring of both IPv4 and IPv6 DNS resolution for all our DNS resolvers. Initially we did not see the problems until the Pi-Hole resolver failed on rate limiting because both pfSense DNS resolvers (unbound) had failed. Sometimes the unbound service had actually stopped, but mostly the DNS resolver simply no longer responded to incoming queries.
      After I set up monitoring, I found that the DNS resolver on the pfSense boxes often stopped responding for a while and then automatically started responding to queries again, and that the problem seemed more pronounced when resolving via the IPv6 addresses of the pfSense boxes. Often unbound stopped responding to queries sent to the IPv6 address but still responded to queries sent to the IPv4 address. After a while both became unresponsive. When this was the case, restarting the service made it respond to queries again, but I think it would probably also have recovered at some point if I had done nothing. When the service had actually stopped, this would of course not be the case.

      I have read many other forum topics that seem related and describe the problem in much the same way as we experience it. These topics often touch on unbound restarting when clients get a DHCP address and register in DNS. We do not use the pfSense boxes for DHCP or internal DNS, as we use MS AD controllers for this; we use the pfSense DNS resolvers only for outgoing DNS.

      I should also mention that we use the "Python Module" with unbound, although I do not think it is related to this problem.

      Finally I can start to talk about my findings:

      As most of you know, incoming queries are either resolved from cache or forwarded to an external DNS server. As I understand it, outgoing resolution attempts go via the localhost interface (127.0.0.1 and ::1) and are then automatically routed out through the correct interface. It is important to choose only the localhost interface as the outgoing interface in the unbound configuration, rather than the default "All". The only exception is if you need the pfSense boxes to resolve internal addresses from another internal DNS server, as we do when using DNS names in firewall rules. For this reason we also have one of the internal interfaces selected.
      In the list of interfaces the localhost interface is just named "Localhost", so we do not actually know whether this means just 127.0.0.1 or both 127.0.0.1 and ::1, but I assume the latter.
      You can either leave "Disable Auto-added Access Control" under the unbound "Advanced Settings" unchecked, which is the default, or check it and manually configure which clients may query the DNS resolver on the pfSense boxes.
      If you choose the default access settings, I believe that access to the ::1 localhost address has been forgotten in the access rules. After I changed to manual access rules and added both 127.0.0.1 and ::1 to the rules allowing queries to the localhost interface, unbound has been rock solid.

      When you test this, remember that cached and internal lookups against the IPv6 address of the pfSense box will still work, so you will not see the problem until you try lookups of non-cached domain names.
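The kind of monitoring probe described above could look roughly like the following. This is only a minimal sketch using the Python standard library, not the monitoring setup actually used in this thread. It assumes that a query for a randomly generated label is unlikely to be answered from cache, which may not hold in every resolver configuration. The helper names (`build_query`, `probe`, `uncached_name`) and the zone `example.com` are illustrative choices.

```python
import random
import socket
import struct
import uuid
from typing import Optional


def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal DNS query packet (RFC 1035) with the RD flag set."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def uncached_name(zone: str = "example.com") -> str:
    """Return a name with a random first label (assumed not to be cached)."""
    return f"{uuid.uuid4().hex}.{zone}"


def probe(server: str, name: str, timeout: float = 3.0) -> bool:
    """Return True if `server` sends any DNS response for `name` in time."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    query = build_query(name)
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable networks
            return False
    # Accept any reply with the same transaction ID and the QR bit set
    return len(reply) >= 4 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Calling `probe("192.0.2.1", uncached_name())` and `probe("2001:db8::1", uncached_name())` on a schedule would check both address families separately; those two addresses are documentation placeholders and need to be replaced with the firewall's own LAN addresses. Any response counts as success here, including NXDOMAIN, because the point is only whether the resolver answered at all.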

      I will file a bug report on this to see whether I am correct in my assumption that the ::1 localhost address has been left out of the automatic access rules on pfSense.

      • S
        strandte @strandte
        last edited by

        @strandte The bug report can be found at https://redmine.pfsense.org/issues/16137

        • GertjanG
          Gertjan @strandte
          last edited by Gertjan

          @strandte said in Unbound/DNS resolver with IPv6 unreliable finally solved:

          As most of you know, incoming queries are either resolved from cache or forwarded to an external DNS server

          Exactly.
          Three steps:

          1. If the DNS request reaches unbound, unbound can do something with it.
          2. If the DNS request has a match in the cache (TTL still valid, etc.), the answer is sent back right away.
          3. If there is no match, resolving starts, which requires a working upstream gateway.

          With unbound extended logging, and if pfBlockerNG is installed, the python mod pfb_python will also log, so you can check whether the DNS traffic actually reached unbound.
          I have the impression that unbound is often said to have failed when it was actually up and running and there was an "interface problem". For example: an IPv6 prefix has changed, the client is not aware of it and still uses the now deprecated IPv6 address, and since IPv6 traffic is preferred over IPv4, the communication fails.

          If unbound needs to resolve, same thing: IPv6 is preferred, and if there are issues with IPv6, unbound will / might switch over to IPv4 ... and if it doesn't for some reason, well, game over.

          Some people prefer forwarding to resolving. And of course: forwarding over TLS.
          That adds a massive quantity of "TLS" code (external libraries, etc.), and the slightest bug will make it fail again.
          That's why I tend to think: DNS is important to me, so I keep it simple: I resolve.
          I do use DNSSEC, which is more of a parallel process to the classic DNS handling.

          Something that might protect me from potential issues: all my pfSense interfaces are connected to devices that are UPS protected (my upstream ISP router and all my 'core' switches), so my pfSense interfaces always stay 'up'.

          And the final potential issue: the ISP ... and IPv6.
          For some reason no ISP seems to have implemented IPv6 according to the existing RFCs .... Mine, the biggest in France (some 16 million clients), has known IPv6 flaws.

          @strandte said in Unbound/DNS resolver with IPv6 unreliable finally solved:

          You can either leave "Disable Auto-added Access Control" under the unbound "Advanced Settings" unchecked, which is the default, or check it and manually configure which clients may query the DNS resolver on the pfSense boxes.
          If you choose the default access settings, I believe that access to the ::1 localhost address has been forgotten in the access rules. After I changed to manual access rules and added both 127.0.0.1 and ::1 to the rules allowing queries to the localhost interface, unbound has been rock solid.

          Mine is unchecked.

          You said in the bug post :

          I could probably have checked in the source code if my assumption is correct,

          so why didn't you open that file?
          Not the source: look at the unbound config file.

          My access_list.conf file (sourced by unbound.conf):

          access-control: 127.0.0.1/32 allow_snoop
          access-control: ::1 allow_snoop
          access-control: 127.0.0.0/8 allow 
          access-control: 192.168.1.0/24 allow 
          access-control: 192.168.2.0/24 allow 
          access-control: 192.168.3.0/24 allow 
          access-control: 192.168.100.0/24 allow 
          access-control: 2a01:cb19:dead:beef::/64 allow 
          access-control: ::1/128 allow
          

          ::1 is there - twice !

          Isn't this what you mean ?

          Btw: I have been using 25.03 beta 2 for two weeks now, rock solid.
          I presume 24.11 was also good.

          No "help me" PM's please. Use the forum, the community will thank you.
          Edit : and where are the logs ??

          • S
            strandte
            last edited by

            When I do the search:

            find / -name access_list.conf

            The file name turns out to have an "s" on "list": access_lists.conf

            When I search the config /usr/local/etc/unbound/unbound.conf I do not find any reference to access_lists.conf.
            When I search /var/unbound/unbound.conf I do see the reference to access_lists.conf.

            What is the difference between the two config files?

            I also see that access_lists.conf changes to my own rules when I check "Disable Auto-added Access Control".

            So when I use manual access control the:

            access-control: ::1 allow_snoop

            is not included in access_lists.conf

            Maybe the access line:

            access-control: ::1/128 allow

            isn't evaluated, since "access-control: ::1 allow_snoop" comes first in access_lists.conf when the auto rules are chosen?

            This should indicate that ::1 has not been forgotten, but that is not what I experience.
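On the ordering question: the unbound.conf manual states that the most specific netblock match is used and that the order of access-control statements therefore does not matter. What it does not settle, as far as this thread goes, is which action applies when the same netblock is listed twice with different actions, as happens with `::1` and `::1/128`. The sketch below only makes that situation visible. It parses access-control lines, normalizes each netblock with Python's `ipaddress` module, and reports netblocks that carry more than one action. The function names are illustrative and it does not claim to reproduce unbound's internal behaviour.

```python
import ipaddress
from collections import defaultdict


def parse_access_control(conf_text):
    """Yield (network, action) for each access-control line in conf_text."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("access-control:"):
            continue
        fields = line[len("access-control:"):].split()
        if len(fields) != 2:
            continue  # tolerate malformed lines in this sketch
        netblock, action = fields
        # "::1" and "::1/128" both normalize to IPv6Network("::1/128")
        yield ipaddress.ip_network(netblock, strict=False), action


def conflicting_netblocks(conf_text):
    """Return {network: [actions]} for netblocks listed with different actions."""
    actions = defaultdict(list)
    for net, action in parse_access_control(conf_text):
        if action not in actions[net]:
            actions[net].append(action)
    return {net: acts for net, acts in actions.items() if len(acts) > 1}


def most_specific(conf_text, address):
    """Return the entries with the longest prefix that contain `address`."""
    addr = ipaddress.ip_address(address)
    matches = [(n, a) for n, a in parse_access_control(conf_text) if addr in n]
    if not matches:
        return []
    longest = max(n.prefixlen for n, _ in matches)
    return [(n, a) for n, a in matches if n.prefixlen == longest]
```

Run against the auto-generated list quoted earlier in the thread, `conflicting_netblocks` reports `::1/128` with both `allow_snoop` and `allow`, while `most_specific(..., "127.0.0.1")` returns only the `127.0.0.1/32 allow_snoop` entry because it is more specific than `127.0.0.0/8`.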

            • GertjanG
              Gertjan @strandte
              last edited by Gertjan

              @strandte said in Unbound/DNS resolver with IPv6 unreliable finally solved:

              When I do the search:

              find / -name access_list.conf

              All unbound related settings files are here
              /var/unbound/

              If you use the general 'search all' command, you might find the same file elsewhere. Those copies are not used.
              Run the magic command:

              ps aux | grep 'unbound' 
              ...
              unbound 64814   0.0  2.8 142788 114284  -  Ss   10:04       1:25.10 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
              ...
              

              Now you know where the actual unbound.conf file is, and all the other files it includes, like the access_lists.conf file.
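The same check can be scripted. The following is a minimal sketch, not anything pfSense ships. It pulls the `-c` argument out of a process command line like the one shown by `ps`, and lists the `include:` directives found in unbound.conf-style text. The function names are made up for illustration, and the sketch assumes includes are written one per line.

```python
import shlex


def config_path_from_cmdline(cmdline):
    """Return the argument following -c in an unbound command line, or None."""
    args = shlex.split(cmdline)
    for flag, value in zip(args, args[1:]):
        if flag == "-c":
            return value
    return None


def included_files(conf_text):
    """Return the paths named by include: directives in unbound.conf-style text."""
    paths = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("include:"):
            # The path may or may not be wrapped in double quotes
            paths.append(line[len("include:"):].strip().strip('"'))
    return paths
```

For the command line above, `config_path_from_cmdline` returns `/var/unbound/unbound.conf`. Reading that file and passing its text to `included_files` would then list the files unbound actually loads.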

              @strandte said in Unbound/DNS resolver with IPv6 unreliable finally solved:

              When I search the config /usr/local/etc/unbound/unbound.conf I do not find any reference to access_lists.conf.
              When I search /var/unbound/unbound.conf I do see the reference to access_lists.conf.

              What is the difference between the two config files?

              pfSense is based upon FreeBSD, but isn't FreeBSD.
              pfSense uses FreeBSD packages, and when you install them, they can place config files somewhere under (example) /usr/local/.... but these are rarely used by pfSense.
              All (most) of the processes that are used by pfSense have their config files kept under /var/.....
              About these :
              access-control: ::1 allow_snoop
              access-control: ::1/128 allow

              I couldn't tell you what the difference is between allow_snoop and allow, or why there is both ::1/128 and ::1.
              For me, these two, like 127.0.0.1, are only used / reached by processes running on pfSense itself that need a host name resolved, like the pfSense package upgrade checker.

              • S
                strandte
                last edited by

                The web GUI explains the difference between Allow and Allow Snoop:

                Allow: Allow queries from hosts within the netblock defined below.

                Allow Snoop: Allow recursive and nonrecursive access from hosts within the netblock defined below. Used for cache snooping and ideally should only be configured for the administrative host.

                I will start testing with the allow_snoop rule placed before or after the allow rule in my manual access list. Then we can see whether this is the root problem.

                • GertjanG
                  Gertjan @strandte
                  last edited by

                  @strandte

                  allow or allow_snoop, that's one thing.
                  But what does this mean:

                  access-control: ::1 allow_snoop
                  access-control: ::1/128 allow

                  as ::1 and ::1/128 are the same for me.
                  So, allow_snoop gets set on ::1 and then overridden by 'allow' ?

                  Here you can see how the access_lists.conf file gets created :
                  /etc/inc/unbound.inc

                  First, "127.0.0.1/32 allow_snoop" gets thrown in and then "::1 allow_snoop".
                  You and I don't choose 'allow_snoop' in the GUI here; it's hard coded.

                  Then come all your locally known interfaces, and this includes
                  127.0.0.0/8 allow
                  and
                  ::1/128 allow

                  and as said : these are the same for me.

                  Note that the 'allow' here is the one I set up here :

                  f98bcfaf-53c5-4af3-8011-0db92ef32d97-image.png

                  I wonder what happens if I delete these two lines :

                  a9c12224-4af3-4fde-8015-2265b6b91de5-image.png

                  • S
                    strandte
                    last edited by

                    It is possible to configure "Allow Snoop" manually by choosing it under "Action" on the "Access Lists" page of the unbound web GUI. The sequence of rules shown in the lower part of that page is the same as the sequence of rules in the access_lists.conf file. I am currently testing whether putting the snoop rule first or last has any influence on the end result, but so far it does not seem to have any effect.
                    What I see is that after a configuration change the resolver works for a few more minutes, then is unresponsive for some minutes, and then comes back. I wonder whether it is pfBlocker's large DNSBL lists that need to be loaded before unbound can resume resolving?
                    After this down period of some minutes it again seems to be stable, no matter whether the snoop rule is first or last. The only thing I have not been able to do is make the rule in access_lists.conf 100% identical to the auto-created rule:

                    Auto-created, it looks like this:

                    access-control: ::1 allow_snoop

                    but when I create it manually, I cannot get anything other than this (a mask must be selected, and if you do not select one it is added automatically):

                    access-control: ::1/128 allow_snoop

                    I guess the two should be equivalent, unless a bug makes the auto rule cause trouble?

                    • GertjanG
                      Gertjan @strandte
                      last edited by

                      @strandte said in Unbound/DNS resolver with IPv6 unreliable finally solved:

                      It is possible to configure manually the "Allow_snoop", by choosing "Allow_snoop" u

                      Nope.

                      I selected some random "Refuse Nonlocal" :

                      f5c02c50-025e-4dd8-97e2-b2ed86b11634-image.png

                      this creates :

                      access-control: 127.0.0.1/32 allow_snoop
                      access-control: ::1 allow_snoop
                      access-control: 127.0.0.0/8 allow 
                      access-control: 192.168.1.0/24 allow 
                      access-control: 192.168.2.0/24 allow 
                      access-control: 192.168.3.0/24 allow 
                      access-control: 192.168.100.0/24 allow 
                      access-control: 2a01:dead:beef:a6e2::/64 allow 
                      access-control: ::1/128 allow 
                      #Local
                      access-control: fc00::/7 refuse_non_local
                      access-control: fe80::/64 refuse_non_local
                      access-control: 10.0.0.0/24 refuse_non_local
                      access-control: ::ffff:0:0/96 refuse_non_local
                      access-control: 192.168.4.0/24 refuse_non_local
                      access-control: 192.168.3.0/24 refuse_non_local
                      access-control: 2a01:dead:beef:a6e2::/64 refuse_non_local
                      

                      so everything before
                      #Local
                      didn't change.

                      • S
                        strandte
                        last edited by

                        Are you sure you have disabled the auto rules?
                        [screenshot: Services / DNS Resolver / Advanced Settings]
                        The access_lists.conf does not look like that in my case with auto rules disabled.

                        • GertjanG
                          Gertjan @strandte
                          last edited by

                          @strandte

                          When I check this :

                          7f6060c6-b713-4733-8fd0-66da303b4378-image.png

                          ( which I don't have checked right now )

                          I have to create my own access list .... so more chances to f##k up.
                          I'm a "leave it to default" guy 😊

                          • tinfoilmattT
                            tinfoilmatt @Gertjan
                            last edited by tinfoilmatt

                            @Gertjan said in Unbound/DNS resolver with IPv6 unreliable finally solved:

                            I wonder what happens if I delete these two lines :

                            a9c12224-4af3-4fde-8015-2265b6b91de5-image.png

                            I would delete ::1/128 allow, and add the /128 CIDR notation to the ::1 allow_snoop entry manually—and leave 127.0.0.1/32 allow_snoop as is.

                            But I agree that neither may be necessary as my auto-generated /var/unbound/access_lists.conf contains only the ACLs I've defined via the webGUI. No loopback addresses are present.

                              • GertjanG
                                Gertjan @tinfoilmatt
                                last edited by

                                @tinfoilmatt said in Unbound/DNS resolver with IPv6 unreliable finally solved:

                                127.0.0.1/128

                                Isn't that a 'syntax error' ?
                                127.0.0.1/32 is as far as it goes.

                                • S
                                  strandte
                                  last edited by

                                  I tried to add the:

                                  access-control: ::1/128 allow_snoop

                                  to my manual access list over the weekend. The result was that both the primary and the secondary firewall had an unresponsive unbound service on Sunday. Today I removed the access rule above. We will see how this goes.

                                  Does anybody know what this rule is for?

                                  • S
                                    strandte
                                    last edited by

                                    Yes, 127.0.0.1/128 is wrong, and 127.0.0.1/32 is correct, but I see that the auto rule allows 127.0.0.0/8. Is that necessary? If it is, which other IP addresses in 127.0.0.0/8 are in use?

                                    • GertjanG
                                      Gertjan @strandte
                                      last edited by

                                      @strandte said in Unbound/DNS resolver with IPv6 unreliable finally solved:

                                      but I see that the auto rule allows 127.0.0.0/8. Is that necessary? If it is, which other IP addresses in 127.0.0.0/8 are in use?

                                      127.0.0.0/8 is a bit large, true.

                                      Execute for example

                                      sockstat -4 | grep '127'
                                      

                                      to see who is using 127.a.b.c
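A plain `grep '127'` also matches lines where 127 merely appears in a PID or a port number. As a rough alternative, the sketch below scans sockstat-style text and keeps only tokens whose address part really falls inside 127.0.0.0/8. It assumes addresses appear as `address:port` tokens separated by whitespace. The function name is illustrative, and the sample line in the usage note is made up rather than real output.

```python
import ipaddress

LOOPBACK_V4 = ipaddress.ip_network("127.0.0.0/8")


def loopback_addresses(sockstat_text):
    """Return the set of 127.0.0.0/8 addresses found in sockstat-style text."""
    found = set()
    for line in sockstat_text.splitlines():
        for token in line.split():
            host = token.rsplit(":", 1)[0]  # drop a trailing ":port" if present
            try:
                addr = ipaddress.ip_address(host)
            except ValueError:
                continue  # not an IP address (user, command, PID, "*", ...)
            if addr in LOOPBACK_V4:
                found.add(str(addr))
    return found
```

Given a hypothetical line such as `unbound unbound 12701 5 udp4 127.0.0.1:53 *:*`, the function returns `{"127.0.0.1"}` and ignores the PID `12701`. If only `127.0.0.1` ever shows up, that supports narrowing the rule, though it says nothing about addresses that are used only occasionally.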

                                      • S
                                        strandte
                                        last edited by

                                        I can't see any address in 127.0.0.0/8 in use other than 127.0.0.1, so I assume it would be OK to replace 127.0.0.0/8 with 127.0.0.1/32.

                                        • GertjanG
                                          Gertjan @strandte
                                          last edited by

                                          @strandte

                                          Sure.
                                          Will it make any difference ?
                                          Not sure.

                                          • w0wW
                                            w0w
                                            last edited by w0w

                                            @strandte said in Unbound/DNS resolver with IPv6 unreliable finally solved:

                                            After I set up monitoring, I found that the DNS resolver on the pfSense boxes often stopped responding for a while and then automatically started responding to queries again, and that the problem seemed more pronounced when resolving via the IPv6 addresses of the pfSense boxes. Often unbound stopped responding to queries sent to the IPv6 address but still responded to queries sent to the IPv4 address. After a while both became unresponsive. When this was the case, restarting the service made it respond to queries again, but I think it would probably also have recovered at some point if I had done nothing. When the service had actually stopped, this would of course not be the case.

                                            I honestly don't think that the unbound access control settings are related to this issue, unless access control simply prevents unbound's endless restarts and refreshes, which in turn solves one problem but clearly causes a thousand others. In fact, unbound was rock-stable for me on 24.11 and earlier. But it "broke" on the 25.03 beta because pfSense suddenly decided that every time it receives configuration packets (RA info) from the ISP, it needs to refresh and update all related settings, including unbound, even if no changes are detected in the settings received. When I started digging into this issue, I was surprised to see just how many requests there were to stop and restart the service, sometimes ending with it stopping and not starting again. Ideally, with proper Python module integration, everything should be much more stable, but sometimes it is not.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.