Netgate Discussion Forum

    Major DNS Bug 23.01 with Quad9 on SSL

    General pfSense Questions
    185 Posts 27 Posters 183.9k Views
    • GertjanG
      Gertjan @N0m0fud
      last edited by Gertjan

      @n0m0fud

      Strange :

      ...
      # dhcp lease entries
      include: /var/unbound/dhcpleases_entries.conf

      # Domain overrides
      include: /var/unbound/domainoverrides.conf

      # Unbound custom options
      server:
      tls-upstream: yes
      forward-zone:
      name: "."
      forward-ssl-upstream: yes
      forward-addr: 1.1.1.1@853
      forward-addr: 1.0.0.1@853
      #forward-addr: 2606:4700:4700::64@853
      #forward-addr: 2606:4700:4700::6400@853
      #forward-addr: 149.112.112.11@853
      #forward-addr: 9.9.9.11@853
      #forward-addr: 2620:fe::11@853
      #forward-addr: 2620:fe::fe:11@853
      #forward-addr: 52.205.50.148@853

      # Remote Control Config
      .....
      

      When you set :
      33d830eb-62c6-44a7-96dd-81b6d45fe64f-image.png

      pfSense will add a "forward-zone" section with all the needed addresses :

      .....
      # dhcp lease entries
      include: /var/unbound/dhcpleases_entries.conf
      
      
      # Domain overrides
      include: /var/unbound/domainoverrides.conf
      # Forwarding
      forward-zone:
      	name: "."
      	forward-tls-upstream: yes
      	forward-addr: 9.9.9.9@853#dns9.quad9.net
      	forward-addr: 149.112.112.112@853#dns9.quad9.net
      	forward-addr: 2620:fe::fe@853#dns9.quad9.net
      	forward-addr: 2620:fe::9@853#dns9.quad9.net
      
      
      # Unbound custom options
      server:
       statistics-cumulative: no
      
      
      ###
      # Remote Control Config
      ###
      .....
      

      And no "forward-ssl-upstream" but "forward-tls-upstream", although both are the same.

      So you are forwarding without the GUI set to forwarding ?
      Why would you use the custom options to achieve forwarding ?
      Maybe that needed to be done in the past, but not anymore.
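To check which upstream servers Unbound is actually forwarding to, whether they came from the GUI setting or from custom options, a small sketch like the one below lists the active forward-addr entries of a config file. `forward_addrs` is a hypothetical helper name, and the path /var/unbound/unbound.conf used in the usage note is an assumption based on the include paths quoted above.

```shell
#!/bin/sh
# forward_addrs (hypothetical helper): print the active forward-addr values
# from an unbound.conf-style file given as the first argument.
# Commented-out lines such as "#forward-addr: ..." are skipped because
# their first field is not exactly "forward-addr:".
forward_addrs() {
    awk '$1 == "forward-addr:" { print $2 }' "$1"
}
```

Usage, assuming the generated config lives at that path: `forward_addrs /var/unbound/unbound.conf`. Entries carrying a `#hostname` suffix (for example `9.9.9.9@853#dns9.quad9.net`) are printed with it.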

      The usage of

      tls-upstream: yes
      

      is also very rare.
      Google knows about it - in just one place ( !! ): it's the unbound.conf doc :

      
             tls-upstream: <yes or no>
                    Enabled or disable whether the upstream queries use TLS only for
                    transport. Default is no. Useful in tunneling scenarios. The
                    TLS contains plain DNS in TCP wireformat. The other server must
                    support this (see tls-service-key). If you enable this, also
                    configure a tls-cert-bundle or use tls-win-cert or
                    tls-system-cert to load CA certs, otherwise the connections
                    cannot be authenticated. This option enables TLS for all of
                    them, but if you do not set this you can configure TLS
                    specifically for some forward zones with forward-tls-upstream.
                    And also with stub-tls-upstream.
      
      

      Reading this makes me think : I would stay away from it.

      Btw : I've been forwarding to quad9 (IPv4 and IPv6) for the last week or so.
      I haven't detected any issues whatsoever.
      If my unbound got restarted, like this morning, that was me doing so.

      No "help me" PM's please. Use the forum, the community will thank you.
      Edit : and where are the logs ??

      • cmcdonaldC
        cmcdonald Netgate Developer
        last edited by cmcdonald

        I am working on a build of the version of Unbound we shipped with 22.05 that will run on 23.01 (and one for 23.05). If the problem goes away with this old version of Unbound, I will start bisecting to find a root cause. I just don't want to go off in the weeds chasing ghosts.

        It would also be useful to know if this problem also manifests on 23.05.

        Standby

        Need help fast? https://www.netgate.com/support

        • cmcdonaldC
          cmcdonald Netgate Developer @cmcdonald
          last edited by cmcdonald

          This issue is not unique to pfSense.

          We do have a workaround:

          1. Stop the Unbound service
          2. Run elfctl -e +noaslr /usr/local/sbin/unbound
          3. Start the Unbound service

          Ref: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270912
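The three steps above can be sketched as one shell function. This is only a sketch under stated assumptions: the `elfctl` command and the binary path are taken from the post, while `pfSsh.php playback svc` as the way to stop and start the service from a shell is an assumption (the GUI service controls do the same job). It has to run as root.

```shell
#!/bin/sh
# Sketch of the workaround: stop Unbound, flag the binary, start Unbound.
# UNBOUND_BIN is the path from the post; SVC is an ASSUMED service helper,
# override it if your system stops/starts Unbound differently.
UNBOUND_BIN="${UNBOUND_BIN:-/usr/local/sbin/unbound}"
SVC="${SVC:-pfSsh.php playback svc}"

apply_noaslr_workaround() {
    $SVC stop unbound || return 1                  # 1. stop the Unbound service
    elfctl -e +noaslr "$UNBOUND_BIN" || return 1   # 2. set the noaslr ELF feature flag
    $SVC start unbound                             # 3. start the Unbound service
}
```

Nothing happens until `apply_noaslr_workaround` is called. Running `elfctl /usr/local/sbin/unbound` afterwards should report whether the flag took effect.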

          • J
            joedan @cmcdonald
            last edited by joedan

            @cmcdonald

            I am following this thread with interest. I was once plagued by this (DNS over TLS slowness, random timeouts) but no longer am, and it's not 100% clear why, so I made the change as a precaution.

            elfctl -e +noaslr /usr/local/sbin/unbound

            elfctl /usr/local/sbin/unbound

            Shell Output - elfctl /usr/local/sbin/unbound
            File '/usr/local/sbin/unbound' features:
            noaslr 'Disable ASLR' is set.
            noprotmax 'Disable implicit PROT_MAX' is unset.
            nostackgap 'Disable stack gap' is unset.
            wxneeded 'Requires W+X mappings' is unset.
            la48 'amd64: Limit user VA to 48bit' is unset.

            This page indicates ASLR is on by default in FreeBSD 14 and not in 13 (or lower?):
            https://wiki.freebsd.org/AddressSpaceLayoutRandomization
            So maybe this explains why I stumbled across this after upgrading from 22.05 to 23.01?

            • J
              joedan @joedan
              last edited by joedan

              @joedan

              After 24 hours, here are some unbound stats observed after turning off ASLR as per the note above. My environment has been stable during the last 10 days, this being the only change, made 24 hours ago. I WFH 8-10 hours a week and my DNS calls have been consistent over this period (running DNS over TLS to Cloudflare).

              6a611e6f-a622-4237-8ea6-0193bb3f8cd5-image.png

              10-15ms average recursion time improvement since the change

              8c1bd27c-d032-4697-bb5c-dbedbc1fda25-image.png

              DNS queries have remained fairly consistent

              d171cb89-2ab7-4c86-925e-7409c3103e04-image.png

              This observation sticks out though. I'm not technical enough to pretend I know what's going on here, but 'TCP out' has tightened up. All other metrics measured (which are hidden) are identical.

              4f1654bd-546d-4b2f-929d-fe663bdbbec3-image.png

              Several days later, the data shows a consistent new trend since disabling ASLR...

              8c1dcfe1-b000-4ffe-a68f-da75daf9b92f-image.png
              2079de11-347b-4f86-822c-af2755e26a30-image.png

              • J
                JonH @joedan
                last edited by

                @joedan Where do I find that graphing?

                • J
                  joedan
                  last edited by joedan

                  @jonh

                  I use Grafana / InfluxDB.
                  I'm not a Linux person, so I use a downloaded, pre-made Home Assistant virtual machine on Windows 11 Pro (Hyper-V). The Grafana and InfluxDB add-ons were a very simple click to install and run.

                  I use the pfSense Telegraf package with a custom config for Unbound stats reporting, documented here:
                  https://github.com/VictorRobellini/pfSense-Dashboard

                  The Grafana dashboard is here..
                  https://grafana.com/grafana/dashboards/6128-unbound/
                  Victor doesn't appear to have one for unbound but I also use his dashboard for other stats (from his Github page).

                  I didn't have to code anything, just follow the bouncing ball on various sites to set things up.
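For a quick look at the same counters without a Grafana stack, the statistics can be read straight from `unbound-control`. The sketch below is an assumption-laden example, not the Telegraf config referenced above: `unbound_stat` is a hypothetical helper, and the config path passed with `-c` in the usage note is assumed.

```shell
#!/bin/sh
# unbound_stat (hypothetical helper): read "key=value" lines on stdin,
# as printed by "unbound-control stats_noreset", and print the value
# of the single key named in the first argument.
unbound_stat() {
    awk -F= -v key="$1" '$1 == key { print $2 }'
}
```

Usage, assuming the config path: `unbound-control -c /var/unbound/unbound.conf stats_noreset | unbound_stat total.recursion.time.avg`. The key `num.query.tcpout` is probably the counter behind a 'TCP out' panel; some counters may only appear when extended statistics are enabled.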

                  • J
                    JonH @joedan
                    last edited by

                    @joedan Thanks, I'll check it out

                    • GertjanG
                      Gertjan @joedan
                      last edited by

                      @joedan said in Major DNS Bug 23.01 with Quad9 on SSL:

                      Like the subject of the thread :

                      490442f2-5e32-44dd-8063-58c7433a8a5b-image.png

                      but arguably the same issue : 1.1.1.1 or 9.9.9.9, "what is the difference ?", I'm forwarding just to test 'if it works, or not'.
                      Up until today, I didn't find any issues.

                      Note that I'm still using

                      700aaa28-6470-455b-b3c8-bb15bd5e2608-image.png

                      as I presume that error conditions would get logged, if they occur.
                      The last log line from unbound tells me that it started a couple of days ago :

                      dc244d62-568b-4b23-9566-7a518425233b-image.png

                      I'm going to restart unbound now, and disable address space layout randomization (ASLR), although I just can't wrap my head around this workaround: why would the position in (virtual mapped) memory matter ?
                      ASLR is used in every modern OS these days.
                      It's an extra layer of obscurity without any cost or negative side effects and, as far as I know, it only makes the life of a hacker more difficult. Attack vectors that rely on stack or memory (aka buffer) overruns become much harder, as the process uses another layout in memory every time it starts.

                      Btw : this is what I think. I admit I don't know shit about this ASLR executable option, and was only vaguely aware of the concept.

                      I also think, or thought, that a coder who writes programs doesn't need to be aware of 'where' the code, data and other segments are placed in memory. We have all been writing relocatable code for decades without being aware of it, as the compiler and linker take care of all these things.
                      The unbound issue was marked as a FreeBSD bug first, and they, FreeBSD, said : go ask the unbound author. See post above.
                      Disabling ASLR is just a stop-gap. (edit : if this is even related to this bug, issue ... we'll see)
                      IMHO, the real issue is somewhere between unbound and one of its linked libraries, "libcrypto.so.111" and "libssl.so.111", as I presume that the issue arises when forwarding over TLS is used.

                      The default unbound mode, resolving, doesn't use TLS, so, for me, that explains why the resolver is working fine while resolving.

                      Anyway, not a pfSense issue, more an unbound issue or even further away, in the way all this interoperates.
                      The good news : it's still an issue for Netgate and, as they are very FreeBSD aware, they will find out what the real issue is.

                      [ end of me thinking out loud ]

                      • stephenw10S
                        stephenw10 Netgate Administrator
                        last edited by

                        I would love to see anyone who was hitting this issue repeatedly confirm the ASLR workaround here.

                        • S
                          SwissSteph @stephenw10
                          last edited by

                          @stephenw10
                          I'm testing right now and for the moment it's "OK" .... I just put back my DNS settings like on my 22.05 version (which was working without any problem)

                          5bd68f2f-86bd-4fa5-9835-b895cfebdfae-image.png

                          I started with two "no-name" pfsense, one for use at home and the other as a backup in case of problems (which can happen when you're new to pfsense).
                          ... And now I'm living with a Netgate 8200
                          ... And sorry for my bad English...

                          • S
                            SwissSteph @SwissSteph
                            last edited by

                            230b80ad-c87a-48f3-92b6-afa60040f2ed-image.png

                            • GertjanG
                              Gertjan @SwissSteph
                              last edited by Gertjan

                              @swisssteph

                              You are forwarding : ok
                              and
                              using TLS - port 853 ?

                              Right ?

                              edit :
                              I am forwarding to these two over TLS - and most (not all) traffic actually goes over 2620:fe::fe and 2620:fe::9, the IPv6 counterparts of 9.9.9.9 and 149.112.112.112.
                              I did not do the ASLR patch .... I'm still waiting for it to fail 😢
                              As soon as I see it fail, I'll go patch, so I'll know what I don't want to see any more.

                              • S
                                SwissSteph @Gertjan
                                last edited by

                                @gertjan

                                YES

                                704a9b91-693f-4a84-a04a-73490fcc6c39-image.png

                                • GertjanG
                                  Gertjan @SwissSteph
                                  last edited by

                                  @swisssteph

                                  Close.
                                  You mean :

                                  cc795123-915a-45fc-abd3-fe12b38a423c-image.png

                                  The "SSL/TLS Listen Port" (your image) is the port unbound listens on at the LAN side, for DNS over TLS requests sent by the pfSense LAN clients, if you have any that can do that (Windows 10 was not capable of doing DNS over TLS; I guess Windows 11 can, but I didn't check).

                                  • S
                                    SwissSteph @Gertjan
                                    last edited by

                                    @gertjan Sorry

                                    16e4dc1b-336d-47fc-8d38-ac73fffdb0ad-image.png

                                    • N
                                      N0m0fud @Gertjan
                                      last edited by

                                      @gertjan Windows 11 after a certain version supports DoT and DoH

                                      • J
                                        JonH @stephenw10
                                        last edited by

                                        @stephenw10 The long waits to resolve have plagued me since upgrade to 23.01-Release with python mode & TLS. For the past week+ I've been using unbound/53 with no problems. I updated unbound as soon as I saw Chris's post. For past 2 days I've been back on python mode/853 and it's working well for me. Currently using localhost w/ fallback to dot1 & quad9. Hope this was the 'fix'.

                                        • RobbieTTR
                                          RobbieTT @stephenw10
                                          last edited by RobbieTT

                                          @stephenw10 said in Major DNS Bug 23.01 with Quad9 on SSL:

                                          I would love to see anyone who was hitting this issue repeatedly confirm the ASLR workaround here.

                                          I don't know the syntax to reverse the ASLR command - anyone?

                                          I did a crude but repeatable test - hammered a load of name servers, including my pfSense resolver which is pointing at Quad9 using DoT:

                                          Before the ASLR hack:

                                          1684002538158-2023-05-13-at-19.08.59-before.png

                                          After the ASLR hack:

                                          1684002587941-2023-05-13-at-19.16.20-after.png

                                          • Uncached minimums down from 34ms to 9ms
                                          • Uncached maximums down from 663ms to 392ms
                                          • Uncached average down from 103ms to 67ms
                                          • Uncached SD down from 159ms to 90ms

                                          What's not to like?

                                          ☕️

                                          [NB capturing the random 'pauses' and 'fail to loads' suffered (as described earlier) is much harder to represent]
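To put the before/after figures above in relative terms, here is a small arithmetic sketch. `pct_drop` is a hypothetical helper; the numbers are the ones quoted in the list above.

```shell
#!/bin/sh
# pct_drop BEFORE AFTER (hypothetical helper): print the percentage
# reduction from BEFORE to AFTER, rounded to a whole number.
pct_drop() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (before - after) / before * 100 }'
}

pct_drop 34 9      # uncached minimum, ms
pct_drop 663 392   # uncached maximum, ms
pct_drop 103 67    # uncached average, ms
pct_drop 159 90    # uncached standard deviation, ms
```

This prints 74, 41, 35 and 43, so the uncached figures dropped by roughly a third to three quarters in this single test run.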

                                          • jimpJ
                                            jimp Rebel Alliance Developer Netgate @RobbieTT
                                            last edited by

                                            @robbiett said in Major DNS Bug 23.01 with Quad9 on SSL:

                                            @stephenw10 said in Major DNS Bug 23.01 with Quad9 on SSL:

                                            I would love to see anyone who was hitting this issue repeatedly confirm the ASLR workaround here.

                                            I don't know the syntax to reverse the ASLR command - anyone?

                                            # elfctl /usr/local/sbin/unbound
                                            File '/usr/local/sbin/unbound' features:
                                            noaslr          'Disable ASLR' is unset.
                                            [...]
                                            # killall -9 unbound
                                            # elfctl -e +noaslr /usr/local/sbin/unbound
                                            # elfctl /usr/local/sbin/unbound
                                            File '/usr/local/sbin/unbound' features:
                                            noaslr          'Disable ASLR' is set.
                                            [...]
                                            # elfctl -e -noaslr /usr/local/sbin/unbound
                                            # elfctl /usr/local/sbin/unbound
                                            File '/usr/local/sbin/unbound' features:
                                            noaslr          'Disable ASLR' is unset.
                                            [...]
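The check step in the session above can be wrapped in a small filter that reduces the `elfctl` listing to a single word. This is a sketch only: `noaslr_state` is a hypothetical helper, and it assumes the listing keeps the format shown above (feature name first, "set." or "unset." last).

```shell
#!/bin/sh
# noaslr_state (hypothetical helper): read an elfctl feature listing on
# stdin and print "set", "unset", or "unknown" for the noaslr flag.
noaslr_state() {
    awk '$1 == "noaslr" { state = $NF; sub(/\.$/, "", state); print state; found = 1 }
         END { if (!found) print "unknown" }'
}
```

Usage: `elfctl /usr/local/sbin/unbound | noaslr_state`. Since the flag is stored in the binary itself, it is presumably lost whenever an update replaces that file, so it may be worth re-checking after upgrades.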
                                            

                                            Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                                            Need help fast? Netgate Global Support!

                                            Do not Chat/PM for help!

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.