Netgate Discussion Forum

Major DNS Bug 23.01 with Quad9 on SSL

Scheduled Pinned Locked Moved General pfSense Questions
185 Posts 27 Posters 163.8k Views
  • J
    joedan @joedan
    last edited by joedan May 16, 2023, 10:24 PM May 11, 2023, 9:51 PM

    @joedan

    After 24 hours, here are some unbound stats observed after turning off ASLR as per the note above. My environment has been stable during the last 10 days, this being the only change, made 24 hours ago. I WFH 8-10 hours a week and my DNS calls have been consistent over this period (running DNS over TLS to Cloudflare).

    6a611e6f-a622-4237-8ea6-0193bb3f8cd5-image.png

    10-15ms average recursion time improvement since the change

    8c1bd27c-d032-4697-bb5c-dbedbc1fda25-image.png

    DNS queries have remained fairly consistent

    d171cb89-2ab7-4c86-925e-7409c3103e04-image.png

    This observation sticks out though. I'm not technical enough to pretend to know what's going on here, but 'TCP out' has tightened up. All other metrics measured (which are hidden) are identical.

    4f1654bd-546d-4b2f-929d-fe663bdbbec3-image.png

    Several days later, the data shows a consistent new trend since disabling ASLR...

    8c1dcfe1-b000-4ffe-a68f-da75daf9b92f-image.png
    2079de11-347b-4f86-822c-af2755e26a30-image.png

    • J
      JonH @joedan
      last edited by May 11, 2023, 10:41 PM

      @joedan Where do I find that graphing?

      • J
        joedan
        last edited by joedan May 11, 2023, 11:21 PM May 11, 2023, 11:20 PM

        @jonh

        I use Grafana / InfluxDB.
        I'm not a Linux person, so I use a downloaded, pre-made Home Assistant virtual machine on Windows 11 Pro (Hyper-V). The Grafana and InfluxDB add-ons were a very simple click to install and run.

        I use the pfSense Telegraf package using custom config for Unbound stats reporting documented here..
        https://github.com/VictorRobellini/pfSense-Dashboard

        The Grafana dashboard is here..
        https://grafana.com/grafana/dashboards/6128-unbound/
        Victor doesn't appear to have one for unbound but I also use his dashboard for other stats (from his Github page).

        I didn't have to code anything, just follow the bouncing ball on various sites to set things up.

        • J
          JonH @joedan
          last edited by May 11, 2023, 11:22 PM

          @joedan Thanks, I'll check it out

          • G
            Gertjan @joedan
            last edited by May 12, 2023, 9:54 AM

            @joedan said in Major DNS Bug 23.01 with Quad9 on SSL:

            Like the subject of the thread :

            490442f2-5e32-44dd-8063-58c7433a8a5b-image.png

            but arguably the same issue : 1.1.1.1 or 9.9.9.9, "what is the difference ?", I'm forwarding just to test 'if it works, or not'.
            Up until today, I didn't find any issues.

            Note that I'm still using

            700aaa28-6470-455b-b3c8-bb15bd5e2608-image.png

            as I presume that error conditions would get logged, if they arrive.
            The last log line from unbound tells me that it started a couple of days ago :

            dc244d62-568b-4b23-9566-7a518425233b-image.png

            I'm going to restart unbound now and disable address space layout randomization (ASLR), although I just can't wrap my head around this workaround: why would the position in (virtually mapped) memory matter?
            ASLR is used in every modern OS these days.
            It's an extra layer of obscurity without any cost or negative side effects, and, as far as I know, it only makes the life of a hacker more difficult. Attack vectors that rely on stack or memory (aka buffer) overruns become much harder, as the process uses a different memory layout every time it starts.

            Btw: this is what I think. I admit I don't know shit about this ASLR executable option, and was only vaguely aware of the concept.

            I also think, or thought, that a coder who writes programs doesn't need to be aware of 'where' the code, data and other segments are placed in memory. We have all been writing relocatable code for decades without being aware of it, as the compiler and linker take care of all these things.
            The unbound issue was first marked as a FreeBSD bug, and they, FreeBSD, said: go ask the unbound author. See post above.
            Disabling ASLR is just a stop-gap. (edit: if this is even related to this bug, issue ... we'll see)
            IMHO, the real issue is somewhere between unbound and one of its linked libraries, "libcrypto.so.111" and "libssl.so.111", as I presume that the issue arises when forwarding over TLS is used.

            The default unbound mode, resolving, doesn't use TLS, so, for me, that explains why the resolver works fine while resolving.

            Anyway, not a pfSense issue, more an unbound issue or even further away: the way all this interoperates.
            The good news: it's still an issue for Netgate, and as they are very FreeBSD aware, they will find out what the real issue is.

            [ end of me thinking out loud ]

            No "help me" PM's please. Use the forum, the community will thank you.
            Edit : and where are the logs ??

            • S
              stephenw10 Netgate Administrator
              last edited by May 12, 2023, 11:53 AM

              I would love to see anyone who was hitting this issue repeatedly confirm the ASLR workaround here.

              • S
                SwissSteph @stephenw10
                last edited by May 12, 2023, 12:06 PM

                @stephenw10
                I'm testing right now and for the moment it's "OK" .... I just put back my DNS settings like on my 22.05 version (which was working without any problem)

                5bd68f2f-86bd-4fa5-9835-b895cfebdfae-image.png

                I started with two "no-name" pfsense, one for use at home and the other as a backup in case of problems (which can happen when you're new to pfsense).
                ... And now I'm living with a Netgate 8200
                ... And sorry for my bad English...

                • S
                  SwissSteph @SwissSteph
                  last edited by May 12, 2023, 12:09 PM

                  230b80ad-c87a-48f3-92b6-afa60040f2ed-image.png

                  • G
                    Gertjan @SwissSteph
                    last edited by Gertjan May 12, 2023, 12:49 PM May 12, 2023, 12:45 PM

                    @swisssteph

                    You are forwarding : ok
                    and
                    using TLS - port 853 ?

                    Right ?

                    edit :
                    I am forwarding to these two over TLS - and most (not all) traffic actually goes over 2620:fe::fe and
                    2620:fe::9, the IPv6 counterparts of 9.9.9.9 and 149.112.112.112.
                    I did not do the ASLR patch .... I'm still waiting for it to fail 😢
                    As soon as I see it fail, I'll go patch, so I'll know what I don't want to see any more.

                    • S
                      SwissSteph @Gertjan
                      last edited by May 12, 2023, 12:48 PM

                      @gertjan

                      YES

                      704a9b91-693f-4a84-a04a-73490fcc6c39-image.png

                      • G
                        Gertjan @SwissSteph
                        last edited by May 12, 2023, 12:53 PM

                        @swisssteph

                        Close.
                        You mean :

                        cc795123-915a-45fc-abd3-fe12b38a423c-image.png

                        The "SSL/TLS Listen Port" (your image) is the port unbound uses on the LAN side: it listens on that port for DNS requests sent by the pfSense LAN clients (if you have any that can do so; Windows 10 was not capable of doing DNS over TLS, and I guess Windows 11 can do it - didn't check).

                        • S
                          SwissSteph @Gertjan
                          last edited by May 12, 2023, 1:00 PM

                          @gertjan Sorry

                          16e4dc1b-336d-47fc-8d38-ac73fffdb0ad-image.png

                          • N
                            N0m0fud @Gertjan
                            last edited by May 12, 2023, 1:51 PM

                            @gertjan Windows 11, after a certain version, supports DoT and DoH

                            • J
                              JonH @stephenw10
                              last edited by May 12, 2023, 9:19 PM

                              @stephenw10 The long waits to resolve have plagued me since upgrade to 23.01-Release with python mode & TLS. For the past week+ I've been using unbound/53 with no problems. I updated unbound as soon as I saw Chris's post. For past 2 days I've been back on python mode/853 and it's working well for me. Currently using localhost w/ fallback to dot1 & quad9. Hope this was the 'fix'.

                              • R
                                RobbieTT @stephenw10
                                last edited by RobbieTT May 14, 2023, 11:21 AM May 13, 2023, 6:38 PM

                                @stephenw10 said in Major DNS Bug 23.01 with Quad9 on SSL:

                                I would love to see anyone who was hitting this issue repeatedly confirm the ASLR workaround here.

                                I don't know the syntax to reverse the ASLR command - anyone?

                                I did a crude but repeatable test - hammered a load of name servers, including my pfSense resolver which is pointing at Quad9 using DoT:

                                Before the ASLR hack:

                                1684002538158-2023-05-13-at-19.08.59-before.png

                                After the ASLR hack:

                                1684002587941-2023-05-13-at-19.16.20-after.png

                                • Uncached minimums down from 34ms to 9ms
                                • Uncached maximums down from 663ms to 392ms
                                • Uncached average down from 103ms to 67ms
                                • Uncached SD down from 159ms to 90ms
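The before/after figures above can be expressed as relative improvements with a little arithmetic. The following is a minimal Python sketch, not the tool used for the screenshots: `pct_reduction` and `summarize` are hypothetical helper names, and `samples_ms` is made-up data that only illustrates how min, max, average and SD could be derived from raw timings. Only the before/after pairs come from the post, and the benchmark tool may define SD differently (sample vs. population).

```python
import statistics

# Before/after figures in milliseconds, as quoted in the post above.
REPORTED_MS = {
    "min": (34, 9),
    "max": (663, 392),
    "average": (103, 67),
    "sd": (159, 90),
}


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def summarize(samples_ms):
    """Min, max, mean and sample standard deviation of raw timings."""
    return {
        "min": min(samples_ms),
        "max": max(samples_ms),
        "average": statistics.mean(samples_ms),
        # Sample SD (n - 1); an assumption, the original tool may differ.
        "sd": statistics.stdev(samples_ms),
    }


for name, (before, after) in REPORTED_MS.items():
    change = pct_reduction(before, after)
    print(f"uncached {name}: {before}ms -> {after}ms ({change:.1f}% lower)")

# Hypothetical raw timings, for illustration only.
samples_ms = [10, 20, 30]
print(summarize(samples_ms))
```

For the posted numbers this works out to reductions of roughly 74% (minimum), 41% (maximum), 35% (average) and 43% (SD).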

                                What's not to like?

                                ☕️

                                [NB capturing the random 'pauses' and 'fail to loads' suffered (as described earlier) is much harder to represent]

                                • J
                                  jimp Rebel Alliance Developer Netgate @RobbieTT
                                  last edited by May 13, 2023, 6:46 PM

                                  @robbiett said in Major DNS Bug 23.01 with Quad9 on SSL:

                                  @stephenw10 said in Major DNS Bug 23.01 with Quad9 on SSL:

                                  I would love to see anyone who was hitting this issue repeatedly confirm the ASLR workaround here.

                                  I don't know the syntax to reverse the ASLR command - anyone?

                                  # elfctl /usr/local/sbin/unbound
                                  File '/usr/local/sbin/unbound' features:
                                  noaslr          'Disable ASLR' is unset.
                                  [...]
                                  # killall -9 unbound
                                  # elfctl -e +noaslr /usr/local/sbin/unbound
                                  # elfctl /usr/local/sbin/unbound
                                  File '/usr/local/sbin/unbound' features:
                                  noaslr          'Disable ASLR' is set.
                                  [...]
                                  # elfctl -e -noaslr /usr/local/sbin/unbound
                                  # elfctl /usr/local/sbin/unbound
                                  File '/usr/local/sbin/unbound' features:
                                  noaslr          'Disable ASLR' is unset.
                                  [...]
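
When comparing results across several boxes, it can help to check the flag from a script instead of reading the output by eye. The following is a minimal Python sketch that only parses text in the shape shown in the transcript above; `noaslr_state` is a hypothetical helper name, and the regular expression assumes the status line is worded exactly as printed there.

```python
import re
from typing import Optional

# Matches the status line printed by `elfctl <file>` in the transcript above.
# Assumption: the wording is exactly "'Disable ASLR' is set." / "... is unset."
_NOASLR_LINE = re.compile(r"^\s*noaslr\s+'Disable ASLR' is (set|unset)\.\s*$")


def noaslr_state(elfctl_output: str) -> Optional[str]:
    """Return 'set' or 'unset', or None if no noaslr line is present."""
    for line in elfctl_output.splitlines():
        match = _NOASLR_LINE.match(line)
        if match:
            return match.group(1)
    return None


# Output copied from the transcript above (the elided "[...]" part is omitted).
SAMPLE_SET = """File '/usr/local/sbin/unbound' features:
noaslr          'Disable ASLR' is set.
"""
SAMPLE_UNSET = """File '/usr/local/sbin/unbound' features:
noaslr          'Disable ASLR' is unset.
"""

print(noaslr_state(SAMPLE_SET))    # set
print(noaslr_state(SAMPLE_UNSET))  # unset
```

On the firewall itself the input text would come from running `elfctl /usr/local/sbin/unbound`; the sketch deliberately does not run the command, so it can be tried on any machine.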
                                  

                                  • R
                                    RobbieTT @jimp
                                    last edited by May 14, 2023, 7:57 AM

                                    @jimp
                                    Thanks Jim 👍

                                    • R
                                      RobbieTT @stephenw10
                                      last edited by RobbieTT May 14, 2023, 12:35 PM May 14, 2023, 10:05 AM

                                      @stephenw10

                                      I should probably add that even with the ASLR unset I still get weird looking results when I attempt an individual DNS Lookup on a domain name that I know hasn't been cached:

                                       2023-05-14 at 10.43.36.png

                                      If I understand the pfSense diagnostics screen correctly, when the internal DNS resolver has to forward a query I would expect it to answer in about the same time as the fastest responding name server (2620:fe::fe at 7ms in this example) plus the almost negligible delay from checking the cache. Yet it actually takes a snooze-worthy 168ms.

                                      Why does the DNS resolver take 168ms for a simple forwarded (uncached) query when the forwarder itself gets an answer from an upstream provider in just 7ms? In other words, it is around 24 times slower than expected.

                                      ☕️

                                      • M
                                        MoonKnight @RobbieTT
                                        last edited by MoonKnight May 14, 2023, 5:28 PM May 14, 2023, 5:22 PM

                                        @robbiett

                                        Have been wondering about the same for some time now. It doesn't make sense

                                        733a0b99-efe9-4aed-b945-26c89e5a7e89-image.png

                                        And if you do the same lookup just seconds after the first one, "The query time" is 0.
                                        Wait 1 minute and it is back to 60 msec.

                                        I have had this behavior since 23.01, and maybe on 22.05 as well.

                                        --- 24.11 ---
                                        Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz
                                        Kingston DDR4 2666MHz 16GB ECC
                                        2 x HyperX Fury SSD 120GB (ZFS-mirror)
                                        2 x Intel i210 (ports)
                                        4 x Intel i350 (ports)

                                        • R
                                          RobbieTT @MoonKnight
                                          last edited by May 14, 2023, 5:37 PM

                                          @moonknight said in Major DNS Bug 23.01 with Quad9 on SSL:

                                          @robbiett
                                          And if you do the same lookup just seconds after first time "The query time" is on 0.
                                          Wait 1 minute then back to 60 msec.

                                          I don't suffer the second part of your observation. Once my query is cached it stays cached until it is removed or reset - it obeys the settings I have given it.

                                          If you stop the resolver for a moment and run the command:

                                          unbound-control -c /var/unbound/unbound.conf dump_cache

                                          ...you can poke around and see what is in your cache.
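
Poking around a large dump by hand gets tedious, so a small script can summarize it. The following is a minimal Python sketch that counts cached records per type. It assumes the dump has been saved as text and that the RRset section is delimited by `START_RRSET_CACHE` / `END_RRSET_CACHE` with one `owner TTL IN type rdata` record per line and `;`-prefixed comment lines; check this against a real dump before relying on it. `count_rr_types` is a hypothetical helper name and `SAMPLE_DUMP` is made-up data using documentation addresses.

```python
from collections import Counter


def count_rr_types(dump_text: str) -> Counter:
    """Count records per RR type inside the RRset section of a cache dump."""
    counts = Counter()
    in_rrset_section = False
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line == "START_RRSET_CACHE":
            in_rrset_section = True
            continue
        if line == "END_RRSET_CACHE":
            break
        # Skip anything outside the section, blank lines and ';' comments.
        if not in_rrset_section or not line or line.startswith(";"):
            continue
        fields = line.split()
        # Assumed layout: owner, TTL, class, type, rdata...
        if len(fields) >= 5 and fields[2] == "IN":
            counts[fields[3]] += 1
    return counts


# Made-up example in the assumed layout, for illustration only.
SAMPLE_DUMP = """START_RRSET_CACHE
;rrset 3544 1 0 8 3
example.com.\t3544\tIN\tA\t192.0.2.10
;rrset 86344 2 0 2 3
example.com.\t86344\tIN\tNS\tns1.example.com.
example.com.\t86344\tIN\tNS\tns2.example.com.
END_RRSET_CACHE
START_MSG_CACHE
END_MSG_CACHE
EOF
"""

for rr_type, count in sorted(count_rr_types(SAMPLE_DUMP).items()):
    print(rr_type, count)
```

To use it on real data, the output of the command above could be redirected to a file and its contents passed to `count_rr_types`.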

                                          ☕️
