Netgate Discussion Forum

    Major DNS Bug 23.01 with Quad9 on SSL

    185 Posts 27 Posters 151.4k Views
    • GertjanG
      Gertjan @stephenw10
      last edited by

      @stephenw10 said in Major DNS Bug 23.01 with Quad9 on SSL:

      level 4

      4f3f6ec1-db21-46fb-bf4e-68654c213535-image.png

      shows 'nothing' special.
      And what you can't see doesn't exist ;)
      I'm forwarding right now to 1.1.1.1 etc using 853 - see setup above.
      I even have DNSSEC activated + "Harden DNSSEC Data" (on the Resolver Advanced Settings page) .... because, as Clause Kellerman always says : "why not !!".
      I'll leave it like this for the weekend. I'll get back to this Monday morning.

      I'm banking, sending some mails, receiving a ton of mail, all my colleagues are also doing their toktok things, and no one has come complaining to me (they know where to look) to tell me that I have to stop "messing with the connection".

      No "help me" PM's please. Use the forum, the community will thank you.
      Edit : and where are the logs ??

      • M
        MoonKnight @Gertjan
        last edited by MoonKnight

        @gertjan
        Hi, I don't see those errors here. I activated log level 4.
        I use Cloudflare DNS servers. 1.1.1.1 and 1.0.0.1
        But I use another hostname for the TLS verification.

        39aee25c-2a8a-43f3-81e1-ee17fdb38fb3-image.png

        3fbb3faa-bccf-4549-9eda-0dfb43c513f8-image.png

        Sorry, I take it back. It was on Log level 3.

        49071888-e2cc-4f06-94df-24f428d1ab60-image.png

        --- 24.11 ---
        Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz
        Kingston DDR4 2666MHz 16GB ECC
        2 x HyperX Fury SSD 120GB (ZFS-mirror)
        2 x Intel i210 (ports)
        4 x Intel i350 (ports)

        • J
          Jimbohello @MoonKnight
          last edited by Jimbohello

          As far as I'm concerned, it doesn't matter whether you use Quad9, Google, or anything else: 23.01 has an outstanding bug with DoT in forwarding mode (DNSSEC unchecked). Whatever config you use, you will get « FAILED TO RESOLVE HOST » in DoT mode for every alias that has to resolve a dynamic DNS name (xxxx.dyndns.org).

          • J
            Jimbohello @Jimbohello
            last edited by Jimbohello

            Here is log level 3.

            From pfSense's own resolution, for ALIASES with dynamic DNS (xxxx.dyndns.org):
            Apr 7 21:33:30 unbound 88702 [88702:0] info: finishing processing for vrac-nicolas.dyndns.org.jimbohello.arpa. AAAA IN
            Apr 7 21:33:30 unbound 88702 [88702:0] info: query response was NXDOMAIN ANSWER
            Apr 7 21:33:30 unbound 88702 [88702:0] info: reply from <.> 1.1.1.1#853
            Apr 7 21:33:30 unbound 88702 [88702:0] info: response for vrac-nicolas.dyndns.org.jimbohello.arpa. AAAA IN
            Apr 7 21:33:30 unbound 88702 [88702:0] info: iterator operate: query vrac-nicolas.dyndns.org.jimbohello.arpa. AAAA IN
            Apr 7 21:33:30 unbound 88702 [88702:0] debug: iterator[module 0] operate: extstate:module_wait_reply event:module_event_reply

            From the client side (lan)

            Apr 7 21:38:42 unbound 88702 [88702:0] info: finishing processing for vrac-nicolas.dyndns.org. A IN
            Apr 7 21:38:42 unbound 88702 [88702:0] info: query response was ANSWER
            Apr 7 21:38:42 unbound 88702 [88702:0] info: reply from <.> 8.8.8.8#853
            Apr 7 21:38:42 unbound 88702 [88702:0] info: response for vrac-nicolas.dyndns.org. A IN
            Apr 7 21:38:42 unbound 88702 [88702:0] info: iterator operate: query vrac-nicolas.dyndns.org. A IN

            JESUS, I FOUND THE ISSUE, I GUESS:
            WHY IS PFSENSE ITSELF TRYING TO RESOLVE
            vrac-nicolas.dyndns.org.jimbohello.arpa
            when it is supposed to be vrac-nicolas.dyndns.org?

            pfSense is adding its own domain part! No wonder it can't resolve.
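A minimal sketch of how this observation could be checked mechanically, assuming the log line layout shown in the excerpts above ("info: response for NAME TYPE IN"). The function names and the regular expression are illustrative, not part of pfSense or Unbound.

```python
import re

# Sample lines copied from the log excerpts above.
LOG = """\
Apr 7 21:33:30 unbound 88702 [88702:0] info: response for vrac-nicolas.dyndns.org.jimbohello.arpa. AAAA IN
Apr 7 21:33:30 unbound 88702 [88702:0] info: query response was NXDOMAIN ANSWER
Apr 7 21:38:42 unbound 88702 [88702:0] info: response for vrac-nicolas.dyndns.org. A IN
Apr 7 21:38:42 unbound 88702 [88702:0] info: query response was ANSWER
"""

# Assumed layout: "info: response for <name> <type> IN" at the end of the line.
RESPONSE_RE = re.compile(r"info: response for (\S+) (\S+) IN$")


def queried_names(log_text):
    """Return (name, rrtype) for every 'response for' line in the log."""
    found = []
    for line in log_text.splitlines():
        match = RESPONSE_RE.search(line)
        if match:
            found.append(match.groups())
    return found


def has_suffix(name, domain):
    """True if `name` is `domain` itself or sits underneath it."""
    name = name.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    return name == domain or name.endswith("." + domain)


names = queried_names(LOG)
# Names that carry the firewall's own domain appended to them.
suffixed = [name for name, _ in names if has_suffix(name, "jimbohello.arpa")]
print(suffixed)
```

Run against a full resolver log, a list like `suffixed` would show which lookups had the system domain appended before they were sent upstream.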

            • GertjanG
              Gertjan @Jimbohello
              last edited by Gertjan

              @jimbohello said in Major DNS Bug 23.01 with Quad9 on SSL:

              vrac-nicolas.dyndns.org.jimbohello.arpa
              when it is supposed to be vrac-nicolas.dyndns.org?
              pfSense is adding its own domain part! No wonder it can't resolve.

              Your Windows PC is doing the same thing ...

              Have a look what 'nslookup' does :

              C:\Users\Gauche>nslookup
              Default Server:  pfSense.mydomain.tld
              Address:  2a01:cb19:beef:a6dc::1
              
              > set debug
              > google.com
              Server:  pfSense.mydomain.tld
              Address:  2a01:cb19:beef:a6dc::1
              
              ------------
              Got answer:
                  HEADER:
                      opcode = QUERY, id = 2, rcode = NXDOMAIN
                      header flags:  response, want recursion, recursion avail.
                      questions = 1,  answers = 0,  authority records = 1,  additional = 0
              
                  QUESTIONS:
                      google.com.mydomain.tld, type = A, class = IN
                  AUTHORITY RECORDS:
                  ->  mydomain.tld
                      ttl = 446 (7 mins 26 secs)
                      primary name server = ns1.mydomain.tld
                      responsible mail addr = postmaster.mydomain.tld
                      serial  = 2023020723
                      refresh = 14400 (4 hours)
                      retry   = 3600 (1 hour)
                      expire  = 1209600 (14 days)
                      default TTL = 10800 (3 hours)
              
              ------------
              ------------
              Got answer:
                  HEADER:
                      opcode = QUERY, id = 3, rcode = NXDOMAIN
                      header flags:  response, want recursion, recursion avail.
                      questions = 1,  answers = 0,  authority records = 1,  additional = 0
              
                  QUESTIONS:
                      google.com.mydomain.tld.net, type = AAAA, class = IN
                  AUTHORITY RECORDS:
                  ->  mydomain.tld
                      ttl = 446 (7 mins 26 secs)
                      primary name server = ns1.mydomain.tld
                      responsible mail addr = postmaster.mydomain.tld
                      serial  = 2023020723
                      refresh = 14400 (4 hours)
                      retry   = 3600 (1 hour)
                      expire  = 1209600 (14 days)
                      default TTL = 10800 (3 hours)
              
              ------------
              ------------
              Got answer:
                  HEADER:
                      opcode = QUERY, id = 4, rcode = NOERROR
                      header flags:  response, want recursion, recursion avail.
                      questions = 1,  answers = 1,  authority records = 0,  additional = 0
              
                  QUESTIONS:
                      google.com, type = A, class = IN
                  ANSWERS:
                  ->  google.com
                      internet address = 142.250.74.238
                      ttl = 30 (30 secs)
              
              ------------
              Non-authoritative answer:
              ------------
              Got answer:
                  HEADER:
                      opcode = QUERY, id = 5, rcode = NOERROR
                      header flags:  response, want recursion, recursion avail.
                      questions = 1,  answers = 1,  authority records = 0,  additional = 0
              
                  QUESTIONS:
                      google.com, type = AAAA, class = IN
                  ANSWERS:
                  ->  google.com
                      AAAA IPv6 address = 2a00:1450:4007:80c::200e
                      ttl = 30 (30 secs)
              
              ------------
              Name:    google.com
              Addresses:  2a00:1450:4007:80c::200e
                        142.250.74.238
              
              >
              

              You saw what happened ?
              I wanted details (fact checking) so I used 'set debug' first.

              Then it showed that when I look up a domain, it adds the local PC domain first, mydomain.tld.

              Because ..... we (me and you) are doing it wrong 😊

              When you want to do a DNS lookup, you have to ask :
              google.com.
              The final dot is important.

              It's not really an issue.

              If I wanted to look up the IP of my PC, called 'gauche2' :
              ( which is just the host name, not the FQDN !)
              nslookup again adds mydomain.tld., and this time it asks pfSense for
              gauche2.mydomain.tld

              and that is a 'good' question :
              I get an IPv4 and an IPv6 address, as nslookup asks for both by default.

              So, not really an error, and you could consider adding a final dot if the GUI accepts it (it does, I guess).
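A minimal sketch of the name expansion seen in the nslookup debug output above, assuming a single search domain and the order observed there (suffixed name first, bare name last). The function `candidate_names` is hypothetical; real stub resolvers differ in ordering and in rules such as `ndots`.

```python
def candidate_names(name, search_domains):
    """Return the fully qualified names a stub resolver would try, in order.

    Assumption: mirrors the order visible in the nslookup output above,
    where the search domain is appended first and the bare name is tried last.
    """
    # A trailing dot marks the name as already fully qualified: no expansion.
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search_domains]
    return expanded + [name + "."]


print(candidate_names("google.com", ["mydomain.tld"]))
print(candidate_names("google.com.", ["mydomain.tld"]))
print(candidate_names("gauche2", ["mydomain.tld"]))
```

The second call shows why the final dot matters: only the exact name is asked, so no query for a name under the local domain is ever generated.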

              Btw : when unbound receives "google.com.mydomain.tld." as the request, it knows that it is authoritative for "mydomain.tld.", so it isn't going to ask upstream for details about "mydomain.tld.". After all, unbound handles "mydomain.tld", and the upstream resolver doesn't know anything about local domains and resources (normally).

              I'm not going to ask 9.9.9.9 or 1.1.1.1 for FQDN info about the devices in my LAN; that's not logical.

              Btw : I have the Resolver/unbound "System Domain Local Zone Type" set to "Static", not to the (default?) "Transparent".
              When set to "Transparent", unbound will ask 9.9.9.9 to resolve "vrac-nicolas.dyndns.org.jimbohello.arpa.", which ... no surprise, will give no answer or an "NXDOMAIN", as this domain is unknown or "new ?" to 9.9.9.9.
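A simplified model of the difference described above, assuming the GUI setting maps to Unbound's `static` and `transparent` local-zone types. The function `decide` and the sample data are hypothetical, and record types are ignored; this is a sketch of the decision, not Unbound's implementation.

```python
def in_zone(name, zone):
    """True if `name` equals `zone` or is a subdomain of it."""
    name = name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)


def decide(name, zone, zone_type, local_names):
    """Say what the resolver does with `name` for one local zone.

    Simplification: only names are compared, record types are ignored.
    """
    if not in_zone(name, zone):
        return "forward"          # not a local name at all
    if name.rstrip(".").lower() in local_names:
        return "local answer"     # known host in the local zone
    if zone_type == "static":
        return "local NXDOMAIN"   # answered locally, never sent upstream
    if zone_type == "transparent":
        return "forward"          # unknown local name goes upstream
    raise ValueError(f"unsupported zone type: {zone_type}")


LOCAL = {"gauche2.mydomain.tld"}  # hypothetical local host data
for zone_type in ("static", "transparent"):
    print(zone_type, decide("google.com.mydomain.tld.", "mydomain.tld.", zone_type, LOCAL))
```

Under this model, only the transparent type lets a name such as "vrac-nicolas.dyndns.org.jimbohello.arpa." reach the upstream forwarder, which matches the NXDOMAIN reply from 1.1.1.1 in the log above.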

              edit : since yesterday, I'm doing the forward thing : DoT to :

              58815cb5-4a85-426f-b947-cc2085760d0e-image.png

              No issues whatsoever.


              • J
                joedan @joedan
                last edited by

                @joedan

                Well I am 95% sure I fixed my issue. Decided to switch dns over tls back on and after a couple of hours had the dreaded dns failures. This time I removed ntopng package completely. Ntopng has been monitoring both lan and wan since Nov 22 under 22.05 and 23.01 since release candidate.

                As soon as I removed ntopng, dns over tls through Cloudflare has been running ok for 24 hours. Browsing websites is super quick. My machine and bandwidth were never under stress however since removing ntopng it has drastically sped up overall speed to load a website and even the pfsense web interface itself. Pfblockerng is showing around 20k dns entries per hour which is normal load.

                • J
                  joedan @joedan
                  last edited by

                  @joedan

                  Never mind, I thought I had fixed it, but after a bit over 24 hours it happened again. I will just stay in Unbound resolver mode for now and leave it be. That seems to be stable and working at least.

                  • J
                    Jimbohello @Gertjan
                    last edited by

                    @gertjan

                    I've tried Static! None of the dyndns names in my aliases resolve.

                    Before, on 22.05, it was set to Transparent and had no issue.

                    I did a workaround.

                    Instead of regular network/host aliases I used « URL IP table aliases » with an update frequency of 1 day. Now it's working as expected!

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      To be clear you created a file with the dyndns FQDNs in it hosted locally and added that as the URL Table location?

                      • J
                        Jimbohello @stephenw10
                        last edited by Jimbohello

                        @stephenw10

                        Exactly

                        Aliases url ip table

                        Host on a web server

                        Http://server.com/mydnamicdns.txt

                        All my dynamic DNS names are in that file

                        DoT activated

                        All good ninja style

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Ok, good. I thought for a minute it was handling URL aliases incorrectly.

                          Well, that seems like a clue then. Why is it resolving those entries differently? 🤔

                          • J
                            Jimbohello @stephenw10
                            last edited by Jimbohello

                            @stephenw10
                            Hey, that's why I'm doing the debugging!
                            I'm not a pfSense engineer!
                            But I don't let myself give up until I've found a solution.
                            And for DoT activated with forwarding to remote DNS, that's the only solution I've found so far.
                            Hope it helps :)

                            Have a nice one!

                            I know that pfSense HOST/NETWORK aliases seem to use something called "dns filter".
                            Maybe when resolving from a "URL IP TABLE" it does it using nslookup or dig or something else!

                            • J
                              joedan @joedan
                              last edited by joedan

                              @joedan

                              I gave DNS over TLS another go after making two adjustments in my environment (under 23.01).

                              I unchecked Disable hardware checksum offload
                              I unchecked Enable the ALTQ support for hn NICs.

                              Not sure why I had the last option ticked, given I don't virtualise or use shaping. I use Intel igc / i225 on a dedicated Mini PC. Both these settings were on without issue in 22.05.

                              I ran some load testing on my machine and, funnily enough, this thing is now stable: the test actually completed.

                              348988 queries over 2853 seconds, at an average of about 120 queries a second: way, way more than I normally do. It actually finished, and I could WFH comfortably and browse websites whilst this was running. Prior to that, DNS stopped working after a couple of minutes.
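As a quick check of those numbers, the sketch below works out the query rate from the two figures in this post and compares it with the roughly 20k DNS entries per hour reported as normal load earlier in the thread. Treating that figure as exactly 20000 is an assumption.

```python
# Figures quoted in the load test post.
queries = 348_988
seconds = 2_853

# Normal load quoted earlier in the thread ("around 20k dns entries per hour").
normal_per_hour = 20_000  # assumed to be exactly 20000 for this estimate

test_qps = queries / seconds
normal_qps = normal_per_hour / 3600

print(f"load test:   {test_qps:.1f} queries/s")
print(f"normal load: {normal_qps:.1f} queries/s")
print(f"ratio:       about {test_qps / normal_qps:.0f}x normal")
```

So the test ran at a bit over 122 queries per second, on the order of twenty times the everyday rate.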

                              019f08b4-9443-4445-9385-e610cbf76888-image.png

                              I feel more confident I may have finally (fingers crossed) solved my specific config issue.

                              • GertjanG
                                Gertjan @joedan
                                last edited by Gertjan

                                @joedan
                                Thanks for the reminder : I completely forgot to reinstall my munin graphing for unbound. It's up and collecting as of now.

                                Nice graph btw !

                                I'm still forwarding to 1.1.1.1, or actually more to 2606:4700:4700::1111, using TLS.
                                pfBlockerng with some classic DNSBL, using python mode of course.

                                All seems fine to me.
                                The munin charts will give me some visual insights, and that is far better than the usual "DNS doesn't work".
                                Now that I think about it : the built-in Status > Monitoring should have some basic DNS activity monitoring.
                                And, because it's Friday : why not a flag on the pfSense dashboard : "You've broken DNS !" ? 😊


                                • stephenw10S
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  Hmm, that's interesting.

                                  The ALTQ for hn NICs setting does nothing if you don't have hn NICs.

                                  Re-enabling hardware checksum offload would do something. Only after a reboot though, I assume you did that?
                                  It's hard to see how that wouldn't affect a lot more than just DNS over TLS though.
                                  It would likely also be NIC specific too. Is it possible this only affects igc? That seems unlikely, but possible.

                                  • J
                                    joedan @stephenw10
                                    last edited by joedan

                                    @stephenw10

                                    Yes rebooted immediately after the change.

                                    I am the only one with access to pfSense, and I keep a detailed change log, snapshot and config backup for everything I modify. My last post talks about removing ntopng, which may just have taken some load off; however, DNS over TLS still eventually stopped working, always as the first symptom.

                                    In my load testing post before that, I did manage to break standard DNS forwarding once, but it was a lot harder to do and took several attempts. I didn't think much of it because of the huge DNS load, which seemed excessive anyway. Going back to standard resolving worked even better. When I loaded it up with DNS requests it wouldn't break and was rock solid, but things did on occasion slow down. Given the ridiculous amount of DNS requests it was generating, that seemed acceptable. I only have a small pipe (80mbit) to the internet and never had any other issues apart from DNS over TLS resolution on 23.01. Some other testing which I didn't post about was to change from Cloudflare to Quad9 to Google for DNS over TLS, but that made no difference. DNS over TLS would eventually stop with any upstream provider.

                                    My machine, RAM and SSD are completely oversized, running bare metal (specs in my post above), and never broke a sweat. I am just glad it's fixed for me and was thrilled to see DNS over TLS back on.

                                    I used the same input file for the DNS load tester which broke it last time; it was 25MB. When I observed the test finish without issues, I reran it twice, which just resulted in a lot of cached hits. I then got all of the parts from GitHub and had a 250MB monster. Even this couldn't break it. DNS over TLS has been rock solid since.

                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      Yup, glad it seems good for you and all info is good. 👍

                                      • E
                                        Enhance2736
                                        last edited by

                                        Late to the party here guys. I am experiencing DNS resolution issues, specifically using Quad9 with DoT enabled, sporadically throughout the day. If I disable DoT everything works fine. Or if I keep DoT enabled and switch to Cloudflare, then it works throughout the day with no issues.

                                        Running netgate 6100Max, using pfBlockerng with DNSBL and unbound resolver in python mode.

                                        Disable hardware checksum offload was already unchecked and I unchecked Enable the ALTQ support for hn NICs and rebooted.

                                        Hope this helps.

                                        • stephenw10S
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          Are you using one of the igc ports as WAN?

                                          • E
                                            Enhance2736 @stephenw10
                                            last edited by

                                            @stephenw10 Yes sir, igc3. I needed 2.5G configured on WAN so I can reuse the 10G ix0 for LAN.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.