Netgate Discussion Forum

    Random DNS Resolver failure with Quad9 over SSL

    DHCP and DNS
    • D
      digitalgimpus

      I've got the DNS Resolver set up to use Quad9 over SSL.

      Seemingly at random (every few hours to days), it stops resolving until DNS is restarted in pfSense. Quad9 connectivity seems fine (I've got Uptime Kuma monitoring it).

      I turned logging up to be a little more verbose; the only thing that even remotely stood out was:

      Feb 26 11:50:12	unbound	12508	[12508:0] error: SSL_handshake syscall: Broken pipe
      Feb 26 11:50:12	unbound	12508	[12508:0] error: SSL_handshake syscall: Broken pipe
      

      Anyone have thoughts as to what could be going on?

      Nothing in this config is particularly unique: listening on LAN and all VLANs, outgoing interface is exclusively WAN, Python module enabled, pfBlockerNG enabled, DNSSEC disabled.

      • keyserK
        keyser Rebel Alliance @digitalgimpus

        @digitalgimpus I bumped into that issue as well, and it became too much of a burden for the clients, so I had to revert to regular forwarding. Never did find the reason, so I’ll follow this thread :-)

        Love the no fuss of using the official appliances :-)

        • GertjanG
          Gertjan

          SSL (TLS) connections are way more 'complicated' than classic UDP packets.
          Sending a small DNS packet (a couple of hundred bytes ?) costs some CPU resources.
          Use TLS, and the DNS traffic moves into an encrypted channel ; the CPU load goes up "a 1000 times" (probably way more).
          And not only that : a TLS connection needs to be set up. There will be a certificate exchange, some negotiation about which cipher type to encrypt with, etc. For this to happen, your device needs a random number generator. I'll get back to that.
          These TCP-based connections can't be closed as soon as the DNS request is done. Rebuilding one takes way too many resources on both sides, so the engine is kept running, in idle mode for sure, but it stays up.
          Btw : maybe our routers could handle the constant TLS re-creation, but Quad9 would take the full hit from the millions of requests that come in. So, again : the connection stays active.

          Why do semi-permanent connections break ?
          The WhatsApp application doesn't throw messages on the screen saying that it had to re-establish a connection to 'home', and neither do our Gmail apps. Both also use this semi-permanent link, so the server can signal the presence of a new message immediately, and no constant client polling is needed. In case of an IP change or any other event that breaks the link, the connection is rebuilt, and both sides take the penalty hit.

          For everything up to this point, I invite everybody to look up for themselves how TLS connections are built, maintained, etc.

          From now on, I have to presume.
          I presume that we can all accept that Quad9 has a financial business model.
          They have, after all, a pretty big hardware setup, with interlinks between the several sites they use, and several uplinks to pay for so they have a good working connection to the Internet itself.
          I also presume they pay the people that work for them.
          Solar or not, they pay for their electricity.
          And so on.
          If someone is motivated enough to check : it's a company, so their "goals" are probably mentioned somewhere.
          I presume .... that their main goal is not simply : "deliver a DNS solution for planet earth."
          Bills have to be settled.

          All this boils down to : I believe that they are picky about what to accept and what to refuse. I'm talking about incoming connections, the ones that cost 'a lot' when they have to be reconstructed every time. That part of the connection is expensive.
          What I think, because that's what I would do : if a device (and they know who you are ^^) is reconnecting again and again and again, they will filter you (for an unknown period).
          If the device was using an app, and frequent reconnects happen, they could inform the app author (aka Gmail, Facebook etc) and tell them : your app misbehaves. Do something, or "pay the price". As both sides have something to gain here, issues are handled without the end user being aware of anything.
          With unbound (our pfSense) : they can do that. They know the client side isn't a user device, but a dumb DNS forwarder.
          So they will apply filters ... to protect themselves.

          Normally, all this 'knowledge' isn't something the consumer should be worried about, they don't know sh*t about DNS, TLS, TCP. And of course, it must be 'free' of charge, and so on.

          The random numbers : our TLS stream needs good random numbers, as without them our TLS becomes less random. We all saw the two "Enigma" movies a couple of years ago : it was unbreakable (with the solutions possible back then), and today we all decode Enigma with paper, a pencil, a spreadsheet app and 30 minutes of our time.
          Enigma was broken, and one reason was : it was less random than was presumed. (The nasty weak factor was : it was used by humans, and they tend to repeat the same words over and over ...)
          So : a good random number generator is very much needed.
          Random numbers can be read from a stream (pipe), and this stream is not infinite .... If too much randomness is asked for, the system 'stalls' or starts to give less random numbers, which will impact security.

          Btw : Still strange that unbound doesn't log messages if it can't connect to the upstream DNS resolver over TLS anymore .... Has anyone looked or asked on the unbound support forum ? This issue probably isn't pfSense related but more 'unbound' (nlnetlabs.nl - the author).
          What shows up when you crank the debug level to the max, and then look for 'connection' (TLS) errors ?
          If my throttle (filter) theory is correct, this will not create any messages, just a very very slow connection - like in the good old days.
          I've been using 9.9.9.9 over TLS for a couple of months in the past (pfSense 24.11, 24.03 or the 23.xx version ?) and found no issues ... but I didn't look for issues either.

          Also : keep in mind. I'm member of the "I can resolve - so I resolve" cult. My opinion is somewhat biased, I admit.
          The main reason is : I'm a fan of DNSSEC.
          After more than two decades of waiting, we can do our own local 'DNS'. So that's now the default pfSense operating mode. This wasn't some random choice by Netgate - these guys know what is best ^^ But pfSense can be set up differently if needed. After all, someone has to stand up to help Quad9. (and 8.8.8.8 and 1.1.1.1, etc etc)

          edit : sorry for the ramble.

          No "help me" PM's please. Use the forum, the community will thank you.
          Edit : and where are the logs ??

          • D
            digitalgimpus @Gertjan

            @Gertjan Your theory breaks down when restarting unbound fixes the issue for another day or more. If they were selectively blocking, a restart wouldn't fix the issue. If they were disconnecting, Unbound would reconnect automatically.

            • GertjanG
              Gertjan @digitalgimpus

              @digitalgimpus

              I agree.
              Restarting unbound wouldn't change the WAN IP.
              The connection gets restarted, a new (TCP) TLS stream gets negotiated, and that one seems to work.
              If the previous TCP was really bad, then a TCP timeout would occur, which implies a re-connection. But that doesn't seem to happen ... so it was connected and kept 'alive', but very slow ?

              Like you, I'd like to be able to point my finger at the source of the issue.
              I presume that (from more to less) :
              Quad9 has lots of clients.
              Not all, but still a lot of them, use DNS over TLS and not the default UDP port 53 access.
              There are pfSense users that use Quad9 with TLS ; I'm pretty sure you are not the only one.
              We all use the same (bit by bit) unbound program **

              Only our pfSense (unbound, interface) settings - and your ISP differ.

              So, what is different between 'you' and 'me' ?

              If possible, what happens when you use the classic non-TLS connection ?

              Your pfSense version ?

              If unbound (recent version) couldn't work with Quad9-TLS, we would have found that with one simple Google search ...

              ** With one difference : ARM or x86-64 code. If you use an Intel processor, we both have the same binary.

              • bmeeksB
                bmeeks @Gertjan

                @Gertjan said in Random DNS Resolver failure with Quad9 over SSL:

                We all use the same (bit by bit) unbound program **

                This is not always true. It depends on which version of pfSense someone is running. The unbound daemon in recent pfSense Plus updates is much more current than the daemon running in 2.7.2 CE. There were bugs in the older unbound versions that were addressed upstream.

                One thing I've noticed here is posters seldom include their pfSense version (and thus the unbound or DNS Resolver version they are running). What works fine for someone using the DNS Resolver in pfSense Plus 24.11 might indeed be problematic under some conditions for a user running 2.7.2 CE.

                My own impression by following this sub-forum is that many, many users shoot themselves in the foot by monkeying with the default DNS settings in pfSense. Unless you truly understand how DNS operates (the difference between resolvers and forwarders, how forwarding really works, what the root servers are, etc.) you should never monkey with the DNS settings on an out-of-the-box pfSense install. If you don't change a single thing it will just work. Unless you are a DNS Jedi master, once you start monkeying with forwarding, TLS, DNSSEC, etc., you can expect issues.

                • S
                  SteveITS Galactic Empire @bmeeks

                  @bmeeks said in Random DNS Resolver failure with Quad9 over SSL:

                  more current than the daemon running in

                  Your post rang a bell...there was actually this, but it was fixed in 2.7.0/23.09:
                  https://redmine.pfsense.org/issues/14056

                  FWIW I've been using SSL at home to Quad9 without noticeable issues on 24.03 and now 24.11.

                  Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                  When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                  Upvote 👍 helpful posts!

                  • bmeeksB
                    bmeeks @SteveITS

                    @SteveITS said in Random DNS Resolver failure with Quad9 over SSL:

                    @bmeeks said in Random DNS Resolver failure with Quad9 over SSL:

                    more current than the daemon running in

                    Your post rang a bell...there was actually this, but it was fixed in 2.7.0/23.09:
                    https://redmine.pfsense.org/issues/14056

                    FWIW I've been using SSL at home to Quad9 without noticeable issues on 24.03 and now 24.11.

                    Yeah, I didn't go back and research the history, but I do recall some early bugs in the DNS Resolver that got fixed with later releases. My comment was intended to remind folks there are multiple versions of some of these "problematic" binaries out there (unbound and kea), and something that works just fine on a recent 24.11 Plus installation may indeed behave differently on an older version.

                    • GertjanG
                      Gertjan @bmeeks

                      @bmeeks said in Random DNS Resolver failure with Quad9 over SSL:

                      got fixed with later releases

                      😵 wait .. no one here is using older, buggy versions, right ?

                      I presume @digitalgimpus uses 2.7.2. I used 2.7.2 for half a year or so, and did some Quad9 testing. Worked just fine IIRC, and I'm using pfSense with a load of hotel clients behind it, so if there was an issue they would have told me about it. After all, if a free service doesn't work, that would be considered inadmissible (here in Europe) ; they would have asked for a refund immediately.

                      • D
                        digitalgimpus

                        So here's how I have things configured, I don't think there's anything particularly unique going on here.

                        I've tried just the IPv4 hosts, I've tried IPv6 only, I've tried both.

                        sc1.png

                        sc2.png

                        • N
                          nattygreg

                          Unbound does time out from time to time; use Service Watchdog to restart it. You will not see any more glitches.

                          • D
                            digitalgimpus @nattygreg

                            @nattygreg I have watchdog running. It sees the process as still up when this happens.

                            There are other times where unbound crashes or whatever and watchdog restarts successfully. If unbound was crashing it would be less problematic. The problem here is a prolonged dns outage because watchdog doesn't know to do anything.
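
A process-level check like Service Watchdog only sees that unbound is running, which is the gap described above. A minimal sketch of a functional check instead, assuming the resolver answers plain UDP queries on port 53 on the LAN side: send a real query and treat a timeout, a SERVFAIL, or an empty answer as a failure. The function names, the 192.168.1.1 address and the test hostname are placeholders for illustration, not anything provided by pfSense or unbound.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A), QCLASS = 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def is_healthy_reply(reply: bytes, query_id: int = 0x1234) -> bool:
    """True if the reply matches our query, has RCODE 0 and at least one answer."""
    if len(reply) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F  # 0 = NOERROR, 2 = SERVFAIL
    return rid == query_id and is_response and rcode == 0 and ancount > 0


def resolver_answers(server: str, name: str, timeout: float = 3.0) -> bool:
    """Send one query to the resolver; any timeout or socket error counts as failure."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return False
    return is_healthy_reply(reply)
```

Something like `resolver_answers("192.168.1.1", "example.com")` could then be run from cron or a monitoring tool. A name that is unlikely to be cached is the better probe, since a cached name can still resolve while forwarding is broken.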

                            • GertjanG
                              Gertjan @digitalgimpus

                              @digitalgimpus

                              What is your pfSense version ?

                              Then here is a solution : do not do this (uncheck !) :

                              344d8d86-e738-4116-89f2-5e0e040600c9-image.png

                              as every time a lease comes in on any LAN type interface, unbound gets restarted.
                              If you have many LAN devices, and/or devices that are using Wi-Fi, this can create an "x times a minute" unbound restart, and as unbound needs anything from 5 to xx seconds to restart, you will get the impression unbound isn't running at all. And correct, it's restarting all the time.
                              And the moment it is running, the service watchdog finds it wasn't running a second ago, and adds its dose of restarts.
                              This issue is well known and very old, and a solution will be coming very soon now : pfSense 2.8.0 (pfSense Plus already has the solution).

                              • M
                                michmoor LAYER 8 Rebel Alliance @keyser

                                @keyser I agree with you here, and I'm glad I'm not the only one. When I use Quad9, DNS breaks. No rhyme or reason. It can be a week or two after I make the change. All external DNS resolution fails. At first it was weird: certain sites wouldn't resolve, such as anything related to Microsoft. OK, strange, but then after some time everything external stopped resolving. This is only with Quad9.

                                I have since switched to Cloudflare - no issues.

                                Firewall: NetGate,Palo Alto-VM,Juniper SRX
                                Routing: Juniper, Arista, Cisco
                                Switching: Juniper, Arista, Cisco
                                Wireless: Unifi, Aruba IAP
                                JNCIP,CCNP Enterprise

                                • D
                                  digitalgimpus @Gertjan

                                  @Gertjan said in Random DNS Resolver failure with Quad9 over SSL:

                                  as every time a lease comes in on any LAN type interface, unbound gets restarted.
                                  If you have many LAN devices, and/or devices that are using Wi-Fi, this can create an "x times a minute" unbound restart, and as unbound needs anything from 5 to xx seconds to restart, you will get the impression unbound isn't running at all. And correct, it's restarting all the time.
                                  And the moment it is running, the service watchdog finds it wasn't running a second ago, and adds its dose of restarts.
                                  This issue is well known and very old, and a solution will be coming very soon now : pfSense 2.8.0 (pfSense Plus already has the solution).

                                  I don't think this is the problem here.

                                  If I use my ISP's DNS, or Google or CloudFlare, this isn't an issue. Only Quad9 requires a manual restart.

                                  If this was the culprit, it should happen with all upstream providers regardless.

                                  • GertjanG
                                    Gertjan @digitalgimpus

                                    @digitalgimpus said in Random DNS Resolver failure with Quad9 over SSL:

                                    I don't think this is the problem here

                                    No need to think ^^ Fact check.

                                    [25.03-BETA][root@pfSense.bhf.tld]/root: grep "start" /var/log/resolver.log
                                    ....
                                    <30>1 2025-02-26T09:46:43.449085+01:00 pfSense.bhf.tld unbound 22263 - - [22263:0] info: start of service (unbound 1.22.0).
                                    <30>1 2025-02-26T10:02:58.437287+01:00 pfSense.bhf.tld unbound 44152 - - [44152:0] info: start of service (unbound 1.22.0).
                                    <30>1 2025-02-26T15:19:51.097535+01:00 pfSense.bhf.tld unbound 10684 - - [10684:0] info: start of service (unbound 1.22.0).
                                    <30>1 2025-03-03T00:15:23.627116+01:00 pfSense.bhf.tld unbound 65579 - - [65579:0] info: start of service (unbound 1.22.0).
                                    

                                    If your resolver (unbound) restarts a couple of times a day, you'll be ok.
                                    Several times per hour or even more : that's less optimal, or plain bad. Read again what has been said above ...
                                    That said, using the watchdog and "Register DHCP leases in the DNS resolver" introduces race conditions. Many have tried and they all lost. (See the forum : hundreds or more posts about this subject.)
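
To put a number on "a couple of times a day" versus "several times per hour", the grep above can be extended into a per-hour count. This is only a sketch: it assumes the log lines look like the ones shown in this post (an ISO 8601 timestamp followed by "info: start of service"), and the function names and the threshold are made up for illustration.

```python
import re
from collections import Counter

# Captures the date and hour of an ISO 8601 timestamp on a "start of service" line.
START_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\S*\s.*info: start of service"
)


def restarts_per_hour(lines):
    """Count unbound start-ups per clock hour, keyed like '2025-02-26T09'."""
    counts = Counter()
    for line in lines:
        match = START_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def noisy_hours(counts, threshold=3):
    """Return the hours with more than `threshold` restarts, in sorted order."""
    return sorted(hour for hour, n in counts.items() if n > threshold)
```

Feeding it the lines of /var/log/resolver.log (for example `restarts_per_hour(open("/var/log/resolver.log"))`) and then calling `noisy_hours` on the result would show whether the resolver is restarting often enough to look like an outage.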

                                    • N
                                      nattygreg @Gertjan

                                      @Gertjan did you patch unbound? There are patches for that, which will stop unbound from restarting every time it gives out a lease. Do that first. Then set it up in watchdog. Yes, it will restart a few times per day; I know because it says connection loss when I’m streaming. It also depends: sometimes unbound stops and will not restart, therefore throwing everyone off the network.

                                      Use the patches: you can download the patch package under packages, and it will then show the recommended patches; just install them all.

                                      • GertjanG
                                        Gertjan @nattygreg

                                        @nattygreg

                                        Read ... please.
                                        I'm using 25.03-BETA, which is the latest and greatest.
                                        All known patches are included in that version of pfSense. Maybe not the ones discovered after 5 February 2025.

                                        To stop unbound from restarting when new leases come in, or when leases are renewed, uncheck "Register DHCP leases in the DNS resolver".
                                        After all, by default, that option is not checked (by Netgate).

                                        This situation is known since ... can't remember, 2012 ?!!

                                        @nattygreg said in Random DNS Resolver failure with Quad9 over SSL:

                                        because it says connection loss when I’m streaming

                                        When the resolver restarts this will not influence or even break any connections already established.
                                        After all, the resolver (unbound) handles DNS, which exists for us humans. Your TV, phone, Pad, PC, etc etc communicate using IP addresses, not "host names". Only when a connection has to be created with a host name, for example "www.youtube.com", is that "www.youtube.com" translated once into an IP address. pfSense's unbound and your device will then keep that resolved host name for a while (cached).
                                        I see no reason why streaming stops when unbound restarts.
                                        I fired up a YouTube and a Netflix stream on my PC, and stopped unbound on pfSense for half a minute. Nothing stopped ...
                                        And even when Netflix or YouTube needs to resolve an ad server host name, it will wait a bit before everything comes crashing down.
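
The caching behaviour described above is what lets existing streams survive a short resolver outage. As a rough illustration only (real resolvers and operating systems do considerably more), a name-to-address cache with a time-to-live can be sketched like this; the class name and the injected clock are hypothetical choices made so the example is easy to test.

```python
import time


class TtlCache:
    """Keeps resolved names until their TTL (in seconds) runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # name -> (address, expiry time)

    def put(self, name, address, ttl):
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None          # never resolved: a DNS query is needed
        address, expires = entry
        if self._clock() >= expires:
            del self._entries[name]
            return None          # expired: a new DNS query is needed
        return address           # still valid: no DNS traffic at all
```

As long as `get` keeps returning an address, the client never talks to the resolver, so a restart of unbound in that window goes unnoticed. Only names that are new or expired are affected.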

                                        @nattygreg said in Random DNS Resolver failure with Quad9 over SSL:

                                        sometimes unbound stops and will not restart therefore throwing everyone off the network

                                        Nobody goes off the network ; actually, the network works just fine.
                                        Only resolving, aka DNS, doesn't work anymore. So, use the ancient method : use IP addresses and things work very well.
                                        I know, that's tedious, people don't use numbers any more, and with IPv6 it's close to impossible.

                                        @nattygreg said in Random DNS Resolver failure with Quad9 over SSL:

                                        Use the patches: you can download the patch package under packages, and it will then show the recommended patches; just install them all.

                                        92d84498-90b5-41bb-b886-440df6504312-image.png

                                        so : nope - no patches exist for me.
                                        I did create my own patches, they are listed at the top of the page.

                                        IMHO, the issue "Random DNS Resolver failure with Quad9 over SSL" can't be resolved with a patch.

                                        • N
                                          nattygreg @Gertjan

                                          @Gertjan you’re right. I don’t have this problem, since I only connect to Quad9 over TLS. I read that you said SSL, but from what I have read (I could be wrong) DNS connections are made over TLS or HTTPS, and (again, I could be wrong) the ports are 53, 853 and 443. If you are able to connect by SSL, maybe, and I said maybe, you are using the wrong method to connect to Quad9.

                                          • N
                                            nattygreg @digitalgimpus

                                            @digitalgimpus I had that same issue. When it happens, look at the DNS status: under ping it would say zero. Remove that DNS server and use another one.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.