Netgate Discussion Forum

    Major DNS Bug 23.01 with Quad9 on SSL

    General pfSense Questions
    185 Posts 27 Posters 152.2k Views
    • stephenw10 (Netgate Administrator)

      Hmm, that's interesting.

      The ALTQ for hn NICs setting does nothing if you don't have hn NICs.

      Re-enabling hardware checksum offload would do something. Only after a reboot though, I assume you did that?
      It's hard to see how that wouldn't affect a lot more than just DNS over TLS though.
      It would likely be NIC specific too. Is it possible this only affects igc? That seems unlikely, but possible.

      • joedan @stephenw10

        @stephenw10

        Yes rebooted immediately after the change.

        I am the only one with access to pfSense, and I keep a detailed change log, snapshot and config backup for everything I modify. My last post talks about removing ntopng, which may just have taken some load off. However, that setup still had issues where DNS over TLS did eventually stop working, always as the first symptom.

        During the load testing in my post before that, I did manage to break standard DNS forwarding once, but it was a lot harder to do and took several attempts. I didn't think much of it because the DNS load was huge and seemed excessive anyway. Going back to standard resolving worked even better: when I loaded it up with DNS requests it wouldn't break and was rock solid, though things did slow down on occasion. Given the ridiculous number of DNS requests it was generating, that seemed acceptable. I only have a small pipe (80 Mbit) to the internet and never had any other issues apart from DNS over TLS resolution on 23.01. Some other testing which I didn't post about was changing from Cloudflare to Quad9 to Google for DNS over TLS, but that made no difference. DNS over TLS would eventually stop with any upstream provider.

        My machine, RAM and SSD are completely oversized, running bare metal (specs in my post above), and never broke a sweat. I am just glad it's fixed for me and was thrilled to see DNS over TLS back on.

        I used the same input file for the DNS load tester which broke it last time; it was 25MB. When I saw the test finish without issues I reran it twice, which just resulted in a lot of cache hits. I then got all of the parts from GitHub and had a 250MB monster. Even this couldn't break it. DNS over TLS has been rock solid since.

        • stephenw10 (Netgate Administrator)

          Yup, glad it seems good for you and all info is good. 👍

          • Enhance2736

            Late to the party here guys. I am experiencing DNS resolution issues sporadically throughout the day, specifically when using Quad9 with DoT enabled. If I disable DoT everything works fine. Or if I keep DoT enabled and switch to Cloudflare, then it works throughout the day with no issues.

            Running a Netgate 6100 MAX, using pfBlockerNG with DNSBL and the Unbound resolver in Python mode.

            "Disable hardware checksum offload" was already unchecked, and I unchecked "Enable the ALTQ support for hn NICs" and rebooted.

            Hope this helps.

            • stephenw10 (Netgate Administrator)

              Are you using one of the igc ports as WAN?

              • Enhance2736 @stephenw10

                @stephenw10 Yes sir, igc3. I needed 2.5G configured there so I can reuse the 10G ix0 for LAN.

                • stephenw10 (Netgate Administrator)

                  Hmm.

                  Ok just for sanity can anyone confirm they are hitting this on some interface other than igc?

                  It seems very unlikely it would be that but....

                  • Enhance2736

                    I'll switch to ix on Wednesday and will report back, just in case no one confirms by then.

                    • Cylosoft @stephenw10

                      @stephenw10 We have several with WAN on ix0 and LAN on ix1. They stall out just like the others.

                      • stephenw10 (Netgate Administrator)

                        Ok, thanks.

                        And just to confirm that's to Quad9 specifically?

                        • Cylosoft @stephenw10

                          @stephenw10 Yes. They all started out with Quad9 and TLS.

                          • Enhance2736

                            Yes sir, Quad9 in System > General Setup > DNS Server Settings.

                            The settings below are causing the intermittent DNS resolution issues described above by others (can't resolve anything for a few minutes, then eventually starts resolving):

                            Address: 9.9.9.9 Hostname for TLS Verification: dns.quad9.net 
                            Address: 149.112.112.112 Hostname for TLS Verification: dns.quad9.net 
                            

                            When I change the above settings to put Cloudflare in front of Quad9, it resolves without issues:

                            Address: 1.1.1.1 Hostname for TLS Verification: cloudflare-dns.com 
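
One way to tell whether the stalls come from the upstream or from Unbound on the firewall is to query the DoT server directly from a client, bypassing pfSense's resolver. Below is a minimal sketch using only the Python standard library. The function names (`build_query`, `frame_for_tcp`, `rcode_name`, `probe_dot`) are made up for this illustration and are not part of pfSense or Unbound. It assumes the client can reach port 853 on the upstream and has a usable system CA store.

```python
import socket
import ssl
import struct

# Only the response codes relevant to this thread; others fall back to "RCODEn".
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: id, flags (RD bit set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE=A (1) and QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP/TLS prefixes each message with a two-byte length."""
    return struct.pack("!H", len(message)) + message


def rcode_name(response: bytes) -> str:
    """Return the response code name from the low four bits of the flags."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def _recv_exact(sock, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        buf += chunk
    return buf


def probe_dot(server_ip: str, tls_hostname: str, name: str,
              timeout: float = 5.0) -> str:
    """Send one query over TLS on port 853 and return the response code name."""
    context = ssl.create_default_context()  # verifies cert against tls_hostname
    with socket.create_connection((server_ip, 853), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=tls_hostname) as tls:
            tls.sendall(frame_for_tcp(build_query(name)))
            length = struct.unpack("!H", _recv_exact(tls, 2))[0]
            return rcode_name(_recv_exact(tls, length))
```

For the settings above, a call would look like `probe_dot("9.9.9.9", "dns.quad9.net", "github.com")`. If this keeps answering while clients behind pfSense get SERVFAIL, that points at the firewall side rather than at Quad9 itself, though a single probe from a different source address is only a hint, not proof.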
                            
                            • Enhance2736

                              I am testing now on Quad9 with DoT and it is failing right now:

                              ❯ ping github.com
                              ping: cannot resolve github.com: Unknown host
                              
                              ❯ dig github.com
                              
                              ; <<>> DiG 9.10.6 <<>> github.com
                              ;; global options: +cmd
                              ;; Got answer:
                              ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 40969
                              ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
                              
                              ;; OPT PSEUDOSECTION:
                              ; EDNS: version: 0, flags:; udp: 1432
                              ;; QUESTION SECTION:
                              ;github.com.                    IN      A
                              
                              ;; Query time: 27 msec
                              ;; SERVER: 10.11.100.1#53(10.11.100.1)
                              ;; WHEN: Mon Apr 17 18:17:08 EDT 2023
                              ;; MSG SIZE  rcvd: 39
                              

                              I opened the pfSense UI and left it on the main page. Then Unbound started to resolve:

                              ❯ dig github.com
                              
                              ; <<>> DiG 9.10.6 <<>> github.com
                              ;; global options: +cmd
                              ;; Got answer:
                              ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25085
                              ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
                              
                              ;; OPT PSEUDOSECTION:
                              ; EDNS: version: 0, flags:; udp: 1432
                              ;; QUESTION SECTION:
                              ;github.com.                    IN      A
                              
                              ;; ANSWER SECTION:
                              github.com.             39      IN      A       140.82.112.3
                              
                              ;; Query time: 38 msec
                              ;; SERVER: 10.11.100.1#53(10.11.100.1)
                              ;; WHEN: Mon Apr 17 18:21:11 EDT 2023
                              ;; MSG SIZE  rcvd: 55
                              

                              What I noticed too is that it was failing until I opened the pfSense UI, and then it started to resolve. No idea what is going on lol
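
To pin down when the SERVFAIL windows start and stop, and whether opening the GUI really lines up with recovery, it may help to poll the resolver on a timer instead of running dig by hand. This is a rough sketch, not a tested monitoring tool. It assumes `dig` is on the client's PATH; `dig_status` and `poll_resolver` are hypothetical helper names. The server address is the one shown in the dig output above.

```python
import re
import subprocess
import time
from datetime import datetime
from typing import Optional

# Matches the header line dig prints, e.g. "status: SERVFAIL".
STATUS_RE = re.compile(r"status: ([A-Z]+)")


def dig_status(output: str) -> Optional[str]:
    """Extract the status field from dig output, or None if there is none."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else None


def poll_resolver(name: str, server: str = "10.11.100.1",
                  interval: float = 10.0, rounds: int = 6) -> None:
    """Run dig repeatedly and print a timestamped status for each attempt."""
    for _ in range(rounds):
        result = subprocess.run(
            ["dig", f"@{server}", name, "+tries=1", "+time=3"],
            capture_output=True,
            text=True,
        )
        status = dig_status(result.stdout) or "NO RESPONSE"
        print(datetime.now().isoformat(timespec="seconds"), name, status)
        time.sleep(interval)
```

Running `poll_resolver("github.com", rounds=360)` would cover about an hour at the default interval. Cached names can keep answering during a stall, so rotating through a few different hostnames would likely give a clearer picture.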

                              • stephenw10 (Netgate Administrator)

                                Still nothing logged in the resolver log?

                                That's weird that it started after you logged in. What widgets do you have on the dashboard?

                                • w0w @stephenw10

                                  After the changes I made and posted in https://forum.netgate.com/post/1094989 I have fewer problems with DNS responses, but like somebody said, they mostly appear when using a mobile device and loading a page with a bunch of images and scripts, for example reddit. I see previews, but when I try to open a large image by clicking on the preview (not opening the post), it either loads immediately or just hangs and gives no output, showing a broken image icon. This happens randomly. I am also using pfBlockerNG, Suricata and Python mode. All the reddit domains I found are whitelisted. Nothing unusual in the pfBlockerNG or Suricata logs. I am forwarding to Cloudflare and Google servers using TLS/SSL, over both IPv4 and IPv6. I am also using a multi-WAN setup. Planning to disable IPv6 and one of the two gateways just to test it again.

                                  • Enhance2736 @stephenw10

                                    @stephenw10 Let me get some logs and I will post them here. As far as widgets go I have Gateways, ZFS, Traffic Graphs, Interfaces, UPS Status, Firewall Logs, OpenVPN, WireGuard, HAProxy, Interface Statistics, Services Status, pfBlockerNG, SMART Status, Installed Packages, and System Information.

                                    As far as system packages:
                                    Name Version
                                    acme 0.7.3_1
                                    arpwatch 0.2.1
                                    haproxy 0.61_9
                                    Netgate_Firmware_Upgrade 0.56
                                    ntopng 0.8.13_10
                                    nut 2.8.0_2
                                    pfBlockerNG 3.2.0_4
                                    Service_Watchdog 1.8.7_1
                                    WireGuard 0.1.6_5

                                    • stephenw10 (Netgate Administrator)

                                      Hmm, hard to imagine anything there would affect it. Could Unbound have been restarted when you connect to the dash? The resolver logs would show that if so.
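
One rough way to check for a restart from the resolver log alone is to watch the process ID in the `unbound[PID]` tag on each line, in the format of the log excerpt posted later in this thread. The sketch below assumes that a new PID means the daemon was restarted. `pid_changes` is a hypothetical helper name, not an existing tool.

```python
import re

# Matches the syslog tag, e.g. "unbound[1677]:".
PID_RE = re.compile(r"unbound\[(\d+)\]")


def pid_changes(lines):
    """Return (old_pid, new_pid, line) for each line where the PID changes."""
    changes = []
    previous = None
    for line in lines:
        match = PID_RE.search(line)
        if not match:
            continue  # skip lines that are not from unbound
        pid = int(match.group(1))
        if previous is not None and pid != previous:
            changes.append((previous, pid, line))
        previous = pid
    return changes
```

Feeding it the lines of /var/log/resolver.log and comparing the timestamps of any reported changes with the time of the GUI login would show whether the two coincide. An empty result only means no restart was logged, which is not the same as no restart if logging itself stopped.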

                                      • Enhance2736

                                        Ok, it is happening again now lol
                                        What I did was switch to Quad9 this morning at 8:30 Eastern, and the logs show a bunch of entries around that time. DNS works fine for some time and now it is failing again; the dig output is below. The logs literally stop around the time I made the switch to Quad9 and applied the changes (I'm assuming that restarts Unbound). There are no entries around the times when the DNS failures occur.

                                        ❯ dig github.com
                                        
                                        ; <<>> DiG 9.10.6 <<>> github.com
                                        ;; global options: +cmd
                                        ;; Got answer:
                                        ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 62433
                                        ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
                                        
                                        ;; OPT PSEUDOSECTION:
                                        ; EDNS: version: 0, flags:; udp: 1432
                                        ;; QUESTION SECTION:
                                        ;github.com.                    IN      A
                                        
                                        ;; Query time: 27 msec
                                        ;; SERVER: 10.11.100.1#53(10.11.100.1)
                                        ;; WHEN: Tue Apr 18 08:48:29 EDT 2023
                                        ;; MSG SIZE  rcvd: 39
                                        

                                        Unbound log level is set to 4.

                                        • stephenw10 (Netgate Administrator)

                                          If you log into the gui and it starts working is anything logged at that point?

                                          • Enhance2736 @stephenw10

                                            @stephenw10 Testing some more. I set logging to 4 around 8:50 am ET, and there is a bunch of logs at that point. I am staying logged in and it still sporadically stops resolving. Here is an example of SERVFAIL followed by NOERROR:

                                            ❯ dig developers.redhat.com
                                            
                                            ; <<>> DiG 9.10.6 <<>> developers.redhat.com
                                            ;; global options: +cmd
                                            ;; Got answer:
                                            ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4300
                                            ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
                                            
                                            ;; OPT PSEUDOSECTION:
                                            ; EDNS: version: 0, flags:; udp: 1432
                                            ;; QUESTION SECTION:
                                            ;developers.redhat.com.         IN      A
                                            
                                            ;; Query time: 40 msec
                                            ;; SERVER: 10.11.100.1#53(10.11.100.1)
                                            ;; WHEN: Tue Apr 18 09:40:44 EDT 2023
                                            ;; MSG SIZE  rcvd: 50
                                            
                                            ❯ dig developers.redhat.com
                                            
                                            ; <<>> DiG 9.10.6 <<>> developers.redhat.com
                                            ;; global options: +cmd
                                            ;; Got answer:
                                            ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24803
                                            ;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1
                                            
                                            ;; OPT PSEUDOSECTION:
                                            ; EDNS: version: 0, flags:; udp: 1432
                                            ;; QUESTION SECTION:
                                            ;developers.redhat.com.         IN      A
                                            
                                            ;; ANSWER SECTION:
                                            developers.redhat.com.  3584    IN      CNAME   developers.redhat.com2.edgekey.net.
                                            developers.redhat.com2.edgekey.net. 21584 IN CNAME developers.redhat.com2.edgekey.net.globalredir.akadns.net.
                                            developers.redhat.com2.edgekey.net.globalredir.akadns.net. 900 IN CNAME e40408.dsca.akamaiedge.net.
                                            e40408.dsca.akamaiedge.net. 4   IN      A       23.221.225.234
                                            e40408.dsca.akamaiedge.net. 4   IN      A       23.221.225.210
                                            
                                            ;; Query time: 253 msec
                                            ;; SERVER: 10.11.100.1#53(10.11.100.1)
                                            ;; WHEN: Tue Apr 18 09:40:48 EDT 2023
                                            ;; MSG SIZE  rcvd: 235
                                            

                                            The Unbound log, /var/log/resolver.log, does not show anything after about the restart time (when I changed the logging level):

                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] info: receive_udp on interface: 10.11.100.1
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: mesh_run: start
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: mesh_run: python module exit state is module_wait_module
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: process_request: new external request event
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: iter_handle processing q with state INIT REQUEST STATE
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] info: resolving api.atlassian.com. A IN
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: request has dependency depth of 0
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: forwarding request
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: iter_handle processing q with state QUERY TARGETS STATE
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] info: processQueryTargets: api.atlassian.com. A IN
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: processQueryTargets: targetqueries 0, currentqueries 0 sentcount 0
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] info: DelegationPoint<.>: 0 names (0 missing), 2 addrs (0 result, 2 avail) parentNS
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug:   [dns.quad9.net] ip4 149.112.112.112 port 853 (len 16)
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug:   [dns.quad9.net] ip4 9.9.9.9 port 853 (len 16)
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: attempt to get extra 3 targets
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: rpz: iterator module callback: have_rpz=0
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: selrtt 376
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] info: sending query: api.atlassian.com. A IN
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: sending to target: <.> 149.112.112.112#853
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: dnssec status: not expected
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: mesh_run: iterator module exit state is module_wait_reply
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] info: mesh_run: end 1 recursion states (1 with reply, 0 detached), 1 waiting replies, 0 recursion replies sent, 0 replies dropped, 0 states jostled out
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] info: 0RDd mod1 rep api.atlassian.com. A IN
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: cache memory msg=66072 rrset=66072 infra=7808 val=0
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: serviced send timer
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: tcp bound to src ip4 127.0.0.1 port 0 (len 16)
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: the query is using TLS encryption, for dns.quad9.net
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: comm point start listening 25 (-1 msec)
                                            Apr 18 08:50:26 pfSense unbound[1677]: [1677:3] debug: comm point listen_for_rw 25 0
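
To make the "nothing logged around the failure" observation checkable, a small script can list the resolver.log entries that fall within a window around a given failure time, such as the 09:40:44 SERVFAIL above. This is a sketch under a few assumptions: the log uses the classic syslog timestamp shown in the excerpt (which carries no year, so one has to be supplied), month names are in English, and `parse_timestamp` and `entries_near` are hypothetical names.

```python
from datetime import datetime, timedelta


def parse_timestamp(line: str, year: int) -> datetime:
    """Parse the leading 'Apr 18 08:50:26' syslog timestamp of a log line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def entries_near(lines, when: datetime,
                 window: timedelta = timedelta(minutes=2),
                 year: int = 2023):
    """Return the log lines whose timestamp is within `window` of `when`."""
    hits = []
    for line in lines:
        try:
            stamp = parse_timestamp(line, year)
        except ValueError:
            continue  # not a timestamped log line
        if abs(stamp - when) <= window:
            hits.append(line)
    return hits
```

With the excerpt above, `entries_near(lines, datetime(2023, 4, 18, 9, 40, 44))` returns an empty list, which matches the report that the log is silent at the time of the failure.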
                                            
                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.