Netgate Discussion Forum

    23.01 breaks DNS resolver and pFblocker

    General pfSense Questions
    • johnpozJ
      johnpoz LAYER 8 Global Moderator @Draco
      last edited by johnpoz

      @draco said in 23.01 breaks DNS resolver and pFblocker:

      Flickr for instance, hang for quite some time

      OK, how long does that take? In your browser's web tools (or whatever you use), look at what it is trying to resolve..

      Again, if you're trying to troubleshoot something, you need specifics.. If flickr is a problem - what on that page is the hold up..

      That was even faster than reddit

      ; <<>> DiG 9.18.8 <<>> www.flickr.com +trace
      ;; global options: +cmd
      .                       76294   IN      NS      l.root-servers.net.
      .                       76294   IN      NS      a.root-servers.net.
      .                       76294   IN      NS      c.root-servers.net.
      .                       76294   IN      NS      j.root-servers.net.
      .                       76294   IN      NS      k.root-servers.net.
      .                       76294   IN      NS      h.root-servers.net.
      .                       76294   IN      NS      g.root-servers.net.
      .                       76294   IN      NS      b.root-servers.net.
      .                       76294   IN      NS      d.root-servers.net.
      .                       76294   IN      NS      f.root-servers.net.
      .                       76294   IN      NS      m.root-servers.net.
      .                       76294   IN      NS      e.root-servers.net.
      .                       76294   IN      NS      i.root-servers.net.
      .                       76294   IN      RRSIG   NS 8 0 518400 20230317050000 20230304040000 951 . kJaY9uAyEtbTnrjZ1qAQTsqHExUgSViSqmXstFQXUmBOgAbAHKQlp9Nj BCAb0pUbm3sDWOrGvOaqxN6QKFXd8331v6lxtsDKd3kIGE5Wo7kLwzw4 XzZeGRwfuRPnmwXtfYnJTo+X4tGgg2xK6c0uy5QdsFVzHEPwJNXURZVE rXQ/erzXJmKUXFuZim8sfm7UjTTJJsBwk8+P8uM+B9CKDtfE0CvxtyIS BbGi9pg4PDlJz0zB3V9VM/9+IcJuQ4NfnBDvw3pD9Q0LVx9qN2GzG1TK 06r6LMEBB9RRhO5wkZ7UwZuVzloYxntIpBMVL3zdTl2vVCIQFlzqSqJL 5OfW1A==
      ;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 1 ms
      
      com.                    172800  IN      NS      e.gtld-servers.net.
      com.                    172800  IN      NS      f.gtld-servers.net.
      com.                    172800  IN      NS      c.gtld-servers.net.
      com.                    172800  IN      NS      l.gtld-servers.net.
      com.                    172800  IN      NS      b.gtld-servers.net.
      com.                    172800  IN      NS      a.gtld-servers.net.
      com.                    172800  IN      NS      k.gtld-servers.net.
      com.                    172800  IN      NS      d.gtld-servers.net.
      com.                    172800  IN      NS      h.gtld-servers.net.
      com.                    172800  IN      NS      j.gtld-servers.net.
      com.                    172800  IN      NS      i.gtld-servers.net.
      com.                    172800  IN      NS      g.gtld-servers.net.
      com.                    172800  IN      NS      m.gtld-servers.net.
      com.                    86400   IN      DS      30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
      com.                    86400   IN      RRSIG   DS 8 1 86400 20230317170000 20230304160000 951 . hxP6AHA8/MhX3JTy2BcSkd4CeviA+3lw1LFWfIPNDgpdka84SUuKYc50 8hhh7bcW5/MDeKZJ82JkxBlZkrWWaNpGncQKOLdjmlkesYB03WPoOo/I aJohqNzLawFsVK4+2c48yrCeX1uesJQCiJnvJEUHyJmd8KtrRYeUnqDn nBiIlzuEHm5r3TQodZTO8AiH+Dp722SzlP8E8JI8LPdsozvClNKTGcCp KZVPMq3yCeuZA8+T859Ah8HuJjyh4NAEIAQe2K4uuD9B2ZSCt9lEf5i1 qcBwMXtUf9Od86hnXK/cjI6uCNMCPBBeN6QJ7uIQK64zHBZhejcPq0EN PU5D9w==
      ;; Received 1205 bytes from 2001:500:12::d0d#53(g.root-servers.net) in 45 ms
      
      flickr.com.             172800  IN      NS      ns-573.awsdns-07.net.
      flickr.com.             172800  IN      NS      ns-421.awsdns-52.com.
      flickr.com.             172800  IN      NS      ns-1683.awsdns-18.co.uk.
      flickr.com.             172800  IN      NS      ns-1244.awsdns-27.org.
      CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q2D6NI4I7EQH8NA30NS61O48UL8G5 NS SOA RRSIG DNSKEY NSEC3PARAM
      CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20230311052252 20230304041252 36739 com. exNISCQI4v/S0m9ksCZH3zghILb9b1aARin3TLpc3yxNweWFzrozuCSm GnYNeNNy8OjdvPFw3/uue0qCY6vux7LlhCALbK4pGq58BFz2p7JZz7Um dCN3AnraZXWMhkG80d0ovafSyqOLPwBMg6rGXJyQnvFDkA2Y46ClZhOz r6PU3UvTEmtsa1IDaG8UeDdySojtSmMjSqaEepy7US86Gg==
      8AEGLV925R77BHJM7FFD4RKA8CGTNSFK.com. 86400 IN NSEC3 1 1 0 - 8AEGTREIKABQ6N53PE432PFN3BMU2HM1 NS DS RRSIG
      8AEGLV925R77BHJM7FFD4RKA8CGTNSFK.com. 86400 IN RRSIG NSEC3 8 2 86400 20230309061449 20230302050449 36739 com. kejD8AEDs1s8jUO2xUTJ2IN6Bgh2A5ItECrYExvbbQYZzSSnlbPEzyL7 n6uDtlE6TrYpOU/uH4wM+0Pt/USS4EmSUty+07+RF4hoM512BfYkUjxj QpQoLeFTRh3oFtFUQfQgYPD5oVJOtFcGErUhJ3lz3J4y9yavaa9phYxu Web4Fx3MJlvsA67u7Kp9NlrfTiF0JXHfqXBLyhXDbWM+zQ==
      ;; Received 745 bytes from 192.55.83.30#53(m.gtld-servers.net) in 19 ms
      
      www.flickr.com.         60      IN      A       99.84.171.73
      flickr.com.             300     IN      NS      ns-1244.awsdns-27.org.
      flickr.com.             300     IN      NS      ns-1683.awsdns-18.co.uk.
      flickr.com.             300     IN      NS      ns-421.awsdns-52.com.
      flickr.com.             300     IN      NS      ns-573.awsdns-07.net.
      ;; Received 196 bytes from 205.251.196.220#53(ns-1244.awsdns-27.org) in 27 ms
      
      [23.01-RELEASE][admin@sg4860.local.lan]/: 
      

      And yes, with dig +trace that is a FULL resolve - this would be the slowest lookup of anything, because it's a full resolve, down from the roots.. Even once the ttl for www.flickr.com expires - and a 60 second ttl is just Fing insane - you would only have to go talk to the NS for flickr.com directly..
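      To put a number on "how long does DNS take" from the client side, a small timing sketch can help. This is only an illustration: `time_lookup` is a made-up helper name, and it goes through the operating system's resolver (so OS and unbound caches both affect the result), which is not the same thing as the `dig +trace` walk shown above.

```python
import socket
import time


def time_lookup(name, resolve=socket.getaddrinfo):
    """Time one lookup; return (elapsed_ms, addresses, error_text_or_None)."""
    start = time.perf_counter()
    try:
        results = resolve(name, None)
        error = None
    except socket.gaierror as exc:
        # Resolution failed (for example NXDOMAIN or a server failure).
        results = []
        error = str(exc)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as text.
    addresses = sorted({entry[4][0] for entry in results})
    return elapsed_ms, addresses, error
```

      Calling `time_lookup("www.flickr.com")` twice in a row should, if caching is working, show the second call returning much faster than the first. A slow or failing second call points at the resolver path rather than at the website.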

      Going back to 22.05 doesn't really tell you what the problem is - keep in mind unbound changed from like 1.15 to 1.17.1

      I would troubleshoot the exact problem vs rolling back all of pfSense.. A rollback really gives you nowhere to even start on what the actual problem is..

      I can tell you right now there is nothing wrong with 23.01 or unbound 1.17.1, at least how I have mine configured.. Because again I have zero issues. Maybe something specific with your hardware, your config, your connection, etc..

      You know when you roll back? When you are in a limited change window for an update, something is not working, and the change window is expiring.. You hit the rollback mark, and that is when you roll back ;)

      There is not really a website on the planet anymore that loads just www.domain.tld -- they are all going to load sub domains or other resources off other domains, etc.. So call up your browser tools... How long does the page take to load?

      I am not seeing any issues with www.flickr.com loading.. But then all I get is a page saying start for free.. What exactly are you loading..

      If I call up the browser console - I see quite a few "errors" etc.. but nothing looks dns related - and the page popped up pretty much instantly and the background keeps changing pictures instantly.

      console.jpg

      If I view what is going on in the network and how long stuff takes - I see my ad blocker is blocking some stuff

      blocke..jpg

      But don't see anything failing from dns, or time out, etc.

      Click the timing button - what does it show for dns resolution.. Clear your browser cache, restart unbound so its cache is clear, and clear your OS cache as well (on Windows, ipconfig /flushdns), etc..

      dns.jpg

      An intelligent man is sometimes forced to be drunk to spend time with his fools
      If you get confused: Listen to the Music Play
      Please don't Chat/PM me for help, unless mod related
      SG-4860 24.11 | Lab VMs 2.7.2, 24.11

      • T
        terryzb @johnpoz
        last edited by

        @johnpoz As a pfSense and networking newbie, I just wanted to thank you John for your in-depth explanations, with pictures even! Much appreciated!

        • areckethennuA
          areckethennu @johnpoz
          last edited by areckethennu

          @johnpoz said in 23.01 breaks DNS resolver and pFblocker:

          @gertjan good settings, another one I might suggest is serve zero. I have been using it for years, and have never had a problem with it.

          serve 0 allows unbound to serve up the last IP it had in cache for some fqdn even if the ttl has expired. The ttl it hands off to the client will be 0, so if for some reason that doesn't work the client will have to ask again for that fqdn, and by this time unbound will have looked it up.

          An aside from the crux of this thread, but I saw that and I'd like to try it. Is this the actual entry? It looks like it, but I thought I'd ask:

          Serve Expired [] Serve cache records even with TTL of 0
          When enabled, allows unbound to serve one query even with a TTL of 0, if TTL is 0 then new record will be requested in the background when the cache is served to ensure cache is updated without latency on service of the DNS request.
          

          I'm just a home user with pfSense 23.09-RELEASE (amd64) on a Protecli VP2410

          • johnpozJ
            johnpoz LAYER 8 Global Moderator @areckethennu
            last edited by

            @areckethennu yeah that is the serve 0 setting.. Check the box..
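            For anyone wondering what "serve 0" does in practice, here is a toy model of the behaviour described above: a stale answer goes out with a TTL of 0 and a refresh is queued in the background. This is a sketch only, not unbound's actual code; the class name, the `lookup` callback and the `pending_refresh` list are all made up for illustration.

```python
import time


class ServeExpiredCache:
    """Toy model of 'serve expired' caching (illustrative, not unbound's code)."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup       # hypothetical upstream: name -> (address, ttl_seconds)
        self._clock = clock
        self._entries = {}          # name -> (address, expires_at)
        self.pending_refresh = []   # names waiting for a background refresh

    def _store(self, name):
        address, ttl = self._lookup(name)
        self._entries[name] = (address, self._clock() + ttl)
        return address, ttl

    def resolve(self, name):
        """Return the (address, ttl) pair handed to the client."""
        entry = self._entries.get(name)
        if entry is None:
            # Cache miss: the client has to wait for a real lookup.
            return self._store(name)
        address, expires_at = entry
        remaining = expires_at - self._clock()
        if remaining > 0:
            return address, int(remaining)   # still fresh
        # Expired: answer immediately with TTL 0 and queue a refresh.
        if name not in self.pending_refresh:
            self.pending_refresh.append(name)
        return address, 0

    def run_refreshes(self):
        """Stand-in for the background refresh of expired names."""
        while self.pending_refresh:
            self._store(self.pending_refresh.pop())
```

            The point of the TTL 0 is that the client should not hold on to the stale answer: if it turns out to be wrong, the client asks again, and by then the refreshed record is in the cache.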

            • L
              llebgrate
              last edited by

              Just upgraded to 23.01 myself and am experiencing DNS resolver issues. A quick restart of the service fixes it for clients. Wish I had saved the nslookup error on a domain I was testing. Can't recall the exact wording, but it was one I had not seen before about a server error.

              I did enable "Serve Expired" in the advanced DNS resolver settings and will monitor to see if it happens again.

              • D
                Draco @johnpoz
                last edited by

                @johnpoz What I see when I log onto Flickr:

                This site can’t be reached
                www.flickr.com’s server IP address could not be found.
                Try:
                • Checking the connection
                • Checking the proxy, firewall, and DNS configuration
                ERR_NAME_NOT_RESOLVED

                Then after 30+ seconds I see the Flickr homepage frame sans content, and things still appear to be loading after 2 minutes (but do not complete). Once I hit refresh (using Chrome), it loads right up, no problems.

                When I open the Chrome debug window (note that I do not see the same thing you show in your screenshots, e.g. no TIMINGS menu) the timings on the initial page load don't seem too bad:

                netgate initial timings.png

                I am not seeing the 0 time for DNS; I suppose because it was not yet cached (I restarted Unbound and cleared my local DNS cache). But then, things get ugly when some fonts and scripts try to load:
                Netgate scripts loading.png

                The flickr.com load time is what is detailed above. The font failures are not, I suspect, fatal. I think the problem is the combo?yui:3.16.0/yui.../loader-hermes/... line. This references combo.static.flickr.com. When this load fails (seems to be jscript?), it torpedoes the rest of the website load (and accounts for the hang I see before refresh). My guess is this script does some init work and then loads other scripts.

                When I hit REFRESH in the "hung" browser window, Flickr loads as noted above. The big difference is that the combo?yui?3.16.10…. call to that first Hermes URL (whatever that is -- scripts is my guess, as noted above) succeeds, and whatever that loads leads to a long list of more Hermes calls that also succeed. The website now loads properly.

                I tried doing DIGs on flickr.com and combo.static.flickr.com (the host for the Hermes URLs). The results do not look alike, and I do not know enough about DIG to interpret that (happy to upload if you want a look).

                I tried running these same tests on my older computer (slower), and I never run into the website load problems. My best guess is that my newer, faster computer is timing out. I did Malware and AV scans on my newer computer to ensure that was not causing issues. So the computer used to load the website makes a difference. I have replicated these problems using both Chrome and Firefox, and on multiple websites (see ironic note below). I've only drilled deeper on Flickr.

                FWIW, when I run a Windows console app to do simple DNS queries (not DIG, just one call), I've seen these fail for google.com and other common sites too. This is why I suspected DNS issues. It now appears that it is not quite that simple, and is tied to machine speed somehow (or software config, or both or...?).
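                A repeated-lookup probe along these lines can show whether failures are intermittent or tied to particular names. This is a minimal sketch, not the console app mentioned above; the `probe` function and its parameters are made up for illustration, and it uses the system resolver via `socket.gethostbyname`.

```python
import socket
import time


def probe(names, rounds=5, pause_s=1.0,
          resolve=socket.gethostbyname, sleep=time.sleep):
    """Resolve each name `rounds` times; return {name: [failure descriptions]}."""
    failures = {name: [] for name in names}
    for round_no in range(rounds):
        for name in names:
            try:
                resolve(name)
            except OSError as exc:
                # socket.gaierror and socket.herror are OSError subclasses.
                failures[name].append(f"round {round_no}: {exc}")
        if round_no < rounds - 1:
            sleep(pause_s)   # spread the rounds out a little
    return failures
```

                Running something like `probe(["google.com", "www.flickr.com"], rounds=20)` on both the fast and the slow PC, and comparing the failure lists, would give something more concrete than "it sometimes fails".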

                At this point I am well outside my toolset and knowledge (give me local code and a debugger and it's a different story), and have spent far more time than I have to spare on this issue. I can't have a production machine failing to load websites, and worse failing to do online backups or software updates, for the sake of finding this problem. If Netgate wants to get involved, then I'd be happy to set aside some time to work through this with them (I have the current 23.01 config on another USB stick). I remain hopeful that going back to 22.05 will have my production environment working again.

                I do want to thank you for the time and effort you put into your responses. Without your screenshots and comments I would not have tried DIG or getting the timings included here.

                On the side of irony, when I loaded the forum to enter this response, I hit the same This site can't be reached error. I think the difference between this and Flickr is that the forum isn't loading a script whose failure to load leaves the website non-functional a la Flickr. After a few seconds, the Netgate forum site just loads up and resolves normally. Given my tests with the DNS Query code I wrote (see google.com DNS query failure noted above), it is clearly related to DNS and timing issues; I just cannot pinpoint how.

                • D
                  Draco @Draco
                  last edited by

                  @draco Replying to my last post: I decided to try a reboot from the Console before re-applying 22.05. Someone mentioned rebooting on another thread, which I had not tried because pfSense reboots as part of the upgrade. But I tried it anyhow...

                  So far my SG-5100 has been up for almost an hour and I have not repro'd the Flickr problem. What would rebooting change that leaves things working all of a sudden? My primary PC was not rebooted or changed, just the SG-5100.

                  • L
                    llebgrate
                    last edited by

                    Just following up that I tried the Serve Expired setting and a simple reboot and unfortunately the problem still persists.

                    windows client nslookup:

                    forum.netgate.com
                    Server: firewall.blah.com
                    Address: 192.168.150.1
                    *** firewall.blah.com can't find forum.netgate.com: Server failed

                    restart DNS Resolver service

                    forum.netgate.com
                    Server: firewall.blah.com
                    Address: 192.168.150.1

                    Non-authoritative answer:
                    Name: forum.netgate.com
                    Addresses: 2610:160:11:18::199
                    208.123.73.199

                    I did have telegraf scraping stats from the resolver as well, but have since turned it off and will continue to monitor.

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      No errors in the resolver log when it's failing to resolve?

                      If you turn up the logging does it at least show the incoming requests?

                      • L
                        llebgrate @stephenw10
                        last edited by llebgrate

                        @stephenw10

                        I actually just caught it again. An entire page filled with these:

                        Mar 6 20:18:33	unbound	84264	[84264:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
                        Mar 6 20:18:32	unbound	84264	[84264:1] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
                        Mar 6 20:18:32	unbound	84264	[84264:3] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
                        Mar 6 20:18:32	unbound	84264	[84264:3] info: generate keytag query _ta-4f66. NULL IN
                        

                        Found this: https://forum.netgate.com/topic/152338/unbound-failed-to-prime-trust-anchor-could-not-fetch-dnskey-rrset-dnskey-in and am thinking it's DNSSEC related. I do in fact have forwarding and DNSSEC enabled, so I'm going to play with the settings for a bit. Might also mess with the dynamic DHCP client registration options. Apart from the upgrade, I haven't changed anything here in a very, very long time.
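                        To see how often this happens rather than eyeballing a page of log lines, the resolver log can be tallied with a short script. This is a sketch that assumes the log lines look like the four quoted above (timestamp, "unbound", pid, "[pid:thread]", level, message); the `summarize` helper and its counter names are made up.

```python
import re
from collections import Counter

# Matches lines such as:
# "Mar 6 20:18:33  unbound  84264  [84264:0] info: failed to prime trust anchor ..."
LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+unbound\s+(?P<pid>\d+)\s+"
    r"\[\d+:(?P<thread>\d+)\]\s+(?P<level>\w+):\s+(?P<message>.*)$"
)


def summarize(lines):
    """Count trust-anchor failures and keytag queries in unbound log lines."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue   # not an unbound line in the expected layout
        message = match.group("message")
        if message.startswith("failed to prime trust anchor"):
            counts["trust_anchor_failures"] += 1
        elif message.startswith("generate keytag query"):
            counts["keytag_queries"] += 1
        else:
            counts["other"] += 1
    return counts
```

                        Feeding it the saved log (for example `summarize(open(path))`, where `path` is wherever the resolver log was exported to) gives a quick count to compare before and after a settings change.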

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Hmm. Yes, that's DNSSec. I would at least try disabling and see if that removes the issue.
                          That really shouldn't be a problem though...

                          • GertjanG
                            Gertjan @llebgrate
                            last edited by

                            @llebgrate said in 23.01 breaks DNS resolver and pFblocker:

                            I actually just caught it again. An entire page filled with these:

                            Mar 6 20:18:33 unbound 84264 [84264:0] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN

                            Before unbound is started, some house keeping is done.
                            unbound is started with a single command that asks it to download a copy of the DNSSEC root key file. Here you can see that file, at the top.
                            One of the tasks is : prepare a known-good copy of the root DNSKEY, id 20326 (for now, as it can change when needed).

                            The thing is, and this is probably your real issue :
                            It can't !!
                            This means your unbound isn't able to download a small 1 kilobyte file (here it is) from the Internet.
                            That's not promising. Why would it have to try many times ?
                            This smells like 'uplink issues'.

                            When you see :

                            info: generate keytag query _ta-4f66. NULL IN
                            

                            you know the root key file has been downloaded successfully.
                            Because hex 4f66 is 20326 decimal, the key ID.
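                            The hex-to-decimal step can be checked in a couple of lines. This is a small sketch; `keytag_from_query` is a made-up helper, and it assumes the query name has the `_ta-` prefix followed by one or more hyphen-separated hex key tags, as in the log line above.

```python
def keytag_from_query(qname):
    """Return the decimal key tags encoded in a '_ta-xxxx[-yyyy]' query name."""
    label = qname.rstrip(".").split(".")[0]
    if not label.startswith("_ta-"):
        raise ValueError(f"not a keytag query name: {qname!r}")
    # Each hyphen-separated part after the prefix is one key tag in hex.
    return [int(part, 16) for part in label[len("_ta-"):].split("-")]


print(keytag_from_query("_ta-4f66."))  # [20326]
```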

                            @llebgrate: good news : because you are forwarding, you have to trust the resolver you are forwarding to anyway, so you can disable DNSSEC.
                            Still, it might be worthwhile to find out why unbound has issues getting 'stuff' from the Internet.
                            Something is impacting the traffic generated by unbound. That's your DNS traffic: it's not much, but it's very important.

                            No "help me" PM's please. Use the forum, the community will thank you.
                            Edit : and where are the logs ??

                            • L
                              llebgrate @Gertjan
                              last edited by llebgrate

                              @gertjan appreciate the detailed reply.

                              After some diagnostics on my end, it does not appear to be DNSSEC settings (I've re-enabled it w/out issue) but rather the Use SSL/TLS for outgoing DNS Queries to Forwarding Servers. I currently use Google DNS (8.8.8.8/8.8.4.4 > dns.google) and have not had any issues in many years with this enabled so not sure what happened since the upgrade. I have read that this setting is generally incompatible with DNSSEC, so I've unchecked both for now and everything is working just fine.

                              • stephenw10S
                                stephenw10 Netgate Administrator
                                last edited by

                                Generally you would not have DNSSec enabled with DoT, but only because you will be in forwarding mode for DoT. You should be able to use them together, but it's likely far less tested because there's little point.

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.