Netgate Discussion Forum

    Strange DNS issue for internal clients...

    Category: DHCP and DNS
      ericwentz

      Hi all, I'm running pfSense+ 24.11-RELEASE on a Protectli Vault. In the system DNS settings I have two servers set up: one from Cloudflare and the other from Level3. My DHCP server is set up with "Enable DNS registration" and "Enable early DNS registration" checked. I'm using the DNS Resolver service.
      First off, DNS resolution of external sites is rock solid. The issue is with my internal devices. Most of them have pseudo-fixed IPs, i.e. I'm using the MAC address to bind them to an IP address.
      So much for the background; here's the issue: name resolution of my internal devices is intermittent. Sometimes I get a valid IP (using nslookup from my Mac mini) and other times I get "Can't find server.domain.com: No answer." Then another test will return the IP address again. The behavior is the same when I use the pfSense "Diagnostics / DNS Lookup" tool, and I've had the same results from my MacBook Pro.
      I'd appreciate it if anyone could shed some light on this issue. Please let me know if I can supply any additional information about my setup.
      Thanks
      Eric

        Gertjan (reply to @ericwentz)

        @ericwentz

        Like this:

        [Screenshot]

        So can I presume you use Kea and not ISC DHCP?

        Check your /etc/hosts file. Does it look correct?

        @ericwentz said in Strange DNS issue for internal clients...:

        Sometimes I get a valid ip (using nslookup from my Mac Mini) and other times, I get "Can't find server.domain.com: No answer." Then another test will return the IP address again. This behavior is the same when I use the pfsense "Diagnostics / DNS Lookup" tool.

        Hmm, if even "Diagnostics / DNS Lookup" gives no answer, that means it couldn't contact unbound, the Resolver.
        Yet that process should always be in the running state, so 127.0.0.1:53 should answer any question.

        Check your Status > System Logs > System > DNS Resolver log and search for the word "start". Does it start (that is, restart) often? This can happen every time a pfSense network interface goes down and comes back up.
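One way to answer the "does it restart often?" question outside the GUI: a short Python sketch that scans a copy of the Resolver log for unbound's "start of service" message. The log path /var/log/resolver.log and the exact message wording are assumptions about your setup, and `find_restarts` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: unbound logs a line containing "start of service" each time it (re)starts.
START_MARKER = "start of service"


def find_restarts(lines):
    """Return every log line that records an unbound start."""
    return [line.strip() for line in lines if START_MARKER in line]


if __name__ == "__main__":
    # Assumed location of the DNS Resolver log; point this at your own copy if it differs.
    log_path = Path("/var/log/resolver.log")
    if log_path.exists():
        restarts = find_restarts(log_path.read_text(errors="replace").splitlines())
        print(f"{len(restarts)} start(s) found")
        for line in restarts[-10:]:  # show the ten most recent starts
            print(line)
```

Several starts within minutes of each other, lining up with interface up/down events, would point at the restart scenario described above.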

        Example :

        When I set up a DHCP static lease on my LAN DHCP server like this:

        [Screenshot: DHCP static lease settings]

        then from that moment on, its info also exists in the /etc/hosts file:

        [Screenshot: the matching /etc/hosts entry]

        and this file is 'integrated' into unbound, the Resolver, so:

        [Screenshot: the name resolving via the Resolver]
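The chain described above (static lease, then /etc/hosts, then an answer from unbound) can be modelled in a few lines. This is only an illustration of why a listed name always resolves while an unlisted one gets "No answer"; it is not how unbound actually loads the file, and the names and addresses are made up.

```python
def parse_hosts(text):
    """Map each hostname (lowercased) to its IP from /etc/hosts-style text."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # first entry for a name wins
    return table


def lookup(table, name):
    """Return the IP for name, or None (the "No answer" case)."""
    return table.get(name.lower().rstrip("."))


hosts = """
127.0.0.1      localhost
192.168.1.5    winserver.home.arpa winserver   # static lease
"""
table = parse_hosts(hosts)
print(lookup(table, "winserver.home.arpa"))  # 192.168.1.5
print(lookup(table, "missing.home.arpa"))    # None
```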

        Extra info: when doing this:

        @ericwentz said in Strange DNS issue for internal clients...:

        I get a valid ip (using nslookup from my Mac Mini) and other time

        make sure the request is sent to the pfSense LAN IP, and not to 8.8.8.8 or some other DNS server.

        Just to be sure:
        Run a packet capture on your LAN port, set it to "all the details", TCP and UDP, port 53, and IP address = 192.168.1.1 (or whatever your pfSense LAN IP is).
        Now capture.
        Do an nslookup on your Mac mini and you should see the packet with the DNS request.
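If you'd rather not depend on whichever server the OS picks, you can send a query straight to the pfSense LAN IP from a script. A minimal sketch using only the Python standard library; the helper names are hypothetical, 192.168.1.1 is the example LAN IP from above, and it skips things a real client would handle (matching the response ID, TCP fallback, EDNS).

```python
import socket
import struct


def build_query(name, query_id=0x1234, qtype=1):
    """Build a minimal DNS query: recursion desired, one question, class IN (qtype 1 = A)."""
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)


def summarize_response(data):
    """Return (rcode, answer_count) from the 12-byte header of a raw DNS response."""
    _qid, flags, _qd, ancount, _ns, _ar = struct.unpack(">HHHHHH", data[:12])
    return flags & 0x000F, ancount


def ask(server, name, timeout=2.0):
    """Send one UDP query directly to `server`, bypassing the OS resolver list."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _addr = sock.recvfrom(512)
    return summarize_response(data)
```

`ask("192.168.1.1", "server.domain.com")` returns `(rcode, answer_count)`. `(0, 0)` is a NOERROR response with no records, roughly what nslookup reports as "No answer", while `(3, 0)` would be NXDOMAIN. Running it in a loop while the capture is going should show every request arriving on the LAN interface.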

        You can also go wild with the Resolver log verbosity: pick any level, the higher the better:

        [Screenshot: DNS Resolver log level setting]

        and you'll see your DNS requests coming in and being handled.

        Warning: don't forget to set this back to a normal level like "Level 1", as high levels produce huge quantities of log lines.

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

          johnpoz, LAYER 8 Global Moderator (reply to @ericwentz)

          @ericwentz said in Strange DNS issue for internal clients...:

          Please let me know if I can supply any additional information about my setup.

          Do your clients point only to the pfSense IP on your network for DNS? When clients point to more than one name server, you're never really sure which one they might ask.

          So, for example, say your client has:

          8.8.8.8
          192.168.1.1 (pfsense IP)

          If you ask the pfSense IP for, say, server.home.arpa and it knows about it, you will get an answer. But if you ask 8.8.8.8, it is not going to have a clue about anything in the home.arpa domain.

          If you're going to point your clients at more than one IP for DNS, you need to be sure those servers can all resolve the same names.
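A toy model of the situation described above: a client with two DNS servers, only one of which knows the local zone. Real clients choose servers in OS-specific ways (ordering, timeouts, caching), so the strict rotation here is a simplifying assumption, and the data is made up.

```python
from itertools import cycle

# Hypothetical data: what each configured DNS server can answer.
PFSENSE = {"server.home.arpa": "192.168.1.5", "www.example.com": "203.0.113.10"}
PUBLIC = {"www.example.com": "203.0.113.10"}  # knows nothing about home.arpa


def make_client(servers):
    """Model a client that rotates through its configured DNS servers."""
    rotation = cycle(servers)

    def resolve(name):
        return next(rotation).get(name)  # None stands in for a failed lookup

    return resolve


resolve = make_client([PUBLIC, PFSENSE])
print([resolve("server.home.arpa") for _ in range(4)])
# [None, '192.168.1.5', None, '192.168.1.5']
```

The public name resolves every time, while the local name succeeds only when the question happens to reach pfSense, which looks exactly like "intermittent" resolution from the client's side.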

          If your clients point only to the pfSense IP and you sometimes get an answer for server.home.arpa and sometimes not, it could be that unbound is restarting. While unbound is in the middle of restarting, it can't answer anything.

          An intelligent man is sometimes forced to be drunk to spend time with his fools
          If you get confused: Listen to the Music Play
          Please don't Chat/PM me for help, unless mod related
          SG-4860 24.11 | Lab VMs 2.7.2, 24.11

            ericwentz

            Thanks so much for the prompt replies, much appreciated. I got into the console and looked at /var/log/dhcpd.log (using tail -f). It looks like something is bouncing up and down every few seconds. Here's a snippet from the log file:
            May 15 14:14:05 fw kea2unbound[26679]: Remove record: "zentrios-ace.{redacted}. 600 IN A XX.X.10.72"
            May 15 14:14:06 fw kea2unbound[26679]: Write include: /var/unbound/leases/leases4.conf (719c5c75ef3cb1c10f9e35886f48ba3fb90b09cb9d4105f510567584cd54c475)
            May 15 14:14:17 fw kea-dhcp4[26043]: WARN [kea-dhcp4.dhcp4.0x636b2812000] DHCP4_MULTI_THREADING_INFO enabled: yes, number of threads: 4, queue size: 64
            May 15 14:14:17 fw kea-dhcp4[26043]: ERROR [kea-dhcp4.commands.0x636b2812000] COMMAND_SOCKET_WRITE_FAIL Error while writing to command socket 26 : Broken pipe
            May 15 14:14:17 fw kea-dhcp4[26043]: ERROR [kea-dhcp4.commands.0x636b2812000] COMMAND_SOCKET_WRITE_FAIL Error while writing to command socket 29 : Broken pipe
            May 15 14:14:17 fw kea-dhcp4[26043]: ERROR [kea-dhcp4.commands.0x636b2812000] COMMAND_SOCKET_WRITE_FAIL Error while writing to command socket 32 : Broken pipe
            May 15 14:14:17 fw kea-dhcp4[26043]: ERROR [kea-dhcp4.commands.0x636b2812000] COMMAND_SOCKET_WRITE_FAIL Error while writing to command socket 28 : Broken pipe
            May 15 14:14:17 fw kea-dhcp4[26043]: ERROR [kea-dhcp4.commands.0x636b2812000] COMMAND_SOCKET_WRITE_FAIL Error while writing to command socket 31 : Broken pipe
            May 15 14:14:31 fw kea2unbound[89935]: Add record: "winserver.{redacted}. 600 IN A XX.X.1.5"

            I had a bit more, but my post was getting flagged as spam. It seems like I'm going through a remove/add cycle every few seconds, with those warnings and errors mixed in.
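To put numbers on the add/remove cycle, here is a sketch that tallies kea2unbound add/remove events per hostname and counts the command-socket write failures. It assumes the line format shown in the snippet above; `tally` is a hypothetical helper, and you would feed it lines from /var/log/dhcpd.log (or a copy of it).

```python
import re
from collections import Counter

# Matches lines like: kea2unbound[26679]: Remove record: "host.domain. 600 IN A 192.0.2.7"
RECORD = re.compile(r'kea2unbound\[\d+\]: (Add|Remove) record: "(\S+) (\d+) IN A ([^"\s]+)"')
SOCKET_FAIL = "COMMAND_SOCKET_WRITE_FAIL"


def tally(lines):
    """Count (action, hostname) record events and Kea command-socket write failures."""
    events = Counter()
    failures = 0
    for line in lines:
        match = RECORD.search(line)
        if match:
            events[(match.group(1), match.group(2))] += 1
        elif SOCKET_FAIL in line:
            failures += 1
    return events, failures
```

A hostname showing many Remove/Add pairs over a short window, alongside a steady stream of socket failures, matches the cycling described here.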

              ericwentz

              Okay, I think I may have solved it (time will tell). I had set up the Service Watchdog to keep my DHCP server running; unfortunately, it was pointed at the OLD DHCP service and kept trying to start it. I'll keep an eye on it for a few days and post a follow-up on whether this was the actual fix.

              Hopefully this helps someone else as well. This was a pain!!
              Eric

                johnpoz, LAYER 8 Global Moderator (reply to @ericwentz)

                @ericwentz DHCP has nothing to do with DNS, other than it writing records to unbound. But not being able to write something shouldn't prevent unbound from answering for something it already knows about.

                Unless the failure to write is because unbound is down.

                I personally don't think Kea is ready for primetime, at least for my use case. But these bother me:

                Remove record: "zentrios-ace.{redacted}. 600 IN A XX.X.10.72"
                Add record: "winserver.{redacted}. 600 IN A XX.X.1.5"

                Why would it be using a 5 minute TTL (600 seconds) for a record it was adding to unbound? Because of a DHCP lease? That is a really low TTL. I haven't really looked into Kea since the first preview.

                Why would it be removing a record? Did the lease expire, or was it released by the client?

                I got around the issue of ISC restarting unbound all the time by not registering DHCP leases, only DHCP reservations. Once a client is going to stay on my network, I set a reservation so it always has the same IP. If something is only going to be on my network temporarily, like a guest on my Wi-Fi or some box I'm working on for someone, I have zero need to resolve it via an FQDN.

                I would hope that when Kea is ready for primetime, they will let you adjust the TTL of the records it adds via DHCP leases. Most of my leases are around 8 days long, so there would be no reason to have a TTL of 5 minutes. All of mine are a minimum of 1 hour.

                  Gertjan (reply to @johnpoz)

                  @johnpoz said in Strange DNS issue for internal clients...:

                  Why would it be using a 5 minute ttl (600 seconds) ...

                  Here is some fresh info about that short TTL.

                    johnpoz, LAYER 8 Global Moderator (reply to @Gertjan)

                    @Gertjan thanks..

                    "Unbound cache with the TTL being one-third of the lease duration. "

                    So you're saying he has a 15-minute lease time set?? That seems really low ;)

                      Gertjan (reply to @johnpoz)

                      @johnpoz said in Strange DNS issue for internal clients...:

                      So your saying he has a 15 minute ...

                      Not me 😊

                      I searched for TTL/ttl in /usr/local/bin/kea2unbound. I found where the local-data xxxxxxx entries are created, and these lines, imho, can't lose their TTL value, as they are known/declared locally, like the Resolver's "Host Overrides": once declared, they stay valid for life.

                      True, DHCP leases are always time-limited 😊 ..... and bingo, found it: I searched for "3" and found it straight away.

                      It's RFC-defined behaviour.
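For anyone else reading kea2unbound: the records in its log lines ("name. 600 IN A address") have the shape of the record text that unbound's `local-data:` directive takes. Below is a hypothetical helper that renders one such line; whether kea2unbound writes exactly this into /var/unbound/leases/leases4.conf is an assumption, not something verified here. The TTL inside the record is what unbound hands to clients for caching; the local entry itself does not expire inside unbound, which fits the point that declared entries stay valid.

```python
import ipaddress


def local_data_line(fqdn, ip, ttl):
    """Render an unbound local-data entry for an A record (hypothetical helper)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    name = fqdn if fqdn.endswith(".") else fqdn + "."
    return f'local-data: "{name} {ttl} IN A {addr}"'


print(local_data_line("winserver.home.arpa", "192.168.1.5", 600))
# local-data: "winserver.home.arpa. 600 IN A 192.168.1.5"
```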

                        johnpoz, LAYER 8 Global Moderator (reply to @Gertjan)

                        @Gertjan

                        RFC clearly states

                        "but SHOULD NOT be less than 10 minutes. "

                        Yet it seems it's putting in 5 minutes. By my math, 5 is less than 10 ;)

                        Like I said, not ready for primetime imho. With a 2-hour default lease time, one-third of that means the TTL put in should be around 40 minutes.
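For reference, the TTL arithmetic in this exchange as a small sketch. It assumes the rule quoted in this thread (TTL = one-third of the lease time, not less than 10 minutes); whether Kea or kea2unbound applies the floor exactly like this is not confirmed here, and `lease_ttl` is a hypothetical name. Note that 600 seconds is 10 minutes, so a TTL of 600 is exactly the floor: under this rule any lease up to about 30 minutes (1800 s) yields 600, while the 7200 s default gives 2400 s (40 minutes).

```python
# The floor from the RFC text quoted above: 10 minutes = 600 seconds.
MIN_TTL = 10 * 60


def lease_ttl(lease_seconds, floor=MIN_TTL):
    """Model of the quoted rule: one-third of the lease time, never below the floor."""
    return max(lease_seconds // 3, floor)


for lease in (900, 1800, 7200, 8 * 24 * 3600):  # 15 min, 30 min, 2 h, 8 days
    ttl = lease_ttl(lease)
    print(f"lease {lease:>6} s -> TTL {ttl:>6} s ({ttl / 60:g} min)")
```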

                          ericwentz

                          Okay, closing the issue: removing the Service Watchdog entry for the old DHCP service and adding one for the new kea-dhcp4 service has solved it. I've also taken the feedback regarding my TTL times and set them all to 7200. Problem solved. Thanks to all for your generous help in working through this problem. This forum is great.

                          Best to all - Eric

                            johnpoz, LAYER 8 Global Moderator (reply to @ericwentz)

                            @ericwentz where did you set that in Kea, if Kea is registering the entry? Did you set a minimum TTL in unbound or something?

                            And the watchdog sure shouldn't be needed. It has had serious issues in the past.

                              ericwentz

                              I set the DHCP TTL under "Services > DHCP Server", in the settings for each individual network under "Other DHCP Options". There's an entry called "Default Lease Time", which appears to default to 7200 seconds, but I explicitly entered that value just to be sure.

                              Finally, I've removed the "Service Watchdog" service. I really haven't had any issues with services failing, so it's probably unnecessary. I had originally configured it early in my pfSense journey and never looked at it again. I figure there's no sense in putting any extra load on the FW. -e

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.