Netgate Discussion Forum

    DHCP load-balancing (failover enabled) causing hostnames to be unavailable for resolution by Unbound

    DHCP and DNS
    6 Posts 4 Posters 858 Views
    • foobert
      last edited by foobert

      I have a moderately large lab environment (>100 hosts) that I recently transitioned to using an HA pfSense (2.4.4_1) cluster for DHCP and local name resolution services.

      In dhcpd.conf, I see that the failover peers default to a load-balancing arrangement (via the "split 128;" setting). This rather surprised me, as all the other routing features in an HA cluster are very much an active/standby relationship.
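A rough model of why each peer ends up answering about half of the clients: per the dhcpd.conf man page, dhcpd hashes each client's identifier into a bucket from 0 to 255, and with "split 128;" the primary is responsible for the first 128 buckets and the secondary for the rest. This is only a sketch: it substitutes zlib.crc32 for the RFC 3074 hash that dhcpd actually uses, so it will not predict which peer serves a particular MAC address, and the helper names (bucket, responsible_peer, partition) are hypothetical.

```python
import zlib


def bucket(client_id: bytes) -> int:
    """Map a client identifier to a load-balancing bucket in 0..255.

    ASSUMPTION: zlib.crc32 is a stand-in for the RFC 3074 hash dhcpd uses.
    """
    return zlib.crc32(client_id) & 0xFF


def responsible_peer(client_id: bytes, split: int = 128) -> str:
    """Buckets below `split` belong to the primary, the rest to the secondary."""
    return "primary" if bucket(client_id) < split else "secondary"


def partition(macs: list[str], split: int = 128) -> dict[str, list[str]]:
    """Group MAC address strings by the peer that would answer them."""
    served: dict[str, list[str]] = {"primary": [], "secondary": []}
    for mac in macs:
        client_id = bytes.fromhex(mac.replace(":", ""))
        served[responsible_peer(client_id, split)].append(mac)
    return served


if __name__ == "__main__":
    # 100 made-up VMware-style MAC addresses
    lab = [f"00:50:56:00:00:{i:02x}" for i in range(100)]
    for peer, clients in partition(lab).items():
        print(f"{peer}: {len(clients)} clients")
```

Raising split toward 256 shifts clients to the primary; at 256 the primary answers everyone, which is the closest this model gets to active/standby.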

      Thus, with each dhcpd peer responding to only half of the local hosts, Unbound is stuck with only half the picture and cannot "authoritatively" resolve all of the local hostnames.

      My preference was to have the local hosts simply use the cluster's LAN VIP as their DNS server, keeping things simple and reliable via cluster failover.

      Am I missing something stupid, or is this an oversight in the behind-the-scenes failover dhcpd configuration?

      Any thoughts on the best way to work around this?

      • peoplex
        last edited by

        I'm battling this same issue on 2.4.4, and it is driving me crazy! All the other HA features work great for me.

        I have two ESXi hosts, each running a pfSense node, with a CARP VIP for each VLAN.

        Both my DHCP servers are in the 'normal' state. Each DHCP server is assigning IPs to half of my VMs. Each Unbound instance can resolve only half of the hosts; for the rest, nslookup returns:
        ** server can't find <hostname>: NXDOMAIN

        I'm assuming the issue is caused by DHCP registering hosts only with the local Unbound instance and not with the other one. I'm also assuming there is no DNS zone replication in Unbound :(

        My thoughts at this point are to either write a replication script of my own or move DNS+DHCP to a pair of Windows servers. What were you using prior to pfSense?
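If you do go the replication-script route, one possible shape for it is sketched below: merge the dhcpd.leases files copied from both nodes and emit Unbound local-data/local-data-ptr lines for every active lease that carries a client-hostname. This is hypothetical and untested against a real pfSense box: how the files get copied between nodes is left out, the domain "lab.example" is a placeholder, the regexes only handle the simple lease syntax shown, and it can only publish hostnames that actually appear in at least one peer's lease file.

```python
import re

# ASSUMPTION: simplified dhcpd.leases grammar; within one file, the last
# entry for an IP wins (dhcpd appends updated lease records).
LEASE_RE = re.compile(r"lease\s+(\d+\.\d+\.\d+\.\d+)\s*\{(.*?)\}", re.S)
STATE_RE = re.compile(r"^\s*binding state (\w+);", re.M)  # skips "next binding state"
NAME_RE = re.compile(r'client-hostname "([^"]+)";')


def parse_leases(text: str) -> dict[str, str]:
    """Return {ip: hostname} for the latest entry of each active, named lease."""
    leases: dict[str, str] = {}
    for ip, body in LEASE_RE.findall(text):
        state = STATE_RE.search(body)
        name = NAME_RE.search(body)
        if state and state.group(1) == "active" and name:
            leases[ip] = name.group(1)
        else:
            leases.pop(ip, None)  # free/expired or unnamed entries drop the IP
    return leases


def merge_peers(*lease_files: str) -> dict[str, str]:
    """Union of the peers' views; a later file wins if both name the same IP.

    No timestamp reconciliation is attempted.
    """
    merged: dict[str, str] = {}
    for text in lease_files:
        merged.update(parse_leases(text))
    return merged


def to_unbound(leases: dict[str, str], domain: str = "lab.example") -> list[str]:
    """Render forward and reverse records in unbound.conf local-data syntax."""
    lines = []
    for ip, host in sorted(leases.items()):
        fqdn = f"{host}.{domain}."
        lines.append(f'local-data: "{fqdn} A {ip}"')
        lines.append(f'local-data-ptr: "{ip} {fqdn}"')
    return lines
```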

          • jimp Rebel Alliance Developer Netgate
          last edited by

          Unfortunately, this is 100% on the ISC DHCP daemon. The failover mechanism doesn't (always?) share hostnames between peers, so neither node knows about all of the hostnames, and therefore neither can publish them all in DNS.

          The best workaround is to run a real BIND or other suitable DNS server and set up DHCP dynamic DNS registration against it, rather than relying on the firewall to handle it.
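For a sense of what DHCP DNS registration against a real DNS server involves: the DHCP server sends an RFC 2136 dynamic update to the authoritative server instead of relying on the firewall's local resolver. The hypothetical helper below only builds the equivalent input for BIND's nsupdate client so the update is easy to read; the server address 10.0.0.53, the zone lab.example, and the TTL are placeholders, and a real deployment would also authenticate updates with a TSIG key.

```python
def nsupdate_script(hostname: str, ip: str, zone: str = "lab.example",
                    server: str = "10.0.0.53", ttl: int = 3600) -> str:
    """Build nsupdate commands that replace the A record for one DHCP client.

    All defaults are placeholders, not values from any real deployment.
    """
    fqdn = f"{hostname}.{zone}."
    commands = [
        f"server {server}",
        f"zone {zone}",
        f"update delete {fqdn} A",          # drop any stale address first
        f"update add {fqdn} {ttl} A {ip}",  # then publish the current lease
        "send",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(nsupdate_script("web01", "10.0.0.10"), end="")
```

The output could be fed to nsupdate (with -k and a TSIG key file) for manual testing, but in the setup described here dhcpd itself would send these updates.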


            • johnpoz LAYER 8 Global Moderator
            last edited by

            My take on resolving DHCP clients: if you want/need to resolve them, you might as well set up a reservation for each client. That way you always know its IP, it will not change, and you can just set up an actual DNS entry for it instead of having to deal with DHCP registering anything.

            Just another way to skin the cat ;)
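A minimal sketch of the reservation approach, assuming you keep a hypothetical table of (hostname, MAC, IP) entries and generate both sides from it: an ISC dhcpd host declaration that pins the address, and a matching static Unbound record. On pfSense you would normally enter these as DHCP static mappings in the GUI instead; the table contents and the lab.example domain are made up for illustration.

```python
from typing import NamedTuple


class Reservation(NamedTuple):
    hostname: str
    mac: str
    ip: str


def dhcpd_host(r: Reservation) -> str:
    """ISC dhcpd host declaration that always hands this MAC the same IP."""
    return (f"host {r.hostname} {{\n"
            f"  hardware ethernet {r.mac};\n"
            f"  fixed-address {r.ip};\n"
            f"}}")


def unbound_record(r: Reservation, domain: str = "lab.example") -> str:
    """Static A record for the same host (domain is a placeholder)."""
    return f'local-data: "{r.hostname}.{domain}. A {r.ip}"'


if __name__ == "__main__":
    table = [
        Reservation("build01", "00:50:56:aa:bb:01", "10.0.0.21"),
        Reservation("build02", "00:50:56:aa:bb:02", "10.0.0.22"),
    ]
    for r in table:
        print(dhcpd_host(r))
        print(unbound_record(r))
```

Because the DNS entry comes from the table rather than from lease data, it does not matter which failover peer answers the client.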

            If you're running a lab with over 100 devices and you want to be able to resolve their names, it's probably time to run real authoritative name services and have either dhcpd register the names or the clients register themselves.


              • peoplex
              last edited by

              After a lot of testing, I think I fixed my name resolution. For me, it was NTP!

              One of my nodes' clocks was off by a good margin. I know this causes issues with dhcpd. I think this put the cluster in a weird state where both peers handed out IPs but would not exchange hostname data. I fixed this yesterday, but it wasn't until today, when I restarted VMs en masse, that name resolution started working.

              @foobert, maybe it's worth checking the time on each pfSense node?
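One way to put a number on "checking the time": the standard NTP calculation (RFC 5905) estimates a clock offset from four timestamps: t0 when the client sends a request, t1 when the server receives it, t2 when the server replies, and t3 when the client receives the reply. Measuring each pfSense node against the same reference and comparing the two offsets gives the skew between the nodes. The 1-second tolerance below is an arbitrary placeholder, not a limit taken from dhcpd or pfSense, and the timestamps in the example are made up.

```python
def ntp_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Offset of the server clock relative to the client (RFC 5905).

    t0/t3 are read from the client clock, t1/t2 from the server clock.
    A positive result means the server is ahead of the client.
    """
    return ((t1 - t0) + (t2 - t3)) / 2


def round_trip_delay(t0: float, t1: float, t2: float, t3: float) -> float:
    """Network round-trip time, excluding the server's processing time."""
    return (t3 - t0) - (t2 - t1)


def nodes_skewed(offset_a: float, offset_b: float, tolerance: float = 1.0) -> bool:
    """True if two nodes, measured against the same reference, disagree by
    more than `tolerance` seconds (hypothetical threshold)."""
    return abs(offset_a - offset_b) > tolerance


if __name__ == "__main__":
    # Made-up timestamps: node A's clock is 34 s behind the reference server,
    # node B's clock agrees with it.
    a = ntp_offset(0.0, 35.0, 36.0, 3.0)
    b = ntp_offset(0.0, 1.0, 1.5, 2.5)
    print(a, b, nodes_skewed(a, b))
```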

              Thanks for the informative replies, @jimp and @johnpoz. I understand the mechanism a lot better now!

                • foobert @johnpoz
                last edited by foobert

                @jimp: Thanks for confirming what I'm seeing. I suppose I should follow up with ISC.

                @johnpoz: I completely respect that point of view on reservations. It's just not realistic when I have a dozen worker bees setting up and tearing down stuff every day. They need autonomy without getting me involved constantly.

                At this point, I'm strongly considering going back to dnsmasq; it worked flawlessly for this. I may absorb the headache of running BIND, but I'm not sure it's really worth it for the HA benefit that prompted the change in the first place. "Don't fix what isn't broken." ¯\_(ツ)_/¯

                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.