Netgate Discussion Forum

    HA DNS/Unbound Fails on Backup Node after CARP Failover (pfSense 2.8.0)

    HA/CARP/VIPs
    • empbilly

      Hello everyone,

      I am running two identical pfSense CE 2.8.0 appliances in a CARP High Availability (HA) setup. Both firewalls are physical appliances with multiple VLANs and virtual IPs configured for each segment. We are now using Unbound (DNS Resolver) for internal DNS resolution.

      Scenario:

      • Each node is configured for HA using CARP, with a dedicated SYNC interface.
      • Each VLAN/subnet uses a dedicated CARP VIP as the default gateway, and all CARP states appear to transition correctly during failover.
      • DHCP is handled by Kea (IPv4), with clients set to use the CARP VIP for DNS.
      • Unbound is enabled on both nodes, set to listen on All interfaces.
      • Outbound NAT is set to Manual, with explicit rules to NAT “This Firewall” and “127.0.0.0/8” to the WAN CARP VIP for port 53, as well as rules for all internal subnets.
      • Firewall rules in all relevant LANs allow UDP/TCP 53 to any (for troubleshooting).
      • HA sync is enabled for rules/NAT/etc, and all relevant configs are checked as identical.

      Issue:
      When I put the primary node into CARP persistent maintenance mode, the backup node becomes MASTER for all CARP VIPs (verified via Status > CARP and via ifconfig). However, clients immediately lose DNS resolution. The VIPs are correctly assumed, but DNS requests to the VIP are not answered.

      • Unbound is up and running on the backup, listening on all interfaces, including the CARP VIPs (sockstat and netstat confirm bind on port 53).
      • Outbound NAT rules ensure all traffic from the firewall itself and 127.0.0.0/8 to port 53 is NAT’d to the WAN CARP VIP, and these rules are at the top of the list.
      • No firewall blocks are logged; rules are set to log all port 53 traffic for visibility.
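The CARP check described above (confirming via ifconfig that the backup holds MASTER for every VIP) can be scripted. This is a minimal sketch, assuming the usual FreeBSD `carp: <STATE> vhid <N> ...` line under each interface stanza; the sample text and interface names are hypothetical, not output from this setup.

```python
import re
from typing import Dict, List, Tuple

# An interface stanza starts at column 0, e.g. "igb1.10: flags=...".
IFACE_LINE = re.compile(r"^(\S+): flags=")
# CARP status lines are indented under the interface they belong to.
CARP_LINE = re.compile(r"carp:\s+(MASTER|BACKUP|INIT)\s+vhid\s+(\d+)")

# Hypothetical, abbreviated ifconfig output used only for illustration.
SAMPLE = (
    "igb1: flags=8963<UP,BROADCAST,RUNNING> metric 0 mtu 1500\n"
    "\tcarp: BACKUP vhid 1 advbase 1 advskew 100\n"
    "igb1.10: flags=8963<UP,BROADCAST,RUNNING> metric 0 mtu 1500\n"
    "\tcarp: MASTER vhid 10 advbase 1 advskew 100\n"
)


def carp_states(ifconfig_output: str) -> Dict[Tuple[str, int], str]:
    """Map (interface, vhid) to the CARP state found in ifconfig text."""
    states: Dict[Tuple[str, int], str] = {}
    iface = None
    for line in ifconfig_output.splitlines():
        header = IFACE_LINE.match(line)
        if header:
            iface = header.group(1)
            continue
        carp = CARP_LINE.search(line)
        if carp and iface is not None:
            states[(iface, int(carp.group(2)))] = carp.group(1)
    return states


def not_master(states: Dict[Tuple[str, int], str]) -> List[Tuple[str, int]]:
    """List the (interface, vhid) pairs that are not currently MASTER."""
    return sorted(key for key, state in states.items() if state != "MASTER")
```

Feeding the real `ifconfig` output from the backup node into `carp_states` after a failover and checking that `not_master` returns an empty list would confirm the VIP transition independently of the GUI.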

      Additional info:

      • This only occurs after a failover event. DNS on the primary node (before maintenance) works flawlessly.
      • Unbound is NOT set to use “strict interface binding.”
      • Syncing settings via XMLRPC works fine for rules and NAT.

      Troubleshooting steps tried:

      • Restarting Unbound on the backup after failover resolves the issue (but obviously not practical in production).
      • Switching “Network Interfaces” in Unbound between “All” and explicit selection (including all VIPs and LANs) does not help.
      • Re-applying firewall rules and NAT rules post-failover (no effect).
      • Adjusting/refreshing Outbound NAT (no effect).

      Questions:

      1. Is this a known issue with Unbound and CARP VIPs in pfSense 2.8.0? Is there any workaround to avoid having to manually restart Unbound after each failover?
      2. Is there any hidden setting or system tunable that controls Unbound’s interface binding after CARP VIP transition?
      3. Should I consider using “Bind only to CARP VIPs” instead of “All” in the Unbound config?
      4. Any other troubleshooting suggestions for making Unbound always respond correctly to VIP traffic after failover?

      Any guidance or insight would be greatly appreciated. I’m happy to provide logs, packet captures, or configs if needed.

      Thanks in advance!

      https://eliasmoraispereira.wordpress.com/

      • netblues @empbilly

        @empbilly I just tested this on 25.07rc (which is almost the same as 2.8.0).
        I did a manual failover and ran nslookup against the LAN VIP.

        Unbound worked with no apparent issues.
        It listens on all interfaces, uses all WAN interfaces for outgoing queries, and runs in Python mode because of pfBlockerNG.
        No special configuration exists.

        I suggest querying the LAN interface addresses directly (not the VIP) with nslookup or dig
        and comparing the results before and after the failover.
        The secondary should answer requests at all times on its own LAN interface address.
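The comparison suggested here (query each node's LAN address and the VIP, before and after failover) can also be done with a small script instead of nslookup or dig. This is a rough sketch using only the Python standard library; the function names are made up for illustration and the addresses in the usage note are placeholders.

```python
import socket
import struct
from typing import Dict, Tuple


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD bit set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_header(response: bytes) -> Tuple[int, int, int]:
    """Return (query_id, rcode, answer_count) from a DNS response header."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack(
        "!HHHHHH", response[:12]
    )
    return query_id, flags & 0x000F, ancount


def probe(server: str, name: str = "google.com", timeout: float = 2.0) -> str:
    """Send one UDP query to server:53 and summarize what came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout (no answer)"
        except OSError as exc:
            return f"error: {exc}"
    if len(data) < 12:
        return "short response"
    _id, rcode, ancount = parse_header(data)
    return f"rcode={rcode} answers={ancount}"


def report(targets: Dict[str, str]) -> None:
    """Probe every labeled address and print one line per target."""
    for label, address in targets.items():
        print(f"{label:16} {address:15} {probe(address)}")
```

Calling `report({"LAN CARP VIP": "192.0.2.1", "primary LAN": "192.0.2.2", "backup LAN": "192.0.2.3"})` (placeholder addresses) once before and once after the failover shows whether the backup answers on its own address, on the VIP, or on neither. A timeout points at the listener or filtering, while an answer with a non-zero rcode such as SERVFAIL (2) suggests Unbound is reachable but cannot complete its upstream lookups.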

        • empbilly @netblues

          @netblues

          One thing I forgot to mention in the first post is that for some VLANs, the DNS server given to clients is the IP address of our Active Directory server.

          I don't know if that's the reason for the problem, especially since it works normally on pfmaster.

          nslookup and dig return an error saying they could not resolve the domain (google.com, for example).

          • netblues @empbilly

            @empbilly What exactly can't resolve?

            Active Directory DNS has nothing to do with what we are testing.

            • empbilly @netblues

              @netblues

              The problem was with the outbound NAT rules. I had disabled the rule for our AD's VLAN so that the AD server would reach the internet using its own IP address rather than the CARP VIP, but I didn't realize that this would interfere with DNS after a failover.

              After re-enabling it, everything worked correctly.
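With Manual outbound NAT, a subnet that has no enabled rule is easy to overlook. The sketch below shows one way to audit for that, assuming the fix was re-enabling the outbound NAT rule for the AD VLAN. The rule model, subnets, and addresses are hypothetical and do not reflect pfSense's actual configuration format.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List


@dataclass
class OutboundNatRule:
    """Hypothetical stand-in for one manual outbound NAT entry."""
    source: str          # source network in CIDR form
    translation: str     # address the traffic is translated to
    enabled: bool = True


def uncovered_subnets(subnets: List[str], rules: List[OutboundNatRule]) -> List[str]:
    """Return the internal subnets that no enabled rule covers."""
    enabled_sources = [ip_network(r.source) for r in rules if r.enabled]
    missing = []
    for subnet in subnets:
        net = ip_network(subnet)
        covered = any(
            net.version == src.version and net.subnet_of(src)
            for src in enabled_sources
        )
        if not covered:
            missing.append(subnet)
    return missing


# Illustrative data only: a LAN and an AD VLAN, with the AD rule disabled.
INTERNAL = ["10.0.10.0/24", "10.0.20.0/24"]
RULES = [
    OutboundNatRule("127.0.0.0/8", "198.51.100.10"),
    OutboundNatRule("10.0.10.0/24", "198.51.100.10"),
    OutboundNatRule("10.0.20.0/24", "198.51.100.10", enabled=False),
]

print(uncovered_subnets(INTERNAL, RULES))  # ['10.0.20.0/24']
```

Running a check like this against both nodes after an XMLRPC sync would flag a disabled or missing rule before it shows up as a failover symptom.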

              Thanks for your help!

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.