HA DNS/Unbound Fails on Backup Node after CARP Failover (pfSense 2.8.0)
-
Hello everyone,
I am running two identical pfSense CE 2.8.0 appliances in a CARP High Availability (HA) setup. Both firewalls are physical appliances with multiple VLANs and virtual IPs configured for each segment. We are now using Unbound (DNS Resolver) for internal DNS resolution.
Scenario:
- Each node is configured for HA using CARP, with a dedicated SYNC interface.
- Each VLAN/subnet uses a dedicated CARP VIP as the default gateway, and all CARP states appear to transition correctly during failover.
- DHCP is handled by Kea (IPv4), with clients set to use the CARP VIP for DNS.
- Unbound is enabled on both nodes, set to listen on All interfaces.
- Outbound NAT is set to Manual, with explicit rules to NAT “This Firewall” and “127.0.0.0/8” to the WAN CARP VIP for port 53, as well as rules for all internal subnets.
- Firewall rules in all relevant LANs allow UDP/TCP 53 to any (for troubleshooting).
- HA sync is enabled for rules/NAT/etc, and all relevant configs are checked as identical.
Issue:
When I put the primary node into CARP persistent maintenance mode, the backup node becomes MASTER for all CARP VIPs (verified via Status > CARP and via ifconfig). However, clients immediately lose DNS resolution. The VIPs are correctly assumed, but DNS requests to the VIP are not answered.
- Unbound is up and running on the backup, listening on all interfaces, including the CARP VIPs (sockstat and netstat confirm bind on port 53).
- Outbound NAT rules ensure all traffic from the firewall itself and 127.0.0.0/8 to port 53 is NAT’d to the WAN CARP VIP, and these rules are at the top of the list.
- No firewall blocks are logged; rules are set to log all port 53 traffic for visibility.
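The CARP check described above (confirming via ifconfig that the backup really holds MASTER on every VIP) could be scripted roughly as follows. This is only a sketch: it assumes the FreeBSD-style `carp: STATE vhid N ...` lines that ifconfig prints, and the interface names and addresses in `SAMPLE` are made up for illustration, not taken from this setup.

```python
import re

# Interface header lines look like "igb1.10: flags=..." (assumed FreeBSD format).
IFACE = re.compile(r"^([^\s:]+): flags=")
# Indented CARP status lines look like "carp: MASTER vhid 10 advbase 1 advskew 0".
CARP = re.compile(r"^\s+carp:\s+(MASTER|BACKUP|INIT)\s+vhid\s+(\d+)")


def carp_states(ifconfig_text):
    """Return {(interface, vhid): state} for every CARP line in ifconfig output."""
    states = {}
    iface = None
    for line in ifconfig_text.splitlines():
        header = IFACE.match(line)
        if header:
            iface = header.group(1)
            continue
        carp = CARP.match(line)
        if carp and iface is not None:
            states[(iface, int(carp.group(2)))] = carp.group(1)
    return states


def not_master(ifconfig_text):
    """List the (interface, vhid) pairs that are not MASTER on this node."""
    return sorted(k for k, s in carp_states(ifconfig_text).items() if s != "MASTER")


# Hypothetical ifconfig excerpt, for illustration only.
SAMPLE = (
    "igb1: flags=8943<UP,BROADCAST,RUNNING> metric 0 mtu 1500\n"
    "\tinet 192.0.2.2 netmask 0xffffff00 broadcast 192.0.2.255\n"
    "\tcarp: BACKUP vhid 1 advbase 1 advskew 100\n"
    "igb1.10: flags=8943<UP,BROADCAST,RUNNING> metric 0 mtu 1500\n"
    "\tcarp: MASTER vhid 10 advbase 1 advskew 0\n"
)

for key, state in sorted(carp_states(SAMPLE).items()):
    print(key, state)
```

On a real node the text would come from running ifconfig on the backup after failover; an empty result from `not_master(...)` would mean every VIP on that node is MASTER.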
Additional info:
- This only occurs after a failover event. DNS on the primary node (before maintenance) works flawlessly.
- Unbound is NOT set to use “strict interface binding.”
- Syncing settings via XMLRPC works fine for rules and NAT.
Troubleshooting steps tried:
- Restarting Unbound on the backup after failover resolves the issue (but obviously not practical in production).
- Switching “Network Interfaces” in Unbound between “All” and explicit selection (including all VIPs and LANs) does not help.
- Re-applying firewall rules and NAT rules post-failover (no effect).
- Adjusting/refreshing Outbound NAT (no effect).
Questions:
- Is this a known issue with Unbound and CARP VIPs in pfSense 2.8.0? Is there any workaround to avoid having to manually restart Unbound after each failover?
- Is there any hidden setting or system tunable that controls Unbound’s interface binding after CARP VIP transition?
- Should I consider using “Bind only to CARP VIPs” instead of “All” in the Unbound config?
- Any other troubleshooting suggestions for making Unbound always respond correctly to VIP traffic after failover?
Any guidance or insight would be greatly appreciated. I’m happy to provide logs, packet captures, or configs if needed.
Thanks in advance!
-
@empbilly I just tested this on 25.07rc (which is almost the same as 2.8.0).
I did a manual failover and ran nslookup against the LAN VIP. Unbound worked with no apparent issues.
Unbound listens on all interfaces, uses all WAN interfaces for queries, and runs in Python mode because of pfBlockerNG.
No special configuration exists.
I suggest querying the LAN interface addresses directly (not the VIP) with nslookup or dig
and seeing what happens before and after the failover.
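If nslookup or dig are not handy on a client, the same before/after comparison could be sketched in Python with a hand-built DNS query sent to each address in turn. Everything here is illustrative: the function names are made up, and the addresses in the usage note are placeholders from the documentation range, not the real LAN or VIP addresses.

```python
import socket
import struct


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_reply(data, qid=0x1234):
    """Return (rcode, answer_count) from the reply header, or None if invalid."""
    if len(data) < 12:
        return None
    rid, flags, _qd, answers, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    if rid != qid or not flags & 0x8000:  # wrong ID or not a response
        return None
    return flags & 0x000F, answers


def probe(server, name="google.com", timeout=2.0):
    """Send one UDP query to server:53; None means no usable answer arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return None
    return parse_reply(data)


def report(servers, name="google.com"):
    """Print one line per server so runs before and after failover can be compared."""
    for server in servers:
        result = probe(server, name)
        if result is None:
            print(f"{server}: no answer")
        else:
            print(f"{server}: rcode={result[0]} answers={result[1]}")
```

Calling something like `report(["192.0.2.2", "192.0.2.3", "192.0.2.1"])` (primary LAN address, secondary LAN address, LAN VIP, all placeholders) once before and once after the failover would show which address stops answering.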
The secondary should answer requests on its local LAN interface at all times.
-
One thing I forgot to mention in the first post is that for some VLANs, the DNS server is the IP address of our Active Directory server.
I don't know if that's the cause of the problem, especially since everything works normally on pfmaster.
nslookup and dig return an error saying they couldn't resolve the domain, for example, google.com.
-
@empbilly What exactly can't resolve?
Active Directory DNS has nothing to do with what we are testing.
-
The problem was with the outbound NAT rules. I had disabled the rule for our AD's VLAN so that it would connect to the internet using its own IP address rather than the CARP VIP, but I didn't realize that this would interfere.
After re-enabling it, everything worked correctly.
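For anyone checking for the same mistake, disabled outbound NAT rules could be listed from a config backup with a short script like the one below. Treat it as a sketch under assumptions: it assumes the rules live under `nat/outbound/rule` in config.xml and that a disabled rule carries a `disabled` element, which should be verified against your own backup. The sample XML and its subnets are invented for illustration.

```python
import xml.etree.ElementTree as ET


def disabled_outbound_rules(config_xml):
    """Return (interface, source network, description) for each disabled rule.

    Assumes outbound NAT rules are stored at nat/outbound/rule and that a
    disabled rule contains a <disabled> child element.
    """
    root = ET.fromstring(config_xml)
    found = []
    for rule in root.findall("./nat/outbound/rule"):
        if rule.find("disabled") is not None:
            found.append((
                rule.findtext("interface", ""),
                rule.findtext("source/network", ""),
                rule.findtext("descr", ""),
            ))
    return found


# Hypothetical config excerpt, for illustration only.
SAMPLE = """<pfsense><nat><outbound><mode>advanced</mode>
<rule><interface>wan</interface><source><network>10.0.20.0/24</network></source>
<descr>AD VLAN</descr><disabled></disabled></rule>
<rule><interface>wan</interface><source><network>10.0.10.0/24</network></source>
<descr>LAN</descr></rule>
</outbound></nat></pfsense>"""

for iface, network, descr in disabled_outbound_rules(SAMPLE):
    print(f"disabled: {iface} {network} ({descr})")
```

Running it against backups from both nodes and comparing the results would also confirm that the rule state actually synced to the secondary.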
Thanks for your help!