Domain overrides not working (was working until I noticed just now)
-
@kevindd992002 All I can say is that domain overrides are working fine for me across dnsmasq and OpenVPN site-to-site tunnels in both 2.5.2 and 2.6.0. I don't have experience with unbound or WireGuard. You say your setup "was working", so something changed. At a guess, a firewall is swallowing those replies or the route back to the querier is now incorrect. That doesn't narrow it down much, I know, but each of those possibilities can be investigated with tests: log the default block/pass rules, add particular rules to log, and use packet capture on pfSense until you find those reply packets, then follow them until they go awry. If you're not sure the queries are arriving in the first place and generating reply packets, start with that leg first. Dnsmasq can log all queries, which makes that part easy; I don't know about unbound. Good luck!
-
Ok, I've had time to work on this again and I've isolated the problem to the home.arpa domain! When queries from the Site 2 and 3 pfSense boxes are for the home.arpa domain, unbound doesn't use the override rule at all (confirmed by packet captures). It simply drops them.
Site 1 = home.arpa
Site 2 = condo.arpa
Site 3 = jojo.arpa
If I create a domain override for any other domain on Sites 2 and 3 and point it to the DNS resolver in Site 1, everything works fine! The queries from the Site 2 and 3 pfSense boxes (localhost) are forwarded just fine.
So this has to be a change on the pfSense software side disallowing the use of home.arpa domain overrides for some reason, though I'm not sure what the reason is, because that domain is well known to be intended for, well, "home" use.
-
Any help here please?
-
@kevindd992002 Then change Site1 also and look if your assumption was correct?
-
@bob-dig change Site 1 to what? Are you suggesting changing the whole local domain of site 1?
-
Also, the packet captures on both sites 2 and 3 do not show the packets being forwarded to the destination dns server when the domain override is home.arpa. So technically you can ignore site 1.
-
@kevindd992002 Have you added the other subnets to the Access List in Site 1?
"By default, IPv4 and IPv6 networks residing on internal interfaces of this firewall are permitted. Additional networks must be allowed manually."
-
@steveits Yes, I did. Also, the WireGuard service does this automatically for you. And like I said, it works when I forward to the Site 1 DNS resolver for another domain, so ACLs are not the problem.
-
@kevindd992002 Hmm, home.arpa is a special domain (https://www.iana.org/domains/arpa) so maybe that is confusing things? I wonder if myhome.home.arpa or something like that would behave differently.
-
@steveits said in Domain overrides not working (was working until I noticed just now):
@kevindd992002 Hmm, home.arpa is a special domain (https://www.iana.org/domains/arpa) so maybe that is confusing things? I wonder if myhome.home.arpa or something like that would behave differently.
Exactly. home.arpa is what people should be using in their home environment. That is its purpose. And I'm 100% sure this was working not too long ago so something changed with unbound in maybe 2.5.2 and above.
-
@jimp I replied to the bug I filed here:
https://redmine.pfsense.org/issues/13065?tab=history
-
Any other ideas here?
When I try to do a DNS Lookup on the firewall of sites 2 and 3, I don't even see the home.arpa domain being listed under Status -> DNS Resolver. This is another indication that the query is somehow being dropped if it's for the home.arpa domain.
-
@kevindd992002 said in Domain overrides not working (was working until I noticed just now):
I don't even see the home.arpa domain being listed under Status -> DNS Resolver
Well then you don't have your domain override set up... Once you set up a domain override it would be there all the time; ask unbound how it would look up home.arpa.
Actually do a query and it would be listed..
Once you do a lookup - it would be listed in the dns resolver status.
And then if I ask unbound again - it would be listed in cache.
Be it that IP even answers for that domain or not, etc.. - mine sure doesn't - I don't even have 192.168.9.42 device..
-
As I mentioned on the Redmine entry there is nothing special about home.arpa in pfSense other than it being the default domain name under System > General Setup. When it is that domain, it has special settings in unbound automatically but if you have changed that then it wouldn't treat it any differently.
You'll need to post a lot more of your setup here. It could be any number of things. Missing routes in the routing table for the firewall itself to reach places both ways. Missing ACLs in Unbound to allow queries from the other sites. Something wrong in your unbound config or domain override. There are lots of moving parts to get this working between sites and it's even harder with WireGuard since more of it is manually managed than with other methods.
- Check the routing table on each node and ensure it has routes over the appropriate WireGuard interfaces for the appropriate destinations
- Check the WireGuard interface firewall rules to ensure the traffic will pass between the hosts (remember to cover both the LAN(s) and the WireGuard interface addresses)
- Check if you can ping the remote firewall LAN addresses with a source of Localhost and the LAN since that's how you setup Unbound, e.g.
ping -S 127.0.0.1 <other fw LAN IP address>
and
ping -S <this LAN IP address> <other fw LAN IP address>
- Check Services > DNS Resolver, Access Lists tab and ensure there are entries there for the other firewall LANs and the WireGuard interface subnets. Some of those may be automatically added, check /var/unbound/access_lists.conf to confirm
- When you ping or send traffic across, check the contents of the state table to ensure the states are on the correct interfaces with the expected addresses
- Your outbound NAT rules are over-matching, they will NAT traffic out an interface with its own address, which can break some things. You have it set to port 53 but even so it's better to make sure you aren't doing it unnecessarily. Make a specific rule for localhost as a source that will NAT all outbound, not just port 53. You shouldn't need to NAT traffic from the LAN that should be handled by routing, no need for NAT.
- Compare the contents of /var/unbound/host_entries.conf and /var/unbound/domainoverrides.conf and look for instances of the domains in question and ensure they match up as expected.
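For that last check, here is a minimal sketch of listing what an override file actually contains. It assumes the file uses Unbound's usual forward-zone or stub-zone clauses, with one name: line followed by forward-addr: or stub-addr: lines. The function name override_zones is made up for illustration, and it needs a Bourne-style shell (start sh first if the login shell is tcsh).

```shell
#!/bin/sh
# Print "zone address" pairs from an Unbound config fragment read on stdin.
# Assumption: each zone clause has a name: line before its address lines.
override_zones() {
    awk '
        # Remember the most recent zone name, without surrounding quotes.
        $1 == "name:" { gsub(/"/, "", $2); zone = $2 }
        # Emit one line per forwarding or stub target of that zone.
        $1 == "forward-addr:" || $1 == "stub-addr:" { print zone, $2 }
    '
}
```

Run as `override_zones < /var/unbound/domainoverrides.conf`; the domain in question should show up next to the address of the remote resolver.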
If all else fails, from all of the firewalls involved post the entire contents of /var/unbound/unbound.conf, /var/unbound/domainoverrides.conf, /var/unbound/host_entries.conf, /var/unbound/access_lists.conf, the output of ifconfig -a and netstat -rn along with the contents of /tmp/rules.debug (at least for the wireguard interfaces and localhost). You can redact private info as long as it's done consistently so that people can identify the same address in different places (e.g. 192.168.10.x -> xxx.xxx.xx.x, 192.168.20.x -> xxx.xxx.yy.x, and so on).
-
@johnpoz said in Domain overrides not working (was working until I noticed just now):
@kevindd992002 said in Domain overrides not working (was working until I noticed just now):
I don't even see the home.arpa domain being listed under Status -> DNS Resolver
Well then you don't have your domain override set up... Once you set up a domain override it would be there all the time; ask unbound how it would look up home.arpa.
Actually do a query and it would be listed..
Once you do a lookup - it would be listed in the dns resolver status.
And then if I ask unbound again - it would be listed in cache.
Be it that IP even answers for that domain or not, etc.. - mine sure doesn't - I don't even have 192.168.9.42 device..
I do have it though:
[2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: unbound-control -c /var/unbound/unbound.conf lookup home.arpa
The following name servers are used for lookup of home.arpa.
forwarding request:
Delegation with 0 names, of which 0 can be examined to query further addresses.
It provides 1 IP addresses.
192.168.10.1 not in infra cache.
But I still don't see anything in Status -> DNS Resolver for either 192.168.10.1 or home.arpa. When I do a second lookup, it's not being put into the cache either:
[2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: unbound-control -c /var/unbound/unbound.conf lookup home.arpa
The following name servers are used for lookup of home.arpa.
forwarding request:
Delegation with 0 names, of which 0 can be examined to query further addresses.
It provides 1 IP addresses.
192.168.10.1 not in infra cache.
-
@kevindd992002 seems you're not actually doing a query to unbound then... If you were, it would answer, or at least show in the cache that it talked to the NS you're pointing it to.
Let's see your actual DNS query to something in home.arpa.
Here I set up DNS query and reply logging in the custom options box:
server:
log-queries: yes
log-replies: yes
Then I did a query for something in home.arpa:
$ dig @192.168.9.253 something.home.arpa
; <<>> DiG 9.16.27 <<>> @192.168.9.253 something.home.arpa
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 2609
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;something.home.arpa. IN A
;; Query time: 0 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Tue Apr 19 00:28:56 Central Daylight Time 2022
;; MSG SIZE rcvd: 48
If you're not seeing it actually put into the cache, then unbound never saw a query for it and never had to cache it.
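When repeating this test against several resolvers, it can help to reduce each dig run to its response code. The sketch below is only an illustration: dig_status is a hypothetical helper name, it reads dig output on stdin, and it relies on the "status:" field of the header line shown above. It needs a Bourne-style shell.

```shell
#!/bin/sh
# Print the response code (NOERROR, NXDOMAIN, SERVFAIL, ...) taken from the
# ";; ->>HEADER<<-" line of dig output read on stdin.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}
```

For example, `dig @127.0.0.1 something.home.arpa | dig_status` prints just the status, which can then be compared with the same query sent straight to the remote name server.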
-
@jimp said in Domain overrides not working (was working until I noticed just now):
As I mentioned on the Redmine entry there is nothing special about home.arpa in pfSense other than it being the default domain name under System > General Setup. When it is that domain, it has special settings in unbound automatically but if you have changed that then it wouldn't treat it any differently.
You'll need to post a lot more of your setup here. It could be any number of things. Missing routes in the routing table for the firewall itself to reach places both ways. Missing ACLs in Unbound to allow queries from the other sites. Something wrong in your unbound config or domain override. There are lots of moving parts to get this working between sites and it's even harder with WireGuard since more of it is manually managed than with other methods.
- Check the routing table on each node and ensure it has routes over the appropriate WireGuard interfaces for the appropriate destinations
- Check the WireGuard interface firewall rules to ensure the traffic will pass between the hosts (remember to cover both the LAN(s) and the WireGuard interface addresses)
- Check if you can ping the remote firewall LAN addresses with a source of Localhost and the LAN since that's how you setup Unbound, e.g.
ping -S 127.0.0.1 <other fw LAN IP address>
and
ping -S <this LAN IP address> <other fw LAN IP address>
- Check Services > DNS Resolver, Access Lists tab and ensure there are entries there for the other firewall LANs and the WireGuard interface subnets. Some of those may be automatically added, check /var/unbound/access_lists.conf to confirm
- When you ping or send traffic across, check the contents of the state table to ensure the states are on the correct interfaces with the expected addresses
- Your outbound NAT rules are over-matching, they will NAT traffic out an interface with its own address, which can break some things. You have it set to port 53 but even so it's better to make sure you aren't doing it unnecessarily. Make a specific rule for localhost as a source that will NAT all outbound, not just port 53. You shouldn't need to NAT traffic from the LAN that should be handled by routing, no need for NAT.
- Compare the contents of /var/unbound/host_entries.conf and /var/unbound/domainoverrides.conf and look for instances of the domains in question and ensure they match up as expected.
If all else fails, from all of the firewalls involved post the entire contents of /var/unbound/unbound.conf, /var/unbound/domainoverrides.conf, /var/unbound/host_entries.conf, /var/unbound/access_lists.conf, the output of ifconfig -a and netstat -rn along with the contents of /tmp/rules.debug (at least for the wireguard interfaces and localhost). You can redact private info as long as it's done consistently so that people can identify the same address in different places (e.g. 192.168.10.x -> xxx.xxx.xx.x, 192.168.20.x -> xxx.xxx.yy.x, and so on).
Let me provide as much information as I can. Let's forget Site 3 for now and focus on Site 1 (main) and Site 2 (remote). I should also say that Site 1 has a domain override for condo.arpa and it's working fine. The one that's not working is the Site 2 domain override for home.arpa. Both sites have very similar configs.
- Routing tables are fine. I have the static routes set on both sides:
Site1:
Site 2:
- WG interface FW rules are also fine:
Site 1:
Site 2:
NOTE: I have 0.0.0.0/0 there because it is needed when I'm routing local traffic to the Internet via the remote site (or vice versa).
- Ping
From site 1:
[2.6.0-RELEASE][root@pfSense.home.arpa]/root: ping -S 127.0.0.1 192.168.20.1
PING 192.168.20.1 (192.168.20.1) from 127.0.0.1: 56 data bytes
64 bytes from 192.168.20.1: icmp_seq=0 ttl=64 time=6.967 ms
64 bytes from 192.168.20.1: icmp_seq=1 ttl=64 time=4.985 ms
64 bytes from 192.168.20.1: icmp_seq=2 ttl=64 time=5.029 ms
64 bytes from 192.168.20.1: icmp_seq=3 ttl=64 time=4.638 ms
[2.6.0-RELEASE][root@pfSense.home.arpa]/root: ping -S 192.168.10.1 192.168.20.1
PING 192.168.20.1 (192.168.20.1) from 192.168.10.1: 56 data bytes
64 bytes from 192.168.20.1: icmp_seq=0 ttl=64 time=6.866 ms
64 bytes from 192.168.20.1: icmp_seq=1 ttl=64 time=4.910 ms
64 bytes from 192.168.20.1: icmp_seq=2 ttl=64 time=4.991 ms
64 bytes from 192.168.20.1: icmp_seq=3 ttl=64 time=4.873 ms
From site 2:
[2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: ping -S 127.0.0.1 192.168.10.1
PING 192.168.10.1 (192.168.10.1) from 127.0.0.1: 56 data bytes
64 bytes from 192.168.10.1: icmp_seq=0 ttl=64 time=5.569 ms
64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=4.970 ms
64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=4.767 ms
64 bytes from 192.168.10.1: icmp_seq=3 ttl=64 time=4.899 ms
[2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: ping -S 192.168.20.1 192.168.10.1
PING 192.168.10.1 (192.168.10.1) from 192.168.20.1: 56 data bytes
64 bytes from 192.168.10.1: icmp_seq=0 ttl=64 time=5.584 ms
64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=7.065 ms
64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=6.707 ms
64 bytes from 192.168.10.1: icmp_seq=3 ttl=64 time=4.988 ms
- ACL's are fine -> see attached files
NOTE: 10.0.3.0/29 and 10.0.3.0/30 are redundant, I know, but I had to put in a manual entry because there seems to be a bug in WireGuard for DNS Resolver ACL's (which I already reported here)
- States look ok and are on the correct interfaces with the expected addresses when pinging from LAN. Here's an example when pinging from localhost:
Site 1:
Site 2:
- I didn't know there was such a thing as over-matching. I thought it's always better to be more specific in NAT or FW rules. What is the disadvantage of over matching? I changed the outbound NAT rules as per your suggestion but want to understand more about why I needed to do this:
Site 1:
Site 2:
- Contents of /var/unbound/host_entries.conf and /var/unbound/domainoverrides.conf -> see attached files
Site 1:
/var/unbound/host_entries.conf -> no instances of condo.arpa, which is expected (because the domain override should take care of this)
/var/unbound/domainoverrides.conf -> condo.arpa domain override is there
Site 2:
/var/unbound/host_entries.conf -> no instances of home.arpa, which is expected (because the domain override should take care of this)
/var/unbound/domainoverrides.conf -> home.arpa domain override is there
So with all this information, I guess I'm already at the "if all else fails" stage. Here are the contents of the files you mentioned for both Sites 1 and 2:
-
@johnpoz said in Domain overrides not working (was working until I noticed just now):
@kevindd992002 seems you're not actually doing a query to unbound then... If you were, it would answer, or at least show in the cache that it talked to the NS you're pointing it to.
Let's see your actual DNS query to something in home.arpa.
Here I set up DNS query and reply logging in the custom options box:
server:
log-queries: yes
log-replies: yes
Then I did a query for something in home.arpa:
$ dig @192.168.9.253 something.home.arpa
; <<>> DiG 9.16.27 <<>> @192.168.9.253 something.home.arpa
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 2609
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;something.home.arpa. IN A
;; Query time: 0 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Tue Apr 19 00:28:56 Central Daylight Time 2022
;; MSG SIZE rcvd: 48
If you're not seeing it actually put into the cache, then unbound never saw a query for it and never had to cache it.
Yes, that's exactly what's happening. From what I can see, the query is not even being generated on the unbound side.
Ok, so I added those custom options and did a query from the site2 pfsense shell as you did:
[2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: dig @127.0.0.1 pfsense.home.arpa
; <<>> DiG 9.16.26 <<>> @127.0.0.1 pfsense.home.arpa
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 53992
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;pfsense.home.arpa. IN A
;; AUTHORITY SECTION:
home.arpa. 10800 IN SOA localhost. nobody.invalid. 1 3600 1200 604800 10800
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Apr 19 13:38:52 PST 2022
;; MSG SIZE rcvd: 105
[2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: dig @192.168.20.1 pfsense.home.arpa
; <<>> DiG 9.16.26 <<>> @192.168.20.1 pfsense.home.arpa
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 23318
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;pfsense.home.arpa. IN A
;; AUTHORITY SECTION:
home.arpa. 10800 IN SOA localhost. nobody.invalid. 1 3600 1200 604800 10800
;; Query time: 1 msec
;; SERVER: 192.168.20.1#53(192.168.20.1)
;; WHEN: Tue Apr 19 13:39:36 PST 2022
;; MSG SIZE rcvd: 105
See how I'm getting NXDOMAIN replies. It seems as if it doesn't see the domain override setting even if it's there. If I query directly against the remote NS server (192.168.10.1), no issues:
[2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: dig @192.168.10.1 pfsense.home.arpa
; <<>> DiG 9.16.26 <<>> @192.168.10.1 pfsense.home.arpa
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46639
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;pfsense.home.arpa. IN A
;; ANSWER SECTION:
pfsense.home.arpa. 3600 IN A 192.168.10.1
;; Query time: 37 msec
;; SERVER: 192.168.10.1#53(192.168.10.1)
;; WHEN: Tue Apr 19 13:41:08 PST 2022
;; MSG SIZE rcvd: 62
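One detail worth checking in the NXDOMAIN replies is the SOA in the authority section, localhost. nobody.invalid., which the direct answer from 192.168.10.1 does not carry. As far as I know, that is the placeholder SOA Unbound attaches to answers from its own built-in default local zones, and the unbound.conf man page lists home.arpa among those defaults; treat both points as assumptions to verify against the Unbound version in use. A minimal sketch of flagging such replies follows; the function name answered_by_local_zone is hypothetical, and it needs a Bourne-style shell.

```shell
#!/bin/sh
# Succeed (exit 0) if the dig output on stdin contains the placeholder SOA
# "localhost. nobody.invalid.", as seen in the NXDOMAIN replies above.
# Assumption: that SOA means the local resolver answered from a local zone
# instead of forwarding the query.
answered_by_local_zone() {
    grep -q 'SOA[[:space:]]*localhost\.[[:space:]]*nobody\.invalid\.'
}
```

For example, `dig @127.0.0.1 pfsense.home.arpa | answered_by_local_zone && echo "answered locally"`. If it does turn out to be a default local zone, the man page describes the nodefault local-zone type as the way to switch one off; whether that is appropriate for the configuration pfSense generates is not something this thread establishes.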
-
@kevindd992002 your site one says its domain is home.arpa
local-zone: "home.arpa." transparent