Netgate Discussion Forum

    Domain overrides not working (was working until I noticed just now)

    DHCP and DNS
    • johnpozJ Offline
      johnpoz LAYER 8 Global Moderator @kevindd992002
      last edited by johnpoz

      @kevindd992002 It seems you're not actually sending a query to Unbound, then. If you were, it would answer, or at least show in the cache that it talked to the NS you're pointing it to.

      Let's see your actual DNS query for something in home.arpa.

      Here I set up DNS query and reply logging in the custom options box:

      server:
      log-queries: yes
      log-replies: yes

      Then I did a query for something in home.arpa:

      $ dig @192.168.9.253 something.home.arpa                              
                                                                            
      ; <<>> DiG 9.16.27 <<>> @192.168.9.253 something.home.arpa            
      ; (1 server found)                                                    
      ;; global options: +cmd                                               
      ;; Got answer:                                                        
      ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 2609             
      ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1  
                                                                            
      ;; OPT PSEUDOSECTION:                                                 
      ; EDNS: version: 0, flags:; udp: 4096                                 
      ;; QUESTION SECTION:                                                  
      ;something.home.arpa.           IN      A                             
                                                                            
      ;; Query time: 0 msec                                                 
      ;; SERVER: 192.168.9.253#53(192.168.9.253)                            
      ;; WHEN: Tue Apr 19 00:28:56 Central Daylight Time 2022               
      ;; MSG SIZE  rcvd: 48
      

      query.jpg

      If you're not seeing it put into the cache, then Unbound never saw a query for it and never had to cache it.

      An intelligent man is sometimes forced to be drunk to spend time with his fools
      If you get confused: Listen to the Music Play
      Please don't Chat/PM me for help, unless mod related
      SG-4860 25.07.1 | Lab VMs 2.8.1, 25.07.1

      • K Offline
        kevindd992002 @jimp
        last edited by

        @jimp said in Domain overrides not working (was working until I noticed just now):

        As I mentioned on the Redmine entry there is nothing special about home.arpa in pfSense other than it being the default domain name under System > General Setup. When it is that domain, it has special settings in unbound automatically but if you have changed that then it wouldn't treat it any differently.

        You'll need to post a lot more of your setup here. It could be any number of things. Missing routes in the routing table for the firewall itself to reach places both ways. Missing ACLs in Unbound to allow queries from the other sites. Something wrong in your unbound config or domain override. There are lots of moving parts to get this working between sites and it's even harder with WireGuard since more of it is manually managed than with other methods.

        • Check the routing table on each node and ensure it has routes over the appropriate WireGuard interfaces for the appropriate destinations
        • Check the WireGuard interface firewall rules to ensure the traffic will pass between the hosts (remember to cover both the LAN(s) and the WireGuard interface addresses)
        • Check if you can ping the remote firewall LAN addresses with a source of Localhost and the LAN since that's how you setup Unbound, e.g. ping -S 127.0.0.1 <other fw LAN IP address> and ping -S <this LAN IP address> <other fw LAN IP address>
        • Check Services > DNS Resolver, Access Lists tab and ensure there are entries there for the other firewall LANs and the WireGuard interface subnets. Some of those may be automatically added, check /var/unbound/access_lists.conf to confirm
        • When you ping or send traffic across, check the contents of the state table to ensure the states are on the correct interfaces with the expected addresses
        • Your outbound NAT rules are over-matching, they will NAT traffic out an interface with its own address, which can break some things. You have it set to port 53 but even so it's better to make sure you aren't doing it unnecessarily. Make a specific rule for localhost as a source that will NAT all outbound, not just port 53. You shouldn't need to NAT traffic from the LAN that should be handled by routing, no need for NAT.
        • Compare the contents of /var/unbound/host_entries.conf and /var/unbound/domainoverrides.conf and look for instances of the domains in question and ensure they match up as expected.

        If all else fails, from all of the firewalls involved post the entire contents of /var/unbound/unbound.conf, /var/unbound/domainoverrides.conf, /var/unbound/host_entries.conf, /var/unbound/access_lists.conf, the output of ifconfig -a and netstat -rn along with the contents of /tmp/rules.debug (at least for the wireguard interfaces and localhost). You can redact private info as long as it's done consistently so that people can identify the same address in different places (e.g. 192.168.10.x -> xxx.xxx.xx.x, 192.168.20.x -> xxx.xxx.yy.x, and so on).
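The consistent-redaction advice above can be sketched in a few lines of Python. This is only an illustration, not a pfSense tool: the `redact` helper and the `net1`, `net2` placeholder labels are hypothetical, and the sketch assumes it is enough to hide the first three octets of each IPv4 address while leaving loopback addresses readable.

```python
import re

# Matches dotted-quad IPv4 addresses such as 192.168.10.1
IPV4 = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")

def redact(text, mapping=None):
    """Replace the first three octets of each IPv4 address with a stable label.

    Pass the same `mapping` dict for every file so that one network always
    gets the same placeholder (hypothetical labels: net1, net2, ...).
    """
    mapping = {} if mapping is None else mapping

    def repl(match):
        if match.group(1) == "127":
            return match.group(0)  # keep loopback addresses readable
        prefix = ".".join(match.group(1, 2, 3))
        if prefix not in mapping:
            mapping[prefix] = f"net{len(mapping) + 1}"
        return f"{mapping[prefix]}.{match.group(4)}"

    return IPV4.sub(repl, text)
```

Reusing one `mapping` across unbound.conf, the routing table, and rules.debug keeps the same network recognizable in every file, which is the point of redacting consistently.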

        Let me provide as much information as I can. Let's forget site 3 for now and focus on site 1 (main) and site 2 (remote). Note that site 1 also has a domain override, for condo.arpa, and it's working fine. The one that's not working is the site 2 domain override for home.arpa. Both sites have very similar configs.

        1. Routing tables are fine. I have the static routes set on both sides:

        Site1:
        ff1fddeb-8586-4d0d-b3c0-fc2dd3520513-image.png

        Site 2:
        a94ffca9-7eae-42a7-ab28-684259b931b2-image.png

        2. WG interface FW rules are also fine:

        Site 1:
        3c181d36-f236-4fcb-85b1-5a12b0783842-image.png

        Site 2:
        e0dceaa5-56de-44d7-8023-0ffeb4a2ca1d-image.png

        NOTE: I have 0.0.0.0/0 there because it is needed when I'm routing local traffic to the Internet via the remote site (or vice versa).

        3. Ping

        From site 1:

        [2.6.0-RELEASE][root@pfSense.home.arpa]/root: ping -S 127.0.0.1 192.168.20.1
        PING 192.168.20.1 (192.168.20.1) from 127.0.0.1: 56 data bytes
        64 bytes from 192.168.20.1: icmp_seq=0 ttl=64 time=6.967 ms
        64 bytes from 192.168.20.1: icmp_seq=1 ttl=64 time=4.985 ms
        64 bytes from 192.168.20.1: icmp_seq=2 ttl=64 time=5.029 ms
        64 bytes from 192.168.20.1: icmp_seq=3 ttl=64 time=4.638 ms
        
        [2.6.0-RELEASE][root@pfSense.home.arpa]/root: ping -S 192.168.10.1 192.168.20.1
        PING 192.168.20.1 (192.168.20.1) from 192.168.10.1: 56 data bytes
        64 bytes from 192.168.20.1: icmp_seq=0 ttl=64 time=6.866 ms
        64 bytes from 192.168.20.1: icmp_seq=1 ttl=64 time=4.910 ms
        64 bytes from 192.168.20.1: icmp_seq=2 ttl=64 time=4.991 ms
        64 bytes from 192.168.20.1: icmp_seq=3 ttl=64 time=4.873 ms
        

        From site 2:

        [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: ping -S 127.0.0.1 192.168.10.1
        PING 192.168.10.1 (192.168.10.1) from 127.0.0.1: 56 data bytes
        64 bytes from 192.168.10.1: icmp_seq=0 ttl=64 time=5.569 ms
        64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=4.970 ms
        64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=4.767 ms
        64 bytes from 192.168.10.1: icmp_seq=3 ttl=64 time=4.899 ms
        
        [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: ping -S 192.168.20.1 192.168.10.1
        PING 192.168.10.1 (192.168.10.1) from 192.168.20.1: 56 data bytes
        64 bytes from 192.168.10.1: icmp_seq=0 ttl=64 time=5.584 ms
        64 bytes from 192.168.10.1: icmp_seq=1 ttl=64 time=7.065 ms
        64 bytes from 192.168.10.1: icmp_seq=2 ttl=64 time=6.707 ms
        64 bytes from 192.168.10.1: icmp_seq=3 ttl=64 time=4.988 ms
        
        4. ACL's are fine -> see attached files

        NOTE: 10.0.3.0/29 and 10.0.3.0/30 are redundant, I know, but I had to put in a manual entry because there seems to be a bug in WireGuard for DNS Resolver ACL's (which I already reported here)

        5. States look ok and are on the correct interfaces with the expected addresses when pinging from LAN. Here's an example when pinging from localhost:

        Site 1:
        88d48765-2fc9-4082-88a6-85927421f714-image.png

        Site 2:
        e7b318ca-ecb5-4bd1-bcc5-214ff7c64d26-image.png

        6. I didn't know there was such a thing as over-matching. I thought it's always better to be more specific in NAT or FW rules. What is the disadvantage of over matching? I changed the outbound NAT rules as per your suggestion but want to understand more about why I needed to do this:

        Site 1:
        4f105e7b-2a66-4bc5-8f12-70d9a08c2729-image.png

        Site 2:
        1c2b63eb-5bf8-4c8e-92fa-828818e367c7-image.png

        7. Contents of /var/unbound/host_entries.conf and /var/unbound/domainoverrides.conf -> see attached files

        Site 1:
        /var/unbound/host_entries.conf -> no instances of condo.arpa which is expected (because the domain override should take care of this)
        /var/unbound/domainoverrides.conf -> condo.arpa domain override is there

        Site 2:
        /var/unbound/host_entries.conf -> no instances of home.arpa which is expected (because the domain override should take care of this)
        /var/unbound/domainoverrides.conf -> home.arpa domain override is there

        With all this information, I guess I'm already at the "if all else fails" stage. Here are the contents of the files you mentioned for both sites 1 and 2:

        pfsense_config_files.zip

        • K Offline
          kevindd992002 @johnpoz
          last edited by

          @johnpoz said in Domain overrides not working (was working until I noticed just now):

          @kevindd992002 It seems you're not actually sending a query to Unbound, then. If you were, it would answer, or at least show in the cache that it talked to the NS you're pointing it to.

          Let's see your actual DNS query for something in home.arpa.

          Here I set up DNS query and reply logging in the custom options box:

          server:
          log-queries: yes
          log-replies: yes

          Then I did a query for something in home.arpa:

          $ dig @192.168.9.253 something.home.arpa                              
                                                                                
          ; <<>> DiG 9.16.27 <<>> @192.168.9.253 something.home.arpa            
          ; (1 server found)                                                    
          ;; global options: +cmd                                               
          ;; Got answer:                                                        
          ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 2609             
          ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1  
                                                                                
          ;; OPT PSEUDOSECTION:                                                 
          ; EDNS: version: 0, flags:; udp: 4096                                 
          ;; QUESTION SECTION:                                                  
          ;something.home.arpa.           IN      A                             
                                                                                
          ;; Query time: 0 msec                                                 
          ;; SERVER: 192.168.9.253#53(192.168.9.253)                            
          ;; WHEN: Tue Apr 19 00:28:56 Central Daylight Time 2022               
          ;; MSG SIZE  rcvd: 48
          

          query.jpg

          If you're not seeing it put into the cache, then Unbound never saw a query for it and never had to cache it.

          Yes, that's exactly what's happening. From what I can see, the query is not even being generated on the unbound side.

          Ok, so I added those custom options and did a query from the site2 pfsense shell as you did:

          [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: dig @127.0.0.1 pfsense.home.arpa
          
          ; <<>> DiG 9.16.26 <<>> @127.0.0.1 pfsense.home.arpa
          ; (1 server found)
          ;; global options: +cmd
          ;; Got answer:
          ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 53992
          ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
          
          ;; OPT PSEUDOSECTION:
          ; EDNS: version: 0, flags:; udp: 4096
          ;; QUESTION SECTION:
          ;pfsense.home.arpa.             IN      A
          
          ;; AUTHORITY SECTION:
          home.arpa.              10800   IN      SOA     localhost. nobody.invalid. 1 3600 1200 604800 10800
          
          ;; Query time: 1 msec
          ;; SERVER: 127.0.0.1#53(127.0.0.1)
          ;; WHEN: Tue Apr 19 13:38:52 PST 2022
          ;; MSG SIZE  rcvd: 105
          
          [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: dig @192.168.20.1 pfsense.home.arpa
          
          ; <<>> DiG 9.16.26 <<>> @192.168.20.1 pfsense.home.arpa
          ; (1 server found)
          ;; global options: +cmd
          ;; Got answer:
          ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 23318
          ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
          
          ;; OPT PSEUDOSECTION:
          ; EDNS: version: 0, flags:; udp: 4096
          ;; QUESTION SECTION:
          ;pfsense.home.arpa.             IN      A
          
          ;; AUTHORITY SECTION:
          home.arpa.              10800   IN      SOA     localhost. nobody.invalid. 1 3600 1200 604800 10800
          
          ;; Query time: 1 msec
          ;; SERVER: 192.168.20.1#53(192.168.20.1)
          ;; WHEN: Tue Apr 19 13:39:36 PST 2022
          ;; MSG SIZE  rcvd: 105
          

          f89e2770-d930-46a4-93d9-01516ecc7155-image.png

          Notice that I'm getting NXDOMAIN replies. It seems as if Unbound doesn't see the domain override even though it's there. If I query the remote name server (192.168.10.1) directly, there are no issues:

          [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: dig @192.168.10.1 pfsense.home.arpa
          
          ; <<>> DiG 9.16.26 <<>> @192.168.10.1 pfsense.home.arpa
          ; (1 server found)
          ;; global options: +cmd
          ;; Got answer:
          ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46639
          ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
          
          ;; OPT PSEUDOSECTION:
          ; EDNS: version: 0, flags:; udp: 4096
          ;; QUESTION SECTION:
          ;pfsense.home.arpa.             IN      A
          
          ;; ANSWER SECTION:
          pfsense.home.arpa.      3600    IN      A       192.168.10.1
          
          ;; Query time: 37 msec
          ;; SERVER: 192.168.10.1#53(192.168.10.1)
          ;; WHEN: Tue Apr 19 13:41:08 PST 2022
          ;; MSG SIZE  rcvd: 62
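One way to compare the three dig results above is to pull out the status, the flags, and the SOA owner from each. The sketch below is a rough heuristic rather than an official diagnostic: `summarize_dig` and `looks_locally_synthesized` are hypothetical helpers. The assumption is that an NXDOMAIN carrying the `aa` flag and an SOA of `localhost. nobody.invalid.` was answered from a zone defined inside the resolver itself rather than forwarded to 192.168.10.1.

```python
import re

# Trimmed from the dig @127.0.0.1 output above
SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 53992
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
; EDNS: version: 0, flags:; udp: 4096
;; AUTHORITY SECTION:
home.arpa.              10800   IN      SOA     localhost. nobody.invalid. 1 3600 1200 604800 10800
"""

def summarize_dig(output):
    """Extract the response status, header flags, and SOA primary name."""
    status = re.search(r"status: (\w+)", output)
    flags = re.search(r"flags: ([a-z ]+);", output)
    soa = re.search(r"\sIN\s+SOA\s+(\S+)\s+(\S+)", output)
    return {
        "status": status.group(1) if status else None,
        "flags": flags.group(1).split() if flags else [],
        "soa_mname": soa.group(1) if soa else None,
    }

def looks_locally_synthesized(summary):
    """Heuristic (assumption): authoritative NXDOMAIN with a localhost. SOA."""
    return (
        summary["status"] == "NXDOMAIN"
        and "aa" in summary["flags"]
        and summary["soa_mname"] == "localhost."
    )
```

If the heuristic fires for the queries against 127.0.0.1 and 192.168.20.1 but not for the direct query to 192.168.10.1, that would be consistent with the override never being consulted, though it does not by itself prove why.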
          
          • johnpozJ Offline
            johnpoz LAYER 8 Global Moderator @kevindd992002
            last edited by

            @kevindd992002 your site one says its domain is home.arpa

            local-zone: "home.arpa." transparent
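For context on why the `local-zone` type matters, here is a deliberately simplified model of how Unbound treats a name that falls under a local zone. It is a sketch based on the documented behavior of the `static` and `transparent` zone types, not Unbound's code. The `resolve_path` function and its return strings are hypothetical, and the model ignores the NODATA case where a name exists in local data with a different record type. Which zone type site 2 actually has for home.arpa is not shown in the thread.

```python
def resolve_path(qname, zone, zone_type, local_names):
    """Return how a query would be handled under a simplified local-zone model.

    local_names: set of lowercase names (no trailing dot) that have local data.
    "resolve normally" includes following a forward-zone (domain override).
    """
    name = qname.rstrip(".").lower()
    apex = zone.rstrip(".").lower()
    in_zone = name == apex or name.endswith("." + apex)
    if not in_zone:
        return "resolve normally"
    if name in local_names:
        return "answer from local data"
    if zone_type == "static":
        # Names without local data are answered locally, never forwarded
        return "local NXDOMAIN/NODATA"
    if zone_type == "transparent":
        # Names without local data fall through to normal resolution
        return "resolve normally"
    raise ValueError(f"zone type not modeled: {zone_type}")
```

In this model a `transparent` zone lets `pfsense.home.arpa` fall through to a domain override, while a `static` zone answers NXDOMAIN locally.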

            • K Offline
              kevindd992002 @johnpoz
              last edited by

              @johnpoz said in Domain overrides not working (was working until I noticed just now):

              @kevindd992002 your site one says its domain is home.arpa

              local-zone: "home.arpa." transparent

              Correct.

              Site 1 = home.arpa
              Site 2 = condo.arpa

              The domain override for home.arpa is in Site 2 and that's where the problem is. Am I missing something?

              • jimpJ Offline
                jimp Rebel Alliance Developer Netgate @kevindd992002
                last edited by

                @kevindd992002 said in Domain overrides not working (was working until I noticed just now):

                1. Routing tables are fine. I have the static routes set on both sides:

                Site1:
                ff1fddeb-8586-4d0d-b3c0-fc2dd3520513-image.png

                Site 2:
                a94ffca9-7eae-42a7-ab28-684259b931b2-image.png

                That route shows 0 for its usage, it isn't being hit. The traffic isn't using the route, so where is it going? Do you have some other config that is conflicting, like an IPsec tunnel going to 192.168.10.0/24 perhaps?

                2. WG interface FW rules are also fine:

                Those are "Allowed IPs" entries, not firewall rules. Though if you have 0.0.0.0/0 in those then all other entries are redundant and can be removed.
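The redundancy point can be checked mechanically with Python's standard `ipaddress` module. This is a minimal sketch: `redundant_entries` is a hypothetical helper, and the entry lists in the usage note are made-up examples, since the real Allowed IPs are only visible in the screenshots.

```python
import ipaddress

def redundant_entries(allowed_ips):
    """Return the entries that are fully covered by another, broader entry."""
    nets = [ipaddress.ip_network(entry) for entry in allowed_ips]
    redundant = []
    for net in nets:
        covered = any(
            net != other
            and net.version == other.version  # subnet_of needs same IP version
            and net.subnet_of(other)
            for other in nets
        )
        if covered:
            redundant.append(str(net))
    return redundant
```

For example, `redundant_entries(["0.0.0.0/0", "192.168.20.0/24", "10.0.3.0/30"])` reports both narrower networks as covered, and the same containment check flags 10.0.3.0/30 when 10.0.3.0/29 is also present.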

                3. Ping

                That looks good at least, though again it's odd if that route shows 0 usage and yet you're getting traffic to the other side. Maybe that screenshot was from after the counters were reset somehow.

                4. ACL's are fine -> see attached files

                Yes, those look OK.

                5. States look ok and are on the correct interfaces with the expected addresses when pinging from LAN. Here's an example when pinging from localhost:

                Looks OK with that ping but what about when you attempt a DNS query? What about when you ping from the LAN address?

                6. I didn't know there was such a thing as over-matching. I thought it's always better to be more specific in NAT or FW rules.

                You want to be as specific as possible. If you're over-matching, the rule isn't specific enough.

                What is the disadvantage of over matching? I changed the outbound NAT rules as per your suggestion but want to understand more about why I needed to do this:

                That isn't following my suggestion, the problem is "This Firewall" is a macro which includes every IP address on the firewall. This includes the address of the WG interface itself as well as the LAN interface address. If you source a packet from the WG interface it will get NAT applied to itself unnecessarily and it can break things. You don't want to NAT from the WG interface itself or from the LAN, the routing will handle those. You only want to NAT from "Localhost" (127.0.0.1) which is different from "This Firewall".
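To make the difference between the two sources concrete, here is a small model of which packets each outbound NAT source would match on site 1. It is an illustration only, not how pf evaluates rules. The `nat_applies` function is hypothetical; the addresses are the site 1 loopback, LAN, and WireGuard addresses seen elsewhere in this thread, and "Localhost" is assumed to mean any loopback address.

```python
import ipaddress

# Site 1 firewall addresses as they appear in this thread
FIREWALL_ADDRS = {
    "localhost": "127.0.0.1",
    "LAN": "192.168.10.1",
    "WG": "10.0.3.1",
}

def nat_applies(source_ip, rule_source):
    """Return True if an outbound NAT rule with this source matches the packet."""
    if rule_source == "This Firewall":
        # The macro covers every address configured on the firewall
        return source_ip in FIREWALL_ADDRS.values()
    if rule_source == "Localhost":
        return ipaddress.ip_address(source_ip).is_loopback
    raise ValueError(f"source not modeled: {rule_source}")
```

With "This Firewall" as the source, a packet already sourced from 10.0.3.1 is still matched and translated; with "Localhost" only the 127.0.0.1 traffic is, and routing handles the rest.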

                Next up is to check the states and do packet captures on each interface along the way (e.g. WG on both sides, or other interfaces if you don't see it) and make sure the DNS query is going where you think it's going in both directions. You can also increase the logging level in the DNS Resolver advanced options which might give some more clues.

                Something else I noticed in your rules.debug is that the site 1 location has a DNS redirect NAT rule set up with NAT reflection enabled, which is probably not what you want, especially with a destination of any, which is a very bad idea. Again, it's over-matching and likely grabbing traffic you don't want, like this traffic for instance.

                rdr on igb1 inet proto { tcp udp } from ! $DirectDNS to any port 53 -> $DNS
                # Reflection redirect
                rdr on { igb2 tun_wg0 openvpn WireGuard } inet proto { tcp udp } from ! $DirectDNS to 192.168.20.0/24 port 53 -> $DNS
                rdr on igb0 inet proto { tcp udp } from any to site2-externalIP port 62958 -> $epsilon
                

                Note that by having reflection enabled it's also grabbing inbound DNS on WireGuard and sending it to whatever is in your DNS alias table which isn't listed in rules.debug.

                Your redirect rule should probably (a) have reflection disabled, and (b) be configured closer to the example in the docs, specifically the destination: https://docs.netgate.com/pfsense/en/latest/recipes/dns-redirect.html

                Port forwards with a destination of any are almost never a good idea.
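As a rough illustration of the over-matching described above, the two DNS `rdr` rules quoted from rules.debug can be modeled as data and tested against sample packets. This is a sketch, not pf's matching logic. The `first_rdr_match` function is hypothetical, the contents of `$DirectDNS` are unknown and passed in as a parameter, and the third rule (port 62958) is left out because it does not touch DNS.

```python
import ipaddress

# The two DNS redirect rules from rules.debug; both use "from ! $DirectDNS"
RDR_RULES = [
    {"ifaces": {"igb1"}, "dst": "0.0.0.0/0", "port": 53,
     "label": "DNS redirect"},
    {"ifaces": {"igb2", "tun_wg0", "openvpn", "WireGuard"},
     "dst": "192.168.20.0/24", "port": 53,
     "label": "reflection redirect"},
]

def first_rdr_match(iface, src, dst, dport, direct_dns):
    """Return the label of the first modeled rule that would redirect the packet."""
    if src in direct_dns:
        return None  # excluded by "from ! $DirectDNS"
    for rule in RDR_RULES:
        if (
            iface in rule["ifaces"]
            and dport == rule["port"]
            and ipaddress.ip_address(dst) in ipaddress.ip_network(rule["dst"])
        ):
            return rule["label"]
    return None
```

In this model, a DNS query arriving on `tun_wg0` from 10.0.3.1 for 192.168.20.1 is caught by the reflection rule unless 10.0.3.1 is listed in the alias, which matches the note above about reflection grabbing inbound DNS on WireGuard.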

                Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                Need help fast? Netgate Global Support!

                Do not Chat/PM for help!

                • K Offline
                  kevindd992002 @jimp
                  last edited by kevindd992002

                  @jimp said in Domain overrides not working (was working until I noticed just now):

                  @kevindd992002 said in Domain overrides not working (was working until I noticed just now):

                  1. Routing tables are fine. I have the static routes set on both sides:

                  Site1:
                  ff1fddeb-8586-4d0d-b3c0-fc2dd3520513-image.png

                  Site 2:
                  a94ffca9-7eae-42a7-ab28-684259b931b2-image.png

                  That route shows 0 for its usage, it isn't being hit. The traffic isn't using the route, so where is it going? Do you have some other config that is conflicting, like an IPsec tunnel going to 192.168.10.0/24 perhaps?

                  Yeah, the initial screenshot was just from when the counters were reset. Here's an updated screenshot:

                  9e339c25-bcdc-4aa9-b9d0-5867332f9105-image.png

                  I don't have any other tunnels setup for this subnet, so there is no conflict.

                  2. WG interface FW rules are also fine:

                  Those are "Allowed IPs" entries, not firewall rules. Though if you have 0.0.0.0/0 in those then all other entries are redundant and can be removed.

                  Oops, sorry about that. The Allowed IP's act as both ACLs (for incoming traffic) and allowed "routes" (for outbound traffic) so I thought they were the "rules" you were looking for. And yes, you're right, 0.0.0.0/0 encompasses all so I technically no longer need the other entries there. Here are the actual FW rules:

                  Site 1:
                  8b641d5a-6661-4cb2-b487-0ed76ea150e6-image.png

                  Site 2:
                  0ffe59eb-d2d1-4517-8cd0-e8464e9a640e-image.png

                  There are no rules on the "WireGuard" tab on either site, so reply-to will work.

                  3. Ping

                  That looks good at least, though again it's odd if that route shows 0 usage and yet you're getting traffic to the other side. Maybe that screenshot was from after the counters were reset somehow.

                  Is the Uses counter counting number of states established?

                  5. States look ok and are on the correct interfaces with the expected addresses when pinging from LAN. Here's an example when pinging from localhost:

                  Looks OK with that ping but what about when you attempt a DNS query? What about when you ping from the LAN address?

                  When I attempt a DNS query,
                  A. From site 1 firewall, querying condo.arpa:

                  15332395-d2ab-46eb-be97-04370ef64be4-image.png

                  B. From site 2 firewall, querying home.arpa -> NO RESULTS

                  When pinging from LAN address:
                  A. Site 1:
                  4349f15c-bcaa-4d0d-9e30-ff81724d4c5a-image.png

                  B. Site 2:
                  c7658a87-495c-456b-8b2c-4421cb0ae51f-image.png

                  6. I didn't know there was such a thing as over-matching. I thought it's always better to be more specific in NAT or FW rules.

                  You want to be as specific as possible. If you're over-matching, the rule isn't specific enough.

                  What is the disadvantage of over matching? I changed the outbound NAT rules as per your suggestion but want to understand more about why I needed to do this:

                  That isn't following my suggestion, the problem is "This Firewall" is a macro which includes every IP address on the firewall. This includes the address of the WG interface itself as well as the LAN interface address. If you source a packet from the WG interface it will get NAT applied to itself unnecessarily and it can break things. You don't want to NAT from the WG interface itself or from the LAN, the routing will handle those. You only want to NAT from "Localhost" (127.0.0.1) which is different from "This Firewall".

                  I see what you mean. I corrected those now.

                  Site 1:
                  422306f8-f4e8-4b62-a923-5abc84d8f488-image.png

                  Site 2:
                  7fe548fe-6a96-4044-bb99-27d68cafa221-image.png

                  Next up is to check the states and do packet captures on each interface along the way (e.g. WG on both sides, or other interfaces if you don't see it) and make sure the DNS query is going where you think it's going in both directions. You can also increase the logging level in the DNS Resolver advanced options which might give some more clues.

                  I've shown the states above when I do a DNS query. When Site 1 is querying for condo.arpa, the domain override works so states are showing. This is not true when Site 2 is querying for home.arpa.

                  Site 1:
                  A. Capture on localhost when querying pfsense.condo.arpa:

                  12:24:30.640552 IP 127.0.0.1.9999 > 127.0.0.1.53: UDP, length 36
                  12:24:30.641060 IP 127.0.0.1.53 > 127.0.0.1.9999: UDP, length 52
                  12:24:31.028241 IP 127.0.0.1.35889 > 127.0.0.1.53: UDP, length 36
                  12:24:31.028720 IP 127.0.0.1.53 > 127.0.0.1.35889: UDP, length 52
                  12:24:31.029375 IP 127.0.0.1.54054 > 127.0.0.1.53: UDP, length 36
                  12:24:31.029667 IP 127.0.0.1.53 > 127.0.0.1.54054: UDP, length 52
                  12:24:31.030131 IP 127.0.0.1.18428 > 127.0.0.1.53: UDP, length 36
                  12:24:31.030304 IP 127.0.0.1.53 > 127.0.0.1.18428: UDP, length 36
                  12:24:31.030537 IP 127.0.0.1.47366 > 127.0.0.1.53: UDP, length 46
                  12:24:31.030809 IP 127.0.0.1.53 > 127.0.0.1.47366: UDP, length 123
                  12:24:31.031119 IP 127.0.0.1.18099 > 127.0.0.1.53: UDP, length 36
                  12:24:31.031261 IP 127.0.0.1.53 > 127.0.0.1.18099: UDP, length 36
                  12:24:31.031436 IP 127.0.0.1.37977 > 127.0.0.1.53: UDP, length 46
                  12:24:31.031619 IP 127.0.0.1.53 > 127.0.0.1.37977: UDP, length 123
                  

                  B. Capture on WG interface when querying pfsense.condo.arpa:

                  12:26:26.188805 IP 10.0.3.1.43481 > 192.168.20.1.53: UDP, length 47
                  12:26:26.189563 IP 10.0.3.1.24105 > 192.168.20.1.53: UDP, length 47
                  12:26:26.194062 IP 192.168.20.1.53 > 10.0.3.1.43481: UDP, length 47
                  12:26:26.194097 IP 192.168.20.1.53 > 10.0.3.1.24105: UDP, length 47
                  

                  Site 2:
                  A. Capture on localhost when querying pfsense.home.arpa:

                  12:27:35.525603 IP 127.0.0.1.16565 > 127.0.0.1.53: UDP, length 35
                  12:27:35.526020 IP 127.0.0.1.53 > 127.0.0.1.16565: UDP, length 94
                  12:27:39.027161 IP 127.0.0.1.56750 > 127.0.0.1.53: UDP, length 35
                  12:27:39.027517 IP 127.0.0.1.53 > 127.0.0.1.56750: UDP, length 94
                  12:27:39.027732 IP 127.0.0.1.42970 > 127.0.0.1.53: UDP, length 46
                  12:27:39.027965 IP 127.0.0.1.53 > 127.0.0.1.42970: UDP, length 122
                  12:27:39.028494 IP 127.0.0.1.17799 > 127.0.0.1.53: UDP, length 35
                  12:27:39.028648 IP 127.0.0.1.53 > 127.0.0.1.17799: UDP, length 94
                  12:27:39.028783 IP 127.0.0.1.59727 > 127.0.0.1.53: UDP, length 46
                  12:27:39.028931 IP 127.0.0.1.53 > 127.0.0.1.59727: UDP, length 122
                  12:27:39.029293 IP 127.0.0.1.61116 > 127.0.0.1.53: UDP, length 35
                  12:27:39.029532 IP 127.0.0.1.53 > 127.0.0.1.61116: UDP, length 94
                  12:27:39.029707 IP 127.0.0.1.41930 > 127.0.0.1.53: UDP, length 46
                  12:27:39.030151 IP 127.0.0.1.53 > 127.0.0.1.41930: UDP, length 122
                  12:27:39.030708 IP 127.0.0.1.17114 > 127.0.0.1.53: UDP, length 35
                  12:27:39.030890 IP 127.0.0.1.53 > 127.0.0.1.17114: UDP, length 94
                  12:27:39.031112 IP 127.0.0.1.9021 > 127.0.0.1.53: UDP, length 46
                  12:27:39.031393 IP 127.0.0.1.53 > 127.0.0.1.9021: UDP, length 122
                  

                  B. Capture on WG interface when querying pfsense.home.arpa -> NO RESULTS
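The captures above can be compared mechanically by pairing each DNS query with its reply. The sketch below assumes the one-line tcpdump format shown in this post; `parse`, `saw_query_to`, and `unanswered_queries` are hypothetical helper names, and `SITE1_WG` is the site 1 WireGuard capture copied from above.

```python
import re

# timestamp IP src.sport > dst.dport: UDP, length N
LINE = re.compile(r"^\s*\S+ IP (\S+)\.(\d+) > (\S+)\.(\d+): UDP")

SITE1_WG = """\
12:26:26.188805 IP 10.0.3.1.43481 > 192.168.20.1.53: UDP, length 47
12:26:26.189563 IP 10.0.3.1.24105 > 192.168.20.1.53: UDP, length 47
12:26:26.194062 IP 192.168.20.1.53 > 10.0.3.1.43481: UDP, length 47
12:26:26.194097 IP 192.168.20.1.53 > 10.0.3.1.24105: UDP, length 47
"""

def parse(capture_text):
    """Yield (src, sport, dst, dport) for every UDP line in the capture."""
    for line in capture_text.splitlines():
        match = LINE.match(line)
        if match:
            yield match.groups()

def saw_query_to(capture_text, server):
    """True if any packet in the capture was sent to server port 53."""
    return any(dst == server and dport == "53"
               for _, _, dst, dport in parse(capture_text))

def unanswered_queries(capture_text):
    """Return (client, client_port, server) tuples that never got a reply."""
    pending = set()
    for src, sport, dst, dport in parse(capture_text):
        if dport == "53":
            pending.add((src, sport, dst))
        elif sport == "53":
            pending.discard((dst, dport, src))
    return sorted(pending)
```

On site 1 every query to 192.168.20.1 is paired with a reply. On site 2 the WG capture is empty, so `saw_query_to` would return False for 192.168.10.1, meaning the query was answered on localhost and never left over the tunnel.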

                  Something else I noticed in your rules.debug is that the site 1 location has a DNS redirect NAT rule set up with NAT reflection enabled, which is probably not what you want, especially with a destination of any, which is a very bad idea. Again, it's over-matching and likely grabbing traffic you don't want, like this traffic for instance.

                  rdr on igb1 inet proto { tcp udp } from ! $DirectDNS to any port 53 -> $DNS
                  # Reflection redirect
                  rdr on { igb2 tun_wg0 openvpn WireGuard } inet proto { tcp udp } from ! $DirectDNS to 192.168.20.0/24 port 53 -> $DNS
                  rdr on igb0 inet proto { tcp udp } from any to site2-externalIP port 62958 -> $epsilon
                  

                  Note that by having reflection enabled it's also grabbing inbound DNS on WireGuard and sending it to whatever is in your DNS alias table which isn't listed in rules.debug.

                  Your redirect rule should probably (a) have reflection disabled, and (b) be configured closer to the example in the docs, specifically the destination: https://docs.netgate.com/pfsense/en/latest/recipes/dns-redirect.html

                  Port forwards with a destination of any are almost never a good idea.

                  Ok, I corrected those as well:

                  Site 1:
                  03576055-88ad-422e-ada0-327b4d06c153-image.png

                  Site 2:
                  a1833e8d-3b42-45b4-91a5-f914275697dd-image.png

                  The DNS alias is my standalone DNS (AdGuard Home).

                  Good catch on the NAT reflection part. That said, I had added the sources from the other side of the tunnel (AdGuard Home and the WG interface) to the DirectDNS alias, so they were excluded from redirection while NAT reflection was still enabled. Since NAT reflection is now disabled on the rules, I can clean up and remove those extraneous entries from the alias.

                  • jimpJ Offline
                    jimp Rebel Alliance Developer Netgate @kevindd992002
                    last edited by

                    @kevindd992002 said in Domain overrides not working (was working until I noticed just now):

                    @jimp said in Domain overrides not working (was working until I noticed just now):

                    @kevindd992002 said in Domain overrides not working (was working until I noticed just now):

                    Oops, sorry about that. The Allowed IPs act as both ACLs (for incoming traffic) and allowed "routes" (for outbound traffic), so I thought they were the "rules" you were looking for. And yes, you're right, 0.0.0.0/0 encompasses everything, so I technically no longer need the other entries there. Here are the actual FW rules:

                    There are no rules in the "WireGuard" tab for both sites so reply-to's will work.

                    You are not getting reply-to on the WireGuard interface tab rules probably because you don't have the WG gateway picked on the interface config (Interfaces > WIREGUARD_S2S in your setup). Without the gateway picked there it's treated as an internal interface, not an external interface. Though adding the gateway also means it may do some things you don't want, like automatic outbound NAT. Might need to keep an eye on that.

                    Is the Uses counter counting number of states established?

                    No, it's the total number of times the firewall has used the route, which would be closer to per-packet than per-state. (States are a pf concept; the route is in the OS, not pf.)

                    1. States look ok and are on the correct interfaces with the expected addresses when pinging from LAN. Here's an example when pinging from localhost:

                    Looks OK with that ping but what about when you attempt a DNS query? What about when you ping from the LAN address?

                    When I attempt a DNS query,
                    A. From site 1 firewall, querying condo.arpa:

                    That looks like I'd expect given your setup.

                    B. From site 2 firewall, querying home.arpa -> NO RESULTS

                    Any DNS states at all on another interface perhaps? The query has to be going somewhere unless it's being answered completely out of the cache. To eliminate that possibility, you should stop the DNS Resolver daemon on site 2, start it again, then run the query and check the states.

                    When pinging from LAN address:

                    Those look OK too.

                    I see what you mean. I corrected those now.

                    That should be good, though you could be a little more lenient with the destination if you wanted (e.g. /24), in case the firewalls may need to contact something else in that subnet in a similar manner in the future.

                    Site 2:
                    B. Capture on WG interface when querying pfsense.home.arpa -> NO RESULTS

                    So that confirms the states observation. Somehow it's getting directed elsewhere. You could capture DNS traffic on the WAN(s) and look to see if the query is being sent upstream. It shouldn't be, but it's possible.

                    Your redirect rule should probably (a) have reflection disabled, and (b) be configured closer to the example in the docs, specifically the destination: https://docs.netgate.com/pfsense/en/latest/recipes/dns-redirect.html
                    Ok, I corrected those as well:

                    That should be better, less likely to cause problems.

                    Since the queries never leave site 2, the problem must be at site 2, though, so those NAT rules and other DNS config were probably unrelated.

                    Next would be more packet captures (higher detail, on other interfaces) and increased logging in unbound to see what it's doing with those queries.

                    There are also a few other .conf files in /var/unbound/ you might look through for any mention of home.arpa.

                    Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                    Need help fast? Netgate Global Support!

                    Do not Chat/PM for help!

                    • K Offline
                      kevindd992002 @jimp
                      last edited by kevindd992002

                      @jimp said in Domain overrides not working (was working until I noticed just now):

                      @kevindd992002 said in Domain overrides not working (was working until I noticed just now):

                      @jimp said in Domain overrides not working (was working until I noticed just now):

                      @kevindd992002 said in Domain overrides not working (was working until I noticed just now):

                      Oops, sorry about that. The Allowed IPs act as both ACLs (for incoming traffic) and allowed "routes" (for outbound traffic), so I thought they were the "rules" you were looking for. And yes, you're right, 0.0.0.0/0 encompasses everything, so I technically no longer need the other entries there. Here are the actual FW rules:

                      There are no rules in the "WireGuard" tab for both sites so reply-to's will work.

                      You are not getting reply-to on the WireGuard interface tab rules probably because you don't have the WG gateway picked on the interface config (Interfaces > WIREGUARD_S2S in your setup). Without the gateway picked there it's treated as an internal interface, not an external interface. Though adding the gateway also means it may do some things you don't want, like automatic outbound NAT. Might need to keep an eye on that.

                      You're absolutely right. I forgot about this. When I first set up WG, I had a gateway set in the interface config, and because that created automatic outbound NAT for it, I had to explicitly create a "disable outbound NAT" rule for certain traffic going out the WG interface. I needed reply-to back then because I was port forwarding traffic from site 1 to a NAT destination on site 2, since site 2 didn't have a public IP. That is no longer the case, so I removed the WG interface gateway setting and cleaned things up.

                      Is the Uses counter counting number of states established?

                      No, it's the total number of times the firewall has used the route, which would be closer to per-packet than per-state. (States are a pf concept; the route is in the OS, not pf.)

                      I see. So does that mean that if I do a continuous ping from a device in site 2 to a device in site 2, that counter should go up continuously as well? I did a test and when I ping/tracert from the site 2 pfsense GUI to a site 2 device, the counter does go up just fine. However, when I do the same test from a LAN device in site 2, I don't see the counter go up but the route being used is correct:

                      C:\Users\usongk00>tracert 192.168.10.17
                      
                      Tracing route to nuc.home.arpa [192.168.10.17]
                      over a maximum of 30 hops:
                      
                        1     1 ms     3 ms     1 ms  pfSense.condo.arpa [192.168.20.1]
                        2     7 ms     6 ms     6 ms  10.0.3.1
                        3     9 ms     9 ms     6 ms  nuc.home.arpa [192.168.10.17]
                      
                      Trace complete.
                      

                      Not sure if this is normal?

                      1. States look ok and are on the correct interfaces with the expected addresses when pinging from LAN. Here's an example when pinging from localhost:

                      Looks OK with that ping but what about when you attempt a DNS query? What about when you ping from the LAN address?

                      When I attempt a DNS query,
                      A. From site 1 firewall, querying condo.arpa:

                      That looks like I'd expect given your setup.

                      B. From site 2 firewall, querying home.arpa -> NO RESULTS

                      Any DNS states at all on another interface perhaps? The query has to be going somewhere unless it's being answered completely out of the cache. To eliminate that possibility, you should stop the DNS Resolver daemon on site 2, start it again, then run the query and check the states.

                      I restarted the DNS Resolver to delete the cache and queried using the GUI again:

                      e8c02b19-4bda-4316-bff5-4a7457ba11c3-image.png

                      I then checked the states for port 53. Since I have external DNS servers set under System -> General Setup, there are states on the WAN interface, but there are absolutely no DNS states on the WG interface.

                      I see what you mean. I corrected those now.

                      That should be good, though you could be a little more lenient with the destination if you wanted (e.g. /24), in case the firewalls may need to contact something else in that subnet in a similar manner in the future.

                      Ok, got it. That makes sense.

                      Site 2:
                      B. Capture on WG interface when querying pfsense.home.arpa -> NO RESULTS

                      So that confirms the states observation. Somehow it's getting directed elsewhere. You could capture DNS traffic on the WAN(s) and look to see if the query is being sent upstream. It shouldn't be, but it's possible.

                      Correct. I mean, the query gets directed to the external DNS servers as it should but will get an NXDOMAIN response from them, of course. If I do a DNS packet capture on the WAN, how do I isolate the traffic we want if all DNS queries have the same source (because of the existing automatic outbound NAT)?

                      Your redirect rule should probably (a) have reflection disabled, and (b) be configured closer to the example in the docs, specifically the destination: https://docs.netgate.com/pfsense/en/latest/recipes/dns-redirect.html
                      Ok, I corrected those as well:

                      That should be better, less likely to cause problems.

                      Since the queries never leave site 2, the problem must be at site 2, though, so those NAT rules and other DNS config were probably unrelated.

                      Next would be more packet captures (higher detail, on other interfaces) and increased logging in unbound to see what it's doing with those queries.

                      There are also a few other .conf files in /var/unbound/ you might look through for any mention of home.arpa.

                      I completely agree. No traffic is being sent over the WG interface so we can remove a lot of variables from the mix.

                      Packet capture on the localhost interface with full details:

                      10:42:51.514200 AF IPv4 (2), length 67: (tos 0x0, ttl 64, id 28659, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->cb9)!)
                          127.0.0.1.60635 > 127.0.0.1.53: [bad udp cksum 0xfe3e -> 0xdc7a!] 17316+ A? pfsense.home.arpa. (35)
                      10:42:51.546812 AF IPv4 (2), length 126: (tos 0x0, ttl 64, id 54821, offset 0, flags [none], proto UDP (17), length 122, bad cksum 0 (->a64b)!)
                          127.0.0.1.53 > 127.0.0.1.60635: [bad udp cksum 0xfe79 -> 0x36ef!] 17316 NXDomain* q: A? pfsense.home.arpa. 0/1/0 ns: home.arpa. SOA localhost. nobody.invalid. 1 3600 1200 604800 10800 (94)
                      10:42:53.774572 AF IPv4 (2), length 73: (tos 0x0, ttl 64, id 58117, offset 0, flags [none], proto UDP (17), length 69, bad cksum 0 (->99a0)!)
                          127.0.0.1.38767 > 127.0.0.1.53: [bad udp cksum 0xfe44 -> 0xacdb!] 56689+ A? wg.pampanga.duckdns.org. (41)
                      10:42:58.995939 AF IPv4 (2), length 73: (tos 0x0, ttl 64, id 37156, offset 0, flags [none], proto UDP (17), length 69, bad cksum 0 (->eb81)!)
                          127.0.0.1.61199 > 127.0.0.1.53: [bad udp cksum 0xfe44 -> 0xa2cf!] 29917+ AAAA? wg.pampanga.duckdns.org. (41)
                      10:43:03.771236 AF IPv4 (2), length 89: (tos 0x0, ttl 64, id 63566, offset 0, flags [none], proto UDP (17), length 85, bad cksum 0 (->8447)!)
                          127.0.0.1.53 > 127.0.0.1.38767: [bad udp cksum 0xfe54 -> 0xcdbf!] 56689 q: A? wg.pampanga.duckdns.org. 1/0/0 wg.pampanga.duckdns.org. A 120.29.65.242 (57)
                      10:43:05.736458 AF IPv4 (2), length 124: (tos 0x0, ttl 64, id 22902, offset 0, flags [none], proto UDP (17), length 120, bad cksum 0 (->22fd)!)
                          127.0.0.1.53 > 127.0.0.1.61199: [bad udp cksum 0xfe77 -> 0x2c8a!] 29917 q: AAAA? wg.pampanga.duckdns.org. 0/1/0 ns: duckdns.org. SOA ns1.duckdns.org. hostmaster.duckdns.org. 2021081401 6000 120 2419200 600 (92)
                      10:43:19.475654 AF IPv4 (2), length 67: (tos 0x0, ttl 64, id 50880, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->b5eb)!)
                          127.0.0.1.55851 > 127.0.0.1.53: [bad udp cksum 0xfe3e -> 0xe535!] 19865+ A? pfsense.home.arpa. (35)
                      10:43:19.476144 AF IPv4 (2), length 126: (tos 0x0, ttl 64, id 41764, offset 0, flags [none], proto UDP (17), length 122, bad cksum 0 (->d94c)!)
                          127.0.0.1.53 > 127.0.0.1.55851: [bad udp cksum 0xfe79 -> 0x3faa!] 19865 NXDomain* q: A? pfsense.home.arpa. 0/1/0 ns: home.arpa. SOA localhost. nobody.invalid. 1 3600 1200 604800 10800 (94)
                      10:43:19.476395 AF IPv4 (2), length 78: (tos 0x0, ttl 64, id 53248, offset 0, flags [none], proto UDP (17), length 74, bad cksum 0 (->aca0)!)
                          127.0.0.1.30400 > 127.0.0.1.53: [bad udp cksum 0xfe49 -> 0xcafd!] 7186+ A? pfsense.home.arpa.condo.arpa. (46)
                      10:43:19.476643 AF IPv4 (2), length 154: (tos 0x0, ttl 64, id 46475, offset 0, flags [none], proto UDP (17), length 150, bad cksum 0 (->c6c9)!)
                          127.0.0.1.53 > 127.0.0.1.30400: [bad udp cksum 0xfe95 -> 0xadf6!] 7186 NXDomain q: A? pfsense.home.arpa.condo.arpa. 0/1/0 ns: arpa. SOA a.root-servers.net. nstld.verisign-grs.com. 2022042100 1800 900 604800 86400 (122)
                      10:43:19.477214 AF IPv4 (2), length 67: (tos 0x0, ttl 64, id 26074, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->16d2)!)
                          127.0.0.1.31622 > 127.0.0.1.53: [bad udp cksum 0xfe3e -> 0x3307!] 24173+ A? pfsense.home.arpa. (35)
                      10:43:19.477505 AF IPv4 (2), length 126: (tos 0x0, ttl 64, id 46346, offset 0, flags [none], proto UDP (17), length 122, bad cksum 0 (->c766)!)
                          127.0.0.1.53 > 127.0.0.1.31622: [bad udp cksum 0xfe79 -> 0x8d7b!] 24173 NXDomain* q: A? pfsense.home.arpa. 0/1/0 ns: home.arpa. SOA localhost. nobody.invalid. 1 3600 1200 604800 10800 (94)
                      10:43:19.477663 AF IPv4 (2), length 78: (tos 0x0, ttl 64, id 3863, offset 0, flags [none], proto UDP (17), length 74, bad cksum 0 (->6d8a)!)
                          127.0.0.1.25125 > 127.0.0.1.53: [bad udp cksum 0xfe49 -> 0xfe06!] 64931+ A? pfsense.home.arpa.condo.arpa. (46)
                      10:43:19.477944 AF IPv4 (2), length 154: (tos 0x0, ttl 64, id 42995, offset 0, flags [none], proto UDP (17), length 150, bad cksum 0 (->d461)!)
                          127.0.0.1.53 > 127.0.0.1.25125: [bad udp cksum 0xfe95 -> 0xe0ff!] 64931 NXDomain q: A? pfsense.home.arpa.condo.arpa. 0/1/0 ns: arpa. SOA a.root-servers.net. nstld.verisign-grs.com. 2022042100 1800 900 604800 86400 (122)
                      10:43:19.478382 AF IPv4 (2), length 67: (tos 0x0, ttl 64, id 46880, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->c58b)!)
                          127.0.0.1.52885 > 127.0.0.1.53: [bad udp cksum 0xfe3e -> 0xd5b5!] 19887+ AAAA? pfsense.home.arpa. (35)
                      10:43:19.478639 AF IPv4 (2), length 126: (tos 0x0, ttl 64, id 61471, offset 0, flags [none], proto UDP (17), length 122, bad cksum 0 (->8c51)!)
                          127.0.0.1.53 > 127.0.0.1.52885: [bad udp cksum 0xfe79 -> 0x302a!] 19887 NXDomain* q: AAAA? pfsense.home.arpa. 0/1/0 ns: home.arpa. SOA localhost. nobody.invalid. 1 3600 1200 604800 10800 (94)
                      10:43:19.478871 AF IPv4 (2), length 78: (tos 0x0, ttl 64, id 62080, offset 0, flags [none], proto UDP (17), length 74, bad cksum 0 (->8a20)!)
                          127.0.0.1.6672 > 127.0.0.1.53: [bad udp cksum 0xfe49 -> 0xe9fd!] 22951+ AAAA? pfsense.home.arpa.condo.arpa. (46)
                      10:43:19.484263 AF IPv4 (2), length 154: (tos 0x0, ttl 64, id 29699, offset 0, flags [none], proto UDP (17), length 150, bad cksum 0 (->852)!)
                          127.0.0.1.53 > 127.0.0.1.6672: [bad udp cksum 0xfe95 -> 0xccf6!] 22951 NXDomain q: AAAA? pfsense.home.arpa.condo.arpa. 0/1/0 ns: arpa. SOA a.root-servers.net. nstld.verisign-grs.com. 2022042100 1800 900 604800 86400 (122)
                      10:43:19.484763 AF IPv4 (2), length 67: (tos 0x0, ttl 64, id 22410, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->2522)!)
                          127.0.0.1.63544 > 127.0.0.1.53: [bad udp cksum 0xfe3e -> 0xa486!] 27707+ CNAME? pfsense.home.arpa. (35)
                      10:43:19.485243 AF IPv4 (2), length 126: (tos 0x0, ttl 64, id 25687, offset 0, flags [none], proto UDP (17), length 122, bad cksum 0 (->181a)!)
                          127.0.0.1.53 > 127.0.0.1.63544: [bad udp cksum 0xfe79 -> 0xfefa!] 27707 NXDomain* q: CNAME? pfsense.home.arpa. 0/1/0 ns: home.arpa. SOA localhost. nobody.invalid. 1 3600 1200 604800 10800 (94)
                      10:43:19.485403 AF IPv4 (2), length 78: (tos 0x0, ttl 64, id 44235, offset 0, flags [none], proto UDP (17), length 74, bad cksum 0 (->cfd5)!)
                          127.0.0.1.25222 > 127.0.0.1.53: [bad udp cksum 0xfe49 -> 0x7b25!] 32800+ CNAME? pfsense.home.arpa.condo.arpa. (46)
                      10:43:19.491802 AF IPv4 (2), length 154: (tos 0x0, ttl 64, id 40797, offset 0, flags [none], proto UDP (17), length 150, bad cksum 0 (->dcf7)!)
                          127.0.0.1.53 > 127.0.0.1.25222: [bad udp cksum 0xfe95 -> 0x5e1e!] 32800 NXDomain q: CNAME? pfsense.home.arpa.condo.arpa. 0/1/0 ns: arpa. SOA a.root-servers.net. nstld.verisign-grs.com. 2022042100 1800 900 604800 86400 (122)
                      

                      I haven't done the WAN packet capture yet since I'm not sure how to filter out the packets of interest.
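One possible way to narrow a WAN capture down, as a rough sketch only: because outbound NAT makes every query share the same source address, filtering has to happen on the query name that tcpdump prints instead of on the address. The helper name `only_qname` and the `WAN_IF` variable below are made up for illustration; `-n`, `-l`, and `-i` are standard tcpdump options, and `grep -F` does a fixed-string match.

```shell
#!/bin/sh
# Hypothetical helper: keep only the tcpdump text lines on stdin that
# mention the given query name (fixed-string match, no regex).
only_qname() {
    grep -F -- "$1"
}

# Usage on the firewall (WAN_IF is a placeholder for the real WAN interface):
#   tcpdump -n -l -i "$WAN_IF" udp port 53 | only_qname 'home.arpa.'
```

Note that a match on `home.arpa.` also catches names such as pfsense.home.arpa.condo.arpa., which is useful here since those search-domain retries are part of the same test.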

                      I also increased the logging level in unbound to level 3. After querying pfsense.home.arpa from the firewall, I checked Status -> System Logs -> DNS Resolver but I don't see any results when filtering with either home.arpa or 192.168.10.1:

                      4ad793cf-9c29-4787-8afb-af39f133fe3c-image.png

                      But if I include the following in the DNS Resolver custom options, I do get events in the logs:

                      log-queries: yes
                      log-replies: yes

                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa.condo.arpa. CNAME IN NXDOMAIN 0.000000 0 122
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: validation success pfsense.home.arpa.condo.arpa. CNAME IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: validator operate: query pfsense.home.arpa.condo.arpa. CNAME IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: finishing processing for pfsense.home.arpa.condo.arpa. CNAME IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: resolving pfsense.home.arpa.condo.arpa. CNAME IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: validator operate: query pfsense.home.arpa.condo.arpa. CNAME IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa.condo.arpa. CNAME IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. CNAME IN NXDOMAIN 0.000000 1 94
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. CNAME IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa.condo.arpa. AAAA IN NXDOMAIN 0.000000 0 122
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: validation success pfsense.home.arpa.condo.arpa. AAAA IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: validator operate: query pfsense.home.arpa.condo.arpa. AAAA IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: finishing processing for pfsense.home.arpa.condo.arpa. AAAA IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: resolving pfsense.home.arpa.condo.arpa. AAAA IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: validator operate: query pfsense.home.arpa.condo.arpa. AAAA IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa.condo.arpa. AAAA IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. AAAA IN NXDOMAIN 0.000000 1 94
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. AAAA IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:2] info: 127.0.0.1 pfsense.home.arpa.condo.arpa. A IN NXDOMAIN 0.000000 0 122
                      Apr 21 11:23:45 	unbound 	4156 	[4156:2] info: validation success pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:2] info: validator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:2] info: finishing processing for pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:2] info: resolving pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:2] info: validator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:2] info: 127.0.0.1 pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:2] info: 127.0.0.1 pfsense.home.arpa. A IN NXDOMAIN 0.000000 1 94
                      Apr 21 11:23:45 	unbound 	4156 	[4156:2] info: 127.0.0.1 pfsense.home.arpa. A IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa.condo.arpa. A IN NXDOMAIN 1.148588 0 122
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: validation success pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: validator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: validator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: validator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: finishing processing for pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: response for pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: iterator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: processQueryTargets: pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: iterator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: processQueryTargets: pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: iterator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: processQueryTargets: pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: response for pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: iterator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: processQueryTargets: pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: resolving (init part 3): pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: resolving (init part 2): pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: resolving pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: validator operate: query pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa.condo.arpa. A IN
                      Apr 21 11:23:44 	unbound 	4156 	[4156:0] info: 127.0.0.1 pfsense.home.arpa. A IN NXDOMAIN 0.000000 1 94
                      Apr 21 11:23:44 	unbound 	4156 	[4156:0] info: 127.0.0.1 pfsense.home.arpa. A IN
                      Apr 21 11:23:41 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. A IN NXDOMAIN 0.000000 1 94
                      Apr 21 11:23:41 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. A IN 
                      

                      Here are mentions of home.arpa in all files in /var/unbound on site 2:

                      [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: grep -R home.arpa /var/unbound
                      /var/unbound/unbound.conf:private-domain: "home.arpa"
                      /var/unbound/unbound.conf:domain-insecure: "home.arpa"
                      /var/unbound/domainoverrides.conf:      name: "home.arpa"
                      
                      • jimpJ Offline
                        jimp Rebel Alliance Developer Netgate @kevindd992002
                        last edited by jimp

                        @kevindd992002 said in Domain overrides not working (was working until I noticed just now):

                        I see. So does that mean that if I do a continuous ping from a device in site 2 to a device in site 2, that counter should go up continuously as well? I did a test and when I ping/tracert from the site 2 pfsense GUI to a site 2 device, the counter does go up just fine. However, when I do the same test from a LAN device in site 2, I don't see the counter go up but the route being used is correct:

                        It depends. If you ping LAN to LAN and the traffic hits a rule with a gateway set it can skip the OS routing and route inside pf (fast forwarding style) and pf handles sending it out the right way. Some routing decisions are cached by the OS as well, so it's not always exact either way. It's enough that you see it get used at all in most cases.

                        I restarted the DNS Resolver to delete the cache and queried using the GUI again:

                        A query from the GUI is much different than one from a client or at a shell prompt. The GUI DNS test probes configured forwarding servers directly and can ignore/bypass some of the Unbound behavior. You should test from a shell prompt with something like this:

                        $ host -v pfsense.home.arpa. 127.0.0.1
                        

                        And this:

                        $ unbound-control -c /var/unbound/unbound.conf lookup pfsense.home.arpa.
                        

                        Note the trailing . in both cases, which restricts the lookup to that exact name; the resolver won't attempt to look it up again with the current search domain(s) appended.
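To illustrate the trailing-dot point, here is a small sketch (not from the pfSense docs). The helper name `fqdn` is made up; it only appends the dot when it is missing, so the name is treated as absolute and is not retried with a search domain, which is what produced the pfsense.home.arpa.condo.arpa. lookups in the capture earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: make a hostname absolute by appending a trailing dot
# if it does not already end with one.
fqdn() {
    case "$1" in
        *.) printf '%s\n' "$1" ;;   # already absolute, leave as-is
        *)  printf '%s.\n' "$1" ;;  # relative name, add the root dot
    esac
}

# Usage on the firewall: ask Unbound on localhost for the exact name only.
#   host -v "$(fqdn pfsense.home.arpa)" 127.0.0.1
```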

                        Packet capture on the localhost interface with full details:

                        10:43:19.477214 AF IPv4 (2), length 67: (tos 0x0, ttl 64, id 26074, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->16d2)!)
                            127.0.0.1.31622 > 127.0.0.1.53: [bad udp cksum 0xfe3e -> 0x3307!] 24173+ A? pfsense.home.arpa. (35)
                        10:43:19.477505 AF IPv4 (2), length 126: (tos 0x0, ttl 64, id 46346, offset 0, flags [none], proto UDP (17), length 122, bad cksum 0 (->c766)!)
                            127.0.0.1.53 > 127.0.0.1.31622: [bad udp cksum 0xfe79 -> 0x8d7b!] 24173 NXDomain* q: A? pfsense.home.arpa. 0/1/0 ns: home.arpa. SOA localhost. nobody.invalid. 1 3600 1200 604800 10800 (94)
                        

                        It's odd that it seems to be responding as if it were itself the SOA for home.arpa, which it should not be based on the config.

                        But if I include the following in the DNS Resolver custom options, I do get events in the logs:

                        log-queries: yes
                        log-replies: yes

                        That's the way to go, using those.

                        Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. CNAME IN NXDOMAIN 0.000000 1 94
                        Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. CNAME IN
                        [...]
                        Apr 21 11:23:44 	unbound 	4156 	[4156:0] info: 127.0.0.1 pfsense.home.arpa. A IN NXDOMAIN 0.000000 1 94
                        Apr 21 11:23:44 	unbound 	4156 	[4156:0] info: 127.0.0.1 pfsense.home.arpa. A IN
                        Apr 21 11:23:41 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. A IN NXDOMAIN 0.000000 1 94
                        Apr 21 11:23:41 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. A IN 
                        

                        Similar to the above, it's all internal there when it should be hitting the domain override.
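For anyone wading through a long log like that, a rough filter can reduce it to just the reply lines. This is a heuristic sketch, assuming the log-replies layout shown above (client, name, type, class, return code, time to resolve, from-cache flag, response size, as I understand the unbound.conf description); the helper name `summarize_replies` is made up.

```shell
#!/bin/sh
# Hypothetical helper: read Unbound log lines on stdin and print one summary
# line per log-replies entry: "<name> <type> <rcode> cached=<0|1>".
summarize_replies() {
    awk '{
        # Locate the "info:" token. Reply lines carry exactly 8 fields after
        # it: client name type class rcode time cached size.
        for (i = 1; i <= NF; i++) if ($i == "info:") break
        if (i <= NF && NF == i + 8)
            printf "%s %s %s cached=%s\n", $(i+2), $(i+3), $(i+5), $(i+7)
    }'
}

# Usage: paste or pipe the resolver log into the function, e.g.
#   summarize_replies < resolver-log-excerpt.txt
```

Query lines and the validator/iterator chatter have a different field count after `info:`, so they are skipped.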

                        Here are mentions of home.arpa in all files in /var/unbound on site 2:

                        [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: grep -R home.arpa /var/unbound
                        /var/unbound/unbound.conf:private-domain: "home.arpa"
                        /var/unbound/unbound.conf:domain-insecure: "home.arpa"
                        /var/unbound/domainoverrides.conf:      name: "home.arpa"
                        

                        That all looks OK, too.

                        For good measure you might want to stop the unbound daemon and then start it again to ensure it's fully dumping whatever is in its running configuration and caches.

                        Is it failing to work from site 2 and site 3 or just site 2? If it works at site 3 then comparing the configuration at Sites 2-3 would be better than looking at differences from site 1.

                        EDIT: I should add that if the query for pfsense.home.arpa was leaking to a public DNS server you'd get a different SOA, like this:

                        16:00:38.356296 IP (tos 0x0, ttl 64, id 54797, offset 0, flags [none], proto UDP (17), length 140, bad cksum 0 (->a651)!)
                            127.0.0.1.53 > 127.0.0.1.55623: [bad udp cksum 0xfe8b -> 0x89e1!] 39962 NXDomain q: A? pfsense.home.arpa. 0/1/0 ns: home.arpa. [1h] SOA prisoner.iana.org. hostmaster.root-servers.org. 1 1800 900 604800 604800 (112)
                        

                        Note that it shows SOA prisoner.iana.org. and not SOA localhost. like yours. And if it were Unbound in general doing that, mine would fail the same way yours does, but it doesn't.
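The SOA comparison can be turned into a quick check on a capture line. This is only a sketch: the function name `classify_soa` and the labels it prints are made up, and it looks at nothing more than the two SOA names discussed above.

```shell
#!/bin/sh
# Hypothetical helper: take one tcpdump NXDomain line as $1 and report which
# SOA it carries. Labels are illustrative only:
#   local  = "SOA localhost."          (answered locally, as in the failing case)
#   public = "SOA prisoner.iana.org."  (what a leak to public DNS looks like)
classify_soa() {
    case "$1" in
        *"SOA localhost. "*)         echo "local" ;;
        *"SOA prisoner.iana.org. "*) echo "public" ;;
        *"SOA "*)                    echo "other" ;;
        *)                           echo "no-soa" ;;
    esac
}

# Usage: classify_soa "$(pbpaste-or-any-line-from-the-capture)" is not
# required; simply pass one capture line as a quoted argument.
```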

                        Something else I noticed is that you have DNSSEC on. Not that it should make a difference here, but you should try with that disabled as well.

                        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                        Need help fast? Netgate Global Support!

                        Do not Chat/PM for help!

                        • K Offline
                          kevindd992002 @jimp
                          last edited by

                          @jimp said in Domain overrides not working (was working until I noticed just now):

                          @kevindd992002 said in Domain overrides not working (was working until I noticed just now):

                          I see. So does that mean that if I do a continuous ping from a device in site 2 to a device in site 2, that counter should go up continuously as well? I did a test and when I ping/tracert from the site 2 pfsense GUI to a site 2 device, the counter does go up just fine. However, when I do the same test from a LAN device in site 2, I don't see the counter go up but the route being used is correct:

                          It depends. If you ping LAN to LAN and the traffic hits a rule with a gateway set it can skip the OS routing and route inside pf (fast forwarding style) and pf handles sending it out the right way. Some routing decisions are cached by the OS as well, so it's not always exact either way. It's enough that you see it get used at all in most cases.

                          Ok, got it. That makes sense.

                          I restarted the DNS Resolver to delete the cache and queried using the GUI again:

                          A query from the GUI is much different than one from a client or at a shell prompt. The GUI DNS test probes configured forwarding servers directly and can ignore/bypass some of the Unbound behavior. You should test from a shell prompt with something like this:

                          $ host -v pfsense.home.arpa. 127.0.0.1
                          

                          And this:

                          $ unbound-control -c /var/unbound/unbound.conf lookup pfsense.home.arpa.
                          

                          Note the trailing . in both cases, which restricts the lookup to that exact name; it won't attempt to look it up by appending whatever the current search domain(s) are.

                          Ok. Doing it correctly this time:

                          [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: host -v pfsense.home.arpa. 127.0.0.1
                          Trying "pfsense.home.arpa"
                          Trying "pfsense.home.arpa.condo.arpa"
                          Using domain server:
                          Name: 127.0.0.1
                          Address: 127.0.0.1#53
                          Aliases:
                          
                          Host pfsense.home.arpa not found: 3(NXDOMAIN)
                          Received 122 bytes from 127.0.0.1#53 in 0 ms
                          [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: unbound-control -c /var/unbound/unbound.conf lookup pfsense.home.arpa.
                          The following name servers are used for lookup of pfsense.home.arpa.
                          forwarding request:
                          Delegation with 0 names, of which 0 can be examined to query further addresses.
                          It provides 1 IP addresses.
                          192.168.10.1            not in infra cache.
                          

                          For the host command, I'm not sure why it still tried the condo.arpa suffix even though I included the trailing . in the hostname. There are still no states on the WG interface, which means it still thinks it is the SOA itself and is keeping the query local.

                          Packet capture on the localhost interface with full details:

                          10:43:19.477214 AF IPv4 (2), length 67: (tos 0x0, ttl 64, id 26074, offset 0, flags [none], proto UDP (17), length 63, bad cksum 0 (->16d2)!)
                              127.0.0.1.31622 > 127.0.0.1.53: [bad udp cksum 0xfe3e -> 0x3307!] 24173+ A? pfsense.home.arpa. (35)
                          10:43:19.477505 AF IPv4 (2), length 126: (tos 0x0, ttl 64, id 46346, offset 0, flags [none], proto UDP (17), length 122, bad cksum 0 (->c766)!)
                              127.0.0.1.53 > 127.0.0.1.31622: [bad udp cksum 0xfe79 -> 0x8d7b!] 24173 NXDomain* q: A? pfsense.home.arpa. 0/1/0 ns: home.arpa. SOA localhost. nobody.invalid. 1 3600 1200 604800 10800 (94)
                          

                          It's odd that it seems to be responding as if it is itself the SOA for home.arpa which it should not be based on the config.

                          Odd indeed. If I add, say, another override for a fictitious domain (testdomain.arpa), it forwards the query to 192.168.10.1 as expected:

                          [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: host -v pfsense.testdomain.arpa. 127.0.0.1
                          Trying "pfsense.testdomain.arpa"
                          Trying "pfsense.testdomain.arpa.condo.arpa"
                          Using domain server:
                          Name: 127.0.0.1
                          Address: 127.0.0.1#53
                          Aliases:
                          
                          Host pfsense.testdomain.arpa not found: 3(NXDOMAIN)
                          Received 128 bytes from 127.0.0.1#53 in 1133 ms
                          [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: unbound-control -c /var/unbound/unbound.conf lookup pfsense.testdomain.arpa.
                          The following name servers are used for lookup of pfsense.testdomain.arpa.
                          forwarding request:
                          Delegation with 0 names, of which 0 can be examined to query further addresses.
                          It provides 1 IP addresses.
                          192.168.10.1            rto 461 msec, ttl 838, ping 29 var 108 rtt 461, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
                          


                          But if I include the following in the DNS Resolver custom options, I do get events in the logs:

                          log-queries: yes
                          log-replies: yes

                          That's the way to go, using those.

                          Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. CNAME IN NXDOMAIN 0.000000 1 94
                          Apr 21 11:23:45 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. CNAME IN
                          [...]
                          Apr 21 11:23:44 	unbound 	4156 	[4156:0] info: 127.0.0.1 pfsense.home.arpa. A IN NXDOMAIN 0.000000 1 94
                          Apr 21 11:23:44 	unbound 	4156 	[4156:0] info: 127.0.0.1 pfsense.home.arpa. A IN
                          Apr 21 11:23:41 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. A IN NXDOMAIN 0.000000 1 94
                          Apr 21 11:23:41 	unbound 	4156 	[4156:3] info: 127.0.0.1 pfsense.home.arpa. A IN 
                          

                          Similar to the above, it's all internal there when it should be hitting the domain override.

                          Here are mentions of home.arpa in all files in /var/unbound on site 2:

                          [2.7.0-DEVELOPMENT][root@pfSense.condo.arpa]/root: grep -R home.arpa /var/unbound
                          /var/unbound/unbound.conf:private-domain: "home.arpa"
                          /var/unbound/unbound.conf:domain-insecure: "home.arpa"
                          /var/unbound/domainoverrides.conf:      name: "home.arpa"
                          

                          That all looks OK, too.

                          For good measure you might want to stop the unbound daemon and then start it again to ensure it's fully dumping whatever is in its running configuration and caches.

                          Yeah, I did restart the DNS Resolver service (assuming this is the same as the unbound daemon) every time I change a setting/add additional logging.

                          Is it failing to work from site 2 and site 3 or just site 2? If it works at site 3 then comparing the configuration at Sites 2-3 would be better than looking at differences from site 1.

                          So when I reported this, the exact same thing was happening at both sites 2 and 3. Now that you ask, I tested again at site 3 and the override there is working as expected 🤷

                          In terms of configuration, the only difference between the two firewalls is that the site 3 DNS Resolver has DNS Query Forwarding enabled (with outbound SSL/TLS, i.e. DoT) because plain resolving (querying the root servers directly) does not work with the ISP over there. To test, I enabled forwarding in the site 2 DNS Resolver as well and it made no difference. So I don't think that's the cause, which is expected since forwarding shouldn't interfere with domain overrides anyway.

                          Also, the obvious difference between the two firewalls is that the site 2 firewall is running 2.7 dev while the site 3 firewall is running 2.5.2. I didn't give this much thought when I posted because, like I said, I'm 100% sure that the issue was also manifesting at site 3 in the past, though there's no way for me to prove that now.

                          EDIT: I should add that if the query for pfsense.home.arpa was leaking to a public DNS server you'd get a different SOA, like this:

                          16:00:38.356296 IP (tos 0x0, ttl 64, id 54797, offset 0, flags [none], proto UDP (17), length 140, bad cksum 0 (->a651)!)
                              127.0.0.1.53 > 127.0.0.1.55623: [bad udp cksum 0xfe8b -> 0x89e1!] 39962 NXDomain q: A? pfsense.home.arpa. 0/1/0 ns: home.arpa. [1h] SOA prisoner.iana.org. hostmaster.root-servers.org. 1 1800 900 604800 604800 (112)
                          

                          Note that it shows SOA prisoner.iana.org. and not SOA localhost. like yours. And if it was Unbound in general doing that, mine should fail the same way yours is, but it isn't.

                          Yes, that's what I'm expecting too. There's something about the home.arpa domain that makes it behave this way.

                          Something else I noticed is that you have DNSSEC on. Not that it should make a difference here, but you should try with that disabled as well.

                          Already tried that and no difference.

                          • K Offline
                            kevindd992002
                            last edited by

                            @jimp do you have any other ideas at this point?

                            • jimpJ Offline
                              jimp Rebel Alliance Developer Netgate
                              last edited by

                              Nothing comes to mind, except for maybe stripping out everything in the DNS Resolver config and testing it bit by bit, but there isn't much there to do that for. Make it resemble the other site as closely as possible at least. Enable forwarding, disable DNSSEC, etc.

                              Somehow it's getting the idea in its head that the domain in question is local when it shouldn't be, but it's not clear where that's coming from.

                              Dump the whole infra cache and see what other entries are there:

                              unbound-control -c /var/unbound/unbound.conf dump_infra
                              

                              That or set up another test VM and see if it works from scratch, then add bits in one by one until it breaks and see what is doing it there. But that may be trickier.
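For the dump_infra step, a small filter helps when the dump is long. This sketch assumes each dump_infra line starts with the server IP followed by the zone name. infra_for_ip is a made-up helper name, and 192.168.10.1 is the override target used earlier in the thread.

```shell
#!/bin/sh
# infra_for_ip IP: read dump_infra output on stdin and print only the
# entries whose first field is exactly IP.
infra_for_ip() {
    awk -v ip="$1" '$1 == ip'
}

# Intended use on the firewall (not run here):
# unbound-control -c /var/unbound/unbound.conf dump_infra | infra_for_ip 192.168.10.1
```

No output for the override target would fit the "not in infra cache" result seen for home.arpa, i.e. Unbound never tried to contact that server.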

                              • S Offline
                                SeaMonkey
                                last edited by

                                I was experiencing the same problem and discovered that the DNSBL feature of pfBlockerNG was preventing lookups for the domains I had configured in domain overrides. Unfortunately, these domains can't be added to the DNSBL whitelist, as they are considered invalid.

                                • S Offline
                                  stv_gag
                                  last edited by stv_gag

                                  Argh! I wasted a lot of time on this one before finding the solution.

                                  The problem is similar to yours...
                                  I'm using the latest version of pfsense. I have a pfsense on site #1 whose domain is home.arpa and another pfsense on site #2 whose domain is s2.home.arpa. Of course, I want pfsense from site #2 to send DNS queries for home.arpa to the pfsense on site #1.

                                  No matter the request sent, I got an "NXDOMAIN" with nobody.invalid in the AUTHORITY section.

                                  I discovered that this is normal behavior for "unbound" (the DNS resolver): it ships with a built-in default local zone for home.arpa and answers queries for it itself. The solution is to set the "home.arpa" zone to nodefault, as indicated in the /usr/local/etc/unbound/unbound.conf file. However, modifying that file won't help because pfsense does not use it.

                                  I was finally able to succeed by performing the following procedure, in DNS Resolver/General Settings...
                                  1- Display the custom options and add the following 2 lines (copy/paste them to make sure they're exact)...
                                  server:
                                  local-zone: "home.arpa." nodefault

                                  2- In the "Domain Overrides" section, specify the pfsense IP address of site #1 as the DNS server for the "home.arpa" domain

                                  3- Restart the DNS resolver (or reboot pfsense)

                                  In my case, omitting step #2 (Domain Overrides) prevents the solution from working even if, in the pfsense on site #2, the pfsense IP address in site #1 is indicated in "General settings" and "DNS query forwarding" is activated.

                                  You can see the result in /var/unbound/unbound.conf
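To confirm the custom option actually landed in the generated config, something like the following could be used. It is a minimal sketch assuming the line appears in the file uncommented and in the same form as in step 1; has_nodefault is a hypothetical helper, not part of pfsense or Unbound.

```shell
#!/bin/sh
# has_nodefault ZONE FILE: succeed if FILE contains an uncommented
# 'local-zone: "ZONE" nodefault' line (trailing dot on ZONE optional).
has_nodefault() {
    zone=${1%.}
    file=$2
    # Escape dots so they match literally in the regex.
    zone_re=$(printf '%s' "$zone" | sed 's/\./\\./g')
    pattern="^[[:space:]]*local-zone:[[:space:]]*\"?${zone_re}\\.?\"?[[:space:]]+nodefault"
    grep -Eq "$pattern" "$file"
}

# Intended use on the firewall (not run here):
# has_nodefault home.arpa. /var/unbound/unbound.conf && echo "nodefault is set"
```

After the restart in step 3, repeating the host -v pfsense.home.arpa. 127.0.0.1 test from earlier in the thread should show whether the query is now forwarded to site #1 instead of being answered with SOA localhost.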

                                  Hope it helps!

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.