Domain override broken since 2.4.4 p2



  • Hi,
    we configured a single domain override in DNS resolver which will forward DNS queries for an internal domain to a Windows DNS server in a different subnet. This was working fine until we upgraded from 2.4.4 p1 to p2.
    Since then, whe have sporadic outages on DNS answers for this internal domain. I can prove this as we use PRTG to monitor this resolution. It will usually fail for a couple of hours (answering "non-existent domain") and then go back to normal.
    For the time being, I "solved" this issue by setting a couple of host overrides for the most important servers (which - apparently - works fine).
    Are you aware of this issue or what could we try to resolve it? Thanks in advance!



  • Hi,

    @luas said in Domain override broken since 2.4.4 p2:

    It will usually fail for a couple of hours (answering "non-existent domain") and then go back to normal.

    You mean : the resolvers (unbound) plays dead for a couple of hours, and wakes up afterwards ?
    Not aware of such an issue - niether on this forum : such an issue would "kill" many networks. Never saw that before.
    You need to post far more (because none right now)) details.


  • LAYER 8 Global Moderator

    Sorry but this works just fine.. As stated your going to need to post some actual info.. If you want any help.

    For a starters lets see the infra cache for the domain in question you have overrriden.

    It prob caching neg response when it gets that from your NS your pointing to.. So no its not going to go ask it until that neg cache has expired.. if it got back NX.. its not going to go ask again and again and again until the cache has expired for that nx, etc.



  • @Gertjan said in Domain override broken since 2.4.4 p2:

    You mean : the resolvers (unbound) plays dead for a couple of hours, and wakes up afterwards ?

    No, unbound works fine all the time - except the override. So, let's say netgate.com will be resolved perfectly at any time, but mydomain.local will sporadically fail to be resolved.

    Sorry for not providing detailed information. I'm not sure what you mean by infra cache, but I found this thread which seems to touch a similar topic.

    At first I tried at Diagnostics/DNS Lookup. It says:

    Host "myserver.mydomain.local" could not be resolved.
    

    With unbound-control, this is what I get:

    [2.4.4-RELEASE][admin@pfSense.mynet]/root: unbound-control -c /var/unbound/unbound.conf lookup myserver.mydomain.local
    The following name servers are used for lookup of myserver.mydomain.local.
    forwarding request:
    Delegation with 0 names, of which 0 can be examined to query further addresses.
    It provides 1 IP addresses.
    192.168.0.10            not in infra cache.
    

    192.168.0.10 is the correct server that should be queried.

    I also turned up logging to level 3 and this is what I find in Status>System Logs>System>DNS Resolver during an outage:

    Mar 18 15:40:11	unbound	24418:0	debug: cache memory msg=1425703 rrset=2547800 infra=1271747 val=221803
    Mar 18 15:40:11	unbound	24418:0	info: validation success myserver.mydomain.local.mynet. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	info: validate(nxdomain): sec_status_secure
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 15:40:11	unbound	24418:0	info: finishing processing for myserver.mydomain.local.mynet. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	info: resolving myserver.mydomain.local.mynet. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
    Mar 18 15:40:11	unbound	24418:0	debug: cache memory msg=1425703 rrset=2547800 infra=1271747 val=221803
    Mar 18 15:40:11	unbound	24418:0	info: validation success myserver.mydomain.local. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	info: validate(nxdomain): sec_status_secure
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 15:40:11	unbound	24418:0	info: finishing processing for myserver.mydomain.local. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	info: resolving myserver.mydomain.local. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local. CNAME IN
    Mar 18 15:40:11	unbound	24418:0	debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
    Mar 18 15:40:11	unbound	24418:0	debug: cache memory msg=1425703 rrset=2547800 infra=1271747 val=221803
    Mar 18 15:40:11	unbound	24418:0	info: validation success myserver.mydomain.local.mynet. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	info: validate(nxdomain): sec_status_secure
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 15:40:11	unbound	24418:0	info: finishing processing for myserver.mydomain.local.mynet. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	info: resolving myserver.mydomain.local.mynet. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
    Mar 18 15:40:11	unbound	24418:0	debug: cache memory msg=1425703 rrset=2547800 infra=1271747 val=221803
    Mar 18 15:40:11	unbound	24418:0	info: validation success myserver.mydomain.local. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	info: validate(nxdomain): sec_status_secure
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 15:40:11	unbound	24418:0	info: finishing processing for myserver.mydomain.local. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	info: resolving myserver.mydomain.local. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local. AAAA IN
    Mar 18 15:40:11	unbound	24418:0	debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
    Mar 18 15:40:11	unbound	24418:0	debug: cache memory msg=1425703 rrset=2547800 infra=1271747 val=221803
    Mar 18 15:40:11	unbound	24418:0	info: validation success myserver.mydomain.local.mynet. A IN
    Mar 18 15:40:11	unbound	24418:0	info: validate(nxdomain): sec_status_secure
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. A IN
    Mar 18 15:40:11	unbound	24418:0	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 15:40:11	unbound	24418:0	info: finishing processing for myserver.mydomain.local.mynet. A IN
    Mar 18 15:40:11	unbound	24418:0	info: resolving myserver.mydomain.local.mynet. A IN
    Mar 18 15:40:11	unbound	24418:0	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 15:40:11	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. A IN
    

    I hope this is of any help.
    Thanks in advance, I appreciate your support!


  • LAYER 8 Global Moderator

    @luas said in Domain override broken since 2.4.4 p2:

    myserver.mydomain.local.mynet. CNAME IN

    So your wanting to lookup myserver.mydomain.local - which is a cname that points to

    myserver.mydomain.local.mynet

    Do you have an override for that, if not then your going to get a NX since I highly doubt that would resolve on the public net, etc.

    You could bump your loading to higher level, so you see actual queries done, etc.

    Give me a few minutes and I will fire up my 2k12r2 VM and setup dns on it with domain override to duplicate what your doing..

    To help understand your problem.

    Please list the fqdn your trying to resolve.. I take it is myserver.mydomain.local, or did you swap that out to obfuscate your actual FQDN? Lets see q query directly to this 192.168.0.10 to see what it answers back, etc.

    Here I turned on my windows vm running dns, it already had a forward setup for home.lan

    So I can query record host.home.lan directly

    $ dig @192.168.2.220 host.home.lan
    
    ; <<>> DiG 9.12.3-P1 <<>> @192.168.2.220 host.home.lan
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39291
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4000
    ; COOKIE: 0f1a65e55e7bf819 (echoed)
    ;; QUESTION SECTION:
    ;host.home.lan.                 IN      A
    
    ;; ANSWER SECTION:
    host.home.lan.          60      IN      A       10.10.10.10
    
    ;; Query time: 4 msec
    ;; SERVER: 192.168.2.220#53(192.168.2.220)
    ;; WHEN: Mon Mar 18 10:12:57 Central Daylight Time 2019
    ;; MSG SIZE  rcvd: 70
    

    If I setup domain override, I can then query pfsense for that same host

    $ dig @192.168.9.253 host.home.lan
    
    ; <<>> DiG 9.12.3-P1 <<>> @192.168.9.253 host.home.lan
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 547
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;host.home.lan.                 IN      A
    
    ;; ANSWER SECTION:
    host.home.lan.          3600    IN      A       10.10.10.10
    
    ;; Query time: 404 msec
    ;; SERVER: 192.168.9.253#53(192.168.9.253)
    ;; WHEN: Mon Mar 18 10:15:59 Central Daylight Time 2019
    ;; MSG SIZE  rcvd: 58
    

    I set my log level to 3 in unbound, and you can see who it asked, etc.

    Mar 18 10:15:59 	unbound 	83709:2 	info: validator operate: query host.home.lan. A IN
    Mar 18 10:15:59 	unbound 	83709:2 	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 10:15:59 	unbound 	83709:2 	info: finishing processing for host.home.lan. A IN
    Mar 18 10:15:59 	unbound 	83709:2 	info: query response was ANSWER
    Mar 18 10:15:59 	unbound 	83709:2 	info: reply from <home.lan.> 192.168.2.220#53
    Mar 18 10:15:59 	unbound 	83709:2 	info: response for host.home.lan. A IN
    Mar 18 10:15:59 	unbound 	83709:2 	info: iterator operate: query host.home.lan. A IN
    Mar 18 10:15:59 	unbound 	83709:2 	debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_reply
    Mar 18 10:15:59 	unbound 	83709:2 	debug: cache memory msg=72284 rrset=105364 infra=19043 val=72244
    Mar 18 10:15:59 	unbound 	83709:2 	debug: sending to target: <home.lan.> 192.168.2.220#53
    Mar 18 10:15:59 	unbound 	83709:2 	info: sending query: host.home.lan. A IN
    Mar 18 10:15:59 	unbound 	83709:2 	info: processQueryTargets: host.home.lan. A IN
    Mar 18 10:15:59 	unbound 	83709:2 	info: resolving host.home.lan. A IN
    Mar 18 10:15:59 	unbound 	83709:2 	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 10:15:59 	unbound 	83709:2 	info: validator operate: query host.home.lan. A IN 
    

    If I look in unbound for its cache, I see

    [2.4.4-RELEASE][admin@sg4860.local.lan]/: unbound-control -c /var/unbound/unbound.conf lookup host.home.lan
    The following name servers are used for lookup of host.home.lan.
    forwarding request:
    Delegation with 0 names, of which 0 can be examined to query further addresses.
    It provides 1 IP addresses.
    192.168.2.220           rto 284 msec, ttl 721, ping 0 var 71 rtt 284, tA 0, tAAAA 0, tother 0, EDNS 0 assumed.
    

    See how it has infra cache for the 2.220 NS I have set as override, because it actually talked to it when I did query for host.home.lan



  • You should never use .local as this will conflict with mDNS and can cause issues. pfSense does actually warn you about that in the general settings:
    domain settings.jpg


  • LAYER 8 Global Moderator

    Yeah not a fan of that myself, but looks like his not just using single lable .local, but mydomain.local

    But yeah apple broke use of tld .local for everyone ;)



  • @johnpoz said in Domain override broken since 2.4.4 p2:

    Yeah not a fan of that myself, but looks like his not just using single lable .local, but mydomain.local

    As long as it ends in .local it will conflict.


  • LAYER 8 Global Moderator

    Yeah if he is a apple user he could have problems. I agree it's a BAD choice to use.. But still not sure exactly what he has setup, because of the mydomain.local.mynet cname?



  • @johnpoz said in Domain override broken since 2.4.4 p2:

    Yeah if he is a apple user he could have problems.

    Windows 10 has mDNS support included per default in it's newer builds too.

    Back to the OP: Post the details your DNS config, as longs as it's all about internal domains it shouldn't be a problem.


  • LAYER 8 Global Moderator

    Not in mine - I turn that shit off right out of the box ;) hehehe



  • Hi, thanks for your elaborate response!
    Yes, mydomain.local was for obfuscation. In detail, we have two subnets:
    192.168.96.0/20 (.mynet) with a couple of clients which are allowed to access subnet2:
    192.168.0.1/24 (.mydomain.local) - This subnet contains a Windows domain, and 192.168.0.10 is one of its DNS servers.

    I agree that .mydomain.local.mynet looks confusing; I guess that pfsense adds .mynet because I entered it in System>General Setup>Domain. However, this has done no harm in the past. It is not an issue as long as domain override works.

    @johnpoz said in Domain override broken since 2.4.4 p2:

    So your wanting to lookup myserver.mydomain.local - which is a cname that points to
    myserver.mydomain.local.mynet
    Do you have an override for that, if not then your going to get a NX since I highly doubt that would resolve on the public net, etc.

    I have a domain override for mydomain.local, yes. At this very moment, it is working again after a 2 hour outage:

    Mar 18 17:37:38	unbound	24418:0	info: validation success myserver.mydomain.local.mynet. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	info: validate(nxdomain): sec_status_secure
    Mar 18 17:37:38	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 17:37:38	unbound	24418:0	info: finishing processing for myserver.mydomain.local.mynet. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	info: resolving myserver.mydomain.local.mynet. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 17:37:38	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
    Mar 18 17:37:38	unbound	24418:0	debug: cache memory msg=2625586 rrset=4532884 infra=2576142 val=176232
    Mar 18 17:37:38	unbound	24418:0	info: validation success myserver.mydomain.local. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	info: validate(nxdomain): sec_status_secure
    Mar 18 17:37:38	unbound	24418:0	info: validator operate: query myserver.mydomain.local. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 17:37:38	unbound	24418:0	info: finishing processing for myserver.mydomain.local. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	info: resolving myserver.mydomain.local. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 17:37:38	unbound	24418:0	info: validator operate: query myserver.mydomain.local. CNAME IN
    Mar 18 17:37:38	unbound	24418:0	debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
    Mar 18 17:37:38	unbound	24418:0	debug: cache memory msg=2625586 rrset=4532884 infra=2576142 val=176232
    Mar 18 17:37:38	unbound	24418:0	info: validation success myserver.mydomain.local.mynet. AAAA IN
    Mar 18 17:37:38	unbound	24418:0	info: validate(nxdomain): sec_status_secure
    Mar 18 17:37:38	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. AAAA IN
    Mar 18 17:37:38	unbound	24418:0	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 17:37:38	unbound	24418:0	info: finishing processing for myserver.mydomain.local.mynet. AAAA IN
    Mar 18 17:37:38	unbound	24418:0	info: resolving myserver.mydomain.local.mynet. AAAA IN
    Mar 18 17:37:38	unbound	24418:0	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 17:37:38	unbound	24418:0	info: validator operate: query myserver.mydomain.local.mynet. AAAA IN
    Mar 18 17:37:38	unbound	24418:0	debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
    Mar 18 17:37:38	unbound	24418:0	debug: cache memory msg=2625317 rrset=4532884 infra=2576142 val=176232
    Mar 18 17:37:38	unbound	24418:0	info: validation success myserver.mydomain.local. AAAA IN
    Mar 18 17:37:38	unbound	24418:0	info: validate(nxdomain): sec_status_secure
    Mar 18 17:37:38	unbound	24418:0	info: validator operate: query myserver.mydomain.local. AAAA IN
    Mar 18 17:37:38	unbound	24418:0	debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
    Mar 18 17:37:38	unbound	24418:0	info: finishing processing for myserver.mydomain.local. AAAA IN
    Mar 18 17:37:38	unbound	24418:0	info: resolving myserver.mydomain.local. AAAA IN
    Mar 18 17:37:38	unbound	24418:0	debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
    Mar 18 17:37:38	unbound	24418:0	info: validator operate: query myserver.mydomain.local. AAAA IN
    

    The response to the command given above slightly changed:

    [2.4.4-RELEASE][admin@pfSense.zeggnet]/root: unbound-control -c /var/unbound/unbound.conf lookup myserver.mydomain.local
    The following name servers are used for lookup of myserver.mydomain.local.
    forwarding request:
    Delegation with 0 names, of which 0 can be examined to query further addresses.
    It provides 1 IP addresses.
    192.168.0.10            expired, rto 29307524 msec, tA 0 tAAAA 0 tother 0.
    

    Does this "expired" have to bother me? (anyway this is the response I get while it's functional)


  • LAYER 8 Global Moderator

    And no where do I see unbound actually talking to 0.10

    As to working... You sure your clients are just asking it directly... Have seen it 1000 times, client has multiple NS listed..

    if you Level 3 of logging and unbound actually asks and gets a response it would be logged. See my tests above.

    If your clients are in the .mynet domain, then yeah they will add that to search suffix, etc..

    Here is what I can tell you for sure - it works, its works well, it works easy... You need to figure out what you have wrong to why its not working.



  • I agree that your responses look different, pointing out the nameserver talking to. No idea why this is not pointed out in my logs, even though setting the same log level as you did.
    I tried to increase log level to 4 or 5, but this seems to overwhelm my machine (CPU going up and logs no longer recorded)

    @johnpoz said in Domain override broken since 2.4.4 p2:

    You sure your clients are just asking it directly

    For troubleshooting, I'm currently using Diagnostics>DNS lookup, so client configuration should not be an issue.

    One last thing that confuses me: what exactly does pfsense with multiple DNS servers configured?
    98ebd03c-aae3-4207-91df-9973b2c90079-image.png
    This request for an internal server should be answered by 127.0.0.1. It seems that externals servers are still queried (one per WAN line) and I won't get a response before all of them have answered. Shouldn't it be so that requests for internal domains should stay internal? Could this be part of the answer?

    Does it make sense to post my complete DNS configuration here?

    Again, thanks for your support!


  • LAYER 8 Global Moderator

    Looks like your forwarding, or have dns setup in general, or letting get dns from dhcp..

    If your using resolver.. You should have nothing setup in general for dns, you should make sure that you do not allow dns from dhcp.

    And yeah when you do diag like that all you should see is loopback 127.0.0.1



  • I agree by now that this is not a "domain override" issue, because I stumbled across a different problem these days.
    We use DNSBL, which seems to use its own certificate to break SSL encryption. As far as I can see, this certificate was generated during installation and worked for a couple of weeks, until it was suddenly rejected by browsers ("Certificate could not be verified as the issuer is unknown").
    As a quick solution, I disabled DNSBL and we haven't had domain override issues since then. I'm glad I didn't do a downgrade to 2.4.4 p1 :)

    So, this issue can happily be considered as solved, and I'll open a new request for the DNSBL thing...


Log in to reply