DNS Resolver Sometimes Not Resolving Hosts



  • Hello,

    First(ish) post, so go easy on me;)

    I've got pfSense 2.2.5 running on Hyper-V on a server in another room. Unbound is hit and miss on my Windows workstations.

    Sometimes when I wake my main machine from sleep, my local DNS entries do not resolve, or they resolve to their public IPs… This is very frustrating, as loopback is very slow and blocked by most of my servers.

    I booted up a Windows 7 machine that has been idle since the inception of my PfSense machine and it resolved DNS entries just fine... then after about two minutes it quit working too!

    ipconfig /flushdns never helps.

    ipconfig /all shows that my DNS entries are:

    10.0.0.254 (PfSense)
    8.8.8.8
    8.8.4.4

    all seems good...

    Still can't ping LAN hosts. >:(

    Do a reboot and then it works fine...get up for a few minutes and then come back... DNS broken again. WTF!

    Logs for the resolver show it is running fine. Pretty sure OSX resolves the hosts all of the time.

    
    Nov 30 21:36:48	unbound: [10498:0] info: lower(secs) upper(secs) recursions
    Nov 30 21:36:48	unbound: [10498:0] info: 0.000000 0.000001 344
    Nov 30 21:36:48	unbound: [10498:0] info: 0.000256 0.000512 1
    Nov 30 21:36:48	unbound: [10498:0] info: 0.001024 0.002048 1
    Nov 30 21:36:48	unbound: [10498:0] info: 0.004096 0.008192 4
    Nov 30 21:36:48	unbound: [10498:0] info: 0.008192 0.016384 252
    Nov 30 21:36:48	unbound: [10498:0] info: 0.016384 0.032768 222
    Nov 30 21:36:48	unbound: [10498:0] info: 0.032768 0.065536 1372
    Nov 30 21:36:48	unbound: [10498:0] info: 0.065536 0.131072 2786
    Nov 30 21:36:48	unbound: [10498:0] info: 0.131072 0.262144 4253
    Nov 30 21:36:48	unbound: [10498:0] info: 0.262144 0.524288 2696
    Nov 30 21:36:48	unbound: [10498:0] info: 0.524288 1.000000 903
    Nov 30 21:36:48	unbound: [10498:0] info: 1.000000 2.000000 267
    Nov 30 21:36:48	unbound: [10498:0] info: 2.000000 4.000000 268
    Nov 30 21:36:48	unbound: [10498:0] info: 4.000000 8.000000 112
    Nov 30 21:36:48	unbound: [10498:0] info: 8.000000 16.000000 24
    Nov 30 21:36:48	unbound: [10498:0] info: 16.000000 32.000000 3
    Nov 30 21:36:49	unbound: [49482:0] notice: init module 0: validator
    Nov 30 21:36:49	unbound: [49482:0] notice: init module 1: iterator
    Nov 30 21:36:49	unbound: [49482:0] info: start of service (unbound 1.5.4).
    Nov 30 21:37:59	unbound: [49482:0] info: service stopped (unbound 1.5.4).
    Nov 30 21:37:59	unbound: [49482:0] info: server stats for thread 0: 45 queries, 34 answers from cache, 11 recursions, 0 prefetch
    Nov 30 21:37:59	unbound: [49482:0] info: server stats for thread 0: requestlist max 3 avg 0.727273 exceeded 0 jostled 0
    Nov 30 21:37:59	unbound: [49482:0] info: average recursion processing time 0.584698 sec
    Nov 30 21:37:59	unbound: [49482:0] info: histogram of recursion processing times
    Nov 30 21:37:59	unbound: [49482:0] info: [25%]=0.180224 median[50%]=0.393216 [75%]=1.3125
    Nov 30 21:37:59	unbound: [49482:0] info: lower(secs) upper(secs) recursions
    Nov 30 21:37:59	unbound: [49482:0] info: 0.065536 0.131072 2
    Nov 30 21:37:59	unbound: [49482:0] info: 0.131072 0.262144 2
    Nov 30 21:37:59	unbound: [49482:0] info: 0.262144 0.524288 3
    Nov 30 21:37:59	unbound: [49482:0] info: 1.000000 2.000000 4
    Nov 30 21:38:00	unbound: [89061:0] notice: init module 0: validator
    Nov 30 21:38:00	unbound: [89061:0] notice: init module 1: iterator
    Nov 30 21:38:00	unbound: [89061:0] info: start of service (unbound 1.5.4).
    Nov 30 21:40:06	unbound: [89061:0] info: service stopped (unbound 1.5.4).
    Nov 30 21:40:06	unbound: [89061:0] info: server stats for thread 0: 37 queries, 2 answers from cache, 35 recursions, 0 prefetch
    Nov 30 21:40:06	unbound: [89061:0] info: server stats for thread 0: requestlist max 2 avg 0.142857 exceeded 0 jostled 0
    Nov 30 21:40:06	unbound: [89061:0] info: average recursion processing time 0.284517 sec
    Nov 30 21:40:06	unbound: [89061:0] info: histogram of recursion processing times
    Nov 30 21:40:06	unbound: [89061:0] info: [25%]=0.148236 median[50%]=0.20285 [75%]=0.257463
    Nov 30 21:40:06	unbound: [89061:0] info: lower(secs) upper(secs) recursions
    Nov 30 21:40:06	unbound: [89061:0] info: 0.032768 0.065536 1
    Nov 30 21:40:06	unbound: [89061:0] info: 0.065536 0.131072 5
    Nov 30 21:40:06	unbound: [89061:0] info: 0.131072 0.262144 21
    Nov 30 21:40:06	unbound: [89061:0] info: 0.262144 0.524288 5
    Nov 30 21:40:06	unbound: [89061:0] info: 0.524288 1.000000 1
    Nov 30 21:40:06	unbound: [89061:0] info: 1.000000 2.000000 2
    Nov 30 21:40:07	unbound: [38030:0] notice: init module 0: validator
    Nov 30 21:40:07	unbound: [38030:0] notice: init module 1: iterator
    Nov 30 21:40:07	unbound: [38030:0] info: start of service (unbound 1.5.4).
    
    

    I've set Windows so it cannot disable my network card on sleep. I've even deleted all adapters so that all that remains is my two onboard ports… That was the solution in other threads.

    DNS resolver is configured as follows:

    Enable: yes
    Network Interfaces: All
    Outgoing Network Interfaces: All
    DNSSEC: yes
    DNS Query Forwarding: yes
    DHCP Reg: No
    Static DHCP: Yes
    TXT Comment Support: No
    Nothing in advanced
    Host Overrides: various servers and domains.
    Everything else is pretty much default, and I'd imagine not causing the issue...

    Any help would be appreciated!

    Edit: Now my ping works after submitting this post. This is super weird... Please help!

    
    Pinging pfsense.domain.ca [public-ip] with 32 bytes of data:
    Reply from public-ip: bytes=32 time=1ms TTL=254
    Reply from public-ip: bytes=32 time=1ms TTL=254
    
    Ping statistics for public-ip:
        Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 1ms, Maximum = 1ms, Average = 1ms
    Control-C
    ^C
    C:\Users\domain>ping pfsense
    
    Pinging pfsense.domain.ca [10.0.0.254] with 32 bytes of data:
    Reply from 10.0.0.254: bytes=32 time<1ms TTL=64
    Reply from 10.0.0.254: bytes=32 time<1ms TTL=64
    Reply from 10.0.0.254: bytes=32 time<1ms TTL=64
    
    Ping statistics for 10.0.0.254:
        Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 0ms, Maximum = 0ms, Average = 0ms
    Control-C
    
    

  • LAYER 8 Netgate

    You cannot mix your private and public name servers on a host.

    There is no guarantee which DNS server your host is going to use.

    Change your hosts to only use pfSense to resolve names and all your problems will vanish.
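
    As a rough illustration of the point above, here is a minimal Python sketch that flags any public address in a client's DNS server list, using the list from the first post. The function name `public_resolvers` is made up for this example and is not part of pfSense or Windows; the sketch only classifies addresses and does not send any queries.

```python
import ipaddress


def public_resolvers(servers):
    """Return the entries in `servers` that are not private-range addresses."""
    return [s for s in servers if not ipaddress.ip_address(s).is_private]


# DNS servers reported by `ipconfig /all` in the first post.
client_dns = ["10.0.0.254", "8.8.8.8", "8.8.4.4"]

offenders = public_resolvers(client_dns)
if offenders:
    # Public resolvers have no way of knowing about local host overrides.
    print("Servers that cannot answer for local hosts:", offenders)
```

    If the list comes back empty, every server the client might pick is at least on the private side and can be made to hold the same local records.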



  • I've noticed something that I suspect may be similar so I figured I would add it to this thread (apologies if I'm misunderstanding something instead!). As some background:

    I have Unbound enabled as the DNS resolver:
    Network Interfaces: All
    Outgoing Network Interfaces: LAN
    DNS Query Forwarding: Enabled
    (That's about it for the configuration, the rest is default).

    System > General Setup -
    DNS Servers: [Internal DNS Server IP, using LAN gateway]

    Clients on the LAN network query the Internal DNS directly. I push the IP of the internal DNS for clients that connect using OpenVPN, so they should be querying the internal DNS directly as well.

    As such, it might well be that I don't need to have either the forwarder or the resolver enabled on the pfSense (since nothing is really asking the pfSense directly). However, it is handy to be able to resolve internal and external FQDNs from the firewall for ping, traceroute and Drill, and for this reason I assume I need to have either the forwarder or the resolver configured.

    So with the background out of the way, my story:

    When I use drill to query an internal FQDN, presumably Unbound forwards the request to the internal DNS as configured. However, what I'm seeing is that approximately a tenth of the time, rather than a correct resolution as reported by the internal DNS, I see a blank resolution from 127.0.0.1.

    The command I'm using is:

    drill fully.qualified.internal.domain
    

    But occasionally I get:

    $ drill fully.qualified.internal.domain
    ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 52900
    ;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 
    ;; QUESTION SECTION:
    ;; fully.qualified.internal.domain.	IN	A
    
    ;; ANSWER SECTION:
    
    ;; AUTHORITY SECTION:
    
    ;; ADDITIONAL SECTION:
    
    ;; Query time: 12 msec
    ;; SERVER: 127.0.0.1
    ;; WHEN: Tue Dec  1 17:06:44 2015
    ;; MSG SIZE  rcvd: 42
    
    

    Yet nine times out of ten I see the correct resolution from the internal DNS (this is when running exactly the same drill command and just hitting 'execute' again).

    Am I missing a setting somewhere?
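
    To put a number on "a tenth of the time", the drill output could be checked mechanically. The following is a minimal Python sketch, assuming the output format shown above; the helper names `answer_count` and `responding_server` are hypothetical, and the sample text is a trimmed copy of the output pasted in this post.

```python
import re


def answer_count(drill_output):
    """Extract the ANSWER count from the flags line of drill output."""
    match = re.search(r"ANSWER:\s*(\d+)", drill_output)
    if match is None:
        raise ValueError("no ANSWER count found in output")
    return int(match.group(1))


def responding_server(drill_output):
    """Extract the address from the ';; SERVER:' line, or None if absent."""
    match = re.search(r"^;; SERVER:\s*(\S+)", drill_output, re.MULTILINE)
    return match.group(1) if match else None


sample = """\
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 52900
;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; fully.qualified.internal.domain. IN A
;; Query time: 12 msec
;; SERVER: 127.0.0.1
"""

# An empty NOERROR reply shows up as an answer count of zero.
print(answer_count(sample), responding_server(sample))
```

    Feeding the output of repeated drill runs through these two helpers would show how often the empty reply occurs and which server produced it.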


  • LAYER 8 Netgate

    You'll need to provide more details about what you've done. Are you using DNS Resolver host overrides? Domain overrides? No overrides?



  • Hi,

    Sorry, here are some further details:

    There are no overrides in place for either domains or hosts. Nothing additional is specified in the Advanced box of the General Settings.

    The Advanced Settings tab is all defaults.

    Under the Access Lists I have created a single record that contains all the subnets that will be communicating with the firewall from the LAN side (including the OpenVPN tunnel subnet) and set it with an Allow action.


  • LAYER 8 Netgate

    How are you doing internal domains with no overrides? Where is the DNS zone authority?



  • The answer probably comes back either to the setup outlined above, or
    I'm doing something wrong, or
    my terminology might be misleading (I have the potentially confusing habit of using 'Internal DNS' to mean 'the DNS server on the internal network' {i.e. the LAN} but it just struck me that 'Internal DNS' might be read by others as meaning 'hosting DNS records on the pfSense itself' so I'll make an effort here to be a little more explicit in the answer below).

    Basically no client is configured to direct DNS queries at the pfSense (2.2.5 by the way) itself. Hosts on the LAN direct DNS queries to the DNS server on the LAN. OpenVPN clients have a tunnel to the LAN and direct queries to the LAN DNS (as configured in their OpenVPN config). As such, for day to day purposes the firewall doesn't have much to do with DNS (my issue comes about when using some of the diagnostic tools built into the pfSense to reach devices via their FQDN).

    For my setup, if the firewall were to have a DNS query directed at it then all it needs to know how to do is forward the request to the DNS server on the LAN which will then either have the record, or act as a recursive resolver for the query (hence why I have forwarding enabled in Unbound). The LAN DNS manages the resolution of all the private DNS hostnames (i.e. it holds the A, AAAA records etc, for hosts on the LAN domain) and if it can't resolve the FQDN directly (i.e. the query is for something in the domains that the LAN DNS is not authoritative for) then the LAN DNS will recursively resolve the query via the public DNS infrastructure.

    Looking at the definitions of the two functions you mentioned:

    Host Overrides allows creation of custom DNS responses/records to create new entries that do not exist in DNS outside the firewall, or to override DNS responses for other hosts.
    

    Host overrides don't sound applicable to my situation as the records exist on the LAN DNS server (and they are valid, they do not need to be overridden).

    Domain Overrides are for domains that should be queried by a specific remote server. For example, if all records for mysite.example.com exist on a private DNS server at 192.0.2.5, then a domain override can be set to forward all queries for that domain to that server.
    

    Again, this is not required day to day since DNS queries are directed at the LAN DNS which knows what domain(s) it is authoritative for, and knows how to resolve queries for domains that it is not authoritative for.

    Based on the (hopefully clearer) description of my environment, does it sound to you like I do need entries in either of these areas given that Unbound is being asked to operate as a forwarder?

    EDIT:

    Anyway, all of this is a bit of an aside and is hijacking the OP's thread, so I will butt out again. The intent was to mention that I too am seeing potential DNS resolution issues with Unbound, in my case when using drill on the pfSense command prompt (all the rest of my waffle was to give some context so that someone smarter might be able to work out whether it's a bug or a config mistake I have made). Thanks all.


  • LAYER 8 Global Moderator

    One thing you need to understand about a resolver… It has to walk the tree to resolve something: the roots, the name servers for the TLD, the name servers for the domain in question, etc. If you're doing a cold resolve, and especially if the name servers for a specific domain are on the other side of the planet from you or just plain suck at responding, then it's quite possible a query will time out the first time if the client wants it FAST and doesn't wait long enough.

    You have added just a little bit more time to that, since your clients are asking your internal DNS, which then asks unbound on pfSense.
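
    A toy model of that timing argument, as a minimal Python sketch. The latencies and the one-second client timeout are made-up numbers for illustration, and the function names are hypothetical; nothing here measures a real resolver. For scale, the histogram in the first post does show a few recursions in the 16 to 32 second bucket.

```python
def cold_resolve_time(hop_latencies):
    """Total time for a cold lookup: each step of the tree walk is asked in turn."""
    return sum(hop_latencies)


def client_gives_up(hop_latencies, client_timeout):
    """True if the walk takes longer than the client is willing to wait."""
    return cold_resolve_time(hop_latencies) > client_timeout


# Made-up latencies in seconds: root, TLD name server, slow far-away domain name server.
walk = [0.05, 0.08, 1.2]

print(client_gives_up(walk, client_timeout=1.0))  # cold cache, impatient client
print(client_gives_up([], client_timeout=1.0))    # already cached, no walk needed
```

    The second call models a retry after the answer has been cached, which is why the same query can fail once and then work.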

    Now to the OP issue...  Here is a PROBLEM!!

    10.0.0.254 (PfSense)
    8.8.8.8
    8.8.4.4

    If you want your clients to resolve your internal hosts, then the ONLY DNS they should point to is DNS that knows about your local hosts. If you ask 8.8.8.8, he is not going to know shit about your local hosts' local IPs.

    Point your clients to ONLY your internal DNS, and let your internal DNS look up the stuff it doesn't know about from external. Pointing clients to name servers that don't contain the same info is just asking for issues, since you can never be sure which DNS the client will ask.



  • Hey Guys,

    I was actually waiting for an email with a reply… turns out this forum doesn't send emails haha.

    Thank you all very much for your responses. I still have to read through them more carefully.

    I'm away from home currently, but I'll be sure to remove Google's DNS servers from the list when a client requests an IP and such... when I return.

    Does the resolver forward external queries (say twitter.com) to the DNS servers configured in system>advanced? Or is that a forwarder thing?


  • LAYER 8 Netgate

    Forwarder. The resolver resolves from the root down. Described as walking the tree earlier.


  • LAYER 8 Global Moderator

    "turns out this forum doesn't send emails haha."

    What?? Yeah, it does. Did you set up notifications?

    "Does the resolver forward external queries"

    <rolleyes>JFC… maybe there should just be a simple test before you are allowed to turn on the resolver in pfSense. Answer a question about the difference between a resolver and a forwarder, and anyone who doesn't get it right cannot enable it. Maybe it should just go back to the forwarder as default, because use of an actual resolver just seems way too complicated for what is, sad to say, a large portion of the user group... It's like the basic concept has to be explained every freaking day...</rolleyes>



  • I have a similar issue, not sure… but I do point hosts to only pfsense DNS...

    When I try to ask the DNS about a local domain with nslookup, I get:

    
     nslookup gmail.com 192.168.0.1
    ;; connection timed out; no servers could be reached
    
    

    However, the service is up! After rebooting TWICE… it magically resolves everything as it should.
    The resolver log says:

    Dec 6 11:02:27 	unbound: [17320:0] info: start of service (unbound 1.5.4).
    Dec 6 11:02:27 	unbound: [17320:0] info: service stopped (unbound 1.5.4).
    Dec 6 11:02:27 	unbound: [17320:0] info: start of service (unbound 1.5.4).
    Dec 6 11:02:26 	unbound: [17320:0] info: service stopped (unbound 1.5.4).
    Dec 6 11:02:26 	unbound: [17320:0] info: start of service (unbound 1.5.4).
    Dec 6 10:50:05 	unbound: [58211:0] info: start of service (unbound 1.5.4).
    Dec 6 10:50:05 	unbound: [58211:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:50:04 	unbound: [58211:0] info: start of service (unbound 1.5.4).
    Dec 6 10:50:02 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:49:40 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:49:40 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:49:09 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:49:09 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:48:46 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:48:46 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:48:43 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:48:43 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:48:11 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:48:11 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:47:56 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:47:56 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:47:37 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:47:37 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:47:10 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:47:10 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:47:04 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:47:04 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:46:50 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:46:50 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:46:42 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:46:41 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:46:39 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:46:39 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:46:31 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:46:31 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:46:11 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:46:11 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:45:59 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:45:59 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:45:54 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:45:54 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:45:45 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:45:44 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:45:40 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:45:40 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:45:35 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:45:35 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:45:31 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    Dec 6 10:45:31 	unbound: [77165:0] info: service stopped (unbound 1.5.4).
    Dec 6 10:45:25 	unbound: [77165:0] info: start of service (unbound 1.5.4).
    

  • LAYER 8 Global Moderator

    Well, this is not a case of it failing to return an answer because it didn't know or couldn't find one; it looks like you just got a timeout, because it sure looks like your unbound is starting and stopping all the time. So it may simply have been off when you asked.

    Uncheck "Register DHCP leases in the DNS Resolver" in the resolver settings and see if that stops it from starting and stopping every few minutes.
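
    To see how often the service is bouncing, the resolver log can be tallied. Here is a minimal Python sketch, assuming log lines in the format pasted above; the function name `starts_per_minute` is hypothetical and the sample lines are copied from that log.

```python
import re
from collections import Counter

# Matches e.g. "Dec 6 10:45:25 unbound: [77165:0] info: start of service (...)".
START = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+unbound: \[\d+:\d+\] info: start of service"
)


def starts_per_minute(log_lines):
    """Count 'start of service' entries per minute, keyed by 'Mon D HH:MM'."""
    counts = Counter()
    for line in log_lines:
        match = START.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Dec 6 10:45:25 unbound: [77165:0] info: start of service (unbound 1.5.4).",
    "Dec 6 10:45:31 unbound: [77165:0] info: service stopped (unbound 1.5.4).",
    "Dec 6 10:45:31 unbound: [77165:0] info: start of service (unbound 1.5.4).",
    "Dec 6 10:45:35 unbound: [77165:0] info: service stopped (unbound 1.5.4).",
    "Dec 6 10:45:35 unbound: [77165:0] info: start of service (unbound 1.5.4).",
    "Dec 6 10:46:11 unbound: [77165:0] info: start of service (unbound 1.5.4).",
]

for minute, count in starts_per_minute(sample).items():
    print(minute, count)
```

    Several starts within a single minute, as in this sample, mean that any query arriving during a restart can time out even though the service appears to be up.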

