DNS Forwarder Issues



  • I'm currently trying to get DNS to forward over a specific interface.

    The following is a packet capture from the interface in question. Using pfSense's own DNS lookup tool, the first one works; the second one doesn't, even though the capture shows two valid (and identical) replies.

    Works
    -----
    10.232.100.63.33903 > 10.232.100.27.53: [udp sum ok] 21904+ A? portal.cpn.vwg. (32)
    10.232.100.27.53 > 10.232.100.63.33903: [udp sum ok] 21904 q: A? portal.cpn.vwg. 1/0/0 portal.cpn.vwg. A 10.112.198.242 (48)
    
    Doesn't
    ------
    10.232.100.63.33090 > 10.232.3.131.53: [udp sum ok] 36518+ A? portal.cpn.vwg. (32)
    10.232.3.131.53 > 10.232.100.63.33090: [udp sum ok] 36518 q: A? portal.cpn.vwg. 1/0/0 portal.cpn.vwg. A 10.112.198.242 (48)
    
    

    Any ideas?



  • How do you know it doesn't work?  The output of both queries looks identical.



  • portal.cpn.vwg doesn't resolve when the DNS is set to 10.232.3.131



  • Your first DNS is on the same subnet as the client; the second is not.  You're allowing UDP traffic on port 53 between 10.232.100.0 and 10.232.3.0?  What's the netmask set to on both the DNS and the client?



  • The pfsense server is 10.233.105.10/26

    The interface I have to use for the dns query is 10.232.100.63/25

    There is a static route for 10.232.0.0/16 routed over the gateway 10.232.100.1

    I have Bypass firewall rules for traffic on the same interface and Disable DNS Rebinding Checks selected under system->advanced.


  • Rebel Alliance Global Moderator

    huh??

    "portal.cpn.vwg doesn't resolve when the DNS is set to 10.232.3.131"

    Sure looks like it resolves to me from this snip

    Doesn't
    ------
    10.232.100.63.33090 > 10.232.3.131.53: [udp sum ok] 36518+ A? portal.cpn.vwg. (32)
    10.232.3.131.53 > 10.232.100.63.33090: [udp sum ok] 36518 q: A? portal.cpn.vwg. 1/0/0 portal.cpn.vwg. A 10.112.198.242 (48)

    Clearly that is 10.232.3.131 sending answer to 10.232.100.63

    Where was that sniff taken?  And what are you thinking doesn't resolve it?
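
A small sketch of the check being made by eye here: a reply belongs to a query when the addresses and ports are swapped and the transaction ID and name match. The regular expression is an assumption that fits only the tcpdump summary lines quoted in this thread; the helper names `parse` and `is_reply_to` are hypothetical.

```python
import re

# Fits the tcpdump DNS summary lines quoted in this thread (assumption:
# other tcpdump versions or verbosity levels may format them differently).
LINE = re.compile(
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): "
    r"\[udp sum ok\] (?P<id>\d+)\+? "
    r"(?:q: )?A\? (?P<name>\S+)"
    r"(?: (?P<counts>\d+/\d+/\d+))?"
)

def parse(line):
    """Extract endpoints, transaction ID, name and answer counts from one line."""
    m = LINE.search(line)
    if m is None:
        raise ValueError("unrecognised capture line: " + line)
    return m.groupdict()

def is_reply_to(query, reply):
    """True if reply carries answer counts and mirrors the query's 5-tuple and ID."""
    return (
        reply["counts"] is not None
        and reply["src"] == query["dst"] and reply["sport"] == query["dport"]
        and reply["dst"] == query["src"] and reply["dport"] == query["sport"]
        and reply["id"] == query["id"] and reply["name"] == query["name"]
    )

q = parse("10.232.100.63.33090 > 10.232.3.131.53: [udp sum ok] 36518+ A? portal.cpn.vwg. (32)")
r = parse("10.232.3.131.53 > 10.232.100.63.33090: [udp sum ok] 36518 q: A? portal.cpn.vwg. 1/0/0 portal.cpn.vwg. A 10.112.198.242 (48)")
print(is_reply_to(q, r))
```

By this check the "Doesn't" pair is a well-formed query and answer on the wire, which is why the capture alone does not explain the failed lookup.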



  • Sniffed on the pfSense server (10.232.100.63). I understand that it's resolving at the packet level, but then why is pfSense completely ignoring the response from 10.232.3.131?

    Once the DNS is changed to 10.232.3.131 under General, all resolution for the .vwg domain fails; it only works again when 10.232.100.27 is used.



  • Could this be due to pfsense not being able to handle overlapping netblocks correctly? Is there likely to be a fix? I realise this is a very specific situation.

    Dig performed on the pfsense box:

    $ dig @10.232.3.132 portal.cpn.vwg any
    
    ; <<>> DiG 9.6.-ESV-R5-P1 <<>> @10.232.3.132 portal.cpn.vwg any
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56559
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
    
    ;; QUESTION SECTION:
    ;portal.cpn.vwg.			IN	ANY
    
    ;; ANSWER SECTION:
    portal.cpn.vwg.		771	IN	A	10.112.198.242
    
    ;; Query time: 70 msec
    ;; SERVER: 10.232.3.132#53(10.232.3.132)
    ;; WHEN: Tue Jan 13 19:41:02 2015
    ;; MSG SIZE  rcvd: 48
    


  • @spies:

    Could this be due to pfsense not being able to handle overlapping netblocks correctly?

    Presume you're referring to the /16 static route you have. That's fine, the most specific route always wins.

    There should be something in the resolver log if it's not accepting a reply for any reason.

    What are you doing to forward those queries with the DNS forwarder?
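
To illustrate "the most specific route always wins", here is a minimal longest-prefix-match sketch using Python's standard `ipaddress` module. The route table is reconstructed from the figures given in this thread (the /25 on the interface and the /16 static route); the function name `pick_route` and the "directly connected" label are hypothetical.

```python
import ipaddress

def pick_route(dest, routes):
    """Return the next hop of the matching route with the longest prefix."""
    addr = ipaddress.ip_address(dest)
    matching = [
        (ipaddress.ip_network(net), via)
        for net, via in routes
        if addr in ipaddress.ip_network(net)
    ]
    if not matching:
        return None
    # The longest prefix length is the most specific route.
    return max(matching, key=lambda item: item[0].prefixlen)[1]

# Assumed from the thread: interface 10.232.100.63/25, static /16 via 10.232.100.1.
routes = [
    ("10.232.100.0/25", "directly connected"),
    ("10.232.0.0/16", "10.232.100.1"),
]

print(pick_route("10.232.100.27", routes))
print(pick_route("10.232.3.131", routes))
```

Both DNS servers fall inside the /16, but only 10.232.100.27 also falls inside the /25, so the overlap is resolved deterministically rather than ambiguously.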



  • There's nothing in the resolver log relating to not accepting the response.

    I'm currently testing on the pfsense box itself using the dns lookup tool.

    I need to use the forwarder because pfSense doesn't do the DHCP for the network. A Windows server handles AD/Exchange, so DNS has to go to that first; the server then forwards the DNS requests on to pfSense.

    I've tried BIND and Unbound as the forwarder and I get the same problem, so it looks like it's pfSense rather than the DNS module causing the issue.



  • BIND wouldn't resolve a non-public TLD unless you configure some forwarding for that domain. The same goes for Unbound, unless it's in forwarder mode and you only have internal name servers configured for forwarding.

    Check the destination MAC on the reply packets shown in your first post; just because a reply shows up doesn't mean it's addressed to the correct MAC.



  • I will check this tomorrow and report back.



  • 09:03:10.717062 00:30:f1:12:fe:7e > f0:f7:55:b3:7b:e0, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 5320, offset 0, flags [none], proto UDP (17), length 60)
        10.232.100.63.1433 > 10.232.3.131.53: [udp sum ok] 59738+ A? portal.cpn.vwg. (32)
    09:03:10.717105 00:30:f1:12:fe:7e > f0:f7:55:b3:7b:e0, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 18571, offset 0, flags [none], proto UDP (17), length 60)
        10.232.100.63.31497 > 10.232.3.132.53: [udp sum ok] 59738+ A? portal.cpn.vwg. (32)
    09:03:10.789723 f0:f7:55:b3:7b:e0 > 00:30:f1:12:fe:7e, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 43, id 48181, offset 0, flags [none], proto UDP (17), length 76)
        10.232.3.132.53 > 10.232.100.63.31497: [udp sum ok] 59738 q: A? portal.cpn.vwg. 1/0/0 portal.cpn.vwg. A 10.112.198.242 (48)
    09:03:10.802999 f0:f7:55:b3:7b:e0 > 00:30:f1:12:fe:7e, ethertype IPv4 (0x0800), length 90: (tos 0x0, ttl 41, id 11316, offset 0, flags [none], proto UDP (17), length 76)
        10.232.3.131.53 > 10.232.100.63.1433: [udp sum ok] 59738 q: A? portal.cpn.vwg. 1/0/0 portal.cpn.vwg. A 10.112.198.242 (48)
    

    00:30:f1:12:fe:7e is the correct interface.



  • Is there any more detail I can give you, cmb, to try and figure this out?



  • Is any further help available on this?


  • Rebel Alliance Global Moderator

    I'm still at a loss to understand what you think is not working. Clearly, from the last sniff you posted, the query for portal.cpn.vwg is getting answered. What else do you want the DNS forwarder to do?



  • I don't know how I can put it any clearer.

    Yes on a packet level it's working fine but pfsense is completely ignoring the response from the DNS server so portal.cpn.vwg does NOT resolve. If I point pfsense at the Windows server on the same range, which has the very same DNS servers configured, it doesn't ignore the response (which on the packet capture, is 100% identical) so it then resolves portal.cpn.vwg successfully.

    pfSense is ignoring a completely valid response from the vwg DNS server for reasons I cannot understand and nobody can answer. The only way it works is if another forwarder is placed between pfSense and the vwg DNS servers.

    The only difference is the IP range that the requests are coming from.

    I can't dig down and get low-level logs, so I can't supply any more information. There's nothing of any use in the webGUI logs. I would love to get to the bottom of this.


  • Rebel Alliance Global Moderator

    So show me it not working in pfSense. Let's see your host command to the IP and the query.

    Let's see the actual capture in Wireshark to see if there is a problem with the packet.

    So you're saying that if I do a host lookup against that server's IP, like this, it fails?
    [2.2-RC][root@pfSense.local.lan]/root: host www.google.com 4.2.2.2
    Using domain server:
    Name: 4.2.2.2
    Address: 4.2.2.2#53
    Aliases:

    www.google.com has address 64.233.181.147
    www.google.com has address 64.233.181.106
    www.google.com has address 64.233.181.99
    www.google.com has address 64.233.181.103
    www.google.com has address 64.233.181.105
    www.google.com has address 64.233.181.104
    www.google.com has IPv6 address 2607:f8b0:4001:c08::67

    So when you do to one server it works, and other server it fails even though clearly from the sniff the query returned traffic.



  • @johnpoz:

    So when you do to one server it works, and other server it fails even though clearly from the sniff the query returned traffic.

    Exactly, I will work on getting a wireshark output later this week and post back here.


  • Rebel Alliance Global Moderator

    You can just download the capture you take under Diagnostics in pfSense; it opens in Wireshark just fine.



  • In the packet capture after the failed nslookup, it actually shows a standard query response.

    When I capture the LAN interface it is responding with 0x0005 no such name.

    I'm still none the wiser as to why pfSense isn't passing the result on.





  • Rebel Alliance Global Moderator

    So sounds like pfsense is answering the client with NX??

    Can we see the actual wireshark on both the interface the client is asking from, and then any other interfaces pfsense has that it might send out a query for this request.  So we can follow what is happening.

    Do you have sequential DNS on, or is the forwarder asking all the DNS servers, with the first one to answer saying sorry, NX?

    That's in the DNS forwarder section. It seems like you have some sort of issue where your first query did not even get a response. So is the server busy, or are there network issues?

    Query DNS servers sequentially
    If this option is set, pfSense DNS Forwarder (dnsmasq) will query the DNS servers sequentially in the order specified (System - General Setup - DNS Servers), rather than all at once in parallel.



  • If sequential is ticked, the request doesn't make it over the VWG interface, presumably because it gets a response from the 8.8.8.8 NS and just gives up (Surely that isn't correct behavior?)

    I've taken several captures, these are all done with parallel.

    https://www.dropbox.com/s/isw2fv3ale6vsts/Packet_Captures.zip?dl=0

    I personally can't see anything wrong, apart from the LAN interface responding with "not found" when the name clearly does resolve.


  • Rebel Alliance Global Moderator

    Yes it is. If your DNS server says NX, i.e. that domain does not exist, why should you go ask another one?

    Normal practice, if you have non-public domains you need to resolve, is to POINT to that DNS only! And let it forward your requests or ask the roots for public domains, etc.

    You cannot point to multiple DNS servers, ask for something that doesn't exist on some of them, and expect it to work just because one out of the list knows about that domain.

    If you want to ask sequentially, put the owner of the vwg domain first. But you have a problem: if it doesn't answer fast enough, you move on to the next one, get NX, and that's no good.

    So whatever DNS you ask needs to know about this vwg domain. Or you need whoever you ask to have a conditional forwarder to go and ask the owning DNS of your .vwg domain.

    We used to have to resolve that same domain from our AD at a company I used to work for. Our AD DNS, which all users used, had conditional forwarders to the owning NS of that domain down a VPN connection. When a user asked DNS for something.vwg, the DNS went and asked the NS of vwg. If it was not .vwg and it did not know the answer, it forwarded the request up to public DNS, etc.

    I don't think there is anything wrong with pfsense - you just need to design your dns correctly.
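
The conditional-forwarder design described above can be sketched roughly as follows. This is an illustration of the idea, not pfSense or dnsmasq code: the function `pick_upstreams` is hypothetical, and treating 10.232.3.131 and 10.232.3.132 as the servers that own .vwg is an assumption based on the addresses queried in this thread.

```python
def pick_upstreams(qname, overrides, default_upstreams):
    """Choose upstream servers for a name; the most specific domain override wins."""
    labels = qname.rstrip(".").lower().split(".")
    # Try the full name first, then progressively shorter suffixes.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in overrides:
            return overrides[suffix]
    return default_upstreams

# Assumed: these two servers know .vwg; 8.8.8.8 is the public resolver
# mentioned in the thread.
overrides = {"vwg": ["10.232.3.131", "10.232.3.132"]}
default_upstreams = ["8.8.8.8"]

print(pick_upstreams("portal.cpn.vwg.", overrides, default_upstreams))
print(pick_upstreams("www.google.com", overrides, default_upstreams))
```

With this split, a public resolver is never asked about .vwg, so it never gets the chance to answer NXDOMAIN for it.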



  • Thank you very much John, that's some food for thought.



  • @johnpoz:

    I don't think there is anything wrong with pfsense - you just need to design your dns correctly.

    Exactly. What's shown here is the correct behavior and how any DNS server will behave in the circumstance. When it gets the NXDOMAIN reply first, it uses that. If it got no reply at all, or a SERVFAIL, it'd continue on and use another option.
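
A simplified model of the behaviour described in this post, assuming servers are tried one after another: NOERROR and NXDOMAIN are both final answers, while a timeout or SERVFAIL moves on to the next server. The function `resolve_sequential` and the reply lists are hypothetical and do not reproduce any real forwarder's code.

```python
def resolve_sequential(replies):
    """replies: (server, rcode) pairs in the order tried; rcode None means no reply."""
    for server, rcode in replies:
        if rcode in ("NOERROR", "NXDOMAIN"):
            # A definitive answer, positive or negative, ends the search.
            return server, rcode
        # Timeout (None) or SERVFAIL: fall through to the next server.
    return None, "SERVFAIL"

# The public resolver answers first and says the name does not exist.
print(resolve_sequential([("8.8.8.8", "NXDOMAIN"), ("10.232.3.131", "NOERROR")]))
# The public resolver does not answer at all, so the next server is used.
print(resolve_sequential([("8.8.8.8", None), ("10.232.3.131", "NOERROR")]))
```

In the first case the internal server's valid answer is never consulted, which matches the "not found" seen on the LAN side.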



  • OK, so it makes sense that if a public DNS server replies with "not found", pfSense is not then going to try the other DNS servers to see if the name is on those. But why, then, does the .vwg domain resolve once the DNS server that runs on the AD is added to the config?

    The public DNS is still giving the same answer 'not found' but for some reason pfsense is taking note of the response it gets from the DNS running on the AD and forwarding that to the client.

    Doesn't matter what order they're in either and it works in parallel mode too! Is the AD DNS doing something special?



  • I suspect that if you run DNS queries in parallel, an NXDOMAIN response won't stop the search before it asks the other servers since it's asking them all at once.



  • Actually, thinking about it, response time must have something to do with it, as the DNS running on the AD is obviously local…
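
The response-time hypothesis can be sketched as a race, under the assumption that in parallel mode the forwarder uses whichever definitive reply arrives first. The latencies for 8.8.8.8 and the AD server are invented for illustration; the 70 ms figure is the dig query time shown earlier for 10.232.3.132, reused here for 10.232.3.131 as a further assumption. The function `first_definitive_reply` is hypothetical.

```python
def first_definitive_reply(upstreams):
    """upstreams: (server, latency_ms, rcode) tuples; return the earliest final answer."""
    definitive = [u for u in upstreams if u[2] in ("NOERROR", "NXDOMAIN")]
    if not definitive:
        return None
    return min(definitive, key=lambda u: u[1])

# Remote .vwg server versus a public resolver: the faster NXDOMAIN wins.
without_ad = [("8.8.8.8", 20, "NXDOMAIN"), ("10.232.3.131", 70, "NOERROR")]
# Local AD DNS added: it answers before the public resolver does.
with_ad = without_ad + [("10.232.100.27", 1, "NOERROR")]

print(first_definitive_reply(without_ad))
print(first_definitive_reply(with_ad))
```

If this model holds, the AD DNS is not doing anything special; it simply wins the race because it is on the local segment.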