Strange resolution issue - ping/apache can't resolve a host, but host/dig/nslookup can



  • Hi;
    First off, let me say that I've only been using pfsense for about 6 months, but I'm very impressed. It's allowed us to provide a vast amount of functionality out of the box. We're currently using it to connect our global office sites, which run Meraki edge devices, to our private cloud virtual test environments through site-to-site VPNs, with pfsense simply running on some VMs. Very nice.

    However, I've recently encountered a very strange issue:

    ping and apache fail to resolve one of our CNAMEs from within one of our pfsense networks:

    
    >ping admin-test.lingo24.cloud
    ping: unknown host admin-test.lingo24.cloud
    
    
    
    >/etc/init.d/httpd start
    Starting httpd: [Thu Feb 18 21:30:23 2016] [error] (EAI 2)Name or service not known: Could not resolve host name admin-test.lingo24.cloud -- ignoring!
    [Thu Feb 18 21:30:23 2016] [error] (EAI 2)Name or service not known: Could not resolve host name admin-test.lingo24.cloud -- ignoring!
                                                               [  OK  ]
    
    

    host, dig and nslookup all work correctly

    
    >host admin-test.lingo24.cloud
    admin-test.lingo24.cloud is an alias for admin.vte2.lingo24.cloud.
    
    
    
    >nslookup admin-test.lingo24.cloud
    Server:		10.102.0.1
    Address:	10.102.0.1#53
    
    Non-authoritative answer:
    admin-test.lingo24.cloud	canonical name = admin.vte2.lingo24.cloud.
    
    
    
    > dig @10.102.0.1 admin-test.lingo24.cloud
    
    ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.4 <<>> @10.102.0.1 admin-test.lingo24.cloud
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27895
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
    
    ;; QUESTION SECTION:
    ;admin-test.lingo24.cloud.	IN	A
    
    ;; ANSWER SECTION:
    admin-test.lingo24.cloud. 217	IN	CNAME	admin.vte2.lingo24.cloud.
    
    ;; Query time: 9 msec
    ;; SERVER: 10.102.0.1#53(10.102.0.1)
    ;; WHEN: Thu Feb 18 21:38:22 2016
    ;; MSG SIZE  rcvd: 67
    
    

    A ping to the 'A' record works fine:

    
    >ping admin.vte2.lingo24.cloud.
    PING admin.vte2.lingo24.cloud (10.102.0.10) 56(84) bytes of data.
    64 bytes from admin.vte2.lingo24.cloud (10.102.0.10): icmp_seq=1 ttl=64 time=0.396 ms
    64 bytes from admin.vte2.lingo24.cloud (10.102.0.10): icmp_seq=2 ttl=64 time=0.353 ms
    ^C
    --- admin.vte2.lingo24.cloud ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1970ms
    rtt min/avg/max/mdev = 0.353/0.374/0.396/0.028 ms
    
    

    The same query works fine outside the network:

    
    >ping admin-test.lingo24.cloud
    PING admin.vte2.lingo24.cloud (10.102.0.10) 56(84) bytes of data.
    64 bytes from 10.102.0.10: icmp_seq=1 ttl=128 time=176 ms
    
    
    
    >host admin-test.lingo24.cloud
    admin-test.lingo24.cloud is an alias for admin.vte2.lingo24.cloud.
    admin.vte2.lingo24.cloud has address 10.102.0.10
    
    

    Environment

    • This happens on every machine (CentOS 6.7) we have behind a pfsense box (2.2.6-RELEASE (amd64)).

    • The pfsense box is providing DHCP, DNS Resolver and Gateway for the machines.

    • The DNS records are hosted on Cloudflare's DNS servers.

    • The A record the CNAME points to (admin.vte2.lingo24.cloud) IS within pfsense's DHCP records. So it would know about it locally as well as from an upstream DNS query.

    • This happens across 3 separate instances of pfsense, all configured very similarly.

    • The tld (.cloud) is very new in the interwebs.

    Here's some config from one of the machines:

    
    > cat /etc/resolv.conf
    ; generated by /sbin/dhclient-script
    search vte2.lingo24.cloud
    nameserver 10.102.0.1
    
    
    
    >cat /etc/hosts
    127.0.0.1	localhost	localhost.localdomain localhost4 localhost4.localdomain4
    ::1	localhost	localhost.localdomain localhost6 localhost6.localdomain6
    
    

    TCP Dumps

    What I find very interesting (and extremely frustrating!) is that the tcpdumps for a ping and an nslookup look identical, packet for packet:

    ping tcpdump

    
    >tcpdump 'not tcp port 22'
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
    21:12:12.110777 IP foreman.vte2.lingo24.cloud.52875 > gateway.vte2.lingo24.cloud.domain: 32896+ A? admin-test.lingo24.cloud. (42)
    21:12:12.111184 IP foreman.vte2.lingo24.cloud.56507 > gateway.vte2.lingo24.cloud.domain: 13530+ PTR? 1.0.102.10.in-addr.arpa. (41)
    21:12:12.111438 IP gateway.vte2.lingo24.cloud.domain > foreman.vte2.lingo24.cloud.56507: 13530* 1/0/0 PTR gateway.vte2.lingo24.cloud. (81)
    21:12:12.111511 IP foreman.vte2.lingo24.cloud.60743 > gateway.vte2.lingo24.cloud.domain: 47165+ PTR? 5.0.102.10.in-addr.arpa. (41)
    21:12:12.111758 IP gateway.vte2.lingo24.cloud.domain > foreman.vte2.lingo24.cloud.60743: 47165* 1/0/0 PTR foreman.vte2.lingo24.cloud. (81)
    21:12:12.268174 IP gateway.vte2.lingo24.cloud.domain > foreman.vte2.lingo24.cloud.52875: 32896 1/0/0 CNAME admin.vte2.lingo24.cloud. (67)
    21:12:12.270401 IP foreman.vte2.lingo24.cloud.41615 > gateway.vte2.lingo24.cloud.domain: 37593+ A? foreman.vte2.lingo24.cloud. (44)
    21:12:12.270724 IP gateway.vte2.lingo24.cloud.domain > foreman.vte2.lingo24.cloud.41615: 37593* 1/0/0 A 10.102.0.5 (60)
    ^C
    8 packets captured
    8 packets received by filter
    0 packets dropped by kernel
    
    

    dig tcpdump

    
    >tcpdump 'not tcp port 22'
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
    21:15:47.461938 IP foreman.vte2.lingo24.cloud.44543 > gateway.vte2.lingo24.cloud.domain: 42178+ A? admin-test.lingo24.cloud. (42)
    21:15:47.462256 IP foreman.vte2.lingo24.cloud.42077 > gateway.vte2.lingo24.cloud.domain: 40419+ PTR? 1.0.102.10.in-addr.arpa. (41)
    21:15:47.462604 IP gateway.vte2.lingo24.cloud.domain > foreman.vte2.lingo24.cloud.42077: 40419* 1/0/0 PTR gateway.vte2.lingo24.cloud. (81)
    21:15:47.462743 IP foreman.vte2.lingo24.cloud.58062 > gateway.vte2.lingo24.cloud.domain: 17006+ PTR? 5.0.102.10.in-addr.arpa. (41)
    21:15:47.462952 IP gateway.vte2.lingo24.cloud.domain > foreman.vte2.lingo24.cloud.58062: 17006* 1/0/0 PTR foreman.vte2.lingo24.cloud. (81)
    21:15:47.471440 IP gateway.vte2.lingo24.cloud.domain > foreman.vte2.lingo24.cloud.44543: 42178 1/0/0 CNAME admin.vte2.lingo24.cloud. (67)
    21:15:47.474259 IP foreman.vte2.lingo24.cloud.38865 > gateway.vte2.lingo24.cloud.domain: 58114+ A? foreman.vte2.lingo24.cloud. (44)
    21:15:47.474565 IP gateway.vte2.lingo24.cloud.domain > foreman.vte2.lingo24.cloud.38865: 58114* 1/0/0 A 10.102.0.5 (60)
    ^C
    8 packets captured
    8 packets received by filter
    0 packets dropped by kernel
    
    

    So ping is getting the same response from pfsense as nslookup is, which would make me think the problem lies with ping. However, apache has the same issue, and the issue does not appear outside the pfsense networks.

    An strace of ping doesn't seem to show anything exciting other than it receiving a response:

    
    >strace ping admin-test.lingo24.cloud
    execve("/bin/ping", ["ping", "admin-test.lingo24.cloud"], ) = 0
    .....
    socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
    connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.102.0.1")}, 16) = 0
    poll([{fd=4, events=POLLOUT}], 1, 0)    = 1 ([{fd=4, revents=POLLOUT}])
    sendto(4, "V~\1\0\0\1\0\0\0\0\0\0\nadmin-test\7lingo24\5"..., 42, MSG_NOSIGNAL, NULL, 0) = 42
    poll([{fd=4, events=POLLIN}], 1, 5000)  = 1 ([{fd=4, revents=POLLIN}])
    ioctl(4, FIONREAD, [67])                = 0
    recvfrom(4, "V~\201\200\0\1\0\1\0\0\0\0\nadmin-test\7lingo24\5"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.102.0.1")}, [16]) = 67
    close(4)                                = 0
    write(2, "ping: unknown host admin-test.li"..., 44ping: unknown host admin-test.lingo24.cloud
    ) = 44
    exit_group(2)                           = ?
    +++ exited with 2 +++
    
    

    Does anyone have any thoughts on the cause, or suggestions for further debugging (other than pulling apart the ping C code, which I'm getting to! :))?
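One difference worth keeping in mind here: dig, host and nslookup ship their own resolver code and simply print whatever the server sends back, including a bare CNAME. ping and Apache instead go through the C library's resolver (gethostbyname()/getaddrinfo(); the "(EAI 2)" in the Apache log is a getaddrinfo error code), which only succeeds if it ends up with an address. A minimal sketch for reproducing that code path, assuming Python is available on the affected machine; the function name `resolve_like_libc` and its `lookup` parameter are made up for illustration:

```python
import socket


def resolve_like_libc(hostname, lookup=socket.getaddrinfo):
    """Resolve a name through getaddrinfo(), the libc path Apache uses.

    Returns a sorted list of unique IPv4 address strings, or an empty list
    when the resolver reports a failure (socket.gaierror wraps the EAI_* codes).
    `lookup` is injectable only so the function can be exercised offline.
    """
    try:
        results = lookup(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({result[4][0] for result in results})
```

Calling `resolve_like_libc("admin-test.lingo24.cloud")` on a machine behind the pfsense box should, if the reasoning above holds, fail in the same way ping does, while the same call from outside the network returns the address.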



  • It may be a hiding to nothing, but I notice that there's something a little odd about your dig query:

    dig @10.102.0.1 admin-test.lingo24.cloud

    ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.4 <<>> @10.102.0.1 admin-test.lingo24.cloud
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27895
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;admin-test.lingo24.cloud. IN A

    ;; ANSWER SECTION:
    admin-test.lingo24.cloud. 217 IN CNAME admin.vte2.lingo24.cloud.

    ;; Query time: 9 msec
    ;; SERVER: 10.102.0.1#53(10.102.0.1)
    ;; WHEN: Thu Feb 18 21:38:22 2016
    ;; MSG SIZE  rcvd: 67

    Unless you've edited the output, you ought to be seeing the A record for admin.vte2.lingo24.cloud immediately below the CNAME line. Can you post the contents of the zone file for 'lingo24.cloud' on the 10.102.0.1 server?
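The same gap shows up in the raw reply captured by strace in the first post. As a rough check, the sketch below unpacks the 12-byte DNS header from the recvfrom() bytes (strace's octal escapes `\201\200` rewritten as hex). The helper name `parse_dns_header` is hypothetical; the header layout is the standard one from RFC 1035.

```python
import struct

# First 12 bytes of the reply shown in the strace recvfrom() call.
REPLY_HEADER = b"V~\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00"


def parse_dns_header(header):
    """Unpack the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", header[:12]
    )
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),  # QR bit
        "rcode": flags & 0x000F,              # 0 means NOERROR
        "questions": qdcount,
        "answers": ancount,
        "authority": nscount,
        "additional": arcount,
    }


print(parse_dns_header(REPLY_HEADER))
```

The header decodes to a NOERROR response carrying a single answer record, which matches dig's "ANSWER: 1" and the 67-byte message size: the reply ping received holds the CNAME and nothing else.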



  • Hey muswellhillbilly;

    hmm, very interesting, you're right, well spotted!

    If I do a Google query and a pfsense query, I get different responses:

    Google Query

    > dig @8.8.8.8 admin-test.lingo24.cloud
    
    ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.4 <<>> @8.8.8.8 admin-test.lingo24.cloud
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35840
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
    
    ;; QUESTION SECTION:
    ;admin-test.lingo24.cloud.	IN	A
    
    ;; ANSWER SECTION:
    admin-test.lingo24.cloud. 299	IN	CNAME	admin.vte2.lingo24.cloud.
    admin.vte2.lingo24.cloud. 299	IN	A	10.102.0.10
    
    ;; Query time: 23 msec
    ;; SERVER: 8.8.8.8#53(8.8.8.8)
    ;; WHEN: Fri Feb 19 11:55:36 2016
    ;; MSG SIZE  rcvd: 83
    

    pfsense Query

    >dig @10.102.0.1 admin-test.lingo24.cloud
    
    ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.4 <<>> @10.102.0.1 admin-test.lingo24.cloud
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39504
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
    
    ;; QUESTION SECTION:
    ;admin-test.lingo24.cloud.	IN	A
    
    ;; ANSWER SECTION:
    admin-test.lingo24.cloud. 16	IN	CNAME	admin.vte2.lingo24.cloud.
    
    ;; Query time: 21 msec
    ;; SERVER: 10.102.0.1#53(10.102.0.1)
    ;; WHEN: Fri Feb 19 11:55:47 2016
    ;; MSG SIZE  rcvd: 67
    

    It doesn't explain why I get the same packet response from a tcpdump when I do a ping but it's something to go on!
    Cheers
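To compare resolvers quickly, the two answer sections above can be checked mechanically. This is a small sketch that assumes dig's usual five-column answer format (name, TTL, class, type, data); the function name `answer_has_address` is invented for the example, and the sample lines are copied from the two outputs in this post.

```python
def answer_has_address(answer_lines):
    """Return True if a dig ANSWER SECTION contains an A or AAAA record."""
    for line in answer_lines:
        fields = line.split()
        # dig prints: name, TTL, class, type, data
        if len(fields) >= 5 and fields[3] in ("A", "AAAA"):
            return True
    return False


GOOGLE_ANSWER = [
    "admin-test.lingo24.cloud. 299\tIN\tCNAME\tadmin.vte2.lingo24.cloud.",
    "admin.vte2.lingo24.cloud. 299\tIN\tA\t10.102.0.10",
]
PFSENSE_ANSWER = [
    "admin-test.lingo24.cloud. 16\tIN\tCNAME\tadmin.vte2.lingo24.cloud.",
]

print(answer_has_address(GOOGLE_ANSWER))   # True
print(answer_has_address(PFSENSE_ANSWER))  # False
```

A CNAME-only answer still counts as NOERROR, which is why dig looks healthy while anything that needs an actual address fails.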



  • Your description of the issue doesn't mention whether you're hosting DNS locally for your LAN users or whether your PFS is simply forwarding DNS queries. Is the 10.102.0.1 address your PFS, or is it a separately running dedicated DNS server? Do you run the 10.102.0.1 host yourself or is it being hosted for you?

    If 10.102.0.1 is managed by you, then I'd suggest looking closely at the zone file for lingo24.cloud. Otherwise, get someone who manages it to investigate further. This almost certainly looks like a DNS issue on that host.


  • LAYER 8 Global Moderator

    If you do a query to the NS for that domain for the A record, they both respond..

    ;; QUESTION SECTION:
    ;lingo24.cloud.                IN      NS

    ;; ANSWER SECTION:
    lingo24.cloud.          86400  IN      NS      max.ns.cloudflare.com.
    lingo24.cloud.          86400  IN      NS      kristin.ns.cloudflare.com.

    So their zone file is working… What is BROKEN is the fact that you're trying to serve up RFC 1918 space on a public domain. Rebinding protection would for sure prevent that from being returned to the client. So yeah, when you query pfsense you can get back the CNAME, but it's not going to return the 10.x.x.x address because that could be a rebinding attack.

    I see a few ways to fix this.

    1) Resolve your RFC 1918 space locally, not on some public nameserver. BEST CHOICE!!!!
    2) Disable rebinding protection completely
    3) Set a private domain in whichever name server you're using on pfsense https://doc.pfsense.org/index.php/DNS_Rebinding_Protections

    See the example attached. On my pfsense running unbound, I get just the CNAME. I then set your domain as private in the advanced box in pfsense, queried again, and got back the A record for that CNAME as well.
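To make the behaviour described above concrete, here is a toy model of rebinding protection. It is illustrative only and is not how unbound or pfsense implement the check; the function `filter_rebind` and its `private_domains` parameter are invented for this sketch. The second call mimics option 3, where a domain is marked as allowed to contain private addresses (in unbound, to my knowledge, that is the `private-domain` setting).

```python
import ipaddress


def filter_rebind(records, private_domains=()):
    """Drop A records pointing into private address space, keep the rest.

    `records` is a list of (name, rtype, data) tuples. A name is exempt when
    it equals, or is a subdomain of, an entry in `private_domains`.
    """
    def exempt(name):
        name = name.rstrip(".").lower()
        return any(name == d or name.endswith("." + d) for d in private_domains)

    kept = []
    for name, rtype, data in records:
        is_private_a = rtype == "A" and ipaddress.ip_address(data).is_private
        if is_private_a and not exempt(name):
            continue  # private address on a non-exempt name: strip it
        kept.append((name, rtype, data))
    return kept


RECORDS = [
    ("admin-test.lingo24.cloud.", "CNAME", "admin.vte2.lingo24.cloud."),
    ("admin.vte2.lingo24.cloud.", "A", "10.102.0.10"),
]

print(filter_rebind(RECORDS))                      # CNAME only
print(filter_rebind(RECORDS, ("lingo24.cloud",)))  # CNAME and A
```

With no exemption, only the CNAME survives, which is the answer the clients behind pfsense were getting; with the domain exempted, the A record comes back too.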




  • JP is absolutely right. I hadn't thought to check the DNS for that zone myself. So your public DNS is serving up private addresses?!!

