Certain external DNS servers cannot query our public DNS server
-
thank you all for your suggestions!
@johnpoz: i am not sure what your last sentence meant; it seemed to conflict with the rest of your reply. thank you for pointing out for everyone that ping (traceroute) and DNS are not the same. we do realize that ICMP and DNS are two different protocols, and we also realize that some deployments of DNS servers/forwarders use UDP rather than TCP.
@podilarius: TWC has not been much help. we are indeed able to trace to the DNS servers that cannot communicate with us on queries.
@mike: indeed it seems to be a strange routing issue. i have no idea if there is a long hop length between the systems on these requests. i would love to know. if our next couple of plans do not resolve the issue and the suggestions here yield no fruit, then we will likely have to chalk it up to an odd routing issue that will decimate our customers and our upcoming projects. we will likely have to move all DNS outside our cage, which i do not want to think about because we need absolute control for our projects. :/
we are requesting our colocation facility remove HSRP so we can switch between our pfSense deployment and the previous router we used, which worked fine but was too old to be comfortable leaving in place.
thank you all for using some of your brain juice on this issue,
-bo
-
"i am not sure what your last sentence meant;"
I don't know how much clearer I could make it..
You stated
"EDIT: we packet sniff all DNS traffic (tcp and udp) on pfsense and find nothing from these servers hitting us"
Then WTF does pfsense have to do with it?? If you're not seeing the dns traffic at the WAN interface of pfsense - then pfsense has NOTHING TO DO WITH YOUR PROBLEM!! NOTHING!!
-
@john: "then the issue is between your pfsense and them." was my point. was not looking to correct but understand. if you prefer to attack, please move on to another thread.
We have experienced that firewall software can prevent our packet sniffing software on the same server from reporting what is blocked. This is the reason I have posted on the pfSense forums. We have not determined a way to put a machine between the pfSense box and the upstream router without breaking HSRP and thus routing entirely.
It is entirely possible someone has experienced this in a similar deployment. It is entirely possible this is not related to the pfSense distro. Posting this question makes it easy for the next guy searching the web on this problem to find a useful answer. I will post whatever the outcome reveals itself to be.
-
What johnpoz is assuming is that you have knowledge of how pf (the basis of pfSense) works. This may not be the case. Most don't. So it is understandable where the disconnect is. Filtering happens in the kernel. When you do a tcpdump (packet capture on pfSense) on the WAN interface, you are seeing the traffic before filtering has even taken place. So, if you don't see the traffic there, it is not making it to the interface. This usually indicates a problem upstream. This is why you would ask your ISP to packet sniff on the next hop up from yours. If the traffic is not making it to that interface, then you have to keep going up.
I had this problem also (or the behavior was similar but not quite the same). As it turned out, I mistyped the DNS server address at the registrar. So everyone who tried to resolve via the primary (ns1) failed to reach the name server, while those hitting the secondary name server (ns2) resolved correctly.
I would double and triple check everything when it comes to DNS. Once it's right, it works flawlessly, but one typo can set you back a couple of hours. Can those who cannot resolve manually enter your DNS server and resolve? ie "dig @<ipaddress> <server.domain.tld>". As a general rule of thumb, you cannot packet sniff anything on the WAN side of a firewall without being on that side of the FW. I would use either the command line tcpdump or its counterpart in the GUI under diagnostics on the WAN address while someone tries to make the connection.
-
^ great write up podilarius
"When you do a tcpdump (packet filtering on pfSense) on WAN interface, you are seeing the traffic before filtering has even taken place."
Where would it not be like this?? I have never seen a system where if I am sniffing on the inbound interface it would be after the filtering. That makes no sense.
I like your theory of the mistake in the IP at the registrar – very logical breakdown of how that could cause their symptoms.
-
Thank you all for continuing to press into this.
My statement, "firewall software can prevent our packet sniffing software on the same server from reporting what is blocked" was information shared with me by the persons troubleshooting it. It may have been a miscommunication, and unfortunately they are not available to query.
Indeed, triple checking every stage and step involved is wise and has been done. The DNS entries are correct and were correct before the change over to new routers (pfSense). Again, no idea what is the actual cause and not placing blame (for those that feel attacked).
My hunch is the issue is outside our cage but I am unable to get additional help from the providers in the path.
EDIT: Their dig commands have been unfruitful, and our dig commands from within our cage to their DNS servers have been unfruitful (no response).
ex: dig @208.180.42.68 samware.net
-bo
-
How about sticking a switch that performs port mirroring between your pfsense box and the upstream router?
That way you could use Wireshark to sniff the external interface unhindered.
-
@mike: I would love to do that but we have HSRP setup. We are scheduling with our colo to remove HSRP to further test the problem.
-bo
-
from within our cage to their DNS servers have been unfruitful (no response).
ex: dig @208.180.42.68 samware.net
So you can not get to them either on 53?? But you can ping them? Was that just an example ip and query? Or was that actual IP and domain? I don't show samware.net on that IP.
hint: when doing examples of something like that it's better to be clear it's an example: www.example.tld, foo.bar, ip 1.2.3.4 or <theirip>, etc.
So to be clear: can they ping your public IP(s) that the nameservers are on - those packets show up on the sniff, but the dns query does not show up in the sniff? And can you ping their dns IP or not? But can you do a dns query?
-
My apologies.
That was an example dig command for one of the DNS servers that cannot reach us for DNS queries. That format of the dig command on our linux and BSD boxes tells dig to ask the "@x.x.x.x" DNS server to make the query. It's as if I set my machine's network settings to use 208.180.42.68 as its DNS server.
When you modify that command to query for a different domain than one we are authority over (samware.net), that DNS server can get a result. The proper result for "samware.net" is 66.228.141.20 but usually the reply is ";; connection timed out; no servers could be reached" and some rare times they will reply with an advertisement IP.
All of these DNS servers that cannot reach us can ping us and we can ping them. Ping packets show up on the sniff. Their incoming DNS queries do not show up on the sniffer.
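For anyone following along, here is a rough Python sketch of what a "dig @server name" style probe does on the wire, so the same query can be tried over both UDP and TCP. It is only an illustration using the standard library, not what dig itself runs; the function names (build_query, probe_udp, probe_tcp) are made up for this example.

```python
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal DNS query: one question, recursion desired."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 sets RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: every label is prefixed with its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record) and QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def probe_udp(server: str, name: str, timeout: float = 3.0) -> Optional[bytes]:
    """Send the query over UDP port 53; return the raw reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            reply, _addr = sock.recvfrom(4096)
            return reply
        except socket.timeout:
            return None


def probe_tcp(server: str, name: str, timeout: float = 3.0) -> Optional[bytes]:
    """Send the query over TCP port 53; return the first chunk read, or None on failure."""
    query = build_query(name)
    try:
        with socket.create_connection((server, 53), timeout=timeout) as sock:
            # DNS over TCP prefixes each message with a two-byte length
            sock.sendall(struct.pack("!H", len(query)) + query)
            return sock.recv(4096)
    except OSError:
        return None
```

Running probe_udp and probe_tcp against the same server while a capture is running on the WAN side should show whether only one transport is going missing, or both.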
-
"and some rare times they will reply with an advertisement IP."
Really?? That makes no sense - can we get a result from them doing dig +trace
So for example you mention one of your zones is samware.net
So I see the dns for that as ns1 and ns2.samware.net from doing a whois
Domain name: samware.net
Name Servers:
ns1.samware.net
ns2.samware.net
I show them as
ns1.samware.net ['66.228.140.5'] [TTL=10800]
ns2.samware.net ['66.228.141.5'] [TTL=10800]
So if I do a trace for say www.samware.net
; <<>> DiG 9.8.1-P1 <<>> www.samware.net +trace
;; global options: +cmd
. 168690 IN NS a.root-servers.net.
. 168690 IN NS j.root-servers.net.
. 168690 IN NS e.root-servers.net.
. 168690 IN NS l.root-servers.net.
. 168690 IN NS f.root-servers.net.
. 168690 IN NS i.root-servers.net.
. 168690 IN NS d.root-servers.net.
. 168690 IN NS k.root-servers.net.
. 168690 IN NS g.root-servers.net.
. 168690 IN NS m.root-servers.net.
. 168690 IN NS h.root-servers.net.
. 168690 IN NS b.root-servers.net.
. 168690 IN NS c.root-servers.net.
;; Received 508 bytes from 192.168.1.253#53(192.168.1.253) in 481 ms

net. 172800 IN NS e.gtld-servers.net.
net. 172800 IN NS c.gtld-servers.net.
net. 172800 IN NS f.gtld-servers.net.
net. 172800 IN NS l.gtld-servers.net.
net. 172800 IN NS b.gtld-servers.net.
net. 172800 IN NS g.gtld-servers.net.
net. 172800 IN NS h.gtld-servers.net.
net. 172800 IN NS a.gtld-servers.net.
net. 172800 IN NS j.gtld-servers.net.
net. 172800 IN NS m.gtld-servers.net.
net. 172800 IN NS i.gtld-servers.net.
net. 172800 IN NS k.gtld-servers.net.
net. 172800 IN NS d.gtld-servers.net.
;; Received 490 bytes from 202.12.27.33#53(202.12.27.33) in 733 ms

samware.net. 172800 IN NS ns1.samware.net.
samware.net. 172800 IN NS ns2.samware.net.
;; Received 101 bytes from 192.52.178.30#53(192.52.178.30) in 249 ms

www.samware.net. 10800 IN A 66.228.141.20
;; Received 49 bytes from 66.228.141.5#53(66.228.141.5) in 47 ms
So this time 141.5 answered - if I do it again this time the 140.5 answered
<snipped>
;; Received 101 bytes from 192.52.178.30#53(192.52.178.30) in 132 ms

www.samware.net. 10800 IN A 66.228.141.20
;; Received 49 bytes from 66.228.140.5#53(66.228.140.5) in 49 ms
I would love to see a query +trace that comes back with an advertisement IP??
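If someone on the affected side can save a few +trace runs to text, a small Python sketch like the one below could pull out which server answered each step, so runs are easy to compare. It only assumes the ";; Received N bytes from IP#port" lines that dig prints in the output above; the helper names are made up for illustration.

```python
import re
from typing import List, Optional

# Matches dig's per-step summary, e.g.
# ";; Received 49 bytes from 66.228.140.5#53(66.228.140.5) in 49 ms"
RECEIVED = re.compile(r";; Received (\d+) bytes from ([0-9a-fA-F.:]+)#(\d+)")


def answering_servers(trace_output: str) -> List[str]:
    """Return the IP of the server that answered each step, in order."""
    return [match.group(2) for match in RECEIVED.finditer(trace_output)]


def final_responder(trace_output: str) -> Optional[str]:
    """Return the IP that gave the last answer in the trace, or None if there was none."""
    servers = answering_servers(trace_output)
    return servers[-1] if servers else None
```

If the final responder in a trace that returns an advertisement IP is not one of 66.228.140.5 or 66.228.141.5, that would point at something other than your nameservers answering.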
BTW while looking at the nameservers for samware.net - they allow recursion, which is normally very bad juju for an authoritative ns to allow from the public
; <<>> DiG 9.8.1-P1 <<>> @66.228.140.5 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32492
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.google.com. IN A

;; ANSWER SECTION:
www.google.com. 300 IN A 74.125.227.113
www.google.com. 300 IN A 74.125.227.115
www.google.com. 300 IN A 74.125.227.114
www.google.com. 300 IN A 74.125.227.112
www.google.com. 300 IN A 74.125.227.116

;; Query time: 99 msec
;; SERVER: 66.228.140.5#53(66.228.140.5)
;; WHEN: Thu Oct 11 15:16:07 2012
;; MSG SIZE rcvd: 112
You almost never allow recursion from public IPs - unless you're running a public DNS server.. You leave yourself open to DoS attacks by allowing that. And then your boxes that are authoritative for your zones may not be able to answer because they are too busy doing recursion for people that shouldn't be using your box for it.
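The tell in the output above is the "ra" (recursion available) flag in the header line. As a rough sketch, assuming you have saved dig output as text, a check like this in Python would flag a server that offers recursion; the function names are made up for this example.

```python
import re
from typing import Set

# Matches dig's header flags line, e.g.
# ";; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0"
FLAGS_LINE = re.compile(r";; flags:([a-z ]*);")


def dig_flags(dig_output: str) -> Set[str]:
    """Return the set of header flags found in dig output (empty if none found)."""
    match = FLAGS_LINE.search(dig_output)
    return set(match.group(1).split()) if match else set()


def offers_recursion(dig_output: str) -> bool:
    """True when the reply carries the 'ra' (recursion available) flag."""
    return "ra" in dig_flags(dig_output)
```

Run it against a query for a name the server is not authoritative for, sent from outside your network. After recursion is locked down, the "ra" flag should be gone from that reply.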
So if they are getting answers ("advertisement IPs") back when they are doing specific @queries to your IPs - and you're not seeing the packets - that tells me something else is answering.. What is answering is the question.. When you say you're behind HSRP, could this traffic destined for your IP be getting sent to some other NS??? How else would they be getting a response.
-
Yes, unless you are really meaning to, running a recursive public DNS server is bad for business. I did that for a while, but learned before it burned me.
Is samware.com on your DNS server? Were you able to trace on the WAN interface as opposed to doing it on the LAN? If the traffic is not getting to your WAN, then you have a problem upstream. If you are getting it, then we can check within pfSense to find out what is going on. You are going to have to look a bit more, I am afraid. What you really want to do is check the WAN sniff for the packets. You don't need to wait on HSRP to be removed, unless you think that is part of the problem. Either way, please let us know what you find.
-
Thank you all for your wonderful ideas and for pointing out the public availability of our DNS servers. At one point, we were fine with recursion for various reasons but over the past year our servers have been hammered!
Anyways, the problem was due to converting our DNS from FreeBSD to CentOS, adding IP aliases to the NIC, and not having the proper subnet assigned to those aliases. It was working fine on the old router system but since our colo made some routing changes and we implemented pfSense, the faulty subnet settings popped up.
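For the next person who lands here, this is a minimal Python sketch of why a wrong subnet on an alias matters: with too wide a prefix, the host treats some remote addresses as directly attached and never sends those replies to the gateway. The addresses and prefixes below are made-up examples, not our actual configuration, and is_on_link is just an illustrative helper.

```python
import ipaddress


def is_on_link(alias_cidr: str, remote_ip: str) -> bool:
    """True when the host would treat remote_ip as directly attached
    (no gateway) given the address/prefix configured on the alias."""
    iface = ipaddress.ip_interface(alias_cidr)
    return ipaddress.ip_address(remote_ip) in iface.network


# Hypothetical alias with the intended /24 versus a mistaken /8
intended = "10.0.1.10/24"
mistaken = "10.0.1.10/8"
remote_resolver = "10.9.9.9"  # made-up remote DNS server address

routed_via_gateway = not is_on_link(intended, remote_resolver)
wrongly_on_link = is_on_link(mistaken, remote_resolver)
```

With the intended prefix the remote resolver is reached through the gateway; with the mistaken one the host believes it is on the local segment, which is one way only certain remote servers can end up affected.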
Again, thank you all!