NAT reflection or split DNS with short TTLs



  • Small business with a website published on our internal DMZ.
    Most employees have mobile devices that need to access it while roaming back and forth between 4G/LTE and company WiFi.

    1. Enabling pfSense "Pure NAT Reflection" on port forwarding 80/443 works like a charm but I read lots of negative comments about the solution and its performance impact (which to be honest we don't notice).

    2. We tried split DNS with 60s TTLs (the lowest Google Domains accepts) but I'm concerned about the added DNS traffic and nevertheless we still see the occasional access failure depending on which side of the DNS is currently being cached by Chrome on the mobile.

    Other ways to solve this problem?


  • LAYER 8 Global Moderator

    What would Google Domains have to do with split DNS and your local-side DNS that resolves your website's FQDN to its internal address?  I don't think your local IPs would be changing very often at all.. You could make those TTLs as long as you want, to be honest.  Yeah, a 60s TTL is going to cause way more DNS traffic than typical.

    The only issue I could see with making a really long TTL internally would be if you have devices that are internal and then go external and for whatever reason maintain that cache.  Then yeah, too long of a TTL could cause you problems for those users that transition from external to internal or vice versa while maintaining the cache.  If they transition between external and internal you would want to make your TTL, both internal and external, short enough to not be a problem with whatever their transition time is, etc.  Or just make sure that when they transition they flush their machine's/browser's cache.  A restart of the browser would do it. I would have to do testing on a Windows machine to see whether it maintains any previous DNS cache as it changes networks.  I wouldn't think so - but I have never actually run into needing to test that ;)
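
    To make the stale-cache scenario above concrete, here is a minimal sketch of a client cache that honors TTLs but is not flushed on a network change. Everything in it is an assumption for illustration and not from this thread: the `TtlCache` class, the `resolve` helper, the name `www.example.com`, and both addresses are hypothetical.

```python
from typing import Dict, Optional, Tuple


class TtlCache:
    """Toy client-side DNS cache: an entry is served until its TTL runs out."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, name: str, now: float) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if now >= expires_at:
            del self._entries[name]  # expired: force a fresh lookup
            return None
        return address

    def put(self, name: str, address: str, ttl: float, now: float) -> None:
        self._entries[name] = (address, now + ttl)


# Hypothetical split-DNS views: what each network's resolver would answer.
VIEWS = {
    "wifi": {"www.example.com": "192.168.1.10"},  # assumed internal DMZ address
    "lte": {"www.example.com": "203.0.113.10"},   # assumed public address
}


def resolve(cache: TtlCache, name: str, network: str, now: float, ttl: float = 60.0) -> str:
    """Return the cached answer if still valid, else ask the current network's view."""
    cached = cache.get(name, now)
    if cached is not None:
        return cached
    address = VIEWS[network][name]
    cache.put(name, address, ttl, now)
    return address


if __name__ == "__main__":
    cache = TtlCache()
    print(resolve(cache, "www.example.com", "wifi", now=0))   # looked up on WiFi
    print(resolve(cache, "www.example.com", "lte", now=20))   # still the WiFi answer
    print(resolve(cache, "www.example.com", "lte", now=61))   # refreshed on LTE
```

    In this toy model the second lookup returns the internal address even though the device is already on LTE, which is the window the TTL has to be shorter than. A real device may or may not flush its cache on a network change, so this only models the case where it does not.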



  • Google Domains DNS is handling the external side of our domain. The internal is on our own Microsoft AD server.

    The idea was to minimize DNS caching by setting the lowest possible TTL on both sides.

    Unfortunately we still see the occasional failure depending how quickly we transition between internal/external and vice versa.

    I know of no easy way to flush the DNS cache on an Android device, other than restarting it, which is frankly unacceptable.


  • LAYER 8 Global Moderator

    What browser are they using on the Android device?  I am pretty sure Chrome and Firefox have ways of clearing the browser cache without a reset of the phone.



  • We standardized on Chrome on all devices.

    As far as I know, on Android, you can clear the Chrome cache ( 3-dots menu -> settings -> privacy -> clear browsing data) but even that doesn't clear the system DNS cache. Only a system restart does that.


  • LAYER 8 Global Moderator

    So if your cache is 60 seconds, why would a device transition between WAN/LAN and need to access the same site so fast?

    OK, did you try switching to airplane mode? This should bring down all networks and, from what I have read, flushes the cache.  I wouldn't be able to test that myself until my son comes over next and I can use his Nexus phone to do some testing.

    If you're using Chrome, can you go to chrome://net-internals/#dns? From what I read this allows you to flush the cache as well.  A test on my desktop browser shows it does clear.




  • Why try to find a solution to a problem you don't have?  Like you said.  The NAT reflection works as you expect, provides a seamless experience for your users, and the hairpinning efficiency/performance hit is not an issue for you.

    Doubt you will find a device agnostic seamless means of clearing the mobile clients DNS.  If NAT reflection works well for your use case then see no good reason to put your users through that.

    Seems like a bad ROE (Return on Effort) to me.



  • @NOYB:

    Why try to find a solution to a problem you don't have?  Like you said.  The NAT reflection works as you expect, provides a seamless experience for your users, and the hairpinning efficiency/performance hit is not an issue for you.

    Doubt you will find a device agnostic seamless means of clearing the mobile clients DNS.  If NAT reflection works well for your use case then see no good reason to put your users through that.

    Seems like a bad ROE (Return on Effort) to me.

    I was seeking expert advice on whether a better solution existed. But you are absolutely right on this one: while waiting for an IPv6-only, NAT-less future, NAT reflection will do for the time being.


  • LAYER 8 Global Moderator

    In your case with the fast transition between networks, I have to concur that this is a case where NAT reflection seems to be a workaround that solves the problem.

    While I am not a fan, to be sure.. If NAT reflection is working and removes the issue of fast transition between networks - then it would be the logical choice, it seems.  Be it a suboptimal solution or not.

    I am curious on the use case to why devices make such a fast transition and need access to the site in question - if you would be willing to share some details of why this occurs.

    While some here think I am a zealot for whom my way is the only right way, this is far from the truth.  I am for the most logical, efficient way to accomplish a task.  If NAT reflection works out to be the only really workable solution, then that is what it is.  Others here seem to think it should be the first choice and that looking at other, more optimal ways to do something is a waste of time and effort ;)



  • @johnpoz:

    I am curious on the use case to why devices make such a fast transition and need access to the site in question - if you would be willing to share some details of why this occurs.

    With a default 1h TTL I frankly don't see this as such a rare occurrence. You step out of the office for lunch and most likely you won't be able to access your site/app from your smartphone. And if you wait long enough for the DNS cache entry to expire, when you make it back to the office your site/app will, once again, not be accessible for another hour ;)

    Admittedly a TTL of 60s would minimize the issue, but at the cost of much higher DNS traffic and latency (since the name has to be resolved practically every time).
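
    The trade-off between the two TTLs can be put in rough numbers. This is only a back-of-the-envelope sketch, assuming a client that honors the TTL exactly and keeps hitting the site continuously. The function name `ttl_tradeoff` and its output keys are made up for illustration.

```python
import math


def ttl_tradeoff(ttl_seconds: int) -> dict:
    """Upper bounds for one client and one name, assuming the TTL is honored exactly."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return {
        "ttl_seconds": ttl_seconds,
        # At most one upstream lookup per TTL period while the name is in use.
        "max_lookups_per_hour": math.ceil(3600 / ttl_seconds),
        # A record cached just before a network change can linger this long.
        "worst_case_stale_seconds": ttl_seconds,
    }


if __name__ == "__main__":
    for ttl in (60, 3600):  # the 60s and default 1h TTLs discussed above
        print(ttl_tradeoff(ttl))
```

    Under those assumptions a 60s TTL costs at most 60 lookups per hour per client for this name, against 1 per hour at the default, while shrinking the worst-case stale window from an hour to a minute. Idle clients generate far fewer lookups than the upper bound.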



  • Perhaps there should be a new DHCP DNS option that could be pushed to clients along with the DNS settings, that would instruct the client to flush its DNS cache when attaching and detaching to/from "this" network AND DNS servers are also changing.

    Anyone up for writing an RFC?



  • I don't get it. I thought all devices were supposed to flush their DNS cache for security reasons every time they change their network/ip stack settings.

    Isn't this a security issue (although an exploit is very, very complicated) when a user walks from one net to the other?


  • LAYER 8 Global Moderator

    Yeah at a loss to why the transition would not flush the dns.. Most likely something wrong with the client not following standards..

    I have never run into such an issue before..  So for example - I create a host override for www.cnn.com to point to 192.168.9.7, an Ubuntu box on my server running Apache that is serving up HTTP..

    
    > dig www.cnn.com
    
    ; <<>> DiG 9.11.0-P3 <<>> www.cnn.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48395
    ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
    
    ;; QUESTION SECTION:
    ;www.cnn.com.                   IN      A
    
    ;; ANSWER SECTION:
    www.cnn.com.            2977    IN      A       192.168.9.7
    
    ;; Query time: 3 msec
    ;; SERVER: 192.168.3.10#53(192.168.3.10)
    ;; WHEN: Sun Feb 26 06:45:17 Central Standard Time 2017
    ;; MSG SIZE  rcvd: 45
    
    

    If I transition my phone from wifi to LTE it correctly resolves as I move back and forth..  See attached - on my wifi, go to www.cnn.com and get my Ubuntu site.  Turn off wifi so I'm on cell, refresh the page and get the real CNN page.. Go back to wifi and get my Ubuntu box.. This is all in a few seconds - long enough to click to wifi and hit refresh..  Way less than even 60 seconds..
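
    For anyone wanting to repeat this kind of check from a laptop rather than by eyeballing the page, here is a rough sketch that asks the OS resolver for a name and labels each answer as the internal or external view. The helper names `classify_view` and `current_view` are hypothetical, and treating "private address" as "internal view" is an assumption that only holds if the internal record points at a private address, as it does in the host override above.

```python
import ipaddress
import socket
from typing import List, Tuple


def classify_view(address: str) -> str:
    """Label an address 'internal' if it is private, else 'external'."""
    # Drop any IPv6 zone suffix such as '%eth0' before parsing.
    ip = ipaddress.ip_address(address.split("%")[0])
    return "internal" if ip.is_private else "external"


def current_view(hostname: str) -> List[Tuple[str, str]]:
    """Resolve via the OS resolver (and whatever it has cached) and classify each answer."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    return [(addr, classify_view(addr)) for addr in addresses]


if __name__ == "__main__":
    # Run once on WiFi and again right after switching networks, then compare.
    for addr, view in current_view("www.cnn.com"):
        print(addr, view)
```

    Because `socket.getaddrinfo` goes through the system resolver, a stale answer right after a network change would point at OS-level caching rather than the browser. It says nothing about Chrome's own internal host cache.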




  • As far as I know it's completely up to the OS vendor to decide what to do with the local DNS cache when the DHCP assigned IP addresses change on any of the interfaces. The DHCP standard is completely void of any information or recommendations on what to do with the DNS cache when a lease with completely new IP address/gateway/DNS forwarders is acquired.



  • The act of turning wifi on/off may have an impact as well, rather than a transition that occurs with wifi remaining on.  Just speculation.  I think a DHCP option to tell the client how to handle transitions to/from "this" network would be nice.  Then it could be up to the network operator.


  • LAYER 8 Global Moderator

    Well I don't have to turn it off - I can test when I leave this morning.. Once I am outside the range of wifi and it has to use LTE, I'll see what happens.

    The OP has still not explained why there would ever be such a fast transition back and forth..  Just tell the users that if they transition so fast to go to airplane mode or something.



  • @johnpoz:

    Just tell the users that if they transition so fast to go to airplane mode or something.

    Sure glad you've never been my network engineer.  Impact user experience in favor of accommodating network equipment.  Poor trade off IMO.  The network is there to accommodate people and the apps they use.  Not the other way around.  If that means NAT reflection.  Then so be it.

    Oh and I disagree about the OP still not having explained the need for quick transition.  Yes the OP did.  User experience.


  • LAYER 8 Global Moderator

    I agree you would want the user exp to be as good as possible.  But we have yet to get any actual details of why there is such a transition..

    To be honest the problem I still say is PEBKAC – my guess is he has not configured his local dns correctly or has something wrong in his public dns.

    I have been using tech devices since there have been tech devices that could use wireless, etc.. And working with DNS for just as long - and have never seen such an issue before.  I can test with my son's Android next time he comes over.

    But this seems to be a one-off for sure - if NAT reflection is working for him and his users.. Then as I said before he can use that - but I really don't understand this use case.. As you can see from my test the instant I transition it uses the dns query it makes on that network.



  • @johnpoz:

    I still say is PEBKAC

    You should turn your chair over to someone else who can alleviate that problem.

    @johnpoz:

    …we have yet to get any actual details of why there is such a transition..

    The OP did explain that.

    @johnpoz:

    I really don't understand this use case.

    Then stop making personal insults regarding something you don't understand.

    @johnpoz:

    As you can see from my test the instant I transition it uses the dns query it makes on that network.

    No, I cannot see that from your test.  The only test results you have presented were invalid.  As I pointed out earlier.

