DNS Resolver Caching Issues
I've noticed this a few times since setting up a vanilla pfSense hardware device on a flat network. The DNS resolver seems to be overly aggressive with caching. For example, if I'm migrating a web server from Cloud Provider A to Cloud Provider B and simply update the DNS A record at the registrar, any device behind the pfSense firewall (e.g. my laptop accessing the internet) just doesn't pick up the change, and I have to restart the DNS Resolver in pfSense for it to catch up.
Is there a setting somewhere (or is this a bug...) where I can completely disable caching, or reduce it to a very low number of minutes?
It's more annoying than anything: when you're doing this type of work, it adds intermittent delays to the real work you're actually trying to do.
For info: the pfSense hardware firewall has virtually zero configuration on it that could be getting in the way. So from a DNS perspective, it's a vanilla, out-of-the-box setup.
You understand that the DNS resolver only caches a record for the length of its TTL, which "YOU" set at your DNS.
Unless, that is, you have altered the advanced settings in unbound and set a min TTL, which certainly wouldn't be a vanilla setting.
or reduce it to a very low number of minutes?
You set that on the DNS record, not on the resolver or forwarder.
If you're moving stuff about, then yeah, you want your DNS records to have, say, a 5-minute TTL. Once everything is stable and won't be moving, you can raise the TTL of those records.
Here, this might help.
For example, www.google.com has a TTL of 300 seconds (5 minutes):
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 300 IN A 184.108.40.206
Whereas www.netgate.com has a TTL of 1 hour (3600 seconds):
;; QUESTION SECTION:
;www.netgate.com. IN A
;; ANSWER SECTION:
www.netgate.com. 3600 IN A 220.127.116.11
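If you want to pull the TTL out of answer lines like these in a script, a minimal Python sketch can split a dig ANSWER SECTION line on whitespace. It assumes dig's usual five-column layout (name, TTL, class, type, data) with one record per line; `parse_answer_line` and `AnswerRecord` are hypothetical names for illustration.

```python
from typing import NamedTuple


class AnswerRecord(NamedTuple):
    """One resource record from a dig ANSWER SECTION line."""
    name: str
    ttl: int
    rclass: str
    rtype: str
    data: str


def parse_answer_line(line: str) -> AnswerRecord:
    # dig prints: <name> <ttl> <class> <type> <rdata...>, separated by whitespace.
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"not a dig answer line: {line!r}")
    name, ttl, rclass, rtype = parts[:4]
    # Rejoin the remaining fields so multi-field rdata (e.g. MX "10 mail.example.com.") stays together.
    data = " ".join(parts[4:])
    return AnswerRecord(name, int(ttl), rclass, rtype, data)


record = parse_answer_line("www.netgate.com. 3600 IN A 220.127.116.11")
print(record.ttl)  # 3600
```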
This is controlled wherever you manage your DNS, i.e. on the name servers for your domain.
You could override the max TTL in the advanced settings; it defaults to 1 day (86400 seconds). But I really would not suggest you mess with that. Instead, just set the TTL you want on the specific records you're playing with at your name server.
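How a min/max TTL override changes what gets cached can be modelled as a simple clamp. In unbound the corresponding options are `cache-min-ttl` (default 0) and `cache-max-ttl` (default 86400); the Python function below is only an illustrative model of that behaviour, not unbound's actual code.

```python
def effective_cache_ttl(record_ttl: int, min_ttl: int = 0, max_ttl: int = 86400) -> int:
    """Return how long (in seconds) a caching resolver would keep a record.

    Models min/max TTL overrides like unbound's cache-min-ttl / cache-max-ttl.
    With the defaults, the record's own TTL is used unless it exceeds one day.
    """
    if min_ttl > max_ttl:
        raise ValueError("min_ttl must not exceed max_ttl")
    return max(min_ttl, min(record_ttl, max_ttl))


print(effective_cache_ttl(300))               # 300: a vanilla setup honours the record's TTL
print(effective_cache_ttl(30, min_ttl=3600))  # 3600: a min TTL overrides a short record TTL
print(effective_cache_ttl(172800))            # 86400: capped at the default max of one day
```

With the minimum left at 0, the only adjustment a default install makes is the one-day ceiling, so it never holds a record longer than the TTL the authoritative NS published.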
That's what I would expect to happen too. When I did some testing the last time this issue came up, I ran ipconfig /flushdns on the laptop, which made no difference (this has helped in the past when the laptop was caching things for too long). Then I tried my mobile device connected to WiFi (i.e. routing out to the internet via pfSense), and it had the same problem: it was showing the old server. Then I disconnected the mobile device from WiFi so it was running over 5G, and it magically worked. I reconnected to WiFi, and the problem was back.
I guess I'll just have to keep an eye on this, as it's extremely intermittent. It's an SG-3100 too, so it's not one of the entry-level models.
What FQDN are you using? PM it to me if you don't want to make it public.
TTL is TTL. Neither resolvers nor forwarders cache an entry for longer than the TTL they received when they looked up something.domain.tld, unless they have specifically been tweaked to do so, say by setting a min TTL in the advanced section of unbound.
If you resolve, you always get the full TTL, since you talk to the authoritative NS. If you forward, you get the remaining TTL of the cached entry at wherever you forward to.
Flushing a local client's cache doesn't mean you will get a new entry, unless the TTL has also expired at the server you're asking.
edit: See above, where the TTL of www.netgate.com is 3600. But if I ask Google, I get something less than that:
;; QUESTION SECTION:
;www.netgate.com. IN A
;; ANSWER SECTION:
www.netgate.com. 2636 IN A 18.104.22.168
;; Query time: 20 msec
;; SERVER: 22.214.171.124#53(126.96.36.199)
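The lower number comes from the upstream cache counting down: a forwarder hands out whatever is left of the TTL it received. A minimal Python model of that countdown, where `cached_at` and `now` are hypothetical timestamps in seconds:

```python
def remaining_ttl(original_ttl: int, cached_at: float, now: float) -> int:
    """TTL a caching server reports for an entry it stored at `cached_at`.

    Returns 0 once the entry has expired (a real server would re-query instead).
    """
    elapsed = now - cached_at
    return max(0, int(original_ttl - elapsed))


# Google reported 2636 of the original 3600, so its cached copy was
# about 3600 - 2636 = 964 seconds old when it was queried.
print(remaining_ttl(3600, cached_at=0, now=964))  # 2636
```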
There are normally three caches at play with anything that uses a browser: the browser's cache, the local OS cache, and the cache of the DNS server the OS asks.
Flushing your browser cache or your local OS cache doesn't flush the upstream NS cache. Everything still comes down to the app, OS, or NS caching the record for the TTL it got, whether that came from the authoritative NS directly or is some lower value from another NS it forwarded to.
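The layered-cache point can be illustrated with a toy Python cache chain. This is a sketch only, not how unbound or any OS resolver is implemented: `TTLCache` and `FakeClock` are hypothetical names, and the addresses are from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24.

```python
from typing import Callable, Dict, Optional, Tuple

Answer = Tuple[str, int]  # (address, ttl in seconds)


class FakeClock:
    """Manually advanced clock so the example doesn't depend on real time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class TTLCache:
    """Toy DNS cache: answers from memory until the TTL runs out, then asks upstream."""

    def __init__(self, upstream: Callable[[str], Answer], clock: Callable[[], float]) -> None:
        self.upstream = upstream
        self.clock = clock
        self.entries: Dict[str, Tuple[str, float]] = {}  # fqdn -> (address, expires_at)

    def lookup(self, fqdn: str) -> Answer:
        now = self.clock()
        hit: Optional[Tuple[str, float]] = self.entries.get(fqdn)
        if hit is not None and hit[1] > now:
            address, expires_at = hit
            return address, int(expires_at - now)  # hand downstream only the remaining TTL
        address, ttl = self.upstream(fqdn)
        self.entries[fqdn] = (address, now + ttl)
        return address, ttl

    def flush(self) -> None:
        self.entries.clear()


# Hypothetical record at the authoritative NS, with the 24-hour TTL from this thread.
records: Dict[str, Answer] = {"www.example.com": ("192.0.2.1", 86400)}


def authoritative(fqdn: str) -> Answer:
    return records[fqdn]


clock = FakeClock()
resolver = TTLCache(authoritative, clock)    # e.g. unbound on the firewall
os_cache = TTLCache(resolver.lookup, clock)  # e.g. the laptop's OS cache

os_cache.lookup("www.example.com")                    # both layers now cache 192.0.2.1
records["www.example.com"] = ("198.51.100.7", 86400)  # the A record is changed
clock.now = 3600                                      # an hour later...
os_cache.flush()                                      # ...like ipconfig /flushdns
print(os_cache.lookup("www.example.com"))             # ('192.0.2.1', 82800): still the old server
```

In this model only flushing both layers (or waiting out the TTL) returns the new address, which mirrors having to restart the DNS Resolver before the laptop sees the change.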
edit: The latest wrinkle in all of this is browsers doing DoH. Since they forward, they will use whatever IP the server they forward to gives them for some FQDN, for the TTL of whatever they asked. That then has nothing to do with your OS's local DNS cache for that item, or your local NS's cache for it, but only with the entry held wherever they are asking.
If you are changing where some FQDN points, and you're doing this a lot, then set the TTL for this record at your authoritative NS to something low, 30 seconds if you want. Every NS that looks it up, be it unbound locally or wherever you forward to, should respect that TTL and never cache that record for more than 30 seconds.
Unbound is going to cache a record for the length of the TTL it got with the answer, whether it resolved it itself or forwarded the query somewhere and that server answered with some TTL.
If you are changing these records frequently and want any client anywhere to get the new entry ASAP, then set the TTL on that record at your NS to the value you want to work with, be it 30 seconds, 1 minute, 5 minutes, etc.
If you PM me the FQDN you're working with, I can look up what TTL the authoritative NS is handing out for that record.
Thanks for the info, John. I've just double-checked the TTL on the domain in question, and it's actually set to 86400 (i.e. 24 hours). That's a simple solution then: just make sure the TTLs for the FQDNs in question are set to a super low number in advance of doing the work. I could have sworn these were already set to a low number at some point in the distant past, but I guess this domain was missed, which is why it seemed to be an intermittent pfSense issue.
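The "in advance" part has a timing rule worth spelling out: a cache that fetched the record just before you lowered the TTL can keep it for the full old TTL, so the low TTL should be published at least one old-TTL period before the cutover. A small Python sketch, assuming no min-TTL override anywhere in the chain; the dates and function names are hypothetical.

```python
from datetime import datetime, timedelta


def latest_ttl_change(cutover: datetime, old_ttl_seconds: int) -> datetime:
    """Latest moment to publish the lowered TTL so every cache has picked it up by `cutover`.

    A cache that fetched the record one second before the TTL was lowered may keep it
    for the full old TTL, so the low TTL must be live at least that long beforehand.
    """
    return cutover - timedelta(seconds=old_ttl_seconds)


def worst_case_stale_until(change_time: datetime, ttl_seconds: int) -> datetime:
    """Last moment a cache could still serve the old A record after it is changed."""
    return change_time + timedelta(seconds=ttl_seconds)


cutover = datetime(2024, 1, 15, 9, 0)        # hypothetical migration window
print(latest_ttl_change(cutover, 86400))     # 2024-01-14 09:00:00
print(worst_case_stale_until(cutover, 300))  # 2024-01-15 09:05:00
```

Once the migration is done and stable, the TTL can be raised again.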
It's funny the things you sometimes forget to check when debugging intermittent problems. Always remember rule number 1 when debugging: disable caching everywhere :-)
Thanks for the help, all sorted now.
disable caching everywhere
No, I would never ever do that. If you have a question about what something is resolving to, do a directed query to the authoritative NS for what you're looking for, so you get the answer straight from the horse's mouth.
Which is what unbound does out of the box.
Also, use proper tools ;) dig, for example, will always show you the TTL of whatever you queried.
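A directed query is just dig pointed at one of the domain's authoritative servers with recursion turned off. A hedged Python sketch that builds and runs that command; it assumes dig is installed and on PATH, and `ns1.example.net` is a placeholder for whatever NS records your domain actually has.

```python
import subprocess
from typing import List


def directed_query_cmd(fqdn: str, nameserver: str, rtype: str = "A") -> List[str]:
    """Build a dig command that asks `nameserver` directly, bypassing local caches."""
    # @server selects the server to query; +norecurse asks it to answer only from its own
    # data (what you want from an authoritative NS); +noall +answer trims output to the answer.
    return ["dig", f"@{nameserver}", fqdn, rtype, "+norecurse", "+noall", "+answer"]


def directed_query(fqdn: str, nameserver: str, rtype: str = "A") -> str:
    """Run the query and return dig's answer section (requires dig on PATH)."""
    result = subprocess.run(directed_query_cmd(fqdn, nameserver, rtype),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `directed_query("www.example.com", "ns1.example.net")` would return the answer line(s) exactly as the authoritative server publishes them, including the full TTL.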