Slow DNS after 22.05

Gertjan

I have now changed my setting to allow access via my HE IPv6 tunnel; let's see if that causes an issue.

I've been using he.net for many years now. It's my main IPv6 access, as my ISP doesn't have good IPv6 support (bad IPv6 really messes things up). I'm posting on this forum using the he.net IPv6.
25% of my incoming and outgoing traffic is IPv6 over he.net. Same thing for DNS-related traffic.
It's a bit slow, as I'm limited to the he.net tunnel access point in Paris.
No other issues.

RTT is a bit high, as I use my own server in a data centre near Paris as the dpinger monitoring point.

edit :

@mikymike82 said in Slow DNS after 22.05:

Talking about the so-called thousands of other users who are not experiencing this problem does not help resolve this clear issue in version 22.05. As I clearly stated, even with a clean install I can replicate the issue when upgrading to 22.05 from 22.01.

I agree.
An approach could be :
Let's enumerate all common - and not-common settings of all those thousands of 22.05 users.
And then compare these findings with yours.
Comparing would be even easier if there were a user, or several users, using the same ISP, the same uplink connection type, and even the same area as you.

You get it : that's hard to do.

The easy thing would be : it's up to you to tell/show/mention what is different with your location/setup/hardware/uplink connection.
But again, without you really knowing what the other 'thousands' are using.

A test that might shed some more light :
When you reinstall, you have to log in a first time with the admin user and the pfSense default password.
What about this set up :
Do not change the default LAN - keep DHCP etc.
Do not change WAN, keep it on DHCP-client mode (can you ?)
No other changes, do not use the keyboard anymore.
Do not import settings.
No packages.
Nothing.
Just the plain vanilla default Netgate initial config - with one LAN and one WAN assigned.

I understand, this setup might not be really useful for you. It's just for testing.

If DNS fails at this moment, we will all know what's not the issue.
As settings are equal.
Hardware is equal.
LAN side is equal.
I presume that the device you use is a PC with default network settings (== DHCP client).

The only difference will be : your uplink (ISP or what ever you use as a connection).
For example, I would understand that if you said "I use Starlink" then that would explain a lot, as default settings won't be good for such a connection (I'm just guessing).

Also, if you use an ARM device : do you have a small Intel desktop PC in a corner ? Add an extra $5 NIC (no Realtek, please), slide in an empty, small SSD, throw pfSense at it and retest.
Issues are still the same ? Then Intel <> ARM goes out of the window. You most probably have an uplink / WAN side issue.

Use the mentioned no-ipv6 unbound option,
Remove this check :

Remove (disable) IPv6 from your test LAN device.
Now you have a close-to-IPv4-only network.
Retest.

Kempain

@lohphat said in Slow DNS after 22.05:

What is the current tally that disabling IPv6 was implicated in resolving the issue?

Just me so far but it doesn't look like anyone else has tried it yet and re-tested.
Still resolved for me. It honestly made a dramatic difference.

For those experiencing it, if you can disable IPv6 then I'd follow the steps below:

1. Run 'unbound-control -c /var/unbound/unbound.conf stats_noreset | grep total' and record your recursion levels (they will likely be high if you're experiencing the same issue as me):

total.recursion.time.avg=0.079624
total.recursion.time.median=0.0387577

2. Disable IPv6: Services - DNS Resolver - General Settings. Add the below to Custom Options:

server:
do-ip6: no

3. Run the command in 1. again and hopefully your recursion results will be much improved.

This has resolved all DNS issues for me and I haven't had any issues since.
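
To make the before/after comparison easier, the two recursion figures can be pulled out of the stats output with a small helper. This is only a sketch: the function name recursion_times is made up for illustration, and it assumes the stats lines keep the key=value form shown above.

```shell
#!/bin/sh
# recursion_times: hypothetical helper, not part of unbound or pfSense.
# Reads unbound-control stats output on stdin and prints the average and
# median recursion time converted from seconds to milliseconds.
recursion_times() {
    awk -F= '
        $1 == "total.recursion.time.avg"    { printf "avg_ms=%.1f\n", $2 * 1000 }
        $1 == "total.recursion.time.median" { printf "median_ms=%.1f\n", $2 * 1000 }
    '
}

# Usage on the firewall (same config path as in the steps above):
#   unbound-control -c /var/unbound/unbound.conf stats_noreset | recursion_times
```

Run it once before adding the custom option and once after, and compare the two pairs of numbers.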

Edit: Just seen @Gertjan has posted a more complete guide above so maybe follow that instead

Kempain

@gertjan said in Slow DNS after 22.05:

Remove this check :

Remove (disable) IPv6 from your test LAN device.

Just for reference I don't actually have Allow IPv6 un-checked and the no-ipv6 unbound option worked to resolve my issue.

I do, however, only specify the IPv4 protocol in my allow rules, which deviates from the OOTB config (I believe that one has default allow-any rules for both protocols).

Thought this may be part of my issue but it sounds like some are having the issue even with the default config.
If the no-ipv6 unbound setting resolves this for them also then I assume this narrows it down a bit.

johnpoz

@kempain said in Slow DNS after 22.05:

I don't actually have Allow IPv6 un-checked

Yeah that wouldn't really matter for dns, if you set do-ip6 no then unbound shouldn't use IPv6 for a transport. But that would prevent say your browser from using IPv6.

That would be just an attempt to make sure there is no IPv6 being used for anything.

Kempain

@johnpoz

Yup makes sense that setting wouldn't matter for DNS.
Seems like the suggestion was to try and rule out any IPv6 issues by taking it out of the equation as much as possible.

Just wanted to let people know I didn't have to do this for it to be resolved for me.

Gertjan

@All

If "do-ipv6 no" solves an issue, then I would suspect the IPv6 connectivity.

At the Hurricane Electric IPv6 Tunnel Broker, everybody can get free IPv6 access.
They will give you a /64 - and, why not, a /48.

edit : if you can prove that you know what IPv6 is, they will give you a free T-Shirt !! That is, they did so in the past.

To use it : deactivate your ISP IPv6.

Set up a "he IPv6 tunnel using the pfSense doc".

You wind up having this :

The question is : is (one of the) unbound issues related to bad IPv6 connectivity ?
Switch your IPv6 connectivity to a known good one, like he.net, and you'll find out.

Btw : most ISPs now have, in 2022, a good IPv4 implementation. ( so now they can start ditching it )
This is not the same for IPv6 : most ISPs 'do it wrong', with boatloads of nasty side effects.

IMHO, Hurricane Electric is one of the rare IPv6 suppliers that implemented IPv6 correctly, the way it should work.

No rocket-science degree needed to implement it locally.

edit :
How to use "do-ip6" :

edit :

For what it's worth :

And I can ping my PC from everywhere on the Internet :

No (ICMP) NAT needed !

( but I'm not sure that this is a good idea ... )

johnpoz

So, in my morning routine of going to my different sites, reading morning news, viewing threads and looking for stuff, I didn't really have any major issues in general. But I did run into something odd trying to go to support.apple.com

This is a horrible dns setup if you ask me..

;; QUESTION SECTION:
;support.apple.com.             IN      A

;; ANSWER SECTION:
support.apple.com.      3589    IN      CNAME   prod-support.apple-support.akadns.net.
prod-support.apple-support.akadns.net. 3589 IN CNAME support-china.apple-support.akadns.net.
support-china.apple-support.akadns.net. 3589 IN CNAME support.apple.com.edgekey.net.
support.apple.com.edgekey.net. 21589 IN CNAME   e2063.e9.akamaiedge.net.
e2063.e9.akamaiedge.net. 3589   IN      A       23.218.161.64

that is a lot of cnames in the chain..
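
For anyone who wants to check chain length on other names, here is a rough sketch that counts the CNAME records in a dig answer section. The function name cname_hops is made up, and it assumes the usual dig answer layout (name, TTL, class, type, data).

```shell
#!/bin/sh
# cname_hops: hypothetical helper, not part of dig or unbound.
# Reads dig answer lines on stdin and prints how many CNAME records they
# contain (class is the 3rd field and record type the 4th).
cname_hops() {
    awk '$3 == "IN" && $4 == "CNAME" { n++ } END { print n + 0 }'
}

# Usage:
#   dig +noall +answer support.apple.com | cname_hops
```

Fed the answer section above, it counts four CNAMEs before the final A record.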

And a +trace was showing me a failure to get the NS addresses for apple.com:

couldn't get address for 'a.ns.apple.com': not found
couldn't get address for 'b.ns.apple.com': not found
couldn't get address for 'c.ns.apple.com': not found
couldn't get address for 'd.ns.apple.com': not found
dig: couldn't get address for 'a.ns.apple.com': no more
[22.05-RELEASE][admin@sg4860.local.lan]/root:

And I was getting a SERVFAIL

$ dig support.apple.com @192.168.9.253

; <<>> DiG 9.16.30 <<>> support.apple.com @192.168.9.253
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 61133
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;support.apple.com.             IN      A

;; Query time: 1 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Thu Aug 11 07:15:19 Central Daylight Time 2022
;; MSG SIZE  rcvd: 46

I flipped back on the do-ip6: no setting - and no issues with that site..

Was this some issue with unbound and IPv6? I don't think so - because the dig +trace would have zero to do with unbound..

And doing another +trace it seems to be working fine - and notice those were obtained via IPv6 transport

apple.com.              172800  IN      NS      a.ns.apple.com.
apple.com.              172800  IN      NS      b.ns.apple.com.
apple.com.              172800  IN      NS      c.ns.apple.com.
apple.com.              172800  IN      NS      d.ns.apple.com.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q2D6NI4I7EQH8NA30NS61O48UL8G5 NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20220815042401 20220808031401 32298 com. If06bSVXL7llnV+iyjrWR5yStSfeLZzeCKgsfVsqNQKKnP35dCsaGffz dRWOIGK+WzMKxVCiJZLiNyG5iYR9RybjH9jXMhDyYqro3M8eplcZtHnd DN0XXqhP/UDjMThDkJHxFERmmzraaU1wQLMcse/uOLMzmZVmOalWkbC+ TtXIfd8f4KB+h/M6X23C9UZ7oyYI78gqwS3Rq0fKInDxXA==
S0NU5UHI013IVS6TTUSM4FC3OQJB2EJF.com. 86400 IN NSEC3 1 1 0 - S0NU9MFCV1Q1NSBMAPQMU0H7MHN47JDT NS DS RRSIG
S0NU5UHI013IVS6TTUSM4FC3OQJB2EJF.com. 86400 IN RRSIG NSEC3 8 2 86400 20220818054103 20220811043103 32298 com. S3RZuwhflaFxPtxa1gjp5aRyT31BpAfWm9hdOXwx+JSGmHTyGfDaSCMa coQlorU8pjmvpGdQ4SnQKWkj81DdjIo7aCu9BeKRGG7wDkulttlovYcp vrg2qmHQZIEIP5pdjk8Ht15AyzZIi9vxcx6tg51WzkMV2SApd/AUFMzM 13iULU+d5+tll6yK6iGyD9ZT17SOTGOsZXFroo0CehBOCw==
;; Received 838 bytes from 2001:502:1ca1::30#53(e.gtld-servers.net) in 34 ms

What does this tell us - not sure as of yet. But clearly I had an issue with a +trace pulling info, and this was just for the 1st cname in the long chain required to get to the actual IP for support.apple.com

And unbound is not involved in doing a dig +trace, other than for pulling the roots when the client asks loopback.

I will leave do-ip6: no set for now and see if I run into any more issues, and then I will flip it back off, allow IPv6 transport, and see if I see problems again.

My advice for anyone currently having issues with dns is to set do-ip6: no for unbound - does this make it better?

mvikman

Is the unbound ipv6 setting "do-ip6" or "do-ipv6" ?

I've seen in these forums that some say "do-ipv6" and others "do-ip6"; also, google search results mostly show "do-ip6"...

Kempain

@mvikman

Maybe both work but I have

johnpoz

@mvikman it is

My bad if I was saying do-ipv6; I guess it's the habit of writing ipv6, which is the common representation of it.

And seems @Gertjan has it listed as do-ipv6 in his post.

https://nlnetlabs.nl/documentation/unbound/unbound.conf/

I will go through my posts and make sure I fix any of my typos - great catch

Gertjan

@mvikman

Nice catch !
You should not automatically believe what people propose here. Me included, especially.
After all, this is unbound land, not really related to pfSense.

Settings, whether mentioned in the pfSense doc or not, could influence your security, so you should do the new thing : fact-checking for yourself.

For example : the FreeBSD official man pages : unbound.conf
Or go to the source : the doc from the author : https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html
Real dingos would check the code and compile from source.

The image I've shown above isn't dangerous. unbound would simply bail out on the syntax error.

So it is

server: 
    do-ip6: no

and check that

edit !

@johnpoz said in Slow DNS after 22.05:

And seems @Gertjan has it listed as do-ipv6 in his post.

Yeah, it was looking just fine, wasn't it ?
I was cutting corners, wanted to do it fast, and didn't really test (apply) it.
As it looks so simple and innocent.
I've tested it now.
This is what I got :

And yes, if parameters are not recognized at the pre-config test, then it will show a message.
pfSense uses the unbound config test tools to check first. Nice.
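
As a quick pre-check for this particular typo, before even pasting into the Custom Options box, something like the sketch below can flag it. The function name lint_ip6_option is made up for illustration and it only looks for this one misspelling; it is not how pfSense itself validates the options.

```shell
#!/bin/sh
# lint_ip6_option: hypothetical helper, not part of unbound or pfSense.
# Reads config text on stdin and reports every line that uses the
# misspelled option name "do-ipv6" instead of "do-ip6".
lint_ip6_option() {
    awk '/^[ \t]*do-ipv6[ \t]*:/ {
        printf "line %d: do-ipv6 is not an unbound option, use do-ip6\n", NR
    }'
}

# Usage:
#   lint_ip6_option < /var/unbound/unbound.conf
```

The real check remains unbound's own: running unbound-checkconf /var/unbound/unbound.conf on the firewall validates the whole generated file, not just this one name.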

johnpoz

@gertjan yeah it would scream at you if something wrong in your config options text box.

I think I have edited all my mistakes of using the wrong setting name in the actual post text.

As a check: if you look at the DNS Resolver item in the Status menu and see IPv6 addresses in the cache, then the setting is not in effect. See my above examples showing the IPv6 address in the resolver status..

Now that I have put it back to no, there is no IPv6 listed there.. And I have yet to see the browser having issues with sites, etc. But then again, I only found that 1 issue when I was allowing use of my HE tunnel.. But different users go to different sites, etc. So if there is an issue with transport over IPv6, it might be so pronounced for some users that to them dns is just borked, while for others it could be 1 out of 100 sites or something, and a simple refresh of the page makes it work, so they don't even notice it.

My recommended settings for resolving would be to

turn on prefetch
serve 0

These can help reduce any sort of first-query delay when a resolver has to start from scratch and work all the way down from the roots. Should they be required? No. Are they default? I don't think so. But prefetch allows something that is close to expiring in the cache to get a refresh before someone asks for it, so the cache isn't completely empty and the name doesn't have to be resolved from scratch again.

Same thing with serve 0 ttl - this allows for a client asking for whatever.something.com to get the last entry that was cached for it, and then in the background unbound will resolve it again.

This can also reduce any sort of delay with initial resolve all the way from roots.

Keep in mind even a full resolve all the way from roots shouldn't take very long - but these settings should help if for whatever reason there are delays in a full resolve, long cname chains, connectivity issues with specific NS in the path to get to the specific fqdn, etc.

I have been using these for years, and have not seen any problems with them.
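
To see whether these two are actually in the generated config, a small lookup helper can do. This is a sketch under assumptions: the function name conf_option is made up, and I'm assuming the two GUI checkboxes end up as unbound's prefetch and serve-expired options in the config file.

```shell
#!/bin/sh
# conf_option: hypothetical helper, not part of unbound or pfSense.
# Prints the last value given to option $1 in the unbound config text on
# stdin, or "unset" if the option never appears.
# Only meant for simple yes/no options (values containing ":" are cut short).
conf_option() {
    awk -F: -v opt="$1" '
        {
            key = $1; gsub(/[ \t]/, "", key)
            if (key == opt) { val = $2; gsub(/[ \t]/, "", val); last = val }
        }
        END { print (last != "" ? last : "unset") }
    '
}

# Usage:
#   conf_option prefetch      < /var/unbound/unbound.conf
#   conf_option serve-expired < /var/unbound/unbound.conf
```

On a running resolver, unbound-control -c /var/unbound/unbound.conf get_option prefetch asks unbound directly and is the more reliable answer.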

lohphat

Isn't disabling IPv6 in unbound just masking the problem? It may be a great work-around, but it seems a little short-sighted.

This behavior changed between 22.01 and 22.05 due to unbound itself changing. Since pfSense defaults to it instead of BIND, that dependency is clear and the problem needs to be escalated properly for review.

Yes, we're in an IPv6 transition period and yes, today, you MIGHT be able to disable it and walk away, but I would consider it a bit of a head-in-the-sand response.

What is the proper channel for escalating observations to the unbound team?

serbus

@lohphat said in Slow DNS after 22.05:

What is the proper channel for escalating observations to the unbound team?

Hello!

Isn't the fix for the do-ip6 workaround (as derived from https://github.com/NLnetLabs/unbound/issues/670) already in the pfSense development snapshots?

John

Gertjan

@lohphat said in Slow DNS after 22.05:

Isn't disabling IPv6 in unbound just masking the problem

See the "670" link below.
Short answer : yes, of course, it's a sledge hammer solution.
But read the unbound bug thread : it looks like buffers fill up to the max, and outstanding requests get aborted. Reading the proposed patch also makes me think the code isn't resilient enough. The issue might even be a bad DNS record / bad DNS setup on the DNS server side. And the result is : unbound bails out.
The "670" thread mentions other variables you can set to bigger values so that cache memory becomes bigger.

IPv6 isn't always well implemented. Or less tested. And there is more to test.

@serbus said in Slow DNS after 22.05:

Isn't the fix for the do-ip6 workaround (as derived from https://github.com/NLnetLabs/unbound/issues/670) already in the pfSense development snapshots?

Only if that unbound version can be backported to 12.3, or whatever FreeBSD version it has to use.
Netgate isn't using the latest and greatest, as the latest also has lesser-known bugs ^^

@johnpoz said in Slow DNS after 22.05:

I only found that 1 issue when I was allowing use of my HE tunnel..

In the past, I had way more issues using he.net.
The web site of my own ISP, Netflix, and yes, Apple (the cloud stuff) : when I was using IPv6 ( using he.net ) pages wouldn't load or stalled, and CSS was broken.

So I used pfBlockerNG-devel with the No-AAAA option, so I could list sites that I wanted to talk to using IPv4 only.

That's now over :

My Netflix works well over the he.net IPv6 tunnel. More and more sites are IPv6 without any IPv4 pollution.

And remember : the IPv6 of he.net can be considered as a VPN access.
These are the access points : https://tunnelbroker.net/status.php. I took the closest to me of course, but I could take a tunnel server in the States, and then I would have a US IPv6 ?
No need to recall : some sites don't like to be accessed using VPN !? :)

@johnpoz said in Slow DNS after 22.05:

turn on prefetch
serve 0

Yeah, reading the description made me think : these are too good to be true !
I've checked these from day 1.

Kempain

@johnpoz said in Slow DNS after 22.05:

My recommended settings for resolving would be to
turn on prefetch
serve 0

Would you recommend these instead of 'do-ip6: no' or as well as?

Happy to play around with settings and see what impact it has.

Strangely, I lost internet last night and it didn't come back on its own. I had to bounce pfSense and all my other networking gear to get it back for some reason. I did notice the dpinger gateway monitoring service had died.

I suspect all my kit was overheating because it's pretty hot here in the UK at the moment and my makeshift comms cabinet has limited airflow. One of the things on my very long list to sort out! For now I've strung a load of computer fans together and chucked them in there, which seems to be doing the job.

lohphat

@gertjan This is the reply I was hoping for. Thank you.

It's clear there IS a problem with unbound running out of resources and thus is affecting 22.05. My hunch that all (or close to all of) the reports of broken DNS had IPv6 enabled as a common symptom has been corroborated.

The fact that there's been a significant, open unbound bug since April is interesting. That this known problem wasn't included in a "known issues" section of the pfSense release notes is of minor concern. We all know there are open issues in all products in our modern software-driven world. But we shouldn't have to make upgrade decisions blindly.

Could this be a wake-up call, so that pfSense releases include the known open issues from the upstream BSD or component (e.g. unbound) bug trackers, letting us make a better-informed decision to apply the update or not?

Mikymike82

Okay all,
After all the comments and useful information provided, we can come to the conclusion that the essence lies within IPv6, DNS (unbound) and 22.05.

With that in mind I reversed my previous temporary solution of enabling query forwarding (essentially forwarding all my requests to an upstream provider). Essentially, back to basics.

It took a lot of settings, changes, and some headache, because my provider isn't that informative when it comes to having your own firewall/router, but I configured IPv6 to function completely. Testing this with different websites like https://ipv6-test.com/ and https://test-ipv6.com/ I could confirm that my configuration was a success.

I can fully confirm that with a non-working IPv6 configuration, or a provider that doesn't properly support it, you should look at eliminating IPv6 from your current config as suggested in this forum thread.

Ok... I must say I have only been running for an hour... but all seems fine now.

So my suggestion would be to check if your provider supports IPv6; if so, check your settings and follow the test websites to see if you're resolving ok and your config is as expected.... if yes, you are probably in the clear and smiling; if not... then:

Option 1: start all over again with your IPv6 config, as I did several times (TEST TEST TEST)
Option 2: just follow the instructions for disabling IPv6 in the resolver and wait for your provider to fully support IPv6 as they should.

Basically: a standard pfSense configuration with a good ISP IPv6 config.

tentpiglet

@johnpoz said in Slow DNS after 22.05:

@pcol-it-admin said in Slow DNS after 22.05:

said that they had "stock" pfSense DNS resolver settings

I find this is rarely the case to be honest..

I run pfSense in a proxmox VM on a Dell workstation. These are my DNS settings, which exhibit the issue.

As previously indicated, I believe the only two "non-stock" options are the ones related to dhcp.

My problem is 100% reproducible. If I use the pfSense resolver, after a random period of time I start to get NXDOMAIN errors in Chrome for common websites. Hitting refresh/reload a couple of times will clear the error and the page will load. It's not because I have fat fingers and am typing facebook.c0m rather than facebook.com.

This is less of an issue for me as I simply spun up a pi-hole lxc on my proxmox server and redirected all my DNS queries to the pi-hole (which has no issues w/ resolution), but obviously not everyone has this option.
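
For anyone trying to pin down when the errors start, logging the response code of a repeated lookup may help. A rough sketch, with assumptions: the function name dig_status is made up, and the resolver address and test name in the usage comment are placeholders for your own.

```shell
#!/bin/sh
# dig_status: hypothetical helper, not part of dig or unbound.
# Reads dig output on stdin and prints the response code found in the
# header line (NOERROR, NXDOMAIN, SERVFAIL, ...).
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Usage sketch (192.168.1.1 and facebook.com are placeholders):
#   while :; do
#       printf '%s %s\n' "$(date '+%H:%M:%S')" \
#           "$(dig facebook.com @192.168.1.1 | dig_status)"
#       sleep 30
#   done
```

Comparing the timestamps of any NXDOMAIN or SERVFAIL lines against the resolver log should show whether they line up with unbound restarts.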

Gertjan

@tentpiglet
Do some more testing with this option removed :

as it is perfectly normal to see NXDOMAIN popping up once in a while : unbound is restarting because of DHCP lease activity.
Add some DHCP MAC static leases for devices that you always want to have the same IP, like printers, servers, NAS etc.