unbound quits resolving, have to restart
-
@SteveITS Before the unbound restart, no IP number back at all. I suppose I should have saved a copy of the complete output, but my eyes focus on the IP number(s).
-
@beerguzzle Oops missed your first comment.
Under General Settings, "Enable DNSSEC support" is on, but "Enable Forwarding Mode" is off.
-
@beerguzzle DNS is so bad/sketchy that I sometimes use DNS over TLS and block all port 53 traffic in/out on the WAN after jump-starting it (allowing port 53 for a moment). Sometimes I block application DNS too. Plus, DNS over TLS seems to support ECN more completely. The only issue is that some DNS providers now use DTLS and QUIC on port 853, and ICMP redirects from ISPs send it to random servers. Elliptic curves, DNS harvesting and marketing, and anycast are major DNS issues, and sometimes its caching is too fast for my applications on regular routers over gigabit.
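For anyone unfamiliar with DNS over TLS: it carries the ordinary DNS wire format inside a TLS connection to TCP port 853 (RFC 7858), with each message prefixed by a 2-byte length, exactly as in DNS over TCP. The Python below is a minimal sketch of a single DoT lookup, assuming a resolver that offers DoT; the function names are hypothetical, the server in the usage line is only an example, and the answer is returned undecoded.

```python
import socket
import ssl
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Encode a minimal DNS query (RFC 1035 wire format) with recursion desired."""
    # Header: ID, flags (only the RD bit set), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label as <length byte><label>, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS = IN
    return header + question


def frame(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as DNS over TCP/TLS requires."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the full reply arrived")
        buf += chunk
    return buf


def dot_query(server_ip: str, server_name: str, name: str) -> bytes:
    """Send one A query over DoT (TCP port 853) and return the raw DNS response."""
    ctx = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((server_ip, 853), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=server_name) as tls:
            tls.sendall(frame(build_query(name)))
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            return recv_exact(tls, length)
```

For example, `dot_query("1.1.1.1", "cloudflare-dns.com", "netgate.com")` should return Cloudflare's raw answer, provided outbound TCP 853 is not blocked somewhere along the way.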
Some people have said that using your own DNS server is more often than not a better idea. I tried setting up a separate DNS server with OPNsense, and somehow its ifconfig IP assignment literally became 8.8.8.8, so I tried using it for a few days (maybe some weird auto edge-detection settings). It worked, even with my smart TV, and I got LLMNR working with it. BUT, after tweaking some settings, I started getting DNS requests routed from outside my home to port 53 bound for Google, and had to shut it down.
While I wouldn't mind a Google DNS server here in my remote locale, actually setting one up correctly (and maybe getting paid for it) seems above my pay grade.
My NICs are compatible with DPDK in FreeBSD, my mobo is compatible with InfiniBand, and I really need to audit kld modules when not using OpenSolaris-based pfSense.
-
I think, depending on your ISP, SHA-1 and MD5 hashing are to blame. And maybe it is a problem solvable by implementing TPM with UEFI correctly. ZFS vs. UFS filesystems may play a role too.
-
It'd be nice if firewall rules could block/allow/report which encryption algorithms are being used for connections. NBASE-T goes wild.
-
The smart TV DID play Doom Eternal in 4K HDR at 120 Hz over GeForce NOW with next to zero latency, though.
-
@beerguzzle You can raise the Log Level in DNS Resolver advanced settings.
-
@SteveITS https://kb.isc.org/docs/aa-01219
and
https://docs.netgate.com/pfsense/en/latest/services/dns/resolver-advanced.html
EDNS Buffer Size
Number of bytes size to advertise as the EDNS reassembly buffer size. This value is placed in UDP datagrams sent to peers.
The default is Automatic and is calculated based on the MTU values of active interfaces. A variety of other common values are provided in a drop-down list.
Automatic mode sets optimal buffer size by using the smallest MTU of active interfaces and subtracting the IPv4/IPv6 header size. If fragmentation reassembly problems occur, usually seen as timeouts, then try a value of 1432.
The 512, 1220, and 1232 values bypass most IPv4 and IPv6 MTU path problems but can generate an excessive amount of TCP fallback.
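The "Automatic" rule quoted above boils down to simple arithmetic. Below is a minimal Python sketch, assuming a 20-byte IPv4 header and a 40-byte IPv6 header and nothing else subtracted; pfSense's actual code may differ (for example, by also accounting for the 8-byte UDP header), and the function name is hypothetical.

```python
IPV4_HEADER = 20  # bytes: minimal IPv4 header, no options
IPV6_HEADER = 40  # bytes: fixed IPv6 header


def auto_edns_buffer(mtus, ipv6=False):
    """Smallest active-interface MTU minus the IP header size, per the doc text."""
    if not mtus:
        raise ValueError("need at least one active interface MTU")
    header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return min(mtus) - header
```

With interfaces at 1500 and 1492 (a typical PPPoE MTU), this gives 1472 for IPv4 and 1452 for IPv6. For comparison, the 1232 value in the drop-down is the IPv6 minimum MTU of 1280 minus the 40-byte IPv6 header and the 8-byte UDP header.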
-
Reading the comments above, I bumped up the DNS resolver log level from 1 to 2 (detailed operational information) to see if I can get any clue the next time this happens. It only happens every few days, not hourly, so it is an annoyance not a crisis.
FWIW, the DNS servers listed on the pfSense Dashboard under System Information are: 127.0.0.1 (the DNS cache, I assume), ::1 (even though I don't use IPv6), the DNS servers provided by the upstream ISP, and 8.8.8.8 (added by me).
Within pfBlockerNG/DNSBL/DNSBL SafeSearch, I have DoH/DoT/DoQ Blocking enabled, and all of the choices right below it (the blocklist) highlighted.
I have never seen port 853 traffic; it would be blocked in my setup, with some logging.
Back in my working days, I used BIND 9 to keep an entire college on the beam. DNSSEC was on the horizon for TLDs, but we hadn't done it. DNS was port 53, period. Browsers doing DNS was not a thing. Ahh, the good old days, 5 years ago.
-
@beerguzzle said in unbound quits resolving, have to restart:
Every few days when I log onto my computer (wired to the Netgate)
Is this physically the case: is your PC wired directly to pfSense? That would be 'not good'.
It would be better to have a switch between the 1100 and the PC.
A PC (or any device) connected directly to the LAN port will, on power-up, trigger a network or NIC link-up event, and this restarts many pfSense processes, unbound (the resolver) being one of them.
Also, if you have checked the "Register DHCP leases in DNS" option in the resolver settings, then when you switch on your PC it will send a DHCP request to get a lease.
As the option's description mentions, unbound will get restarted at that moment.
Another reason why unbound can get restarted: pfBlockerNG.
Set up pfBlockerNG so it reloads the DNSBL feeds every hour, and chances are great that unbound gets restarted every hour.
I've now mentioned 3 reasons why unbound can get restarted.
What makes things worse: these events happen at random, so it's very possible that two of them will happen at nearly the same time. This is known as a race condition, and it is something you don't want.
Your mission, as a pfSense admin: create a situation where unbound (the resolver) never restarts.
This can't be done, of course, but 'once a week' is a good start.
I'm using pfBlockerNG, and this is one of the reasons I sync my DNSBL feeds once a week.
My resolver never 'blocks' or fails; I've never had issues with it.
My networks, ISP, pfSense, etc. use IPv4/IPv6.
All my web sites (domain names) are DNSSEC-protected, so my unbound also does DNSSEC validation, which is nothing more than "some more DNS requests". Normal DNS lookups are A, AAAA, MX, NS, and SOA requests; DNSSEC adds just two more (DS and DNSKEY).
Btw: OK, the reality is somewhat different. As I mess around a lot with my pfSense, my unbound does restart more than once a week. My Munin graph of the memory unbound uses shows the restarts clearly.
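To make the "two more" concrete: a validating resolver needs the DNSKEY records of each signed zone on the path down from the root, plus a DS record for each delegation (served by the parent zone), and it caches most of them. The Python sketch below lists those lookups for one zone, under the simplifying assumptions that every label boundary is a zone cut and that nothing is cached yet; the function name is hypothetical.

```python
def validation_queries(zone: str):
    """List the (owner, type) DNSSEC lookups needed to validate data in `zone`.

    Simplification: every label boundary is treated as a zone cut, and the
    cache is assumed to be empty.
    """
    labels = [label for label in zone.rstrip(".").split(".") if label]
    # The root has no parent, so only its DNSKEY is fetched (checked against
    # the configured trust anchor).
    queries = [(".", "DNSKEY")]
    # Walk down from the TLD to the zone itself.
    for i in range(len(labels) - 1, -1, -1):
        child = ".".join(labels[i:]) + "."
        queries.append((child, "DS"))      # delegation signer, held by the parent
        queries.append((child, "DNSKEY"))  # the child zone's own signing keys
    return queries
```

For `validation_queries("example.com")` this lists the root DNSKEY, then DS and DNSKEY for `com.` and for `example.com.`. In practice the root and TLD entries are almost always cached already, so a fresh lookup under a signed domain usually costs only that zone's own DS and DNSKEY queries.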
And I hate to say it, but want to mention it anyway: unbound running on a classic Intel processor is, IMHO (!), better than the same unbound running on ARM. And I repeat: I have nothing to back up this presumption.
I have a Netgate 4100, btw, running 24.03.
My resolver settings are pretty much 'default', and they have worked fine for the last decade or so (since unbound became part of pfSense).
Maybe I'm lucky, and the ISP I chose doesn't mess with my DNS traffic, which is open, unencrypted traffic, after all. But if you have doubts: stop resolving and switch to forwarding over TLS, or do what is far more logical: get another ISP.
-
Gertjan, Ok this is a lot of information to digest at the moment. I am running Kea DHCP, so I don't have the "Register DHCP leases in DNS" checkbox. I could not find any reference to it; this feature seems to be part of ISC DHCP.
I have had my Netgate for a couple of years now and my Mac mini (Intel) has always been the only thing connected to the LAN interface, wire to wire. This problem started with 24.03. I did recently buy a switch for the OPT interface, because I added more things on that network. But the unbound restart problem predated the switch.
One wrinkle is that I move around a lot in the summer, and my 1100 goes with me. It is in one of two places, using two different ISPs (I recently changed one of them). So it gets shut down via the terminal interface, packed up, and restarted at the other site. Maybe it is time for me to buy a second 1100.
Your Munin graph has prodded me to think of Munin or MRTG. I will try to pick out unbound restart times from the syslogs first. So my mission now is to look at how often unbound restarts and maybe why. Thanks.
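One way to approach the "how often and maybe why" question is to line up unbound start times against interface link events. The Python sketch below assumes BSD-style syslog lines (e.g. `Jul 10 14:22:01 host unbound[123]: ...`), that unbound logs "start of service" when it (re)starts, and that link changes show up as FreeBSD's "link state changed" kernel messages; the year, the 60-second window, and the function names are hypothetical choices, not pfSense settings.

```python
from datetime import datetime, timedelta


def parse_events(lines, marker, year=2024):
    """Return the timestamps of BSD-syslog lines that contain `marker`.

    Assumes each line starts with a 15-character stamp such as
    'Jul 10 14:22:01' or 'Jul  1 14:22:01'. BSD syslog omits the year,
    so the caller supplies it (the default here is only a placeholder).
    """
    stamps = []
    for line in lines:
        if marker in line:
            stamp = line[:15]
            stamps.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return stamps


def correlate(restarts, link_events, window=timedelta(seconds=60)):
    """For each restart, list the link events seen at most `window` before it."""
    result = []
    for restart in restarts:
        causes = [e for e in link_events if timedelta(0) <= restart - e <= window]
        result.append((restart, causes))
    return result
```

For example, pass `parse_events` an open /var/log/resolver.log with the marker "start of service" and the system log (assumed to be /var/log/system.log) with "link state changed", then hand both lists to `correlate`. Restarts with an empty match list point to some other trigger, such as a pfBlockerNG feed reload.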
-
@beerguzzle said in unbound quits resolving, have to restart:
Gertjan, Ok this is a lot of information to digest at the moment. I am running Kea DHCP, so I don't have the "Register DHCP leases in DNS" checkbox. I could not find any reference to it; this feature seems to be part of ISC DHCP.
Kea is fine, I guess... It can't restart unbound, so that's OK.
@beerguzzle said in unbound quits resolving, have to restart:
my Mac mini (Intel) has always been the only thing connected to the LAN interface, wire to wire.
Then understand that on every LAN link event (up and down events between pfSense and the Mac mini), a lot of processes get restarted. This includes unbound.
Normally, that's OK-ish.
Still, I'm not a fan of such a setup, as it can create issues that I don't have / don't know about.
@beerguzzle said in unbound quits resolving, have to restart:
So it gets shut down via the terminal interface
That's the way to do it.
@beerguzzle said in unbound quits resolving, have to restart:
So my mission now is to look at how often unbound restarts and maybe why.
Exactly.
You don't need Munin to see these.
Here is a one-liner:
cat /var/log/resolver.log | grep 'start'
which means: list the file /var/log/resolver.log and pass it through grep, which keeps only the lines that contain 'start'.
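If you would rather have a per-day tally than a raw list, the same filter can be grouped by date. A minimal Python sketch, assuming BSD syslog lines whose first six characters are the date ('Jul 10'). It defaults to the narrower marker "start of service" (assumed to be unbound's start message) instead of 'start', which may also match other lines. The function name is hypothetical.

```python
from collections import Counter


def restarts_per_day(lines, marker="start of service"):
    """Count lines containing `marker`, grouped by their 'Mon DD' syslog date."""
    counts = Counter()
    for line in lines:
        if marker in line:
            # 'Jul  1' (space-padded day) is normalised to 'Jul 1'
            day = " ".join(line[:6].split())
            counts[day] += 1
    return counts
```

For example, `with open("/var/log/resolver.log") as f: print(restarts_per_day(f))`. The rotated copies matched by resolver.log* can be fed in the same way, as long as they are not compressed.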
Munin: I'm not sure I can advise its usage, as most of the Perl and other FreeBSD packages Munin depends on are no longer on the pfSense-FreeBSD package server.
This means you have to point the package repository to the main "FreeBSD 15" package servers, and that can be dangerous. I had to pull in many packages that needed other dependency packages, a small hundred of them. At any moment, a package could 'destroy' the system, as the FreeBSD core of pfSense is somewhat different from a native FreeBSD system.
I was ready at any moment to reinstall from the ground up, if needed.
I like Munin as I know how it works (it's old), I have other stuff already using Munin, and I know how to add more functionality: it's always a matter of 'writing another script file in whatever language is available'.
-
Bumping my log level up to 2 cluttered up my ability to see restarts, so I put it back to level 1. Doing a (yes, I speak Unix):
cd /var/log
grep -i start resolver.log*
just showed some of today's activity: 2 restarts. Plus, I changed locations today (pack up the 1100, Mac mini, and Wi-Fi router, then travel).
Per advice, I put my Mac Mini on the switch when I set things up this afternoon. So I'll see if a switch helps.
-
@beerguzzle said in unbound quits resolving, have to restart:
bumping my log level up to 2 cluttered up my ability
Not only that, it will 'explode' the size of the file.
But as soon as the preset maximum size is reached, it will get rotated.
It's possible to take countermeasures: make the file size bigger, but that's probably not really an option on the 1100, as it has small storage.
A listing of my log files shows my resolver.log has details since June 19, or something like 3 weeks.
When I set debug level 2, it will be less than a day, even with a non-default, way bigger 2048 MB log file size; the default is 512 KB, I guess.
-
@Gertjan have you packet-captured the ISP's DHCP options and tried to match them in the FreeBSD/pfSense files? Also, is your unbound allowing a remote-control option on port 953 over localhost? I am not 100% certain of conflicts there, BUT I know there is a remote-control feature buried in there, and the combination of Kea instead of ISC and whatever localhost is doing may cause issues.
Also, my ISP has had some success changing my interface local hostnames, checkable with netstat -r while using DHCP. It could be that they are picky, or it could be a nonce or off-by-one C code issue.
Sometimes changing everything to attlocal.net, or the hostname and domain to pfsense.attlocal.net or dsldevice, has worked for me if you are allowing the ISP to override DNS. If not, maybe not bothering with DHCP is more valid, or just using dnsmasq.
One decent way to set up dnsmasq is to port-forward everything from LAN to localhost port 54 and forward everything to the ISP's assigned DNS servers or the ISP's router; however, sometimes the ISP's router has some pretty crazy foreign VPNs and crap connected to it, and strict NAT options. The ISP may send ICMP error messages to you based on bad checksums.
Also, are you using IPv6? It may be a better option to disable it until you get IPv4 working.
-
CVE-2021-23017 (CVSS 6.8)
A security issue in the nginx resolver was identified, which might allow an attacker who is able to forge UDP packets from the DNS server to cause a 1-byte memory overwrite, resulting in a worker process crash or other potential impact.
Fixed: Update NGINX to address CVE-2021-23017 #12061
https://docs.netgate.com/pfsense/releases/22-01_2-6-0.html
https://redmine.pfsense.org/issues/12061
-
@Gertjan I'd reinstall all firmware but that is just me.
-
@Gertjan if your ISP has syslogs and you aren't scared of modern ASLR attacks, you could send them to a local server and see what errors they have. Or send them directly to pfSense, or send pfSense's syslogs to the ISP router.
-
@Gertjan Do you hide netgate version identity or have any weird ISP IP passthrough or cascaded router options? Sometimes those work pretty well. A gaming console can tell you if NAT is actually working correctly. I can get either to work but it takes time.
-
@HLPPC there is also NAT pinhole-ing in some ISP routers for public IPs. https://en.wikipedia.org/wiki/Firewall_pinhole