21.02.02 on SG-5100 - Every Reboot Requires Restart of DNS Resolver
-
I do have the same problem with v.2.5.1. I have to restart my unbound after a reboot of the firewall.
@grumple
Do you have any VPN clients running? Try disabling them and rebooting again to see if it helps.
-
Had a maintenance window...
I rebooted 5 times with DHCP Registration (Register DHCP leases in the DNS Resolver) enabled and 5 times with it disabled. All 10 times I had to restart unbound (DNS Resolver) before LAN clients could get DNS queries resolved. So the issue is not related to DHCP lease registration, and is consistently reproducible.
I did notice that the state of unbound (from ps command) was I (Idle kernel thread) immediately after boot with the problem present and no additional cpu time was accumulating (frozen at 0.22S). After restarting unbound the state was S (interruptible sleep, pending event) as expected and cpu time did accumulate on subsequent ps queries (with unbound obviously waking and doing work).
Further, as grumpie noted, from sockstat I can see that when the issue is present unbound is listening on 127.0.0.1 but not on the firewall’s primary IP as would be necessary to satisfy LAN DNS requests. After restarting unbound it is listening and responding on the firewall’s primary IP as well.
So, clearly, following reboot, unbound has stopped processing events and is permanently out to lunch on the LAN side. This behavior is new.
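The sockstat check described above can be scripted. What follows is only a minimal sketch, not a tested pfSense tool: it assumes the column layout of FreeBSD's `sockstat -4 -l` output, and the sample text, the PID and the LAN address 192.168.1.1 are made-up illustrations.

```python
def unbound_listen_addresses(sockstat_output):
    """Return the local addresses on which unbound has a port-53 socket."""
    addresses = set()
    for line in sockstat_output.splitlines():
        fields = line.split()
        # Assumed columns: USER COMMAND PID FD PROTO LOCAL-ADDRESS FOREIGN-ADDRESS
        if len(fields) < 6 or fields[1] != "unbound":
            continue
        host, _, port = fields[5].rpartition(":")
        if port == "53":
            addresses.add(host)
    return addresses


def missing_listeners(sockstat_output, expected):
    """Return the expected addresses that unbound is NOT listening on."""
    listening = unbound_listen_addresses(sockstat_output)
    if "*" in listening:  # a wildcard socket covers every address
        return set()
    return set(expected) - listening


# Illustrative sample of the broken state: loopback only, no LAN address.
SAMPLE_AFTER_BOOT = """\
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
unbound  unbound    4242  3  udp4   127.0.0.1:53          *:*
unbound  unbound    4242  4  tcp4   127.0.0.1:53          *:*
"""

# 192.168.1.1 stands in for the firewall's LAN address (hypothetical).
print(sorted(missing_listeners(SAMPLE_AFTER_BOOT, {"127.0.0.1", "192.168.1.1"})))
# → ['192.168.1.1']
```

On a live box the text could come from `subprocess.run(["sockstat", "-4", "-l"], capture_output=True, text=True).stdout`; a non-empty result would match the "listening on 127.0.0.1 only" symptom.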
-
I have exactly the same issue (custom-built socket 1151 PC). I have to restart unbound after every reboot of the PC: the unbound service itself appears to be running, but none of the LAN clients can resolve names until I manually restart unbound. It all started after the upgrade to 2.5.0 (updating to 2.5.1 changed nothing).
It looks like a workaround is to select "All" in both interface settings on the Services -> DNS Resolver -> General Settings page. [Needs more testing; waiting for a maintenance window.]
I have an OpenVPN server enabled.
I have "Register DHCP static mappings in the DNS Resolver" option enabled.
I have "Register DHCP leases in the DNS Resolver" option disabled.
The issue is 100% repeatable.
-
Hello!
pfSense generates unbound.conf on the fly from its config under a variety of circumstances, including at boot (rc.bootup). Assuming you specified which network interfaces to listen on (not ALL), it will not add an interface to the config that is disabled or has no carrier.
I wonder if your LAN interface is showing no carrier for some period of time at boot. Once the interface is UP and you restart unbound, pfSense sees it and adds it to unbound.conf...?
I see this behavior if I don't have an active cable/device plugged into the LAN port at boot. For kicks, you could try putting a long sleep in front of the services_unbound_configure() call in rc.bootup and see if it makes a difference.
John
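To test the no-carrier theory without editing rc.bootup, one could poll the interface status instead of using a fixed sleep. This is a rough sketch of that different technique, not what pfSense itself does: it assumes FreeBSD's ifconfig prints a "status: active" line when the link has carrier, and the helper names, the timeout values and the interface name in the usage note are hypothetical.

```python
import time


def has_carrier(ifconfig_output):
    """True if the ifconfig text for one interface reports 'status: active'."""
    for line in ifconfig_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "status":
            return value.strip() == "active"
    return False  # no status line at all: treat as no carrier


def wait_for_carrier(read_ifconfig, timeout=60.0, interval=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll read_ifconfig() until carrier is seen or the timeout expires.

    read_ifconfig is any callable returning the current ifconfig text;
    sleep and clock are injectable so the loop can be exercised offline.
    """
    deadline = clock() + timeout
    while True:
        if has_carrier(read_ifconfig()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

On a real box, read_ifconfig could be something like `lambda: subprocess.run(["ifconfig", "igb1"], capture_output=True, text=True).stdout`, where igb1 is only a placeholder for your LAN interface. If the function returns True noticeably later than unbound's start time in the logs, that would support the race-at-boot explanation.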
-
@densilent said in 21.02.02 on SG-5100 - Every Reboot Requires Restart of DNS Resolver:
I have "Register DHCP static mappings in the DNS Resolver" option enabled.
The default value. That should be ok.
Take my word for it: unchecking is even better.
Set these to "All":
-
Just to make it a bit more complicated ;-) I see exactly that behaviour every time my WAN connection drops and comes back. At the moment we have a not-so-stable cable connection, so this has happened several times in the last 5 days. The pfSense box was not rebooted; just the WAN connection went down and came back.
I also have pfBlockerNG running. I wonder whether this might have some influence here.
-
Sorry, I'm having a bunch of network issues and I guess I posted this on the wrong thread... I meant to post this here:
I have two SG-5100s and two SG-4860s. I did an upgrade from 2.5 to 21.02.2-RELEASE on both SG-4860s and one of the SG-5100s.
I am now seeing this same unbound DNS resolver crash issue on both SG-5100s (even the one that I did not upgrade) and one of the SG-4860s.
I am also running pfBlockerNG and Suricata.
-
I still have this very annoying issue as do many others. As surmised by posts here and in other forum topics, I am guessing that something has changed in the boot sequence with 21.x/2.5.x causing unbound to start before the LAN interfaces are fully up. Thus, the resolver starts up not listening for LAN DNS requests. This happens 100% of the time for me. Every boot/reboot means LAN clients cannot access the Internet using domain names until I restart unbound. This was never an issue with prior pfsense versions.
What is the simplest temporary fix for this? I need to either delay initial unbound start or force an unbound restart sometime after boot completes. It saddens me to have to resort to a hack for something that should just work but it is either that or roll back.
Peter
-
I believe I have the very same issue, except that unbound crashes or hangs for a very long time, multiple times, after the firewall is up and running. Each time, name resolution fails on the LAN and I need to restart DNS Resolver, after which things are fine. I am running 21.02.2-RELEASE on an SG-2220 that has historically been incredibly reliable until this issue.
-
I now tried the following:
System -> Routing -> Gateways:
For all gateways I checked "Disable Gateway Monitoring" and "Disable Gateway Monitoring Action"
Services -> DNS Resolver -> General Settings:
In "Network Interfaces" selected every single entry and not only "All" which I had before.
I tried:
- Reboot pfSense
- Broken WAN connection
Both times unbound did work afterwards without any manual intervention.
Not sure if it is by accident or if it changed something. So keep fingers crossed ;-)
-
I've seen similar behavior with 2.5.0 and 2.5.1: after a reboot of my pfsense box, I had to manually restart the DNS Resolver.
For me, the issue appears to have been some of my IoT zoo members: I have a bunch of Sensibo Skys to control my a/c.
With their MAC address being aa:bb:cc:dd:ee:ff, they register a client hostname with the DHCP server of "Sensibo Sky ff:ee:dd:cc:bb:aa" (yes, reverse byte order; and to increase confusion: bytes<0x10 lack the leading 0 in the hostname).
As far as I can see, ":" is not a valid character in a hostname. The DHCP server doesn't seem to mind that much; however, I had system log entries saying "bad name in dhcpd.leases".
Excerpt from /var/dhcpd/var/db/dhcpd.leases:
lease 10.94.0.102 {
  starts 5 2021/06/04 09:13:02;
  ends 5 2021/06/04 11:13:02;
  tstp 5 2021/06/04 11:13:02;
  cltt 5 2021/06/04 09:13:02;
  binding state active;
  next binding state free;
  rewind binding state free;
  hardware ethernet bc:dd:c2:11:f6:0f;
  client-hostname "Sensibo Sky f:f6:11:c2:dd:bc";
}
Since I've mapped these devices to fixed DHCP settings and gave them valid hostnames on the DHCP server tab, I can happily reboot my pfsense box, and it comes back up with a running and functional DNS resolver.
From that, I'd assume there are at least three culprits in my case:
- the IoT devices registering invalid names
- the DHCP server not filtering that and putting them in its lease database as is
- the DNS resolver not being able to deal with this, and apparently refusing to come up on boot.
The DNS resolver has "Register DHCP leases in DNS Resolver", and "Register DHCP static mappings in the DNS Resolver" both active.
I have no clue why the DNS resolver would work despite the issue after a manual restart.
Maybe this helps someone.
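For anyone who wants to check their own lease file for names like these, here is a small sketch. It applies the usual hostname rule (labels of letters, digits and hyphens, no leading or trailing hyphen, at most 63 characters per label); whether unbound or pfSense apply exactly this rule is an assumption on my part, and the second lease in the sample is invented for contrast.

```python
import re

# One DNS label: letters, digits, hyphens; no leading/trailing hyphen; 1-63 chars.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")
# Matches lines such as: client-hostname "some name";
_CLIENT_HOSTNAME = re.compile(r'client-hostname\s+"([^"]*)";')


def is_valid_hostname(name):
    """True if every dot-separated label of name is a valid hostname label."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.rstrip(".").split("."))


def invalid_client_hostnames(leases_text):
    """Return the client-hostname values in dhcpd.leases text that are invalid."""
    return [name for name in _CLIENT_HOSTNAME.findall(leases_text)
            if not is_valid_hostname(name)]


SAMPLE = '''
lease 10.94.0.102 {
  hardware ethernet bc:dd:c2:11:f6:0f;
  client-hostname "Sensibo Sky f:f6:11:c2:dd:bc";
}
lease 10.94.0.103 {
  client-hostname "sensibo-livingroom";
}
'''

print(invalid_client_hostnames(SAMPLE))
# → ['Sensibo Sky f:f6:11:c2:dd:bc']
```

To run it against the real file, read /var/dhcpd/var/db/dhcpd.leases and pass its contents to invalid_client_hostnames(); any names it returns are candidates for a static mapping with a clean hostname.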
-
One of the fixes in 21.05 was to revert Unbound to an older version due to "instability." (Presumably there will be another 2.x release shortly...)
-
There is no change or improvement for me with 21.05. I must still restart unbound after reboot.
Peter
-
@steveits said in 21.02.02 on SG-5100 - Every Reboot Requires Restart of DNS Resolver:
One of the fixes in 21.05 was to revert Unbound to an older version due to "instability." (Presumably there will be another 2.x release shortly...)
I have noticed a huge improvement with unbound after upgrading to 21.05. I was getting recurring errors in my unbound logs that no longer appear. I have also noticed that my overall disk usage is way down with the same packages and configuration under 21.05. I had the feeling there was something up with the previous version (was it 21.02?) that was filling logs much faster, and it all seemed to be related to DNS Resolver and DHCP static maps. My steady-state disk usage went from 94% to 64%, and there is no apparent disk usage growth with 21.05. I did have a serious problem initially with 21.05, though: it somehow got to the point where it was reporting 105% disk usage. I did a factory reset, installed the same packages and loaded my config, and it has been solid since. Again, it all felt like it was DNS Resolver and DHCP static map related, but I don't have proof of that.
-
@plfinch said in 21.02.02 on SG-5100 - Every Reboot Requires Restart of DNS Resolver:
There is no change or improvement for me with 21.05. I must still restart unbound after reboot.
Peter
Same with 21.05.1. I must still restart unbound after reboot.
Peter
-
@plfinch The closest I've come to that is that unbound usually stops during pfBlocker package installation, which is a known issue. What kind of WAN connection do you have? Seems like there has to be something specific to your setup that's different.
I think the only change in 21.05.1 for non-3100 hardware is the captive portal fix, at least per the readme.
-
I’m pretty sure this started with the move to 21.02 and has continued with all updates since. It is quite annoying, obviously, since manual action is required after every reboot.
WAN is 500Mb Xfinity cable via an Arris SB8200.
Packages are:
apcupsd
arpwatch
bandwidthd
darkstat
I did verify that the problem still exists with arpwatch removed, but I have not tried removing the others.
This firewall (SG-5100) is overkill for the traffic and config I have and typically loafs at 1-2% CPU.
Peter
-
I finally updated my spare firewall, an SG-2440, directly from 2.4.5_1 to 21.05.1. No issues with the upgrade, and the DNS Resolver works fine immediately following a reboot. This firewall has the exact same packages and configuration settings as my SG-5100. I still think this is a race condition in startup between when the LAN comes up and when the DNS Resolver comes up that leaves the DNS Resolver not answering queries from the LAN. Not sure where to go from here. Maybe a complete re-install and see what happens…
-
I've got an SG-3100 with 21.05.1-RELEASE and also have to manually restart unbound whenever I reboot the router, the power goes out, etc. I have everything set to factory defaults.
None of my other non-pfsense routers require any touch labor when power cycled. Why would this behavior make it to a released version of pfsense running on Netgate-developed and -tested hardware for one release, let alone several?
This needs to get fixed. Installing a cron plugin to restart the unbound server after boot up is a hack.
-
I'm just a user, offering opinions and observations, not connected to Netgate/pfSense, so take it for what it's worth.
So what is the biggest difference when you restart by hand? All the interfaces are up.
As @Gertjan points out setting the interfaces to ALL puts a listening socket everywhere.
Why does that matter? Well if an interface isn't up when unbound starts, eventually when it does, unbound will be listening there.
If everyone having trouble has not selected all, but specifically LAN (as in only service requests coming in on LAN) maybe the unbound rc script needs to somehow wait for LAN to be up. As a simple test, try selecting ALL and doing the reboot and see if it works. It would be a workaround, cleaner than a cron job and something you could easily mark as "fix the config when the issue gets fixed".
As for unbound listening on all interfaces, don't forget that WAN has a default deny in rule: even with the listen socket on WAN:53, an inbound request should be dropped because of default deny. That inbound request is not tied to anything outbound so there is no state and it should be dropped.
Yes, I know, good security says "don't have any open ports where you don't need them" but in this case at least the door may be shut, just not locked.