21.02.02 on SG-5100 - Every Reboot Requires Restart of DNS Resolver
-
Sorry, I'm having a bunch of network issues and I guess I posted this on the wrong thread... I meant to post this here:
I have two SG-5100s and two SG-4860s. I did an upgrade from 2.5 to 21.02.2-RELEASE on both SG-4860s and one of the SG-5100s.
I am now seeing this same unbound DNS resolver crash issue on both SG-5100s (even the one that I did not upgrade) and one of the SG-4860s.
I am also running pfBlockerNG and Suricata.
-
I still have this very annoying issue as do many others. As surmised by posts here and in other forum topics, I am guessing that something has changed in the boot sequence with 21.x/2.5.x causing unbound to start before the LAN interfaces are fully up. Thus, the resolver starts up not listening for LAN DNS requests. This happens 100% of the time for me. Every boot/reboot means LAN clients cannot access the Internet using domain names until I restart unbound. This was never an issue with prior pfsense versions.
What is the simplest temporary fix for this? I need to either delay initial unbound start or force an unbound restart sometime after boot completes. It saddens me to have to resort to a hack for something that should just work but it is either that or roll back.
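One way to sketch the "force an unbound restart sometime after boot completes" idea is a small check that sends a DNS query to the LAN address and only triggers a restart if nothing answers. This is a rough sketch, not a tested pfSense fix: it assumes Python 3 is available on the firewall, and `restart_cmd` is a hypothetical parameter for whatever command restarts the resolver on your box.

```python
import socket
import struct
import subprocess


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE=A (1) and QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def resolver_answers(server: str, port: int = 53, timeout: float = 2.0) -> bool:
    """Return True if something that looks like a DNS reply comes back."""
    query = build_query("example.com")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # timeout, refused, or unreachable
            return False
    # A reply echoes the query ID and has the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


def restart_if_silent(server: str, restart_cmd: list) -> bool:
    """Run restart_cmd only when the resolver does not answer on `server`.

    restart_cmd is an assumption: pass whatever restarts the resolver
    on your system. Returns True if a restart was attempted.
    """
    if resolver_answers(server):
        return False
    subprocess.run(restart_cmd, check=False)
    return True
```

Called a minute or so after boot with the firewall's LAN address as `server`, this would leave a healthy resolver alone and only restart one that is not answering on the LAN.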
Peter
-
I believe I have the very same issue, except that unbound will crash or hang for a very long time, multiple times after the firewall is up and running. Each time, names fail to resolve on the LAN and I need to restart DNS Resolver, after which things are fine. I am running 21.02.2-RELEASE on an SG-2220 that has historically been incredibly reliable until this issue.
-
I now tried the following:
System -> Routing -> Gateways:
For all gateways I checked "Disable Gateway Monitoring" and "Disable Gateway Monitoring Action"
Services -> DNS Resolver -> General Settings:
In "Network Interfaces" selected every single entry and not only "All" which I had before.
I tried:
- Reboot pfSense
- Broken WAN connection
Both times unbound did work afterwards without any manual intervention.
Not sure if it is by accident or if it changed something. So keep fingers crossed ;-)
-
I've seen similar behavior with 2.5.0 and 2.5.1: after a reboot of my pfsense box, I had to manually restart the DNS Resolver.
For me, the issue appears to have been some of my IoT zoo members: I have a bunch of Sensibo Skys to control my a/c.
With their MAC address being aa:bb:cc:dd:ee:ff, they register a client hostname with the DHCP server of "Sensibo Sky ff:ee:dd:cc:bb:aa" (yes, reverse byte order; and to increase confusion: bytes<0x10 lack the leading 0 in the hostname).
As I can see, the ":" is not a valid character in a hostname. The DHCP server doesn't seem to mind that much, however I had system log entries "bad name in dhcpd.leases".
Excerpt from /var/dhcpd/var/db/dhcpd.leases:
lease 10.94.0.102 {
  starts 5 2021/06/04 09:13:02;
  ends 5 2021/06/04 11:13:02;
  tstp 5 2021/06/04 11:13:02;
  cltt 5 2021/06/04 09:13:02;
  binding state active;
  next binding state free;
  rewind binding state free;
  hardware ethernet bc:dd:c2:11:f6:0f;
  client-hostname "Sensibo Sky f:f6:11:c2:dd:bc";
}
Since I've mapped these devices to fixed DHCP settings and gave them valid hostnames on the DHCP server tab, I can happily reboot my pfsense box, and it comes back up with a running and functional DNS resolver.
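To make the naming scheme concrete, here is a small sketch that rebuilds the hostname from the MAC address the way the post describes it (reversed byte order, leading zeros dropped) and checks the result against the usual hostname label rules (letters, digits and hyphens, 1 to 63 characters, no leading or trailing hyphen). The function names are made up for illustration, and the naming logic is only inferred from the lease excerpt above, not from any Sensibo documentation.

```python
import re

# One hostname label: letters, digits, hyphens; no hyphen at either end.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def sensibo_style_name(mac: str) -> str:
    """Rebuild the client-hostname as described: reversed bytes, no leading 0."""
    octets = mac.lower().split(":")
    # int -> hex formatting drops the leading zero for bytes below 0x10.
    reversed_octets = [format(int(octet, 16), "x") for octet in reversed(octets)]
    return "Sensibo Sky " + ":".join(reversed_octets)


def is_valid_label(name: str) -> bool:
    """True if name is usable as a single hostname label."""
    return _LABEL.fullmatch(name) is not None


name = sensibo_style_name("bc:dd:c2:11:f6:0f")
print(name)                               # Sensibo Sky f:f6:11:c2:dd:bc
print(is_valid_label(name))               # False (space and ":" not allowed)
print(is_valid_label("sensibo-bedroom"))  # True
```

The generated name matches the `client-hostname` in the lease excerpt, and it fails the label check because of both the spaces and the colons.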
From that, I'd assume there are at least three culprits in my case:
- the IoT devices registering invalid names
- the DHCP server not filtering that and putting them in its lease database as is
- the DNS resolver not being able to deal with this, and apparently refusing to come up on boot.
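If you want to check whether your own lease database contains names like this, something along these lines could list them. It is a minimal sketch that assumes the simple `lease <ip> { ... }` layout shown in the excerpt (no nested braces) and treats a name as valid when every dot-separated label uses only letters, digits and hyphens. The function name is hypothetical, and the path in the usage note is the one quoted in the post.

```python
import re

LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
HOSTNAME_RE = re.compile(r'client-hostname\s+"([^"]*)"\s*;')
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def name_is_valid(name: str) -> bool:
    """Check every dot-separated label of a hostname."""
    labels = name.rstrip(".").split(".")
    return all(LABEL_RE.fullmatch(label) for label in labels)


def find_bad_hostnames(leases_text: str) -> list:
    """Return (ip, client-hostname) pairs whose hostname is not valid."""
    bad = []
    for ip, body in LEASE_RE.findall(leases_text):
        match = HOSTNAME_RE.search(body)
        if match and not name_is_valid(match.group(1)):
            bad.append((ip, match.group(1)))
    return bad
```

Usage would be to read `/var/dhcpd/var/db/dhcpd.leases` into a string and pass it to `find_bad_hostnames`; any device that shows up is a candidate for a static mapping with a clean hostname.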
The DNS resolver has "Register DHCP leases in DNS Resolver", and "Register DHCP static mappings in the DNS Resolver" both active.
I have no clue why the DNS resolver would work despite the issue after a manual restart.
Maybe this helps someone.
-
One of the fixes in 21.05 was to revert Unbound to an older version due to "instability." (Presumably there will be another 2.x release shortly...)
-
There is no change or improvement for me with 21.05. I must still restart unbound after reboot.
Peter
-
@steveits said in 21.02.02 on SG-5100 - Every Reboot Requires Restart of DNS Resolver:
One of the fixes in 21.05 was to revert Unbound to an older version due to "instability." (Presumably there will be another 2.x release shortly...)
I have noticed a huge improvement with unbound after upgrading to 21.05. I was getting recurring errors in my unbound logs that no longer appear. I have also noticed that my overall disk usage is way down with the same packages and configuration on 21.05. I had the feeling there was something up with the previous version (was it 21.02?) that was filling logs much faster, and it seemed to all be related to DNS Resolver and DHCP static maps. My steady-state disk usage went from 94% to 64%, and there is no apparent disk usage growth with 21.05. I did have a serious problem initially with 21.05, though: it somehow got to the point where it was reporting 105% disk usage. I did a factory reset, installed the same packages and loaded my config, and it has been solid since. Again, it all felt like it was DNS Resolver and DHCP static map related, but I don't have proof of that.
-
@plfinch said in 21.02.02 on SG-5100 - Every Reboot Requires Restart of DNS Resolver:
There is no change or improvement for me with 21.05. I must still restart unbound after reboot.
Peter
Same with 21.05.1. I must still restart unbound after reboot.
Peter
-
@plfinch The closest I've come to that is that unbound usually stops during pfBlocker package installation, which is a known issue. What kind of WAN connection do you have? Seems like there has to be something specific to your setup that's different.
I think the only change in 21.05.1 for non-3100 hardware is the captive portal fix, at least per the readme.
-
I'm pretty sure this started with the move to 21.02 and has continued with all updates since. It is quite annoying, obviously, since manual action is required after every reboot.
WAN is 500Mb Xfinity cable via an Arris SB8200.
Packages are:
apcupsd
arpwatch
bandwidthd
darkstat
I did verify that the problem still exists with arpwatch removed; I have not tried removing the others.
This firewall (SG-5100) is overkill for the traffic and config I have and typically loafs at 1-2% CPU.
Peter
-
I finally updated my spare firewall, an SG-2440, directly from 2.4.5_1 to 21.05.1. No issues with the upgrade, and the DNS Resolver works fine immediately following a reboot. This firewall has the exact same packages and configuration settings as my SG-5100. I still think this is a race condition in startup between when the LAN comes up and when the DNS Resolver comes up that leaves the DNS Resolver not answering queries from the LAN. Not sure where to go from here. Maybe a complete re-install and see what happens...
-
I've got an SG-3100 with 21.05.1-RELEASE and also have to manually restart unbound whenever I reboot the router, the power goes out, etc. I have everything set to factory defaults.
None of my other non-pfsense routers require any touch labor when power cycled. Why would this behavior make it to a released version of pfsense running on Netgate-developed and -tested hardware for one release, let alone several?
This needs to get fixed. Installing a cron plugin to restart the unbound server after boot up is a hack.
-
I'm just a user, offering opinions and observations, not connected to Netgate/pfSense, so take it for what it's worth.
So what is the biggest difference when you restart by hand? All the interfaces are up.
As @Gertjan points out, setting the interfaces to ALL puts a listening socket everywhere.
Why does that matter? Well, if an interface isn't up when unbound starts, unbound will still be listening there once the interface eventually does come up.
If everyone having trouble has not selected ALL, but specifically LAN (as in, only service requests coming in on LAN), maybe the unbound rc script needs to somehow wait for LAN to be up. As a simple test, try selecting ALL, doing the reboot, and seeing if it works. It would be a workaround, cleaner than a cron job and something you could easily mark as "fix the config when the issue gets fixed".
As for unbound listening on all interfaces, don't forget that WAN has a default deny in rule: even with the listen socket on WAN:53, an inbound request should be dropped because of default deny. That inbound request is not tied to anything outbound so there is no state and it should be dropped.
Yes, I know, good security says "don't have any open ports where you don't need them" but in this case at least the door may be shut, just not locked.
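The listening-socket point above can be illustrated with plain sockets. This is a sketch of general socket behavior, not a confirmed diagnosis of what unbound does on pfSense: a wildcard bind succeeds no matter which interfaces are up, while a bind to a specific address fails if that address is not configured at that moment. `192.0.2.1` is a documentation-range address picked here as a stand-in for "a LAN address that isn't up yet".

```python
import socket


def try_bind(address: str, port: int = 0) -> bool:
    """Return True if a UDP socket can be bound to address (port 0 = any free port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((address, port))
        except OSError:
            return False
    return True


# Wildcard: does not depend on any particular interface being up.
print("wildcard bind ok:", try_bind("0.0.0.0"))

# Specific address: expected to fail on a host where it is not configured,
# which is the situation a daemon is in if it starts before the LAN is up.
print("specific bind ok:", try_bind("192.0.2.1"))
```

If the race-condition theory is right, a resolver bound only to the LAN address would be in the second situation at boot, which would fit with selecting ALL working as a workaround.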
-
I have the SG-2220 and do not have this issue. I know this doesn't help a whole lot but someone suggested it could be hardware specific. I hadn't used my SG-2220 for about two years due to divorce and just recently got it going again which is what led me here. I did have this problem and when I did an update when it came out I still had some troubles but not this trouble. I did a factory reset twice and for whatever reason the second reset is what made everything happy. I started with all new settings and didn't restore a thing. I know this doesn't necessarily help a whole lot, but I wanted to offer additional relevant info. It isn't failing on my Netgate SG-2220. What can you do with that? I don't know exactly, but I don't think it is just the software. It might be hardware specific race conditions as another user noted.
-