Netgate Discussion Forum

    Unbound DNS intermittent failure

    DHCP and DNS
    21 Posts 7 Posters 4.0k Views
    • romainp

      Hi,
      I have also been getting a really strange DNS issue…
      From time to time, DNS resolution either does not work at all or takes a very long time. It can happen several times per hour, and every system connected to pfSense is affected. It is not my ISP or the DSL router I am connected to, because nslookup google.com 8.8.8.8 works perfectly, but the same lookup against the internal pfSense DNS fails.
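
To put numbers on the "works against 8.8.8.8, fails against pfSense" comparison, here is a minimal Python sketch (not from the original post) that sends the same A-record query to two resolvers over UDP and times each reply. The address 192.168.1.1 in the usage comment is only a placeholder for the pfSense LAN IP, and the function names are made up for illustration.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an A record, recursion desired."""
    # Header: id, flags (RD bit set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ended by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def time_query(server: str, name: str, timeout: float = 3.0):
    """Return the round-trip time in seconds, or None on timeout or error."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return None
        elapsed = time.monotonic() - start
    # Only accept a reply that echoes our query id.
    return elapsed if reply[:2] == packet[:2] else None


def compare(servers, name="google.com"):
    """Query each server once and return {server: seconds or None}."""
    return {server: time_query(server, name) for server in servers}


# Example (placeholder LAN address, adjust to your own setup):
# print(compare(["192.168.1.1", "8.8.8.8"]))
```

Running `compare` from cron or in a loop and logging the result gives a timeline of when the local resolver stops answering while the public one still does.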

      It first happened some weeks ago. At that time I thought the root cause might be that I had upgraded pfSense several times without ever doing a clean installation. So I made a backup, installed from scratch and restored my config, and everything was fine until today. The only thing I changed yesterday was to install the Traffic Totals package.
      I don't see any obvious reason for this issue, but I will try to investigate more.

      Thanks.
      R.

      • johnpoz LAYER 8 Global Moderator

        How many entries? FQDNs have to be looked up every so often, so if you have hundreds of FQDNs and they all return lots of IPs then sure, that could be a contributing factor.

        Not sure if it is still an issue, but registering DHCP leases restarts unbound. So if you have hundreds of DHCP clients and/or very short lease times, you could have unbound restarting every few minutes, which would for sure cause a problem with clients actually being able to look up anything ;)

        Also, if filterdns is having to look up thousands of FQDNs with very short TTLs every few minutes, this could also be a problem, depending…

        An intelligent man is sometimes forced to be drunk to spend time with his fools
        If you get confused: Listen to the Music Play
        Please don't Chat/PM me for help, unless mod related
        SG-4860 24.11 | Lab VMs 2.7.2, 24.11

        • romainp

          Hi,
          It's just a home setup with at most 30 fixed DNS entries. I also use pfBlockerNG, but even if I stop it I still get this strange behaviour. I understand that unbound could be restarted when a DHCP client registers itself in DNS, but it should not take 30 seconds for DNS to work again…

          The problem is that I don't see any obvious reason in the logs that could explain this...

          • johnpoz LAYER 8 Global Moderator

            Is it restarting or not? I have been running unbound in resolver mode on a home pfSense setup since before it was included, back when it was a package. I have never had any such issues other than the DHCP restart thing.

            I really see no point in registering DHCP leases in a home setup. All the devices I care about have reservations, so I know what IP they are on, and yes, the static entries are registered. Devices that just get some random IP out of the pool are going to be guest sort of devices, and I don't give 2 shits what their name or IP is, etc. They are only going to be on the network temporarily… If they were always going to be on the network and I wanted to resolve them, they would have reservations for an IP, etc.
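
One way to answer "is it restarting or not?" is to scan the resolver log for unbound's stop and start messages. The Python sketch below is an illustration, not something from this thread. It assumes syslog-style lines containing unbound's "service stopped" and "start of service" info messages. The exact prefix differs between pfSense versions, so treat the regex and the sample log lines in the comment as assumptions to adapt.

```python
import re
from datetime import datetime

# Assumed line shape (adjust to your log):
#   "Mar  3 10:15:02 pfSense unbound: [1234:0] info: service stopped (unbound 1.6.6)."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*unbound.*info: "
    r"(?P<event>start of service|service stopped)"
)


def restart_gaps(lines, year=2025):
    """Return (stopped_at, started_at, seconds) for each stop/start pair.

    Syslog timestamps carry no year, so one is supplied by the caller.
    """
    gaps = []
    stopped_at = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = " ".join(match["ts"].split())  # collapse the padded day field
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if match["event"] == "service stopped":
            stopped_at = when
        elif stopped_at is not None:
            gaps.append((stopped_at, when, (when - stopped_at).total_seconds()))
            stopped_at = None
    return gaps
```

Feeding it the lines of the resolver log shows both how often unbound restarts and how long each restart leaves it down.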

            • Grimson Banned

              @romainp:

              I also use pfBlockerNG, but even if I stop it I still get this strange behaviour. I understand that unbound could be restarted when a DHCP client registers itself in DNS, but it should not take 30 seconds for DNS to work again…

              Are you using the TLD feature of pfBlockerNG? If yes, did you read the infoblock? Especially this:

              The 'Unbound Resolver Reloads' can take several seconds or more to complete and may temporarily interrupt DNS Resolution until the Resolver has been fully Reloaded with the updated Domain changes.

              • Liath.WW

                Myself, I have 3 aliases with domains in them.
                The biggest one is the EVE Online one; the other two point to voice servers and only resolve to one place.

                Also, since switching off unbound and using only the forwarder, I've not had a single browsing issue, and my family is off my butt.

                This further points to unbound being part of the problem. Not sure how or why, but if unbound is the only thing that fails, then that kinda points to unbound being at fault, either by itself or by choking when some other process fails.

                However, I would like to use unbound, as DNSSEC is something that I believe in and that my clients would require. If only we could get to the bottom of the issue and restore my confidence in the product, I'd start pitching it. Heck, I have one client that lately requests daily rule changes, which eat up time because they require logging in to each SonicWall individually across 18 sites… with differing firmware to make life more interesting. If I could run all of the sites on small appliances running pfSense, it would cut at least 12 hours a week of unproductive time.

                • romainp

                  Thanks for the info.

                  I use pfBlockerNG and therefore need unbound, but even if I stop pfBlockerNG I still have the issue. I will try raising the debug level and having the stats and logs collected by Telegraf (I saw a plugin for unbound but am not sure it will work) or by collectd (I saw an article on how to use collectd on pfSense).
                  If I can output those logs and stats to an ELK stack, I should at least be able to see a pattern, because I do not see any error messages in the logs…

                  R.

                  • Liath.WW

                    If you can come up with why it's crashing on your end, I'd love to hear about it. I wonder if it is something hardware related, or some obscure setting that we've used.

                    I just can't figure it out.

                    • romainp

                      Hi,
                      I do not have proof yet, but it seems related to DHCP: when a DHCP client requests a new IP, the DHCP server sends a signal to the DNS server (which is expected, since I enabled that somewhere in the DNS resolver config), but after the SIGHUP the DNS server does not process any requests for 20-30 seconds.

                      I will try to get some logs and detailed info about that, but I am pretty sure of this.
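
To back up the "20-30 seconds" figure with data, periodic probe results can be reduced to outage windows. This is a hedged Python sketch, not anything romainp posted. It assumes you already have a time-ordered list of (timestamp in seconds, ok) samples from some DNS probe, and the function name is made up for illustration.

```python
def outage_windows(samples):
    """Collapse consecutive failed probes into (start, end, duration) windows.

    samples: time-ordered iterable of (timestamp_seconds, ok) pairs.
    A window starts at the first failed probe and ends at the next
    successful one. If the data ends during an outage, the window is
    closed at the last failed probe.
    """
    windows = []
    start = None
    last_fail = None
    for stamp, ok in samples:
        if not ok:
            if start is None:
                start = stamp
            last_fail = stamp
        elif start is not None:
            windows.append((start, stamp, stamp - start))
            start = None
    if start is not None:
        windows.append((start, last_fail, last_fail - start))
    return windows
```

Lining these windows up against the DHCP log timestamps would show whether each outage really follows a lease event.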

                      R.

                      • Guest

                        @Liath.WW:

                        I think I may have stumbled upon something in the ISP modem config that could be causing this, though the times are different from the pfSense 5-minute issues.
                        On the IP-passthrough page there is a Passthrough DHCP Lease setting. The default value is 10 minutes. I changed it to 1 day; hopefully this is the root cause and this will fix things.

                        FYI, the modem is this one:

                        Manufacturer ARRIS
                        Model Number BGW210-700
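
For a sense of scale on that lease change: by default a DHCP client first tries to renew at T1, which RFC 2131 sets to half the lease time. The small Python sketch below only does that arithmetic. Whether each WAN renewal actually disturbs unbound on a given setup is an assumption this thread has not confirmed.

```python
def renewals_per_day(lease_seconds: float, t1_fraction: float = 0.5) -> float:
    """Approximate number of renewal attempts per day for a given lease time.

    t1_fraction is the share of the lease after which the client renews.
    RFC 2131 gives 0.5 as the default for T1.
    """
    renew_interval = lease_seconds * t1_fraction
    return 86400 / renew_interval


ten_minutes = renewals_per_day(10 * 60)  # 288 renewals a day, one every 5 minutes
one_day = renewals_per_day(24 * 3600)    # 2 renewals a day, one every 12 hours
```

So moving from a 10-minute to a 1-day lease cuts WAN renewals from roughly 288 to 2 per day.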

                        I have many of the problems discussed in this thread, and also an ARRIS modem, on a cable ISP connection with poor signal quality.
                        Maybe we can share remedies and results.

                        Some of the steps I have taken to remedy the situation are extreme for the time being:

                        Removed as many FQDNs as possible from my firewall rule aliases and used specific IP addresses instead
                        Disabled CRON automatic updates in pfBlockerNG (with 2 TLD Blacklist entries)
                        Disabled Gateway Pinger
                        Disabled Gateway monitoring "Action"
                        Disabled the default block of RFC 1918 networks on WAN - my ISP uses 192.168.0 to establish DHCP
                        Defined about 7 or 8 public resolvers, including the ISP-assigned ones, for unbound to forward queries to

                        I am not happy about having to do any of this, but perhaps all I need to do is disable the gateway monitoring action on WAN to prevent all the subsequent issues caused by unbound restarting.

                        How did you get into the ARRIS to increase the length of the DHCP leases?
                        My solution was to spoof a fixed IP config on the WAN interface, which seemed to work for a while, but I have backed that out.

                        Perhaps if we studied the WAN DHCP client advanced options in pfSense there might be something of value to us there? I don't know much about what is listed there as of now.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.