Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0

    Scheduled Pinned Locked Moved DHCP and DNS
    17 Posts 6 Posters 1.6k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • F
      f.meunier @DaddyGo
      last edited by

      @daddygo
      Here is what I have in system log (general) filtered on "unbound"
      (restarted the service at 10:11:40)

      2022-03-12 09:28:31.886087+00:00 	php-fpm 	10571 	/services_unbound.php: Configuration Change: admin@10.21.11.10 (Local Database Fallback): DNS Resolver configured.
      2022-03-12 08:58:51.561997+00:00 	php 	337 	rc.bootup: sync unbound done.
      2022-03-12 08:58:50.780246+00:00 	php 	337 	rc.bootup: Unbound /var/unbound/root.key file is corrupt, removing and recreating.
      Mar 12 08:56:59 	php-fpm 	367 	/rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1647075419] unbound[82383:0] error: can't bind socket: Can't assign requested address for 10.21.1.254 port 53 [1647075419] unbound[82383:0] fatal error: could not open ports'
      Mar 12 08:56:22 	php-fpm 	367 	/rc.newwanip: Unbound /var/unbound/root.key file is corrupt, removing and recreating.
      Mar 12 08:56:08 	php-fpm 	367 	/rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1647075368] unbound[49925:0] error: can't bind socket: Can't assign requested address for 10.21.1.254 port 53 [1647075368] unbound[49925:0] fatal error: could not open ports'
      Mar 12 08:56:03 	php-fpm 	9076 	/rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1647075363] unbound[72574:0] error: can't bind socket: Can't assign requested address for 10.21.1.254 port 53 [1647075363] unbound[72574:0] fatal error: could not open ports'
      Mar 12 08:55:31 	php-fpm 	367 	/rc.newwanip: Unbound /var/unbound/root.key file is corrupt, removing and recreating.
      Mar 12 08:55:27 	php-fpm 	9076 	/rc.newwanip: Unbound /var/unbound/root.key file is corrupt, removing and recreating.
      Mar 12 08:50:29 	php-fpm 	9076 	/rc.linkup: Unbound /var/unbound/root.key file is corrupt, removing and recreating.
      Mar 12 08:49:49 	php-fpm 	367 	/rc.linkup: Unbound /var/unbound/root.key file is corrupt, removing and recreating.
      Mar 12 08:48:45 	php 	428 	rc.bootup: sync unbound done.
      Mar 12 08:47:51 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      Mar 12 08:47:50 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      Mar 12 08:47:49 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      Mar 12 08:47:48 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      Mar 12 08:47:47 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      Mar 12 08:47:46 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      Mar 12 08:47:44 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      Mar 12 08:47:43 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      Mar 12 08:47:42 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      Mar 12 08:47:41 	php 	428 	rc.bootup: Unbound start waiting on dhcp6c.
      

      and DNS Resolver log :

      2022-03-12 10:11:41.165877+01:00 	unbound 	89573 	[89573:0] info: generate keytag query _ta-4f66. NULL IN
      2022-03-12 10:11:40.290016+01:00 	unbound 	89573 	[89573:0] info: start of service (unbound 1.13.2).
      2022-03-12 10:11:40.276727+01:00 	unbound 	89573 	[89573:0] notice: init module 1: iterator
      2022-03-12 10:11:40.276345+01:00 	unbound 	89573 	[89573:0] notice: init module 0: validator
      2022-03-12 09:58:58.856897+01:00 	unbound 	58559 	[58559:2] info: generate keytag query _ta-4f66. NULL IN
      2022-03-12 09:58:51.577322+01:00 	unbound 	58559 	[58559:0] info: start of service (unbound 1.13.2).
      2022-03-12 09:58:51.562657+01:00 	unbound 	58559 	[58559:0] notice: init module 1: iterator
      2022-03-12 09:58:51.562307+01:00 	unbound 	58559 	[58559:0] notice: init module 0: validator
      Mar 12 08:55:25 	unbound 	55268 	[55268:0] info: 64.000000 128.000000 153 
      

      I switched to BIND package, to try.

      (mostly ZOTAC CI or CA nano barebones)

      1 Reply Last reply Reply Quote 0
      • A
        ahsunh @jabacrack
        last edited by

        @jabacrack please install service watch guard on pfsense package and add dns resolver service in watchguard .

        Thanks

        1 Reply Last reply Reply Quote 0
        • J
          jabacrack @DaddyGo
          last edited by

          @daddygo
          Some new information:

          1. After I delete pfBlockerNG dns resolver fail on next day, I restart it
          2. It do not fail on next day and I think "I guess I had to restart it after removing pfBlocker". But it fail on third day.
          3. I restart router and it work one week, until today.
            I don't find anything interesting in logs.

          Latest DNS resolver entries. Looks like it restart but cannot start correctly, but maybe I'm wrong.

          Mar 20 13:43:18 unbound 29389 [29389:0] info: [25%]=0.125611 median[50%]=0.32768 [75%]=0.643216
          Mar 20 13:43:18 unbound 29389 [29389:0] info: lower(secs) upper(secs) recursions
          Mar 20 13:43:18 unbound 29389 [29389:0] info: 0.008192 0.016384 1
          Mar 20 13:43:18 unbound 29389 [29389:0] info: 0.032768 0.065536 1
          Mar 20 13:43:18 unbound 29389 [29389:0] info: 0.065536 0.131072 6
          Mar 20 13:43:18 unbound 29389 [29389:0] info: 0.131072 0.262144 5
          Mar 20 13:43:18 unbound 29389 [29389:0] info: 0.262144 0.524288 8
          Mar 20 13:43:18 unbound 29389 [29389:0] info: 0.524288 1.000000 6
          Mar 20 13:43:18 unbound 29389 [29389:0] info: 1.000000 2.000000 2
          Mar 20 13:43:18 unbound 29389 [29389:0] info: 2.000000 4.000000 1
          Mar 20 13:43:18 unbound 29389 [29389:0] notice: Restart of unbound 1.13.2.
          Mar 20 13:43:18 unbound 29389 [29389:0] notice: init module 0: validator
          Mar 20 13:43:18 unbound 29389 [29389:0] notice: init module 1: iterator
          Mar 20 13:43:18 unbound 29389 [29389:0] info: start of service (unbound 1.13.2).
          Mar 20 13:43:20 unbound 29389 [29389:3] info: generate keytag query _ta-4f66. NULL IN
          Mar 20 13:43:20 unbound 29389 [29389:2] info: generate keytag query _ta-4f66. NULL IN

          DHCP Log when DNS failed

          Mar 20 13:51:49 dhcpd 36656 DHCPREQUEST for 192.168.0.139 from da:2e:07:e3:c7:6d (M2004J19C-Redmi9) via igb1
          Mar 20 13:51:49 dhcpd 36656 Wrote 0 class decls to leases file.
          Mar 20 13:51:49 dhcpd 36656 Wrote 0 deleted host decls to leases file.
          Mar 20 13:51:49 dhcpd 36656 Wrote 0 new dynamic host decls to leases file.
          Mar 20 13:51:49 dhcpd 36656 Wrote 37 leases to leases file.
          Mar 20 13:51:49 dhcpd 36656 DHCPACK on 192.168.0.139 to da:2e:07:e3:c7:6d (M2004J19C-Redmi9) via igb1
          Mar 20 13:51:49 dhcpleases 78677 Sending HUP signal to dns daemon(29389)
          Mar 20 13:51:49 dhcpleases 78677 Could not deliver signal HUP to process 29389: No such process.

          @ahsunh
          thank you, that will help to avoid this problem.
          But does not cancel the fact that pfsense or unbound has error, that was not in the previous version of pfsense. And I think better to fix it than avoid it.

          T GertjanG 2 Replies Last reply Reply Quote 0
          • T
            thiasaef @jabacrack
            last edited by

            @jabacrack said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

            And I think better to fix it than avoid it.

            Exactly, but unfortunately there seem to be quite a few people here who don't think that way.

            1 Reply Last reply Reply Quote 0
            • GertjanG
              Gertjan @jabacrack
              last edited by Gertjan

              This :

              @jabacrack said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

              Mar 20 13:43:18 unbound 29389 [29389:0] notice: Restart of unbound 1.13.2.

              is the issue.

              unbound was told to restart.
              It was running before, using process ID n° 29389.
              It stops, and at the very end, it restarts itself. New processes will have a different process ID, this is valid for all Microsoft, macOS, Linux and FreeBSD systems.
              Your logs keep showing process ID 29389.
              This tells me that you should check if you haven't a second (or more ?) zombie unboud process running on your system.

              Open a console or SSH shell, and when you see the menu, use option 8.
              Then :
              [2.6.0-RELEASE][admin@pfsense.local.net]/root: ps ax | grep unbound

              31176  -  Ss       8:58.04 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
              32713  -  S        0:03.13 /usr/local/sbin/lighttpd_pfb -f /var/unbound/pfb_dnsbl_lighty.conf
              97333  0  S+       0:00.00 grep unbound
              

              Line 2 and 3 are not relevant, but line 1 show that I have one unbound process.
              Note that line one shows the process ID - 32713 in mt case.

              Check unbound the process ID file :

              cat /var/run/unbound.pid
              

              It should show the same ID : 32713.

              If you find more then one unbound instance, kill them all.
              Use the kill command followed with process ID number, like

              kill 31176
              kill xxxxxx
              etc
              

              and wipe the process ID file, if it isn't already gone :

              rm /var/run/unbound.pid
              

              Now you can start unbound in the GUI.

              @jabacrack said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

              Mar 20 13:51:49 dhcpd 36656 DHCPACK on 192.168.0.139 to da:2e:07:e3:c7:6d (M2004J19C-Redmi9) via igb1
              Mar 20 13:51:49 dhcpleases 78677 Sending HUP signal to dns daemon(29389)
              Mar 20 13:51:49 dhcpleases 78677 Could not deliver signal HUP to process 29389: No such process.

              This shows that the process dhcpleases wants to to send a message "HUP' ( it's a restart message) to the process 28389.
              dhcpleases wants to restart unbound and it knows what process ID it has : it reads /var/run/unbound.pid to find out.
              Guess what : process "29389" might be what's left of unbound, but it isn't responding any more ( == zombie).

              For every DHCPREQUEST / DHCPRENEW / DHCPACK event, dhcpleases will restart unbound. You have a lot of devices ? unbound will get restarted a lot ! Thus the change a zombie gets created grows a lot.
              This is a very known situation.

              To stop that issue right away :Goto the resolver setting page and un check this option :

              843625f9-a27c-450a-811b-3fd4e64f5581-image.png

              Save and apply.

              if you need some of your devices to be known with their host name and IP addresses, make a couple of static MAC DHCP leases on the Services > DHCP Server > LAN page.

              Btw : adding pfBlockerng-devel will restart unbound every day, only if feeds got updated.
              When pfBlockerng-devel is sued , the overall stop start time of unbound gets longer, from a second or two (default pfSense system with no packages) to several tens of seconds or even minutes. For those who use a lot of feeds : it can takes minutes.
              Side effect : during the restart time, DNS doesn't work
              Now, throw in some devices on your LAN that trigger DHCP events a bit more then 'frequent' and your set for the situation that makes you think that "DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0".

              I'm using pfBlockerng-devel (latest version) with some feeds (about 5 ?).
              No DHCP lease related unbound restarts as I de activated "DHCP Registration".

              Still, my unbound restarts often. See here for yourself.

              That's me, fiddling with the system ;)

              No "help me" PM's please. Use the forum, the community will thank you.
              Edit : and where are the logs ??

              J 2 Replies Last reply Reply Quote 1
              • J
                jabacrack @Gertjan
                last edited by

                @gertjan
                thank you for your help. Looks like possible reason. I check it now and I haven't any zombie, I'll wait for next fail to check your theory.

                @gertjan said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                To stop that issue right away :Goto the resolver setting page and un check this option :

                How I previously say, solutions like this help to avoid problem, but do not fix it. Yes, if unbound will restart not so frequently it will fail with lower probability. But if it fail one time in half of year versus one time in week does that mean that problem fixed? I think - no. It still can fail. And on 2.5.2 it do not fail with same amount srestarts at day.

                @gertjan said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                your set for the situation that makes you think that "DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0".

                I think this is irrelevant to me:

                1. I delete pfBlockerng and unbound fail anyway
                2. When I visit DNS resolver status page it say that it is not running. And it was in this state ~ 3 hours before I start it again last two times.
                1 Reply Last reply Reply Quote 0
                • J
                  jabacrack @Gertjan
                  last edited by

                  @gertjan
                  Today morning it fail again.

                  No zombies while unbound stoped:

                  ps ax | grep unbound

                  44241  -  S        0:00.00 sh -c ps ax | grep unbound 2>&1
                  44464  -  S        0:00.00 grep unbound
                  64935  -  Is       0:01.14 /usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.leases -d lan -p /var/run/unbound.pid -u /var/unbound/dhcpleases_entries.conf -h /etc/hosts
                  
                  

                  cat /var/run/unbound.pid

                  91903
                  

                  Logs:

                  Mar 22 09:28:56 	unbound 	91903 	[91903:0] info: 4.000000 8.000000 36
                  Mar 22 09:28:56 	unbound 	91903 	[91903:0] info: 8.000000 16.000000 14
                  Mar 22 09:28:56 	unbound 	91903 	[91903:0] notice: Restart of unbound 1.13.2.
                  Mar 22 09:28:56 	unbound 	91903 	[91903:0] notice: init module 0: validator
                  Mar 22 09:28:56 	unbound 	91903 	[91903:0] notice: init module 1: iterator
                  Mar 22 09:28:56 	unbound 	91903 	[91903:0] info: start of service (unbound 1.13.2).
                  Mar 22 09:29:06 	unbound 	91903 	[91903:3] info: generate keytag query _ta-4f66. NULL IN 
                  
                  Mar 22 09:49:53 	dhcpleases 	64935 	Sending HUP signal to dns daemon(91903)
                  Mar 22 09:49:53 	dhcpleases 	64935 	Could not deliver signal HUP to process 91903: No such process.
                  Mar 22 09:49:53 	dhcpleases 	64935 	Sending HUP signal to dns daemon(91903)
                  Mar 22 09:49:53 	dhcpleases 	64935 	Could not deliver signal HUP to process 91903: No such process
                  

                  Looks like this is something else. Or I'm wrong?

                  GertjanG 1 Reply Last reply Reply Quote 0
                  • GertjanG
                    Gertjan @jabacrack
                    last edited by Gertjan

                    @jabacrack said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                    Looks like this is something else. Or I'm wrong?

                    Your "ps ax | grep unbound" lists some stuff, like the command itself (normal) and the dhcpleases process (the one that is paid to shoot unbound in the head as much as possible).
                    Look again.
                    I was asking you to look for multiple instances of unbound.
                    You found the contrary.
                    There was - at that moment, or a couple of seconds later, a unbound pid file. This file is created by the process (unbound) when it starts - or to be more precise : when it finished starting).
                    Or, the ps command did not list a unbound process.

                    That's what 'dhcpleases' is telling you : unbound is so slow to start up, that it signals (logs) you that the known unbound pid file "/var/run/unbound.pid" contains a process number that is not in the process list any more.
                    Or, there are so many DHCP events on you LAN that 'dhcpleases' is chain gunning the unbound process. It tries to sand more HUPs then that unbound can handle.
                    The result is what you experience : DNS is more down then up.

                    Btw : you could try to make the default lease time of the DHCP lease longer. This way, less DHCP renegotiations will happen. But this will out only for cabled connections. Wifi connections come and go, and on every wifi reconnect a DHCP event will start.
                    When your wifi device is on the edge of the zone wifi zone covered by the AP, , you trigger a lot of DHCP events. You wind up 'killing' unbound.

                    Check your logs, how many "Sending HUP signal to dns daemon" from the "dhcpleases" did you find ?

                    Check the unbound (resolver) log file, and look up how long it took for unbound to start. When it's done, the process should be present (the ps command), a pid file should exists, and very important : the number in the file should be the process id of the unbound process found with 'ps'.
                    Example :

                    Feb 16 15:07:08 	unbound 	7464 	[7464:0] info: service stopped (unbound 1.13.2).
                    

                    2 seconds later :

                    Feb 16 15:07:10 	unbound 	59164 	[59164:0] info: start of service (unbound 1.13.2).
                    

                    ( I do have pfBlockerng-devel using the Python mode - a couple of DNSBL feeds)

                    Final test :

                    dig @127.0.0.1 a.b.c +short
                    

                    where a.b.c is a host name that shouldn't be in the DNS cache file (a file you - you and all LAN devices - didn't look up, just find some random one, as millions exist). This 'dig' command forces unbound to do resolve the a.b.c. hostname. If you get an IP back, you know your DNS is ok.

                    edit :
                    Or the very informative :

                    dig @127.0.0.1 a.b.c +trace
                    

                    @jabacrack said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                    How I previously say, solutions like this help to avoid problem, but do not fix it

                    Sure thing.
                    Still, a good work around (you can do it a coup;e of seconds) is far better then waiting for a solution ( going for a couple of years now ).
                    The work around can even be made better, close to a perfect solution, when you add all your known LAN devices as a DHCP MAC leases list.

                    unbound is a process that starts during system start.
                    Like a nginx or apache2 web server, or the big brother 'bind' or postfix, world's most used mail server, these are actually never restarted.
                    Only during setup or testing the admin restarts them, for testing purposes.
                    unbound is the same. It's not meant to be restarted like this.

                    No "help me" PM's please. Use the forum, the community will thank you.
                    Edit : and where are the logs ??

                    J 1 Reply Last reply Reply Quote 0
                    • J
                      jabacrack @Gertjan
                      last edited by jabacrack

                      @gertjan said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                      I was asking you to look for multiple instances of unbound.
                      If it fail, there is no any instance of unboud. If it working - only one instance.

                      @gertjan said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                      Check your logs, how many "Sending HUP signal to dns daemon" from the "dhcpleases" did you find ?

                      Today, from morning to evening, when it fail again - 70.

                      @gertjan said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                      Check the unbound (resolver) log file, and look up how long it took for unbound to start.

                      second or less.

                      @gertjan said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                      Final test :

                      Sorry, I don't understand how it help me. When it fail, Status \ DNS Resolver page say that it is stopped, like this
                      9efbee69-ad13-474b-93b7-cefe32798b1d-image.png
                      And how I check previously there no any zombie process in this case. So 100% dns resolving do not work, I see when try to visit sites. And it do not start working until I manually run it.

                      @gertjan said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                      Sure thing.

                      I still hope that with your and other guys help I can localize this problem and add corresponded issue to bug tracker. And, maybe, in future it will be fixed. If it impossible I use workarounds.

                      GertjanG 1 Reply Last reply Reply Quote 0
                      • GertjanG
                        Gertjan @jabacrack
                        last edited by

                        @jabacrack said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                        Sorry, I don't understand how it help me.

                        When you see :

                        ff103c6d-7e5a-40c2-a65e-537ccda5be0b-image.png

                        you know DNS, the resolver, isn't answering.
                        But that's just a GUI message. The GUI is very nice when everything goes well. When you have to look for issues, forget about the GUI.
                        Unbound should listen on most if not all interfaces, and the most important one is 127.0.0.1

                        That's why I propose :

                        dig @127.0.0.1 a.b.c +short
                        

                        Run this in a console / SSH and leave it there for a while :

                        tail -f  /var/log/resolver.log | grep 'start\|stop'
                        

                        Only stop and start events will be shown.
                        Every logged 'stop' event should be followed by a start.

                        I'll post here a small shell script that compares the content of the /var/run/unbound.pid file with the process ID of the running unbound instance. They should be the same.
                        And if there is no pid file, then unbound isn't running. This can happens, but just for a short while.
                        I'll work on that.

                        @jabacrack said in DNS resolver Stop Working after upgrade from 2.5.2 to 2.6.0:

                        add corresponded issue to bug tracker. And, maybe, in future it will be fixed.

                        No need.
                        Unbound runs on all pfSense systems just fine. Tens of thousands, maybe hundreds of thousands. If it gets restarted, it restarts = it stops and then starts. Most of us won't never notice this.
                        These is only one thing to do : fin why yours stops doing it's work.
                        Because was stopped and resonating failed ?
                        The running instance freezes ?
                        It could be interface related (yep, they can die, drivers can fail, etc) but normally, it should stay 'up' on 127.0.0.1 as this is not a physical interface.

                        The GUI won't help you here to discover the issue. It's a command line task. And most of it is looking at the resolver log, and other logs like the system log to determine what event provokes your issue.

                        No "help me" PM's please. Use the forum, the community will thank you.
                        Edit : and where are the logs ??

                        1 Reply Last reply Reply Quote 0
                        • First post
                          Last post
                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.