Netgate Discussion Forum

Bind upgrade producing errors on pfsense 2.5 upgrade

pfSense Packages
112 Posts 16 Posters 30.3k Views
  • C
    Cussy
    last edited by Mar 24, 2021, 7:46 AM

    Has anybody found a solution to this?

    • M
      matthijs @Cussy
      last edited by Mar 29, 2021, 6:54 AM

      @cussy

      Yes, migrate to OPNSense

      • G
        Gertjan @smartis
        last edited by Mar 29, 2021, 7:10 AM

        @smartis said in Bind upgrade producing errors on pfsense 2.5 upgrade:

        in a shell after install & boot. Annoying as hell.

        Install the Shellcmd pfSense package and enter that command, select it to get executed after boot.

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • J
          jacotec @smartis
          last edited by Apr 20, 2021, 7:51 PM

          @smartis said in Bind upgrade producing errors on pfsense 2.5 upgrade:

          /usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/

          I held off on the upgrade until 2.5.1, and today some users complained that some of my sites were not loading for them (for some users, not all).

          Found my bind (master zone) did not work, restart via GUI was not working. When killing via SSH and restarting the log showed an exit with "general: info: received control channel command 'sync -clean'"

          After one hour of searching and shouting I found this thread and the command above solved it. What I still lack is an understanding of why.

          What is this "magic command" doing?
          Do I need to apply it after every restart of bind?
          Is there any config thing which causes the main issue (and that this command is needed) - or how can I solve this permanently. I don't want to nervously check if my bind is still running every day ... never had any issues in the last years with any version before 2.5.x 😕

          • G
            Gertjan @jacotec
            last edited by Apr 21, 2021, 6:09 AM

            @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

            What is this "magic command" doing?

            This one :

            /usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
            

            The command : /usr/local/sbin/named - you should know that the executable of the package known as 'bind' is called 'named', which probably comes from 'name daemon'.

            The option -4 indicates : use IPv4 only.

            "-c /etc/namedb/named.conf" specifies the config file.

            "-u bind" specifies the user identity under which named runs.

            "-t /cf/named/" specifies the chroot directory : named chroots into it after processing the command line and before reading the config file, so the config and zone files live below it.

            @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

            Do I need to apply it after every restart of bind?

            That's why I proposed :

            @gertjan said in Bind upgrade producing errors on pfsense 2.5 upgrade:

            Install the Shellcmd pfSense package and enter that command, select it to get executed after boot.

            @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

            Is there any config thing which causes the main issue (and that this command is needed) - or how can I solve this permanently. I don't want to nervously check if my bind is still running every day ... never had any issues in the last years with any version before 2.5.x

            Yes.
            For pfSense, processes that should start up at system boot should be listed here :
            /usr/local/etc/rc.d
            There should be an executable (script) file called 'named' or 'named.sh'.
            It should contain the correct instructions/commands.
            When bind (named) starts, it should leave log messages. Not much when everything is ok, more if there are errors - if it shuts down, it should log the reason.

            If it doesn't 'log' messages, crank up the verbosity.
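
A side note on the -t and -c options from the command above, as a minimal sketch: named chroots into the -t directory before it reads its config file, so the -c path ends up below /cf/named/ on disk. The helper name resolve_in_chroot is made up for this illustration; it is not part of bind or pfSense.

```shell
#!/bin/sh
# Sketch: show where a path given to named ends up on disk when -t is used.
# resolve_in_chroot is a hypothetical helper, not part of BIND or pfSense.
resolve_in_chroot() {
    chroot_dir=${1%/}   # drop a trailing slash: "/cf/named/" -> "/cf/named"
    inner_path=$2       # the path as named sees it after the chroot
    printf '%s%s\n' "$chroot_dir" "$inner_path"
}

resolve_in_chroot /cf/named/ /etc/namedb/named.conf
# prints /cf/named/etc/namedb/named.conf
```

So when looking for the config or zone files from a normal shell, look under /cf/named/etc/namedb/ rather than /etc/namedb/.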


            • J
              jacotec @Gertjan
              last edited by jacotec Apr 27, 2021, 9:28 AM Apr 27, 2021, 8:50 AM

              @gertjan
              I need to come back to this. Without any restart of my pfSense, users had issues again, and once more I found that pfSense was not answering any DNS requests.

              Giving the above start command in shell resolved the issue.

              Strangely pfSense showed the service running in the GUI and also the Service Watchdog did not detect that bind was not running.

              I really need to fix the root cause of this happening.

              The "named.sh" script in /usr/local/etc/rc.d uses exactly the same start command you've posted; the stop command is completely different:

              #!/bin/sh
              # This file was automatically generated
              # by the pfSense service handler.
              
              rc_start() {
              			if [ -z "`/bin/ps auxw | /usr/bin/grep "[n]amed " | /usr/bin/awk '{print $2}'`" ]; then
              			/usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
              		fi
              }
              
              rc_stop() {
              			/usr/local/sbin/rndc -q -c "/usr/local/etc/rndc.conf" sync -clean 2>/dev/null
              		/usr/local/sbin/rndc -q -c "/usr/local/etc/rndc.conf" stop -clean 2>/dev/null
              		sleep 5
              		/usr/bin/killall -TERM named 2>/dev/null
              		sleep 2
              }
              
              case $1 in
              	start)
              		rc_start
              		;;
              	stop)
              		rc_stop
              		;;
              	restart)
              		rc_stop
              		rc_start
              		;;
              esac
              

              I'm fully puzzled, but this issue is mission critical. Do you have any idea what's going wrong here?

              When I hit "Restart" for named in "Status" --> "Services" bind (named) is stopped according to the log, but never started again:

              Apr 27 10:51:41	named	20303	general: notice: exiting
              Apr 27 10:51:41	named	20303	general: notice: stopping command channel on 127.0.0.1#8953
              Apr 27 10:51:41	named	20303	general: info: shutting down: flushing changes
              Apr 27 10:51:41	named	20303	network: info: no longer listening on 80.152.208.158#53
              Apr 27 10:51:41	named	20303	network: info: no longer listening on 10.0.0.5#53
              Apr 27 10:51:41	named	20303	general: info: received control channel command 'stop -clean'
              Apr 27 10:51:41	named	20303	general: info: dumping all zones, removing journal files: success
              Apr 27 10:51:41	named	20303	general: info: received control channel command 'sync -clean'
              

              "Status" --> "Services" still shows it running and not stopped.

              Executing "named.sh" shows no error messages, but bind still does not start properly:

              [2.5.1-RELEASE][root@router.mydomain.de]/usr/local/etc/rc.d: ./named.sh
              

              I don't fully understand whether the condition in the start script could prevent named from being started:

              if [ -z "`/bin/ps auxw | /usr/bin/grep "[n]amed " | /usr/bin/awk '{print $2}'`" ]; then
              

              Update:
              With "ps auxw | grep named" I found a second, older "named" process running which did not react to a normal "kill" command. I killed it with "kill -9", after which the GUI showed named correctly stopped. I then started bind via the GUI and for now it's running.
              I'll watch if it keeps working ...
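
To illustrate the condition from the start script: the sketch below feeds canned ps-style output (made up for this example) through the same grep/awk pipeline. The [n] bracket stops grep from matching its own command line, and any remaining line containing "named " makes the result non-empty, so rc_start skips the launch. That would fit the stale process found above, though it is only an assumption that this was the cause here.

```shell
#!/bin/sh
# named_pids mimics the pipeline in rc_start: it reads ps-style output on
# stdin and prints the PID column of every line that matches "named ".
named_pids() {
    grep "[n]amed " | awk '{print $2}'
}

# Canned ps output (assumed for illustration): one leftover named process
# plus the grep command itself, which the [n] bracket keeps from matching.
sample='bind  20303  0.0  1.2 named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
root  31337  0.0  0.0 grep [n]amed'

if [ -z "$(printf '%s\n' "$sample" | named_pids)" ]; then
    echo "no named found: rc_start would launch it"
else
    echo "named already listed: rc_start does nothing"
fi
# prints "named already listed: rc_start does nothing"
```

In other words, as long as any process matching "named " is still in the process list, even a hung one, the generated start function is a no-op.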

              • G
                Gertjan @jacotec
                last edited by Apr 27, 2021, 10:22 AM

                @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                Executing "named.sh" shows no error messages, but bind still does not start properly:
                [2.5.1-RELEASE][root@router.mydomain.de]/usr/local/etc/rc.d: ./named.sh

                The script tells you why :

                @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                case $1 in

                This $1 is the first parameter on the command line. Without it, none of the case branches match and the script does nothing.
                I propose you use stop or start or restart like :

                /usr/local/etc/rc.d: ./named.sh restart
                

                😊
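
A minimal sketch of that dispatch, with echo standing in for the real rc_start and rc_stop (the function name dispatch is made up for this example):

```shell
#!/bin/sh
# Same case structure as the generated named.sh, but each branch only
# reports what it would do.
dispatch() {
    case $1 in
        start)   echo "start" ;;
        stop)    echo "stop" ;;
        restart) echo "stop"; echo "start" ;;
    esac
}

dispatch            # prints nothing: no pattern matches an empty $1
dispatch restart    # prints "stop", then "start"
```

Running the script without an argument therefore exits silently, which is why no error message showed up.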


                • J
                  jacotec
                  last edited by May 20, 2021, 3:37 PM

                  Just as a followup:

                  As I've read somewhere in forums that bind 9.16.12 is supposed to have a memory leak (which might have caused my crashes) I've manually updated bind to the current 9.16.15 where this was fixed:

                  pkg add -f https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All/bind916-9.16.15.txz
                  

                  I've never had issues since this update.

                  9.16.15 is not available via package manager yet (no idea how long this takes before the package is updated in the pfsense package manager).

                  • D
                    de0xyrib0se @freebsd-man
                    last edited by Aug 25, 2021, 5:24 PM

                    @freebsd-man said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                    After deleting the manually installed bind and lmdb packages and the old pfSense-pkg-bind package via shell, I installed the updated package pfSense-pkg-bind-9.16_10 via GUI.

                    While installing I used "tail -f /var/log/resolver.log" to inspect the startup of the new bind.

                    I got rndc timeout messages from install log in GUI and errors from bind-startup in tail-output.
                    After the GUI timeouts the install finished successfully.

                    After deleting corrupted journal files with "rm /cf/named/etc/namedb/*jnl" I finally was able to start bind via GUI.
                    Now it is up and running.

                    Just upgraded to 2.5.2. Nothing I did (restore config, reinstall package) would start bind, it simply was hung in lala-land. Deleting the journal files got it back to life immediately.

                    • M
                      matthijs
                      last edited by matthijs Sep 23, 2021, 7:20 AM Sep 23, 2021, 7:16 AM

                      Hi, This bind issue is still not fixed
                      When I reinstall the bind package it takes a very long time (5 minutes or so), and I see the following in the installation log multiple times :

                      Executing custom_php_resync_config_command()...rndc: connect failed: 127.0.0.1#8953: timed out
                      rndc: connect failed: 127.0.0.1#8953: timed out
                      rndc: connect failed: 127.0.0.1#8953: timed out
                      rndc: connect failed: 127.0.0.1#8953: timed out
                      rndc: connect failed: 127.0.0.1#8953: timed out

                      Maybe the issue can be reproduced by using a non-default rndc port (8953 in my case).
                      The same happens when I reboot pfSense : it takes a long time before the bind service is started because of the same thing (rndc not available on 127.0.0.1:8953). After 5 minutes or so bind finally starts and everything is working fine.
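
One way to see how long the control channel stays unreachable is to poll it. The sketch below rests on assumptions: wait_for is a hypothetical helper, and the rndc.conf path is the one from the named.sh script quoted earlier in this thread. It only observes the delay; it does not fix it.

```shell
#!/bin/sh
# wait_for is a hypothetical helper: it retries a probe command, pausing one
# second between attempts, until the probe succeeds or the attempts run out.
wait_for() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Example (commented out so the sketch has no side effects): ask named for
# its status over the control channel.
# wait_for 30 /usr/local/sbin/rndc -c /usr/local/etc/rndc.conf status \
#     || echo "control channel still not answering"
```

Note that each rndc call may itself wait for its own connect timeout, so the total wait can be much longer than the attempt count suggests.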

                      • D
                        de0xyrib0se @matthijs
                        last edited by Sep 23, 2021, 9:41 AM

                        @matthijs did "rm /cf/named/etc/namedb/*jnl" work for you?

                        • M
                          matthijs @de0xyrib0se
                          last edited by Sep 23, 2021, 9:49 AM

                          @de0xyrib0se
                          No that did not work for me.

                          I do not have problems with BIND itself. I can start/stop BIND and it runs fine. The problem I still experience is a very slow start after reboot (5 minutes) and a very slow package reinstall (also 5 minutes). This is because of the "rndc: connect failed: 127.0.0.1#8953: timed out" message, repeated 5 times. After the fifth timeout (5 minutes or so) BIND starts successfully. This is an issue for me because packages like pfBlockerNG and the VMware guest services start only after BIND has started successfully, so it takes a long time for all the services to be up and running after a restart/reboot.

                          • A
                            aligator638 @matthijs
                            last edited by Sep 24, 2021, 5:06 PM

                            @matthijs I have the same issue, since I am also running the resolver. Even worse, I have many zones in bind and it takes up to 40 minutes to have all my services up.

                            • M
                              matthijs
                              last edited by Oct 6, 2021, 11:41 AM

                              Hi NetGate, is this going to be fixed anytime in the near future ?
                              We have had this issue since the 2.5.0 release.

                              • G
                                Gertjan @matthijs
                                last edited by Gertjan Oct 6, 2021, 12:12 PM Oct 6, 2021, 12:08 PM

                                @matthijs

                                I presume 'bind' is a package with a non-Netgate-member maintainer, like pfBlockerNG, Suricata, postfix etc.
                                As such, it's done by people like 'you' and 'me' : pfSense users.

                                I couldn't find who is maintaining it right now (but didn't really look for more than 30 seconds either ;) )

                                Find him and send him a PM ?

                                edit : check also the redmine tickets : there are 10 tickets open for BIND.


                                • M
                                  matthijs @Gertjan
                                  last edited by Oct 6, 2021, 12:43 PM

                                  @gertjan Ok thanks for the info, I did not know that :-)

                                  • D
                                    de0xyrib0se @matthijs
                                    last edited by Oct 6, 2021, 12:56 PM

                                    @matthijs was bind turned off when you tried to remove the journal files?

                                    • M
                                      matthijs @de0xyrib0se
                                      last edited by matthijs Oct 7, 2021, 9:05 AM Oct 7, 2021, 9:04 AM

                                      @de0xyrib0se
                                      No. The first thing I did was raise the SOA serial number for my (master) zones to a number higher than in the last .jnl zone update (I use the date serial format yyyymmddnn).
                                      After that I logged in to the pfSense host with ssh, went to /cf/named/etc/namedb/master/mymastername/
                                      rm *.jnl
                                      and then restarted bind
                                      I think it is not related to my issue
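
For the date serial format mentioned above, a small sketch of how the next serial could be picked. next_serial is a hypothetical helper and the serial values are example numbers only; it assumes fewer than 100 changes per day, since a 100th change would spill into the date part.

```shell
#!/bin/sh
# next_serial is a hypothetical helper. Given the current serial and today's
# date as yyyymmdd, it prints the next serial in yyyymmddnn form.
next_serial() {
    current=$1
    today=$2
    floor="${today}00"
    if [ "$current" -lt "$floor" ]; then
        echo "${today}01"        # first change today
    else
        echo $((current + 1))    # serial is already at today's date or later
    fi
}

# Typical call: the current serial comes from the zone's SOA record.
next_serial 2021100601 "$(date +%Y%m%d)"
```

For example, next_serial 2021100601 20211007 gives 2021100701, and a second change on the same day gives 2021100702.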

                                      My problem with bind (I think) is that during startup/boot, and also during package install/reinstall, it tries to connect to rndc on 127.0.0.1#8953 for some reason, but it is not running at that very moment. That results in the "rndc: connect failed: 127.0.0.1#8953: timed out" message (and it tries 5 times or so, which takes a long time).

                                      • G
                                        Gertjan
                                        last edited by Oct 8, 2021, 7:16 AM

                                        This :

                                        @matthijs said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                        rm *.jnl
                                        and then restarted bind

                                        These jnl files are database-like files in a binary format, and bind keeps them open permanently.
                                        You can't 'delete' them while bind9 has them open for writing.

                                        is a major no go.

                                        If the rm and restart had to be done (which I doubt) I would do it like this :
                                        ( old fashioned Debian service handling )

                                        service bind9 stop
                                        

                                        Now I edit zone files, config files, whatever.

                                        When done, I check my config and zone files :

                                        named-checkconf -z
                                        

                                        When there are no errors and all looks dandy :

                                        service bind9 start
                                        

                                        Btw : when I need to update a zone, for example : I want to change the SOA :

                                        root@ns311465:~# rndc freeze test-domaine.fr
                                        root@ns311465:~# nano /etc/bind/zones/db.test-domaine.fr
                                        root@ns311465:~# rndc reload test-domaine.fr
                                        zone reload queued
                                        root@ns311465:~# rndc thaw  test-domaine.fr
                                        A zone reload and thaw was started.
                                        Check the logs to see the result.
                                        

                                        No need to restart bind, no journal file issues.

                                        Btw : journal files exist if the zone files are modified by means other than the admin.
                                        For example : when the zone contains info that is updated using RFC 2136.
                                        Or when the zone is signed for DNSSEC.
                                        Simple zones do not have dot jnl and dot jbk files.

                                        Btw : I'm using the somewhat older

                                        BIND 9.9.5-9+deb8u19-Debian (Extended Support Version)
                                        


                                        • D
                                          de0xyrib0se @matthijs
                                          last edited by Oct 8, 2021, 9:02 AM

                                          @matthijs said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                          @de0xyrib0se
                                          No. The first thing I did was raise the SOA serial number for my (master) zones to a number higher than in the last .jnl zone update (I use the date serial format yyyymmddnn).
                                          After that I logged in to the pfSense host with ssh, went to /cf/named/etc/namedb/master/mymastername/
                                          rm *.jnl
                                          and then restarted bind
                                          I think it is not related to my issue

                                          My problem with bind (I think) is that during startup/boot, and also during package install/reinstall, it tries to connect to rndc on 127.0.0.1#8953 for some reason, but it is not running at that very moment. That results in the "rndc: connect failed: 127.0.0.1#8953: timed out" message (and it tries 5 times or so, which takes a long time).

                                          Shut down bind first (the command is listed above) and only then do the rm; you cannot remove the files while it has a read lock on them. Restart bind afterwards and it will rebuild the journal files automatically.

                                          This is what I did and it worked like a charm.
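
The order described above (stop, remove, start) could be sketched like this. clean_journals is a hypothetical helper; the paths in the commented example are the ones quoted in this thread, and whether removing the journals is appropriate for a given zone setup is left as an assumption.

```shell
#!/bin/sh
# clean_journals is a hypothetical helper: stop named, delete the journal
# files below the given directory, then start named again.
# $1 = service script (named.sh on pfSense), $2 = directory holding the zones.
clean_journals() {
    service_script=$1
    zone_dir=$2
    "$service_script" stop || return 1
    # Only *.jnl files are removed; the zone files themselves are left alone.
    find "$zone_dir" -type f -name '*.jnl' -exec rm -f {} +
    "$service_script" start
}

# Example with the paths from this thread (commented out on purpose):
# clean_journals /usr/local/etc/rc.d/named.sh /cf/named/etc/namedb
```

Before removing anything it is worth confirming with ps that no named process is left over after the stop.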

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.