Netgate Discussion Forum

Bind upgrade producing errors on pfsense 2.5 upgrade

pfSense Packages
  • F
    freebsd-man @md0
    last edited by Mar 3, 2021, 10:18 AM

    @md0
    But we can say, named is not crashing with signal 11 any more.
    Your problem seems to be a config problem.
    And yes, bind behaves more and more like a "diva".

    My tips:
    When saving something in the GUI runs into a timeout, you can interrupt the page reload after 5 or more seconds and open the bind GUI again. By then the GUI has already generated the config.
    Setting bind logging to be more verbose also helps to find errors via "tail -f /var/log/resolver.log" in an SSH shell.
    If rndc shows timeouts, in most cases named is not answering.
    So the timeout is caused by a config problem, for example the IP/port is in use by another daemon, a zone fails to load, or a journal file is corrupted.
    When inspecting zones I always clear the serial.
    Keep your configs simple and look out for error messages at named start.
    When looking for errors, it is better to stop and start the process. Using rndc is useless if named is not running!
    Keep in mind: errors will stop the "named" process and warnings will not. Most errors will be shown in the "resolver" logfiles.
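The log-watching tip above can be narrowed down to error lines only. This is a minimal sketch, assuming the syslog-style format seen elsewhere in this thread (`... named[PID] category: error: ...`); the helper name `named_errors` is made up for illustration and is not part of pfSense or BIND.

```sh
#!/bin/sh
# named_errors: read log lines on stdin and print only the named messages
# that were logged at "error" severity. Hypothetical helper.
named_errors() {
    grep -E 'named\[[0-9]+\] [a-z-]+: error: '
}

# Usage on the firewall (log path taken from the post above):
#   tail -f /var/log/resolver.log | named_errors
```

Warnings and info lines are dropped on purpose, since per the post only errors stop the named process.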

    • M
      md0 @freebsd-man
      last edited by Mar 3, 2021, 10:48 AM

      @freebsd-man
      In my case named never crashed with signal 11 - it's definitely a different problem. If it's a config issue, the logs never show anything wrong - during the first (automatic) start attempt nothing gets logged and afterwards everything appears completely normal. Below is typical output:

      bind.txt

      Increasing the verbosity does not produce any additional warnings.
      Any idea what else I could do to try to fix this?

      • M
        md0 @md0
        last edited by Mar 3, 2021, 11:37 AM

        @md0 Quick update:

        If I let the default start-up run its course, there are some errors in the log:

        2021-03-03T13:13:40.212844+02:00 router named[10949] general: error: malformed transaction: 5709a41150573356.mkeys.jnl last serial 9 != transaction first serial 8
        2021-03-03T13:13:40.212859+02:00 router named[10949] general: error: managed-keys-zone/[View]: keyfetch_done:dns_journal_write_transaction -> unexpected error
        2021-03-03T13:13:40.212902+02:00 router named[10949] dnssec: error: managed-keys-zone/[View]: error during managed-keys processing (unexpected error): DNSSEC validation may be at risk
        

        If I start the service manually there are no errors. In any case, Bind works correctly after starting up.

        • F
          freebsd-man @md0
          last edited by Mar 3, 2021, 12:05 PM

          @md0
          Do you use the bind as local DNS only inside a LAN?
          Or is it a public (authoritative) or hidden-primary DNS for your officially registered Domains?

          I use it in both cases on different locations.

          In my LAN it holds the internal zones while the "DNS Resolver" does the caching.
          I use Bind on localhost (ports 1053 and 1953) for several master-zones and the reverse-zones.
          I use one ACL with a list of local clients.
          The "DNS Resolver" (unbound) has the Custom Option

          server:
          do-not-query-localhost: no
          

          and some "Domain Overrides" with "Lookup Server IP Address" 127.0.0.1@1053.
          DNS Resolver is listening on localhost and LAN (default ports 53 + 953) and has the outgoing interfaces localhost, LAN and WAN.
          If the bind on my pfSense has a problem, I can point the domain overrides at a temporary bind on my fileserver.
          And the DNS Resolver caches the local zones for a while when troubleshooting bind.
          The temporary bind's zones are a copy of the Bind GUI's "Resulting Zone Config File" outputs.
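To see which of the two daemons is failing in a side-by-side setup like this, the same zone can be queried once directly on bind's port and once through the DNS Resolver. A small sketch, assuming `dig` is available on the box; the function name `query_soa` and the zone name `lan.example` are placeholders, only the address and ports come from the post.

```sh
#!/bin/sh
# query_soa SERVER PORT ZONE: ask one specific server and port for the
# SOA record of ZONE and print the short answer.
query_soa() {
    dig +short @"$1" -p "$2" "$3" SOA
}

# Usage (zone name is a placeholder):
#   query_soa 127.0.0.1 1053 lan.example   # bind directly
#   query_soa 127.0.0.1 53   lan.example   # through the DNS Resolver
```

If the first query answers and the second does not, the domain override is the place to look; if neither answers, bind itself is the suspect.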

          And at my carrier-side server-housing location there is a public DNS on a pfSense for several official zones (authoritative or hidden-primary).
          It is listening on a dedicated interface. I use different ACLs for recursion or notify setups.

          Now you can work out what your own need is.

          These setups helped me to keep control on my DNS locations and are running for years.
          Public DNS is always dangerous when not monitored.
          Private DNS with dynamic IP location has to be simple.
          Bind as private caching DNS is in my opinion not the best idea.
          Unbound is reliable as caching DNS and is a pfSense builtin service, but not so interesting for holding own zones.
          So my best solution is to use both services side-by-side.

          • F
            freebsd-man @md0
            last edited by Mar 3, 2021, 12:10 PM

            @md0
            Stop your bind.
            Delete the corrupted journal files, for example with "rm /cf/named/etc/namedb/*jnl".
            Start bind.
            The journal files will be rebuilt on bind startup.
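The stop / delete / start sequence above can be sketched as a small helper. The function name `clean_journals` is an illustration only; the default directory is the one from the post, and the `named.sh` path in the usage comment is the rc.d script mentioned later in this thread. It matches `*.jnl` rather than `*jnl`, which is slightly narrower.

```sh
#!/bin/sh
# clean_journals [DIR]: remove BIND journal files (*.jnl) from DIR and
# print each removed path. DIR defaults to the path quoted in the post.
clean_journals() {
    dir=${1:-/cf/named/etc/namedb}
    for f in "$dir"/*.jnl; do
        [ -e "$f" ] || continue      # the glob matched nothing
        rm -f -- "$f" && echo "removed $f"
    done
}

# Intended order on the firewall (stop first, as the post says):
#   /usr/local/etc/rc.d/named.sh stop
#   clean_journals
#   /usr/local/etc/rc.d/named.sh start
```

Printing each removed file makes it easy to tell afterwards whether there was anything to clean at all.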

            • M
              md0 @freebsd-man
              last edited by Mar 3, 2021, 1:19 PM

              @freebsd-man I use Bind as a public server for several domains and Unbound as a forwarder for the local network. Bind binds to a public IP address on port 53 and to 127.0.0.1 on port 8953 (control channel). Unbound uses port 53 on the LAN interface and 853 (default) on the loopback - there should be no conflicts, and this setup has been stable for quite some time now.

              Deleting the jnl files has no effect. On the next reboot Bind goes through the same timeouts (and errors) as before. Starting the named service manually, on the other hand, generates no errors, even if I do not remove the jnl file.

              One important hint may be that following an automatic start (after the usual timeouts) I end up with two named processes instead of one. Strangely, this does not appear to have any impact on Bind. I guess one of the instances is dormant, but I don't know enough about Bind's internal architecture to say what's going on.
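The "two named processes" observation is easy to check from a shell. A minimal sketch, assuming `ps -axo pid,comm` prints one "PID COMMAND" pair per line as it does on FreeBSD; `count_named` is a made-up helper name.

```sh
#!/bin/sh
# count_named: read "PID COMMAND" lines on stdin and print how many of
# them have the exact command name "named".
count_named() {
    awk '$2 == "named" { n++ } END { print n + 0 }'
}

# Usage:
#   ps -axo pid,comm | count_named
```

Anything other than 1 after boot would match the symptom described above.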

              • G
                Gertjan @md0
                last edited by Gertjan Mar 3, 2021, 1:27 PM Mar 3, 2021, 1:25 PM

                @md0 said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                blblabla.mkeys.jnl last serial 9 != transaction first serial 8

                That is as good as it gets as an explanation : it says that a dynamic zone file has been edited/modified by something or someone other than bind itself.

                The theory :
                If you want - or pfSense wants - to edit (change) a dynamic zone file, the "freeze - edit - reload - thaw" sequence should be used.
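A sketch of that sequence with `rndc`, wrapped around an arbitrary edit command. Assumptions: `edit_dynamic_zone` and `RNDC_CONF` are illustrative names, and the default rndc.conf path is the one used by the pfSense-generated named.sh quoted in this thread. As far as I know `rndc thaw` also reloads the zone from disk, so no separate reload step is shown.

```sh
#!/bin/sh
# edit_dynamic_zone ZONE COMMAND [ARGS...]
# Freeze ZONE, run COMMAND to change the zone file, then thaw ZONE again.
edit_dynamic_zone() {
    zone=$1; shift
    conf=${RNDC_CONF:-/usr/local/etc/rndc.conf}
    rndc -c "$conf" freeze "$zone" || return 1
    "$@"; rc=$?
    # Always thaw, even if the edit failed, so updates are not left disabled.
    rndc -c "$conf" thaw "$zone" || return 1
    return "$rc"
}

# Usage (zone and editor are placeholders):
#   edit_dynamic_zone example.org vi /path/to/example.org.zone
```

The point of the wrapper is that nothing touches the zone file while bind is still free to write its journal.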

                The reality :
                If pfSense wants to change a zone file, because an 'A' or 'SOA' record needs to be changed, it applies this method. If something breaks, and bind itself (also) continues to modify these files, the sync journal - the file with the .jnl extension - gets corrupted.

                The If not :
                That's the error you see.

                And that's just one of the reasons that I think that the person that thinks that he can manage 'bind' with a GUI should be handled with love, compassion and respect. For the others : Just run. Now. As many have tried. Many have ....

                No "help me" PM's please. Use the forum, the community will thank you.
                Edit : and where are the logs ??

                • F
                  freebsd-man @md0
                  last edited by Mar 3, 2021, 5:10 PM

                  @md0
                  Your problem is new to me.

                  Try to start from scratch.
                  My quick and dirty way.

                  • Backup pfSense config without encrypting.

                  • Uninstall bind package.

                  • Rename the namedb folder inside /cf/ to namedb.old

                  • open /cf/conf/config.xml in "Edit File",
                    remove all bind sections
                    (<bindacls>, <bind>, <bindzone>, <bindviews>, ...)
                    and reinstall bind package.

                  Now you can set the ports, create acls and views and create the zones.
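Before removing anything from config.xml by hand, it helps to list where the bind sections actually are. A minimal sketch, assuming the sections are opening tags whose names start with "bind" as in the list above; `list_bind_sections` is a hypothetical helper.

```sh
#!/bin/sh
# list_bind_sections FILE: print "line:tag" for every opening tag whose
# name starts with "bind" (for example <bind>, <bindacls>, <bindzone>).
list_bind_sections() {
    grep -n -o -E '<bind[a-z]*>' "$1"
}

# Usage (look at a copy of the backup first, not at the live file):
#   list_bind_sections /cf/conf/config.xml
```

Closing tags are not listed, so each section shows up once with the line number where it begins.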

                  • M
                    md0 @freebsd-man
                    last edited by Mar 4, 2021, 3:45 AM

                    @freebsd-man Thank you for your suggestion - I've already tried the exact same steps you mentioned, minus the config file pruning, as I don't want to recreate all my zones from scratch. Needless to say, nothing came of it :(

                    I've now migrated my Bind zones to a debian server behind the firewall - I like the idea of a GUI for zone editing but as it is working right now it's not worth the trouble...

                    • S
                      smartis
                      last edited by Mar 7, 2021, 9:45 AM

                      Same issue here on several APU2s, as so many seem to have. Upgrade/Clean install - all the same.

                      What works for me is executing

                      /usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
                      

                      in a shell after install & boot. Annoying as hell.

                      I've been happy with pfsense in the past but the upgrade to 2.5 has not been a positive user experience

                      • M
                        md0 @smartis
                        last edited by Mar 7, 2021, 11:46 AM

                        @smartis For me starting the service manually via the GUI or executing

                        named.sh start
                        

                        also seemed to work.

                        • C
                          Cussy
                          last edited by Mar 24, 2021, 7:46 AM

                          Has anybody found a solution to this?

                          • M
                            matthijs @Cussy
                            last edited by Mar 29, 2021, 6:54 AM

                            @cussy

                            Yes, migrate to OPNSense

                            • G
                              Gertjan @smartis
                              last edited by Mar 29, 2021, 7:10 AM

                              @smartis said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                              in a shell after install & boot. Annoying as hell.

                              Install the Shellcmd pfSense package and enter that command, select it to get executed after boot.
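If the command is run from Shellcmd at boot, a guard against starting a second copy seems prudent, given the duplicate processes reported earlier in the thread. A sketch only: `start_named_if_absent` and the `NAMED_BIN` override are illustrative names, the command line itself is the one from the post, and `pgrep -x` is assumed to be available (it is on FreeBSD).

```sh
#!/bin/sh
# start_named_if_absent: start named with the command line from this
# thread, but only when no process called exactly "named" exists.
start_named_if_absent() {
    if pgrep -x named >/dev/null 2>&1; then
        echo "named already running, not starting a second copy"
        return 0
    fi
    "${NAMED_BIN:-/usr/local/sbin/named}" -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
}
```

Note that this check cannot tell a healthy named from a hung one; a stuck process would still make it skip the start.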

                              • J
                                jacotec @smartis
                                last edited by Apr 20, 2021, 7:51 PM

                                @smartis said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                /usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/

                                I held off on the upgrade until 2.5.1, and today some users complained that some of my sites were not loading for some of them (not all).

                                I found that my bind (master zone) was not working, and a restart via the GUI did not help. After killing it via SSH and restarting, the log showed an exit with "general: info: received control channel command 'sync -clean'"

                                After one hour of searching and shouting I found this thread and the command above solved it. What I lack is an understanding of why.

                                What is this "magic command" doing?
                                Do I need to apply it after every restart of bind?
                                Is there any config thing which causes the main issue (and that this command is needed) - or how can I solve this permanently. I don't want to nervously check if my bind is still running every day ... never had any issues in the last years with any version before 2.5.x 😕

                                • G
                                  Gertjan @jacotec
                                  last edited by Apr 21, 2021, 6:09 AM

                                  @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                  What is this "magic command" doing?

                                  This one :

                                  /usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
                                  

                                  The command : /usr/local/sbin/named - you should know that the executables of the package known under the name 'bind' are called 'named', which comes probably from 'name daemon'.

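                                  The option -4 indicates : use IPv4 only.

                                  "-c /etc/namedb/named.conf" specifies the config file.

                                  "-u bind" specifies the user identity under which named runs.

                                  "-t /cf/named/" is the chroot directory : named chroots into it before reading the config file, so the -c path is looked up inside it (/cf/named/etc/namedb/named.conf). The zone and journal files live below it as well.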

                                  @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                  Do I need to apply it after every restart of bind?

                                  That's why I proposed :

                                  @gertjan said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                  Install the Shellcmd pfSense package and enter that command, select it to get executed after boot.

                                  @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                  Is there any config thing which causes the main issue (and that this command is needed) - or how can I solve this permanently. I don't want to nervously check if my bind is still running every day ... never had any issues in the last years with any version before 2.5.x

                                  Yes.
                                  For pfSense, processes that should start up at system boot should be listed here :
                                  /usr/local/etc/rc.d
                                  There should be an executable (script) file called named or named.sh
                                  It should contain the correct instructions/commands.
                                  When bind (named) starts, it should leave log messages. Not much when everything is ok, more if there are errors - if it shuts down, it should log the reason.

                                  If it doesn't 'log' messages, crank up the verbosity.

                                  • J
                                    jacotec @Gertjan
                                    last edited by jacotec Apr 27, 2021, 9:28 AM Apr 27, 2021, 8:50 AM

                                    @gertjan
                                    I need to come back to this. Without pfsense having been restarted, users again had issues, and I found that my pfsense was once more not answering any DNS requests.

                                    Giving the above start command in shell resolved the issue.

                                    Strangely pfSense showed the service running in the GUI and also the Service Watchdog did not detect that bind was not running.

                                    I really need to fix the root cause of this happening.

                                    The "named.sh" script in /usr/local/etc/rc.d uses exactly the same start command you've posted; the stop command is completely different:

                                    #!/bin/sh
                                    # This file was automatically generated
                                    # by the pfSense service handler.
                                    
                                    rc_start() {
                                    			if [ -z "`/bin/ps auxw | /usr/bin/grep "[n]amed " | /usr/bin/awk '{print $2}'`" ]; then
                                    			/usr/local/sbin/named -4 -c /etc/namedb/named.conf -u bind -t /cf/named/
                                    		fi
                                    }
                                    
                                    rc_stop() {
                                    			/usr/local/sbin/rndc -q -c "/usr/local/etc/rndc.conf" sync -clean 2>/dev/null
                                    		/usr/local/sbin/rndc -q -c "/usr/local/etc/rndc.conf" stop -clean 2>/dev/null
                                    		sleep 5
                                    		/usr/bin/killall -TERM named 2>/dev/null
                                    		sleep 2
                                    }
                                    
                                    case $1 in
                                    	start)
                                    		rc_start
                                    		;;
                                    	stop)
                                    		rc_stop
                                    		;;
                                    	restart)
                                    		rc_stop
                                    		rc_start
                                    		;;
                                    esac
                                    

                                    I'm fully puzzled, but this issue is mission critical. Do you have any idea what's going wrong here?

                                    When I hit "Restart" for named in "Status" --> "Services" bind (named) is stopped according to the log, but never started again:

                                    Apr 27 10:51:41	named	20303	general: notice: exiting
                                    Apr 27 10:51:41	named	20303	general: notice: stopping command channel on 127.0.0.1#8953
                                    Apr 27 10:51:41	named	20303	general: info: shutting down: flushing changes
                                    Apr 27 10:51:41	named	20303	network: info: no longer listening on 80.152.208.158#53
                                    Apr 27 10:51:41	named	20303	network: info: no longer listening on 10.0.0.5#53
                                    Apr 27 10:51:41	named	20303	general: info: received control channel command 'stop -clean'
                                    Apr 27 10:51:41	named	20303	general: info: dumping all zones, removing journal files: success
                                    Apr 27 10:51:41	named	20303	general: info: received control channel command 'sync -clean'
                                    

                                    "Status" --> "Services" still shows it running and not stopped.

                                    Executing "named.sh" shows no error messages, but bind still does not start properly:

                                    [2.5.1-RELEASE][root@router.mydomain.de]/usr/local/etc/rc.d: ./named.sh
                                    

                                    I don't fully understand whether the condition in the start script could prevent named from being started:

                                    if [ -z "`/bin/ps auxw | /usr/bin/grep "[n]amed " | /usr/bin/awk '{print $2}'`" ]; then
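For reference, the condition is true only when the ps listing contains no line with "named " in it; the `[n]amed` spelling just keeps grep from matching its own command line. So any leftover named process, even a hung one, makes rc_start do nothing. The sketch below replays the same test against sample input; `would_start` is a made-up name and the sample lines in the usage note are invented.

```sh
#!/bin/sh
# would_start: read `ps auxw`-style output on stdin and succeed only when
# rc_start's condition would hold, i.e. no line contains "named ".
would_start() {
    pids=$(grep "[n]amed " | awk '{print $2}')
    [ -z "$pids" ]
}

# Usage:
#   /bin/ps auxw | would_start && echo "rc_start would launch named"
```

A stale process that ignores the TERM signal sent by rc_stop is therefore enough to block every later start.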
                                    

                                    Update:
                                    With "ps auxw | grep named" I found a second, older "named" process running which did not react to a normal "kill" command. I've killed it with "kill -9", and then the GUI showed named correctly stopped. I've started bind via the GUI and for now it's running.
                                    I'll watch if it keeps working ...
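The manual steps in the update (normal kill first, then kill -9) can be sketched as one helper. This is an illustration under assumptions, not what the generated rc_stop does: `kill_escalate` is a made-up name and the default wait of 5 seconds is arbitrary.

```sh
#!/bin/sh
# kill_escalate PID [SECONDS]: send TERM, wait up to SECONDS (default 5)
# for the process to disappear, then send KILL if it is still there.
kill_escalate() {
    pid=$1
    wait_s=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0   # already gone
    i=0
    while [ "$i" -lt "$wait_s" ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # exited after TERM
        sleep 1
        i=$((i + 1))
    done
    kill -KILL "$pid" 2>/dev/null
}

# Usage (PID taken from the ps output):
#   kill_escalate 20303 10
```

Unlike a fixed sleep followed by one signal, this checks whether the process really went away before giving up on the polite signal.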

                                    • G
                                      Gertjan @jacotec
                                      last edited by Apr 27, 2021, 10:22 AM

                                      @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                      Executing "named.sh" shows no error messages, but bind still does not start properly:
                                      [2.5.1-RELEASE][root@router.mydomain.de]/usr/local/etc/rc.d: ./named.sh

                                      The script tells you why :

                                      @jacotec said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                      case $1 in

                                      This $1 is the first parameter on the command line.
                                      I propose you use stop or start or restart like :

                                      /usr/local/etc/rc.d: ./named.sh restart
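The generated script has no default branch in its case statement, which is why running it without an argument prints nothing and does nothing. A sketch of the same dispatch with a usage message added; `dispatch` is an illustrative stand-in, the `echo` replaces the real rc_start / rc_stop calls, and 64 is the conventional sysexits code for a usage error.

```sh
#!/bin/sh
# dispatch ACTION: same shape as the case statement in named.sh, plus a
# default branch that reports a missing or unknown action.
dispatch() {
    case $1 in
        start|stop|restart)
            echo "would run: $1"
            ;;
        *)
            echo "usage: named.sh {start|stop|restart}" >&2
            return 64
            ;;
    esac
}
```

With a branch like this, a bare `./named.sh` would at least say what it expects.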
                                      

                                      😊

                                      • J
                                        jacotec
                                        last edited by May 20, 2021, 3:37 PM

                                        Just as a followup:

                                        As I've read somewhere in forums that bind 9.16.12 is supposed to have a memory leak (which might have caused my crashes) I've manually updated bind to the current 9.16.15 where this was fixed:

                                        pkg add -f https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All/bind916-9.16.15.txz
                                        

                                        I've never had issues since this update.

                                        9.16.15 is not available via package manager yet (no idea how long this takes before the package is updated in the pfsense package manager).
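To check whether the installed bind is already at or past the fixed version, the output of `named -v` can be compared against 9.16.15. A rough sketch: both function names are made up, the parser assumes the output starts with "BIND <version>", and suffixes such as "-P2" are not handled.

```sh
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= B.
# Only three purely numeric fields are compared.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

# named_version: extract the version from `named -v` output on stdin.
named_version() {
    awk '$1 == "BIND" { print $2; exit }'
}

# Usage:
#   v=$(/usr/local/sbin/named -v | named_version)
#   version_ge "$v" 9.16.15 && echo "bind $v is new enough"
```

This only reports the version; whether the memory leak was really the cause of the crashes is the poster's guess, not something the check can confirm.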

                                        • D
                                          de0xyrib0se @freebsd-man
                                          last edited by Aug 25, 2021, 5:24 PM

                                          @freebsd-man said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                          After deleting the manually installed bind and lmdb packages and the old pfSense-pkg-bind package via shell, I installed the updated package pfSense-pkg-bind-9.16_10 via GUI.

                                          While installing I used "tail -f /var/log/resolver.log" to inspect the startup of the new bind.

                                          I got rndc timeout messages from the install log in the GUI and errors from bind startup in the tail output.
                                          After the GUI timeouts the install finished successfully.

                                          After deleting corrupted journal files with "rm /cf/named/etc/namedb/*jnl" I finally was able to start bind via GUI.
                                          Now it is up and running.

                                          Just upgraded to 2.5.2. Nothing I did (restore config, reinstall package) would start bind, it simply was hung in lala-land. Deleting the journal files got it back to life immediately.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.