Netgate Discussion Forum

    Bind upgrade producing errors on pfsense 2.5 upgrade

    pfSense Packages
    • freebsd-man

      Hi!

      Could using an additional ACL cause the problem?

      Debian Discussion: Bug#980786: named: after upgrade to bind9=1:9.16.11-1 named is killed with status=11/SEGV

      • nordeep @freebsd-man

        @freebsd-man Thanks, that makes sense.

        • madforic @freebsd-man

          @freebsd-man That is the issue!
          In my case I removed all named ACLs and BIND started as expected.
          Named ACLs only cause trouble in master zones; slave and forward zones work with named ACLs without problems.

          • matthijs

            Is there a workaround for this ACL issue? I have two ACLs and I really need them (one lists the secondary name servers that are allowed to do zone transfers, the other lists the hosts that are allowed to do dynamic updates).

            Or should I wait until this bug is fixed and a new release is available?

            • md0 @matthijs

              I'm not sure if I have the same issue, but after upgrading to 2.5.0, the named service, along with several other downloaded packages, fails to start on reboot. Starting the named service manually clears the blockage, and all the other services then start normally.

              Nothing at all is logged during the first (automatic) start attempt, and no errors are shown afterwards. I've tried deleting my only named ACL, but the problem persists. Reinstalling the BIND package doesn't help either.

              Any hints on how I could debug this further would be appreciated.

              • matthijs

                I read that the BIND bug freebsd-man referred to (Debian bug 980786) has already been fixed.
                Can Netgate update the BIND package to the latest version?

                • viktor_g (Netgate)

                  Try updating it directly (on x86):

                  # pkg add -f https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All/lmdb-0.9.28,1.txz
                  # pkg add -f https://pkg.freebsd.org/FreeBSD:12:amd64/latest/All/bind916-9.16.12.txz
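To confirm that such a manual update actually took effect, one could compare the running BIND version against 9.16.12, the release reported later in this thread to fix the ACL crash. This is a minimal POSIX sh sketch: it assumes `named -v` prints a line such as `BIND 9.16.12 (Stable Release) <id:...>` and that `sort -V` (version sort) is available; `version_ge` and `bind_version` are hypothetical helper names.

```shell
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on `sort -V` (version-aware sort).
version_ge() {
    # The smaller version sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: read a line like "BIND 9.16.12 (Stable Release) <id:...>"
# (the assumed `named -v` format) on stdin and print the version field.
bind_version() {
    awk '{ print $2; exit }'
}

# Usage on the firewall (not executed here):
#   v=$(named -v | bind_version)
#   if version_ge "$v" 9.16.12; then echo "BIND $v is new enough"; else echo "BIND $v is older than 9.16.12"; fi
```

Using `sort -V` avoids a plain string comparison, which would wrongly rank 9.16.9 above 9.16.10.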
                  
                  • freebsd-man @viktor_g

                    @viktor_g:
                    With the BIND 9.16.12 package I can use an ACL in a view and in zones without issues.

                    The LMDB package (lmdb) is a required dependency when installing the bind package this way.
                    The final updated pfSense package will probably either have the same dependency or be built without it.
                    So I will delete both packages before updating BIND via the pfSense GUI.

                    • md0 @viktor_g

                      @viktor_g
                      I've updated to 9.16.12 and my problem persists, so it is clearly unrelated. Maybe I should open a support ticket?

                      • matthijs

                        I updated directly to bind916-9.16.12 from the command line as viktor_g suggested, and all my BIND problems are SOLVED :-). BIND starts and runs without issues now. Can Netgate update the official Package Manager with bind916-9.16.12?

                        Thanks in advance

                        Kr

                        Matthijs

                        • nordeep @viktor_g

                          @viktor_g
                          Confirmed. In my case, manually updating BIND to 9.16.12 solved the issue.
                          Thank you!

                          • freebsd-man

                            After deleting the manually installed bind and lmdb packages and the old pfSense-pkg-bind package via the shell, I installed the updated package pfSense-pkg-bind-9.16_10 via the GUI.

                            While it was installing, I used "tail -f /var/log/resolver.log" to watch the startup of the new BIND.

                            I got rndc timeout messages in the GUI install log and errors from the BIND startup in the tail output.
                            After the GUI timeouts, the installation finished successfully.

                            After deleting the corrupted journal files with "rm /cf/named/etc/namedb/*jnl", I was finally able to start BIND via the GUI.
                            Now it is up and running.
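The package cleanup described in this post could be scripted roughly as follows. This is a POSIX sh sketch, not an official procedure: the package names (`bind916`, `lmdb`, `pfSense-pkg-bind`) come from this thread, while the `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences for previewing the commands. The updated package is still installed from the GUI afterwards.

```shell
# Hypothetical wrapper: print commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove the manually added BIND and LMDB packages together with the old
# pfSense wrapper package; pkg lists what it will remove and asks to confirm.
cleanup_manual_bind() {
    run pkg delete bind916 lmdb pfSense-pkg-bind
}

# Follow BIND's startup messages while the GUI installs the new package (Ctrl-C to stop).
watch_resolver_log() {
    run tail -f /var/log/resolver.log
}
```

Calling `DRY_RUN=1 cleanup_manual_bind` first shows exactly what would be removed; `watch_resolver_log` is meant for a second SSH session while the GUI install runs.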

                            • md0 @freebsd-man

                              @freebsd-man
                              In my case, upgrading the BIND package did not solve the problem. I get the same timeouts during installation, and also at startup. I've deleted all the files and reinstalled the package, but it made no difference.

                              What I've found is that, given enough time (15-20 minutes in my case), the named service eventually starts, but only after cycling through each of the zones, trying to freeze and thaw them, and getting timeouts from rndc. Nothing is logged, but I can see the processes in memory:

                              /usr/local/sbin/rndc -q -c /usr/local/etc/rndc.conf freeze [zone name] IN [view name]
                              /usr/local/sbin/rndc -q -c /usr/local/etc/rndc.conf thaw [zone name] IN [view name]
                              

                              It's definitely a problem with rndc, but I can't figure out how to fix it. I guess I'll just have to go back to 2.4.5, since I'm apparently the only one experiencing this issue.

                              • freebsd-man @md0

                                @md0
                                At least we can say named no longer crashes with signal 11.
                                Your problem seems to be a configuration problem.
                                And yes, BIND behaves more and more like a "diva".

                                My tips:
                                If saving something in the GUI runs into a timeout, you can interrupt the page reload after 5 or more seconds and open the BIND GUI again; by then the GUI has already generated the config.
                                Making BIND's logging more verbose also helps to find the errors via "tail -f /var/log/resolver.log" in an SSH shell.
                                If rndc shows timeouts, in most cases named is not answering.
                                So the timeout is caused by a config problem, for example the IP/port being used by another daemon, a zone-loading error, or a corrupted journal file.
                                When inspecting zones I always clear the serial.
                                Keep your configs simple and look out for error messages when named starts.
                                When looking for errors, it is better to stop and start the process. Using rndc is useless if named is not running!
                                Keep in mind: errors will stop the named process, warnings will not. Most errors will be shown in the "resolver" log files.
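Since errors stop named and warnings do not, a quick first step is to pull only the error-level named lines out of resolver.log. A minimal POSIX sh sketch, assuming the log line format shown later in this thread (`<timestamp> <host> named[<pid>] <category>: <severity>: <message>`); `named_errors` is a hypothetical helper name.

```shell
# Hypothetical helper: print only named lines logged at error or critical severity.
# Assumes the format "<ts> <host> named[<pid>] <category>: <severity>: <msg>".
# Defaults to /var/log/resolver.log when no file is given.
named_errors() {
    grep -E 'named\[[0-9]+\] [a-z-]+: (error|critical):' "${1:-/var/log/resolver.log}"
}

# Usage on the firewall:
#   named_errors
#   tail -f /var/log/resolver.log | grep -E 'named\[[0-9]+\] [a-z-]+: (error|critical):'
```

If the log does not include the category field, the pattern would need adjusting.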

                                • md0 @freebsd-man

                                  @freebsd-man
                                  In my case named never crashed with signal 11; it's definitely a different problem. If it's a config issue, the logs never show anything wrong: during the first (automatic) start attempt nothing gets logged, and afterwards everything appears completely normal. Below is typical output:

                                  bind.txt

                                  Increasing the verbosity does not produce any additional warnings.
                                  Any idea what else I could do to try to fix this?

                                  • md0 @md0

                                    @md0 Quick update:

                                    If I let the default start-up run its course, there are some errors in the log:

                                    2021-03-03T13:13:40.212844+02:00 router named[10949] general: error: malformed transaction: 5709a41150573356.mkeys.jnl last serial 9 != transaction first serial 8
                                    2021-03-03T13:13:40.212859+02:00 router named[10949] general: error: managed-keys-zone/[View]: keyfetch_done:dns_journal_write_transaction -> unexpected error
                                    2021-03-03T13:13:40.212902+02:00 router named[10949] dnssec: error: managed-keys-zone/[View]: error during managed-keys processing (unexpected error): DNSSEC validation may be at risk
                                    

                                    If I start the service manually, there are no errors. Either way, BIND works correctly once it has started.
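The first error names the affected journal file and the two serials that disagree: as the message suggests, a new journal transaction should start at the serial where the journal currently ends, but here it ends at 9 while the transaction starts at 8. A small POSIX sh sketch to extract those three values, for example to see which `.jnl` file is involved before deleting anything; the sed pattern assumes the exact wording shown above and `jnl_mismatch` is a hypothetical name.

```shell
# Hypothetical helper: from a "malformed transaction" log line on stdin, print
# "<journal file> <last serial> <transaction first serial>"; other lines are ignored.
jnl_mismatch() {
    sed -n 's/.*malformed transaction: \([^ ]*\.jnl\) last serial \([0-9]*\) != transaction first serial \([0-9]*\).*/\1 \2 \3/p'
}

# Usage on the firewall:
#   grep 'malformed transaction' /var/log/resolver.log | jnl_mismatch
```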

                                    • freebsd-man @md0

                                      @md0
                                      Do you use the bind as local DNS only inside a LAN?
                                      Or is it a public (authoritative) or hidden-primary DNS for your officially registered Domains?

                                      I use it for both, at different locations.

                                      In my LAN it holds the internal zones while the "DNS Resolver" does the caching.
                                      I use Bind on localhost (ports 1053 and 1953) for several master-zones and the reverse-zones.
                                      I use one ACL with a list of local clients.
                                      The "DNS Resolver" (unbound) has the Custom Option

                                      server:
                                      do-not-query-localhost: no
                                      

                                      and some "Domain Overrides" with "Lookup Server IP Address" 127.0.0.1@1053.
                                      The DNS Resolver listens on localhost and LAN (default ports 53 + 953) and has localhost, LAN and WAN as outgoing interfaces.
                                      If BIND on my pfSense has a problem, I can point the domain overrides at a temporary BIND on my file server.
                                      The DNS Resolver also keeps the local zones cached for a while, which helps when troubleshooting BIND.
                                      The temporary server's zones are copies of the "Resulting Zone Config File" output in the BIND GUI.
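For illustration, a Domain Override whose lookup server is 127.0.0.1@1053 corresponds roughly to an Unbound `forward-zone` stanza like the ones this POSIX sh sketch prints. pfSense generates its own Unbound configuration from the GUI, so this only shows the mechanism, not what pfSense writes verbatim; `bind_forward_zones` and the zone names in the usage line are hypothetical.

```shell
# Hypothetical helper: print Unbound forward-zone stanzas that send queries for
# each given zone to a local BIND listening on 127.0.0.1 port 1053.
# Unbound refuses to query localhost unless "do-not-query-localhost: no" is set.
bind_forward_zones() {
    for zone in "$@"; do
        printf 'forward-zone:\n\tname: "%s"\n\tforward-addr: 127.0.0.1@1053\n' "$zone"
    done
}

# Example (placeholder zone names):
#   bind_forward_zones home.example lab.example
```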

                                      At my server-housing location at the carrier, there is a public DNS on a pfSense for several official zones (authoritative or hidden-primary).
                                      It listens on a dedicated interface, and I use different ACLs for recursion and notify setups.

                                      Now you can work out what you need.

                                      These setups have helped me keep control of my DNS locations and have been running for years.
                                      Public DNS is always dangerous when it is not monitored.
                                      Private DNS at a location with a dynamic IP has to be simple.
                                      In my opinion, BIND as a private caching DNS is not the best idea.
                                      Unbound is reliable as a caching DNS and is a built-in pfSense service, but it is less interesting for hosting your own zones.
                                      So my best solution is to use both services side by side.

                                      • freebsd-man @md0

                                        @md0
                                        Stop BIND.
                                        Delete the corrupted journal files, for example with "rm /cf/named/etc/namedb/*jnl".
                                        Start BIND.
                                        The journal files will be rebuilt when BIND starts.
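The stop / delete / start sequence can be sketched in POSIX sh as below. It assumes the rndc.conf path visible in this thread (`/usr/local/etc/rndc.conf`) and the journal directory `/cf/named/etc/namedb`; `rndc stop` asks named to save pending dynamic-update changes to its zone files before exiting. Starting BIND again is left to the pfSense GUI, because the exact start command is not shown in this thread. `reset_bind_journals`, `run` and `DRY_RUN` are hypothetical names.

```shell
# Assumed locations, taken from this thread; override them if yours differ.
: "${RNDC_CONF:=/usr/local/etc/rndc.conf}"
: "${NAMEDB_DIR:=/cf/named/etc/namedb}"

# Hypothetical wrapper: print commands instead of running them when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_bind_journals() {
    # Ask named to save pending dynamic-update/IXFR changes to its zone files and exit.
    if ! run rndc -c "$RNDC_CONF" stop; then
        echo "rndc stop failed; if named is not running, delete the journals by hand" >&2
        return 1
    fi
    # Give named up to 30 seconds to exit before touching its files.
    tries=0
    while [ -z "${DRY_RUN:-}" ] && pgrep -x named >/dev/null && [ "$tries" -lt 30 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    if [ -z "${DRY_RUN:-}" ] && pgrep -x named >/dev/null; then
        echo "named is still running; not deleting journals" >&2
        return 1
    fi
    # Delete the journals; named recreates them on the next start.
    for jnl in "$NAMEDB_DIR"/*.jnl; do
        [ -e "$jnl" ] || continue   # no match: the glob stays literal
        run rm -f "$jnl"
    done
    echo "Journals removed; now start BIND again from the pfSense GUI."
}
```

Running it once with `DRY_RUN=1` lists the journal files it would delete without touching anything.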

                                        • md0 @freebsd-man

                                          @freebsd-man I use BIND as a public server for several domains and Unbound as a forwarder for the local network. BIND binds to a public IP address on port 53 and to 127.0.0.1 on port 8953 (control channel). Unbound uses port 53 on the LAN interface and 853 (default) on the loopback, so there should be no conflicts, and this setup has been stable for quite some time now.

                                          Deleting the jnl files has no effect. On the next reboot BIND goes through the same timeouts (and errors) as before. Starting the named service manually, on the other hand, generates no errors, even if I do not remove the jnl file.

                                          One important hint may be that after an automatic start (following the usual timeouts) I end up with two named processes instead of one. Strangely, this does not appear to have any impact on BIND; I guess one of the instances is dormant, but I don't know enough about BIND's internal architecture to guess what's going on.
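To check the "two named processes" observation after each boot, one could list the processes whose command name is exactly `named`. A POSIX sh sketch assuming FreeBSD's `ps -axo pid=,comm=` output (PID and command name, no header); `named_pids` is a hypothetical helper, and it only shows what is running, not why.

```shell
# Hypothetical helper: read "PID COMMAND" lines (as printed by `ps -axo pid=,comm=`)
# on stdin and print the PIDs whose command name is exactly "named".
named_pids() {
    awk '$2 == "named" { print $1 }'
}

# Usage on the firewall:
#   ps -axo pid=,comm= | named_pids | wc -l    # more than 1 means a second instance
#   for p in $(ps -axo pid=,comm= | named_pids); do ps -o pid,ppid,lstart,args -p "$p"; done
```

Comparing the parent PID and start time of each instance may show which startup path launched the extra one.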

                                          • Gertjan @md0

                                            @md0 said in Bind upgrade producing errors on pfsense 2.5 upgrade:

                                            blblabla.mkeys.jnl last serial 9 != transaction first serial 8

                                            That is about as clear an explanation as it gets: it says that a dynamic zone file has been edited/modified by something or someone other than BIND itself.

                                            The theory:
                                            If you (or pfSense) want to edit (change) a dynamic zone file, the "freeze - edit - reload - thaw" sequence should be used.
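From a shell, the freeze / edit / thaw cycle could look like this POSIX sh sketch. The zone, view and editor are placeholders; the rndc binary and rndc.conf paths are the ones visible in the process list earlier in this thread. `rndc thaw` itself reloads the frozen zone from disk, so no separate reload is issued. `edit_dynamic_zone` and the `RNDC` override variable are hypothetical.

```shell
# Assumed paths, taken from the rndc processes shown in this thread.
: "${RNDC:=/usr/local/sbin/rndc}"
: "${RNDC_CONF:=/usr/local/etc/rndc.conf}"

# Hypothetical helper: hand-edit a dynamic zone file without corrupting its journal.
# $1 = zone name, $2 = view name, $3 = path to the zone file.
edit_dynamic_zone() {
    zone=$1 view=$2 file=$3
    # Freeze: suspend dynamic updates and sync the journal into the zone file.
    "$RNDC" -c "$RNDC_CONF" freeze "$zone" IN "$view" || return 1
    # Edit while frozen; remember to increase the SOA serial.
    "${EDITOR:-vi}" "$file"
    # Thaw: named reloads the zone from disk and re-enables dynamic updates.
    "$RNDC" -c "$RNDC_CONF" thaw "$zone" IN "$view"
}
```

The thaw runs even if the editor exits with an error, so the zone is never left frozen by this helper.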

                                            The reality:
                                            If pfSense wants to change a zone file because an 'A' or 'SOA' record needs to be changed, it applies this method. But if something breaks and BIND itself also keeps modifying these files, the sync journal (the file with the .jnl extension) gets corrupted.

                                            The "if not":
                                            That's the error you see.

                                            And that's just one of the reasons why I think anyone who believes he can manage BIND with a GUI should be treated with love, compassion and respect. For the others: just run. Now. As many have tried. Many have ....

                                            No "help me" PMs please. Use the forum, the community will thank you.
                                            Edit: and where are the logs??

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.