Netgate Discussion Forum

    Any changes to BIND zone results in SERVFAIL

    • SpaceBass

      Hey friends

      I've got a faulty BIND setup…or at least something is going on. Basically, every month or so, my main domain zone stops working: all lookups fail with SERVFAIL and the logs show the dreaded query-fail message.

      My solution has been to restore from a known working backup. Sometimes that buys me a month or two and sometimes only a week or two.

      Also, if I change anything in that zone, it fails and I have to restore again. I've tried adding new hosts using the GUI and the custom zone file section; both cause clients to get SERVFAIL.

      Historically, I have had DHCP clients register with DNS, but I have turned that off for testing now.

      Anyone see anything glaring with my zone file? (I know I have some public IPs in there that I've masked with XXX.ZZZ ... I don't think that's the problem unless you smart minds tell me otherwise.)

      Lastly, Aspen (10.50.1.1) is a slave and even when Washington (10.15.1.1) gets itself out of sorts, aspen (slave) will respond correctly and without error.

      $TTL 43200
      ;
      $ORIGIN nsnet.us.
      
      ;	Database file nsnet.us.DB for nsnet.us zone.
      ;	Do not edit this file!!!
      ;	Zone version 2496684162
      ;
      nsnet.us.	 IN  SOA 10.15.1.1\. 	 zonemaster.nsnet.us. (
      		2496684162 ; serial
      		1d ; refresh
      		2h ; retry
      		4w ; expire
      		1h ; default_ttl
      		)
      
      ;
      ; Zone Records
      ;
      @ 	 IN NS 	10.15.1.1.
      @ 	 IN A 	10.15.1.1
      washington 	 IN A  	10.15.1.1
      vail 	 IN A  	10.15.1.15
      ajax 	 IN A  	10.50.1.103
      alta 	 IN A  	192.168.83.2
      blackcomb 	 IN A  	10.50.1.15
      chamonix 	 IN A  	10.15.1.11
      colorado 	 IN A  	198.27.XXX.ZZZ
      frontrange 	 IN A  	192.99.XXX.ZZZ
      osx5 	 IN A  	10.15.1.100
      prima 	 IN A  	10.75.1.20
      telluride 	 IN A  	10.75.1.1
      verbier 	 IN A  	10.75.1.15
      wintergreen 	 IN A  	10.50.1.107
      winterpark 	 IN A  	144.217.XXX.ZZZ
      yonder 	 IN A  	10.75.1.25
      zermatt 	 IN A  	10.15.1.115
      aspen 	 IN A  	10.50.1.1
      elkrange 	 IN A  	217.182.XXX.ZZZ
      rockies 	 IN A  	217.182.XXX.ZZZ
      highline	IN A	10.15.1.105
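The NS record in the zone file above points at `10.15.1.1.`, which BIND parses as a hostname (labels `10`, `15`, `1`, `1`) rather than as an address, since an NS target has to be a domain name. As a rough sketch, the following POSIX shell function flags NS records whose target looks like an IPv4 address. The function name `ns_targets_that_are_ips` is hypothetical, and the check is only a pattern match on the text read from standard input, not a full zone parser.

```shell
#!/bin/sh
# Hypothetical helper: report NS records whose target looks like an IPv4
# address. Reads zone file text on stdin, prints "line-number: record".
ns_targets_that_are_ips() {
    awk '
        # Skip lines that are only a comment.
        /^[ \t]*;/ { next }
        {
            # Find the NS type field and inspect the field right after it.
            for (i = 1; i < NF; i++) {
                if (toupper($i) == "NS" &&
                    $(i + 1) ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\.?$/) {
                    print NR ": " $0
                }
            }
        }
    '
}
```

Feeding it the zone file (for example `ns_targets_that_are_ips < your-zone-file`, where the path is whatever your installation uses) should list the `@ IN NS 10.15.1.1.` line. A matching fix would be an NS target such as `washington.nsnet.us.`, which already has an A record in this zone.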
      
      
      • SpaceBass

        @SpaceBass:

        Hey friends

        I've got a faulty BIND setup…or at least something is going on. Basically, every month or so, my main domain zone stops working: all lookups fail with SERVFAIL and the logs show the dreaded query-fail message.

        My solution has been to restore from a known working backup. Sometimes that buys me a month or two and sometimes only a week or two.

        Also, if I change anything in that zone, it fails and I have to restore again. I've tried adding new hosts using the GUI and the custom zone file section; both cause clients to get SERVFAIL. I know that the nameserver should be a hostname, not an IP...but if I fix that, it breaks things and I have to restore.

        Historically, I have had DHCP clients register with DNS, but I have turned that off for testing now.

        I've also tried completely deleting the zone and recreating it from scratch with only one host. That also results in SERVFAIL.

        Anyone see anything glaring with my zone file? (I know I have some public IPs in there that I've masked with XXX.ZZZ ... I don't think that's the problem unless you smart minds tell me otherwise.)

        Lastly, Aspen (10.50.1.1) is a slave and even when Washington (10.15.1.1) gets itself out of sorts, aspen (slave) will respond correctly and without error.

        • rsterenborg

          I know I'm reviving an old thread but maybe someone can use this information.

          I've just had the same thing: even changing the serial makes Bind respond with SERVFAIL.

          We are using dynamic updates from DHCP, so Bind is keeping a journal for the zone. (https://ftp.isc.org/www/bind/arm95/Bv9ARM.ch04.html#dynamic_update)

          I noticed this in "Status -> System Logs -> System -> DNS Resolver" (/var/log/resolver.log; you may have to dig into history):

          Aug 16 14:33:34 pfsense named[27097]: zone <zone>/IN/<zone>: journal rollforward failed: journal out of sync with zone
          Aug 16 14:33:34 pfsense named[27097]: zone <zone>/IN/<zone>: not loaded due to errors.
          

          I solved this by first syncing the journal (log in as admin/root on pfSense):

          rndc -c /cf/named/etc/namedb/rndc.conf sync -clean
          

          And then restart Bind using the GUI.
          I changed the serial again and the problem did not pop up.
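To see which zones are affected before running the sync, the log lines quoted above can be filtered for the "journal out of sync" message. This is a minimal sketch, assuming the log text is available as plain text on standard input and that each message is on a single line in the format shown in this post. The function name `out_of_sync_zones` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print each zone name that named reported as
# "journal out of sync with zone". Reads log text on stdin.
out_of_sync_zones() {
    # Capture the zone name between "zone " and "/IN" on matching lines,
    # then de-duplicate the result.
    sed -n 's|.*named\[[0-9]*]: zone \([^/]*\)/IN[^:]*: journal rollforward failed: journal out of sync with zone.*|\1|p' |
        sort -u
}
```

For example, `out_of_sync_zones < /var/log/resolver.log` would list the zones to look at. If your pfSense version stores that log in a binary circular format, pipe the decoded log output into the function instead of reading the file directly.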

          • Gertjan

            Hi,

            Something you didn't mention, but probably did:
            When changing, for example, the SOA in a zone file, and this zone is also updated dynamically (RFC 2136), you first have to run:

            rndc freeze <zone>
            

            Only now you can open, edit, and save the zone file.

            rndc reload <zone>
            rndc thaw <zone>
            

            Syncing the ".jnl" (journal) file is OK, but BIND keeps the actual working zone structure in memory, not in the file you are editing.
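The freeze, edit, reload, thaw sequence described above could be wrapped in a small POSIX shell helper along these lines. This is only a sketch: the function names, the `DRY_RUN` switch, and the use of the `rndc.conf` path mentioned earlier in this thread as the default are all assumptions, and it defaults to printing the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical wrapper around the freeze / edit / reload / thaw sequence.
# RNDC_CONF default is the path quoted earlier in the thread (assumption).
RNDC_CONF="${RNDC_CONF:-/cf/named/etc/namedb/rndc.conf}"
# DRY_RUN=1 (the default here) only prints the rndc commands.
DRY_RUN="${DRY_RUN:-1}"

run_rndc() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "rndc -c $RNDC_CONF $*"
    else
        rndc -c "$RNDC_CONF" "$@"
    fi
}

# Usage: edit_dynamic_zone <zone> <edit command...>
edit_dynamic_zone() {
    zone="$1"
    shift
    # Stop dynamic updates so the zone file on disk can be edited safely.
    run_rndc freeze "$zone" || return 1
    # Run the caller-supplied edit command and remember its exit status.
    "$@"
    status=$?
    # Reload only if the edit succeeded; thaw in every case so the zone
    # does not stay frozen.
    if [ "$status" -eq 0 ]; then
        run_rndc reload "$zone"
    fi
    run_rndc thaw "$zone"
    return "$status"
}
```

With the defaults, `edit_dynamic_zone example.lan true` just prints the three `rndc` commands in order. Setting `DRY_RUN=0` and passing your editor and zone file as the edit command would run them for real.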

            Btw: I don't know if all this is done by pfSense, but when you edit the file, use these 3 "rndc" commands in sequence.
            The actions are visible in the "general" log, if the BIND package has such a facility.
            Example:

            16-Aug-2018 15:30:14.300 general: received control channel command 'freeze home.brit-hotel-fumel.fr'
            16-Aug-2018 15:30:14.300 general: freezing zone 'home.brit-hotel-fumel.fr/IN': success
            16-Aug-2018 15:30:27.940 general: received control channel command 'reload home.brit-hotel-fumel.fr'
            16-Aug-2018 15:30:35.636 general: received control channel command 'thaw home.brit-hotel-fumel.fr'
            16-Aug-2018 15:30:35.636 general: thawing zone 'home.brit-hotel-fumel.fr/IN': success
            

            No "help me" PM's please. Use the forum, the community will thank you.
            Edit : and where are the logs ??

            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.