Any change to a BIND zone results in SERVFAIL
-
Hey friends
I've got a faulty BIND setup, or at least something is going on. Basically, every month or so my main domain zone stops working: lookups all fail with SERVFAIL and the logs show the dreaded query failure message.
My solution has been to restore from a known working backup. Sometimes that buys me a month or two and sometimes only a week or two.
Also, if I change anything in that zone, it fails and I have to restore again. I've tried adding new hosts both through the GUI and through the custom zone file section; both cause clients to get SERVFAIL. I know that the nameserver should be a hostname, not an IP, but if I fix that, it breaks things and I have to restore.
I have historically had DHCP clients register with DNS, but I've turned that off for testing now.
I've also tried completely deleting the zone and recreating it from scratch with only one host: also SERVFAIL.
Does anyone see anything glaring in my zone file? (I know I have some public IPs in there that I've masked with XXX.ZZZ. I don't think that's the problem unless you smart minds tell me otherwise.)
Lastly, Aspen (10.50.1.1) is a slave, and even when Washington (10.15.1.1) gets itself out of sorts, Aspen responds correctly and without error.
$TTL 43200
;
$ORIGIN nsnet.us.
; Database file nsnet.us.DB for nsnet.us zone.
; Do not edit this file!!!
; Zone version 2496684162
;
nsnet.us. IN SOA 10.15.1.1. zonemaster.nsnet.us. (
    2496684162 ; serial
    1d ; refresh
    2h ; retry
    4w ; expire
    1h ; default_ttl
    )
;
; Zone Records
;
@ IN NS 10.15.1.1.
@ IN A 10.15.1.1
washington IN A 10.15.1.1
vail IN A 10.15.1.15
ajax IN A 10.50.1.103
alta IN A 192.168.83.2
blackcomb IN A 10.50.1.15
chamonix IN A 10.15.1.11
colorado IN A 198.27.XXX.ZZZ
frontrange IN A 192.99.XXX.ZZZ
osx5 IN A 10.15.1.100
prima IN A 10.75.1.20
telluride IN A 10.75.1.1
verbier IN A 10.75.1.15
wintergreen IN A 10.50.1.107
winterpark IN A 144.217.XXX.ZZZ
yonder IN A 10.75.1.25
zermatt IN A 10.15.1.115
aspen IN A 10.50.1.1
elkrange IN A 217.182.XXX.ZZZ
rockies IN A 217.182.XXX.ZZZ
highline IN A 10.15.1.105
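Since the poster already suspects the NS entry: in a zone file, a value like 10.15.1.1. is read as a domain name (labels "10", "15", "1", "1"), not as an address. A rough way to spot that is to look for NS targets or SOA primary-nameserver fields written as IP addresses. Below is a minimal Python sketch, not a real zone parser: it assumes one record per line with the SOA nameserver on the same line as the SOA keyword, and the function names are made up. It does not replace BIND's own named-checkzone for syntax checks.

```python
import ipaddress


def _is_ip(text):
    """True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def ip_like_nameservers(zone_text):
    """List NS targets and SOA primary-nameserver fields written as IP addresses.

    Rough heuristic, not a zone parser: assumes one record per line and that
    the SOA nameserver sits on the same line as the SOA keyword.
    """
    flagged = []
    for raw in zone_text.splitlines():
        fields = raw.split(";", 1)[0].split()  # drop the comment, then tokenize
        for i, tok in enumerate(fields[:-1]):
            if tok.upper() in ("NS", "SOA"):
                target = fields[i + 1].rstrip(".")  # "10.15.1.1." -> "10.15.1.1"
                if _is_ip(target):
                    flagged.append(f"{tok.upper()} {target}")
    return flagged


# Example (hypothetical path):
# with open("nsnet.us.DB") as f:
#     print(ip_like_nameservers(f.read()))
```

A non-empty result is a hint to point NS and the SOA nameserver at a hostname that has its own A record (for example washington.nsnet.us.), rather than proof that this is what breaks the zone.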
-
I know I'm reviving an old thread, but maybe someone can use this information.
I've just had the same thing: even changing the serial makes BIND respond with SERVFAIL.
We are using dynamic updates from DHCP, so BIND keeps a journal for the zone. (https://ftp.isc.org/www/bind/arm95/Bv9ARM.ch04.html#dynamic_update)
I noticed this in "Status -> System Logs -> System -> DNS Resolver" (/var/log/resolver.log; you may have to dig into history):
Aug 16 14:33:34 pfsense named[27097]: zone <zone>/IN/<zone>: journal rollforward failed: journal out of sync with zone
Aug 16 14:33:34 pfsense named[27097]: zone <zone>/IN/<zone>: not loaded due to errors.
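To find these errors without paging through the GUI history, here is a minimal Python sketch that pulls the zone name out of journal and load failures. It assumes the log can be read as plain text and that the lines look like the two above; the part after /IN is treated as optional, and the regex and function name are illustrative, not anything pfSense provides.

```python
import re

# Matches lines like the two above, e.g.
#   named[27097]: zone <zone>/IN/<zone>: journal rollforward failed: journal out of sync with zone
# The part after "/IN" is treated as optional.
JOURNAL_ERR = re.compile(
    r"zone (?P<zone>[^/\s]+)/IN(?:/\S+)?: "
    r"(?P<msg>journal rollforward failed.*|not loaded due to errors\.?)"
)


def journal_errors(lines):
    """Yield (zone, message) for journal rollforward and zone load failures."""
    for line in lines:
        m = JOURNAL_ERR.search(line)
        if m:
            yield m.group("zone"), m.group("msg").rstrip()


# Usage, assuming the log is plain text:
# with open("/var/log/resolver.log") as log:
#     for zone, msg in journal_errors(log):
#         print(zone, msg)
```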
I solved this by first syncing the journal (log in as admin/root on pfSense):
rndc -c /cf/named/etc/namedb/rndc.conf sync -clean
Then restart BIND using the GUI.
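If you want to script the sync step, a sketch in Python could look like this. It assumes it runs on the pfSense box as root with rndc on the PATH; the function name is hypothetical, and run is injectable only so the command can be checked without touching BIND. The restart still happens in the GUI.

```python
import subprocess

# rndc.conf location used by the pfSense BIND package (from the post above).
RNDC_CONF = "/cf/named/etc/namedb/rndc.conf"


def sync_journal(zone=None, run=subprocess.run):
    """Run `rndc sync -clean`: write journal changes into the zone file, then delete the .jnl.

    With no zone given, rndc syncs all zones. `run` is injectable for testing.
    """
    cmd = ["rndc", "-c", RNDC_CONF, "sync", "-clean"]
    if zone:
        cmd.append(zone)
    return run(cmd, check=True)


# Usage (hypothetical zone name; restart BIND from the GUI afterwards):
# sync_journal("nsnet.us")
```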
I changed the serial again and the problem did not come back.
-
Hi,
Something you didn't mention, but probably did:
When you change, for example, the SOA in a zone file and this zone is also updated dynamically (RFC 2136), you first have to run:
rndc freeze <zone>
Only then can you open, edit, and save the zone file. Afterwards, run:
rndc reload <zone>
rndc thaw <zone>
Syncing the journal (".jnl") file is OK, but BIND keeps the actual working zone in memory, not in the file you are editing.
By the way: I don't know whether pfSense does all this for you, but when you edit the file, use this sequence of three rndc commands.
The actions are visible in the "general" log, if the BIND package has such a facility:
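The freeze / edit / reload / thaw sequence maps naturally onto a Python context manager, so the thaw cannot be forgotten if the edit step fails. This is a sketch, assuming the pfSense rndc.conf path from the earlier post and a hypothetical frozen_zone helper. According to the BIND documentation, thaw itself reloads the zone from disk, so the explicit reload simply mirrors the steps above.

```python
import subprocess
from contextlib import contextmanager

# rndc.conf location used by the pfSense BIND package (from the earlier post).
RNDC_CONF = "/cf/named/etc/namedb/rndc.conf"


def rndc(*args, run=subprocess.run):
    """Run one rndc command against the local server, raising on failure."""
    return run(["rndc", "-c", RNDC_CONF, *args], check=True)


@contextmanager
def frozen_zone(zone, run=subprocess.run):
    """Freeze a dynamic zone for hand edits, then reload and thaw it.

    freeze: syncs the journal into the zone file and refuses dynamic updates.
    reload, thaw: load the edited file and re-enable dynamic updates.
    """
    rndc("freeze", zone, run=run)
    try:
        yield
    finally:
        try:
            rndc("reload", zone, run=run)
        finally:
            # Thaw even if the edit or the reload failed, so DHCP updates resume.
            rndc("thaw", zone, run=run)


# Usage (hypothetical zone name):
# with frozen_zone("nsnet.us"):
#     ...  # open, edit, and save the zone file here
```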
Example:
16-Aug-2018 15:30:14.300 general: received control channel command 'freeze home.brit-hotel-fumel.fr'
16-Aug-2018 15:30:14.300 general: freezing zone 'home.brit-hotel-fumel.fr/IN': success
16-Aug-2018 15:30:27.940 general: received control channel command 'reload home.brit-hotel-fumel.fr'
16-Aug-2018 15:30:35.636 general: received control channel command 'thaw home.brit-hotel-fumel.fr'
16-Aug-2018 15:30:35.636 general: thawing zone 'home.brit-hotel-fumel.fr/IN': success
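To confirm from this log that no zone was left frozen (while a zone is frozen, dynamic updates from DHCP are refused), here is a minimal Python sketch that replays the "freezing" / "thawing" success lines shown above. It only recognizes those two message formats and ignores everything else; the function name is made up.

```python
import re

# Matches the success lines shown above, e.g.
#   general: freezing zone 'home.brit-hotel-fumel.fr/IN': success
STATE = re.compile(
    r"general: (?P<action>freezing|thawing) zone '(?P<zone>[^/']+)/IN[^']*': success"
)


def still_frozen(lines):
    """Return the zones whose last logged freeze/thaw success was a freeze."""
    frozen = set()
    for line in lines:
        m = STATE.search(line)
        if not m:
            continue  # control-channel and unrelated lines are ignored
        if m.group("action") == "freezing":
            frozen.add(m.group("zone"))
        else:
            frozen.discard(m.group("zone"))
    return frozen


# The log path depends on how BIND logging is configured, so none is assumed here.
```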