IPsec tunnel with FQDN identifiers and "DNS on one site down" scenario

e-1-1

Consider the following scenario for a site A - site B IPsec tunnel:

For one reason or another, the DNS resolver at site B is no longer working. For example, after a hard reboot the firewall may have restored a previous, invalid unbound config that the daemon rejects, so unbound does not start.
FQDNs are used as identifiers.

With the DNS resolver no longer working, messages such as

filterdns	91621	failed to resolve host {site A FQDN identifier} will retry later again.

appear in the DNS logs as a result of the business logic trying to find site A's IP address.

Worse, with no IP resolved, site A's address is not added to the hidden pf rules. Site A therefore cannot initiate an IPsec tunnel, because its traffic on UDP 500 (the default IKE port) is blocked by default by site B's pf instance.
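
The failure mode can be modelled in a few lines. This is only a toy sketch in Python, not pfSense's actual implementation: `allowed_ike_sources`, `broken_resolver` and the injected `resolve` callable are hypothetical names, and the real rules are generated by the firewall itself.

```python
from typing import Callable, Iterable, List


def allowed_ike_sources(peer_fqdns: Iterable[str],
                        resolve: Callable[[str], List[str]]) -> List[str]:
    """Return the source IPs that would be allowed to reach UDP 500.

    `resolve` is a hypothetical stand-in for the firewall's resolver lookup;
    it returns a list of IPs, or raises OSError when resolution fails.
    """
    allowed: List[str] = []
    for fqdn in peer_fqdns:
        try:
            allowed.extend(resolve(fqdn))
        except OSError:
            # Resolution failed: nothing is added for this peer, so its
            # IKE traffic would fall through to the default block rule.
            continue
    return allowed


def broken_resolver(fqdn: str) -> List[str]:
    # Simulates site B with its resolver down.
    raise OSError(f"failed to resolve host {fqdn}")


# With the resolver down, no peer address ends up in the allow list.
print(allowed_ike_sources(["site-a.example.com"], broken_resolver))
```

The point of the sketch is that an unresolvable peer silently drops out of the allow list, which is why site A can no longer initiate the tunnel.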

Result: broken tunnel, manual intervention needed on site B.

I'd propose:

Adding an option in the phase 1 settings to "allow all inbound IPsec connections" when the partner site's IP cannot be resolved. This exposes the firewall a bit, but it can be a good tradeoff when the tunnel needs to stay up.
Introducing business logic on the DNS side: either marking a config as "known good" and reverting to it automatically when unbound detects a bad config at startup, or automatically stepping back one unbound config at a time until unbound loads.
Both of the above should trigger an associated alarm when the logic runs.
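
To make the "step back one config at a time" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `pick_loadable_config`, the `is_valid` check and the `alert` hook are hypothetical names, and the file names in the usage lines are made up. In practice `is_valid` could be a wrapper around a config checker such as `unbound-checkconf`, which ships with Unbound.

```python
from typing import Callable, Optional, Sequence


def pick_loadable_config(candidates: Sequence[str],
                         is_valid: Callable[[str], bool],
                         alert: Callable[[str], None] = print) -> Optional[str]:
    """Return the first config that validates, trying the newest first.

    `candidates` is ordered newest to oldest. `is_valid` is a hypothetical
    check that returns True when the resolver would accept the config.
    """
    for index, path in enumerate(candidates):
        if is_valid(path):
            if index > 0:
                # A rollback happened, so raise an alarm as proposed above.
                alert(f"resolver config rolled back to {path}")
            return path
    # Nothing validated: still alert, and let the caller decide what to do.
    alert("no loadable resolver config found")
    return None


# Hypothetical usage: the newest config is broken, the previous one is fine.
configs = ["unbound.conf", "unbound.conf.1", "unbound.conf.2"]
chosen = pick_loadable_config(configs, lambda path: path != "unbound.conf")
```

A "known good" variant would be the same loop with a two-element list: the current config followed by the pinned one.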

e-1-1

@jimp I'd love to hear Netgate's opinion on this scenario when you have some time. I can't be the only one running site-to-site IPsec tunnels with dynamic IPs and FQDNs as identifiers.