IPsec tunnel with FQDN identifiers and "DNS on one site down" scenario
-
Consider the following scenario for a site A - site B IPsec tunnel:
- For one reason or another, the DNS resolver at site B is no longer working. For example, after a hard reboot the firewall may have restored a previous, invalid unbound config that the daemon rejects, so unbound does not start.
- FQDNs are used as identifiers.
With the DNS resolver no longer working, messages such as
filterdns 91621 failed to resolve host {site A FQDN identifier} will retry later again.
appear in the DNS logs as the firewall keeps trying to find site A's IP address.
Worse, with no IP resolved, site A's address is not added to the hidden pf rules, so site A cannot initiate the IPsec tunnel: its traffic on UDP 500 (the default port) is blocked by site B's pf instance.
Result: broken tunnel, manual intervention needed on site B.
I'd propose:
- adding an option in the phase 1 settings to "allow all inbound IPsec connections" when the partner site's IP cannot be resolved. It exposes the firewall a bit, but it can be a good tradeoff when the tunnel needs to stay up.
- introducing logic on the DNS side: either marking a config as "known good" and reverting to it automatically when unbound detects a bad config at startup, or automatically stepping back one unbound config at a time until unbound loads.
Both of the above should trigger an associated alarm whenever the logic runs.
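To make the DNS-side proposal more concrete, here is a rough sketch of the "step back until unbound loads" idea. It is not how pfSense handles this today. The function names, the ordering of the candidate list (current config first, then older backups, optionally a "known good" config last), and the returned alarm flag are all my assumptions. The only real tool it relies on is unbound-checkconf, which exits non-zero when it rejects a config file.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple


def unbound_accepts(config: Path) -> bool:
    """Return True if unbound-checkconf exits 0 for this config file."""
    try:
        result = subprocess.run(
            ["unbound-checkconf", str(config)],
            capture_output=True,
        )
    except OSError:
        # Binary missing or not executable: treat the config as unverified.
        return False
    return result.returncode == 0


def pick_config(
    candidates: Sequence[Path],
    is_valid: Callable[[Path], bool] = unbound_accepts,
) -> Tuple[Optional[Path], bool]:
    """Walk the candidates in order and return the first one that validates.

    `candidates` is assumed to be ordered: current config first, then
    older backups, optionally a pinned "known good" config last.
    The second return value says whether an alarm should be raised,
    i.e. the current config was not the one selected.
    """
    for index, config in enumerate(candidates):
        if is_valid(config):
            return config, index > 0
    # Nothing validated: no usable config, and definitely alarm-worthy.
    return None, True
```

The caller would start unbound with the returned path and raise a notification whenever the flag is set, so a silent rollback never goes unnoticed. Passing the validator in as a parameter keeps the selection logic testable without unbound installed.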
-
@jimp I'd love to see an opinion from Netgate about this scenario when you have some time; I can't be the only one running site-to-site IPsec tunnels with dynamic IPs and FQDNs as identifiers.