DNS Resolver not listening on LAN CARP VIP after update to 2.5.1
-
@rle I understand your frustration (been there) and see what you mean. This is why I never update to the latest release in a production environment and instead do it "locally", where I can handle the disruption(s) easily. What you experienced is exactly what I have, verbatim, and I assure you it's caused by the CARP skew issue on the secondary node. If you are doing this in a production environment, avoid it and stick to a single node for now. The workaround is what I described above, but you need to intervene manually after every full sync and keep monitoring that skew difference, because it will bring your primary iface (and thus the network) down. It's a major fail, yes, but easily fixable. I hope the developer will update the source ASAP.
-
@rle said in DNS Resolver not listening on LAN CARP VIP after update to 2.5.1:
I have been banging my head for the last 6 hours. So fed up to be honest.
When I enabled the CARP mode of pfBlockerNG, my complete network went crashing down the rabbit hole. Played again with various settings: especially unbound mode vs unbound python mode, resetting states, ifconfig, you name it....
My conclusion is that pfSense High Availability CARP w/ pfBlockerNG/unbound (and what about IPS/IDS?) is simply not up to par anymore nowadays. This use case is unfit for (business) production use.
I think Netgate has a very interesting dilemma and challenge with pfSense/FreeBSD and in keeping up programming/dependency wise versus up to par features.
Therefore, I'm going back to a single instance of pfSense with a much broader solid battle field tested base in combination with an old fashioned strategy of a good backup with a spare node. Downtime is going to be far less prevalent than what I'm experiencing now with HA CARP.
This is a pfBlockerNG bug:
https://redmine.pfsense.org/issues/11964
Use the "IP Alias" VIP type or wait for the fix.
-
@viktor_g That will still cause intermittent DNS failures and there will be 2 identical active IPs on 2 different hosts. Also, the resolver won't listen on the CARP GW IP. See above.
-
@viktor_g @Luke_71 Thanks for the feedback. In hindsight I'm going to wait it out (tired, and tired of problem solving). So I've just shut down the secondary node, pulled the SYNC cable and disabled CARP for the time being. IMHO this is panning out as an acceptable temporary solution until a fix comes along.
Will keep you posted.
-
@luke_71
Please install the System Patches package:
https://docs.netgate.com/pfsense/en/latest/development/system-patches.html
and apply this patch:
https://github.com/pfsense/FreeBSD-ports/pull/1071/commits/96abc00bba758dddebc09611300ac4680dc0fc5a
Then run pfBlockerNG Force restart
-
@viktor_g Unfortunately I got some error messages.
Status update:
--> Path Strip Count must be set to 4 instead of 2 (duh). Patch applied.
Error Message
-
If I apply the CARP mode to pfBlockerNG I get:
Status / System Logs / System / General
May 28 01:46:44 dhcpleases 63735 Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) cannot be read, No such file or directory.
May 28 01:46:44 dhcpleases 63735 Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) cannot be read, No such file or directory.
May 28 01:46:29 dhcpleases 63735 Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) cannot be read, No such file or directory.
May 28 01:46:29 dhcpleases 63735 Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) cannot be read, No such file or directory.
May 28 01:46:27 dhcpleases 63735 Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) cannot be read, No such file or directory.
May 28 01:46:20 dhcpleases 36267 Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) cannot be read, No such file or directory.
and Unbound + pfb_dnsbl service will not start at all regardless of DNSBL Mode.
Only the DNSBL VIP Type = IP Alias works for me.
In other words: I cannot properly test the patch unfortunately.
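The dhcpleases messages above say only that /var/run/unbound.pid could not be read. As a rough way to tell "unbound never started" apart from "unbound died and left a stale pidfile", something like the following could be run on the node. It is only a sketch: the function name `pidfile_status` and the four status labels are made up for illustration, and it assumes Python is available on the firewall.

```python
import os
from pathlib import Path


def pidfile_status(pidfile="/var/run/unbound.pid"):
    """Classify a pidfile as 'missing', 'unreadable', 'stale' or 'running'.

    Hypothetical helper for illustration; not part of pfSense or pfBlockerNG.
    """
    path = Path(pidfile)
    if not path.exists():
        return "missing"  # matches the "No such file or directory" log lines
    try:
        pid = int(path.read_text().strip())
    except (OSError, ValueError):
        return "unreadable"
    if pid <= 0:
        return "unreadable"  # never signal process groups by accident
    try:
        os.kill(pid, 0)  # signal 0 only tests whether the process exists
    except ProcessLookupError:
        return "stale"  # pidfile left behind by a dead process
    except PermissionError:
        return "running"  # process exists but belongs to another user
    return "running"


# Example call on the firewall:
#   print(pidfile_status())
```

A result of "missing" would be consistent with the log lines above, i.e. unbound is not running at all rather than misbehaving.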
-
@viktor_g I confirm the patch works properly: the skew is no longer overwritten after a force reload on both nodes, and if the resulting value (with 100 added) is over 254, it is capped at the maximum of 254.
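The skew arithmetic described in this post can be written down in a few lines. This is just my reading of the behaviour reported here (add 100, cap at 254), not the code from the patch; `secondary_skew` and its parameter names are hypothetical.

```python
MAX_SKEW = 254  # upper bound mentioned in this post


def secondary_skew(configured_skew, offset=100, max_skew=MAX_SKEW):
    """Return the skew after adding the offset, capped at max_skew.

    Illustrative only: mirrors the behaviour reported in the thread,
    not the actual pfBlockerNG implementation.
    """
    return min(configured_skew + offset, max_skew)


# A skew of 0 becomes 100; a skew of 200 would be 300, so it is capped.
low = secondary_skew(0)
high = secondary_skew(200)
```

With these inputs `low` is 100 and `high` is 254.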
One additional observation: per the pfSense HA CARP guide, the CARP VIPs should have the same subnet as the main interface:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/high-availability.html
Incorrect Subnet Mask
The real subnet mask must be used for a CARP VIP, not /32. This must match the subnet mask for the IP address on the interface to which the CARP IP is assigned.
I am certain that /32 is OK for pfBlocker on "local" ifaces. But based on the above, when pfBlocker is configured in CARP VIP mode, shouldn't the subnet be something other than /32 (i.e. matching the subnet of the assigned iface)? Or am I reading this incorrectly?
Thanks for the patch.
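For anyone who wants to check the rule quoted from the HA guide against their own VIPs, here is a small sketch using Python's standard `ipaddress` module. It compares prefix lengths only, which is how I read "must match the subnet mask"; it does not answer whether the pfBlockerNG DNSBL VIP should follow that rule. The function name and the 192.0.2.x example addresses are placeholders.

```python
import ipaddress


def vip_mask_matches(interface_cidr, vip_cidr):
    """True if the VIP uses the same prefix length as the interface address.

    Hypothetical helper; both arguments are strings such as "192.0.2.2/24".
    """
    iface = ipaddress.ip_interface(interface_cidr)
    vip = ipaddress.ip_interface(vip_cidr)
    return vip.network.prefixlen == iface.network.prefixlen


# Interface on a /24 with a /24 VIP passes; the same VIP as /32 does not.
ok = vip_mask_matches("192.0.2.2/24", "192.0.2.1/24")
bad = vip_mask_matches("192.0.2.2/24", "192.0.2.1/32")
```

Here `ok` is True and `bad` is False, which is the situation the troubleshooting page calls "Incorrect Subnet Mask".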
-
@rle can you check the logs and see what the issues are with unbound? I only select LAN, localhost and the CARP VIP as listening ifaces (don't select ALL), and WAN as the outbound interface (plus the LAN CARP VIP for local domain overrides). After changing DNSBL to CARP mode, remember to run Force Reload (All) in the pfBlocker Update tab, first on the primary and then on the secondary.
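To verify from a LAN client that the resolver really answers on each address selected above (LAN IP, CARP VIP), a plain UDP DNS query is enough. The sketch below hand-builds a minimal A query with the standard library; `build_query` and `resolver_answers` are names I made up, and the addresses in the usage comment are placeholders for your own LAN IP and VIP.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query (type A, class IN) for the given name."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(address, name="example.com", port=53, timeout=2.0):
    """True if something at address:port returns a DNS response to our query."""
    query = build_query(name)
    family = socket.AF_INET6 if ":" in address else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (address, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # timeout, unreachable, refused
            return False
    # Same transaction id and the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


# Example (placeholder addresses):
#   for addr in ("192.0.2.2", "192.0.2.1"):
#       print(addr, resolver_answers(addr))
```

If the LAN address answers but the CARP VIP does not, that points to the "not listening on the VIP" problem from the thread title rather than a general unbound failure.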
-
@luke_71 @viktor_g It seems that I had a (huge) misconfiguration with unbound. My knowledge is not up to par...
Apologies for my rant a couple of posts back. Can't seem to change/edit it however.
Only issue now is that pfb_dnsbl/pfBlockerNG DNSBL service is not starting at all. CARP issues are gone.
Huge thanks to both of you for your help and quick release of the patch!
Tested it on pfSense 2.6.0.a.20210527.0100
-
@rle I have no issues with pfBlockerNG but I'm on 2.5.1 / 3.0.0_16 + patch. I can only suggest you check the logs after having run a full reload on both nodes. Be sure that the unbound service is running without issues and that the DNSBL webserver config has no conflicting ports on the LAN interface.
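A quick way to look for a port conflict is to try binding the port yourself on the firewall while the pfb_dnsbl service is stopped. This is a sketch under assumptions: `port_is_free` is a hypothetical helper, and the ports in the usage comment (8081 and 8443) are what I believe the DNSBL webserver uses by default, so check your own DNSBL settings.

```python
import socket


def port_is_free(address, port):
    """True if a TCP socket can be bound to address:port right now.

    False can also mean the address is not configured on this host, or
    that binding needs more privileges, so treat it as a hint only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError:
            return False
    return True


# Example (placeholder LAN address, assumed DNSBL ports):
#   for port in (8081, 8443):
#       print(port, port_is_free("192.0.2.2", port))
```

If a port reports as not free while pfb_dnsbl is stopped, something else is already bound there and the DNSBL webserver will likely fail to start.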