CARP Sync Issue - when no internet on standby
-
Hi all
I've noted a possible bug in pfSense CARP. We have multiple pfSense instances set up in failover. On some of them we do not have public IPs available for both the primary and the secondary, so we use internal (private) IPs on the WAN interfaces and a public IP for the CARP VIP.
This works well and fails over normally. However, when making changes on the primary, they often fail with:
Jul 4 09:02:55 php-fpm 32688 /rc.filter_synchronize: XML_RPC_Client: RPC server did not send response before timeout. 103
Jul 4 09:02:55 php-fpm 32688 /rc.filter_synchronize: A communications error occurred while attempting XMLRPC sync with username admin https://172.16.18.2:443.
Jul 4 09:02:55 php-fpm 32688 /rc.filter_synchronize: New alert found: A communications error occurred while attempting XMLRPC sync with username admin https://172.16.18.2:443.
If I reboot the secondary (standby) pfSense, it syncs up all the changes on first boot and then starts throwing that error around 10 minutes afterwards.
At first, I blamed the internal IPs on the WAN interfaces. To test that, I replaced the private IPs on one instance with public IPs throughout, which immediately resolved the issue. I then kept digging through the logs to try to identify the actual root cause. As part of this testing, I blocked internet access to the secondary (standby) pfSense unit. As soon as I did this, the unit started throwing the above errors.
When using private IPs, the secondary (standby) unit never has internet access until failover occurs. Therefore, this issue seems to be related to the standby unit not having internet and/or not reaching the gateway.
Any ideas?
-
When using private IPs, the secondary (standby) unit never has internet access until failover occurs. Therefore, this issue seems to be related to the standby unit not having internet and/or not reaching the gateway.
That's likely the entire issue.
Which is why we don't recommend using that style of configuration on a primary WAN. For a non-default/secondary WAN it can be OK, or for internal interfaces, but both units need to have functioning Internet access, or at least functioning DNS.
Now if your private IP addresses on WAN can get out (upstream does NAT, for example), and your NAT rules on WAN are OK, then it's possible the units themselves could get out and be OK. If traffic leaving the firewall must use the CARP VIP to exit, then probably not.
You might try spinning up a local DNS server off the firewalls and then pointing DNS on the firewalls to it, to see if it helps.
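If you want to check the DNS theory before and after making that change, something like the following rough Python sketch can time a name lookup. It is only an illustration: `timed_lookup` is a made-up helper name, the hostname in the usage comment is a placeholder, and it assumes you run it on a machine that uses the same DNS servers as the standby unit (or on the firewall itself, if a Python 3 interpreter happens to be available there). A lookup that fails or takes many seconds would be consistent with the standby stalling on DNS, though it does not prove that this is what makes the XMLRPC sync time out.

```python
import socket
import time


def timed_lookup(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once and return (succeeded, elapsed_seconds).

    resolver is injectable so the function can be exercised without a
    network; by default it uses the system resolver via socket.getaddrinfo.
    """
    start = time.monotonic()
    try:
        resolver(hostname, None)
        ok = True
    except OSError:
        # socket.gaierror (resolution failure) is a subclass of OSError.
        ok = False
    return ok, time.monotonic() - start


# Example usage (placeholder hostname, pick any name the unit should resolve):
# ok, seconds = timed_lookup("www.example.com")
# print("resolved" if ok else "failed", f"in {seconds:.2f}s")
```

Run it once with the current DNS settings and once after pointing DNS at the local server, and compare the timings.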