HTTPS and SSH services appear to be down only on CARP backup firewalls
-
We use a Zabbix 6.2 server to monitor our firewalls. We have the Zabbix 6.0 agent package running on all of them, but that's not terribly relevant to this issue: the Zabbix server checks the availability of the HTTPS and SSH services independently of the agent package on the firewalls. We have two sites, each with two firewalls configured for HA, and the sites are connected via an OpenVPN site-to-site setup. The Zabbix server monitors each firewall on its dedicated LAN IP (not a shared CARP IP).
Ever since we added them to Zabbix, the Zabbix server has been incorrectly reporting that the HTTPS and SSH services go down, but only on the firewalls with the CARP backup role (the firewalls with the master role show no such reports). It's also worth noting that the Zabbix agent does not appear to go down during these times. Each service is checked every minute. Each backup firewall drops out on its own irregular schedule, but on a given firewall the two services always drop together. For example, if pfSense A has HTTPS reported as down, SSH will also be reported as down, and when one comes back up, the other comes up with it, while pfSense B could be reported as just fine for both the whole time. During the windows when they're reported down, I'm able to access both services just fine, so I know these are false positives.
Below is the service up/down history for the past four hours for both firewalls with the CARP backup role.
First firewall:
Second firewall:
I've been thinking this one over for a while now, but I can't think of any reason why only those services would go down (at varying frequencies and durations) while the agent stays up, and only on the backup firewalls...
The closest thing I can think of is some sort of built-in self-protection that blocks access from any address after too many unsuccessful logins within a period of time. But if that were the case, I would expect the frequency and duration to be consistent, and I'd expect it to happen on the master firewalls too.
Any thoughts or pointers would be greatly appreciated!
-
Well, it looks like my expectations about the self-protection were wrong! I found in the system logs of the pfSense firewalls that they were flagging the checks from Zabbix as an attack and periodically blocking all access from the Zabbix server's IP. I was able to whitelist that IP in the login protection settings, and I haven't seen any issues since. I still have no idea why this issue only manifested on the backup firewalls and not the master ones, seeing as their configurations are nearly identical, but hopefully this helps someone else in the future!
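To illustrate how a once-a-minute check can trip this kind of protection, here is a minimal Python sketch of a threshold-based blocker with a pass list. It is a toy model, not pfSense's actual implementation. It assumes each monitoring probe gets scored as one failed attempt worth 10 points, and the threshold (30), detection window (1800 s) and block time (120 s) are assumed values chosen for illustration. The class and function names and the 192.0.2.10 address are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class LoginProtection:
    """Hypothetical toy model of a login-protection blocker."""
    threshold: int = 30            # assumed score at which a source gets blocked
    detection_window: int = 1800   # assumed seconds over which scores accumulate
    block_time: int = 120          # assumed seconds a blocked source stays blocked
    pass_list: set = field(default_factory=set)  # sources that are never blocked
    _events: dict = field(default_factory=dict, init=False, repr=False)
    _blocked_until: dict = field(default_factory=dict, init=False, repr=False)

    def is_blocked(self, ip: str, now: int) -> bool:
        return self._blocked_until.get(ip, -1) > now

    def record_failure(self, ip: str, now: int, score: int = 10) -> bool:
        """Score one suspicious event; return True if it triggers a block."""
        if ip in self.pass_list:
            return False  # pass-listed sources are never scored or blocked
        events = self._events.setdefault(ip, deque())
        events.append((now, score))
        # Drop events that have aged out of the detection window.
        while events and events[0][0] <= now - self.detection_window:
            events.popleft()
        if sum(s for _, s in events) >= self.threshold:
            self._blocked_until[ip] = now + self.block_time
            events.clear()  # start counting afresh after a block
            return True
        return False


def simulate(guard: LoginProtection, ip: str,
             interval: int = 60, duration: int = 3600) -> int:
    """Probe every `interval` seconds; return how many probes saw the host 'down'."""
    down = 0
    for now in range(0, duration, interval):
        if guard.is_blocked(ip, now):
            down += 1  # the monitor sees a timeout or refused connection
        else:
            # Assumption: each probe that gets through is logged as a failed login.
            guard.record_failure(ip, now)
    return down


if __name__ == "__main__":
    monitor_ip = "192.0.2.10"  # hypothetical Zabbix server address
    print(simulate(LoginProtection(), monitor_ip))                        # 15
    print(simulate(LoginProtection(pass_list={monitor_ip}), monitor_ip))  # 0
```

Under these assumptions, one probe in four finds the host "down" over the simulated hour, and adding the monitoring IP to the pass list brings that to zero, which mirrors the fix above. On a real firewall the timing depends on which log entries the protection actually scores, so the pattern will likely be less regular than in this model.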