CARP VIPs with different states on secondary firewall

decibel83

Hi everyone,
I have two pfSense 2.5.2-RELEASE (amd64) systems in HA configuration.

Yesterday I suddenly realised that the VIPs on the secondary system had different states: only one was correctly BACKUP, and all the others were MASTER even though the primary firewall was up and running:

Screenshot 2022-08-30 at 09.10.02.png

This is my VIPs configuration:

screencapture-fw1-dc-ies-it-8443-firewall-virtual-ip-php-2022-09-01-17_50_02.png

This caused big routing problems for me, and some services were unreachable at that moment because the public networks were routed to the primary firewall and not to the secondary.

Anyway I expect that VIPs with the same vhid should have the same status on a single firewall.

Could you help me to understand what's going on, please?

Thank you very much!
Bye

dotdash

@decibel83
It looks like you aren't passing the VLAN tags over to the secondary unit. Are those VLANs? I can't tell. In any case, make sure the primary and secondary can communicate on all interfaces.
The vhid is just a number that must be unique on the interface. Having the same vhid on different interfaces does not mean they are linked in any way. Expected behavior is that any CARP VIPs on the same interface should have the same status.
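To illustrate the point about VHID scope, here is a minimal Python sketch. The Vip record, the interface names, and the addresses are hypothetical and are not pfSense data structures. It only flags a VHID that is reused on the same interface, and treats the same VHID on different interfaces as unrelated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Vip:
    interface: str  # hypothetical interface name, e.g. "lan"
    vhid: int
    address: str


def duplicate_vhids(vips):
    """Return {(interface, vhid): [addresses]} for VHIDs reused on one interface."""
    seen = defaultdict(list)
    for vip in vips:
        seen[(vip.interface, vip.vhid)].append(vip.address)
    return {key: addrs for key, addrs in seen.items() if len(addrs) > 1}


vips = [
    Vip("wan", 1, "198.51.100.10"),
    Vip("lan", 1, "192.0.2.1"),  # same VHID as wan, different interface: unrelated
    Vip("lan", 2, "192.0.2.2"),
    Vip("lan", 2, "192.0.2.3"),  # same VHID on the same interface: flagged
]
conflicts = duplicate_vhids(vips)
print(conflicts)
```

Only the two "lan" entries with VHID 2 are reported; VHID 1 on "wan" and "lan" is not treated as a link between the two interfaces.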
Your mixed use of CARP and Alias VIPs could also be problematic. It is certainly a valid way to stack more VIPs, but if you are not well versed in HA, it's easier to make all of your VIPs CARP. I can't tell, due to the redaction of your private IPs, whether the configuration is correct, but you aren't using enough VIPs to require Alias stacking.

Derelict

@decibel83 said in CARP VIPs with different states on secondary firewall:

Anyway I expect that VIPs with the same vhid should have the same status on a single firewall.

No, that's not how it works. The VHID has nothing to do with anything on different broadcast domains.

You should probably read and understand this:

https://forum.netgate.com/post/719523

decibel83

@derelict sorry for my late answer.

The reason I chose the same VHID on all VIPs is that I read the CARP FAQ on the OpenBSD website: https://www.openbsd.org/faq/pf/carp.html

The description of the group flag says the following:

If one physical CARP-enabled interface goes down, CARP will increase the demotion counter by 1 on interface groups that the carp(4) interface is a member of, in effect causing all group members to fail-over together.

Reading this, I understood that virtual IP addresses with the same VHID would be in the same failover group, so they would all fail over together if any one of them failed over. That is what I want: I need all VIPs to have the same CARP status on a given firewall, otherwise I will have routing problems (traffic entering through one firewall and leaving through the other).

Furthermore, I read the doc at https://docs.netgate.com/pfsense/en/latest/highavailability/reduce-heartbeat-traffic.html which says the following:

[...] additional VIPs may be “stacked” on top of one CARP VIP on an interface.
[...]
This not only reduces the heartbeats on a given segment, but it also causes all of the IP alias VIPs to change status along with the “main” CARP VIP, reducing the likelihood that a layer 2 issue will cause individual CARP VIPs to not fail over as expected.

I understand that I can set one VIP as the CARP IP address and the others on the same interface as IP Aliases on the "main" CARP IP address, and that this causes them to fail over at the same time.
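A rough way to picture the stacking described in that doc is the Python sketch below. The CarpVip class and the addresses are hypothetical, made up for illustration, and this is not how pfSense implements it. The only idea it models is that the IP Alias VIPs have no state of their own and simply follow the state of the CARP VIP they are stacked on.

```python
from dataclasses import dataclass, field


@dataclass
class CarpVip:
    address: str
    state: str = "BACKUP"  # "MASTER", "BACKUP" or "INIT"
    aliases: list = field(default_factory=list)  # IP Alias VIPs stacked on this CARP VIP

    def active_addresses(self):
        """Addresses this node answers for: the CARP VIP plus its aliases, only when MASTER."""
        if self.state != "MASTER":
            return []
        return [self.address, *self.aliases]


main = CarpVip("203.0.113.1", aliases=["203.0.113.2", "203.0.113.3"])
before = main.active_addresses()  # node is BACKUP: answers for nothing
main.state = "MASTER"             # a single state change on the "main" VIP
after = main.active_addresses()   # all stacked addresses move together
print(before, after)
```

Because there is only one state per stack in this model, the aliases cannot end up in a different state from the "main" CARP VIP on the same interface.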

Could you help me to understand what's wrong with my interpretation, please?

Thank you very much!

Derelict

@decibel83 The VHID has nothing to do with anything except the other nodes on that broadcast domain.

In pfSense, all CARP interfaces on the firewall are demoted as described if another CARP interface loses link. There is no concept of different CARP groups.

The IP Alias trick is elegant and recommended but there is nothing technically wrong with having a bunch of CARP VIPs on an interface as long as the VHIDs are unique. Conflicts generally happen with other devices on the segment, not itself since it's pretty hard to create a VIP with the same VHID on the same interface in pfSense due to the validation code.

decibel83

@derelict said in CARP VIPs with different states on secondary firewall:

In pfSense, all CARP interfaces on the firewall are demoted as described if another CARP interface loses link. There is no concept of different CARP groups.

So if I understand correctly, all interfaces should fail over to the same firewall if one of them loses link and needs to fail over.

If that's true, why did the situation shown in my first screenshot happen (the LAN interface became BACKUP and all the others remained MASTER, causing wrong routing for the underlying hosts)?

In short, a single firewall should be MASTER or BACKUP for all of its interfaces, not only a few of them. Am I right?

Thank you!

Derelict

@decibel83 The answer is it depends. In order to go to BACKUP state the VIPs have to see "better" advertisements from another node. Simply being demoted is not enough.
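The following Python sketch is a simplified model of that rule, not the actual CARP implementation. It assumes that a node advertises at an interval of advbase + advskew/256 seconds, that the demotion counter is added to the advertised skew with a cap of 240, and that a "better" advertisement is one with a shorter interval. The function names are made up for illustration, and real CARP has timers and preemption logic that are left out here.

```python
MAX_SKEW = 240  # assumed cap on the advertised skew


def advertised_interval(advbase, advskew, demotion=0):
    """Seconds between advertisements in this simplified model."""
    skew = min(max(advskew + demotion, 0), MAX_SKEW)
    return advbase + skew / 256


def master_next_state(own_interval, heard_intervals):
    """A MASTER only steps down if it actually hears a better (shorter) interval."""
    if any(heard < own_interval for heard in heard_intervals):
        return "BACKUP"
    return "MASTER"


primary = advertised_interval(advbase=1, advskew=0)
secondary = advertised_interval(advbase=1, advskew=100)
secondary_demoted = advertised_interval(advbase=1, advskew=100, demotion=240)

# Secondary is MASTER on a segment and can hear the primary: it steps down.
hears_primary = master_next_state(secondary, [primary])
# Same node, heavily demoted, but no advertisements arrive on that segment:
# it stays MASTER, because being demoted alone is not enough.
hears_nothing = master_next_state(secondary_demoted, [])
print(hears_primary, hears_nothing)
```

In this model a demoted node that receives no advertisements on a segment has nothing to compare against, so it remains MASTER there regardless of its demotion value.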

If everything is MASTER on the primary and BACKUP on the secondary just put the primary in maintenance mode. Everything should flip.

When you down an interface, it should go into INIT state. I don't see any screenshots indicating that.

I don't know enough about your setup to elaborate any further.

decibel83

@Derelict said in CARP VIPs with different states on secondary firewall:

@decibel83 The answer is it depends. In order to go to BACKUP state the VIPs have to see "better" advertisements from another node. Simply being demoted is not enough.

I understand your point, but I really need to understand this.
The situation shown in the screenshot happened without any human intervention and caused a network split, which is dangerous and makes the HA feature useless.

If everything is MASTER on the primary and BACKUP on the secondary just put the primary in maintenance mode. Everything should flip.

I know, but this only applies during maintenance.

When you down an interface, it should go into INIT state. I don't see any screenshots indicating that.

The interfaces were not down, both firewalls were running without any problem, and this happened without any human intervention.

I don't know enough about your setup to elaborate any further.

I understand your point. Anyway, do you have any ideas or hypotheses about what could have happened?

Derelict

@decibel83 A problem at Layer 2 is the most common cause.
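One way to narrow this down is to compare the CARP state of each VIP on both nodes and list the ones that are MASTER on both at once. The Python sketch below does that on hand-entered data. The interface names, VHIDs, and states are hypothetical and only mirror the situation described in this thread (LAN in BACKUP on the secondary, everything else MASTER); nothing here is read from a real firewall.

```python
def dual_masters(primary_states, secondary_states):
    """Return the (interface, vhid) keys that are MASTER on both nodes."""
    return sorted(
        key
        for key, state in primary_states.items()
        if state == "MASTER" and secondary_states.get(key) == "MASTER"
    )


# States as they might be copied by hand from the CARP status page of each node.
primary_states = {("wan", 1): "MASTER", ("lan", 2): "MASTER", ("dmz", 3): "MASTER"}
secondary_states = {("wan", 1): "MASTER", ("lan", 2): "BACKUP", ("dmz", 3): "MASTER"}

suspects = dual_masters(primary_states, secondary_states)
print(suspects)
```

Any segment listed is one where the secondary is probably not receiving the primary's advertisements, so that is where switch ports, VLAN tagging, and multicast handling are worth checking first.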