No single point of failure: pfSense HA

c5244714

Hello everybody.
We implemented pfSense in HA as shown in the diagram below:
[Diagram: pfsense_ha_dual_sw]

The issue concerns how HA recovers from a failure:
Scenario 1: the primary pfSense goes down (crashed or powered off).
The slave becomes master, and traffic between the upper network (192.168.1.x) and the lower network (10.1.0.x) works with no problem.

Scenario 2 (the problematic one): the link between the master firewall and the lower network (IP 10.0.1.21/24) goes down for whatever reason.
Clients in the upper network cannot ping or communicate with the lower network, and vice versa.
The problem is resolved only when I power off the master pfSense.

How is that possible in an HA environment, when not even a single point of failure is handled?
I know that with a SQL cluster, any link failure between a node and the network triggers a failover.

If this is normal behavior, please suggest a feature or policy that can intercept this kind of failure and act as if the whole master node had failed.
Thanks
Tom

SteveITS

Since I just posted about it, is this a Netgate SG-3100? If so you need to have the sync on the LAN switch and the LAN on the Opt port, or the unit will see the switch as "up." (https://forum.netgate.com/topic/132433/sg-3100-ha-or-not/8)

c5244714

@teamits
Hello.
It's not an appliance but two VMs on a VMware server, running the latest version, 2.4.4.
Tom

Derelict

[Diagram: HA+LACP]

pfSense HA is HA for the firewalls only. If you have a case where the link is up but not passing traffic, pfSense will not see link-down and will not fail over. If the LACP link shown above is up and passing the BPDUs and pfSense has no reason to think the link is down, failover will not occur even if the switches pass no traffic. That would be a switch failure, not a router failure, and should be handled by the switching layer.

There is no way to know if failing over will even help.

You would likely end up with a MASTER/MASTER situation in that case, because if the switch cannot pass traffic, the CARP advertisements probably aren't making it from the primary to the secondary anyway.
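The MASTER/MASTER outcome can be illustrated with a deliberately simplified model of CARP election for one VHID: a node claims the VIP whenever it hears no advertisement from a more preferred peer, and it only drops out on its own when its link is down. This is a sketch under stated assumptions, not pfSense's implementation: equal advbase on both nodes, preemption behavior assumed, no tie-breaking rules, and the names `CarpNode`, `carp_state`, `simulate` and the skew values 0 and 100 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CarpNode:
    name: str
    advskew: int          # lower skew = preferred master (advbase assumed equal)
    link_up: bool = True  # physical link state of the interface carrying the VHID


def carp_state(node: CarpNode, heard_skew: Optional[int]) -> str:
    """Simplified state of one VHID on one node.

    heard_skew is the skew in the advertisement this node actually receives
    from its peer, or None when no advertisement gets through.
    """
    if not node.link_up:
        return "INIT"    # no link: this node cannot hold the VIP
    if heard_skew is not None and heard_skew < node.advskew:
        return "BACKUP"  # a more preferred peer is advertising
    return "MASTER"      # nothing better heard, so claim the VIP


def simulate(switch_passes_traffic: bool) -> Tuple[str, str]:
    primary = CarpNode("primary", advskew=0)
    secondary = CarpNode("secondary", advskew=100)
    # Links stay up in both cases; only advertisement delivery changes.
    to_primary = secondary.advskew if switch_passes_traffic else None
    to_secondary = primary.advskew if switch_passes_traffic else None
    return carp_state(primary, to_primary), carp_state(secondary, to_secondary)


print(simulate(True))   # ('MASTER', 'BACKUP'): healthy switch
print(simulate(False))  # ('MASTER', 'MASTER'): links up, switch dropping frames
```

Because neither node sees link-down in the second case, nothing in the model tells the primary to step aside, which matches the point that this is a switching-layer failure.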

c5244714

Hello again.
Sorry that I was not clear, but the link is down/disconnected, not "up but not passing traffic".

Derelict

Then it would fail over if it had a CARP VIP on it.
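Why a real link-down event should move every VIP, not just the one on the failed interface (the "act as if the whole master node had failed" behavior asked about above), can be sketched with CARP's demotion counter. The values below are assumptions taken from the FreeBSD carp(4) manual (an interface running a VHID going down adds `net.inet.carp.ifdown_demotion_factor`, default 240, to the node's demotion, and the advertised skew is capped at 240), and preemption is assumed to be enabled on both nodes. `effective_skew`, `elect`, `cluster` and the skew values 0 and 100 are hypothetical illustration names and numbers.

```python
from typing import Dict, List

# Assumed values, taken from FreeBSD carp(4): an interface with a VHID going
# down adds net.inet.carp.ifdown_demotion_factor (default 240) to the node's
# demotion counter, and the advertised skew is capped at 240.
IFDOWN_DEMOTION = 240
MAX_SKEW = 240


def effective_skew(advskew: int, links: Dict[str, bool]) -> int:
    """Skew a node advertises on every VHID once its demotion is applied."""
    down = sum(1 for up in links.values() if not up)
    return min(advskew + IFDOWN_DEMOTION * down, MAX_SKEW)


def elect(nodes: Dict[str, dict], vips: List[str]) -> Dict[str, str]:
    """Return which node holds each VIP (one VIP per interface).

    Assumes preemption is enabled and advertisements are delivered on every
    link that is up; ties are broken by node name only to keep the sketch
    deterministic, which is not how CARP breaks them.
    """
    owners = {}
    for vip in vips:
        candidates = [
            (effective_skew(n["advskew"], n["links"]), name)
            for name, n in nodes.items()
            if n["links"][vip]  # a node whose link is down cannot hold this VIP
        ]
        owners[vip] = min(candidates)[1] if candidates else "none"
    return owners


def cluster(primary_lan_up: bool) -> Dict[str, dict]:
    return {
        "primary": {"advskew": 0, "links": {"WAN": True, "LAN": primary_lan_up}},
        "secondary": {"advskew": 100, "links": {"WAN": True, "LAN": True}},
    }


print(elect(cluster(True), ["WAN", "LAN"]))   # both VIPs on the primary
print(elect(cluster(False), ["WAN", "LAN"]))  # both VIPs move to the secondary
```

In the second case the primary's WAN link is still up, but its demoted skew (240) loses to the secondary's 100, so the WAN VIP follows the LAN VIP. If the nodes' CARP settings disagree, this model no longer describes what happens.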

c5244714

It has a VIP on it: the WAN interface and the LAN interface each have one, with a different VHID of course. For some reason it does not trigger the failover.
Any advice?

Derelict

What does Status > CARP (failover) show when the link is down?

What is logged in the system logs when the link goes down and up?

c5244714

Hello Derelict.
Finally I was able to solve the problem. It was caused by a misconfigured XMLRPC Sync section: the two pfSense nodes had different passwords for the admin user. In addition, on one of the servers the IP address in the XMLRPC field was blank.
Thanks for your willingness to help me with this issue.