Port forwarded NAT TCP state disappearing during failover (SOLVED)

adam65535

I set up a new cluster install with a bare minimum of rules to test pfSense clustering for future deployments. During the test I connect from a client on the WAN to an external WAN CARP VIP which has a port forward to an internal LAN host. If I can get this figured out I will be deploying this setup soon and will be purchasing support, btw.

When I pull the WAN interface cable on the main firewall, interestingly, I still have communication for about 24 seconds. Then the NAT state of the connection gets removed from the secondary for some reason, causing the connection to freeze. I looked at the main firewall and the NAT state had been removed there as well.

The state table on the secondary firewall before pulling the WAN cable on the main firewall is below (stripped of other states). State is being synced to the secondary properly. Looks good.

y.y.1.50 is the internal SSH server (port 22) on the LAN interface.
x.x.136.204 is the CARP VIP on the WAN side of the firewall
x.x.136.73 is the client on the WAN making the connection

tcp y.y.1.50:22 <- x.x.136.204:22 <- x.x.136.73:51127 ESTABLISHED:ESTABLISHED
tcp x.x.136.73:51127 -> y.y.1.50:22 ESTABLISHED:ESTABLISHED

State table on the secondary firewall (the primary shows the same) about 24 seconds after pulling the cable on the primary:
tcp x.x.136.73:51127 -> y.y.1.50:22 ESTABLISHED:ESTABLISHED

Connection now freezes about 24 seconds after I disconnected the WAN cable on primary. The firewall shows that it is now blocking the communication from the client because the state is no longer there for that connection.
The NAT state disappeared but the other part of that connection is still there.
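
For anyone wanting to repeat this comparison without eyeballing the tables, here is a rough Python sketch that diffs two state snapshots pasted as text. It assumes the lines look like the stripped-down ones above (protocol first, no interface column), and the function names are just ones I made up for illustration, not anything that ships with pfSense.

```python
def parse_states(text):
    """Collect tcp/udp state lines from a pasted snapshot, whitespace-normalised.

    Assumes each line starts with the protocol, as in the listings above.
    """
    states = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in ("tcp", "udp"):
            continue  # skip blank lines and anything that is not a state entry
        states.add(" ".join(fields))
    return states


def missing_states(before, after):
    """Return the states present in `before` but gone from `after`."""
    return sorted(parse_states(before) - parse_states(after))


def is_translated(state):
    """A port-forwarded state lists three addresses, so it has two arrows."""
    return state.count("<-") + state.count("->") == 2


before = """
tcp y.y.1.50:22 <- x.x.136.204:22 <- x.x.136.73:51127 ESTABLISHED:ESTABLISHED
tcp x.x.136.73:51127 -> y.y.1.50:22 ESTABLISHED:ESTABLISHED
"""
after = """
tcp x.x.136.73:51127 -> y.y.1.50:22 ESTABLISHED:ESTABLISHED
"""

for state in missing_states(before, after):
    kind = "translated (NAT)" if is_translated(state) else "plain"
    print(kind, "state removed:", state)
```

Run against the two snapshots from this post, it flags only the translated entry for the CARP VIP as removed, which matches what I saw by hand.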

If I establish a new connection through the secondary it connects just fine.
When I plug the cable back into the main firewall the NAT state does not disappear on the new connection so the failover back to the primary works without dropping the port forwarded connection from the WAN. This main to secondary and secondary to main failover behavior is repeatable every time.

I have a two-node firewall cluster with a main and a backup on 2 Dell PowerEdge 1950 servers. VIPs are on the WAN and LAN, with a dedicated sync interface. Sync is working from primary to secondary (rules, VIPs, state, etc.). State sync is also working from secondary to primary. I am actually using 2 virtual IPs on the WAN. One is the cluster IP of the firewall itself (not used for port forwarding). The second (x.x.136.204) is what I am using to port forward the traffic to an internal private IP. All VIPs have a different VHID group.

Can anyone think of a condition that would cause the primary to remove the port-forwarded NAT state and replicate that removal to the secondary, or maybe cause both of them to remove it, during a main-to-backup failover?

Installed from pfSense-2.0.1-RELEASE-amd64.iso.gz, btw, using the SMP kernel on SCSI RAID-mirrored hard drives.

adam65535

Problem solved… After finding release notes that mentioned a gateway monitoring option to disable the clearing of states, I found the option below.

System->Advanced->Miscellaneous
the bottom option...

Gateway Monitoring
States

By default the monitoring process will flush states for a gateway that goes down. This option overrides that behavior by not clearing states for existing connections.

The default flushing behavior is definitely not something you want for a cluster HA solution. I don't see anything stopping deployment now, after some more testing.