Connection state is lost after XMLRPC sync

Lenny

I have two pfSense firewalls in an HA configuration (2.1-BETA0-amd64-20121106-0059).
They are configured with WAN, SYNC and LAN interfaces, plus 30 VLAN interfaces on the LAN interface.
CARP is configured on the WAN, LAN and all VLAN interfaces.

I have noticed that the web interface stops responding for a while when I apply a new FW rule, NAT rule or alias.
On top of that, an SSH connection to either FW itself or to a server behind the FW hangs at the same time and never recovers. The same behaviour can be provoked by forcing an XMLRPC sync to the backup FW.
The firewall log shows that the response from the server was denied by the default deny rule:
@3 block drop out log inet all label "Default deny rule IPv4"
After a lot of digging into this issue, I found that the FW loses part of the connection state.

There are two connection states for an established SSH connection:
client.someport -> server.22
server.22 -> client.someport

After the XMLRPC sync there is only one connection state left:
client.someport -> server.22

The result is that all responses from the server are blocked by that default rule as there is no state for that connection.
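
A minimal sketch of how the missing half could be spotted, assuming the state table has already been reduced to simple (source, destination) pairs like the ones listed above. The function name and the "host:port" strings are made up for illustration; this does not parse real pf state output.

```python
def find_one_way_states(states):
    """Return states whose reverse-direction counterpart is missing.

    `states` is a list of (source, destination) string pairs. This is a
    hypothetical, simplified representation, not real pf state output.
    """
    seen = set(states)
    return [(src, dst) for (src, dst) in states if (dst, src) not in seen]


# Before the XMLRPC sync: both directions have a state.
before_sync = [
    ("client:50000", "server:22"),
    ("server:22", "client:50000"),
]

# After the XMLRPC sync: the server -> client state is gone.
after_sync = [
    ("client:50000", "server:22"),
]

print(find_one_way_states(before_sync))
print(find_one_way_states(after_sync))
```

Any pair reported by the second call is a connection whose replies would fall through to the default deny rule.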

cmb

A number of CARP-related things have been fixed in the last few days, so I'd upgrade before pursuing this further.

Lenny

Thanks, I tried the latest snapshot (2.1-BETA0-amd64-20121115) and the issue seems to be gone.
I will do some more tests though.

jimp

Most likely explanation:

1. Due to the CARP NAT oddities on your old snapshot (which have since been fixed), your secondary unit was not able to ping its gateway, so the gateway was marked down.
2. XMLRPC sync happens.
3. After that, a filter reload is performed; the gateway check code sees that your WAN gateway is down and flushes states to that gateway.
4. pfsync propagates that action to the master node.

Shouldn't be an issue on current snaps, but if you only have one WAN, you can disable the state killing behavior under System > Advanced on the Misc. tab.
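
The chain of events described above can be sketched as a toy model. Everything here is an assumption made for illustration: the `Node` class, the `kill_states_on_gateway_down` flag (standing in for the System > Advanced setting), and the gateway tag attached to each state. It is not how pfSense or pfsync is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy firewall node; all fields are hypothetical."""
    name: str
    gateway_up: bool
    # Each state is a (description, gateway_tag) tuple; the tag is illustrative.
    states: set = field(default_factory=set)


def filter_reload(node, kill_states_on_gateway_down=True):
    """Flush states tied to a down gateway and return what was flushed."""
    if node.gateway_up or not kill_states_on_gateway_down:
        return set()
    flushed = {s for s in node.states if s[1] == "WAN_GW"}
    node.states -= flushed
    return flushed


def pfsync_propagate(flushed, peer):
    """Apply the same state deletions on the peer node."""
    peer.states -= flushed


inbound = ("client -> server", None)
outbound = ("server -> client", "WAN_GW")  # gateway tag chosen arbitrarily

master = Node("master", gateway_up=True, states={inbound, outbound})
backup = Node("backup", gateway_up=False, states={inbound, outbound})

# The filter reload after XMLRPC sync runs on the backup, which thinks
# its gateway is down, and the deletion is then mirrored to the master.
pfsync_propagate(filter_reload(backup), master)
print(master.states)
```

With `kill_states_on_gateway_down=False`, `filter_reload` flushes nothing, so the master keeps both states in this model.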

Lenny

Yes, that was the reason. I found out that the backup FW wasn't able to connect anywhere because it was using the CARP IP as its source address, so all replies went to the master FW.
There have been no issues since I upgraded to the newer snapshot.

Thanks guys.