pfSense 2.5.2 crash when enable Synchronize states

pszafer

@jimp thanks for the response!
Only a single node is resetting itself on 2.5.2; it is the master node, if that matters.
It was working on 2.5.1.

One other weird thing, although it started back in 2.5.1 (I think it was fine in 2.5.0):
The NRPE service has its settings (bind IP) synced between the firewalls, so it's not working properly.

I will test resetting states this afternoon and check whether it still crashes.

jimp

Config sync and state sync are unrelated, so problems in one area are unlikely to be related to the other (and vice versa).

Though if something isn't syncing correctly usually that's due to some other inconsistency in the HA configuration, such as interfaces not being assigned in the correct identical order, or somehow the internal interface names don't match up.

pszafer

@jimp I reset the states and it hangs anyway.
The interfaces are in identical order, I rechecked. Is there any other way to troubleshoot this other than reinstalling from scratch?

jimp

Is the hardware setup identical? Meaning are the two nodes either the exact same make/model or at least have the same network interface names?

I'm not aware of anything that would cause trouble like what you are seeing, so it's not quite clear what the next step might be to narrow it down.

What are your pfsync settings in the GUI on each node?

What does the interface config on both look like for the interface where the sync happens (e.g. ifconfig igbX)?

What does the status of the pfsync0 interface look like (ifconfig pfsync0)?

pszafer

@jimp said in pfSense 2.5.2 crash when enable Synchronize states:

Is the hardware setup identical? Meaning are the two nodes either the exact same make/model or at least have the same network interface names?

The hardware is different. The backup router is a VM. The assigned interface names are identical, but the underlying interface IDs are different: the first node is based on lagg0 and the second one on vtnet0.

I'm not aware of anything that would cause trouble like what you are seeing, so it's not quite clear what the next step might be to narrow it down.

What are your pfsync settings in the GUI on each node?

pfSync on 1st node:

pfSync on 2nd node:

What does the interface config on both look like for the interface where the sync happens (e.g. ifconfig igbX)?

What does the status of the pfsync0 interface look like (ifconfig pfsync0)?

For the first node, as HA is disabled right now, it looks like:

pfsync0: flags=0<> metric 0 mtu 1500
        groups: pfsync

For the second one:

pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1500
        pfsync: syncdev: vtnet0.13 syncpeer: 192.168.100.1 maxupd: 128 defer: off
        syncok: 1
        groups: pfsync

I have pfSense in a second branch of our office with a similar setup (hardware router1 and virtual router2) on "old" pfSense 2.5.0, and it still works there without a problem.
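
For anyone comparing the two `ifconfig pfsync0` outputs quoted above, a small Python sketch like the following can pull out the fields that matter. It is only an illustration: `parse_pfsync` is a made-up helper name, and the regular expressions assume the output layout shown in this post rather than any documented format.

```python
import re


def parse_pfsync(ifconfig_output: str) -> dict:
    """Extract flags, syncdev and syncpeer from `ifconfig pfsync0` text.

    Assumes the layout shown in this thread; missing fields become None.
    """
    flags_match = re.search(r"flags=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    syncdev_match = re.search(r"syncdev:\s+(\S+)", ifconfig_output)
    syncpeer_match = re.search(r"syncpeer:\s+(\S+)", ifconfig_output)
    flag_text = flags_match.group(1) if flags_match else ""
    return {
        "flags": [flag for flag in flag_text.split(",") if flag],
        "syncdev": syncdev_match.group(1) if syncdev_match else None,
        "syncpeer": syncpeer_match.group(1) if syncpeer_match else None,
    }


# The two outputs quoted in this post.
NODE1 = """pfsync0: flags=0<> metric 0 mtu 1500
        groups: pfsync
"""
NODE2 = """pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1500
        pfsync: syncdev: vtnet0.13 syncpeer: 192.168.100.1 maxupd: 128 defer: off
        syncok: 1
        groups: pfsync
"""

for name, text in (("node1", NODE1), ("node2", NODE2)):
    print(name, parse_pfsync(text))
```

With state sync disabled on the first node, the parser finds no flags and no sync device there, while the second node reports `vtnet0.13` as its sync device.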

jimp

OK, if the interface names are different then pfsync isn't doing anything for you at the moment.

States are bound to specific interfaces and if those names are not identical on both (e.g. both lagg0 for the same interface like WAN/LAN or whatever) then the states would never match on the secondary node and vice versa.

That will change on future versions but is still true for now.

That may be the source of your problem, though I don't recall seeing it crash like that before.

For now I'd recommend keeping state synchronization off since it doesn't give you any benefits. That, or find a way to make the interface names line up, like changing the VM to use laggX interfaces even if the laggs only have a single member.
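
As a rough way to check whether the names line up, the assignments from both nodes can be compared side by side. The Python sketch below is hypothetical: the `mismatched_interfaces` helper and the assignment dictionaries are invented for illustration (only `lagg0`, `vtnet0` and `vtnet0.13` come from this thread; the other VLAN numbers are assumptions), and nothing here reads a real pfSense config.

```python
def mismatched_interfaces(primary: dict, secondary: dict) -> dict:
    """Return {assigned_name: (primary_ifname, secondary_ifname)} for every
    assigned interface whose underlying interface name differs between nodes."""
    result = {}
    for assigned in sorted(set(primary) | set(secondary)):
        primary_if = primary.get(assigned)
        secondary_if = secondary.get(assigned)
        if primary_if != secondary_if:
            result[assigned] = (primary_if, secondary_if)
    return result


# Hypothetical assignments: hardware node on lagg0, VM node on vtnet0.
primary = {"wan": "lagg0.10", "lan": "lagg0.20", "sync": "lagg0.13"}
secondary = {"wan": "vtnet0.10", "lan": "vtnet0.20", "sync": "vtnet0.13"}

# Every assigned interface differs, so synced states could never match.
print(mismatched_interfaces(primary, secondary))

# If the VM used single-member laggs instead, the names would line up.
secondary_with_lagg = {"wan": "lagg0.10", "lan": "lagg0.20", "sync": "lagg0.13"}
print(mismatched_interfaces(primary, secondary_with_lagg))
```

An empty result is the goal: it means each assigned interface maps to the same underlying name on both nodes.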

pszafer

@jimp I have it off for now, but I'm curious what's wrong.
You're saying that the interface names should be the same, but would a mismatch actually cause pfSense to crash?
If so, then if I turn off the second node and enable pfSync over multicast, in theory it should come up without crashing?

jimp

I don't expect it to crash, but if the interfaces don't match then state sync does nothing for you. The states are there but they cannot match traffic.

Using an example here, say the LAN on the primary is ix0 and on the secondary it is igb0. The state will have something like this (very much simplified): ix0 from a.a.a.a:xxxx -> b.b.b.b:yyy

So even if the addresses match on the secondary, because the interface does not match, the state is ignored and the traffic would be dropped.
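
To make the matching rule concrete, here is a toy model in Python. It is only a sketch under heavy assumptions: pf's real state table is far more involved, and the `State` tuple, the `find_state` helper and the addresses (documentation ranges) are invented for illustration.

```python
from typing import NamedTuple, Optional, Set


class State(NamedTuple):
    """Very simplified stand-in for a pf state entry (not pf's real layout)."""
    iface: str
    src: str
    dst: str


def find_state(table: Set[State], iface: str, src: str, dst: str) -> Optional[State]:
    """Return the matching state, or None if nothing matches.

    The interface name is part of the key, so the addresses alone are not enough.
    """
    key = State(iface, src, dst)
    return key if key in table else None


# State created on the primary, where the LAN is ix0.
primary_states = {State("ix0", "192.0.2.10:50000", "198.51.100.5:443")}

# State sync copies the entry as-is, including the interface name.
secondary_states = set(primary_states)

# After failover the same traffic arrives on the secondary's LAN, igb0.
hit = find_state(secondary_states, "igb0", "192.0.2.10:50000", "198.51.100.5:443")
print("match on igb0:", hit)

# The same lookup with the primary's interface name would have matched.
hit_same_name = find_state(secondary_states, "ix0", "192.0.2.10:50000", "198.51.100.5:443")
print("match on ix0:", hit_same_name)
```

The first lookup finds nothing even though both addresses are identical, which is the situation described above: the synced state exists on the secondary but can never match traffic.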

If you fail over from one node to the other, all the connections have to be rebuilt by clients. In practice, you won't notice it much, since most connections are short-lived or use things like UDP, which is technically stateless. You'll see it most with persistent / long-term connections.

If you just do something like ping you may not notice since it's stateless as well.

What I'm saying is you have two choices:

1. Make the interface names match one way or another and keep pfsync enabled. I don't know if your crash is related to this, so it could still happen in this case, but it would narrow down the cause at least.
2. Disable pfsync and live with just config sync and let states be held individually on the HA nodes, which is essentially what you're doing now and didn't know.

SteveITS

@jimp said in pfSense 2.5.2 crash when enable Synchronize states:

That will change on future versions

Yay :) will be nice when replacing hardware.

@pszafer State sync does work with a one-NIC LAGG but that doesn't work with traffic shaping.

jimp

If you can tag a VLAN on that interface, traffic shaping does work with LAGG+VLANs since at that point traffic shaping is only on the VLAN, not the LAGG directly.

We hoped the updated pf code that lets this work would be in 2.5.2, but it still needed some work and had to be backed out.

It's in 2.6.0 snapshots already but still needs work; it may be a couple of weeks before it's in a state where this would be testable in a viable way.