XMLRPC sync errors since upgrade to 2.4.4

Derelict

XMLRPC sync is working fine for lots and lots of people on 2.4.4, 2.4.4-p1, and 2.4.4-p2. Something else unique to your setup is causing this.

Do you have State Killing on Gateway Failure enabled? (System > Advanced, Miscellaneous)

DrNick

@derelict Perhaps that should be "setups"? The problem still exists for me too on -p1, with the config I described above.

netblues

@derelict XMLRPC sync IS working fine, even with the error.
And yes, state killing on gateway failure seems to be the cause.
Unchecking the box eliminates the XMLRPC sync errors.
I don't recall anymore why this was checked in the first place, but IMHO looks like a bug to me.

bbrendon

It seems to me the pfSense devs are still in denial about this one. The syncing works, so I just ignore it.

Derelict

@netblues said in XMLRPC sync errors since upgrade to 2.4.4:

I don't recall anymore why this was checked in the first place, but IMHO looks like a bug to me.

If you are killing the state that XMLRPC sync is using, the connection will fail in various ways.

jimp

There is no bug. There is nothing to be in denial about.

You chose the option to kill states on gateway failure
You have a gateway down
XMLRPC sync triggers a filter reload
Firewall notices the down gateway and kills states
XMLRPC dies because the state died

It's doing exactly what you told it to do. It may not be what you intended it to do, but it's doing what you told it to do.

Fix the down gateway or unset that option.
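To make the chain of events listed above concrete, here is a toy Python model. It is not pfSense code: the `Node` class, the `xmlrpc_sync` function, and the gateway name are hypothetical, and the model only mirrors the behaviour as described in this post, namely that a filter reload on a node with a down gateway and the option enabled flushes the entire state table, including the state of the XMLRPC connection.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of the node receiving the sync; NOT actual pfSense code."""
    kill_states_on_gw_failure: bool     # System > Advanced, Miscellaneous
    down_gateways: list[str]            # gateways currently marked down
    states: set[str] = field(default_factory=set)

    def filter_reload(self) -> None:
        # As described above: every filter reload checks for down gateways
        # and, if the option is enabled, flushes the WHOLE state table,
        # not only the states that use the down gateway.
        if self.kill_states_on_gw_failure and self.down_gateways:
            self.states.clear()


def xmlrpc_sync(peer: Node) -> str:
    conn = "xmlrpc-sync-connection"   # hypothetical state for the sync session
    peer.states.add(conn)             # the sync connection creates a state
    peer.filter_reload()              # the sync triggers a filter reload
    # The sync only completes cleanly if its state survived the reload.
    return "ok" if conn in peer.states else "error: state killed mid-sync"


print(xmlrpc_sync(Node(True, ["OPT1_PPPOE"])))   # option on, gateway down -> error
print(xmlrpc_sync(Node(False, ["OPT1_PPPOE"])))  # option off -> ok
print(xmlrpc_sync(Node(True, [])))               # option on, no gateway down -> ok
```

The two clean outcomes correspond to the two fixes in the post: clear the down gateway, or unset the option.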

netblues

@jimp So what you're saying is that whenever I update a firewall rule, I have a gateway down?

jimp

Any time there is a filter reload (applying firewall rules, interface events, schedules, etc.), it checks for down gateways and kills states if you have that option enabled.

Caligari

@derelict said in XMLRPC sync errors since upgrade to 2.4.4:

Do you have State Killing on Gateway Failure enabled? (System > Advanced, Miscellaneous)

Yes! It was checked on the primary and unchecked on the secondary. I unchecked it on both, and the problem has disappeared.

Now I am wondering how "state killing on gateway failure" is related to "XMLRPC sync"...

Thank you for the support!

netblues

This certainly wasn't the case in pre-2.4.4 setups.

So it really kills all states on gateway failure, not just the ones related to that gateway.
In my case, 2 gateways are down on the secondary (because they are used by the primary).
Disabling the option on the secondary and keeping it on the primary (which normally has no down gateways) works fine.

I suppose that when all states are killed, nginx loses the connection while waiting for the final OK from the standby peer, hence the complaint.
I just wonder whether, in @Caligari's situation, state killing on the primary also affects the admin HTTP connection.

jimp

It's been the same since 2.3.x, not a new change. If it worked before, it was only by accident.

netblues

@jimp So, an accidental feature then :). Which makes me wonder whether killing all states really worked on pre-2.4.4 setups.
I do recall switching from master to backup while checking VoIP connections and not losing them, or the HTTPS connection to the console.
I just checked, and on 2.4.4-p2 it now affects the web console, as it should.

Win-win situation :P

netblues

Now I know why I have state killing on gateway failure enabled:
SIP states!!
Without it, SIP registrations don't work after a failover until the states are cleared manually.
The funny thing is that calls already in progress through pf aren't lost and keep working via the failover peer.
However, new calls don't work.
At the same time, the SIP host can ping all remote SIP gateways through pfSense just fine.

I believe we need an exclusion here. The sync interface is a special-purpose interface, and it doesn't have a gateway either.
How about a feature that does not clear states on interfaces that do NOT have a gateway?
Would that break anything?
(It would be better to fix the SIP issue, as it dates back years too.)

jimp

The only way that happens is if you have a gateway somewhere that is down. The sync interface wouldn't normally be considered at all, it's just an innocent bystander. Look at your gateway list and see what shows as 'down', and fix that.

netblues

@jimp Innocent it is. However, it does produce lots of noise emails.
As for the gateways, well, nothing is down apart from the OpenVPN gateways bound to CARP interfaces, which come up when the secondary node kicks in.
So... it is technically down, but it can't be "fixed" since it ain't broken :)

I understand, it's a feature, but...

jimp

Unless you are using those OpenVPN gateways in a gateway group, you can disable monitoring for them so they are always considered up, and thus would not trigger the state kill.
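A toy Python model of this workaround follows. It is not pfSense code; the `Gateway` class and the helper functions are hypothetical. It assumes, per the post, that a gateway with monitoring disabled is always considered up and therefore cannot trigger the state kill, and that the workaround only suits gateways outside a gateway group (presumably because a group relies on real gateway status to decide when to fail over).

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    """Toy model of a gateway's status; NOT actual pfSense code."""
    name: str
    monitoring_enabled: bool
    answers_probes: bool            # e.g. False for OpenVPN on the CARP backup node
    in_gateway_group: bool = False

    @property
    def status(self) -> str:
        # Per the post: an unmonitored gateway is always considered up.
        if not self.monitoring_enabled:
            return "up"
        return "up" if self.answers_probes else "down"


def can_disable_monitoring(gw: Gateway) -> bool:
    # Assumption: group members keep monitoring so failover still works.
    return not gw.in_gateway_group


def triggers_state_kill(gateways: list[Gateway]) -> bool:
    # With the option enabled, any down gateway causes a full state flush.
    return any(gw.status == "down" for gw in gateways)


# Hypothetical OpenVPN gateway bound to a CARP VIP, seen from the backup node.
vpn = Gateway("OVPN_CARP", monitoring_enabled=True, answers_probes=False)
print(triggers_state_kill([vpn]))   # True
if can_disable_monitoring(vpn):
    vpn.monitoring_enabled = False
print(triggers_state_kill([vpn]))   # False
```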

netblues

@jimp As a matter of fact I do, but it's starting to feel too limiting anyway, don't you think?
Especially since this was "fixed" in 2.4.4.
And what if one has PPPoE interfaces bound to CARP VIPs, which is much more common, and also needs gateway monitoring?

jimp

Again, nothing changed here in 2.4.4. If it worked at all before, it was by coincidence. This has always been the expected behavior of state killing on gateway failure when you have gateways that are down.

netblues

I totally agree that this was the EXPECTED behaviour.
Moving forward, the situation is simple.
Anyone with state killing on gateway failure and an active/standby pair gets XMLRPC sync errors on the primary on every change (by email too), which raise concerns even for experienced network admins.

There are good reasons to have state killing enabled (VoIP being the main one), and in an active/standby setup it is not always possible to avoid having gateways that are down, by design (PPPoE is too dominant to ignore).
So I humbly request a feature enhancement that eliminates the errors (exempting the sync interface from state clearing is probably the most straightforward solution).

jimp

It is avoidable if you configure it as I stated above. The sync interface has nothing to do with it. It isn't nearly as "simple" as you imply. The states are flushed entirely, as they must be; there is no way to make an exception for any interface in pfctl.

If you want that, make a feature request upstream for pfctl in FreeBSD.