XMLRPC sync errors since upgrade to 2.4.4

AcaaliK

Hello All,

I am facing the same issue after an upgrade from 2.4.3 to 2.4.4. I have gone through all the checks suggested in this thread and most are OK, with the exception of an entry in the secondary's system logs under the General tab. The error is XMLRPC unbound /var/unbound/root.key corrupt deleted and recreated each time a sync is performed.

On the primary node I get the sporadic XMLRPC communication errors described here. Please note that the sync is successful and the changes from the primary are reflected on the secondary with some delay. This only started after the upgrade.

netblues

I'm facing exactly the same issue after upgrading to 2.4.4p1 from 2.4.3.
Settings are replicated; however, I see this on the secondary:

nginx: 2018/12/17 16:36:37 [crit] 79693#100242: *18691 SSL_write() failed (SSL:) (13: Permission denied) while sending to client, client: 192.168.50.3, server: , request: "POST /xmlrpc.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "192.168.50.4"

50.3 is the primary and 50.4 is the secondary
Any ideas?

It looks like the config is received but the ack is never sent back to the primary, hence the complaint.
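For anyone comparing these log lines across nodes, here is a minimal sketch that pulls the interesting fields out of the nginx error line quoted above. The regular expression and the function name `parse_nginx_error` are made up for illustration and are only checked against the line in this post, not against every nginx log format.

```python
import re

# Sample line copied from the post above.
LINE = (
    'nginx: 2018/12/17 16:36:37 [crit] 79693#100242: *18691 SSL_write() failed '
    '(SSL:) (13: Permission denied) while sending to client, client: 192.168.50.3, '
    'server: , request: "POST /xmlrpc.php HTTP/1.1", '
    'upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "192.168.50.4"'
)

# Hypothetical pattern: log level, failing call, errno and message,
# client address (the peer that connected), request line, and local host.
PATTERN = re.compile(
    r'\[(?P<level>\w+)\].*?'
    r'(?P<call>\w+)\(\) failed.*?'
    r'\((?P<errno>\d+): (?P<errmsg>[^)]+)\).*?'
    r'client: (?P<client>[\d.]+),.*?'
    r'request: "(?P<request>[^"]+)".*?'
    r'host: "(?P<host>[^"]+)"'
)


def parse_nginx_error(line):
    """Return a dict of fields from one nginx error line, or None if no match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


fields = parse_nginx_error(LINE)
print(fields["call"], fields["errno"], fields["client"], "->", fields["host"])
```

Here `client` is the node that opened the connection (the primary, 50.3) and `host` is the node that logged the failure (the secondary, 50.4), which fits the reading that the reply back to the primary is what fails.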

Derelict

Permission denied is almost always something being blocked by policy.

Are you running snort or suricata?

Is it enabled on the sync interface?

netblues

@derelict No snort or suricata ever installed. Not even pfblocker.

stephenw10

Do you see that same error if you just save the Unbound settings page on the secondary without making any changes?

Does Unbound actually start on the secondary?

Is the filesystem full?

Steve

netblues

on the secondary...

@stephenw10

/root: df -h
Filesystem                                         Size    Used   Avail Capacity  Mounted on
/dev/gptid/5bd4713a-8d68-11e8-aed9-5b3c92e7c0e9     18G    1.0G     16G     6%    /
devfs                                              1.0K    1.0K      0B   100%    /dev
/dev/md0                                           3.4M    156K    3.0M     5%    /var/run
devfs                                              1.0K    1.0K      0B   100%    /var/dhcpd/dev

 ps -alx | grep unb
 59 91398     1   0  20  0 48640 23480 kqread   Is    -      0:00.16 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
  0 86919 48899   0  20  0  6564  2456 piperd   S+    0      0:00.00 grep unb

And no, it does not happen if I save the DNS Resolver settings on the secondary.

netblues

So I reverted everything to HTTP:

nginx: 2018/12/19 10:01:20 [alert] 91226#100384: *10 writev() failed (13: Permission denied) while sending to client, client: 192.168.50.3, server: , request: "POST /xmlrpc.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "192.168.50.4"

So it is not an SSL issue either.
Where does this permission denied come from?

jimp

Permission denied in the nginx log means something cut it off, like a state being killed/removed or possibly a firewall rule prevented the outbound connection.
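As a small aside, the "13" in "(13: Permission denied)" is the operating system's errno value that nginx got back from the write call, not an nginx-specific code. The sketch below only decodes that number. The link to the firewall is the explanation given above: as far as I know, on FreeBSD a packet dropped by pf on the way out is reported to the sending process as this same error, which would be why it shows up for both SSL_write() and writev().

```python
import errno
import os

# The number reported in the nginx log line: "(13: Permission denied)".
code = 13

# Symbolic name for the errno value on this platform.
name = errno.errorcode[code]

# Human-readable message for the same value.
message = os.strerror(code)

print(f"errno {code} = {name}: {message}")
```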

netblues

@jimp Well, definitely not on the configuration. It is repeatable on every config update, and others have it too.
If it were Linux I would look at SELinux...
Now, since we are talking about FreeBSD here, it looks like an audit denial (but my FreeBSD knowledge is limited).

Caligari

@jimp said in XMLRPC sync errors since upgrade to 2.4.4:

Permission denied in the nginx log means something cut it off, like a state being killed/removed or possibly a firewall rule prevented the outbound connection.

We have been suffering from this problem since the 2.4.4 upgrade, and it persisted with 2.4.4-p1...

Does 2.4.4-p2 solve this problem? (it announces a lot of bugfixes with nginx/php)

Derelict

XMLRPC sync is working fine for lots and lots of people in 2.4.4, 2.4.4-p1 or 2.4.4-p2. It is something else unique to your setup.

Do you have State Killing on Gateway Failure enabled? (System > Advanced, Miscellaneous)

DrNick 0

@derelict Perhaps that should be "setups"? Problem still exists for me too with my config described above on -p1.

netblues

@derelict xmlrpc sync IS working fine even with the error.
And yes, state killing on gateway failure seems to nail it.
Unchecking the box eliminates xmlrpcsync errors.
I don't recall anymore why this was checked in the first place, but IMHO looks like a bug to me.

bbrendon

It seems to me the pfSense devs are still in denial about this one. The syncing is working, so I just ignore it.

Derelict

@netblues said in XMLRPC sync errors since upgrade to 2.4.4:

I don't recall anymore why this was checked in the first place, but IMHO looks like a bug to me.

If you are killing the state XMLRPC sync is using, the connection will fail in different ways.

jimp

There is no bug. There is nothing to be in denial about.

You chose the option to kill states on gateway failure
You have a gateway down
XMLRPC sync triggers a filter reload
Firewall notices the down gateway and kills states
XMLRPC dies because the state died

It's doing exactly what you told it to do. It may not be what you intended it to do, but it's doing what you told it to do.

Fix the down gateway or unset that option.
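The sequence above can be sketched as a toy model. This is not pfSense code: the `Firewall` class, its field names, and `sync_finishes_cleanly` are all hypothetical, and the model assumes that every state is cleared when the option is set and any gateway is down, which is how the behaviour is described in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    """Hypothetical stand-in for one node, not actual pfSense internals."""
    kill_states_on_gateway_failure: bool
    gateways_up: dict  # gateway name -> True if up
    states: set = field(default_factory=set)

    def filter_reload(self):
        # On every reload, check for down gateways and, if the option
        # is enabled, kill states (assumed here: all of them).
        any_down = not all(self.gateways_up.values())
        if self.kill_states_on_gateway_failure and any_down:
            self.states.clear()


def sync_finishes_cleanly(fw):
    """Model one XMLRPC sync arriving on the node `fw`."""
    conn = "xmlrpc 192.168.50.3 -> 192.168.50.4"
    fw.states.add(conn)      # the sync connection has a state
    fw.filter_reload()       # the sync itself triggers a filter reload
    return conn in fw.states # can the reply still be sent?


down = {"WAN1_GW": False, "WAN2_GW": True}
print(sync_finishes_cleanly(Firewall(True, down)))   # option set, gateway down
print(sync_finishes_cleanly(Firewall(False, down)))  # option unset
```

In this model the config has already been applied by the time the state disappears, which matches the reports that the sync works but the primary still logs a communication error.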

netblues

@jimp So what you say is that whenever I update a firewall rule I have a gateway down?

jimp

Any time there is a filter reload (applying firewall rules, interface events, schedules, etc) it checks for down gateways and kills states if you have that option enabled.

Caligari

@derelict said in XMLRPC sync errors since upgrade to 2.4.4:

Do you have State Killing on Gateway Failure enabled? (System > Advanced, Miscellaneous)

Yes! It was checked on the primary and unchecked on the secondary. I unchecked it on both and the problem has disappeared.

Now, I am wondering in what way "state killing on gw failure" is related to the "xmlrpc sync"...

Thank you for the support!

netblues

It wouldn't be the case in a pre-2.4.4 setup, for sure.

So it really is killing all states on gateway failure, not just the ones related to the gateway.
In my case I have 2 gateways down on the secondary (because they are used by the primary).
Disabling the option on the secondary and keeping it on the primary (which normally has no down gateway) works fine.

I suppose that if all states are killed, nginx loses the connection while waiting for the final OK from the standby peer, hence the complaint.
I just wonder whether, in @Caligari's situation, state killing on the primary also affects the admin HTTP connection.