XMLRPC Sync makes backup node's GUI unresponsive
-
I'm setting up a simple cluster (1 LAN, 1 WAN and a SYNC interface), both nodes running 2.3, but I'm having trouble with synchronization: if I force an update or make any change that requires a sync, the backup node's GUI stops responding.
On the master node, I get a notification stating "A communications error occurred while attempting XMLRPC sync with username admin https://10.5.2.2:443"
On the backup node I get a 504 from nginx, and to get it back up I just need to restart PHP-FPM. In that node's logs I get
"[error] 87989#0: *93 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.181.0.201, server: , request: "GET /getstats.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.181.0.52", referrer: "https://10.181.0.52/""
Saw this issue in the 2.3.1 redmine (https://redmine.pfsense.org/issues/6328), which seems to be related, and just in case did an upgrade to 2.3.1-DEVELOPMENT but the problem remains the same.
Now after the restart, the log shows the message "rc.php-fpm_restart >>> Found XMLRPC lock. Removing" and the GUI starts responding again, just like it did on the release version.
Sometimes after the php-fpm restart, some things sync. In the nginx log I see this:
May 16 16:38:30 gateway-2 gateway-2.domain.com nginx: 10.5.2.1 - admin [16/May/2016:16:38:30 -0300] "POST /xmlrpc.php HTTP/1.0" 200 763 "-" "PEAR XML_RPC"
May 16 16:38:49 gateway-2 gateway-2.domain.com nginx: 10.5.2.1 - admin [16/May/2016:16:38:49 -0300] "POST /xmlrpc.php HTTP/1.0" 200 145 "-" "PEAR XML_RPC"
May 16 16:39:49 gateway-2 gateway-2.domain.com nginx: 10.5.2.1 - admin [16/May/2016:16:39:49 -0300] "POST /xmlrpc.php HTTP/1.0" 499 0 "-" "PEAR XML_RPC"
I guess that 499 is a timeout on the client side, and that it triggers the message on the master (after that, the GUI on the backup kept working).
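For what it's worth, 499 is nginx's non-standard status for "client closed request", so the guess looks plausible. Here is a minimal Python sketch for picking those entries out of a longer log, assuming the syslog-prefixed access-log format quoted above. The function name and regex are my own, not anything pfSense ships.

```python
import re

# Assumed format: the syslog-prefixed nginx access lines quoted in this post.
LINE_RE = re.compile(
    r'nginx: (?P<client>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)

def find_aborted_syncs(lines):
    """Return (time, client) for each /xmlrpc.php request logged with status 499."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("path") == "/xmlrpc.php" and m.group("status") == "499":
            hits.append((m.group("time"), m.group("client")))
    return hits

# The three access-log lines from this post, unchanged.
sample = [
    'May 16 16:38:30 gateway-2 gateway-2.domain.com nginx: 10.5.2.1 - admin [16/May/2016:16:38:30 -0300] "POST /xmlrpc.php HTTP/1.0" 200 763 "-" "PEAR XML_RPC"',
    'May 16 16:38:49 gateway-2 gateway-2.domain.com nginx: 10.5.2.1 - admin [16/May/2016:16:38:49 -0300] "POST /xmlrpc.php HTTP/1.0" 200 145 "-" "PEAR XML_RPC"',
    'May 16 16:39:49 gateway-2 gateway-2.domain.com nginx: 10.5.2.1 - admin [16/May/2016:16:39:49 -0300] "POST /xmlrpc.php HTTP/1.0" 499 0 "-" "PEAR XML_RPC"',
]

print(find_aborted_syncs(sample))
```

The 499 entry is stamped exactly 60 seconds after the previous request, which would fit a 60-second timeout on the master's side, though that is only an inference from these three lines.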
Could this be a bug or a misconfiguration on my side?
Edit: Just in case, both hosts are running on top of two Proxmox 4.2 servers; LAN and SYNC are actually VLANs on one of the Proxmox interfaces.
Also, the backup node works without problems when it is acting as master.
pfBlocker's sync does the same thing: "/usr/local/www/pfblockerng/pfblockerng.php: XML_RPC_Client: RPC server did not send response before timeout. 103"
-
Having kinda the same issue here, but in my case it's physical hardware with a dedicated interface for sync. After some firewall rule changes the 2nd firewall stops responding and I get this error on the first one:
A communications error occurred while attempting XMLRPC sync with username admin https://10.222.0.3:443. @ 2016-05-20 11:48:47
I'm only able to sync again after a php-fpm restart. It has already happened 3 times in about 10 minutes…
-
Still having the same issue. It's becoming really hard to manage, since I have to restart php-fpm over and over to remove the lock…
Last logs from nginx-errors:
2016/05/25 12:11:13 [error] 41431#0: *199 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.10.100.80, server: , request: "GET /widgets/widgets/system_information.widget.php?getupdatestatus=1 HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.10.115.238", referrer: "https://10.10.115.238/"
2016/05/25 13:29:04 [error] 41431#0: *2335 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.10.100.80, server: , request: "GET /widgets/widgets/system_information.widget.php?getupdatestatus=1 HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.10.115.238", referrer: "https://10.10.115.238/"
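To see which GUI requests are the ones stalling, the error log can be grouped by request URI. This is a rough Python sketch, assuming the nginx error-log format shown in these entries. The function name and regex are illustrative only.

```python
import re
from collections import Counter

# Assumed format: nginx "upstream timed out" error entries as quoted in this post.
TIMEOUT_RE = re.compile(
    r'upstream timed out .*? request: "(?P<method>\S+) (?P<uri>\S+) [^"]*"'
)

def count_timed_out_requests(log_text):
    """Count nginx 'upstream timed out' errors per request URI."""
    return Counter(m.group("uri") for m in TIMEOUT_RE.finditer(log_text))

# The two error-log entries from this post, unchanged apart from line wrapping.
sample = (
    '2016/05/25 12:11:13 [error] 41431#0: *199 upstream timed out (60: Operation timed out) '
    'while reading response header from upstream, client: 10.10.100.80, server: , '
    'request: "GET /widgets/widgets/system_information.widget.php?getupdatestatus=1 HTTP/1.1", '
    'upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.10.115.238", '
    'referrer: "https://10.10.115.238/"\n'
    '2016/05/25 13:29:04 [error] 41431#0: *2335 upstream timed out (60: Operation timed out) '
    'while reading response header from upstream, client: 10.10.100.80, server: , '
    'request: "GET /widgets/widgets/system_information.widget.php?getupdatestatus=1 HTTP/1.1", '
    'upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.10.115.238", '
    'referrer: "https://10.10.115.238/"\n'
)

for uri, count in count_timed_out_requests(sample).most_common():
    print(count, uri)
```

In these two entries the stalled request is the dashboard widget's update-status check both times. That may hint that the hang involves something the firewall tries to reach while checking for updates, but two log lines are not enough to be sure.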
And from system.log:
May 24 11:25:59 fw2 xinetd[25532]: Starting reconfiguration
May 24 11:25:59 fw2 xinetd[25532]: Swapping defaults
May 24 11:25:59 fw2 xinetd[25532]: readjusting service 6969-udp
May 24 11:25:59 fw2 xinetd[25532]: Reconfigured: new=0 old=1 dropped=0 (services)
May 24 11:25:59 fw2 php-cgi: rc.banner: PHP ERROR: Type: 1, File: /etc/inc/rrd.inc, Line: 60, Message: Call to undefined function gettext()
May 24 11:26:00 fw2 php-fpm[71032]: /xmlrpc.php: The command '/usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid lagg0_vlan102 lagg0_vlan103 lagg0_vlan104 lagg0_vlan105 lagg0_vlan106 lagg0_vlan107 lagg0_vlan108' returned exit code '1', the output was 'Internet Systems Consortium DHCP Server 4.3.3-P1
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Config file: /etc/dhcpd.conf
Database file: /var/db/dhcpd.leases
PID file: /var/run/dhcpd.pid
Wrote 6882 leases to leases file.
Listening on BPF/lagg0_vlan108/a0:36:9f:91:3c:c9/10.108.0.0/16
Sending on BPF/lagg0_vlan108/a0:36:9f:91:3c:c9/10.108.0.0/16
Listening on BPF/lagg0_vlan107/a0:36:9f:91:3c:c9/10.107.0.0/16
Sending on BPF/lagg0_vlan107/a0:36:9f:91:3c:c9/10.107.0.0/16
Listening on BPF/lagg0_vlan106/a0:36:9f:91:3c:c9/10.106.0.0/16
Sending on BPF/lagg0_vlan106/a0:36:9f:91:3c:c9/10.106.0.0/16
Listening on BPF/lagg0_vlan105/
And not sure if it is related but fw2 is creating crash reports with the following:
Crash report begins. Anonymous machine information:
amd64
10.3-RELEASE-p3
FreeBSD 10.3-RELEASE-p3 #1 3ef16fb(RELENG_2_3_1): Tue May 17 19:34:13 CDT 2016 root@ce23-amd64-builder:/builder/pfsense-231/tmp/obj/builder/pfsense-231/tmp/FreeBSD-src/sys/pfSense
Crash report details:
PHP Errors:
[25-May-2016 13:31:09 Europe/Berlin] PHP Fatal error: Call to undefined function gettext() in /etc/inc/rrd.inc on line 60
[25-May-2016 13:31:09 Europe/Berlin] PHP Fatal error: Call to undefined function gettext() in /etc/inc/rrd.inc on line 60
Any idea?
Thanks
-
I know this post is old, but I am having the same issues related to the backup firewall and the 502 bad gateway error. Has anyone found a solution?
-
I'm getting this exact same issue when trying to do an XMLRPC sync. In my specific case, it may or may not be related to HAProxy, which is the only package that I'm also syncing.
Has anyone got any ideas on this? A whole lot of nothing in the last few months.
Thanks!
-
The first thing to check is whether the secondary can resolve names, check for updates, etc. while in backup status. And if not, why not.
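One way to test that from the secondary itself is sketched below in Python. It assumes a Python interpreter is available on the box, which is not a given on a stock install. The function name and the hostname in the usage note are placeholders, and the same check can be done by hand from Diagnostics or a shell.

```python
import socket

def check_name(hostname, port=443, timeout=5.0):
    """Resolve hostname and try a TCP connection; return a short status string."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Name resolution itself failed: look at the DNS servers and routing
        # the node uses while it is in backup status.
        return f"DNS FAILED: {exc}"
    addr = infos[0][4]  # sockaddr of the first result: (ip, port, ...)
    try:
        with socket.create_connection(addr[:2], timeout=timeout):
            return f"OK: {addr[0]}"
    except OSError as exc:
        # The name resolved but the host is unreachable: look at outbound
        # NAT, the gateway, and which source address the backup node uses.
        return f"CONNECT FAILED to {addr[0]}: {exc}"
```

For example, run `print(check_name("www.pfsense.org"))` on the secondary while it is in backup status. The hostname is only an illustration, so substitute whatever the update check on your install actually contacts. A "DNS FAILED" or "CONNECT FAILED" result there, but not when the node is master, would point at the outbound path rather than at XMLRPC itself.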