[solved in 2.2.6] 2.2.5 problems with XMLRPC sync - the server 2 is not updated
-
Any errors on the secondary in its system logs?
I can't reproduce any sync issues here on my 2.2.5 cluster.
-
Hi jimp,
unfortunately there is nothing in the log files of the server 2:
Nov 9 13:05:57 kernel: xl0: link state changed to DOWN Nov 9 13:05:57 check_reload_status: Linkup starting xl0 Nov 9 13:01:57 kernel: done. Nov 9 13:01:57 syslogd: kernel boot file is /boot/kernel/kernel Nov 9 13:01:56 syslogd: exiting on signal 15 Nov 9 13:01:56 kernel: done. Nov 9 13:01:56 php: rc.bootup: Creating rrd update script Nov 9 13:01:56 php: rc.bootup: The command '/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/ipsec-packets.rrd N:U:U:U:U:U:U:U:U' returned exit code '1', the output was 'ERROR: expected 4 data source readings (got 8) from N:U:U:U:U:U:U:U:U' Nov 9 13:01:56 php: rc.bootup: The command '/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/ipsec-traffic.rrd N:U:U:U:U:U:U:U:U' returned exit code '1', the output was 'ERROR: expected 4 data source readings (got 8) from N:U:U:U:U:U:U:U:U' Nov 9 13:01:29 php-fpm[286]: /index.php: Successful login for user 'admin' from: 10.0.26.4 Nov 9 13:01:29 php-fpm[286]: /index.php: Successful login for user 'admin' from: 10.0.26.4 Nov 9 13:01:26 snmpd[4584]: disk_OS_get_disks: adding device 'ada0' to device list Nov 9 13:01:23 kernel: done Nov 9 13:01:23 kernel: .done. Nov 9 13:01:22 kernel: .. Nov 9 13:01:21 kernel: . Nov 9 13:01:21 kernel: . Nov 9 13:01:20 check_reload_status: Updating all dyndns Nov 9 13:01:20 kernel: done. Nov 9 13:01:19 php: rc.bootup: Static Routes: Gateway IP could not be found for 192.168.32.0/24 Nov 9 13:01:19 php: rc.bootup: The command '/sbin/route change -inet '192.168.214.195/32' -iface 'ovpnc2'' returned exit code '68', the output was 'route: bad address: ovpnc2' Nov 9 13:01:19 kernel: done. Nov 9 13:01:19 php: rc.bootup: ROUTING: setting default route to XXX.XXX.XXX.XXX Nov 9 13:01:19 lighttpd[81817]: (log.c.194) server started Nov 9 13:01:18 kernel: done. Nov 9 13:00:57 kernel: done. Nov 9 13:00:55 kernel: .done. Nov 9 13:00:55 kernel: .. Nov 9 13:00:54 kernel: . Nov 9 13:00:54 kernel: . Nov 9 13:00:53 kernel: pflog0: promiscuous mode enabled Nov 9 13:00:53 kernel: tun1: changing name to 'ovpnc1' Nov 9 13:00:53 kernel: Nov 9 13:00:53 kernel: done. Nov 9 13:00:53 php: rc.bootup: Resyncing OpenVPN instances. Nov 9 13:00:53 php: rc.bootup: Configuring CARP settings finalize...
(xl0 is not the SYNC interface.)
Both servers can successfully ping each other.
I will try to get more information about the problem…
Best regards
yarick123 -
I have debugged the XMLRPC communication. Some XMLRPC packets are sent and delivered from server 1 to server 2. Some packets are sent, but not delivered. Server 2 has not these packets even in the lighthttpd-access.log, extra configured by me.
Server 1 does not get any answer on such packets exactly 60 seconds. Then server 1 gets an empty answer, which cannot be parsed as a correct XMLRPC answer and the error "Code 2: Invalid return payload: enable debugging to examine incoming payload" is generated.
Interesting is, that the not delivered packages are quite long - ~630K.
One more interesting thing is, that the same XMLRPC error message was shown sometimes by pfSense 2.2.4, but the configs were at the end synchronized.
After turning on lighthttpd debug.log-request-header on server 2, the log shows, that the request was delivered to server 2.
I will try to get some more information…
Best regards
yarick123 -
After the logging options tuning here are some more log data from the lighthttpd.
The first request, the method pfsense.host_firmware_version is successfully called, a correct response is received:
Nov 12 10:33:36 pf2 rc.php-fpm_restart[26389]: >>> Restarting php-fpm Nov 12 10:33:36 pf2 system[26689]: [WARNING] The maximum number of processes has been reached. Please review your configuration and consider raising 'process.max' Nov 12 10:34:02 pf2 lighttpd[44606]: (request.c.311) fd: 8 request-len: 163 \nPOST /xmlrpc.php HTTP/1.0\r\nUser-Agent: PEAR XML_RPC\r\nHost: 192.168.11.2\r\nAuthorization: Basic XXXXXXXXXXXXXXXXXXXX=\r\nContent-Type: text/xml\r\nContent-Length: 207\r\n\r\n Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.336) -- splitting Request-URI Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.337) Request-URI : /xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.338) URI-scheme : http Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.339) URI-authority : 192.168.11.2 Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.340) URI-path (raw) : /xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.341) URI-path (clean): /xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.342) URI-query : Nov 12 10:34:02 pf2 lighttpd[44606]: (mod_access.c.137) -- mod_access_uri_handler called Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.465) -- before doc_root Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.466) Doc-Root : /usr/local/www/ Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.467) Rel-Path : /xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.468) Path : Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.516) -- after doc_root Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.517) Doc-Root : /usr/local/www/ Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.518) Rel-Path : /xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.519) Path : /usr/local/www/xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.536) -- logical -> physical Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.537) Doc-Root : /usr/local/www/ Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.538) Basedir : /usr/local/www/ Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.539) Rel-Path : /xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.540) Path : /usr/local/www/xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.557) -- handling physical path Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.558) Path : /usr/local/www/xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.565) -- file found Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.566) Path : /usr/local/www/xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.716) -- handling subrequest Nov 12 10:34:02 pf2 lighttpd[44606]: (response.c.717) Path : /usr/local/www/xmlrpc.php Nov 12 10:34:02 pf2 lighttpd[44606]: (mod_access.c.137) -- mod_access_uri_handler called Nov 12 10:34:02 pf2 lighttpd[44606]: (mod_compress.c.743) -- handling file as static file Nov 12 10:34:02 pf2 lighttpd[44606]: (mod_fastcgi.c.3518) handling it in mod_fastcgi Nov 12 10:34:03 pf2 lighttpd[44606]: (response.c.124) Response-Header: \nHTTP/1.0 200 OK\r\nExpires: Sat, 14 Nov 2015 11:34:01 GMT\r\nCache-Control: max-age=180000\r\nContent-Length: 1184\r\nContent-Type: text/xml; charset=UTF-8\r\nConnection: close\r\nDate: Thu, 12 Nov 2015 09:34:02 GMT\r\nServer: lighttpd/1.4.37\r\n\r\n
The second request, the method pfsense.restore_config_section is called, the connection was closed in ~1 minute by the read timeout (!), no response is logged, the request was repeated:
Nov 12 10:34:03 pf2 lighttpd[44606]: (request.c.311) fd: 8 request-len: 166 \nPOST /xmlrpc.php HTTP/1.0\r\nUser-Agent: PEAR XML_RPC\r\nHost: 192.168.11.2\r\nAuthorization: Basic XXXXXXXXXXXXXXXXXXXX=\r\nContent-Type: text/xml\r\nContent-Length: 646760\r\n\r\n Nov 12 10:35:06 pf2 lighttpd[44606]: (server.c.1308) connection closed - read timeout: 8 Nov 12 10:35:06 pf2 lighttpd[44606]: (request.c.311) fd: 8 request-len: 166 \nPOST /xmlrpc.php HTTP/1.0\r\nUser-Agent: PEAR XML_RPC\r\nHost: 192.168.11.2\r\nAuthorization: Basic XXXXXXXXXXXXXXXXXXXX=\r\nContent-Type: text/xml\r\nContent-Length: 646760\r\n\r\n Nov 12 10:36:09 pf2 lighttpd[44606]: (server.c.1308) connection closed - read timeout: 8
And at the end the method pfsense.filter_configure was successfully called:
Nov 12 10:36:09 pf2 lighttpd[44606]: (request.c.311) fd: 8 request-len: 163 \nPOST /xmlrpc.php HTTP/1.0\r\nUser-Agent: PEAR XML_RPC\r\nHost: 192.168.11.2\r\nAuthorization: Basic XXXXXXXXXXXXXXXXXXXX=\r\nContent-Type: text/xml\r\nContent-Length: 202\r\n\r\n Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.336) -- splitting Request-URI Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.337) Request-URI : /xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.338) URI-scheme : http Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.339) URI-authority : 192.168.11.2 Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.340) URI-path (raw) : /xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.341) URI-path (clean): /xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.342) URI-query : Nov 12 10:36:09 pf2 lighttpd[44606]: (mod_access.c.137) -- mod_access_uri_handler called Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.465) -- before doc_root Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.466) Doc-Root : /usr/local/www/ Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.467) Rel-Path : /xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.468) Path : Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.516) -- after doc_root Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.517) Doc-Root : /usr/local/www/ Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.518) Rel-Path : /xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.519) Path : /usr/local/www/xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.536) -- logical -> physical Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.537) Doc-Root : /usr/local/www/ Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.538) Basedir : /usr/local/www/ Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.539) Rel-Path : /xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.540) Path : /usr/local/www/xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.557) -- handling physical path Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.558) Path : /usr/local/www/xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.565) -- file found Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.566) Path : /usr/local/www/xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.716) -- handling subrequest Nov 12 10:36:09 pf2 lighttpd[44606]: (response.c.717) Path : /usr/local/www/xmlrpc.php Nov 12 10:36:09 pf2 lighttpd[44606]: (mod_access.c.137) -- mod_access_uri_handler called Nov 12 10:36:09 pf2 lighttpd[44606]: (mod_compress.c.743) -- handling file as static file Nov 12 10:36:09 pf2 lighttpd[44606]: (mod_fastcgi.c.3518) handling it in mod_fastcgi Nov 12 10:36:09 pf2 php-fpm[26795]: /xmlrpc.php: ROUTING: setting default route to XXX.XXX.XXX.XXX Nov 12 10:36:09 pf2 check_reload_status: Reloading filter Nov 12 10:36:09 pf2 php-fpm[26795]: /xmlrpc.php: The command '/sbin/route change -inet '192.168.214.195/32' -iface 'ovpnc2'' returned exit code '68', the output was 'route: bad address: ovpnc2' Nov 12 10:36:09 pf2 php-fpm[26795]: /xmlrpc.php: Static Routes: Gateway IP could not be found for 192.168.32.0/24 Nov 12 10:36:10 pf2 php-fpm[26795]: /xmlrpc.php: Resyncing OpenVPN instances. Nov 12 10:36:37 pf2 lighttpd[44606]: (response.c.124) Response-Header: \nHTTP/1.0 200 OK\r\nExpires: Sat, 14 Nov 2015 11:36:09 GMT\r\nCache-Control: max-age=180000\r\nContent-Length: 565\r\nContent-Type: text/xml; charset=UTF-8\r\nConnection: close\r\nDate: Thu, 12 Nov 2015 09:36:36 GMT\r\nServer: lighttpd/1.4.37\r\n\r\n
So, the problem is, that the connection with the call of pfsense.restore_config_section is closed in my situation always by the read timeout. I will try to increase the read timeout…
Regards
yarick123 -
increasing of the read timeout does not help :(
-
I am not alone … https://www.reddit.com/r/PFSENSE/comments/3s3kql/xmlrpc_sync_bug_in_225release/
-
I have solved the problem by replacing the new version of /usr/local/sbin/lighttpd (1.4.37) and the libraries (/usr/local/lib/lighttpd/) with the old one (1.4.35) from the pfSense-2.2.4-RELEASE on the server 2.
-
I have exactly the same problem.
Reverted to 2.2.4 snapshot until there's a solution.
-
Thanks for the leg work there, yarick123. I still can't replicate that anywhere, but knowing the source of the issue will help greatly. Will be looking more into it soon.
-
@cmb:
Thanks for the leg work there, yarick123. I still can't replicate that anywhere, but knowing the source of the issue will help greatly. Will be looking more into it soon.
Do you have a fresh install?
-
Do you have a fresh install?
All the systems I had tested with thus far were upgraded. Just checked a clean install and no diff there, works.
-
Strange.
Fact is some people is getting this problem. :'(
@cmb:
Do you have a fresh install?
All the systems I had tested with thus far were upgraded. Just checked a clean install and no diff there, works.
-
Same here! Before 2.2.5-RELEASE I never had this issue.
Thanks,
Michele -
Fact is some people is getting this problem. :'(
I'm not denying it's a problem. Just it's far from universal, and I don't have any case where it's replicable. Could someone that's seeing this share their config backups with me? Can PM me here, or email to cmb at pfsense dot org as attachments or to arrange another means of transfer.
I opened a bug ticket.
https://redmine.pfsense.org/issues/5509 -
Thank you Chris,
I just sent you the config and the syslog of both boxes via email.Thanks,
Michele -
I have a pair of boxes where I can reproduce this if necessary.
-
Hi!
I was testing 2.2.5 to deploy on networks that have HA setup, had the same problem.
The test was done in Virtual Box, states are synced without problem, but XMLRPC Sync doesn't work.
I've attached "configs.zip" containing the backup of both VMs I used in the HA setup, also a PDF(topology.zip) containing a diagram of the network topology. If you need something tested, please let me know.
-
Hi
I also have a pair of 2.2.5-RELEASE (amd64) built on Wed Nov 04 15:49:37 CST 2015 FreeBSD 10.1-RELEASE-p24, and I also have the SYNC issue. If I change/add a few new rules, and save - sometimes some of them are synced.
Thanks
Felix -
Is there a Bug item for this on Redmine or Git?
I'm trying to learn my way around the files and procedures to work on the packages, but it's being a little hard to get clear info on how to commit changes, interact and communicate around there.
-
Is there a Bug item for this on Redmine or Git?
I'm trying to learn my way around the files and procedures to work on the packages, but it's being a little hard to get clear info on how to commit changes, interact and communicate around there.
This was referenced on the previous page.
https://redmine.pfsense.org/issues/5509