XMLRPC sync fails after HAProxy configuration change
-
Hello,
I'm posting this issue here as I couldn't fine a similar one yet.
We are running a pair of pfSense+ (24.11).
For now few weeks we realize a break of XMLRPC sync after each change done on the HAProxy configuration.
It seems no other change on any other aspect of the firewall is causing this issue.The current workaround is to log in on the cli and restart the php-fpm process. Then after a change, the XMLRPC sync works normally again.
Here are the logs on the firewalls (firewall01 = master, firewall02 = backup)
The logs provided are given after triggering an HAProxy configuration change.firewall01 (Master) system logs:
# system.log Mar 25 10:02:51 firewall01 php[8665]: /usr/local/pkg/haproxy/haproxy_sync.sh: Beginning XMLRPC sync data to https://x.x.x.2:444/xmlrpc.php. Mar 25 10:03:43 firewall01 php-fpm[23068]: /rc.filter_synchronize: Retrying XMLRPC Request due to error: A communications error occurred while attempting to call XMLRPC method host_firmware_version: Request timed out due to default_socket_timeout php.ini setting Mar 25 10:03:44 firewall01 php-fpm[23068]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://x.x.x.2:444/xmlrpc.php. Mar 25 10:03:44 firewall01 php[9050]: /usr/local/pkg/haproxy/haproxy_sync.sh: A communications error occurred while attempting to call XMLRPC method exec_php: Request timed out due to default_socket_timeout php.ini setting Mar 25 10:03:44 firewall01 php[9050]: /usr/local/pkg/haproxy/haproxy_sync.sh: New alert found: A communications error occurred while attempting to call XMLRPC method exec_php: Request timed out due to default_socket_timeout php.ini setting Mar 25 10:03:51 firewall01 php[8665]: /usr/local/pkg/haproxy/haproxy_sync.sh: Retrying XMLRPC Request due to error: A communications error occurred while attempting to call XMLRPC method merge_installedpackages_section: Request timed out due to default_socket_timeout php.ini setting Mar 25 10:03:52 firewall01 php[8665]: /usr/local/pkg/haproxy/haproxy_sync.sh: Beginning XMLRPC sync data to https://x.x.x.2:444/xmlrpc.php. Mar 25 10:04:44 firewall01 php-fpm[23068]: /rc.filter_synchronize: Retrying XMLRPC Request due to error: A communications error occurred while attempting to call XMLRPC method host_firmware_version: Request timed out due to default_socket_timeout php.ini setting Mar 25 10:04:45 firewall01 php-fpm[23068]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://x.x.x.2:444/xmlrpc.php. Mar 25 10:04:53 firewall01 php[8665]: /usr/local/pkg/haproxy/haproxy_sync.sh: Retrying XMLRPC Request due to error: A communications error occurred while attempting to call XMLRPC method merge_installedpackages_section: Request timed out due to default_socket_timeout php.ini setting Mar 25 10:04:54 firewall01 php[8665]: /usr/local/pkg/haproxy/haproxy_sync.sh: Beginning XMLRPC sync data to https://x.x.x.2:444/xmlrpc.php. Mar 25 10:05:45 firewall01 php-fpm[23068]: /rc.filter_synchronize: Retrying XMLRPC Request due to error: A communications error occurred while attempting to call XMLRPC method host_firmware_version: Request timed out due to default_socket_timeout php.ini setting Mar 25 10:05:46 firewall01 php-fpm[23068]: /rc.filter_synchronize: Beginning XMLRPC sync data to https://x.x.x.2:444/xmlrpc.php. Mar 25 10:05:54 firewall01 php[8665]: /usr/local/pkg/haproxy/haproxy_sync.sh: Retrying XMLRPC Request due to error: A communications error occurred while attempting to call XMLRPC method merge_installedpackages_section: Request timed out due to default_socket_timeout php.ini setting Mar 25 10:05:55 firewall01 php[8665]: /usr/local/pkg/haproxy/haproxy_sync.sh: Beginning XMLRPC sync data to https://x.x.x.2:444/xmlrpc.php.
firewall 02 (Backup) System/Nginx logs
# system.log Mar 25 10:05:53 firewall02 nginx: 2025/03/25 10:05:53 [error] 39998#101165: *1377 upstream timed out (60: Operation timed out) while sending request to upstream, client: x.x.x.1, server: , request: "POST /xmlrpc.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "x.x.x.2:444" Mar 25 10:06:54 firewall02 nginx: 2025/03/25 10:06:54 [error] 39998#101165: *1381 upstream timed out (60: Operation timed out) while sending request to upstream, client: x.x.x.1, server: , request: "POST /xmlrpc.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "x.x.x.2:444" Mar 25 10:33:01 firewall02 php-fpm[68356]: haproxy: reload old pid:65058 Mar 25 10:33:01 firewall02 php-fpm[68356]: Mar 25 10:33:01 firewall02 php-fpm[68356]: haproxy: started new pid:42479 Mar 25 10:33:01 firewall02 php-fpm[68356]: Mar 25 11:06:23 firewall02 php-fpm[98218]: haproxy: reload old pid:42479 Mar 25 11:06:23 firewall02 php-fpm[98218]: Mar 25 11:06:23 firewall02 php-fpm[98218]: haproxy: started new pid:92170 Mar 25 11:17:27 firewall02 login[39762]: login on ttyv0 as root # <-- logging-in and restarting php-fpm Mar 25 11:17:45 firewall02 rc.php-fpm_restart[75157]: >>> Found XMLRPC lock. Removing. Mar 25 11:17:45 firewall02 rc.php-fpm_restart[76330]: >>> Restarting php-fpm Mar 25 11:17:45 firewall02 check_reload_status[77072]: check_reload_status is starting. # nginx.log Mar 25 10:05:53 firewall02 nginx: 2025/03/25 10:05:53 [error] 39998#101165: *1377 upstream timed out (60: Operation timed out) while sending request to upstream, client: x.x.x.1, server: , request: "POST /xmlrpc.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "x.x.x.2:444" Mar 25 10:05:53 firewall02 nginx: x.x.x.1 - admin [25/Mar/2025:10:05:53 +0100] "POST /xmlrpc.php HTTP/1.1" 504 2022 "-" "PEAR::XML_RPC2/@package_version@"**bolded text**
An information I could give if I consider the different ongoing/recently-addressed issues regarding HAProxy size configuration on pfsense+; our configuration section of our HAProxy is ~4K Lines big on the XML backup to give a rough estimate.
Not sure how to tackle this issue. I would be glad to provide any other relevant information if needed.
-
@YannL I have experienced similar (or same) issues since years and could not fix it. All investigation I did, did not find the real issue.
- dedicated HA sync interface which is officially supported (in our case an additional LAN card with intel chipset instead of built in RJ45 ports), MTU lowered to 1360 instead of default 1500 (https://forum.netgate.com/topic/190990/ha-sync-does-not-work-error-operation-timed-out/2)
- Interface physical connection is stable (permanent ping with high frequency, big packets, etc. all stable and fast
- iperf3 tests forward and backwards are close to the theoretical maximum of 1 GBit/s
- firewall rules: allow any - any on each HA interface in both pfsense cluster members
- user login credentials and permissions for the ha user checked
- webconfigurator processes set to 500 (max allowed value)
- cpu is idle (never less than 95% free)
- ram is > 90% free
- mbuffers increased
- io statistics of SSDs on both sides are very low
- bandwidth usage on ha interfaces during ha sync / xmlrpc is very low (just some kbit/s)
- checked nginx logs, system logs, error logs => no specific reason found
- checked the ha documentation multiple times: no error in our config found.
- each time ha sync / xmlrpc is happening, I can see on the console of the backup/passive member messages that port bindings of the captive portal ports fail:
Message from syslogd@srvwlan02 at Apr 10 01:39:53 ... nginx: 2025/04/10 01:39:53 [emerg] 44580#101678: bind() to [::]:8006 failed (48: Address already in use)
maybe you have some idea what you can check further or not. We are lost...