XMLRPC doesn't sync, CARP works



  • Hi, I have read a few threads regarding this problem but can't find any fix/solution.

    So XMLRPC sync doesn't work for me, but CARP failover does.

    Setup is as follows:

    Interfaces set up in the same order:
    vm0 (WAN) used for perimeter, but temporarily used for management
    (this vmnic is set to a native/access VLAN on the hypervisor with the usual hypervisor settings: "Promiscuous mode", "MAC Address changes", "Forged transmits", and Net.ReversePathFwdCheckPromisc set to true)
    vmx1 (LAN) used for sync (pfsync, XMLRPC)
    (this vmnic is set to a native/access VLAN on the hypervisor)
    vmx2 (OPT) not even used yet, not even up now
    Both nodes have a firewall rule that permits any traffic on the sync interface
    Both nodes have their sync interfaces in the same subnet, and those interfaces are up
    Only the primary node has the various XMLRPC config sync options checked
    WebUI is set with the same username and port on both nodes
    Both are on the same version 2.4.4p3

    Node 1 and 2 can ping each other over their sync interfaces
    Node 1 and 2 can reach each other on the TCP port specified for WebGUI over their sync interfaces
    No blocked entries in the firewall log for the sync interface under Status > System Logs on Firewall
    No Logs on secondary (Status > System Logs on the System/General tab)
    Tried different ports, with and without SSL, without success (e.g. 80, 443, others)

    Plain HTTP sync attempt:
    This shows up on the Status > System Logs > General tab on primary node:

    May 25 17:44:43	php-fpm	1354	/rc.filter_synchronize: New alert found: A communications error occurred while attempting to call XMLRPC method restore_config_section:
    May 25 17:44:43	php-fpm	1354	/rc.filter_synchronize: A communications error occurred while attempting to call XMLRPC method restore_config_section:
    May 25 17:43:43	php-fpm	1354	/rc.filter_synchronize: Beginning XMLRPC sync data to http://192.168.91.252:10080/xmlrpc.php.
    May 25 17:43:43	php-fpm	1354	/rc.filter_synchronize: New alert found: A communications error occurred while attempting to call XMLRPC method restore_config_section:
    May 25 17:43:43	php-fpm	1354	/rc.filter_synchronize: A communications error occurred while attempting to call XMLRPC method restore_config_section:
    May 25 17:42:46	php-fpm	347	/rc.filter_synchronize: New alert found: A communications error occurred while attempting to call XMLRPC method restore_config_section:
    May 25 17:42:46	php-fpm	347	/rc.filter_synchronize: A communications error occurred while attempting to call XMLRPC method restore_config_section:
    May 25 17:42:43	check_reload_status		Reloading filter
    May 25 17:42:43	php-fpm	1354	/rc.filter_synchronize: Beginning XMLRPC sync data to http://192.168.91.252:10080/xmlrpc.php.
    May 25 17:42:43	php-fpm	1354	/rc.filter_synchronize: XMLRPC versioncheck: 19.1 -- 19.1
    May 25 17:42:43	php-fpm	1354	/rc.filter_synchronize: XMLRPC reload data success with http://192.168.91.252:10080/xmlrpc.php (pfsense.host_firmware_version).
    May 25 17:42:43	php-fpm	1354	/rc.filter_synchronize: Beginning XMLRPC sync data to http://192.168.91.252:10080/xmlrpc.php.
    May 25 17:42:42	check_reload_status		Syncing firewall
    May 25 17:41:46	php-fpm	347	/rc.filter_synchronize: Beginning XMLRPC sync data to http://192.168.91.252:10080/xmlrpc.php.
    May 25 17:41:46	php-fpm	347	/rc.filter_synchronize: New alert found: A communications error occurred while attempting to call XMLRPC method restore_config_section:
    May 25 17:41:46	php-fpm	347	/rc.filter_synchronize: A communications error occurred while attempting to call XMLRPC method restore_config_section:
    May 25 17:41:17	php-fpm	1354	/system_hasync.php: Configuring CARP settings finalize...
    May 25 17:41:17	php-fpm	1354	/system_hasync.php: pfsync done in 30 seconds.
    May 25 17:40:46	php-fpm	347	/rc.filter_synchronize: Beginning XMLRPC sync data to http://192.168.91.252:10080/xmlrpc.php.
    May 25 17:40:46	php-fpm	347	/rc.filter_synchronize: XMLRPC versioncheck: 19.1 -- 19.1
    May 25 17:40:46	php-fpm	347	/rc.filter_synchronize: XMLRPC reload data success with http://192.168.91.252:10080/xmlrpc.php (pfsense.host_firmware_version).
    May 25 17:40:46	php-fpm	347	/rc.filter_synchronize: Beginning XMLRPC sync data to http://192.168.91.252:10080/xmlrpc.php.
    May 25 17:40:46	php-fpm	1354	/system_hasync.php: waiting for pfsync...
    May 25 17:40:45	check_reload_status		Syncing firewall
    

    HTTPS sync attempt:
    This shows up on the Status > System Logs > General tab on primary node:

    May 25 18:16:45	php-fpm	347	/system_hasync.php: Configuring CARP settings finalize...
    May 25 18:16:45	php-fpm	347	/system_hasync.php: pfsync done in 30 seconds.
    May 25 18:16:34	php-fpm	25822	/rc.filter_synchronize: The pfSense software configuration version of the other member could not be determined. Skipping synchronization to avoid causing a problem!
    May 25 18:16:34	php-fpm	25822	/rc.filter_synchronize: XMLRPC versioncheck: -- 19.1
    May 25 18:16:34	php-fpm	25822	/rc.filter_synchronize: New alert found: A communications error occurred while attempting to call XMLRPC method host_firmware_version:
    May 25 18:16:34	php-fpm	25822	/rc.filter_synchronize: A communications error occurred while attempting to call XMLRPC method host_firmware_version:
    May 25 18:16:24	php-fpm	25822	/rc.filter_synchronize: Beginning XMLRPC sync data to https://192.168.91.252:443/xmlrpc.php.
    May 25 18:16:24	php-fpm	25822	/rc.filter_synchronize: New alert found: A communications error occurred while attempting to call XMLRPC method host_firmware_version:
    May 25 18:16:24	php-fpm	25822	/rc.filter_synchronize: A communications error occurred while attempting to call XMLRPC method host_firmware_version:
    May 25 18:16:14	php-fpm	25822	/rc.filter_synchronize: Beginning XMLRPC sync data to https://192.168.91.252:443/xmlrpc.php.
    May 25 18:16:14	php-fpm	347	/system_hasync.php: waiting for pfsync...
    May 25 18:16:13	check_reload_status		Syncing firewall
    

    I read a lot about interface order causing problems, so I reset both VMs to factory defaults and started from there, but this didn't help. After that I started again from fresh installs on both VMs, again with the same symptoms.

    Forgot to mention: I tried this first on 2.5 daily snapshots and had the same problem, so I downgraded to the stable release in the hope that it would be solved (maybe because of the "could not determine version" message), but this didn't change anything.

    Any suggestions?

    Thanks in advance
    Olivia


  • LAYER 8 Netgate

    The requirements are:

    1. The secondary must have firewall rules on its sync interface that allow the initial sync to take place.
    2. The firewall rules on the primary's sync interface must allow these same connections, because they are what the secondary will have after the initial sync.
    3. The admin interface must use the same credentials (admin/password) and the same port (http/https/custom port if set) on both nodes.

    This all works if it is configured correctly.



  • Thanks @Derelict for the fast reply!

    I don't think my configuration is "violating" any of those requirements. I even have everything set to "allow any".

    Rules on sync interface on secondary:
    [screenshot of the firewall rules; image not preserved]

    Rules on sync interface on primary:
    [screenshot of the firewall rules; image not preserved]

    I have same username (admin) and port on both nodes WebUI, I'm using the same username to setup the XMLRPC sync section.


  • Rebel Alliance Moderator

    Have you checked your sync net?
    Pinged 192.168.91.252 from 192.168.91.251?
    Vice Versa?



  • Hi @JeGr, thanks for the answer. Yes, ping works across both nodes on their sync interfaces. Port test also works.
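
The port test mentioned here can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `tcp_port_open` is made up for illustration, and the address and port in the comment are the ones that appear in the logs of this thread. Note that a successful result only shows that the TCP handshake completes; it says nothing about whether larger payloads get through.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False


# Example (peer address and WebGUI port taken from the logs in this thread):
# tcp_port_open("192.168.91.252", 10080)
```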


  • Rebel Alliance Moderator

    @nanas3 Did you re-check that the admin user/pass entered on the master node is correct and matches the standby? The communications error looks more like the standby isn't answering HTTP on port 10080, though.



  • Hi @JeGr yes, but I'll try again. I'll change my admin password on both nodes and see what happens.

    I have seen others get errors indicating that something is wrong with the username, but that's not my case.
    Is there a way to see in more detail what XMLRPC is doing? The message seems like a generic one.

    Thanks in advance,
    Olivia

    [update] Changing the admin password on both nodes and entering the new one in the sync section of HA didn't change anything.
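
One way to get more detail than the generic "communications error" is to call the peer's `xmlrpc.php` by hand from another machine on the sync network and watch the raw HTTP exchange. The sketch below uses Python's standard `xmlrpc.client` with `verbose=True`. Several things here are assumptions and not confirmed by this thread: that the peer accepts HTTP Basic authentication, that it uses a self-signed certificate when HTTPS is enabled, and the argument list of the remote method. Only the method name `pfsense.host_firmware_version` and the URL come from the logs above; `peer_url` and `call_peer` are hypothetical helper names.

```python
import ssl
import xmlrpc.client
from urllib.parse import quote


def peer_url(scheme, host, port, username, password):
    """Build the xmlrpc.php URL with HTTP Basic credentials embedded.

    Assumption: the peer accepts HTTP Basic auth for XMLRPC calls.
    """
    user = quote(username, safe="")
    pw = quote(password, safe="")
    return f"{scheme}://{user}:{pw}@{host}:{port}/xmlrpc.php"


def call_peer(url, method, *params):
    """Call one XMLRPC method and print the raw HTTP traffic (verbose=True)."""
    # Assumption: the WebGUI certificate is self-signed, so verification is
    # disabled for this diagnostic call only. Ignored for plain http URLs.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    proxy = xmlrpc.client.ServerProxy(url, verbose=True, context=ctx)
    # Dotted method names such as "pfsense.host_firmware_version" are
    # passed through unchanged.
    return getattr(proxy, method)(*params)


# Example (parameters of the remote method are an assumption):
# url = peer_url("http", "192.168.91.252", 10080, "admin", "secret")
# print(call_peer(url, "pfsense.host_firmware_version", 1, 10))
```

If the call fails, the exception raised (for example `xmlrpc.client.ProtocolError`, `ssl.SSLError`, or a socket timeout) usually narrows down whether the problem is authentication, TLS, or the transport itself.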


  • Rebel Alliance Moderator

    Huh... Damn.
    Did you check the system log on the standby node? Does it say anything in particular?



  • Hi @JeGr, thanks for the reply.

    The second system doesn't show anything on the system log > general when I initiate a sync from the primary.

    The primary always shows something like:

    A communications error occurred while attempting to call XMLRPC method host_firmware_version: @ 2019-06-03 19:13:15
    A communications error occurred while attempting to call XMLRPC method host_firmware_version: @ 2019-06-03 19:13:25 
    

    It would be nice to know what "host_firmware_version: " means, and why it is empty.


  • LAYER 8 Netgate

    It is checking that the firmware versions match before it syncs, because syncing between mismatched versions can cause problems. Those entries are logged because it cannot connect.
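
The behaviour visible in the logs can be summarised in a small sketch. This is an illustration of the logic as described in this thread, not the actual pfSense code, and `sync_decision` is a made-up name. In the log line "XMLRPC versioncheck: -- 19.1" the value from the peer appears to be the empty one on the left, since that is the side that could not be reached.

```python
def sync_decision(remote_version: str, local_version: str) -> str:
    """Decide whether a config sync should proceed.

    remote_version is empty when the peer could not be reached, which is
    what produces "XMLRPC versioncheck:  -- 19.1" in the log.
    """
    if not remote_version:
        # Matches the log: "could not be determined. Skipping synchronization"
        return "skip: peer version could not be determined"
    if remote_version != local_version:
        return "skip: version mismatch"
    return "sync"


# Values taken from the log lines above.
print(sync_decision("19.1", "19.1"))
print(sync_decision("", "19.1"))
```

So the empty value is a symptom, not a cause: the version check never received an answer from the other node.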



  • Hi @Derelict, thank you.

    Yes, I understand that, but in this case they are VMs and it doesn't seem to show any data about it; it's as if the value is empty, and because of that it's not syncing.

    What I don't get is how it can fail to connect to the other node when there is a working connection between both nodes on the sync network (ping and port probe succeed), with rules permitting any traffic.
    There must be something else that I can try.

    Olivia



  • [SOLVED]

    OK everyone, this may be "funny", but after checking everything again and again (with reinstalls in between) I noticed that the port channel on the physical switch connecting to one of the ESXi hosts had an MTU of 1500 instead of 9000 like the rest of the ports. Since I had everything else set to 9000 (physical switch ports, virtual switches, pfSense NICs (vNICs)), this mismatch caused the trouble.

    Thanks to everyone who helped.

    Olivia
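
An MTU mismatch like this explains why ping and the port probe worked while the sync did not: small packets fit through the 1500-byte hop, but full-size packets from a 9000-byte interface do not, and a layer 2 device typically drops oversized frames without notifying the sender. The arithmetic can be sketched as follows; the header sizes assume IPv4 and TCP without options, and the function names are made up for illustration.

```python
IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo header
TCP_HEADER = 20   # TCP header without options


def max_icmp_payload(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def max_tcp_payload(mtu: int) -> int:
    """Largest TCP payload per packet for the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def fits(path_mtu: int, packet_size: int) -> bool:
    """True if a packet of packet_size bytes passes a hop with path_mtu."""
    return packet_size <= path_mtu


default_ping = 56 + ICMP_HEADER + IP_HEADER   # 84-byte packet
full_segment = 9000                           # sender believes MTU is 9000

print(fits(1500, default_ping))   # small ping passes the 1500-byte hop
print(fits(1500, full_segment))   # full-size TCP packet does not
print(max_icmp_payload(9000))     # payload size for a full-MTU ping test
```

This is consistent with the HTTP logs earlier in the thread, where the small `host_firmware_version` call succeeded and the larger `restore_config_section` call failed. A quick way to catch this kind of problem is a full-size ping with fragmentation disallowed; on FreeBSD-based systems that should be something like `ping -D -s 8972 192.168.91.252`, where 8972 is 9000 minus the IP and ICMP headers.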


  • LAYER 8 Netgate

    Glad you found it.

