    XMLRPC Sync no longer performed after update to 2.5.2 (not even attempted) - but actually it broke earlier

    HA/CARP/VIPs
    • K
      Klaws last edited by

      Two pfSense machines (master/backup) configured with CARP and XMLRPC Sync, which reliably worked. Last Friday, I updated both machines to 2.5.2, prior to some changes. The changes included a new static DHCP mapping and changes in NAT and firewall rules (I installed a new SIP gateway). Also a new VLAN and associated interface (which I deleted again later on).

      Today (Saturday), I noticed that the backup pfSense was still showing old firewall and NAT rules on the WebGUI.

      The Sync Interface is showing both incoming and outgoing traffic on both machines. It appears that CARP works. Failover also works: rebooting the master machine causes the backup machine to become the new master. No issues there.

      "Synchronize states" is enabled on both machines, the Synchronize Interface is correctly selected, and the respective other node's IP address is specified.

      XMLRPC Sync is configured on the master: the IP address, user name and password of the backup are specified, and every checkbox is checked except "Synchronize admin". On the backup, only the checkboxes (except "Synchronize admin") are checked; no IP address, user name or password is set.

      Note that I had not changed anything there; I had left everything as it was (which worked until the 2.5.2 update).

      Of course I checked the system logs (Status / System Logs / System / General) for error messages. A search for "XMLRPC" on the backup machine showed no matches. The same search on the master only matches entries from before the 2.5.2 upgrade, all related to ACME: "/usr/local/pkg/acme/acme_command.sh: XMLRPC reload data success with https://192.168.555.2:88/xmlrpc.php (pfsense.exec_php)." (IP address changed to protect the innocent; the real one is valid). That message is a lie, as the backup node still sports certificates from 2020 (I had to install ACME on the backup machine, with automatic updates disabled, so I could update the WebGUI certificate via ACME manually).

      Consequently, I triggered a certificate renewal in ACME (successfully), but no XMLRPC message would appear in the system logs, only non-XMLRPC-related ACME log entries.
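
For anyone hitting the same silent failure, the log search described above can also be scripted. This is only a minimal sketch in Python, assuming the system log is available as plain text; the function names are made up for illustration and are not part of pfSense. The sample uses the ACME message quoted above plus a placeholder line.

```python
import re

# Case-insensitive, so "XMLRPC" and "xmlrpc.php" both match.
XMLRPC_PATTERN = re.compile(r"xmlrpc", re.IGNORECASE)


def xmlrpc_lines(log_text):
    """Return every log line that mentions XMLRPC, in original order."""
    return [line for line in log_text.splitlines() if XMLRPC_PATTERN.search(line)]


def last_xmlrpc_line(log_text):
    """Return the last matching line, or None if nothing matched."""
    matches = xmlrpc_lines(log_text)
    return matches[-1] if matches else None


# Sample input: a placeholder line followed by the message quoted in the post.
sample = (
    "placeholder entry without the keyword\n"
    "/usr/local/pkg/acme/acme_command.sh: XMLRPC reload data success with "
    "https://192.168.555.2:88/xmlrpc.php (pfsense.exec_php).\n"
)
print(len(xmlrpc_lines(sample)))
print(last_xmlrpc_line("placeholder entry only"))
```

If the last match predates a change that should have been synced, as it did here, the master most likely never attempted the sync at all.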

      Looking into the configuration history of the backup machine revealed that the last XMLRPC config merge was about half a year ago. So XMLRPC sync apparently stopped working not just yesterday, after the upgrade to 2.5.2, but half a year ago already (the last XMLRPC-related change on the backup machine corresponds to "(system): syslog-ng: Settings saved" on the master). But at least the master node had attempted to sync to the backup machine... at least as far as ACME was concerned.
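
The same check on the configuration history could be sketched in Python. This rests on an assumption not stated in the thread: that each saved configuration is an XML document with a `<revision>` element holding a `<time>` (Unix timestamp) and a `<description>`. The function names and the sample descriptions are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def revision_of(config_xml):
    """Return (timestamp, description) from one config document, or None.

    Assumes a <revision> element with <time> and <description> children.
    """
    revision = ET.fromstring(config_xml).find("revision")
    if revision is None:
        return None
    time_text = revision.findtext("time", default="").strip()
    description = revision.findtext("description", default="").strip()
    if not time_text.isdigit():
        return None
    return int(time_text), description


def last_xmlrpc_revision(config_documents):
    """Return the newest revision whose description mentions XMLRPC, or None."""
    revisions = (revision_of(doc) for doc in config_documents)
    hits = [rev for rev in revisions if rev and "xmlrpc" in rev[1].lower()]
    return max(hits, default=None)


# Placeholder documents; real descriptions will differ.
history = [
    "<pfsense><revision><time>100</time>"
    "<description>placeholder: merged via XMLRPC</description></revision></pfsense>",
    "<pfsense><revision><time>200</time>"
    "<description>placeholder: local change</description></revision></pfsense>",
]
print(last_xmlrpc_revision(history))
```

Comparing the returned timestamp with the date of the last change on the master shows how long the sync has been silent.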

      So, where do I go from here? Debugging is a bit hard if I don't even get error messages!

      • SipriusPT
        SipriusPT @Klaws last edited by

        @klaws Regarding notifications about pfSense activity: do you use Mail Reports? pfSense can notify you by email about issues related to XMLRPC Sync.

        1xSG-4860-1U
        1xSG-3100
        2xpfSense Virtual Machines

        • K
          Klaws @SipriusPT last edited by

          @sipriuspt Thank you for your answer!

          I use regular email notifications, and these showed nothing suspicious (except that the certificates on the backup node expired, but that was just a symptom). I installed Mail Reports, but since the log files also show no XMLRPC-related messages, it's not very helpful. 😊

          In any case, I solved the issue now. I updated the backup node from 2.5.2 to 2.6.0, everything smooth and fine, then entered CARP maintenance mode on the master, performed the same update there, and everything went sideways. 😰

          Apparently, something had been corrupted on the master node some time ago already, and now "everything was broken" (including the WebGUI). Well, SSH still worked (even though I was dropped right into a command prompt, no pfSense menu or anything):

          pkg-static clean -ay; pkg-static install -fy pkg pfSense-repo pfSense-upgrade
          pkg-static upgrade -f
          shutdown -r +1
          

          And this fixed both the botched upgrade and my XMLRPC issue! 😂

          • SipriusPT
            SipriusPT last edited by SipriusPT

            @klaws Mail Reports just lets you know in time about any issues that occur.

            At least for me, it helps a lot when dealing with pfSense clusters.

            Examples:

            when a CARP interface changes state (master or backup):

            17:51:09 HA cluster member "(10.0.13.1@ixl3.13): (IXL3_VLAN13_IT_ADMINS)" has resumed CARP state "BACKUP" for vhid 12
            

            when WANs go offline or come back online in gateway groups:

            11:07:07 MONITOR: WAN_ROUTERA_WAN2_GW is available now, adding to routing group GW_GROUP x.x.x.225|172.16.2.2|WAN_ROUTERA_WAN2_GW|34.651ms|87.308ms|18%|online|loss
            

            when a service stops working and the Service Watchdog detects and handles the situation:

            9:26:00 Service Watchdog detected service openvpn stopped. Restarting openvpn (OpenVPN server: Internal Devices)
            

            when rules cannot load:

            15:42:40 There were error(s) loading the rules: /tmp/rules.debug:51: cannot load "/var/db/aliastables/pfB_NAmerica_v6.txt": Invalid argument - The line in question reads [51]: table <pfB_NAmerica_v6> persist file "/var/db/aliastables/pfB_NAmerica_v6.txt"
            

            when XMLRPC communication fails:

            17:29:59 A communications error occurred while attempting to call XMLRPC method restore_config_section:
            
            16:43:28 Exception calling XMLRPC method restore_config_section # Impossible to encode value '' from type 'NULL'. No analogous type in XML_RPC.
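
If these notification lines end up in a mailbox or a file, they can be sorted by type before reading. A rough sketch in Python, assuming the substrings seen in the examples above are stable enough to match on (that is an assumption, not documented behaviour); the category names and the `classify` function are made up for illustration.

```python
# Marker substrings taken from the example notifications above.
# The first matching marker wins, so XMLRPC is checked first.
CATEGORIES = [
    ("xmlrpc", "XMLRPC"),
    ("carp", "CARP state"),
    ("gateway", "MONITOR:"),
    ("watchdog", "Service Watchdog"),
    ("rules", "error(s) loading the rules"),
]


def classify(line):
    """Return the category name for one notification line, or "other"."""
    for name, marker in CATEGORIES:
        if marker in line:
            return name
    return "other"


def xmlrpc_failures(lines):
    """Keep only the lines that report an XMLRPC problem."""
    return [line for line in lines if classify(line) == "xmlrpc"]


notifications = [
    "9:26:00 Service Watchdog detected service openvpn stopped. "
    "Restarting openvpn (OpenVPN server: Internal Devices)",
    "17:29:59 A communications error occurred while attempting to call "
    "XMLRPC method restore_config_section:",
]
for entry in notifications:
    print(classify(entry), "|", entry)
```

This only helps once sync errors are actually reported; in the case described earlier in the thread there was nothing to classify.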
            
