Netgate Discussion Forum

XMLRPC sync errors since upgrade to 2.4.4

HA/CARP/VIPs
64 Posts 13 Posters 12.6k Views
    DrNick 0
    last edited by Nov 2, 2018, 10:17 AM

    @bbrendon no we're not running pfblocker or any other packages - just using the built-in firewalling, NAT & captive portal functions. I still have config sync disabled most of the time - I just enable it when I'm making config changes I want synced then turn it off again, which I can live with.
    @windiz interesting theory about auto config backup, but we're not using that either. That's not to say it isn't related in your case, but I've never turned on that option here.

      stephenw10 Netgate Administrator
      last edited by Nov 2, 2018, 1:04 PM

      If you enable autoconfig backup on the secondary and it has no WAN connection, it will try to back up its config at every change pushed from the primary and fail. It has to time out waiting for that, and if the primary tries to push another change during that time, the push may fail.
      Really you should be running those as an HA pair in that situation.

      You should be able to disable ACB on the secondary though.

      Steve

        Nima304
        last edited by Nov 13, 2018, 8:51 PM

        I'm having the exact same issue as @DrNick-0 with a similar setup. I have a pair of XG-1541 1U HAs, and have been receiving the "A communications error occurred while attempting to call XMLRPC method restore_config_section" message immediately after upgrading to 2.4.4-RELEASE. Here are the answers to @jimp's questions as well:

        • Yes, I can reach each firewall's sync address from the other.
        • Yes, I can reach both GUI ports
        • I'm not seeing any blocked entries in the firewall log for the sync interface.
        • No XMLRPC or nginx logs on the secondary.
        • No interface events for the sync interface on either firewall.
        • Sync interface looks fine on both firewalls.
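
For anyone wanting to repeat the reachability checks in the list above from a script, here is a minimal Python sketch. It only confirms that a TCP connection opens within a timeout; it says nothing about TLS or whether xmlrpc.php actually answers. The function name `can_connect` and the 3-second default are my own choices for illustration, not anything from pfSense.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (including timeouts and refused connections) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from the primary, `can_connect("172.16.1.3", 443)` would test the secondary's sync address and GUI port as they appear in the log lines in this thread.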

        Additionally, I'm using a direct cable for the sync interface between the two firewalls, nothing's in between. Occasionally, I'll get the message "/rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc (pfsense.host_firmware_version)," and if I sync the configuration manually through Status>Filter Reload, it seems to sync just fine, with the following logs:

        • Nov 13 15:45:29 php-fpm /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.restore_config_section).
        • Nov 13 15:44:31 php-fpm /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
        • Nov 13 15:44:31 php-fpm /rc.filter_synchronize: XMLRPC versioncheck: 18.8 -- 18.8
        • Nov 13 15:44:31 php-fpm /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.host_firmware_version).
        • Nov 13 15:44:31 php-fpm /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
        • Nov 13 15:44:30 check_reload_status Syncing firewall

        Some time afterwards (up to 30 minutes later), it'll go back to spamming the "A communications error occurred while attempting to call XMLRPC method restore_config_section" logs again. I've tried rebooting the secondary firewall to no avail, and can't reboot the primary since it's in production. Any help would be greatly appreciated.

          jimp Rebel Alliance Developer Netgate
          last edited by Nov 13, 2018, 9:03 PM

          What packages do you have installed? I've seen several HA clusters running 2.4.4 and none have sync issues like this.


            Nima304 @jimp
            last edited by Nov 13, 2018, 9:05 PM

            @jimp said in XMLRPC sync errors since upgrade to 2.4.4:

            What packages do you have installed? I've seen several HA clusters running 2.4.4 and none have sync issues like this.

            I have no packages installed on either firewall.

              Derelict LAYER 8 Netgate
              last edited by Nov 13, 2018, 9:21 PM

              Is the webgui healthy on the secondary at the time? Can you log in there and navigate?

              Are you trying to game things without the requisite 3 public IP addresses on WAN? Can the secondary get to the internet, resolve names, etc when it is not CARP master?

                Nima304 @Derelict
                last edited by Nov 13, 2018, 9:26 PM

                @derelict said in XMLRPC sync errors since upgrade to 2.4.4:

                Is the webgui healthy on the secondary at the time? Can you log in there and navigate?

                Are you trying to game things without the requisite 3 public IP addresses on WAN? Can the secondary get to the internet, resolve names, etc when it is not CARP master?

                Yup, the webgui is just fine. I'm not trying to game anything, both firewalls have their own unique upstream address, and the CARP address is a different and also unique address as well. The secondary firewall can get to the Internet and resolve DNS names when it's not CARP master, I pinged google.com to check.

                  SteveITS Galactic Empire @Nima304
                  last edited by Nov 13, 2018, 9:26 PM

                  @nima304
                  is 172.16.1.3 the sync IP or the LAN IP of the second router?

                  @windiz
                  same question for 10.51.0.2?

                  The routers I upgraded last week aren't logging comm errors...

                  A long time ago I did have sync issues. I seem to recall I tracked it down to Suricata and that we had selectively disabled many of the unneeded individual rules. Turns out all that had to sync and it was timing out. Solution: don't disable individual rules and it has less to process.

                  Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                  When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                  Upvote 👍 helpful posts!

                    Nima304 @SteveITS
                    last edited by Nov 13, 2018, 9:30 PM

                    @teamits said in XMLRPC sync errors since upgrade to 2.4.4:

                    @nima304
                    is 172.16.1.3 the sync IP or the LAN IP of the second router?

                    @windiz
                    same question for 10.51.0.2?

                    The routers I upgraded last week aren't logging comm errors...

                    A long time ago I did have sync issues. I seem to recall I tracked it down to Suricata and that we had selectively disabled many of the unneeded individual rules. Turns out all that had to sync and it was timing out. Solution: don't disable individual rules and it has less to process.

                    That's the sync IP for the second firewall. The primary's is 172.16.1.2.

                        bbrendon @Nima304
                        last edited by Nov 13, 2018, 9:54 PM

                        @nima304 Thanks for digging into your setup to get to the bottom of this. I just haven't had time on my end and since things more or less work, it hasn't been a priority.

                          stephenw10 Netgate Administrator
                          last edited by Nov 13, 2018, 10:37 PM

                          Do you have a large number of users in the config?

                          Steve

                            Nima304 @bbrendon
                            last edited by Nov 14, 2018, 1:56 AM

                            @bbrendon said in XMLRPC sync errors since upgrade to 2.4.4:

                            @nima304 Thanks for digging into your setup to get to the bottom of this. I just haven't had time on my end and since things more or less work, it hasn't been a priority.

                            No problem, hopefully there's a resolution that solves it for all of us.

                            @stephenw10 said in XMLRPC sync errors since upgrade to 2.4.4:

                            Do you have a large number of users in the config?

                            Steve

                            No, literally just the admin user, but I also have LDAP auth configured.

                              stephenw10 Netgate Administrator
                              last edited by Nov 14, 2018, 12:51 PM

                              That should be no problem as long as the user accounts are not on pfSense. A large number of local users can introduce delays on the secondary when the sync'd config is added, preventing it from responding in a reasonable time.

                              Hmm, I'd probably start a packet capture on the secondary sync interface. Set it for a large number and wait for it to fail. See what's actually happening there.

                              Steve

                                SteveITS Galactic Empire
                                last edited by Nov 14, 2018, 4:51 PM

                                In windiz's logs, it is exactly 60 seconds from the beginning of the sync to the error and that sounds like a timeout to me. Brainstorming, how large is your config export file? We have some decently complex ones for our data center that are about 180 KB, for reference...Suricata rules, pfBlockerNG, OpenVPN, etc.

                                Router2 isn't set to sync back to router1 is it? That would be a loop.

                                  stephenw10 Netgate Administrator
                                  last edited by Nov 14, 2018, 6:34 PM

                                  Yes, the timeout is 60s. It used to be possible, with more than ~50 users on some hardware, for loading the config and responding to take longer than that. Improvements have gone in since then, though.

                                  Steve
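
Since the 60-second figure keeps coming up, a rough way to see how close each sync gets to it is to pair every "Beginning XMLRPC sync" line with the next result line and subtract the timestamps. The Python sketch below does that on the log lines Nima304 posted earlier in this thread (reordered oldest first). The helper names are hypothetical, the year is assumed to be 2018 because syslog lines carry none, and it assumes error lines start with the same timestamp prefix and contain the text "communications error occurred" as quoted in this thread.

```python
import re
from datetime import datetime

# Log lines quoted earlier in this thread, reordered oldest first.
LOG = """\
Nov 13 15:44:31 php-fpm /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
Nov 13 15:44:31 php-fpm /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.host_firmware_version).
Nov 13 15:44:31 php-fpm /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
Nov 13 15:44:31 php-fpm /rc.filter_synchronize: XMLRPC versioncheck: 18.8 -- 18.8
Nov 13 15:45:29 php-fpm /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.restore_config_section).
"""

# Leading syslog timestamp, e.g. "Nov 13 15:44:31".
STAMP = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) ")


def parse_time(line, year=2018):
    """Return the line's timestamp as a datetime, or None if it has none.

    The year is an assumption; month names are parsed in the current locale.
    """
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def sync_durations(lines):
    """Pair each 'Beginning XMLRPC sync' line with the next result line.

    Returns a list of (outcome, seconds) tuples, outcome being
    'success' or 'error'.
    """
    results = []
    started = None
    for line in lines:
        when = parse_time(line)
        if when is None:
            continue
        if "Beginning XMLRPC sync" in line:
            started = when
        elif started is not None and (
            "reload data success" in line
            or "communications error occurred" in line
        ):
            outcome = "success" if "reload data success" in line else "error"
            results.append((outcome, int((when - started).total_seconds())))
            started = None
    return results


print(sync_durations(LOG.splitlines()))
```

On those lines it prints `[('success', 0), ('success', 58)]`: the `restore_config_section` call that succeeded still took 58 seconds, only just inside the timeout.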

                                    Nima304 @SteveITS
                                    last edited by Nov 14, 2018, 8:25 PM

                                    @teamits said in XMLRPC sync errors since upgrade to 2.4.4:

                                    In windiz's logs, it is exactly 60 seconds from the beginning of the sync to the error and that sounds like a timeout to me. Brainstorming, how large is your config export file? We have some decently complex ones for our data center that are about 180 KB, for reference...Suricata rules, pfBlockerNG, OpenVPN, etc.

                                    Router2 isn't set to sync back to router1 is it? That would be a loop.

                                    Good catch, my logs are showing the same thing. While config sync isn't set at all on the secondary, the primary is syncing states from the secondary, and the secondary from the primary, as per pfSense's documentation.

                                    I'm going to try to blow the firewall rules open on the sync interface for both firewalls and see if that does anything.

                                      Nima304
                                      last edited by Nov 14, 2018, 8:35 PM

                                      Blowing open the rules did nothing, unfortunately. I'm seeing data received on the secondary firewall, so it's not a cable issue. I'll do a packet capture and see if anything interesting turns up.

                                        Nima304
                                        last edited by Nov 14, 2018, 8:45 PM

                                        The transmission is encrypted using TLS, so I can't actually see what's going on.

                                          stephenw10 Netgate Administrator
                                          last edited by Nov 14, 2018, 9:30 PM

                                          You could set the GUI to HTTP just while you test. However, you should still be able to see the TCP sequence and the lack of responses.
                                          Make sure both nodes are time sync'd and then compare the log entries. Does the secondary log anything during that 60s window?

                                          Steve
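
To check whether the secondary logs anything during the 60-second window, one option is to filter its system log by timestamp once both clocks agree. A minimal Python sketch, assuming the usual `Mon DD HH:MM:SS` syslog prefix seen in the log lines in this thread; the function names and the 2018 default year are assumptions for illustration.

```python
from datetime import datetime, timedelta


def stamp(line, year=2018):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp, or return None.

    Syslog lines carry no year, so one is assumed.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def entries_in_window(lines, start, seconds=60):
    """Return the log lines stamped between `start` and `start + seconds`, inclusive."""
    end = start + timedelta(seconds=seconds)
    hits = []
    for line in lines:
        when = stamp(line)
        if when is not None and start <= when <= end:
            hits.append(line)
    return hits
```

Feed it the secondary's log lines and the time of the primary's "Beginning XMLRPC sync data" entry as `start`; an empty result would suggest the secondary logged nothing while the primary was waiting.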

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.