Netgate Discussion Forum

    Another XMLRPC communication error

    HA/CARP/VIPs
    • mse

      Hi there,

      This is my second day trying to configure an HA installation of pfSense on Hetzner Cloud, and I am failing badly.
      This message appears on sync attempts:

      80f496ab-68ec-439f-a218-18d20d80d1cd-image.png

      Both firewalls run the same version (2.4.4_3) on the same hardware in Hetzner Cloud.

      Configuration synchronization between master and slave doesn't work, for an unknown reason.
      Synchronization goes via a dedicated interface (192.168.4.0/24 ... master and slave in a /28 subnet, see picture). Both are reachable.

      ab8b919f-e20b-4f1b-b12d-5118edf3d7ff-image.png

      Rules for the SYNC interface are defined on both sides like this:
      9a2571ef-5a8e-4632-af8d-ab3b3ea0eeb2-image.png
      The user is 'admin' and the passwords are definitely correct on both sides.

      Any idea what I might have missed? Any help would be appreciated.

      Kind regards,
      Michael

      • JeGr LAYER 8 Moderator

        @mse said in Another XMLRPC communication error:

        User is 'admin' and the passwords are definitely correct on both sides.

        But can they ping each other via the Sync interface? (I'd enable ICMP on Sync, too.)

        Don't forget to upvote 👍 those who kindly offered their time and brainpower to help you!

        If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

        • mse

          Thanks for your response. Ohhh... they don't seem to see each other...
          I changed the rules on both sides:

          5ef0ed6d-223e-4e3d-b4b2-f7bf6985961b-image.png

          (any ICMP for simplicity... for now)

          And ping from master to slave fails:

          PING 192.168.4.3 (192.168.4.3) from 192.168.4.2: 56 data bytes
          --- 192.168.4.3 ping statistics ---
          3 packets transmitted, 0 packets received, 100.0% packet loss

          Hmm...
          The interface seems to be configured correctly... or is the /32 CIDR wrong here?

          90cbd38c-9835-49b5-b952-c55c5f42631b-image.png

          On the console all interfaces seem to be OK too. Same on the other side, only with different addresses in the same subnets.

          6a0a975c-44f8-4435-aa35-ad6176016f39-image.png

          The gateway on the SYNC interface comes from DHCP and is 192.168.4.1, I think.

          Really, this is slowly driving me crazy. 😨

          • mse

            By the way, the SYNC addresses are also entered under System -> Advanced -> Admin Access -> Alternate Hostnames.

            • mse

              OK, ping works now. And the message has changed to:

              A communications error occurred while attempting to call XMLRPC method host_firmware_version: Unable to connect to tls://192.168.4.3:443. Error: Operation timed out @ 2019-11-28 16:53:47

              • Derelict LAYER 8 Netgate

                Then you can't connect to the webgui there. Is the other side set to run the webgui on https/443?

                What happens when you use Diagnostics > Test Port to test 192.168.4.3 port 443?

                What is logged on the secondary in the system and maybe firewall logs?

                There is also zero reason to pass the CARP protocol on the sync interface. See the sticky post in this category for explanations. If anything you should be passing pfsync, not CARP.
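
For readers who want to repeat the port test outside the GUI, here is a minimal Python sketch of a TCP reachability check. It is not pfSense's own implementation of Diagnostics > Test Port; the function name `tcp_port_open` and the default timeout are assumptions made for illustration. A successful TCP connect only shows that something is listening on the port. It says nothing about whether the TLS handshake or the XMLRPC call itself succeeds.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established.

    Hypothetical helper for illustration; it only tests the TCP handshake,
    not TLS or anything the web GUI does afterwards.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Called as `tcp_port_open("192.168.4.3", 443)` from a machine on the sync network, this would return `False` in the "Operation timed out" situation quoted earlier in the thread.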

                Chattanooga, Tennessee, USA
                A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                Do Not Chat For Help! NO_WAN_EGRESS(TM)

                • JeGr LAYER 8 Moderator @mse

                  @mse said in Another XMLRPC communication error:

                  The interface seems to be configured correctly... or is the /32 CIDR wrong here?

                  /32 is wrong, you need at least a /30 or /29 to set up the net between the two nodes.

                  Set node 1 to 192.168.4.1/30 and the other to 192.168.4.2/30 and it should work. I'd use an ICMP any/any rule for testing so as not to run into problems with sync_net or sync_address.

                  With the correct subnetting, perhaps your XMLRPC error will vanish, too.

                  I also don't understand how you can use DHCP but manually configure an IP in the fields below. It should simply be Static IP, IMHO. And how are those other interfaces (WAN/LAN) able to work with each other if you configure them as /32? You have to use the correct netmask of the subnet, not a single IP /32. How else should node 1 be able to reach its peer on LAN/WAN when trying to speak CARP? That setup seems very buggy.
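
The effect of the netmask can be checked with Python's standard `ipaddress` module. This is only an illustrative sketch using the addresses mentioned in this thread; the helper name `same_subnet` is made up for the example.

```python
import ipaddress


def same_subnet(local_if: str, peer: str) -> bool:
    """Return True if `peer` is on-link for an interface configured as `local_if`.

    `local_if` is an address with prefix length, e.g. "192.168.4.2/32".
    Hypothetical helper for illustration only.
    """
    iface = ipaddress.ip_interface(local_if)
    return ipaddress.ip_address(peer) in iface.network


# With a /32 the interface's network contains only the interface itself,
# so the peer is never considered directly reachable.
print(same_subnet("192.168.4.2/32", "192.168.4.3"))  # False

# With the suggested /30 pair, both nodes share one network.
print(same_subnet("192.168.4.1/30", "192.168.4.2"))  # True

# Usable host addresses: a /30 offers exactly two, a /29 offers six.
print([str(h) for h in ipaddress.ip_network("192.168.4.0/30").hosts()])
print(len(list(ipaddress.ip_network("192.168.4.0/29").hosts())))
```

Note that in 192.168.4.0/30 the address 192.168.4.3 is the broadcast address, so a .2/.3 pair would need a /29 or larger, or renumbering to .1/.2 as suggested above.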

                  • mse @Derelict

                    @Derelict
                    Thanks. The admin UI is running on 443. The port test works and I can see log entries for synchronisation attempts too (all green). But the sync still fails with this message:

                    A communications error occurred while attempting to call XMLRPC method host_firmware_version: @ 2019-11-29 08:39:53

                    22d7250d-9aef-4d41-9cf1-a67c48296274-image.png

                    I set the rule for CARP following this guide:
                    https://vorkbaard.nl/how-to-set-up-pfsense-high-availability-hardware-redundancy/

                    I'm new to the funky firewall business 😉 I'm trying to find a solution to secure a public cloud,
                    or rather to turn a cloud with public IPs on the nodes into a private one where only
                    some edge nodes are accessible. All traffic among the nodes goes through the private network and doesn't leave the box. Of course, I could apply ufw rules to block all traffic on every port of the public network, but the solution with a firewall sounds sexier to me.
                    My idea is:

                    • Install a private network (works fine)
                    • Put router/firewall (HA in any case) in front of the nodes (current topic)
                    • Turn off public IPs (works too)
                    • Provision the nodes (key management system, Rancher cluster, Kubernetes cluster and storage)

                    If things work, I can document the configuration and roll it out on another cluster of nodes to prepare a production system. For now it's an experiment to get familiar with the topic and to evaluate whether I'm riding a dead horse or not.

                    • mse @JeGr

                      @JeGr
                      Oh yes, it's a very small network with a /32 CIDR. Actually it's a /24, and the two pfSense nodes are in a /28 subnet (a /30 would do too). WAN is DHCP with additional floating IPs, which are static.
                      WAN and LAN work fine.
                      I'm not sure if this is the right place to configure them (Alias IPv4 address in the interface configuration), but the WAN addresses of the pfSense nodes have additional floating IPs.

                      • mse

                        It seems not to work at all (on this hardware?) or I'm missing something essential.
                        I don't see any reason why it doesn't work.

                        1. Both pfSense instances are up to date (first thing done after activating WAN)
                          db055ead-461c-4089-92c4-79a5e26d1f12-image.png

                        2. Interfaces
                          WAN -> DHCP (works)
                          LAN -> DHCP (works) 10.70.64.0/21 (pfSense in subnet /24)
                          SYNC -> DHCP (works) 192.168.4.0/24 (pfSense in subnet /29)

                          pfSync-Master has IP: 10.70.64.2 (LAN), 192.168.4.2 (SYNC)
                          pfSync-Slave has IP: 10.70.64.3 (LAN), 192.168.4.3 (SYNC)

                        3. Rules on the SYNC interface are wide open on both sides
                          e7582078-d4bb-4ba6-bade-b932a46978aa-image.png

                        4. Port test from master to slave works fine
                          588537e0-866c-4737-943b-d22b8c852c7b-image.png

                        5. XMLRPC always fails
                          5d09c738-0047-48f3-bf39-d9a91b46fb8c-image.png

                        6. Logs on the target side contain successful connection attempts
                          07c0a899-e9f6-497d-9cbb-d065a203fc95-image.png

                        7. Logs on the source side contain the following messages
                          53a3d77b-0f2f-4e29-925e-39f8cfc47aef-image.png

                        • Derelict LAYER 8 Netgate

                          What is logged on the secondary?

                          • Mr.Trieu

                            I have the same issue.
                            If you resolved it, please help me.

                            • mse @Derelict

                              @Derelict
                              I tried it again today. Same errors. On the secondary, the only noticeable log entry is maybe this one from nginx:

                              Nov 30 10:24:17 router-slave.localdomain nginx: 2019/11/30 10:24:17 [error] 65828#100098: send() failed (54: Connection reset by peer)

                              But there is nothing special in /etc/log/nginx/error.log to identify the reason.

                              • mse

                                By the way, I see people have been fighting this issue for years now 😱
                                https://forum.netgate.com/topic/122196/high-avail-sync-broken/19
                                I haven't found any solution so far. I tried IPFire, but it's a joke in comparison to pfSense: there is no HA available at all. Good for protecting your own toaster maybe, but not for production clusters.

                                • Derelict LAYER 8 Netgate

                                  Well it works for many, many people, me included. You have something configured incorrectly or something wrong in your environment.

                                  If it was me and I was stuck, I would pcap the exchange and see exactly what was happening.

                                  • mse @Derelict

                                    @Derelict
                                    Sure, maybe the reason is the virtual networks at Hetzner. I really can't figure out why it's not working. It fails when checking the results of the XML-RPC invocation.
                                    I took a look at the code of the 'host_firmware_version' function and tried it separately in a script... all values (OS version, config version etc.) are there and can be parsed as expected. I give up. I think a scripted solution using iptables, UCARP and BGP should do too. At least I can install these things unattended using Terraform.

                                    Anyway, thank you very much for your time and suggestions. 👍 🙂
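
The XML-RPC call can also be exercised from any machine on the sync network, independently of the pfSense GUI. The sketch below uses Python's standard `xmlrpc.client` to build the request and, optionally, send it. Several things here are assumptions that are not confirmed anywhere in this thread: the endpoint path `/xmlrpc.php`, the `pfsense.` method prefix, the parameters the method expects (none by default here), HTTP basic authentication, and skipping certificate verification for a self-signed GUI certificate. Adjust them to whatever your installation actually exposes.

```python
import ssl
import xmlrpc.client
from urllib.parse import quote


def build_request(method: str = "pfsense.host_firmware_version",
                  params: tuple = ()) -> str:
    """Return the XML-RPC request body for `method`.

    The "pfsense." prefix and the empty parameter list are assumptions;
    the thread only names host_firmware_version.
    """
    return xmlrpc.client.dumps(params, methodname=method)


def call_peer(host: str, user: str, password: str,
              method: str = "pfsense.host_firmware_version",
              params: tuple = ()):
    """Call `method` on the peer and return the decoded result.

    Assumptions: endpoint /xmlrpc.php, HTTP basic auth, self-signed cert.
    """
    # Skip certificate verification only because a lab peer usually has a
    # self-signed certificate; do not do this across untrusted networks.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    # Credentials in the URL are sent as HTTP basic auth by xmlrpc.client.
    url = f"https://{quote(user, safe='')}:{quote(password, safe='')}@{host}/xmlrpc.php"
    proxy = xmlrpc.client.ServerProxy(url, context=ctx)
    # ServerProxy uses the attribute name as the remote method name.
    return getattr(proxy, method)(*params)
```

Printing `build_request()` shows the exact payload that would be sent, which can be compared against a packet capture of the failing sync attempt. `call_peer` raises `xmlrpc.client.ProtocolError` or `xmlrpc.client.Fault` on HTTP-level or XML-RPC-level errors respectively, which narrows down where the exchange breaks.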

                                    • JeGr LAYER 8 Moderator

                                      @mse said in Another XMLRPC communication error:

                                      the virtual networks at Hetzner

                                      Which are known for some "specialities" in IP "black magic" and nasty /32 PtP routing etc., so that could very well be a case of it not working the way it should.

                                      Did you try (just as a test) using LAN or even WAN as the syncing interface? Perhaps that additional "private subnet" you use as SYNC isn't functioning correctly. I had some hassle with the new Hetzner vCloud in the past, too, when I was trying to set up IPv6 between two instances. One worked flawlessly, the other was bugged as hell. Only after I opened a ticket and put them through the hoops did they discover that the VM host where the second, buggy VM ran was incorrectly configured and thus not 100% IPv6 compatible... After they fixed it, everything ran smoothly.

                                      • mse @JeGr

                                        @JeGr
                                        Yes, it can really drive one crazy. UCARP doesn't work, because all floating IPs need to be attached to a physical node. I think using keepalived and a script to reassign the floating IP and the 'default route' to the slave in the virtual network should work. To switch between master and slave I only had to assign the floating IP (used as the external IP of the network) to the slave and adjust the route for 0.0.0.0/0 to use the slave as gateway. All traffic is then routed via the slave.
                                        Doing simple routing with UFW and turning the public IP off works fine, but I don't trust this construct.
                                        IPv6 was fun too with Ubuntu 18.04... I tried WireGuard today but failed for some reason... The interfaces stop forwarding traffic when IPv6 is active on the WireGuard interface.
                                        I didn't understand why... I found this:
                                        https://angristan.xyz/fix-ipv6-hetzner-cloud/
                                        but the cloud-init network configuration seems to be fixed in the current version of the ISO... Hmm... I'll be a beekeeper in my next life for sure. 😀 Really, a whole week of nothing but issues with everything. By the way, pfSense synchronizes without any problems over the public interface... but of course that's not a good idea, I suppose.
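
The keepalived-style failover described in this post boils down to two actions once the master stops responding: move the floating IP to the slave and point the network's default route at it. The sketch below only illustrates that control flow. The callables `is_master_alive`, `assign_floating_ip` and `set_default_route` are hypothetical placeholders; in a real setup they would wrap a keepalived notify script or the provider's API, neither of which is shown in this thread.

```python
from typing import Callable


def failover_if_needed(
    is_master_alive: Callable[[], bool],
    assign_floating_ip: Callable[[str], None],
    set_default_route: Callable[[str], None],
    slave_name: str,
    slave_gateway_ip: str,
    required_failures: int = 3,
) -> bool:
    """Promote the slave after `required_failures` consecutive failed checks.

    Returns True if a failover was performed. All callables are hypothetical
    hooks supplied by the caller.
    """
    for _ in range(required_failures):
        if is_master_alive():
            # One successful probe is enough to leave everything as it is.
            return False
    # Step 1: attach the floating (external) IP to the slave node.
    assign_floating_ip(slave_name)
    # Step 2: route 0.0.0.0/0 of the private network via the slave.
    set_default_route(slave_gateway_ip)
    return True
```

A real health check would wait between probes and would also need a way back to the master; both are left out here to keep the two failover steps visible.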

                                        • Koby Peleg Hen @mse

                                          @mse I have exactly the same problem with Hetzner Cloud.
                                          After two days of trying to sync two pfSense nodes (ver 2.4.5-1), I gave up and am looking for another solution.
                                          I must say that I appreciate Hetzner for their stability and their service.
                                          But you can't win them all.

                                          Best regards ,
                                          Koby Peleg Hen

                                          • mse @Koby Peleg Hen

                                            @koby-peleg-hen
                                            By now Hetzner supports load balancers and routing into virtual 'private' networks. But you need to allow at least outgoing traffic on the public interfaces to be able to get anything from outside the cluster.
                                            Two years ago I solved this using two edge nodes (the only ones with the public interface activated) in an HA setup with keepalived, ufw, nginx as reverse proxy and floating IP reassignment on node failures.
                                            It looks like this:
                                            loadbalancer.png
                                            This works fine, but today I would use the load balancer instead and only limit access to the nodes over public IPs.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.