Netgate Discussion Forum

    Frequent Crashes of Secondary pfSense Node in HA Cluster After Large Sync Operations

    HA/CARP/VIPs
      Sinsei0605

      Hello,

      I'm posting this message on the forum to find out if anyone else is experiencing the same issue or if there is a solution somewhere that I haven't found yet.

      Basically, I have four pfSense HA clusters, and we regularly experience crashes on the secondary node, with the following message appearing on the primary:
      "A communications error occurred while attempting XMLRPC sync."

      When we check, we have completely lost access to the web UI of the secondary node. It becomes entirely inaccessible, sometimes displaying an "ERRORx50" PHP crash page, but no error reports are generated.

      We then connect directly to pfSense via the CLI, where everything appears fine: the firewall is accessible and responsive. I've already tried forcing a restart of the web UI, but it has no effect. Each time, I am forced to restart the secondary node manually.

      We have noticed that the error occurs after synchronizing a large number of firewall rules, creating an IPsec tunnel, or even setting up a new VPN. It seems like the issue arises when there is a large amount of data to synchronize.

      We have switched operations over to the secondary node in the past and never encountered any production issues or disruptions.

      We are running version 24.03 with some installed packages, including FreeRADIUS (for VPN user management), OpenVPN Client Export, pfBlockerNG, Suricata, syslog-ng, System Patches, and Zabbix Agent.

      This issue occurs on all four of my pfSense HA clusters, not just one.

      Unfortunately, we do not understand the root cause of these "crashes". They may not be actual crashes, since no crash logs are available and we have to force a reboot from the CLI.

        btspce

        We have also been seeing this since we upgraded to 24.11 with all patches applied:
        "A communications error occurred while attempting XMLRPC sync." on the primary node.

        Accessing the web GUI on the secondary node hangs the firewall after 5-10 seconds.
        If we access the CLI, everything seems fine and there are no hangs unless we initiate a reboot. Then the secondary hangs and we need to pull the power to recover.

        This usually happens after 5-14 days of uptime on the secondary node.

        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.