    Whole environment has become slow after introducing HA

    HA/CARP/VIPs
      nfern

      Hi,
      I'm new to pfSense.
      I have introduced a second pfSense router and configured HA with a CARP IP of 10.x.x.189.
      The individual interface IPs are 10.x.x.196 and 10.x.x.197.
      I have seen a lot of ping loss to all servers.
      When tracing the path between servers, I noticed that traffic is flowing via the secondary device (10.x.x.197) instead of the primary device.
      Master and Backup are shown correctly in the CARP failover status.
      Can someone help? Thank you.

        mjh_ca

        Welcome. pfSense HA is great, but it can be very sensitive to configuration errors.

        When you say "ping loss to all servers", do you mean pinging your internal servers from externally? Internally to internally? Internal to external?

        Are you using CARP VIPs on your WAN as well as LAN? Or only on LAN side? Need more information about your setup. The IPs you list are private IPs (i.e. LAN only) so I'm unclear how you have this configured.

        You say "whole environment has become slow". Do you mean internally (i.e. pinging server to server on the same subnet)? HA shouldn't affect pinging internally between servers at all unless pfSense is routing between VLANs or something similar. If you're seeing this slowness within a single subnet, then you have some other issue; for example, you may have introduced a loop on your switches without STP/RSTP enabled.

        Assuming it isn't a networking issue (i.e. packet loss is only when traversing pfSense nodes) then it could be a misconfiguration of NAT or some other routing issue.

        Are you sure you set up Manual Outbound NAT properly, checked that XMLRPC sync is working between the primary and secondary nodes, etc.? Check Status > System Logs: do you see any unusual messages about CARP?

        Double-check the HA Troubleshooting docs and the links at the bottom of that page:
        https://docs.netgate.com/pfsense/en/latest/highavailability/troubleshooting-high-availability-clusters.html

        There are lots of little errors that could cause weird behavior, such as a slightly different configuration between the primary and secondary nodes (e.g. an incorrect subnet mask).
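
As a rough illustration of the subnet mask point, here is a minimal Python sketch that checks whether each node's CARP VIP sits in the same subnet as that node's interface address. The addresses and masks below are hypothetical placeholders (the real ones in this thread are partly elided), and `vip_problems` is just an illustrative helper, not anything pfSense provides.

```python
import ipaddress


def vip_problems(node_name, interface_cidr, vip_cidr):
    """Return a list of problems found in one node's interface/VIP settings."""
    iface = ipaddress.ip_interface(interface_cidr)
    vip = ipaddress.ip_interface(vip_cidr)
    problems = []
    # A VIP with a different mask (or prefix) lands in a different network.
    if vip.network != iface.network:
        problems.append(
            f"{node_name}: VIP {vip} is not in the same subnet as interface {iface}"
        )
    # The shared VIP must not reuse a node's own interface address.
    if vip.ip == iface.ip:
        problems.append(f"{node_name}: VIP {vip.ip} duplicates the interface address")
    return problems


# Hypothetical values: node -> (interface address, CARP VIP), both with masks.
nodes = {
    "primary": ("10.0.0.196/24", "10.0.0.189/24"),
    "secondary": ("10.0.0.197/24", "10.0.0.189/25"),  # deliberate mask mismatch
}

for name, (iface_cidr, vip_cidr) in nodes.items():
    for problem in vip_problems(name, iface_cidr, vip_cidr):
        print(problem)
```

Copy the real values by hand from the interface and virtual IP settings of each node; a mismatch reported for only one node is exactly the kind of small difference described above.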

        What is your upstream provider? Certain providers' equipment (cable modems in particular) will freak out and block packets when it sees multiple MACs involved with the same IPs, and that simply can't be fixed unless the provider allows that configuration. Even if they issue you a public /29, which should work for CARP, it won't work in practice because the equipment starts blocking once it sees multiple MACs, and you will have high packet loss.

        Re-read the configuration guides carefully to make sure nothing was missed:
        https://docs.netgate.com/pfsense/en/latest/highavailability/configuring-high-availability.html
        https://docs.netgate.com/pfsense/en/latest/book/highavailability/index.html

          nfern

          > When you say "ping loss to all servers", do you mean pinging your internal servers from externally? Internally to internally? Internal to external?

          I am using the WAN interface to route internally between VLANs.

          > Are you using CARP VIPs on your WAN as well as LAN? Or only on LAN side? Need more information about your setup. The IPs you list are private IPs (i.e. LAN only) so I'm unclear how you have this configured.

          CARP is configured on both WAN and LAN, and I have alias IPs configured for each of the VLANs. All IPs are private, and nothing goes to the internet through this setup.

          > You say "whole environment has become slow". Do you mean internally (i.e. pinging server to server on same subnet)? HA shouldn't affect pinging internally between servers at all unless it is doing routing between VLANs or something, if you're seeing this slowness internally then you have some other issue like possibly you have introduced a loop on your switches and don't have STP/RSTP enabled.

          Before I added the second box for HA, there was no slowness. As soon as I introduced HA, I saw ping loss.

          > Assuming it isn't a networking issue (i.e. packet loss is only when traversing pfSense nodes) then it could be a misconfiguration of NAT or some other routing issue.

          I do not have NAT configured.

          I have checked the configuration on both devices, and they match.
          I also noticed that although the primary device shows "master" and the secondary device shows "backup", a trace between clients goes out via the primary and returns via the secondary, which makes me think that they are both acting as master.
          The advertising frequency base and skew values are also correct: the skew is 0 on the master and 100 on the backup.
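
For reference, a minimal Python sketch of what those skew values mean. It assumes the usual CARP timing rule that a node advertises every `advbase + advskew/256` seconds, and it assumes a base of 1 second on both nodes (the base value is not stated in this thread). The node that advertises more often is preferred as master.

```python
def carp_advert_interval(advbase, advskew):
    """Seconds between CARP advertisements for the given base and skew."""
    return advbase + advskew / 256


# Assumed base of 1 on both nodes; skew values as quoted in the post.
primary = carp_advert_interval(advbase=1, advskew=0)
secondary = carp_advert_interval(advbase=1, advskew=100)

print(f"primary advertises every {primary} s")
print(f"secondary advertises every {secondary} s")
print("primary preferred as master:", primary < secondary)
```

With these values the primary has the shorter interval, so the skew settings themselves are consistent with the primary being the intended master.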

            nfern

            @mjh_ca
            I rebooted the primary device, and when it came back up there was no more ping loss or slowness in the environment. However, after about 6 hours or so, I'm back to square one: the same slowness, and tracert shows traffic passing through the secondary device instead of the primary.

            Any advice would be great at this point.
            Thank you

              mjh_ca

              Lots of possibilities. I would simplify the configuration down to the basics and see if you can get it working.

              Key suspects for me would be a network structure issue (have you accidentally introduced a loop, so that STP/RSTP is kicking in and disabling ports, causing weird routing? Do you have a bad cable, or ports that are auto-negotiating at the wrong speed?) or a CARP or virtual IP configuration step you missed (wrong netmask on a virtual IP? CARP left in temporary maintenance mode?).

              Check the pfSense system logs. Check that one node is MASTER on both WAN and LAN and the other is BACKUP on both (if the roles are split between nodes, then of course you have routing issues). Also check your switch port status and logs to see if they give you any hints.
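
The MASTER/BACKUP check can be written down as a small Python sketch. The states below are hypothetical and would be typed in by hand from the CARP status page of each node; `carp_role_problems` is an illustrative helper, not a pfSense API. It flags a node whose interfaces disagree, and an interface that does not have exactly one MASTER.

```python
def carp_role_problems(states):
    """states maps node -> {interface: "MASTER" or "BACKUP"}; returns problems."""
    problems = []
    # Each node should hold the same role on all of its interfaces.
    for node, ifaces in states.items():
        if len(set(ifaces.values())) > 1:
            problems.append(f"{node} has split CARP roles: {ifaces}")
    # Each interface should have exactly one MASTER across the cluster.
    interfaces = sorted({name for ifaces in states.values() for name in ifaces})
    for iface in interfaces:
        masters = sorted(
            node for node, ifaces in states.items() if ifaces.get(iface) == "MASTER"
        )
        if len(masters) != 1:
            problems.append(f"{iface} has {len(masters)} MASTER nodes: {masters}")
    return problems


# Hypothetical snapshot in which the secondary has taken over LAN only.
snapshot = {
    "primary": {"WAN": "MASTER", "LAN": "MASTER"},
    "secondary": {"WAN": "BACKUP", "LAN": "MASTER"},
}

for problem in carp_role_problems(snapshot):
    print(problem)
```

Taking a snapshot right after a reboot and another once the slowness returns would show whether the roles change over those hours.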
