Netgate Discussion Forum

    CARP Split brain scenario with sustained throughput

    HA/CARP/VIPs
    5 Posts 3 Posters 674 Views
    • ilbicio

      Hello, consider this scenario:
      A pfSense HA pair on Dell R210 II servers (single Xeon CPU, 8 GB RAM). The NICs are a dual-port Intel X520 10 GbE SFP+ card and a dual-port Broadcom copper 1 GbE card.

      Running pfSense 2.4.5, but I had the same issue with previous versions too.

      The Intel NICs serve the LAN networks through an LACP LAG to daisy-chained switches (one port per switch). My pfSense interfaces sit on top of the LACP LAG as VLANs. CARP is one of these VLANs.

      The Broadcom NICs are connected via an LACP LAG to a switch, which is connected to the WAN router.

      I have a Backup LAN (a LAN segment where I put the backup servers). The backup servers run SMB and NFS.

      When using SMB for sustained throughput from the Servers LAN to the Backup LAN (almost 5 hours, with files ranging from 2 to 200 GB, at more than 100 MBps), the copy fails and the entire network suddenly becomes and stays unresponsive. CARP status on the master looks fine, reporting all interfaces as Master, but some interfaces on the secondary are reported as Master too.

      Rebooting the secondary seems to solve the issue.
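
The dual-Master condition described above can also be checked outside the GUI by comparing the `ifconfig` output of both nodes. The following is a minimal sketch, not a tested monitoring tool. It assumes the FreeBSD `ifconfig` line format `carp: MASTER vhid 1 advbase 1 advskew 0` and assumes that both nodes use the same VHID for the same VIP. The function names `carp_states` and `split_brain_vhids` are hypothetical.

```python
import re
from typing import Dict, Set, Tuple

# Assumed FreeBSD ifconfig formats:
#   lagg0.10: flags=8943<UP,...> metric 0 mtu 1500
#   <tab>carp: MASTER vhid 1 advbase 1 advskew 0
IFACE_LINE = re.compile(r"^(\S+?):\s+flags=")
CARP_LINE = re.compile(r"^\s*carp:\s+(\w+)\s+vhid\s+(\d+)")


def carp_states(ifconfig_output: str) -> Dict[Tuple[str, int], str]:
    """Map (interface, vhid) to the CARP state reported by ifconfig."""
    states: Dict[Tuple[str, int], str] = {}
    iface = None
    for line in ifconfig_output.splitlines():
        m = IFACE_LINE.match(line)
        if m:
            iface = m.group(1)  # start of a new interface stanza
            continue
        m = CARP_LINE.match(line)
        if m and iface is not None:
            states[(iface, int(m.group(2)))] = m.group(1)
    return states


def split_brain_vhids(primary_output: str, secondary_output: str) -> Set[int]:
    """Return the VHIDs that both nodes report as MASTER at the same time."""
    def masters(output: str) -> Set[int]:
        return {vhid for (_, vhid), state in carp_states(output).items()
                if state == "MASTER"}
    return masters(primary_output) & masters(secondary_output)
```

Feeding it the `ifconfig` output captured from each node (for example over SSH on a management interface) returns the set of VHIDs that are Master on both. An empty set means no split brain was visible at capture time.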

      For now I have put a backup server in the same segment as the Servers LAN, and everything seems to run smoothly.

      For obvious reasons (100 people working) I can't try to replicate the problem and investigate further, because of the risk of crippling the network again. While copying files, CPU and memory utilization and Web GUI responsiveness are normal.

      I'm thinking of a couple of solutions:

      • move the CARP interfaces off the switches by adding a second copper network card
      • implement VLAN priority for the CARP interfaces (could it work?)
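
Whether either option helps depends on whether CARP advertisements are actually lost or delayed under load, which this thread does not establish. As background, a backup node promotes itself to Master when it hears no advertisement within its master-down timeout. The sketch below computes the relevant timers under the assumption that FreeBSD CARP sends advertisements every `advbase + advskew/256` seconds and that a backup waits `3 * advbase + advskew/256` seconds before taking over. Verify both formulas against the CARP documentation for your version. The function names are hypothetical.

```python
def advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between advertisements sent by a Master (assumed formula)."""
    return advbase + advskew / 256


def master_down_timeout(advbase: int, advskew: int) -> float:
    """Seconds a Backup waits without hearing the Master before it
    promotes itself (assumed formula: three base intervals plus skew)."""
    return 3 * advbase + advskew / 256


# Assumed typical values: primary advbase 1 / advskew 0,
# secondary advbase 1 / advskew 100.
primary_interval = advert_interval(1, 0)
secondary_timeout = master_down_timeout(1, 100)
print(f"primary advertises every {primary_interval} s")
print(f"secondary takes over after {secondary_timeout} s of silence")
```

Under these assumptions the secondary tolerates only a few seconds of missed advertisements, so a short stall on the CARP VLAN could be enough to produce two Masters.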

      Any advice appreciated.
      Thanks

      • thesurf

        I had a similar setup. VLAN changes on the LAG always resulted in a total core dump.

        I have since simplified the setup after doing a little research on CARP. pfSense performs a complete failover if a single CARP interface fails.

        So the full mesh with the switches is not needed. If a switch or cable goes bad, the whole firewall fails over to the other cluster member.

        Maybe you could adopt this too and get rid of the LAG. That might also solve your problem, since this advanced setup is a little shaky.

        • JeGr LAYER 8 Moderator

          @thesurf said in CARP Split brain scenario with sustained throughput:

          I had a similar setup. VLAN changes on the LAG always resulted in a total core dump.

          You wouldn't have an Intel X520 (or similar) SFP+ card in that setup, would you? We had 2-3 instances of that chipset/driver misbehaving on LACP/VLAN pairings, where changing anything related to the VLAN setup would break things.

          With 2.4.5_1 and 2.5 those drivers were updated, and now the systems run smoothly again.

          As the OP wrote about X520s but is running 2.4.5, I don't know whether that happened on 2.4.5 again (and with what errors in syslog). In one case we switched away from that particular NIC/driver to another, better supported 10G SFP+ card, and now the customer is happy and has no problems at all.

          Don't forget to upvote 👍 those who kindly offered their time and brainpower to help you!

          If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

          • ilbicio

            Honestly, I have never had any problems other than this one. VLANs and LACP behave fine, with no core dumps or other particular problems using the X520s.
            But since a couple of reconditioned SFP+ cards are cheaper than troubleshooting, I will try that route. Any advice on which SFP+ card to put in my servers? Is the Chelsio S320 good?

            • JeGr LAYER 8 Moderator

              AFAIR Chelsio cards are the ones Netgate itself uses in the XG series, though I don't know exactly which model or revision. I'd try them!

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.