Netgate Discussion Forum

    Backup->master at random intervals

    HA/CARP/VIPs
    • miloman

      Recently I've been seeing some weird behaviour on my firewall setup. The CARP subforum is probably not the right place for this, but it was because of CARP that I noticed the problem.

      I have two firewalls running in HA with CARP, on pfSense 2.0.2 AMD64.

      I have noticed that the backup firewall will all of a sudden become the master of some, not all, CARP addresses and then go right back to being backup. It's quite random which addresses it happens to.

      At first I just thought it was a CARP bug and didn't give the issue any real attention, because it didn't interfere with regular traffic. Or so I thought…

      One day one of my users said he had a problem transferring a large file over FTP. The FTP server would say "connection reset". I asked him for a timestamp of when it happened, and sure enough the firewall logs showed that the backup firewall had become master and then gone back to being backup at the exact time the user had reported.

      Yesterday I was pinging a server in one of my VLANs, and I suddenly got 6 ping timeouts. No errors in the firewall log or anything, but I could see that the backup firewall had become master and then gone back to backup. All of this happens within 5 seconds.

      Now, what I'm thinking is that the primary firewall is experiencing some sort of "flow stop". The backup firewall interprets this as the master having gone offline and becomes the master, but then the primary firewall resumes normal operation and the backup goes back to being backup.

      Do any of you have any idea what could be causing this? I've tried rebooting the firewalls, but it didn't help, otherwise I wouldn't be posting this. Last night I even swapped out the server for my spare server (Dell R610 with Intel 10 Gbit interfaces).

      I have a rather large setup, with about 15 VLANs and about 250 production servers and load balancers behind the firewalls, so I can't just make changes or do anything else that would have an impact on operations.

      Any advice would be nice, as I'm running low on ideas. The only thing I haven't explored is rebooting the Cisco switches or updating their IOS. They have uptimes surpassing 400 days.

      • jimp (Rebel Alliance Developer, Netgate)

        Most commonly that is an issue at layer 2.

        The switchover only happens if the backup fails to see advertisements from the master, or if its own advertisements are faster than the ones it sees from the master.

        If you have any kind of multicast/broadcast storm control or limiting in your switches, disable it.
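
To make the timing concrete, here is a rough sketch of how a backup decides the master is gone. It is a simplified model and not the kernel's actual code: the class and function names are made up for illustration, and the timeout formula (three base intervals plus the skew in 1/256 s units) is an assumption based on how FreeBSD-style CARP is commonly described. The advskew value of 100 for the secondary is also an assumed, typical setting, not something taken from this thread.

```python
from dataclasses import dataclass


@dataclass
class CarpTimers:
    """Hypothetical container for the two CARP timing knobs."""
    advbase: int = 1   # base advertisement interval, in seconds
    advskew: int = 0   # extra delay in 1/256 s units (0-254)

    def advert_interval(self) -> float:
        # How often this node would advertise if it were master.
        return self.advbase + self.advskew / 256.0

    def master_down_timeout(self) -> float:
        # ASSUMPTION: roughly three base intervals plus the skew.
        # The exact formula may differ between versions.
        return 3 * self.advbase + self.advskew / 256.0


def backup_takes_over(silence_seconds: float, backup: CarpTimers) -> bool:
    """True if the backup has heard nothing from the master for too long."""
    return silence_seconds > backup.master_down_timeout()


if __name__ == "__main__":
    backup = CarpTimers(advbase=1, advskew=100)
    for gap in (1.0, 2.0, 4.0):
        print(gap, backup_takes_over(gap, backup))
```

Under these assumptions, only a few seconds of lost multicast advertisements at layer 2 are enough for the backup to promote itself, and it demotes itself again as soon as the master's advertisements get through.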

          • miloman

          I knew someone would say that. Sadly, I don't have any multicast/broadcast limiter in my switches. That would have been an easy fix, though.

          All right… time for an IOS upgrade. :)

          I'll let you know how it works out.

            • cmb

            The CARP flapping is the telling symptom of the actual problem, but it isn't the problem in and of itself. It indicates connectivity problems between the two systems. Checking whether you lose connectivity to the interface IPs of the firewalls could be telling.
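
One way to act on that suggestion is to ping the firewalls' interface IPs continuously and line the loss windows up against the CARP state changes in the logs. The sketch below only does the correlation step. The function names, the `(timestamp, ok)` sample format, and the `slack` parameter are all hypothetical, and gathering the ping results and log timestamps is left to whatever tooling you already have.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]    # (unix timestamp, ping succeeded?)
Window = Tuple[float, float]   # (first failed ping, last failed ping)


def loss_windows(samples: List[Sample]) -> List[Window]:
    """Collapse consecutive failed pings into (start, end) windows."""
    windows: List[Window] = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            windows.append((start, last_fail))
            start = None
    if start is not None:
        # The capture ended while pings were still failing.
        windows.append((start, last_fail))
    return windows


def transitions_near_loss(windows: List[Window],
                          transitions: List[float],
                          slack: float = 5.0) -> List[float]:
    """Return CARP transition times within `slack` seconds of a loss window."""
    return [t for t in transitions
            if any(s - slack <= t <= e + slack for s, e in windows)]


if __name__ == "__main__":
    pings = [(0, True), (1, False), (2, False), (3, True), (4, True)]
    print(loss_windows(pings))
    print(transitions_near_loss(loss_windows(pings), [1.5, 60.0], slack=1.0))
```

If every backup-to-master transition lands inside or next to a loss window on the interface IPs, that points at the path between the two firewalls rather than at CARP itself.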

              • miloman

              Solved.

              I didn't get to the IOS update part because I found the problem. Spanning tree was reconverging at random times even though there were no topology changes. I edited some STP costs, and that did the trick.

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.