Immediate help - network loop, even though bridge0 down
Alright, I need some help. This has happened a few times, and it just happened again now.
Two pfsense firewalls in a failover setup using the standard carp configuration. They're both configured to bridge the public network and the private network, because I don't control .1 (gateway). I have no choice. My servers (behind the firewall) are also using public IPs.
There's a dedicated line between the two firewalls for carp and pfsync.
Basically what's happening is, the two firewalls are looping over the bridge0 interface every once in a while (months in between...), even though bridge0 is DOWN on firewall2. Spanning tree is also on, however it's useless when the bridge interface is in a 'down' state. Carp was also reporting correctly (master/slave, not two masters).
This happened a few minutes ago, resulting in about 20 minutes of downtime. I rebooted the master firewall, and about 20 minutes after it came up, the loop happened again. This time, I had our hosting company on the phone when it happened, and they confirmed it. Both switch ports that the firewalls are plugged into were doing 800 Mbps in and out, to each other. Loop.
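For anyone who wants to watch for the same symptom, here is a rough sketch of the arithmetic: turn two byte-counter samples into a rate, then flag a port whose inbound and outbound rates are both high and nearly equal. The function names, the 500 Mbps floor and the 10% tolerance are all made up for illustration, and symmetric heavy traffic is only a hint of a loop, not proof.

```python
def mbps(bytes_before, bytes_after, seconds):
    """Convert a byte-counter delta over `seconds` into megabits per second."""
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000


def looks_like_loop(in_mbps, out_mbps, floor_mbps=500.0, tolerance=0.1):
    """Heuristic only: both directions are busy and roughly symmetric.

    floor_mbps and tolerance are hypothetical values, not tuned thresholds.
    """
    if in_mbps < floor_mbps or out_mbps < floor_mbps:
        return False
    # Symmetric if the gap is within `tolerance` of the larger rate.
    return abs(in_mbps - out_mbps) <= tolerance * max(in_mbps, out_mbps)


if __name__ == "__main__":
    # 100,000,000 bytes in one second, used for both directions.
    rate = mbps(0, 100_000_000, 1)
    print(rate, looks_like_loop(rate, rate))
```

The counters themselves would have to come from wherever you can sample them (the switch or the firewall); that part is left out on purpose.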
I have exhausted all possibilities, and I need some help.
Any ideas would be greatly appreciated.
CARP won't work on bridges.
I understand that. My setup is built based on this:
I'm using carp for interface state changing, not IP failover.
Anyway, my bigger concern is the fact that the bridge was passing packets, even though the interface was down.
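To confirm what state the bridge is really in when this happens, something like the sketch below could be run against saved `ifconfig bridge0` output. It only parses the `flags=...<...>` part of the text and checks for `UP`. The two sample strings are illustrative, not captured from these firewalls, and the helper names are made up.

```python
import re


def interface_flags(ifconfig_text):
    """Return the flag names from the first 'flags=NNNN<...>' found in the text."""
    match = re.search(r"flags=[0-9a-fA-F]+<([^>]*)>", ifconfig_text)
    if match is None:
        return set()
    return {flag for flag in match.group(1).split(",") if flag}


def is_up(ifconfig_text):
    """True if the UP flag is present in the parsed flag set."""
    return "UP" in interface_flags(ifconfig_text)


# Illustrative sample text only (assumed format, not real output from this setup).
SAMPLE_DOWN = "bridge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500\n"
SAMPLE_UP = "bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500\n"

if __name__ == "__main__":
    print(is_up(SAMPLE_DOWN))
    print(is_up(SAMPLE_UP))
```

Logging this alongside the traffic counters would at least show whether bridge0 was really flagged down at the moment the loop started.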
Probably won't find much help here on that, the freebsd-net list would likely be more helpful.
I'm going to PM you a link to a 1.2 image using FreeBSD 6.3 rather than 6.2 in case this is a 6.2 glitch that's since been fixed. If you did post to freebsd-net with 6.2, someone would undoubtedly tell you to upgrade to 6.3 (if not 7.0), so this is a good first step.
I haven't posted to the list - I was waiting to see what kind of responses I got here first. For the time being I'm okay, I took down one of the firewalls.
What I think could be happening is that the dedicated carp link between the two machines has a bad cable. Originally I didn't have spanning tree enabled on the bridges, because I wanted an instant failover, and the bridge would be down on the backup firewall anyway, so STP was pointless. So, if there was a bad cable, I could easily see the two firewalls fighting over who's master and who's slave, since they would be talking over that dedicated link with a bad cable.
The ideal scenario is getting the hosting company to let me control .1, which I'm in the process of talking about with them. That would allow me to completely get rid of the bridge, and just use carp to handle .1 - done deal.
Anyway, thanks for the URL, I may set up a test environment just to check it out anyway.
By the way, if you're curious why this is such a huge deal to me, read http://forum.pfsense.org/index.php/topic,7668.0.html. We've doubled in size since then.