Is my WatchGuard dead?
A little over a year ago I bought three WatchGuards to play with.
I ended up deploying a pair of them (upgraded CPU & RAM) into a production data center thousands of miles away. They run pfSense 2.1.4, and I have no plans to upgrade them because of the horror stories I have seen about upgrading these boxes.
I have been using high availability sync between them and never had an issue. The secondary reports it has been online for 427+ days, so that's cool.
Once there was an issue, the high availability failed over, and everything worked fine.
In the last month I have had three failures of the primary box, where it goes dead. The high availability does not kick in, however, because the interface I am using for CARP monitoring says the primary box is still up.
The result is the backup box does not recognize any problem with the primary and does not take over.
I can't access the primary box from my remote location and the secondary does not take over (I can connect to the secondary box directly).
I have the data center staff power cycle the primary and it comes back up.
Is my box dead? It is difficult to tell remotely, and there are no logs after a power cycle; I think the logs get wiped out.
Can anyone suggest a troubleshooting tip?
The sync interface has no relation to whether or not CARP switches over. The interfaces that have CARP IPs on them are all up and working if CARP stays master on the primary.
Still might be the box dying, but you'll need to troubleshoot further to determine how the NICs keep functioning but it stops passing traffic. That could be any number of issues, some unrelated to the firewalls entirely that a reboot will fix for some period of time. I guess it's unreachable on LAN and WAN when it stops responding? If you can SSH or web into it via some interface while it's not working, reviewing the logs and interface status that way would be helpful in narrowing down the problem.
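One way to narrow this down the next time it hangs is to probe the primary on each interface from a machine that can still reach the site, such as the secondary. What follows is only a minimal sketch in Python, not anything built into pfSense: the function names are made up for illustration, and the probe names, addresses and ports you pass in are placeholders for your own LAN and WAN addresses.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def probe_all(probes, timeout=3.0):
    """probes maps a label to a (host, port) pair; returns label -> bool."""
    return {name: tcp_reachable(host, port, timeout)
            for name, (host, port) in probes.items()}


def summarize(results):
    """Turn per-interface probe results into a one-line status."""
    if not results:
        return "no probes configured"
    if all(results.values()):
        return "all reachable"
    if not any(results.values()):
        return "unreachable on every probed interface"
    down = sorted(name for name, ok in results.items() if not ok)
    return "partially reachable; down: " + ", ".join(down)
```

Calling `summarize(probe_all({"lan-ssh": ("192.0.2.1", 22), "wan-https": ("198.51.100.1", 443)}))` with your real addresses in place of those documentation ones would tell you whether the box is dead on every interface or only some. If one interface still answers, that is the way in to read the logs before the power cycle.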
Frankly if you have the boxes in a datacenter, using recycled Watchguard hardware is a bad idea. Old hardware, and questionable quality NICs on many of them, isn't a recipe for high uptime. Maybe decent if you want something cheap for home non-critical usage. Personally I'd put in something better. The cost of any of the new hardware we sell is probably trivial vs. the datacenter bill.
The WatchGuard Firebox was just too cheap to pass up when I was looking to deploy a pair of rack mount firewalls last year. They performed perfectly well for over a year, and I bought two to give myself redundancy, knowing they were old. I was expecting a power supply to fail, not some strange network dropout.
There was simply nothing on the market that could even come close to the price for the performance and rackable ability.
Guess it is time to upgrade, poor little WatchGuards :-(
For the time being I disabled CARP on the primary so all the traffic is going to the backup (hey, that was the purpose of having two right!?!). Now the backup can operate until I can purchase some new hardware.
I was pleasantly surprised to see you using recycled hardware in a data center. Chris's comments made me go look up what co-location costs these days. I was a little stunned.
This place charges $80/mo. for 1U of space and a $150 minimum for bandwidth.
So I guess "Access Card" means physical access to the site? $250-$500 for "Remote Hands" reboots?
I thought the free market would have brought rates down further by now. At $230/mo. per 1U I would think it would be worth doing a double ITX rack build. Some plans even limit the amps per server.
What are you using it for? Web hosting? What hardware did you use for your services?
Don't let that data center pricing fool you. I have space in several data centers, I get 10U of space, 50A power and bandwidth for $300 on average. I have one that is $275 a month. These are not crap data centers either.
I had deployed the Watchguard Firebox X750e Core, with upgraded CPU and RAM. I knew they were old but the performance was good enough for our needs. By putting two of them with high availability I thought it was worth the risk if I lost one (like I may have now). My cost per unit was something like $50 when I built them (the shipping was more than the units!).
Yesterday I went shopping for new 1U systems, and you know what I found? Nothing good. I'm looking at $1,300 for a pair of 1U rack mount servers to replace the Firebox setup.
I'm going to hold out and keep running on the backup WatchGuard. It might last 1 week, it might last another 5 years… Our infrastructure is redundant across multiple data centers, so even if we lose an entire data center because a firewall goes down, no end users notice an outage. That makes recycled hardware worth it. Our entire rack is built of off-lease servers. We have never had a failure, but we designed everything to expect failure.
Long live Watchguard in the data center?!?
Well jump up one generation to the Astaro rackmounts. The LGA775 racks are worthy and newer. Slap in a Q9550S and ready for action. Lanner is the quality stuff but commands a good price used. Nexcom makes the Astaro.
Watch the REV# on these as only the newer ones had gigabit.
Which NICs are you using on the Firebox?
The box clearly doesn't fail entirely since it's still sending CARP advertisements preventing the Secondary taking over. It's not reachable on any interface?
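To make that concrete, here is a deliberately simplified model in Python of the backup's decision. It is not the real CARP implementation: the function name is made up, and `master_down_timeout` is a stand-in for whatever timeout the backup actually derives from its CARP settings.

```python
def backup_state(advert_times, now, master_down_timeout):
    """Return 'MASTER' if the backup should promote itself, else 'BACKUP'.

    advert_times: timestamps (seconds) at which advertisements from the
    primary were heard. The backup only looks at how long ago the last one
    arrived; it knows nothing about whether the primary is passing traffic.
    """
    heard = [t for t in advert_times if t <= now]
    if not heard:
        # Never heard the primary at all, so nothing is holding us back.
        return "MASTER"
    silence = now - max(heard)
    return "MASTER" if silence > master_down_timeout else "BACKUP"
```

With advertisements still arriving every second, `backup_state(range(0, 101), 100.5, 3.0)` stays `"BACKUP"` no matter what else is wrong with the primary. Only when they stop, as in `backup_state(range(0, 51), 60, 3.0)`, does the model return `"MASTER"`.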
That's an odd failure condition. However, unfortunately, as Chris says the age of the capacitors in those boxes means unreliability can creep in.
I'm running 2.2.X on the fireboxes I have FYI. No real issues upgrading other than the switch to DMA by default which can be worked around.