Uptime 2+ years, then loss of WAN connectivity



  • I've been searching the forums, and haven't found this specific case…

    We have a few WatchGuard Firebox 750e boxes with pfSense 2.1, all running happily in several racks at a data center.  They have been bullet-proof.  Everything is set with static IP addresses; there is no DHCP running for the LAN or the WAN.

    These boxes all have the CF card, and no disk.  Only minimal packages are installed, namely pfBlocker.

    Then last Sunday, at 2:37 a.m., one went off-line.  (An off-site monitor started sending us alerts.)  I could not ping the box, or any of the hosts behind it.  So at 7:30 a.m. I made the short trip to the data center, and found I could log into the LAN side without issue, and all of the hosts on the LAN were reachable, but there was no outside connectivity.  The dashboard showed the WAN and LAN (and Opt1) were all up.  From the box, I could not ping Google, 8.8.8.8, etc.

    So from the GUI, I rebooted.  The box came back up just fine, with full connectivity.

    I've seen other posts about loss of WAN connectivity because of an ISP issue, and the WAN doesn't come back up with the ISP.  Some users have implemented cron-triggered scripts that will ping the outside, and reboot when there's no response.  I'll plan on implementing that, if nothing else.
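
    For reference, the ping-and-reboot idea can be sketched roughly as follows. This is only an illustration, not the script from those posts: the target addresses, the counter file path, the failure threshold, and the use of `/sbin/shutdown -r now` are all assumptions, and it presumes a Python 3 interpreter is available on the box, which a stock install may not provide. The same logic ports directly to a shell script. Requiring several consecutive failed runs, and more than one target, is meant to avoid rebooting over a single dropped ping.

```python
import subprocess
import sys
from pathlib import Path

# All names and values here are illustrative assumptions, not from the thread.
TARGETS = ["8.8.8.8", "8.8.4.4"]        # outside hosts to probe
STATE_PATH = "/tmp/wan_watchdog.count"  # hypothetical counter file
THRESHOLD = 3                           # consecutive failed runs before rebooting


def ping_ok(host):
    """Return True if at least one ICMP reply arrives (FreeBSD ping flags)."""
    # -c 3: send three probes; -t 10: give up after 10 seconds overall.
    # On Linux, -t means TTL instead, so the flags would need adjusting there.
    result = subprocess.run(
        ["ping", "-c", "3", "-t", "10", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wan_is_up(targets, probe):
    """The WAN counts as up if any one target answers."""
    return any(probe(target) for target in targets)


def next_state(prior_failures, wan_up, threshold):
    """Return (new failure count, whether a reboot is due)."""
    failures = 0 if wan_up else prior_failures + 1
    return failures, failures >= threshold


def read_count(path):
    """Read the stored failure count; a missing or corrupt file counts as 0."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return 0


def reboot_now():
    subprocess.run(["/sbin/shutdown", "-r", "now"])


def run_once(probe=ping_ok, reboot=reboot_now, state_path=STATE_PATH,
             threshold=THRESHOLD, targets=TARGETS):
    """One watchdog pass, intended to be called once per cron interval."""
    up = wan_is_up(targets, probe)
    failures, must_reboot = next_state(read_count(state_path), up, threshold)
    # Reset the counter when rebooting so a stale count cannot cause a loop.
    Path(state_path).write_text("0" if must_reboot else str(failures))
    if must_reboot:
        reboot()
    return must_reboot


# Only act when explicitly asked, so importing or testing never reboots.
if __name__ == "__main__" and "--live" in sys.argv:
    run_once()
```

    Run from cron every few minutes with the `--live` argument, this would reboot only after three passes in a row where neither target answered.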

    But I'm curious whether anyone has seen this issue, and whether there's a fix.

    Thanks!

    Peter



  • The NICs on those WatchGuard boxes aren't very good. They're also really old: they went end of sale 6 years ago, so the hardware is probably nearing a decade old at this point.

    I'd get new, better hardware for datacenter usage, personally.



  • Thanks, Chris.

    I will plan on replacing all of the WatchGuard boxes with something more modern, and will study the options, including the XG-2758 and other boxes at the pfSense store.

    We really need bulletproof reliability, and are certainly willing to pay for it.  (But we want to stay with pfSense!)

    Peter



  • Those went EOL this last December, so WatchGuard won't even admit their existence at this point…  ;D

    If you're using ports 0-3, try moving over to ports 4-7 until you get your replacement.  Once the ports start going, they tend to fail in pairs...



  • Might be CPU overkill, but I'd go with a 1U HP ProLiant DL360 of some recent generation. Plenty of expansion, dual power, iLO (KVM over IP), easily replaceable NICs, hot-swap RAID, and seriously over-engineered hardware. Plus, you can get used G6 models cheap. Heck, you can even configure them for fail-over memory modules. Excellent servers. Just an idea!



  • Matty-CT,

    "Might be CPU overkill but I'd go with a 1U HP Proliant DL360 of some recent generation."

    Interesting you should mention it…

    I've been sold on HP's DL360 servers as far back as when they were Compaq...!  And these are the servers we have always used in the DC.  Our first "go" with pfSense was on the DL360 G3 servers, and then the G5.  But the power consumption of those is staggering compared to the WatchGuard, or Atom-based boxes, etc.

    Still, with the excellent iLO capability, I'm considering going back to pfSense on a DL360.  We have plenty of decommissioned G5 boxes just taking up space.

    Thanks!



  • chpalmer-

    "If you're using ports 0-3 try moving over to ports 4-7 until you get your replacement.  Once the ports start going they tend to run in pairs…"

    Thanks for the heads-up on the NIC ports.

    I remember reading in the excellent "Firebox" section of the forums that the right-side ports (msk0 through msk3) were suspect, and based on those comments I've avoided using them.  So I'm reluctant to make any change pending the replacement firewall(s).

    Still, I'm wondering about the root cause.  I was under some duress, and didn't copy the logs before rebooting.  A quick glance at the dashboard gave the false impression that all was okay.  It seems an auto-reboot script similar to https://forum.pfsense.org/index.php/topic,17243.0.html could have brought the box back up without my intervention.  Opinions on reboot scripts seem mixed, but I've now added a variation that might come in handy, if called upon.
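
    On the missed logs: one way to keep some evidence for a root-cause hunt would be to dump basic network state to a file before any reboot, manual or scripted. A minimal sketch, not the variation mentioned above: the command set (`ifconfig -a`, `netstat -rn`, `arp -an`, `dmesg`), the file naming, and the output directory are all assumptions, and it presumes Python 3 is available. On a CF-based install the usual temporary and log directories may be RAM-backed, so a path that survives the reboot would have to be chosen.

```python
import subprocess
import time
from pathlib import Path

# Illustrative command set (an assumption); all are standard FreeBSD tools.
COMMANDS = {
    "interfaces": ["ifconfig", "-a"],
    "routes": ["netstat", "-rn"],
    "arp": ["arp", "-an"],
    "kernel messages": ["dmesg"],
}


def capture(cmd, timeout_s=15):
    """Run one command and return its combined output as text."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
        return done.stdout.decode(errors="replace")
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Record the failure instead of aborting the whole snapshot.
        return f"[could not run {' '.join(cmd)}: {exc}]\n"


def save_snapshot(out_dir, commands=COMMANDS, runner=capture):
    """Write the output of every command to one timestamped file."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = Path(out_dir) / f"wan-snapshot-{stamp}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as fh:
        for label, cmd in commands.items():
            fh.write(f"===== {label}: {' '.join(cmd)} =====\n")
            fh.write(runner(cmd))
            fh.write("\n")
    return path
```

    Calling `save_snapshot("/some/persistent/dir")` just before the reboot (the path here is a placeholder) would leave one text file per incident to compare afterwards.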

    Thanks everyone…

    Peter