Help me diagnose a firewall that goes unresponsive every 24 hours or so
-
Retyping this because my last attempt failed: the attachments were too large.
I have a newish Netgate Hamakua (basically a fanless 1 GHz Celeron box with 1 GB of RAM running a NanoBSD install) at a datacenter I'm setting up. It came with pfSense 1.2.3, but I upgraded to the 2.0 RC, then after reading posts here moved to the daily snapshot cycle. Since then I've had 3 failures that required a hard reboot, and I'm trying to determine whether it's the firmware, the box itself, or if it's some combination of features I'm running that aren't really compatible with the current firmware versions.
Originally I installed BandwidthD before upgrading to 2.0, and I don't know if it's been uninstalled or not. Other than that, I have had Snort installed (failures have happened both when it was running and when it wasn't), and I have had some IPsec tunnels up as well. I've been running an ALIX with comparable firmware versions at home with no issues until this morning, when it went stupid as well. The common factor there was the IPsec tunnels, but I've since disabled them.
Anyway, I turned on Zabbix monitoring before the last failure, and I hope these graphs help explain what's happening. At the very least they give a snapshot of the state just before the failure.
I'm not sure what this last one is; it's pfStates -> Searches or something like that.
-
then after reading posts here moved to the daily snapshot cycle.
I don't know what this means: you upgrade the firmware every day? you have upgraded to a particular snapshot build? If so, which one?
Since then I've had 3 failures that required a hard reboot, and I'm trying to determine whether it's the firmware, the box itself, or if it's some combination of features I'm running that aren't really compatible with the current firmware versions.
What was displayed on the console when you required a reboot? What was it about the system that led you to do a reboot? (From the Subject I presume it was because the pfSense box was "unresponsive" but "unresponsive" to what?)
Originally I installed BandwidthD before upgrading to 2.0, and I don't know if it's been uninstalled or not.
Is BandwidthD on the list of installed packages? (In web GUI: System -> Packages, click on the Installed Packages tab)
-
I don't know what this means: you upgrade the firmware every day? you have upgraded to a particular snapshot build? If so, which one?
I've updated the firmware each time the firewall went down, thinking it might be a firmware issue. Before then it was probably once a week. Right now I'm on the current version.
What was displayed on the console when you required a reboot? What was it about the system that led you to do a reboot? (From the Subject I presume it was because the pfSense box was "unresponsive" but "unresponsive" to what?)
I can't answer that, as I'm nine hours away from the firewall (by car). When I say "unresponsive" I mean that no connections happen over the WAN, and the Zabbix machine on an internal network can no longer connect to the firewall to gather information.
Is BandwidthD on the list of installed packages? (In web GUI: System -> Packages, click on the Installed Packages tab)
Nope, which worries me, because I think it probably should be.
I mentioned it because the traffic widget was showing on this firewall but not on my other one. I hadn't looked into the available widgets at the time, so that's not tied to a failed removal of BandwidthD or anything like that.
-
There isn't enough information here to even suggest a likely cause. With some console output (or at least a description of what is on the console display before the system is rebooted) it would probably be possible to distinguish between a number of possibilities, including:
- something causing a power dip, after which the box halts because of the power loss and the BIOS is not configured to restart automatically when the power comes back
- a serious OS software error (panic)
- a serious hardware error (e.g. faulty memory, a faulty hard drive, a bad spot on the hard drive, etc.)
Presumably you restart the box by asking someone on site to do it. Perhaps they could take a photo of the console (by digital camera or mobile phone) and email it to you or describe what is on the console when you ask them to reboot.
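Until someone can read the console, it may also help to pin down exactly when the box stops answering, so those times can be lined up against the Zabbix graphs. Below is a minimal sketch in Python, meant to run on a machine on the internal network (the Zabbix host, for example). The address 192.168.1.1 and port 443 are placeholder assumptions, not values from this thread; substitute the firewall's actual LAN address and a port it listens on, such as the web GUI port. It only records when a TCP connection succeeds or fails; it says nothing about why the box stopped responding.

```python
import socket
import time
from datetime import datetime

# Placeholder values (assumptions): use the firewall's real LAN address
# and a TCP port it listens on, e.g. the web GUI port.
FIREWALL_HOST = "192.168.1.1"
FIREWALL_PORT = 443
INTERVAL_S = 60  # seconds between probes


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def monitor(host=FIREWALL_HOST, port=FIREWALL_PORT,
            interval_s=INTERVAL_S, max_probes=None):
    """Probe host:port repeatedly, print a timestamped line whenever the
    state changes, and return the (timestamp, up) samples collected.
    With max_probes=None it runs until interrupted."""
    samples = []
    prev = None
    while max_probes is None or len(samples) < max_probes:
        up = is_reachable(host, port)
        now = datetime.now()
        samples.append((now, up))
        if up != prev:
            state = "UP" if up else "DOWN"
            print(f"{now.isoformat(timespec='seconds')} {host}:{port} {state}",
                  flush=True)
            prev = up
        if max_probes is None or len(samples) < max_probes:
            time.sleep(interval_s)
    return samples


def outages(samples):
    """Given (timestamp, up) samples in time order, return a (start, end)
    pair for each run of failed probes; end is None if still down."""
    result = []
    start = None
    for ts, up in samples:
        if not up and start is None:
            start = ts
        elif up and start is not None:
            result.append((start, ts))
            start = None
    if start is not None:
        result.append((start, None))
    return result
```

Calling `monitor()` with its defaults runs until interrupted, so it is best started in a `screen` session with its output redirected to a file. The UP/DOWN lines give the exact failure times, which should show whether the "every 24 hours or so" interval is really regular.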
-
Well, if it happens again I'll be driving down there to swap in the replacement hardware I've purchased. If so, I'll post an update.
Thanks.