How to get around Pfsense ZFS crashing on sudden power loss (electricity)
-
Hi Guys, Good morning...
The Problem:
I know it's standard practice to use a UPS with a computer to avoid catastrophic data loss or damage to components, and this is even MORE critical if you are using Bitlocker to "protect" your data or network, because that's one nightmare you ain't gonna be waking up from. We lost our main UPS two months ago, and the unit is no longer manufactured. As the devil would have it, the electrical company started having problems at the same time, which was probably (who knows) what killed our UPS. We had constant power losses during the day and during the night, and pfSense ZFS, as resilient as I thought it was supposed to be, kept crashing. I just had to keep re-installing and re-uploading the current config file backup. What an effing pain in the ass that was.
The Solution:
On the same computer (Fujitsu TX1320 M3), instead of re-installing pfSense, I installed vSphere ESXi 8 and installed pfSense as the sole VM, using the passthrough option for the quad-port Intel server network card. After installation and configuration, I used vCenter, installed on another server, to manage the host where pfSense is installed, then created a snapshot and also downloaded the configuration file as normal, just in case. We still have no UPS in place, and we've had two power losses since; both times pfSense ZFS crashed and failed to boot. I was able to quickly revert to the last working snapshot, and within 3 minutes we were back up.
So, doing it this way, if you make any changes in pfSense, you will obviously need to download the config file and, when you can, shut down pfSense and create another snapshot.
Also..
- In the BIOS, set the system to power on after a power failure.
- For passthrough to work, settings such as VT-d or IOMMU must be enabled in the BIOS. Check your BIOS for details, or Google it.
- Set the pfSense VM to auto-power on, in ESXi or vCenter, whichever you choose, though I believe snapshots can only be created with vCenter.
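Since the downloaded config file is the fallback if a snapshot ever fails, it may be worth sanity-checking each backup before relying on it. Below is a minimal sketch, not an official tool: it assumes an unencrypted backup whose root element is `<pfsense>` with `<system>` and `<interfaces>` sections, and the function name `check_config_backup` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_config_backup(xml_text: str) -> list[str]:
    """Return a list of problems found in a config backup (empty if none).

    Assumption: the backup is unencrypted XML with a <pfsense> root element.
    This only checks basic structure, not whether the settings are valid.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A truncated or corrupted download usually fails right here.
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "pfsense":
        problems.append(f"unexpected root element <{root.tag}>")
    # Sections assumed to be present in any usable backup.
    for section in ("system", "interfaces"):
        if root.find(section) is None:
            problems.append(f"missing <{section}> section")
    return problems
```

To use it, read the downloaded file into a string and pass it in; an empty list means the file at least parses and has the expected top-level sections. It does not replace a test restore.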
-
@starcodesystems Using a RAM disk will limit active disk writes, thus lowering the chance of an issue, though at the trade-off of potentially not having current logs for an actual crash. Is your router doing a lot of logging, maybe? Just thinking out loud about why it seems to be a recurring issue.
Glad you found a solution to a quick recovery. Though, unless the host is running ZFS I'd think the host would be subject to file system corruption issues...?
-
I will try that. For the moment, I'm only running NTOP.
[internet] ---- [Pfsense] --- [Cisco Catalyst InterVlan routing SVIs] --- [Network switches, servers, services] --- [Ubiquiti WLAN] ((((--)))) [Remote offices]
I could go even more advanced by creating iSCSI storage on a Synology server with a dedicated VLAN and multipathing, then setting up another vCenter Server and enabling failover for the VMs over iSCSI in case the ESXi server decides to crash one day too, which (fingers crossed, knock on wood) hasn't happened thus far.
-
@starcodesystems said in How to get around Pfsense ZFS crashing on sudden power loss (electricity):
running NTOP
That's probably it. Per https://www.netgate.com/supported-pfsense-plus-packages, it's on the "Requires SSD/HDD" list since it will exhaust eMMC storage write life. We don't use it, so I don't know if a RAM disk will help there, or how big it might need to be. Depends on where it writes, I suppose.
We typically do things like turn off logging of the default block rules for the firewall, turn off the HTTP log in Suricata, etc., to lower disk writes.
-
Perfect. I'll try that. Thanks very much!