Disaster recovery process to protect against boot media failure?
-
Hi
Summary:
Is it possible to check the boot media (e.g. a USB stick) from within pfSense?
Background:
My pfSense firewall (Intel J3455 CPU, 4GB, booting off a SanDisk USB stick) has run for several years with no unplanned downtime, and a handful of upgrades in CE and then latterly to Plus have been uneventful.
One morning I found broadband was down (it turned out that one of the reasons was that my ISP had changed my DSL login without warning - doh!)
With no internet, I got no response from pfSense via the network, or on the physical VGA console after keypresses.
So I rebooted - but it seemed to be stuck on boot. I rebooted again with no luck - see photo for the second time it failed.
2023-09-23 09.38.39 (75%).jpg
Then I tried to clone the boot USB stick with Balena Etcher - but it failed with 'unknown error'.
Realising that I'd have to install from scratch I tried to find the download for Plus on https://www.pfsense.org/download/ but I could only find CE, so had to go with that.
I created the usb boot install and then installed pfsense back onto the original USB stick (I know, but I had no spare available), restored from a manually saved xml backup, upgraded to Plus again and everything seems fine. I realise the usb stick could be going bad so I need to check that somehow.
Also I'd completely forgotten about the Auto Config Backup feature so I could have just restored from that.
But to protect against this happening again, what should I do for next time?
(I need to get plenty of spare USB sticks and write a memo to my son to request he desist from nicking them!)
-
How can I (periodically) check the current boot usb stick for media errors?
-
Should I clone the usb boot stick (clone function is under system/boot environments)? Or is it preferable to install fresh and restore from backup?
-
Where are the Plus install downloads?
-
-
@mr-brunes said in Disaster recovery process to protect against boot media failure?:
Where are the Plus install downloads?
There is no + download, unless you are on netgate hardware - then you can request install media from TAC.
Will this maybe change in the future - good question..
As to cloning USBs? I don't think I would take that route, I would just have a copy of the current version you're running, you can always install to some USB stick in a few minutes. I have quite a few "spares", brand new and unopened. I buy them when I see a good price because I like to have them around in case I want to give someone a bunch of something and it's easier and simpler to just let them keep the stick..
As to backup - not a bad idea to have a backup config, and sure you could/should have ACB running.
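A minimal sketch of a local, timestamped config backup to sit alongside ACB. It assumes the running config is at /conf/config.xml (the usual pfSense location); the destination directory /root/config-backups and the function name backup_config are made up for illustration. A copy kept on the firewall's own boot stick won't survive that stick dying, so the copies would still need pulling off the box.

```shell
#!/bin/sh
# Sketch: keep timestamped copies of the pfSense config file.
# Assumptions: the default source path and destination directory below
# are illustrative, not taken from this thread.

backup_config() {        # backup_config [CONFIG_FILE] [DEST_DIR]
    src="${1:-/conf/config.xml}"
    dest_dir="${2:-/root/config-backups}"

    # Refuse to back up a missing or empty config.
    [ -s "$src" ] || { echo "no config at $src" >&2; return 1; }

    mkdir -p "$dest_dir" || return 1
    out="$dest_dir/config-$(date +%Y%m%d-%H%M%S).xml"
    cp -p "$src" "$out" || return 1

    # Confirm the copy matches the source byte for byte.
    cmp -s "$src" "$out" || { echo "copy mismatch: $out" >&2; return 1; }

    # Print the path of the new backup so a caller can ship it elsewhere.
    echo "$out"
}

# Example: backup_config
# Example: backup_config /conf/config.xml /some/other/dir
```

The function prints the path of the new copy, so it could be followed by whatever you use to move files off the firewall.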
-
If you want to boot off of a USB stick I think a viable approach is to build a system to work like you want, then clone the stick and put it somewhere safe.
Going forward, back up the config on a regular basis. When the next one fails, make a new clone from the safely stored copy and boot. Once it is up, reload the latest config backup.
USB sticks are disposable and I have not seen one that has health information. USB SSDs do have health information and will last far longer due to TRIM being used.
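The clone-and-store approach could look roughly like this, run from another FreeBSD or Linux machine with the firewall shut down. The device name /dev/da1 and the image filename in the examples are placeholders, not taken from this thread, and the function names are made up. Check which device is the stick before running anything, because dd overwrites whatever it is pointed at.

```shell
#!/bin/sh
# Sketch: image a boot stick to a file, verify it, and write it back later.
# bs is given in bytes because the 1m/1M suffix differs between BSD and GNU dd.

image_stick() {          # image_stick SOURCE_DEVICE IMAGE_FILE
    dd if="$1" of="$2" bs=1048576 || return 1
    # cmp exits non-zero if the image differs from the source
    # or if either side cannot be read.
    cmp "$1" "$2"
}

restore_stick() {        # restore_stick IMAGE_FILE TARGET_DEVICE
    dd if="$1" of="$2" bs=1048576 || return 1
    # Flush buffered writes before the stick is unplugged.
    sync
}

# Example (placeholder names):
#   image_stick /dev/da1 pfsense-boot.img
#   restore_stick pfsense-boot.img /dev/da1
```

The replacement stick needs to be at least as large as the one the image was taken from.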
-
@mr-brunes I've got to say that I like your subject... sounds official... "disaster recovery", even if it looks like over-thinking. As John said, having a bootable current version of pfSense and a backup of the latest configuration are sufficient.
-
Yup I would just image the USB drive if you are going to do that. But running from USB is generally not a great option.
If you really have to though I would install as UFS and enable RAM disks to minimise drive writes.
Steve
-
Tx for all the replies - useful info!
My DR terminology stems from working with enterprise systems planning, i.e. what the recovery plan is when (not if) a single point of failure or non-FT component fails. It's especially relevant when kit is installed at remote (dark) sites where there is no one with the skills to do a re-install. (This is effectively the situation when I'm away!) The reinstall entails obtaining the image (not even possible with Plus when using non-Netgate h/w), flashing it onto a USB stick, reinstalling pfSense onto another boot device, booting, and then restoring a manual config (since the ACB restore doesn't appear to be available without pfSense back up and running). It's a fair number of steps, not to mention made harder by potentially losing internet access!
That is why I thought cloning the boot device would be much easier.
I looked at the Boot Environments / Clone boot environment feature but it seems to be concerned with making a snapshot backup of something concerned with the boot process, rather than cloning the boot device.
Other systems can take a snapshot image of the boot media and store it remotely which is very handy.
I guess with pfSense one would have to clone the boot device while the system was down.
In terms of monitoring the boot device for errors, the SMART status tools don't know how to interpret the USB bus device:
[code]
smartctl 7.3 2022-02-28 r5338 [FreeBSD 14.0-CURRENT amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/da0: Unknown USB bridge [0x0781:0x5583 (0x100)]
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
[/code]
My Linux is pretty rusty so will have to do some digging on this aspect ... I'm looking for an equivalent of Windows' 'chkdsk /r' (scan drive for bad sectors and recover them if possible).
As for using USB as a boot device, at least it is easily swappable, if not ideal for repeated write environments as wear levelling is not available. On the latter I've not seen a write-up of how pfSense uses the boot device media, or if it runs mostly in RAM (notwithstanding the package differences). Will have to check out the UFS and RAM disk options as that sounds interesting given the default fs is ZFS.
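As a rough stand-in for the read half of 'chkdsk /r', one option is to read the whole device and see whether any read fails. A minimal sketch, assuming the stick is /dev/da0 as in the smartctl output above; the function name scan_media is made up. It only reads, so it cannot repair or remap anything, and a clean pass doesn't prove the stick is healthy.

```shell
#!/bin/sh
# Sketch: read-only surface scan of a block device (or an image file).

scan_media() {           # scan_media DEVICE
    # Without conv=noerror, dd stops at the first unreadable block and
    # exits non-zero, which is enough for a simple pass/fail check.
    if dd if="$1" of=/dev/null bs=1048576 2>/dev/null; then
        echo "read OK: $1"
    else
        echo "read errors on $1" >&2
        return 1
    fi
}

# Example: scan_media /dev/da0
```

On a ZFS install, zpool status also reports read and checksum errors per device, so that is worth a look alongside a raw read pass.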
What is the recommended device (and why) for simple setups? I couldn't see anything in the H/W part of the docs.
-
@mr-brunes Try this and related pages:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-check.html
ZFS supports RAID:
https://docs.netgate.com/pfsense/en/latest/install/install-zfs.html
-
@SteveITS that fs and disk troubleshooting has lots of very useful info - shame it's buried there!