Netgate Discussion Forum

    Disaster recovery process to protect against boot media failure?

    General pfSense Questions
    • mr-brunes

      Hi

      Summary:
      Is it possible to check the boot media (e.g. a USB stick) from within pfSense?

      Background:
      My pfSense firewall (Intel J3455 CPU, 4GB, booting off a SanDisk USB stick) has run for several years with no unplanned downtime, and a handful of upgrades, first in CE and latterly to Plus, have been uneventful.

      One morning I found broadband was down (one of the reasons turned out to be that my ISP had changed my DSL login without warning - doh!).
      With no internet, I then found that pfSense gave no response via the network or on the physical VGA console after keypresses.
      So I rebooted, but it seemed to be stuck on boot. I rebooted again with no luck; the photo shows the second failed boot. [Photo: 2023-09-23 09.38.39 (75%).jpg]

      Then I tried to clone the boot USB stick with balenaEtcher, but it failed with 'unknown error'.

      Realising that I'd have to install from scratch, I tried to find the download for Plus on https://www.pfsense.org/download/ but could only find CE, so I had to go with that.
      I created the USB install media and then installed pfSense back onto the original USB stick (I know, but I had no spare available), restored from a manually saved XML backup, upgraded to Plus again, and everything seems fine. I realise the USB stick could be going bad, so I need to check that somehow.
      Also, I'd completely forgotten about the Auto Config Backup feature, so I could have just restored from that.

      But to protect against this happening again, what should I do for next time?
      (I need to get plenty of spare USB sticks and write a memo to my son requesting that he desist from nicking them!)

      • How can I (periodically) check the current boot USB stick for media errors?

      • Should I clone the USB boot stick (the clone function is under System / Boot Environments)? Or is it preferable to install fresh and restore from backup?

      • Where are the Plus install downloads?

      • johnpoz LAYER 8 Global Moderator @mr-brunes

        @mr-brunes said in Disaster recovery process to protect against boot media failure?:

        Where are the Plus install downloads?

        There is no Plus download unless you are on Netgate hardware, in which case you can request install media from TAC.

        Will this change in the future? Good question.

        As to cloning USB sticks? I don't think I would take that route. I would just keep a copy of the current version you're running; you can always install to some USB stick in a few minutes. I have quite a few "spares", brand new and unopened. I buy them when I see a good price because I like to have them around in case I want to give someone a bunch of something, and it's easier and simpler to just let them keep the stick.

        As to backup: it's not a bad idea to have a backup config, and sure, you could/should have ACB (Auto Config Backup) running.
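A minimal sketch of keeping dated local copies of the config alongside ACB. It assumes a Python 3 interpreter is available and that the live config sits at /cf/conf/config.xml (the usual pfSense location). The function name `backup_config`, the destination directory, and the retention count are hypothetical choices for illustration, not anything pfSense provides.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(src: Path, dest_dir: Path, keep: int = 10) -> Path:
    """Copy src into dest_dir under a timestamped name, then prune old copies.

    Returns the path of the copy that was just written.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Zero-padded timestamp (down to microseconds) so names sort by age.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = dest_dir / f"config-{stamp}.xml"
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.copy2(src, target)

    # Keep only the newest `keep` copies; delete the rest.
    copies = sorted(dest_dir.glob("config-*.xml"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

For example, `backup_config(Path("/cf/conf/config.xml"), Path("/mnt/backup/pfsense"))` run on a schedule would keep the ten most recent copies. The destination path here is made up; whatever is used should live on a different device than the boot stick, otherwise the copies are lost along with it.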

        An intelligent man is sometimes forced to be drunk to spend time with his fools
        If you get confused: Listen to the Music Play
        Please don't Chat/PM me for help, unless mod related
        SG-4860 24.11 | Lab VMs 2.8, 24.11

        • AndyRH

          If you want to boot off a USB stick, I think a viable approach is to build the system to work the way you want, then clone the stick and put the clone somewhere safe.
          Going forward, back up the config on a regular basis. When the stick fails, make a new clone from the safely stored copy and boot from it. Once it is up, reload the latest config backup.
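The clone-and-store step could be sketched as below: copy the stick byte for byte into an image file and record a SHA-256 so the stored image can be checked before it is written to a replacement. This is an illustration under assumptions, not a pfSense feature: the names `image_device` and `sha256_of` are hypothetical, and the stick should be imaged while it is not in use (for example, plugged into another machine), with enough privileges to read the raw device.

```python
import hashlib
from pathlib import Path

CHUNK = 1024 * 1024  # read 1 MiB at a time


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(CHUNK)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def image_device(source: Path, image: Path) -> str:
    """Copy source byte for byte into image; return the SHA-256 of the data."""
    digest = hashlib.sha256()
    # Unbuffered reads so each request reaches the device as one 1 MiB read.
    with open(source, "rb", buffering=0) as src, open(image, "wb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
    # Sanity check: the image file must hash to what was read from the source.
    # This catches a truncated or mis-written file; it is not proof that the
    # medium holding the image is healthy.
    if sha256_of(image) != digest.hexdigest():
        raise IOError("image does not match the data read from the source")
    return digest.hexdigest()
```

A call such as `image_device(Path("/dev/da0"), Path("pfsense-stick.img"))` would produce the image; the device name is an assumption and must be checked on the machine doing the imaging. Storing the returned digest next to the image lets `sha256_of` confirm the file is intact months later.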

          USB sticks are disposable, and I have not seen one that reports health information. USB SSDs do report health information and will last far longer because TRIM is used.

          o||||o
          7100-1u

          • NollipfSense @mr-brunes

            @mr-brunes I've got to say that I like your subject... "disaster recovery" sounds official, even if it may be over-thinking things. As John said, having a bootable current version of pfSense and a backup of the latest configuration is sufficient.

            pfSense+ 23.09 Lenovo Thinkcentre M93P SFF Quadcore i7 dual Raid-ZFS 128GB-SSD 32GB-RAM PCI-Intel i350-t4 NIC, -Intel QAT 8950.
            pfSense+ 23.09 VM-Proxmox, Dell Precision Xeon-W2155 Nvme 500GB-ZFS 128GB-RAM PCIe-Intel i350-t4, Intel QAT-8950, P-cloud.

            • stephenw10 Netgate Administrator

              Yup, I would just image the USB drive if you are going to do that. But running from USB is generally not a great option.

              If you really have to, though, I would install as UFS and enable RAM disks to minimise drive writes.

              Steve

              • mr-brunes @stephenw10

                Thanks for all the replies - useful info!

                My DR terminology stems from working on enterprise systems planning, i.e. what the recovery plan is when (not if) a single point of failure or non-fault-tolerant component fails. It's especially relevant when kit is installed at remote (dark) sites where there is no one with the skills to do a reinstall. (This is effectively the situation when I'm away!) The reinstall entails obtaining the image (not even possible with Plus on non-Netgate hardware), flashing it onto a USB stick, reinstalling pfSense onto another boot device, booting, and then restoring a manual config backup (since the ACB restore doesn't appear to be available until pfSense is back up and running). That's a fair number of steps, made harder still by potentially having no internet access!

                That is why I thought cloning the boot device would be much easier.
                I looked at the Boot Environments / Clone boot environment feature, but it seems to be concerned with taking a snapshot of something related to the boot process rather than cloning the boot device.
                Other systems can take a snapshot image of the boot media and store it remotely, which is very handy.
                I guess with pfSense one would have to clone the boot device while the system was down.

                In terms of monitoring the boot device for errors, the SMART status tools don't know how to interpret the USB bus device:
                [code]
                smartctl 7.3 2022-02-28 r5338 [FreeBSD 14.0-CURRENT amd64] (local build)
                Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

                /dev/da0: Unknown USB bridge [0x0781:0x5583 (0x100)]
                Please specify device type with the -d option.

                Use smartctl -h to get a usage summary
                [/code]
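The message above asks for an explicit device type. A minimal sketch that tries a few `-d` values smartctl accepts for USB bridges might look like this. The helper names and the choice of candidates are assumptions, and an exit status of 0 only means smartctl could talk to the device with that type; many plain flash sticks expose no SMART data at all, so this may well come back empty-handed.

```python
import subprocess

# Device types to try, in order. Each is a value smartctl accepts for -d.
CANDIDATE_TYPES = ["sat", "scsi", "usbjmicron", "usbsunplus", "usbcypress"]


def smartctl_commands(device, types=CANDIDATE_TYPES):
    """Build one 'smartctl -d TYPE -i DEVICE' command line per candidate type."""
    return [["smartctl", "-d", dev_type, "-i", device] for dev_type in types]


def probe(device):
    """Return the first device type for which smartctl exits with status 0.

    Returns None if no candidate works. Raises FileNotFoundError if smartctl
    is not installed.
    """
    for cmd in smartctl_commands(device):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return cmd[2]
    return None
```

If `probe("/dev/da0")` returns a type, running `smartctl -d <type> -a /dev/da0` by hand shows whether any health attributes are actually reported.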
                My Linux is pretty rusty, so I will have to do some digging on this aspect... I'm looking for an equivalent of Windows' 'chkdsk /r' (scan the drive for bad sectors and recover them if possible).
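A rough, read-only stand-in for the scanning half of 'chkdsk /r' is to read the whole device sequentially and note where reads fail. The sketch below does that; it repairs nothing, and the function name `scan_readable`, the block size, and the error cap are illustrative assumptions. It needs a Unix-like system (it uses `os.pread`) and permission to read the raw device.

```python
import os


def scan_readable(path, block_size=1024 * 1024, max_bad=64):
    """Read path from start to end and report unreadable regions.

    Returns (bytes_read_ok, bad_offsets), where bad_offsets lists the byte
    offsets of blocks whose read raised an I/O error. The scan stops early
    once max_bad errors have been seen, so a vanished device cannot make it
    loop forever.
    """
    ok_bytes = 0
    bad_offsets = []
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while len(bad_offsets) < max_bad:
            try:
                data = os.pread(fd, block_size, offset)
            except OSError:
                # Unreadable block: record it and skip ahead.
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not data:
                break  # end of device or file
            ok_bytes += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return ok_bytes, bad_offsets
```

Something like `scan_readable("/dev/da0")` run periodically would flag blocks the stick can no longer return. A clean pass is weak evidence only: flash media can still fail on the next write, so this complements rather than replaces a tested image and config backup.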

                As for using USB as a boot device, at least it is easily swappable, if not ideal for write-heavy environments, as wear levelling is not available. On the latter, I've not seen a write-up of how pfSense uses the boot media, or whether it runs mostly in RAM (notwithstanding the package differences). I will have to check out the UFS and RAM disk options, as that sounds interesting given the default filesystem is ZFS.
                What is the recommended device (and why) for simple setups? I couldn't see anything in the H/W part of the docs.

                • SteveITS Galactic Empire @mr-brunes

                  @mr-brunes Try this and related pages:

                  https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-check.html

                  ZFS supports RAID:
                  https://docs.netgate.com/pfsense/en/latest/install/install-zfs.html

                  Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                  When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                  Upvote 👍 helpful posts!

                  • mr-brunes @SteveITS

                    @SteveITS that filesystem and disk troubleshooting page has lots of very useful info. Shame it's buried there!

                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.