Netgate Discussion Forum

    zfs zpool status DEGRADED - correct procedure to replace the failed disk ?

    Hardware
      Alactus @mer

      @mer It's a Dell R210 1U server and there aren't any spare SATA ports (I do have spare disks sitting on the shelf).

      Because it's a 1U box, it will have to be powered off to get the faulted drive out of the case. It's sad that Dell never put 2.5" drive sleds in this design; I knew I should have nabbed the HP 1U server rather than this :D

      I did try looking for answers on this, but they all point to systems that already have spare disks installed. I really wish Netgate had an option in the UI to replace a failed disk. I'll go poke at that Google search you suggested.

      Thanks

      Drew

        Alactus @mer

        @mer

        So the first hit on Google looks to be just the ticket:

        https://www.adminbyaccident.com/freebsd/how-to-freebsd/how-to-replace-a-disk-on-a-zfs-mirror-pool/

        It needs a slight tweak for the swap size, but it seems to cover the task. Thanks.

          Alactus

          @alactus said in zfs zpool status DEGRADED - correct procedure to replace the failed disk ?:

          https://www.adminbyaccident.com/freebsd/how-to-freebsd/how-to-replace-a-disk-on-a-zfs-mirror-pool/

          So I replaced the disk and followed the guide, but it's not right: I ended up with two ada1p3 entries, and for some reason the zpool status in pfSense doesn't give the option to remove the failed one.

          pool: zroot
          state: DEGRADED
          status: One or more devices could not be opened. Sufficient replicas exist for
          the pool to continue functioning in a degraded state.
          action: Attach the missing device and online it using 'zpool online'.
          see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
          scan: resilvered 3.83G in 00:01:58 with 0 errors on Mon Apr 17 16:00:44 2023
          config:

          NAME        STATE     READ WRITE CKSUM
          zroot       DEGRADED     0     0     0
            mirror-0  DEGRADED     0     0     0
              ada0p3  ONLINE       0     0     0
              ada1p3  UNAVAIL      0     0     0  cannot open
              ada1p3  ONLINE       0     0     0
          

          errors: No known data errors

          I am tempted to just back up the config and install fresh.

            Alactus @Alactus

            So I ran:

            zpool detach zroot ada1p3

            This is odd; I would have thought it would give you the GUID of the disk to remove (as the guide seems to suggest). It feels like the ZFS support in pfSense is a little half-baked?

            pool: zroot
            state: ONLINE
            status: Some supported and requested features are not enabled on the pool.
            The pool can still be used, but some features are unavailable.
            action: Enable all features using 'zpool upgrade'. Once this is done,
            the pool may no longer be accessible by software that does not support
            the features. See zpool-features(7) for details.
            scan: resilvered 3.83G in 00:01:58 with 0 errors on Mon Apr 17 16:00:44 2023
            config:

            NAME        STATE     READ WRITE CKSUM
            zroot       ONLINE       0     0     0
              mirror-0  ONLINE       0     0     0
                ada0p3  ONLINE       0     0     0
                ada1p3  ONLINE       0     0     0
            

            errors: No known data errors
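On the GUID point: OpenZFS documents a `-g` flag for `zpool status` that prints vdev GUIDs instead of device names, which is one way to tell two identically named entries apart. The sketch below is a hypothetical helper (the name `list_failed_vdevs` is made up here, and it has not been run on pfSense) that picks the failed rows out of status text piped to it. It assumes a POSIX shell such as /bin/sh rather than a tcsh prompt.

```shell
# Read `zpool status` (or `zpool status -g`) text on stdin and print
# "NAME STATE" for every config row that is in a failed state.
list_failed_vdevs() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # header row of the config table
        in_config && NF == 0          { in_config = 0 }        # blank line ends the table
        in_config && ($2 == "UNAVAIL" || $2 == "FAULTED" || $2 == "REMOVED" || $2 == "OFFLINE") {
            print $1, $2
        }
    '
}
```

Piping `zpool status -g zroot` into it should print the GUID of the UNAVAIL entry, which could then be passed to `zpool detach zroot <guid>` instead of the ambiguous name.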

            I think the lesson here is perhaps to back up the config and reinstall.

              stephenw10 Netgate Administrator

              The 'features not enabled' message is just an artifact of how the installer creates the ZFS pool and should not be an issue here. You can run the upgrade to enable them if you want.

              That guide seems to have additional steps, though they explain why.
              In the past I have used this: https://farrokhi.net/posts/2020/05/replacing-a-faulty-disk-in-zfs/

              We should have our own docs for that though, I'll open a request.

                Alactus @stephenw10

                @stephenw10

                The thing that threw me was that I had just created ada1p* and attached ada1p3 to the pool. Detaching the same 'device' (even though it isn't the same device any more) didn't feel like the right command to run.

                I guess ZFS support in pfSense doesn't expose the raw names for the devices? Different feature sets, perhaps?

                Maybe the correct procedure would have been to use zpool replace, but with nothing official from Netgate I was left to experiment. It worked out in the end.
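For reference, a minimal sketch of the `zpool replace` route mentioned here. According to the OpenZFS manual, `zpool replace pool device` with no new device named replaces the device with whatever now sits at the same path, which fits a disk swapped into the same slot. Nobody in this thread ran it, so treat it as an assumption; the `run` and `replace_same_slot` helpers are made-up names, and the command is only printed unless `APPLY=1` is set.

```shell
# run: print the command unless APPLY=1 is set, in which case execute it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# replace_same_slot POOL DEVICE
# Ask ZFS to replace DEVICE with the new partition at the same device path.
replace_same_slot() {
    run zpool replace "$1" "$2"
}
```

For the pool in this thread that would be `replace_same_slot zroot ada1p3`, run after the new disk has been partitioned.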

                There are slight differences between the two partition tables, but I don't expect them to cause an issue (worst case, I install fresh):

                =>       40  976773088  ada0  GPT  (466G)
                         40       1024     1  freebsd-boot  (512K)
                       1064        984        - free -  (492K)
                       2048   67108864     2  freebsd-swap  (32G)
                   67110912  909662208     3  freebsd-zfs  (434G)
                  976773120          8        - free -  (4.0K)

                =>       40  976773088  ada1  GPT  (466G)
                         40       1024     1  freebsd-boot  (512K)
                       1064   67108864     2  freebsd-swap  (32G)
                   67109928  909663200     3  freebsd-zfs  (434G)
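What matters for the mirror is that the new ada1p3 is not smaller than ada0p3. A quick check of the numbers in the two listings, assuming 512-byte sectors (the variable and function names are just for illustration):

```shell
# Sector counts copied from the two gpart listings (512-byte sectors assumed).
old_zfs_sectors=909662208   # ada0p3
new_zfs_sectors=909663200   # ada1p3

# sectors_to_kib: convert a count of 512-byte sectors to KiB.
sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

diff_sectors=$(( new_zfs_sectors - old_zfs_sectors ))
echo "ada1p3 is larger by $diff_sectors sectors ($(sectors_to_kib "$diff_sectors") KiB)"
```

The difference comes to 992 sectors (496 KiB), which matches the two free gaps on ada0 (984 + 8 sectors) that the new layout on ada1 does not have.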

                  stephenw10 Netgate Administrator

                  Yeah, I wouldn't expect the difference in unformatted space to make a difference.

                    mer @Alactus

                    @alactus You've run into why some folk (myself included) prefer to partition devices for ZFS instead of raw devices. "One man's 256GB device is not the same size as another's 256GB device".

                    In general a ZFS mirror is only as big as its smallest device. If you have a 1TB device and add a 3TB device as a mirror, ZFS says "you have a 1TB mirror".

                    The cool thing is:
                    I have a 1TB device, I add a 3TB device to make a mirror. I have a 1TB mirror, wait for it to resilver, then add another 3TB device to the mirror, wait for it to resilver, then remove the 1TB device from the mirror. Now I have a 3TB mirror.

                    Yes, there are a couple of steps left out, but that is how you can grow a mirror. I've done it and it's pretty neat.
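A rough sketch of that grow-the-mirror sequence, written as a function that only prints the commands so nothing touches a pool by accident. Which steps were left out is not stated in the post; turning on the `autoexpand` pool property is my assumption, and some setups may need `zpool online -e` on the new devices instead. The function name and the device names in the usage line are hypothetical.

```shell
# grow_mirror_plan POOL OLD NEW_A NEW_B
# Prints, but does not run, the commands for growing a pool that starts on
# the single small device OLD into a mirror of the two larger devices.
grow_mirror_plan() {
    pool=$1
    old=$2
    new_a=$3
    new_b=$4
    # Let the pool grow on its own once every member is large enough.
    echo "zpool set autoexpand=on $pool"
    # Mirror the small device onto the first large one, then wait for the resilver.
    echo "zpool attach $pool $old $new_a"
    # Add the second large device as a third side of the mirror, then wait again.
    echo "zpool attach $pool $old $new_b"
    # Drop the small device so the mirror is limited only by the large ones.
    echo "zpool detach $pool $old"
}
```

For example, `grow_mirror_plan tank da0 da1 da2` prints the four commands in order; each attach should be followed by a check of `zpool status` until the resilver has finished.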

                      Alactus @mer

                      Having a poke round in

                      /var/log/bsdinstall_log

                      [23.01-RELEASE][admin@pfSense.localdomain]/var/log: grep "freebsd-boot" bsdinstall_log
                      DEBUG: zfs_create_diskpart: gpart add -a 4k -l gptboot0 -t freebsd-boot -s 512k "ada0"
                      DEBUG: zfs_create_diskpart: gpart add -a 4k -l gptboot1 -t freebsd-boot -s 512k "ada1"
                      [23.01-RELEASE][admin@pfSense.localdomain]/var/log: grep "freebsd-swap" bsdinstall_log
                      DEBUG: zfs_create_diskpart: gpart add -a 1m -l swap0 -t freebsd-swap -s 34359738368b "ada0"
                      DEBUG: zfs_create_diskpart: gpart add -a 1m -l swap1 -t freebsd-swap -s 34359738368b "ada1"
                      [23.01-RELEASE][admin@pfSense.localdomain]/var/log: grep "freebsd-zfs" bsdinstall_log
                      DEBUG: zfs_create_diskpart: gpart add -a 1m -l zfs0 -t freebsd-zfs "ada0"
                      DEBUG: zfs_create_diskpart: gpart add -a 1m -l zfs1 -t freebsd-zfs "ada1"

                      This gives me the exact commands that were run when I set this up (I only found this trick while looking all this up).
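The three greps can be folded into one small helper. This is only a sketch: it assumes the log lines look exactly like the ones quoted above (a `zfs_create_diskpart:` prefix and the disk name in double quotes at the end), and `partition_commands` is a made-up name.

```shell
# partition_commands LOGFILE DISK
# Print the `gpart add ...` commands the installer logged for DISK,
# with the "DEBUG: zfs_create_diskpart: " prefix stripped.
partition_commands() {
    grep 'zfs_create_diskpart: gpart add' "$1" \
        | grep "\"$2\"" \
        | sed 's/^.*zfs_create_diskpart: //'
}
```

Running `partition_commands /var/log/bsdinstall_log ada1` should list the boot, swap and zfs partition commands for that disk in the order the installer ran them.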

                      Realistically, this should all sit behind a menu in the GUI.

                        Alactus @Alactus

                        @alactus

                        So here is a mini write-up of the steps above for future reference (so it's all in one spot).

                        Assumptions

                        pfSense is set up with 2 disks in a ZFS mirror, ada0 and ada1 (as seen from the WebUI)

                        One of the disks in the mirror has failed; you can see this if you have the WebUI widget enabled to monitor the disks

                        You have backed up your config and have a USB key with the install image ready to go in case of issues

                        You have physically removed the failed disk from the system and replaced it with a new disk of the same size or bigger

                        Enable SSH access to the firewall via the WebUI, then use your favourite client to SSH into the firewall and get to the root shell

                        zpool status

                        This shows the status of the zpool mirror; in my case it said the pool was degraded because of one failed disk.

                        Create the partition table on the new disk, ada1 (change this to the actual disk you are replacing in the mirror):

                        gpart create -s gpt ada1

                        The sizes in the following commands are the ones that were used when I installed pfSense on this hardware. To check the exact sizes used on your system, look in the install log (bsdinstall_log) in /var/log/.

                        Example:

                        [23.01-RELEASE][admin@pfSense.localdomain]/var/log: grep "freebsd-boot" bsdinstall_log
                        DEBUG: zfs_create_diskpart: gpart add -a 4k -l gptboot0 -t freebsd-boot -s 512k "ada0"
                        DEBUG: zfs_create_diskpart: gpart add -a 4k -l gptboot1 -t freebsd-boot -s 512k "ada1"
                        [23.01-RELEASE][admin@pfSense.localdomain]/var/log: grep "freebsd-swap" bsdinstall_log
                        DEBUG: zfs_create_diskpart: gpart add -a 1m -l swap0 -t freebsd-swap -s 34359738368b "ada0"
                        DEBUG: zfs_create_diskpart: gpart add -a 1m -l swap1 -t freebsd-swap -s 34359738368b "ada1"
                        [23.01-RELEASE][admin@pfSense.localdomain]/var/log: grep "freebsd-zfs" bsdinstall_log
                        DEBUG: zfs_create_diskpart: gpart add -a 1m -l zfs0 -t freebsd-zfs "ada0"
                        DEBUG: zfs_create_diskpart: gpart add -a 1m -l zfs1 -t freebsd-zfs "ada1"

                        Knowing the sizes, you can continue (swap in the commands found in your own log if your disk or layout is different).

                        Create boot partition

                        gpart add -a 4k -l gptboot1 -t freebsd-boot -s 512k ada1

                        Create swap partition

                        gpart add -a 1m -l swap1 -t freebsd-swap -s 34359738368b ada1

                        Create the partition that will actually be added to the zfs mirror

                        gpart add -a 1m -l zfs1 -t freebsd-zfs ada1

                        In each case ada1 was the disk that had failed in my system; change it to the one that failed in yours.

                        We can now add this disk (ada1) to the pool.

                        zpool attach zroot ada0p3 ada1p3

                        At this point (if everything is OK) all the data will be copied from ada0p3 to ada1p3 through a process called 'resilvering'.

                        zpool status will show this.
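To avoid eyeballing the output, the `scan:` line can be checked with a small filter. This is a sketch only: the "resilvered" wording is taken from the status output earlier in the thread, "resilver in progress" is the wording OpenZFS normally prints while the copy is running, and `resilver_state` is a made-up name. It reads status text on stdin and assumes a single pool.

```shell
# resilver_state: read `zpool status` text on stdin and print
# "in-progress", "done", or "unknown" based on the scan: line.
resilver_state() {
    awk '
        $1 == "scan:" && /resilver in progress/ { state = "in-progress" }
        $1 == "scan:" && $2 == "resilvered"     { state = "done" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

Usage would be `zpool status zroot | resilver_state`, repeated until it reports done before moving on to the boot code step.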

                        Once the resilver is done, you need to add the boot code to the new disk in this ZFS boot mirror:

                        gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

                        That is the command I had to run for my setup.

                        -i 1 is the partition we are going to add boot code to and ada1 is the disk we are adding it to.

                        To check which partition is the boot partition (it should be 1 on pfSense, but just for your own information), run gpart show, which lists all the disks and the partitions on each disk.
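A small sketch for pulling the boot partition index out of that listing instead of reading it by eye. It assumes the `gpart show` layout seen earlier in this thread (a `=>` header row per disk, then one row per partition); `boot_partition_index` is a made-up name.

```shell
# boot_partition_index DISK
# Read `gpart show` text on stdin and print the index of DISK's
# freebsd-boot partition (the value to pass to `gpart bootcode -i`).
boot_partition_index() {
    awk -v disk="$1" '
        $1 == "=>" { current = $4 }   # header row: => start size NAME scheme (size)
        current == disk && $4 == "freebsd-boot" { print $3; exit }
    '
}
```

For example, `gpart show | boot_partition_index ada1` should print 1 for the layout shown in this thread.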

                        Once the resilver is done, the pool might still show an error because the failed disk is still listed. In my case I had to issue the command:

                        zpool detach zroot ada1p3

                        This seems counterintuitive because you have just attached ada1p3, but I suspect ZFS knows the original disk has failed and gone, so once the command is run it removes the failed entry and the pool health returns to normal.

                        Is this the best way of doing it? Possibly not, but it worked for this setup and returned the pool to normal for me. Adjust the above commands to fit your own setup.

                        And if in doubt, if you have a copy of your config and a bootable pfSense install stick, just install the firewall again and recover your config that way.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.