Netgate Discussion Forum

    Replace failed drive with ZFS doesn't work like a true RAID-1 system

    Problems Installing or Upgrading pfSense Software
    cfapress

      ... maybe I'm doing something wrong ...

      For years we've used pfSense and the GEOM mirror when it was available. On the 2.4 branch we started using ZFS mirroring which looked great.

      Yesterday I attempted to replace a failed drive in the ZFS mirror. My steps were:

      Starting with a powered down pfSense machine:

      1. Physically remove the failed drive
      2. Install the replacement drive
      3. Power up the machine and wait for it to boot from the still functional drive
      4. Reach a console and run the "zpool replace" command to resilver the ZFS array:
      zpool replace zroot /dev/ada1p3 /dev/ada1
      

      That all worked well enough ...

      zpool status
        pool: zroot
       state: ONLINE
        scan: resilvered 849M in 0h1m with 0 errors on Wed Jan  8 08:41:30 2020
      config:
      
             NAME        STATE     READ WRITE CKSUM
             zroot       ONLINE       0     0     0
                mirror-0  ONLINE       0     0     0
                  ada0p3  ONLINE       0     0     0
                  ada1    ONLINE       0     0     0
      
      errors: No known data errors
      

      except when I check the partitions on the Good (ada0) drive compared with the New (ada1) drive.

       gpart show
      =>       40  488397088  ada0  GPT  (233G)
               40       1024     1  freebsd-boot  (512K)
             1064        984        - free -  (492K)
             2048    4194304     2  freebsd-swap  (2.0G)
          4196352  484200448     3  freebsd-zfs  (231G)
        488396800        328        - free -  (164K)
      
      =>       34  488397101  ada1  GPT  (233G)
               34    1023966        - free -  (500M)
          1024000     204800     2  efi  (100M)
          1228800  487168335        - free -  (232G)
      
      

      My expectation is when I resilvered the ZFS mirror that I would have two identical disks - like with GEOM mirroring.

      That's not the case.

      Now, I've read the sticky post about manually setting up partitions and then performing a resilver --- and perhaps that's the correct way to replace a failed drive in a ZFS mirrored system.

      I'm looking for thoughts on what I may be doing wrong. Or perhaps I should reset my expectations: having a mirrored ZFS system doesn't mean I can swap drives the way it was done with GEOM, and there's a lot more work involved in replacing a failed drive starting in pfSense 2.4.

      And, I'm curious to know how the ZFS mirror is set up during the initial install of pfSense. Perhaps knowing those steps would reveal a clear answer to how we should be replacing failed drives in v2.4 with ZFS mirroring.

      Thanks in advance, Jason

      jimp (Rebel Alliance Developer Netgate)

        @cfapress said in Replace failed drive with ZFS doesn't work like a true RAID-1 system:

        /dev/ada1p3 /dev/ada1

        You replaced one slice with an entire disk, so it did what you asked (which wasn't right). I think you could have just run zpool replace and it would have figured out that it should redo the missing disk.
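To make the distinction concrete, here is a sketch of the two forms (device names taken from this thread; the second form assumes the replacement disk has already been partitioned to match the original):

```shell
# What was run: the old partition vdev is replaced by the *whole* new disk,
# so ZFS labels the raw device instead of a freebsd-zfs partition:
zpool replace zroot /dev/ada1p3 /dev/ada1

# What was likely intended: replace the old partition vdev with a matching
# partition on the new disk (requires partitioning ada1 like ada0 first):
zpool replace zroot /dev/ada1p3 /dev/ada1p3
```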


        cfapress @jimp

          I did try 'zpool replace' but it wouldn't accept that command without the name of the pool, old device and new device.

          zpool replace
          missing pool name argument
          usage:
                  replace [-f] <pool> <device> [new-device]
          

          I also noted that the ZFS pool is only mirroring a partition on the good (ada0) drive. It's not mirroring the full disk as I was expecting (hoping, really, since that's what GEOM did).

          Is your suggestion to follow the instructions in the forum's sticky thread regarding ZFS when replacing a failed drive - manually partition and then resilver pointing at the explicit partition to be used?

          https://forum.netgate.com/topic/112490/how-to-2-4-0-zfs-install-ram-disk-hot-spare-snapshot-resilver-root-drive

          My experience with ZFS is limited to FreeNAS. They have a GUI that takes care of resilvering boot drives and perhaps they perform some partitioning prior to resilvering too.

          I'm having trouble locating information regarding ZFS and replacing failed drives with pfSense. The documentation is light in terms of ZFS information:
          https://docs.netgate.com/pfsense/en/latest/book/install/perform-install.html

          jimp (Rebel Alliance Developer Netgate)

            I haven't had to replace one myself yet (or simulated a failure in a VM), but I thought you should be able to just say zpool replace ada1 in your case and it would do the right thing. The "new device" argument should be optional, since the name may not change if you replace the disk in exactly the same way.
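For reference, the single-argument form does exist, but the pool name is always required, which is what the usage error in the previous post showed. Something like this (a sketch, assuming the new disk already carries a matching ada1p3 partition):

```shell
# When the old and new devices share the same name/path, "new-device" can
# be omitted; the pool name cannot be.
zpool replace zroot ada1p3
```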

            The docs for ZFS are still sparse (if any exist) because it's new and considered experimental.


            cfapress

              After additional trials I can confirm that replacing a ZFS mirrored drive is non-trivial.

              My pfSense boxes are old decommissioned Windows workstations. Too old to be useful for Windows, but perfect for pfSense routers.

              Here are the steps I took to replace the ZFS mirrored drive.

              1. The new drive must be completely clean. If it was previously partitioned, you need to clear that away prior to installing it in the pfSense machine.

              In my case I'm using old HDDs from decommissioned workstations. They have Windows partitions, in particular an EFI partition that's impossible to destroy with "gpart". I had to use the Windows "diskpart" tool to remove that partition's protection and then destroy it before continuing to Step 2 below.

              2. Plan to manually add all the partitions to the new drive. Use "gpart" to view the partitions on the good drive, then partition the new drive identically. An example, taken from a prior forum post:
              # gpart create -s gpt da2
              # gpart add -a 4k -s 512k -t freebsd-boot -l gptboot2 da2 ###This creates p1: 4k alignment, 512k size, type freebsd-boot, label gptboot2, on drive da2
              # gpart add -b 2048 -s 8384512 -t freebsd-zfs -l zfs2 da2 ###This creates p2: beginning at block 2048, with a size of 8384512 blocks
              # gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2 ###This writes the bootcode to p1 of your hot spare
              
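As an alternative to entering each partition by hand, gpart can clone the table from the healthy disk (a sketch, assuming ada0 is the good drive and ada1 the blank replacement; double-check device names before running):

```shell
# Copy the GPT partition table from the good disk onto the replacement
# (-F destroys any existing scheme on the target), then reinstall the
# boot code on the new disk's freebsd-boot partition:
gpart backup ada0 | gpart restore -F ada1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
```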
              3. Once partitioned, you must "zpool remove" the old dead drive's partition. This breaks the ZFS mirror.

              Example:

              zpool remove zroot /dev/ada1p3
              
              4. Then "zpool attach" the new good drive's partition. This creates a new ZFS mirror.

              Example:

              zpool attach zroot /dev/ada0p3 /dev/ada1p3
              
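After the attach, the resilver can be verified (a sketch; pool and device names as above):

```shell
# mirror-0 should list both ada0p3 and ada1p3 as ONLINE, and the scan
# line should report the resilver completing with no errors.
zpool status zroot
```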

              It's important to note that the ZFS mirror is only mirroring a portion of the hard drive. The boot and swap partitions are not mirrored. It's not a true RAID-1 situation as I expected, like with GEOM.

              And, I suggest you read about how to use zpool before altering any production systems. Everything I've shown above came from testing with spare hard drives in an old desktop computer.
              REFERENCE: https://www.freebsd.org/cgi/man.cgi?zpool(8)

              Sergei_Shablovsky

                Try thinking of it this way: just buy a used IBM-branded RAID card on eBay (or LSI, Dell, Adaptec, MSI, SiliconImage), ideally PCI-X 2 or 2.1, install two HDDs (I suggest the bullet-proof Ultrastar 7K3000 3TB 3.5-inch enterprise SATA drive, model HUA723030ALA640), configure a mirror, and sleep well for another 5 years. :)

                You didn't write exactly where the failure happened (in an appliance or a desktop), so I assume you have a desktop system.

                Better to spend $50 on a card and sleep well for 5 years rather than spending hours with a failed drive and dancing around ZFS.


                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.