pfSense 2.4 ZFS File System
-
I have an SG-2440 with a 128GB mSATA SSD and have been trying to install 2.4 with the ZFS file system. I selected Auto ZFS with a non-redundant stripe. It will not proceed, saying not enough drives are selected. How do I get ZFS installed?
-
After you select stripe, it should take you to a screen listing your disks. You have to select your disk: press the spacebar while your disk is highlighted, and an asterisk will appear between the brackets, "[ * ]". Then press Enter on OK to proceed. If you just press Enter without selecting a disk, you are trying to install onto 0 disks when there is a 1-disk minimum :).
-
Thanks, I figured it would be something simple.
-
I'm trying to figure out how to successfully resilver my pool and reboot after losing a disk in the boot array.
I'm testing it in a VM: I shut down the VM, remove a drive from it, reboot, and resilver. Resilvering always completes successfully.
I set it up as follows:
# gpart create -s gpt adaX
# gpart add -a 4k -s 512k -t freebsd-boot -l gptbootX adaX
# gpart add -t freebsd-zfs -l zfsX adaX
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 adaX
When I go to reboot I get these errors:
ZFS: can only boot from disk, mirror, raidz1, raidz2 and raidz3 vdevs
ZFS: can only boot from disk, mirror, raidz1, raidz2 and raidz3 vdevs
ZFS: can only boot from disk, mirror, raidz1, raidz2 and raidz3 vdevs
ZFS: can only boot from disk, mirror, raidz1, raidz2 and raidz3 vdevs
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool pfsense
gptzfsboot: failed to mount default pool pfsense
When I run gpart show, I see two partitions on each drive, and under each partition is a second line that says "- free - (xxxK)".
On the spare drive that I'm resilvering onto, there is no "- free - (xxxK)" line; I've attached a screenshot for clarification. So my question is: what am I doing wrong, and how can I get ZFS on root to boot after resilvering?
-
On the third line you're adding the freebsd-zfs partition without any alignment requirement, so gpart will happily slap it right after the freebsd-boot partition, and that's where the difference comes from. You can use gpart add -b 2048 -t freebsd-zfs -l zfsX adaX instead to make it identical to the other disks.
I don't think that is the reason for the boot failure, though. Try rewriting the bootblocks on all the other disks with 'gpart bootcode' too, to see if that makes any difference.
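To make the alignment point concrete, here is a small POSIX sh sketch that reports whether a partition's start sector is 4 KiB- and 1 MiB-aligned. `check_align` is a hypothetical helper, not a gpart feature. It assumes 512-byte logical sectors and that `-a 4k` placed the boot partition at sector 40 (the first 4 KiB boundary after the GPT header on such a disk), so an unaligned p2 starts at 40 + 1024 = 1064, whereas `-b 2048` starts it at exactly 1 MiB.

```sh
#!/bin/sh
# check_align START_SECTOR [SECTOR_SIZE]
# Hypothetical helper: reports whether a partition start is aligned to
# 4 KiB and to 1 MiB. SECTOR_SIZE defaults to 512 bytes (an assumption).
check_align() {
    start=$1
    secsize=${2:-512}
    offset=$((start * secsize))   # byte offset of the partition start
    if [ $((offset % 4096)) -eq 0 ]; then
        a4k="4k-aligned"
    else
        a4k="not 4k-aligned"
    fi
    if [ $((offset % 1048576)) -eq 0 ]; then
        a1m="1MiB-aligned"
    else
        a1m="not 1MiB-aligned"
    fi
    echo "sector $start (byte $offset): $a4k, $a1m"
}
```

For example, `check_align 1064` reports 4k-aligned but not 1MiB-aligned, and `check_align 2048` reports both. This only explains why the layouts differ; it is probably not the cause of the boot failure.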
-
When using a single drive how do you tell ZFS to keep 2 copies?
-
You have to do this right after the pool creation, before any datasets are created or files are written on it, because the setting only applies to data written after it is set:
zfs set copies=2 zpool
(where zpool is the name of your pool). This property is a dataset property, but it is inherited by any child datasets, so it applies to them as well.
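A minimal POSIX sh sketch of that step. `set_copies` and the `DRY_RUN` switch are hypothetical helpers: by default the script only prints the `zfs` commands, and running it with `DRY_RUN=0` executes them. The `zfs get -r` line lists the property for every dataset, so you can check that the SOURCE column reads `local` on the pool's root dataset and `inherited from` it on the children.

```sh
#!/bin/sh
# Dry run by default: commands are printed, not run.
# Set DRY_RUN=0 in the environment to actually execute them.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# set_copies POOL [N]: set copies=N (default 2) on the pool's root dataset,
# then list the property for all datasets to confirm the inheritance.
set_copies() {
    pool=$1
    n=${2:-2}
    if [ -z "$pool" ]; then
        echo "usage: set_copies POOL [N]" >&2
        return 1
    fi
    run zfs set "copies=$n" "$pool"
    run zfs get -r -o name,value,source copies "$pool"
}
```

For example, `set_copies tank` previews the two commands for a pool called `tank` (a stand-in for your own pool name).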
-
Thank you, I added the offset to p2 and that matches it up for the first part of the drive. But on the second part of the drive it still has no - free - section, both before and after the resilver. Obviously this stuff is all new to me, but it seems to me like this should match the others after the resilver? Or do I need to simply tell gpart where to stop p2 when creating it in order for them to match? I've attached another screenshot.
zpool status shows a successful resilver every time.
I also tried writing the bootcode to all drives but I'm still getting the same error.
Any more ideas as to why it's failing to boot after resilver?
Once I get this all figured out, I'm planning on writing up a quick how-to thread covering installing pfSense 2.4 to ZFS and setting up hot spares, zfsd, autoreplace, and resilvering, so that anyone can set it up and use it easily and effectively. Everything is going great until I resilver a boot drive and reboot.
-
Resilver is not going to alter the partition table; it's a ZFS-internal function that only syncs the pool contents between the redundant components of the pool. In this case, the components are partitions.
There is still a glaring difference in the sizes of the freebsd-zfs partitions: on ada0 through ada2 it's 8384512 sectors, but on ada3 it's somehow 8386520 sectors. This would have done the job if you had used it instead of what I gave you earlier:
gpart add -b 2048 -s 8384512 -t freebsd-zfs ada3
Neither number is a full 4 GB, though, and that probably explains the discrepancy. The first three disks were partitioned by the pfSense installer, if I guess right?
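Rather than copying the numbers by hand, you could read them off a disk the installer partitioned. This POSIX sh sketch (the function names are hypothetical) takes `gpart show` output for a reference disk on stdin, picks the row whose type column is `freebsd-zfs`, and prints a matching `gpart add` line for the new disk. It assumes the plain `gpart show DISK` column layout (start, size, index, type). For scale, a full 4 GiB is 8388608 sectors of 512 bytes, so both sizes above fall short of it.

```sh
#!/bin/sh
# zfs_geometry: read `gpart show DISK` output on stdin and print the start
# sector and size (in sectors) of the first freebsd-zfs partition.
# Exits non-zero if no such partition is found.
zfs_geometry() {
    awk '$4 == "freebsd-zfs" { print $1, $2; found = 1; exit }
         END { if (!found) exit 1 }'
}

# print_zfs_add SPARE INDEX < gpart-show-output
# Prints (does not run) a gpart add command that reproduces the reference
# disk's freebsd-zfs partition, same start and size, on SPARE.
print_zfs_add() {
    spare=$1
    idx=$2
    if [ -z "$spare" ] || [ -z "$idx" ]; then
        echo "usage: print_zfs_add SPARE INDEX < gpart-show-output" >&2
        return 1
    fi
    geom=$(zfs_geometry) || {
        echo "no freebsd-zfs partition in input" >&2
        return 1
    }
    set -- $geom   # intentional word splitting: $1 = start, $2 = size
    echo "gpart add -b $1 -s $2 -t freebsd-zfs -l zfs$idx $spare"
}
```

Usage: `gpart show ada0 | print_zfs_add ada3 3` prints the command to run for ada3, so you can compare it with what you typed before running anything.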
Still no idea about the boot error though.
Edit: I totally missed that you have a spare on the pool, da4 (based on your earlier posting on the thread). Maybe you need to remove it because the ZFS bootcode might be probing it as well on boot and doesn't like it for some reason.
-
Thanks for the info, I'll partition it that way.
The pool was automatically partitioned by the pfSense installer.
The spare is what's being resilvered onto, so when I reboot, the spare is in use in the pool. What I'm trying to accomplish is to assign a hot spare, turn autoreplace=on for the pool, set zfsd to start on boot, and install the boot code to p1 of the hot spare, so that (in theory?) if the system loses a disk, it will automatically resilver onto p2 of the hot spare and reboot properly without any further intervention.
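The pool-level part of that plan might look like this POSIX sh sketch (hypothetical `enable_autoreplace` function; it only prints the commands unless `DRY_RUN=0`). `zpool set autoreplace=on`, `sysrc` and `service` are standard FreeBSD commands; whether pfSense's own boot scripts honor `zfsd_enable` in `/etc/rc.conf` is an assumption you would need to verify on your install.

```sh
#!/bin/sh
# Dry run by default: commands are printed, not run.
# Set DRY_RUN=0 in the environment to actually execute them.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# enable_autoreplace POOL
enable_autoreplace() {
    pool=$1
    if [ -z "$pool" ]; then
        echo "usage: enable_autoreplace POOL" >&2
        return 1
    fi
    # autoreplace=on: a new device found in the same physical location as a
    # former pool member is formatted and used as its replacement.
    run zpool set autoreplace=on "$pool"
    # zfsd reacts to device faults on FreeBSD, including activating hot spares.
    # Persisting it via rc.conf is an assumption for pfSense.
    run sysrc zfsd_enable=YES
    run service zfsd start
}
```

The spare itself still has to be partitioned, given boot code, and added to the pool separately.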
-
I wouldn't use spares on a boot pool. It's not worth the effort, and you might run into complications just like this one, because at boot time the ZFS boot code wants to probe every device in the pool. If you still want to use spares on a boot pool, the spare must be partitioned properly beforehand and must have the ZFS boot blocks just like the other disks, in case it is selected as the boot device.
Hot spares are basically a feature for very large data pools with serious availability concerns when a disk breaks and has to be replaced. A firewall/router is hardly such a use case.
-
I'm definitely not trying to use spares as a normal boot pool solution. I want the hot spare(s) to be properly configured to boot ahead of time so that if a boot drive fails and the hot spare is placed into the pool, the system will still be able to boot if it has to.
As I understood it, these commands are partitioning the spare and installing the boot blocks to it?
# gpart create -s gpt adaX
# gpart add -a 4k -s 512k -t freebsd-boot -l gptbootX adaX
# gpart add -b 2048 -s 8384512 -t freebsd-zfs -l zfsX adaX
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 adaX
Obviously, as you pointed out it doesn't seem to be working. I'll try partitioning with a stop at the end of p2 as you suggested and see if that works.
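Putting the whole sequence together with an explicit start and size for p2, a POSIX sh sketch could look like this. `prep_boot_disk` is a hypothetical helper; the default start (2048) and size (8384512 sectors) are the values quoted in this thread for the 4 GB test VM disks and must be changed to match your own pool members. It prints the commands unless run with `DRY_RUN=0`, which is a sensible default for anything that rewrites a partition table.

```sh
#!/bin/sh
# Dry run by default: commands are printed, not run.
# Set DRY_RUN=0 in the environment to actually execute them.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# prep_boot_disk DISK INDEX [ZFS_START] [ZFS_SIZE]
# Lays out DISK like the other pool members: a 512k freebsd-boot partition
# with boot code, and a freebsd-zfs partition with a fixed start and size.
prep_boot_disk() {
    disk=$1
    idx=$2
    start=${3:-2048}
    size=${4:-8384512}
    if [ -z "$disk" ] || [ -z "$idx" ]; then
        echo "usage: prep_boot_disk DISK INDEX [ZFS_START] [ZFS_SIZE]" >&2
        return 1
    fi
    run gpart create -s gpt "$disk"
    run gpart add -a 4k -s 512k -t freebsd-boot -l "gptboot$idx" "$disk"
    run gpart add -b "$start" -s "$size" -t freebsd-zfs -l "zfs$idx" "$disk"
    run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$disk"
}
```

For example, `prep_boot_disk ada3 3` previews the four commands for ada3; compare the result with `gpart show` on an existing member before adding the new partition to the pool.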
I'm hoping (and assuming) that it's just something I'm messing up, not a feature that doesn't exist or work. The use case in my mind is a system set up somewhere you won't have frequent access to if you need to replace a bad disk. This way ZFS just resilvers onto the spare, and if the system needs to reboot for whatever reason, it does, and everything works fine with a fresh disk in the pool until you can get around to replacing the bad one.
About the only cases where I could see this making sense are if you have the above restrictions on accessing the system AND:
1. Need an exceptionally reliable firewall
2. Are on a tight budget and using cheap install media like thumb drives
3. Will literally never physically touch the hardware again and want a system that just works in a closet for a VERY long time
Definitely fringe cases, and if it's something that just doesn't work (at all, or well) with ZFS, then it should be avoided. But if it's something you can do with a few simple commands and it works, then it would be useful, primarily for tight budgets that want to install on thumb drives.
EDIT: Adjusting the partitions to exactly match the rest of the pool still doesn't boot.
Two more things I'm thinking:
1. Possibly reinstall the bootcode on all devices so that they match up with the new ada numbering?
i.e., after removing ada0, the hot spare that was ada4 is now ada3, and ada1 = ada0, ada2 = ada1, ada3 = ada2.
So redo the bootcode and labels (although I would think that once the bootcode is in p1 it doesn't matter which ada it is, and I don't know how much labels matter for booting) so that EITHER:
the NEW ada3 = ada3, ada0 = ada0, ada1 = ada1, ada2 = ada2
OR
the new ada3 = ada0, or whatever place it took in the pool.
2. gpart list shows "Mode", which appears to be permissions: r0w0e0 for all p1's, but the p2's are different on the spare. Possibly change this to match the rest, but I don't know why it would matter, since it boots from p1?
If anyone has thoughts on what I'm messing up to get this to work I'd appreciate them!
-
That would be right if the spare were ada3 or anything else on the SATA bus, but your spare shows up as 'da4' in your earlier post, so you have to adjust your commands for da4. Also, the spare cannot be just 'da4'; it has to be the freebsd-zfs partition on it, which is 'da4p2' after partitioning. If you read the Sun ZFS documentation, you probably thought the spare would be there as a whole disk and the system would automatically sync the partitions on it as well; this is not the case on FreeBSD.
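A small guard against that mistake, as a POSIX sh sketch: the hypothetical `add_spare` function only passes a GPT partition provider such as `da4p2`, or a GPT label such as `gpt/zfs4`, to `zpool add POOL spare`, and rejects a bare disk name. The name patterns are an assumption that fits the ada/da naming in this thread, not a complete check of FreeBSD device names.

```sh
#!/bin/sh
# Dry run by default: commands are printed, not run.
# Set DRY_RUN=0 in the environment to actually execute them.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# is_gpt_partition DEV: succeeds for names like da4p2, ada3p2 or gpt/zfs4,
# fails for a whole disk like da4.
is_gpt_partition() {
    case $1 in
        gpt/*) return 0 ;;
        *[0-9]p[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# add_spare POOL DEV: add DEV to POOL as a hot spare, refusing whole disks.
add_spare() {
    pool=$1
    dev=$2
    if [ -z "$pool" ] || ! is_gpt_partition "$dev"; then
        echo "refusing '$dev': use the freebsd-zfs partition, e.g. ${dev}p2" >&2
        return 1
    fi
    run zpool add "$pool" spare "$dev"
}
```

For example, `add_spare tank da4p2` prints `zpool add tank spare da4p2`, while `add_spare tank da4` is refused.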
-
OK, thanks, I'll give that a shot; I really appreciate your help! I did read Sun's documentation and eventually figured that out about the partitions (and some other differences); that's not the way it's set up anymore.
And ignore the earlier post, that's my actual pfSense install. I'm troubleshooting all of this on a VM before I do anything with my actual system. The real box doesn't use a hot spare like this.
On the VM, all of the drives are adaX.
-
Thanks for the answer.
Setting this at the command line: is it permanent, or does it need to be put into a file?
-
ZFS properties are stored in the pool metadata, so they are permanent.
-
Thanks again.
Do you believe there is any benefit in setting copies to 2? I've been reading a bit about it, and from what I've read it has an impact on speed.
Also, when the page I was reading tested its ability to stop corruption, it reduced the amount of corruption but didn't eliminate it. So it left me with the impression that it would have some merit where important data or photos are stored, but not so much on a firewall. Would I be correct?
-
Any kind of redundancy has at least a small effect on write speeds. It's unavoidable, because the data has to be duplicated somehow, be it a straight second copy or the parity you have on raid-z. Two copies is not a bad idea on a single disk if you can't use a two-disk mirror for some reason. It can save your bacon, because disks don't usually blow up completely just like that; they start to slowly develop a bad sector here and a bad sector there, and with two copies of the same data it's very unlikely that you lose both copies at the same time.
-
Well I tried modifying the labels on all of the partitions to match their ada#, and reinstalled the bootcode to each p1. Nothing changed though, still same boot error.
gpart modify -l gptboot0 -i 1 ada0
gpart modify -l zfs0 -i 2 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
# and so on for the rest of the drives
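Here are the per-drive commands above, expanded to every drive, as a POSIX sh sketch. `relabel_all` is a hypothetical helper that derives N from the trailing digits of each disk name (ada2 gives 2) and only prints the commands unless `DRY_RUN=0` is set.

```sh
#!/bin/sh
# Dry run by default: commands are printed, not run.
# Set DRY_RUN=0 in the environment to actually execute them.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# relabel_all DISK...: for each disk, label p1 as gptbootN and p2 as zfsN,
# where N is the disk's unit number, then rewrite the boot code to p1.
relabel_all() {
    for disk in "$@"; do
        n=${disk##*[!0-9]}   # strip up to the last non-digit: ada2 -> 2
        run gpart modify -l "gptboot$n" -i 1 "$disk"
        run gpart modify -l "zfs$n" -i 2 "$disk"
        run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$disk"
    done
}
```

For example, `relabel_all ada0 ada1 ada2 ada3` previews all twelve commands.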
Any more ideas? It seems like this should be doable?
-
Finally figured it out. Detach the bad drive from the pool and the hot spare becomes a permanent part of the pool.
I detached the bad disk after resilvering was complete; the hot spare became part of the pool, and the system reboots as if nothing changed. I then removed two more disks that were part of the original pool, and it still boots fine with the hot spare and one disk from the original pool.
zpool detach poolname baddiskname
https://blogs.oracle.com/eschrock/entry/zfs_hot_spares
If you want a hot spare replacement to become permanent, you can zpool detach the original device, at which point the spare will be removed from the hot spare list of any active pools.
So at the end of the day, it looks like this could be useful for replacing a boot disk remotely, as long as you SSH in and detach the bad disk after the resilver. Or you could write some sort of script that detaches the bad disk once resilvering is complete, but that's beyond me.
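For the scripted variant, here is a POSIX sh sketch of "wait for the resilver, then detach". The function names are hypothetical, and so are the matched strings: it assumes `zpool status` prints `resilver in progress` while resilvering and `resilvered ... with 0 errors` when done, which should be checked against your own system's output. It only detaches when the finished scan reports 0 errors, and by default it just prints the `zpool detach` command.

```sh
#!/bin/sh
# Dry run by default: commands that change the pool are printed, not run.
# Set DRY_RUN=0 in the environment to actually execute them.
: "${DRY_RUN:=1}"

run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# resilver_running: reads `zpool status` output on stdin and succeeds
# while a resilver is still in progress (assumed wording).
resilver_running() {
    grep -q 'resilver in progress'
}

# resilver_clean: reads `zpool status` output on stdin and succeeds if it
# reports a finished resilver with 0 errors (assumed wording).
resilver_clean() {
    grep -q 'resilvered .* with 0 errors'
}

# detach_after_resilver POOL BADDEV [INTERVAL]
# Polls `zpool status` (read-only, even in dry run) every INTERVAL seconds
# until no resilver is in progress, then detaches BADDEV so the hot spare
# becomes a permanent member of the pool.
detach_after_resilver() {
    pool=$1
    bad=$2
    interval=${3:-60}
    if [ -z "$pool" ] || [ -z "$bad" ]; then
        echo "usage: detach_after_resilver POOL BADDEV [INTERVAL]" >&2
        return 1
    fi
    while :; do
        status=$(zpool status "$pool") || return 1
        printf '%s\n' "$status" | resilver_running || break
        sleep "$interval"
    done
    if printf '%s\n' "$status" | resilver_clean; then
        run zpool detach "$pool" "$bad"
    else
        echo "no clean resilver found; not detaching $bad" >&2
        return 1
    fi
}
```

Usage: `detach_after_resilver POOL BADDEV 60` polls once a minute; pass the failed device exactly as `zpool status` names it.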