Pfsense hangs after replacing hdd from zfs pool
-
Hello,
pfSense 2.4.5, ZFS on a mirror of two hard drives (ada0 and ada1).
The 2nd hard drive (ada1) failed, so I added a new hard drive.
I detached the 2nd hard drive and attached the new hard drive to the pool.
After resilvering, I copied the boot code to the new hard drive. I rebooted and the system booted normally. So far so good.
I then manually removed the 1st (original) drive and tried booting from the new replacement drive alone. The boot sequence starts, but after "Firewall Configuration done" the system waits forever. Pressing any key simply reboots the system. Putting the 1st drive back lets the system boot normally.
I tried replacing the new drive with another drive, adding two more new drives, and upgrading the system to 2.5.2, but no luck: the system boots completely only if the 1st drive is attached.
Any pointers?
Ashima
-
@ashima said in Pfsense hangs after replacing hdd from zfs pool:
Any Pointers ?
Hi,
I saved (Firefox favourites :)) this when we changed drives in our TrueNAS system, also FreeBSD + ZFS RAID.
It can help you too:
https://forums.freebsd.org/threads/replacing-a-failed-drive-in-an-encrypted-zfs-raidz-array-with-both-boot-and-root-pools.65199/
+++edit:
minus encrypt
-
Thanks @DaddyGo. I tried the following commands:
gpart create -s gpt ada1
gpart add -s 409600 -t efi -l zfsefi ada1
gpart add -a 4k -s 512k -t freebsd-boot -l gptboot2 ada1
gpart add -b 411648 -s 4194304 -t freebsd-swap -l zfswap ada1
gpart add -b 4605952 -s 307974144 -t freebsd-zfs -l zfs2 ada1
(gpart show confirms that the partitions on ada1 are the same as on ada0)
zpool attach -f zroot ada0p4 ada1p4
After 100% resilvering
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada1
zpool status shows both drives online.
Similar steps are shown in the link provided by @DaddyGo, minus the encryption.
But the system starts booting fine and then stops after the step "Configuring Firewall rules" (after all my WAN interfaces are detected). I fail to understand the issue.
Any clue?
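For reference, the sequence above can be sketched as a small dry-run script. This is only a sketch under the assumptions of this post (same partition sizes and labels, pool name zroot, legacy boot code in partition 2). The `run` helper, the `RUN` switch and the function names are hypothetical and not part of pfSense or FreeBSD. By default it only prints the commands, and executes them only when `RUN=1` is set.

```sh
#!/bin/sh
# Dry-run sketch of the disk replacement steps quoted in this post.
# Sizes, labels and the pool name come from the post; RUN and the
# helper names are hypothetical.

run() {
    # Print the command; execute it only when RUN=1 is set.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

mirror_new_disk() {
    # $1 = existing disk, $2 = new disk, $3 = pool name
    old="$1"; new="$2"; pool="$3"
    run gpart create -s gpt "$new"
    run gpart add -s 409600 -t efi -l zfsefi "$new"
    run gpart add -a 4k -s 512k -t freebsd-boot -l gptboot2 "$new"
    run gpart add -b 411648 -s 4194304 -t freebsd-swap -l zfswap "$new"
    run gpart add -b 4605952 -s 307974144 -t freebsd-zfs -l zfs2 "$new"
    # Attach the new ZFS partition as a mirror of the existing one.
    run zpool attach -f "$pool" "${old}p4" "${new}p4"
}

install_bootcode() {
    # To be run once resilvering has finished; $1 = new disk.
    run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 "$1"
}

mirror_new_disk ada0 ada1 zroot
install_bootcode ada1
```

Printing first makes it easy to compare the planned layout against `gpart show ada0` before anything is written to the new disk.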
Regards,
Ashima
-
Hmmmm,
Then I have to say that something went wrong when rebuilding the ZFS RAID on the replaced drive...
(ZFS RAID normally does this without problems, but there is no way to know), since you write that everything works fine when you boot from the working (original) drive.
What I'd do is take a full backup, build a ZFS RAID from scratch with two new drives, and throw the saved pfSense config back on it...
Otherwise you're just wasting your time, because I don't think there's anything more to say... and at least you'll have a new ZFS RAID system on fresh hardware.
BTW:
I'm guessing that if you bought the RAID members (HDD / SSD) as a pair, it's likely that the other one will have similar problems soon...
-
Yes, that seems to be the only workaround. I shall try that tomorrow morning (it's 50 past midnight here in India).
So I have this system (now upgraded to 2.5.2) with captive portal and freeradius installed. I have around 250 users created in freeradius. I was trying to avoid creating these IDs again.
So what would be the best way to back up these users and restore them on the new system?
I have a few other packages installed, like openvpn client, shellcmd, sudo, cron and mailreports. But these are not much work. I can configure them again if required.
Thanks for all the input.
-
@ashima
If the system was running ZFS under 2.4.x and was updated to 2.5.2, it may be safer to back up and re-install. From my recollection, there were many changes related to ZFS in recent versions. This may cause problems with older ZFS installs.
-
I'm not aware of anything that would cause a problem specifically but I would still re-install clean coming from 2.4.X if you can.
Steve
-
@dotdash Well, I was facing the issue with 2.4.5, so that's when I decided to upgrade to 2.5.2. But the problem persists.
Another thing I noticed: after running "zpool scrub", my original hard drive showed 13 checksum errors, so I just ran "zpool clear" and the errors disappeared.
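As a rough illustration of checking for such errors: `zpool status` lists READ, WRITE and CKSUM counters per device, and `zpool clear` only resets those counters. The filter below is a minimal sketch that prints every row of the status table whose CKSUM column is non-zero. The function name `cksum_errors` is hypothetical, and the sketch assumes the usual five-column table layout.

```sh
#!/bin/sh
# Sketch: list devices with a non-zero CKSUM counter.
# Reads `zpool status` output on stdin; cksum_errors is a hypothetical name.

cksum_errors() {
    awk '
        # The header row of the device table starts the region of interest.
        $1 == "NAME" && $NF == "CKSUM" { in_table = 1; next }
        # A blank line ends the device table.
        in_table && NF == 0 { in_table = 0 }
        # Columns: NAME STATE READ WRITE CKSUM [notes...]
        in_table && NF >= 5 && $5 != "0" { print $1, $5 }
    '
}

# Typical use on a live system (pool name taken from this thread):
#   zpool scrub zroot
#   zpool status zroot | cksum_errors
#   zpool clear zroot      # resets the counters only
```

An empty result after a completed scrub suggests no device is currently reporting checksum errors. A drive that keeps accumulating them after a clear is worth watching.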
What's surprising is that the boot sequence stops midway at
"Configuring Firewall...done", and pressing any key reboots the system. I guess "Generating RRD graph ..." never happens.
As suggested by all of you, I am planning to do a fresh installation on the two new drives. Is there any way I can transfer the users (freeradius) to the new system?
BTW, I have plenty of pfSense sites which I upgraded from 2.4.5 to 2.5.2 remotely, and all of them are on ZFS. No issues so far. Is there anything I need to take care of for my remaining sites, or should I stick to 2.4.5 for the rest of them?
Do share your feedback on restoring the freeradius users on the new system.
Regards,
Ashima
-
The Freeradius user data is all stored in the main config file. When you restore that, all package data should also be restored.
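A minimal sketch of taking a copy of that file from a shell before reinstalling is shown below. It assumes the config lives at /cf/conf/config.xml and that the package data sits under a tag named after the package (for example `freeradius`). Both are assumptions to verify on the actual system, and `backup_config` and `has_section` are hypothetical helper names.

```sh
#!/bin/sh
# Sketch: copy the pfSense config to a timestamped file and verify the copy.
# The default path and the tag name are assumptions; helper names are hypothetical.

backup_config() {
    # $1 = config file (default: assumed pfSense location), $2 = target directory
    src="${1:-/cf/conf/config.xml}"
    dest_dir="${2:-/root}"
    dest="$dest_dir/config-$(date +%Y%m%d-%H%M%S).xml"
    cp -p "$src" "$dest" || return 1
    # Confirm the copy is byte-identical to the source.
    cmp -s "$src" "$dest" || return 1
    echo "$dest"
}

has_section() {
    # $1 = file, $2 = XML tag name to look for (e.g. freeradius, assumed)
    grep -q "<$2>" "$1"
}

# Example:
#   copy=$(backup_config) && has_section "$copy" freeradius && echo "package data present"
```

The GUI backup and the Cloud Backup mentioned in this thread remain the supported routes. This only shows that the users travel with the one file.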
Steve
-
Thanks @stephenw10.
So I booted my device with the original drive and took a Cloud Backup (thanks to the pfSense team for providing this facility).
I hooked up two new drives, installed a fresh copy, downloaded all the packages and restored the configuration from the backup. OMG, the device was just like the original... all my settings and my user IDs were there. But then I couldn't log in through the captive portal page (using freeradius2 for authentication). I guess it's not authenticating, or there is some SSL issue. I couldn't get enough time to solve the issue. I guess I'll start a new thread in case I fail to do so.
Meanwhile, if someone has any pointers for the freeradius2 issue, please do reply. That'll save me a lot of work on Monday morning.
Regards,
Ashima
-
@ashima said in Pfsense hangs after replacing hdd from zfs pool:
some SSL issue.
That would be my bet,
and in this case you are not in as much trouble as you think.
Good luck with it.
-
@DaddyGo Thanks for the response.
So if it is an SSL issue...
I need to create a new freeradius certificate and point the freeradius2 configuration (I guess the EAP tab) to the new certificates.
Is that right?
-
It depends how/why it's failing. If that config works on the old device I would expect it to work here too. The MAC addresses will be different so clients may see it as a new network. Perhaps they are simply using the wrong stored logins?
First make sure you can authenticate against Freeradius from Diagnostics > Authentication.
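The same check can also be sketched from a shell with FreeRADIUS's `radtest` client, assuming that utility is installed alongside the package (this thread does not confirm it) and that the server listens on 127.0.0.1:1812. The `radius_check` wrapper and the `RUN` switch are hypothetical. By default it only prints the command it would run.

```sh
#!/bin/sh
# Sketch: build a radtest call for one user; dry-run unless RUN=1.
# radtest syntax: radtest user password server[:port] nas-port-number secret
# radius_check and RUN are hypothetical; server address and port are assumptions.

radius_check() {
    # $1 = user, $2 = password, $3 = shared secret, $4 = server[:port] (optional)
    user="$1"; pass="$2"; secret="$3"; server="${4:-127.0.0.1:1812}"
    set -- radtest "$user" "$pass" "$server" 0 "$secret"
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Example (placeholder credentials):
#   RUN=1 radius_check testuser testpass testing123
```

An Access-Accept reply would point away from the user database and towards the captive portal or certificate side.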
Steve
-
@stephenw10 ... it finally worked.
Created new CA/certificates for Freeradius.
Created new CA/certificates for Captive Portal.
Finally, what actually worked:
User Manager > Authentication Servers: selected the RADIUS server and saved it again.
And everything started working. Keeping it under testing (fingers crossed).