Fun with zfs, snapshots and rollback
-
NOTE: If you have pfSense Plus, I recommend you use the excellent new GUI snapshot feature under System --> Boot Environment.
I have been reading the thread here:
https://forum.netgate.com/topic/95148/pc-engines-apu2-experiences/577?_=1627030893766
And that sparked my interest in trying zfs snapshots.
Initially I went with the "Checkpoint method" in the above thread, but following the instructions in the thread didn't work for me.
I was always getting:
# zpool import zroot
cannot import 'zroot': pool may be in use from other system
use '-f' to import anyway
# zpool import -f zroot
cannot mount '/zroot': failed to create mountpoint
I decided to go with snapshots instead, and I think I have found a "clean way" to roll back a snapshot, by booting from a (here 2.4.5-p1) USB install stick and selecting the rescue shell.
Prereqs:
pfSense has to have been installed on a zfs filesystem
User must be familiar with ssh and "console/shell commands".

Begin:
ssh as admin into your pfSense box (select 8 in the menu to get a shell).

Pre snapshot: Find the zfs pool name (here it's zroot)
zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
zroot                541M  54.7G    88K  /zroot
zroot/ROOT           525M  54.7G    88K  none
zroot/ROOT/default   525M  54.7G   525M  /
zroot/tmp            184K  54.7G   184K  /tmp
zroot/var           10.2M  54.7G  10.2M  /var
Enable snapshot listing (on the pool name found above)
zpool set listsnapshots=on zroot
1: Make a recursive snapshot (the part after the @ is the snapshot name, of your choice)
zfs snapshot -r zroot@2.4.5-p1
1.a: Make sure snapshots were taken for all "partitions"; note the snapshot name after the @
zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
zroot                        541M  54.7G    88K  /zroot
zroot@2.4.5-p1                  0      -    88K  -
zroot/ROOT                   525M  54.7G    88K  none
zroot/ROOT@2.4.5-p1             0      -    88K  -
zroot/ROOT/default           525M  54.7G   525M  /
zroot/ROOT/default@2.4.5-p1     0      -   525M  -
zroot/tmp                    184K  54.7G   184K  /tmp
zroot/tmp@2.4.5-p1              0      -   184K  -
zroot/var                   10.2M  54.7G  10.2M  /var
zroot/var@2.4.5-p1              0      -  10.2M  -
You have now made a snapshot named 2.4.5-p1, and can revert to it if needed.
2: Roll back to the snapshot (boot from the install USB and select: Recovery)
2.a: Make a "ZFS root" mountpoint in the tmpfs (the USB disk seems to be read-only)
mkdir /tmp/mnt
2.b: Import the "ZFS Root"
zpool import -f -o altroot=/tmp/mnt zroot
2.c: Roll back the interesting "partitions" (we can discard the tmp partition) and reboot after the rollback.
zfs rollback zroot/var@2.4.5-p1
zfs rollback zroot/ROOT/default@2.4.5-p1
zfs rollback zroot/ROOT@2.4.5-p1
zfs rollback zroot@2.4.5-p1
shutdown -r now
3: pfSense should now boot up and be exactly in the state it was in when the snapshot was taken.
Done.

Clean up the snapshot if desired.
Destroy/delete the snapshot (recursively):
zfs destroy -r zroot@2.4.5-p1
!!! Rollback on a running remote system !!!
This is DANGEROUS and NOT RECOMMENDED. It might fail, and you will need to power the system off/on (reset) afterwards, as the shutdown doesn't seem to get executed.
This method pulls the rug out from under all disk partitions except tmp, and causes numerous core dumps during/after the rollback.
Only use it if you are desperate. But during my tests (6..8 "SSH remote rollbacks"), the system came up functioning after a hard remote system power off/on.
You need to be able to do a (reset) power off/on, either physically or via some "smart power device". Remember your pfSense network & VPNs will be down.

1: The rollback commands have to be executed from a shell script. Do NOT try to paste them.
NB: The pfSense box will drop the SSH connection, and probably all network access will be lost. Roll back the interesting "partitions" (we can discard the tmp partition) and reboot after the rollback.
1.a: Login via ssh , as admin
Put the commands below in a shell script file, e.g. x.sh:
zfs rollback zroot/var@2.4.5-p1
zfs rollback zroot/ROOT/default@2.4.5-p1
zfs rollback zroot/ROOT@2.4.5-p1
zfs rollback zroot@2.4.5-p1
shutdown -r now
1.b: Make x.sh executable
chmod +x x.sh
1.c: Execute the commands in the newly created shell script x.sh
./x.sh
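Since the box drops the SSH session mid-rollback, one variation I have not tested myself (my own suggestion, not part of the original experiments) is to detach the script from the session with nohup, so the rollback isn't killed together with sshd:

```shell
# Untested suggestion: run the rollback script detached from the
# SSH session, so it keeps running after the connection drops.
# The log lands on the tmp filesystem, which is not rolled back.
nohup ./x.sh > /tmp/rollback.log 2>&1 &
```

Whether the shutdown at the end of the script actually runs this way is anyone's guess; expect to need the hard power off/on regardless.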
1.d: Now wait 60 sec, and then power the system off/on
Cross your fingers and pray it comes up on the old snapshot.

Here is a console snip taken during the rollback.
pfSense is NOT happy - but it has booted the rollback system every time after a reset in my tests (8..10 now, I think).

login: pid 35748 (openvpn), jid 0, uid 0: exited on signal 11 (core dumped)
ovpnc1: link state changed to DOWN
pid 83413 (ladvd), jid 0, uid 142: exited on signal 11
pid 49695 (cat), jid 0, uid 0: exited on signal 11 (core dumped)
pid 762 (devd), jid 0, uid 0: exited on signal 11 (core dumped)
pid 49710 (sshg-parser), jid 0, uid 0: exited on signal 11
pid 50002 (sshg-blocker), jid 0, uid 0: exited on signal 11
pid 83262 (ladvd), jid 0, uid 0: exited on signal 11 (core dumped)
pid 50370 (sh), jid 0, uid 0: exited on signal 11 (core dumped)
pid 88734 (zabbix_agentd), jid 0, uid 122: exited on signal 11
pid 24383 (ntpd), jid 0, uid 0: exited on signal 11 (core dumped)
pid 87613 (zabbix_agentd), jid 0, uid 122: exited on signal 11
pid 50091 (sh), jid 0, uid 0: exited on signal 11 (core dumped)
pid 7911 (pcscd), jid 0, uid 0: exited on signal 11 (core dumped)
pid 37589 (filterlog), jid 0, uid 0: exited on signal 11 (core dumped)
pflog0: promiscuous mode disabled
pid 88170 (zabbix_agentd), jid 0, uid 122: exited on signal 11
pid 90915 (dpinger), jid 0, uid 0: exited on signal 11 (core dumped)
pid 55087 (bsnmpd), jid 0, uid 0: exited on signal 11 (core dumped)
pid 49586 (sh), jid 0, uid 0: exited on signal 11 (core dumped)
pid 90281 (dpinger), jid 0, uid 0: exited on signal 11 (core dumped)
pid 337 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
pid 3318 (minicron), jid 0, uid 0: exited on signal 11 (core dumped)
pid 3062 (minicron), jid 0, uid 0: exited on signal 11 (core dumped)
pid 14668 (getty), jid 0, uid 0: exited on signal 11 (core dumped)
Right now this works for me.
But I might try out the "Checkpoint" method using the alternate mount at a later point.

Hope this helps, and explains why zfs is such a nice thing to use.
I'm a Linux guy, so don't ask me any BSD-related questions.
And if you don't know ssh & vi, don't even try this.

Edit: All tests made on 2.4.5-p1 (snapshot + rollback base) and 2.5.2 as the "upgrade".
/Bingo
-
@bingo600 said in Fun with zfs, snapshots and rollback:
zpool set listsnapshots=on zroot
Just made a snapshot on version 2.5.2
Since Netgate changed the zfs root name from zroot to pfSense in the new 2.5.2 CE version, and made some other zfs changes, I decided to do a full reinstall of my boxes, booting from a 2.5.2 USB stick and reinstalling from scratch.
This is the new layout on my boxes
root: zfs list
NAME                  USED  AVAIL  REFER  MOUNTPOINT
pfSense              1.02G   222G    96K  /pfSense
pfSense/ROOT          800M   222G    96K  none
pfSense/ROOT/default  800M   222G   800M  /
pfSense/cf           5.58M   222G    96K  /cf
pfSense/cf/conf      5.48M   222G  5.48M  /cf/conf
pfSense/home          212K   222G   212K  /home
pfSense/tmp           476K   222G   476K  /tmp
pfSense/var           228M   222G  3.37M  /var
pfSense/var/cache     120K   222G   120K  /var/cache
pfSense/var/db        223M   222G   223M  /var/db
pfSense/var/empty      96K   222G    96K  /var/empty
pfSense/var/log       880K   222G   880K  /var/log
pfSense/var/tmp       136K   222G   136K  /var/tmp
I just ran the above commands with the new zfs root name, pfSense:
zfs list
zpool set listsnapshots=on pfSense
zfs snapshot -r pfSense@2.5.2
Here's the layout after the snapshot.
/root: zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
pfSense                    1.02G   222G    96K  /pfSense
pfSense@2.5.2                  0      -    96K  -
pfSense/ROOT                800M   222G    96K  none
pfSense/ROOT@2.5.2             0      -    96K  -
pfSense/ROOT/default        800M   222G   800M  /
pfSense/ROOT/default@2.5.2     0      -   800M  -
pfSense/cf                 5.58M   222G    96K  /cf
pfSense/cf@2.5.2               0      -    96K  -
pfSense/cf/conf            5.48M   222G  5.48M  /cf/conf
pfSense/cf/conf@2.5.2          0      -  5.48M  -
pfSense/home                212K   222G   212K  /home
pfSense/home@2.5.2             0      -   212K  -
pfSense/tmp                 476K   222G   476K  /tmp
pfSense/tmp@2.5.2              0      -   476K  -
pfSense/var                 230M   222G  3.37M  /var
pfSense/var@2.5.2              0      -  3.37M  -
pfSense/var/cache           120K   222G   120K  /var/cache
pfSense/var/cache@2.5.2        0      -   120K  -
pfSense/var/db              225M   222G   223M  /var/db
pfSense/var/db@2.5.2       1.78M      -   223M  -
pfSense/var/empty            96K   222G    96K  /var/empty
pfSense/var/empty@2.5.2        0      -    96K  -
pfSense/var/log             952K   222G   880K  /var/log
pfSense/var/log@2.5.2        72K      -   880K  -
pfSense/var/tmp             136K   222G   136K  /var/tmp
pfSense/var/tmp@2.5.2          0      -   136K  -
I haven't played with restore etc. yet, but expect it to behave as above.
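With all the extra datasets in the 2.5.2 layout, typing one rollback command per dataset gets error-prone. A minimal sketch (my own idea, not tested on pfSense) that generates the rollback commands from the dataset list, skipping the tmp datasets, could look like this; gen_rollbacks is a hypothetical helper name:

```shell
# gen_rollbacks: read dataset names (one per line, as printed by
# "zfs list -H -o name -r pfSense") on stdin and print a rollback
# command for each, skipping tmp datasets (/tmp and /var/tmp).
# "sort -r" puts child datasets before their parents, matching the
# manual rollback order used earlier in this post.
gen_rollbacks() {
  snap="$1"
  sort -r | while read -r ds; do
    case "$ds" in
      */tmp|*/tmp/*) continue ;;   # throwaway tmp datasets
      *) printf 'zfs rollback %s@%s\n' "$ds" "$snap" ;;
    esac
  done
}
```

Usage would be something like `zfs list -H -o name -r pfSense | gen_rollbacks 2.5.2 > x.sh`; review the generated list before running it.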
We might (will) have to take the new partitions made in 2.5.2 into consideration.

/Bingo