ZFS boot environment automatic fallback and recovery. documentation? can it be controlled?
-
So I was having problems with shutting down and starting my Netgate 4200. I thought I'd somehow done a factory reset, but only later did I figure out I'd booted into a different BE.
I've been trying to find documentation or posts on this and so far it seems to be slim pickings. The only decent one I've found is this 9:34 Netgate video about new ZFS BE features in 24.03; roughly 3:44 - 7:02 covers this feature.
https://www.netgate.com/resources/videos-zfs-boot-environments-pfsense-plus-24.03
I think the only control you have over this is when rebooting into a different BE, where it will automatically reboot into the previous BE if you don't log in and disarm the countdown. Kind of an anti-lockout feature.
I haven't seen a way to remove BEs from this feature, so it seems like you should make sure all BEs are production ready or delete them after testing. This feels like something that should be an opt-in feature. Am I missing something?
-
@FlyingBean on what pfSense version are you?
Have you read through the documentation about snapshots?
https://docs.netgate.com/pfsense/en/latest/backup/zfsbe/how.html
How to manage (create and delete):
https://docs.netgate.com/pfsense/en/latest/backup/zfsbe/gui.html
-
I've read it about 3-4 times now. I see it describing how the feature is used automatically for upgrades, but not detailing how it is always active, waiting for the boot watchdog timer to trigger a reboot and BE change, nor how it can be used when switching BEs to automatically revert if you didn't log in and disarm the timer ("verify" the BE).
Edit:
version 25.07.1 on a Netgate 4200
Edit2:
For completeness' sake I should mention I was running 24.11 when the BE switched on me. At the time I decided to just do a full fresh install of 25.07.1 via USB drive, and only later figured out what happened.
-
@FlyingBean said in ZFS boot environment automatic fallback and recovery. documentation? can it be controlled?:
how it can be used when switching BEs to automatically revert if you didn't log in and disarm the timer ("verify" the BE).
Are you referring to "Boot Verification", a setting in System / Update / Update Settings? That is where you have a certain time to verify that the upgrade was successful; otherwise pfSense will boot back into the previous snapshot/BE.
If yes, then I think that this is not enabled by default.
https://docs.netgate.com/pfsense/en/latest/install/upgrade-guide.html#boot-environments
If that setting is not enabled, then it could indicate that the upgrade failed and it rebooted into the previous BE.
-
Interesting, I thought it was automatic on update, not opt-in, and after re-reading the documentation I see it doesn't mention whether it is enabled by default. (I feel like the phrasing suggests it is.)
https://docs.netgate.com/pfsense/en/latest/backup/zfsbe/how.html
Starting with pfSense Plus software version 24.03, this changed to a more efficient and robust procedure: The upgrade process creates a new Boot Environment and performs the upgrade inside that entry before rebooting. It makes sure the upgrade succeeded and then reboots into the newly upgraded environment. It detects any errors during boot and if there is a problem it can automatically roll back to the previous Boot Environment.
In the part you were quoting, I was referring to how in Diagnostics / Reboot, when you select a different BE, you can select Manual Boot Verification.
Just to bring it back to what I think is the bigger issue: the automatic BE fallback that is always running and waiting for the watchdog reboot timer to trigger. Can it be controlled? Selectively disabled?
-
@FlyingBean said in ZFS boot environment automatic fallback and recovery. documentation? can it be controlled?:
the automatic BE fallback that is always running and waiting for the watchdog reboot timer to trigger. Can it be controlled? Selectively disabled?
I do not know that. The BE fallback is only supposed to happen if the upgrade was not successful, that at least was my understanding.
Maybe someone else has more experience with that.
-
@patient0 said in ZFS boot environment automatic fallback and recovery. documentation? can it be controlled?:
@FlyingBean said in ZFS boot environment automatic fallback and recovery. documentation? can it be controlled?:
the automatic BE fallback that is always running and waiting for the watchdog reboot timer to trigger. Can it be controlled? Selectively disabled?
I do not know that. The BE fallback is only supposed to happen if the upgrade was not successful, that at least was my understanding.
Maybe someone else has more experience with that.
Watch this from about 3:44 - 4:30; that's the most relevant part, though it stays relevant until 7:00.
https://youtu.be/LKtE0zxnF4I?t=224
That video was taken from
https://www.netgate.com/resources/videos-zfs-boot-environments-pfsense-plus-24.03
-
The default is automatic boot verification. So if you reboot, it will automatically verify the boot and disable the watchdog. If it fails to boot for some reason, it will hit the watchdog and revert to the last known good BE.
You can disable the automatic verification, in which case the user must log in and manually accept the boot to prevent rolling back the BE.
This happens at upgrade because the reboot during upgrade is set to one-time only, so a subsequent reboot will roll back.
To make that happen during a normal reboot (not upgrade) you would need to select the BE to boot into from the BE menu.
Temporarily activate the ZFS Boot Environment one time and reboot
https://docs.netgate.com/pfsense/en/latest/backup/zfsbe/gui.html
But it will happen at any boot that fails, because that BE is then marked as failed to boot and will not be selected until a user clears that.
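To make the behaviour described above easier to reason about, here is a rough toy model in Python. It is only a sketch of the logic as explained in this thread, not pfSense code: the names `BootEnv`, `Firewall`, `boot`, and `clear_failed` are made up for illustration. It also assumes two things the thread does not state: that a verified BE becomes the new "last known good" one, and that an unaccepted boot in manual mode is treated the same as a failed boot.

```python
from dataclasses import dataclass


@dataclass
class BootEnv:
    name: str
    failed: bool = False  # set when a boot of this BE ends in a watchdog rollback


@dataclass
class Firewall:
    envs: dict                # BE name -> BootEnv (hypothetical structure)
    last_good: str            # BE the system falls back to
    auto_verify: bool = True  # described in the thread as the default

    def boot(self, target: str, boots_ok: bool, user_accepts: bool = False) -> str:
        """Try to boot `target`; return the name of the BE that ends up running."""
        be = self.envs[target]
        if be.failed:
            # A BE marked as failed is not selected until a user clears it.
            return self.last_good
        if self.auto_verify:
            verified = boots_ok  # a successful boot disarms the watchdog by itself
        else:
            verified = boots_ok and user_accepts  # user must log in and accept
        if verified:
            self.last_good = target  # assumption: verified BE becomes last known good
            return target
        # Watchdog expires: mark the BE as failed and fall back.
        # Assumption: an unaccepted manual boot is handled like a failed boot.
        be.failed = True
        return self.last_good

    def clear_failed(self, name: str) -> None:
        """Model the user clearing the failed mark on a BE."""
        self.envs[name].failed = False


fw = Firewall(envs={n: BootEnv(n) for n in ("default", "test")}, last_good="default")
print(fw.boot("test", boots_ok=False))  # boot fails, falls back
print(fw.boot("test", boots_ok=True))   # still skipped: marked as failed
fw.clear_failed("test")
print(fw.boot("test", boots_ok=True))   # boots and verifies automatically
```

With `auto_verify=False` the same model only keeps the new BE when `user_accepts=True`, which corresponds to the Manual Boot Verification option mentioned earlier in the thread.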