Boot Environments - unexpected behavior
-
@joedan said in Boot Environments - unexpected behavior:
I can confirm this was the case for me as well with both packages, although with ntopng I had to actually zero out my config before it worked again. Unfortunately I didn’t have the chance to troubleshoot much. I swapped between 22.05 and 23.01 two times and am now back on 22.05 for the moment. I too was surprised that the boot environments didn’t appear to restore everything back to how it was. I put it down to my lack of understanding of how boot environments work.
Thx for the confirmation
It’s very odd and disappointing.
Maybe it’s a known bug?
Anybody from @Netgate?
-
Hmm, that's odd. I would also have expected it to roll back entirely. Hard to see what those packages would have been hitting.
Do you have the errors you saw?
Steve
-
@stephenw10 said in Boot Environments - unexpected behavior:
Hmm, that's odd. I would also have expected it to roll back entirely. Hard to see what those packages would have been hitting.
Do you have the errors you saw?
Steve
Yes
pfBlockerNG-devel had errors:
2023-02-10 07:46:43,351|ERROR| [pfBlockerNG]: Failed to load python module 'maxminddb': No module named 'maxminddb'
2023-02-10 07:46:43,351|ERROR| [pfBlockerNG]: Failed to load python module 'sqlite3': No module named '_sqlite3'
It took reinstalling to fix them.
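For anyone who hits the same pfBlockerNG errors after a rollback, here is a rough Python sketch that pulls the module names out of log lines like the ones above and checks whether the interpreter can find them. The function names are made up for illustration, and it assumes you run it with the same Python interpreter that pfBlockerNG uses; otherwise the result tells you nothing about the package.

```python
import importlib.util
import re

# Matches the "No module named '<name>'" tail of the log lines quoted above.
MISSING_RE = re.compile(r"No module named '([^']+)'")


def modules_named_in_log(log_text):
    """Return module names reported missing in log_text, in order, without duplicates."""
    names = []
    for name in MISSING_RE.findall(log_text):
        if name not in names:
            names.append(name)
    return names


def still_missing(names):
    """Return the subset of top-level module names this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    log = (
        "2023-02-10 07:46:43,351|ERROR| [pfBlockerNG]: Failed to load python module "
        "'maxminddb': No module named 'maxminddb'\n"
        "2023-02-10 07:46:43,351|ERROR| [pfBlockerNG]: Failed to load python module "
        "'sqlite3': No module named '_sqlite3'\n"
    )
    reported = modules_named_in_log(log)
    print("reported missing:", reported)
    print("still missing here:", still_missing(reported))
```

If the second list is empty after a reinstall, the modules are back in place. Note that the sqlite3 error names `_sqlite3`, the compiled extension, so that is the name worth checking.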
I don't use ntopng on a regular basis, but what happened was that Service Watchdog started spamming me with emails saying that ntopng had stopped and needed to be restarted. I decided to reboot and tried opening it, and it would not start. After that I tried reinstalling, which did not work either, then removed it completely and installed it again. That worked.
@stephenw10 But I think those errors are kinda unrelated. The main question is - how is it even possible that Boot Environments did not restore a snapshot to whatever exact version it was before!?
-
@chudak re:pfB there were other recent posts with similar errors but related to a package bug not ZFS:
https://forum.netgate.com/topic/177212/pfblockerng-devel-v3-1-0_19-10/28
-
@stephenw10 logged an issue
-
@steveits Boot environments are full filesystem snapshots, so we are VERY likely looking at this issue:
The user created a boot environment snapshot manually (with all services running). That means a snapshot with all files open and locked for write - VERY likely to cause the observed issues because of corrupt files.
If you create a BE snapshot manually - make sure all packages and most/all possible services are stopped before snapshotting.
-
@keyser For BE snapshots to be really trustworthy, they should ideally ask for a reboot when the user tries to create a snapshot, and then take the snapshot in the boot phase - before anything is started.
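The "stop services first" suggestion for manual snapshots could be sketched roughly as below. Whether this is actually needed is not settled in this thread, so treat it as an illustration only. It assumes `bectl create` is available on the box and that `pfSsh.php playback svc stop|start <name>` works for the services in question; the service names and the BE name in the example are placeholders. It defaults to a dry run that only prints the commands.

```python
import subprocess


def svc(action, service):
    """Build the (assumed) pfSense playback command to stop or start one service."""
    return ["pfSsh.php", "playback", "svc", action, service]


def snapshot_with_services_stopped(be_name, services, dry_run=True):
    """Stop services, create a boot environment, then start the services again.

    Returns the list of commands in the order they were issued.
    With dry_run=True nothing is executed; the commands are only printed.
    """
    issued = []

    def run(argv, check=True):
        issued.append(argv)
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=check)

    stopped = []
    try:
        for name in services:
            run(svc("stop", name))
            stopped.append(name)
        run(["bectl", "create", be_name])
    finally:
        # Restart whatever was stopped, even if creating the BE failed.
        for name in reversed(stopped):
            run(svc("start", name), check=False)
    return issued


if __name__ == "__main__":
    # Placeholder names; substitute the services actually running on your box.
    snapshot_with_services_stopped("manual-before-change", ["ntopng"])
```

The try/finally is there so the services come back up even if the snapshot step fails.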
-
@keyser said in Boot Environments - unexpected behavior:
@steveits Boot environments are full filesystem snapshots, so we are VERY likely looking at this issue:
The user created a boot environment snapshot manually (with all services running). That means a snapshot with all files open and locked for write - VERY likely to cause the observed issues because of corrupt files.
If you create a BE snapshot manually - make sure all packages and most/all possible services are stopped before snapshotting.
I see your point. But if this is true and users have to manually stop all packages for the BE to work, it completely sacrifices the entire idea of BE snapshots.
Let’s wait and see what the @Netgate guys say.
-
@chudak said in Boot Environments - unexpected behavior:
I see your point. But if this is true and users have to manually stop all packages for the BE to work, it completely sacrifices the entire idea of BE snapshots.
Let’s wait and see what the @Netgate guys say.
If I’m not mistaken, BE snapshots do what I described automatically when performing an upgrade. As I understand it, an upgrade will create a new upgrade snapshot automatically at reboot. So that should work as expected. It’s the manual snapshots that are susceptible to corrupted open files.
-
Hmm, I'm not sure about that. When you run pfSense-upgrade the BE snapshot is created at the very start of the process. It has to be, because several things get upgraded before the reboot. If it took the snapshot at reboot instead, restoring it would put you in the middle of the upgrade process.
When you saw this error how old was the snapshot?
I've rolled back to earlier snapshots many times and never seen any errors but in that specific use case. I'll try to test it when I have time.
Steve
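On the question of how old the snapshot was: one way to check is the creation time that `bectl list` reports. Below is a small Python sketch that parses the tab-separated output of `bectl list -H`, assuming the usual five columns (BE, Active, Mountpoint, Space, Created). The sample data is invented for illustration and is not from the affected system.

```python
def parse_bectl_list(text):
    """Parse `bectl list -H` style output into a list of dicts.

    Assumes five tab-separated fields per line: name, active flags,
    mountpoint, space and creation time. Other lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 5:
            continue
        name, active, mountpoint, space, created = fields
        rows.append({
            "name": name,
            "active_now": "N" in active,        # N: currently booted
            "active_on_reboot": "R" in active,  # R: used on next reboot
            "mountpoint": mountpoint,
            "space": space,
            "created": created,
        })
    return rows


if __name__ == "__main__":
    # Invented sample; on a real system, feed in the output of `bectl list -H`.
    sample = (
        "default\tNR\t/\t1.20G\t2023-01-15 10:02\n"
        "manual-before-upgrade\t-\t-\t8.00K\t2023-02-09 21:30\n"
    )
    for be in parse_bectl_list(sample):
        print(be["name"], be["created"], "(active)" if be["active_now"] else "")
```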
-
@stephenw10 said in Boot Environments - unexpected behavior:
Hmm, I'm not sure about that. When you run pfSense-upgrade the BE snapshot is created at the very start of the process. It has to be, because several things get upgraded before the reboot. If it took the snapshot at reboot instead, restoring it would put you in the middle of the upgrade process.
When you saw this error how old was the snapshot?
I've rolled back to earlier snapshots many times and never seen any errors but in that specific use case. I'll try to test it when I have time.
Steve
Do you think Service Watchdog could mess things up by trying to restart stopped services? E.g. does BE stop services when/before creating a snapshot?
-
No, I don't. As far as I know it doesn't stop services before taking the snapshots.
It's a boot environment, not an instance snapshot like you might take of a VM. When you roll back, it reboots into it, complete with all the usual boot scripts that start the services etc.