Why do upgrades always fail! 22.05 to 23.01 failing.
-
I have a mix of Netgate 3100 and virtual installs, and every time there is an upgrade it turns into a late night! WHHHHHHHYYYYYYY!!!!
Anyway, now that is off my chest, I am trying to upgrade a 22.05 running on Hyper-V to 23.01 and it keeps failing. Firstly, the upgrade page, which says not to close or refresh, keeps hanging! If you have the dev tools open, you can see the error happening:
It's not the end of the world, as I have found if you switch to the "Update Settings" tab and back again it is still going (until it hangs again a minute later!).
So when this eventually completes, with the necessary tab-switching\nursing, it reboots and then does this:
At which point I have no option but to hastily switch back to a previous checkpoint and try again. I have tried in every way I can think of (cloned the VM, adjusted RAM, CPU etc.) but this is the same result each time...
P.S. I thought I would do a clean install and restore a backup config, but that installation was no more successful (see here).
-
OK, first I would try upgrading from the CLI using
pfSense-upgrade -d
so you can see where it's actually failing. It may succeed at the CLI. But if not, I would roll back to the 22.05 snapshot, set the repo to 22.05, and then force-reinstall the components required for the upgrade, as shown here:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/upgrades.html#upgrade-not-offered-library-errors
pkg-static clean -ay; pkg-static install -fy pkg pfSense-repo pfSense-upgrade
Then set the repo back to 23.01 and retry upgrading from the CLI.
Steve
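The debug output from a CLI upgrade can be long, so one option is to keep a copy of it while still watching it live. A minimal sketch, assuming a POSIX `/bin/sh`; the `run_logged` helper and the log path are hypothetical names chosen for illustration, and anything printed after the reboot would not be captured this way:

```sh
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, shows its output live, and saves a copy
# (stdout and stderr) to LOGFILE.
# Caveat: the function's exit status is tee's, not COMMAND's.
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}

# Intended use on the firewall (left commented out so nothing runs by accident):
# run_logged /root/upgrade-debug.log pfSense-upgrade -d
```

The saved file can then be paged through or attached to a forum post instead of a screenshot.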
-
@stephenw10 OK, I ran the upgrade from the CLI (option 13 from the boot screen I assume you mean?) and this also failed in the same way.
During the upgrade, the download of all the packages seemed to complete OK, and after the reboot I tried to watch the many screens of text, but some went past too quickly to read. I noticed this error though:
Is the "No space left on device" the issue?
It has a 4GB VHD, but it isn't very full:
-
That's almost certainly the problem though.
Check if you have any ZFS BE snapshots you can remove.
If you actually run
pfSense-upgrade -d
from the command line you get additional debug output that can help in some situations. Not needed here though in light of that error.
Steve
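A quick way to sanity-check free space from the shell before retrying is to compare what `df` reports against a threshold. A minimal sketch, assuming a POSIX `/bin/sh`; the `has_free_kb` helper and the 1 GiB threshold are hypothetical choices for illustration, not an official requirement:

```sh
#!/bin/sh
# has_free_kb PATH MIN_KB
# Hypothetical helper: succeeds when the filesystem holding PATH has at
# least MIN_KB kilobytes available, according to `df`.
has_free_kb() {
    # -P forces the POSIX one-line-per-filesystem layout, -k reports 1K blocks;
    # the fourth column of the second line is the available space.
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

# Example: warn if / has less than 1 GiB free (the threshold is an arbitrary choice).
if has_free_kb / 1048576; then
    echo "enough free space on /"
else
    echo "low free space on /; consider freeing space before upgrading"
fi
```

On ZFS, space held by old boot environments counts against what is available, so a disk that does not look full can still come up short here.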
-
@stephenw10 ZFS BE snapshots? How would I find\delete those?
-
I was just trying to recall when we added that to the GUI, so you might see it in the System menu. Otherwise running
bectl list
at the CLI will show them.
-
@HeMan321 these may help:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-shrink.html
https://docs.netgate.com/pfsense/en/latest/backup/zfsbe/space.html
and to delete:
https://docs.netgate.com/pfsense/en/latest/backup/zfsbe/gui.html
@stephenw10 ...which does say, "If the Boot Environment menu entry is missing, the firewall does not support ZFS Boot Environments." Though I suppose it's unclear if that is "current version" or "ever." :)
-
I will also say that 4G is the absolute minimum required for ZFS. Some of the early SG-4860s have 3.6G eMMC (after formatting) and those fail an upgrade with ZFS.
-
@SteveITS said in Why do upgrades always fail! 22.05 to 23.01 failing.:
which does say, "If the Boot Environment menu entry is missing, the firewall does not support ZFS Boot Environments."
Hmmm, it's not clear what's going on here...
So on my 3100, running 23.05.1, the menu option is there, but when you go to it, you get this:
Which I guess is fine and clear.
But on my virtual install, both under 22.05 and 23.01 there is no Boot Environments option at all, which might make you assume ZFS BE is not supported, and yet if I do the "zfs list -t snapshot" command I am seeing four entries reported. This means I cannot delete them, as I have no GUI to do it within? Is there a CLI way of deleting them?
Also, can you expand a pfSense partition? I can make the drive bigger than 4GB easy enough, but presumably I would then need to manually expand a partition to use the new space?
-
The 3100 doesn't support ZFS at all; it only runs UFS, which is why you see that there.
Boot environments have always been available in ZFS; there's just no GUI for them in CE. You can use
bectl destroy
to remove snapshots.
Yes, it's possible to grow the filesystem, though I would only try this after backing up.
Run:
touch /root/force_growfs
Then reboot. The system will try to expand the filesystem into the available space at next boot.
Steve
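To make the clean-up step concrete: `bectl destroy` takes the name of a boot environment, and `bectl list -H` prints one tab-separated line per boot environment, with the name in the first field and the active flags in the second (`N` for active now, `R` for active on reboot, `-` for neither). A minimal sketch that only prints the destroy commands for inactive boot environments rather than running them; the `print_destroy_cmds` helper is a hypothetical name, and the column layout is assumed from the FreeBSD `bectl(8)` man page, so check it against your own `bectl list` output first:

```sh
#!/bin/sh
# print_destroy_cmds
# Hypothetical helper: reads `bectl list -H` style lines on stdin
# (tab-separated: name, active flags, mountpoint, space, created) and prints
# a `bectl destroy NAME` command for every boot environment whose active
# column is "-". It never deletes anything itself.
print_destroy_cmds() {
    awk -F '\t' '$2 == "-" { print "bectl destroy " $1 }'
}

# Intended use on the firewall (review the printed commands before running any):
# bectl list -H | print_destroy_cmds
```

Printing rather than executing keeps the active boot environment out of harm's way and leaves the final decision with whoever is at the console.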
-
Sorry for the delay in getting back; it has been a long few days and then the Netgate forum for reporting problems was, erm, suffering a problem and out for a while!
Anyway, after a combination of deleting the Boot Environments and expanding the partition, first in Hyper-V and then with the "touch /root/force_growfs" command, it finally successfully upgraded from 22.05 to 23.01!
It strikes me that if space is an issue, there should be some sort of check and warning prior to starting an upgrade that is going to fail, resulting in a trashed appliance.
And then all will to live was sucked from me when, to my horror, I saw that another upgrade was due, to 23.05.1! So I have to go through all upgrades sequentially, do I?! Sigh...
So, with all the space in the world, so no risk of a reoccurrence of the space issue, I thought "what could go wrong...". Well, this is pfSense, so I settled down for another late night!!!
I ran the upgrade from option 13 on the command line interface, and got this:
Well twist my nipple nuts and send me to Alaska! What an (un-)surprise... Several more restore\attempts later, having deleted BEs, rebooted, and tried whatever else I could think of, I launched it again, just by chance, from the web interface instead and, hey presto, it completed successfully!
For no reason I can see, it failed three times in a row from option 13, but then succeeded first time from the web GUI...
Don't get me wrong, I love pfSense and have standardised on Netgate boxes for any routers I am involved in. But this installation\upgrade stuff, for which I have suffered in agony on every upgrade I can remember, going back to 2020 when I first started using pfSense in anger, is really not on. And I have had the same thing with both physical Netgate hardware and VMs, so whilst it would be churlish to moan too much on the CE stuff, this is worse than that.
I guess running a FreeBSD OS with the router stuff on top has some advantages, but whereas other manufacturers can just flash the entire ROM in one, it seems your approach of upgrading hundreds of components individually is just too complicated to be sure one of a million things doesn't go wrong along the way...
-
Hmm, I have no explanation as to why it would fail then succeed. I would expect it to either fail or succeed the same way every time.
23.01 is a special case. It was the first version to introduce dynamic package repos. As such everything on an earlier version has to update to 23.01 first and then to the current version.
Steve