Upgrade from 2.6.0 to 2.7.0 / 2.7.1 trashed install...
-
I'm posting my experience here for the record, as I think it should be mentioned, and to try to understand what could have caused this on 2 HA setups: one physical (2 DL360 Gen8) and one virtual (2 VMs on VMware). All systems were originally clean installs of 2.6.0 and were upgraded to 2.7.0 shortly after it became available using the regular update process, which (apparently) concluded successfully. Both setups additionally had pfBlocker Development installed (which wasn't touched before the upgrade) along with the OpenVPN export wizard, plus VMware tools on the VM one. On the first HA setup the upgrade from 2.7.0 to 2.7.1 on the secondary box apparently did complete, but then I noticed that pfBlocker wasn't working correctly (the forced update wouldn't do anything) and the console showed some libraries missing, as shown here:
Also, no options from the console menu would execute apart from the shell, so I simply rebooted. I then glanced at the iLO console output and was surprised to see a series of "rm"s deleting whole trees, and then the boot sequence stopped. Needless to say, after this there was nothing left of the system. Luckily I always keep a recent config backup, so I simply reinstalled 2.7.1 from scratch via iLO and after 20 minutes had a working system again without issues. Here is what the output looked like after the failed reboot (this is a screenshot of the 2.7.0 reboot of the 1st node, which wouldn't update to 2.7.1, but the same had happened on the other node after reboot):
I initially thought this was some random buggy behavior, so once the secondary node was up and running I went ahead and tried to update the first node to 2.7.1, but alas the exact same thing happened after reboot. Again I reinstalled 2.7.1 from scratch and restored from backup (this was on 2 x DL360 Gen8 physical hosts).
I then tried to update another HA setup, this time 2 VMs in VMware on 2 different hosts, and the same thing happened on the 2nd node. In this case I updated from 2.6.0 to 2.7.0 (the 2 DL360s had already been on 2.7.0 for some time), apparently successfully; then 2.7.1 was proposed, so I went ahead and did that, but after reboot it was the same thing: a trashed system which I had to reinstall. So this time, on the first HA node, before doing anything else I just rebooted it, and it came back up without issues. This time, however, it was not proposing the 2.7.1 update as before and was reporting 2.7.0 as current. So I ran "certctl rehash", and shortly after that 2.7.1 was proposed, so I ran the update and this time it completed successfully and rebooted without issues.
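The sequence that worked on that last node can be sketched roughly as follows. This is only a sketch under stated assumptions, not an official procedure: `certctl rehash` is the command mentioned above, `pfSense-upgrade -c` (check whether an update is available) is assumed here as the shell equivalent of looking at the update page in the GUI, and `run_if_present` is a hypothetical helper so the script simply skips a command that is not installed. It assumes the box has already been rebooted once on 2.7.0 and that you are root.

```sh
#!/bin/sh
# Sketch only: repeat the steps that worked on the last node, AFTER a clean
# reboot on 2.7.0. Run as root from the pfSense shell.

# Hypothetical helper: run a command only if it exists on this system.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skip: $1 not found"
    fi
}

# Rebuild the hashed CA certificate store (the command from the post above).
run_if_present certctl rehash

# Assumption: ask the updater whether a newer version is now being offered.
run_if_present pfSense-upgrade -c
```

If the second command still reports 2.7.0 as current, the sketch gives no further hint; in my case the update simply showed up shortly after the rehash.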
I suspect that after the 2.7.0 update something (a pending script operation) is not completing properly and is getting left behind, possibly a cleanup flag or something similar, which does complete properly if the box is rebooted again (I recall seeing some cleanup prompts after I rebooted the 2.7.0 box the second time, before running the 2.7.1 update). To stay on the safe side, I advise performing an additional reboot after every update, especially from 2.6.0 to 2.7.0, keeping an eye on the console to see whether housekeeping operations are indeed performed and completed, and only then updating to 2.7.1.
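Besides the extra reboot, one could also look for the missing-library symptom described earlier before launching the next update. Below is a minimal sketch, assuming the standard `ldd` tool is available; the function names `filter_missing` and `check_binaries` are made up for illustration, and the two binary paths are assumptions about where PHP and pkg live on the box, so adjust them as needed. A clean result here does not prove the upgrade will succeed, it only rules out this one symptom.

```sh
#!/bin/sh
# Sketch only: look for the "missing library" symptom before updating.

# Read ldd output on stdin and print the name of every library that the
# runtime linker could not resolve (lines containing "not found").
filter_missing() {
    awk '/not found/ { print $1 }'
}

# Check each binary given as an argument; return 1 if any library is missing.
check_binaries() {
    status=0
    for bin in "$@"; do
        [ -x "$bin" ] || continue    # skip paths that are not executable here
        missing=$(ldd "$bin" 2>/dev/null | filter_missing)
        if [ -n "$missing" ]; then
            echo "$bin is missing: $missing"
            status=1
        fi
    done
    return "$status"
}

# Assumed paths; adjust to the binaries you care about.
if check_binaries /usr/local/bin/php /usr/local/sbin/pkg; then
    echo "no missing libraries found"
else
    echo "missing libraries detected, do not update yet"
fi
```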
Anyhow, I thought I'd share my experience. Just be careful: if you are updating to 2.7.1 and haven't yet rebooted your box since the previous update, do that before you perform the update. I understand that in small/SOHO environments reboots are more frequent for various reasons, so the issue may not come to light, but in more corporate environments or HA setups the time between reboots can span months, with reboots triggered only by upgrades, which is exactly my situation. Maybe someone can shed more light on this update issue and add additional checks to avoid it in the future (I have noticed several posts lately about trashed or unresponsive systems after a 2.7.0/2.7.1 reboot, so maybe something is up).
-
Do you have a complete upgrade log from any of those failed upgrades? Or any way to get one, like rolling back and re-upgrading maybe?
-
@stephenw10 Unfortunately I have no logs (the nightly maintenance window was limited); I only have these console screenshots. What I can add is that when I ran the 2.7.1 installer and attempted to recover the config backup before installing, it didn't find any, as if everything had been wiped clean. I may still have a 2.6.0 replica restore point of the VM, but it's a production system, so I'm not so keen on messing with it now that it's all updated and running.
-
If I'm not wrong, I'm a new victim of the upgrade process from 2.6 to 2.7.
The package manager was throwing errors, so I scheduled a system maintenance window for the upgrade. I launched the upgrade, and after it rebooted, a message about missing dependencies for PHP was shown.
So I rebooted, and then the same thing happened as to the OP. (I've tried to add some screenshots.)
It was 3am by the time I managed to bring up a VM from another server and overwrite its configuration with the backup, as the site had neither inter-VLAN nor internet connectivity. The old VM is still there, but its disk seems to be completely messed up.
-
@SeRiusRod Can you confirm that you hadn't rebooted 2.6.0 before launching the upgrade to 2.7.0? In my case my "trashed" systems were on 2.7.0 but were throwing missing libcrypto library errors, as shown above, and hadn't been rebooted. On one recent update, where I rebooted before launching the upgrade (now I reboot in any case), I saw some "in house cleaning" operations on the console while booting; after that reboot the update was no longer proposed, but once I ran certctl rehash it updated to 2.7.1 without issues. This may or may not be the culprit, but it kind of makes sense, as it appears that after reboot the system reverts to some previous certificate state, as if the update process hadn't completed cleanly (hence the missing libs & rms).
-
@IT_Luke Yes, I didn't reboot the system before applying the upgrade.