23.05 firmware upgrade crashed a 3100 and an 1100
-
I tried to upgrade a client's 3100 from 23.01 to 23.05, remotely, and it became unresponsive. I then tried the same 23.01 to 23.05 upgrade on an 1100 here, and it will not boot now either.
The 1100 shows:
Setting currdev to zfs:pfSense/ROOT/default:
ERROR cannot open /boot/lua/loader.lua: no such file or directory
I now have two broken routers, and don't know how to fix the local one, or worse, the remote one.
What can I do in this situation?
-
@JSB Did you boot either manually during the upgrade (before the web GUI came back up)? Did they happen to boot ok once but not restart a second time?
One option is to reinstall.
https://docs.netgate.com/pfsense/en/latest/solutions/sg-1100/reinstall-pfsense.html
They usually respond quickly to the ticket.
@stephenw10 Similar error… but we were installing via USB/reinstall: https://forum.netgate.com/topic/180432/certificate-verification-failed/7
FWIW I’ve upgraded two 2100s from 23.01 to 23.05 without issue, the only upgrades to .05 I’ve done.
-
@SteveITS no, I selected update from the UI, and let both run for over 40 minutes.
The 3100 was updated from 22.05, to 23.01, not 23.05, my mistake - it stopped responding because the 23.01 update caused the OpenVPN daemon and service to stop - got that fixed.
The 1100 seemed to have a bad block in the MBR.
TAClite sent me the 23.05 manual update, and after I did that, the 1100 came back up. All is well now, thanks!
-
@JSB we had an MBR error too in that thread but we were on site reinstalling via USB because of the small partition. We thought it was a bad USB.
Re: VPN, yeah I’ve read that sort of thing here. I’d suggest allowing your public IP to its WAN:443 during an upgrade to bypass the VPN just in case.
-
The EFI loader must be updated when upgrading to 23.05, or it can fail to mount the zroot. The upgrade of the pfSense-boot package could silently fail to copy it to the ESP, and, if I recall correctly, this package was also upgraded after the kernel. I've put changes into pfSense-upgrade, now live for 23.01, that should prevent one possible cause of the copy failure (failure to mount the ESP). They also move the pfSense-boot upgrade to the beginning of the upgrade process and abort the upgrade early if the copy still fails for any reason. This should dramatically improve the situation.
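To illustrate the ordering described above (update the loader first, verify the copy, and stop before anything else is touched if it fails), here is a minimal Python sketch. It is not the pfSense-upgrade code: `update_loader`, `run_upgrade`, `LoaderUpdateError`, and the use of plain directories to stand in for the ESP are all hypothetical names and assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


class LoaderUpdateError(RuntimeError):
    """Raised when the loader could not be copied and verified."""


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def update_loader(new_loader: Path, esp_dir: Path) -> Path:
    """Copy new_loader into esp_dir and verify the copy, or raise."""
    if not esp_dir.is_dir():
        # Assumption: a missing directory stands in for "the ESP failed to mount".
        raise LoaderUpdateError(f"ESP not available at {esp_dir}")
    target = esp_dir / new_loader.name
    try:
        shutil.copyfile(new_loader, target)
    except OSError as exc:  # e.g. I/O error or out of space
        raise LoaderUpdateError(f"copy failed: {exc}") from exc
    if sha256(target) != sha256(new_loader):
        raise LoaderUpdateError("copied loader does not match the source")
    return target


def run_upgrade(new_loader: Path, esp_dir: Path, later_steps) -> list:
    """Update the loader first; run the remaining steps only if that worked."""
    update_loader(new_loader, esp_dir)  # raises before any later step runs
    return [step() for step in later_steps]


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp())
    loader = work / "loader.efi"
    loader.write_bytes(b"new loader")
    esp = work / "esp"
    esp.mkdir()
    print(run_upgrade(loader, esp, [lambda: "kernel upgraded"]))
```

The point of the design is that a failed copy surfaces as an error at the very start, rather than being discovered on a later reboot.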
-
@rlinnemann Ah, thanks for digging into it. The weirdest part is it would boot fine the first time.
-
@rlinnemann Also, in our cases it was on a USB stick install. Both cases had the small EFI partition. I've reinstalled on at least one other 2100 without issue but it was my home and I only had a 4 GB stick so used 22.05 not 23.01 which was current at the time.
-
@SteveITS also be careful of leaving USB sticks inserted. They can share the glabels that we use to identify the ESP on the root device, and can get in the way. I intend to make this more robust in the near future as well.
-
@SteveITS said in 23.05 firmware upgrade crashed a 3100 and an 1100:
@rlinnemann Ah, thanks for digging into it. The weirdest part is it would boot fine the first time.
This is a detail that I had not seen specified before, and it potentially explains something that was puzzling me about the loader failing to mount the zroot, which I didn't expect to have been upgraded at the point that the system first reboots. The zpool may be undergoing a backward-incompatible change after booting the new kernel for the first time, which would explain this behavior. I'll be looking further into that as well.
-
@rlinnemann Always fun to get new puzzle pieces. Info in the thread I mentioned above, and sort of hijacked: https://forum.netgate.com/topic/180432/certificate-verification-failed/7
-
@rlinnemann re: leaving the stick in, I know for sure my case a while back was not that since it happened after I shut it down and mounted it back on the wall. (What! It was working!) The case a week or two ago was a coworker on a different 2100 but I don’t think that was the issue there.
In my case I assumed it was dying because it didn’t boot after a power outage, though I didn’t even try to diagnose it because it had the small EFI. I also didn’t write down the error since I assumed it was dying.
-
@rlinnemann FWIW a coworker reinstalled the "dead" 2100 with the same 23.05 USB he used a couple weeks ago and it seems to be fine in very limited usage. He's restarted it several times.
-
@SteveITS I'd be very surprised if the loader failed to copy on a recovery install. The upgrade failures that we hit updating the EFI loader are mostly or entirely due to complications with live systems that may have arbitrary filesystems mounted, additional storage media that aliases labels, IO errors or out of space conditions, etc. I'm glad to hear the 2100 is back in action. Thanks for your feedback!
-
@rlinnemann Yeah me too for obvious wipe and repartition reasons. Yet it seems to have happened twice on two units, after the restore from GUI.
That (same) 23.01 stick was very probably created from a partner vault download but I don’t recall offhand if it was shortly after 23.01 released or later when I created the stick with it. Either way, time for a new stick.
-
@SteveITS just to clarify, did you restore from a 23.05 USB image or did you restore to 23.01 and upgrade?
-
@rlinnemann The most recent attempt was in that other thread but to recap in one spot with more detail, we had two scenarios. Both 2100s had the small EFI to begin with so needed a reinstall.
1) May 5. Client had a short power event. We found the UPS was defective, 10 seconds runtime. The router did not boot up afterwards. On the phone they said they unplugged it. I went there with a spare 2100 and a 23.01 USB install...that image was downloaded February 18 by the way, I looked now that I'm logged in. I didn't even try to troubleshoot it, I just reinstalled since I knew we had to. It booted up, I restored the config file in the web GUI, and then realized it didn't boot up after. Figured it was dead, and had about 15 minutes left, so put in place the new 2100. I didn't really pay attention to the errors at the time.
2) June 6. A coworker went out to reinstall on a 2100 at a client's office to get it over with. He used the same USB I made since he didn't want to burn another. Installed, restored the config file in the web GUI, and it wouldn't boot, as above. Those were the logs and screen caps in the other thread. He reinstalled again, it failed the same way again, and he called me. Fortunately he could plug his laptop into the Comcast modem and downloaded 23.05 to burn on a spare USB. He installed that and it was fine after.
In hindsight we think 2) was the same issue as 1). This week the same tech used his USB stick to install 23.05 on the router from 1) above and it seems fine after a few reboots.
I still have the "bad" USB, it's on my desk at work. Not sure how the USB stick or its image could be a problem like this but it didn't get recycled yet.
When I reinstalled on my 2100 at home, a few months ago, I used 22.05 because I couldn't find an 8 GB stick at home. (edit: Etcher, btw, doesn't warn you about the space needed if you try to write the compressed image directly, it just fails x% of the way through)
Edit: I had found that other thread because there are very few search results for the "cannot open /boot/lua/loader.lua" error.
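On the stick-size point, one way to avoid a write that fails partway through is to check the uncompressed size of the image before burning it. A rough Python sketch, assuming the download is a gzip-compressed image; the function names are made up for illustration, and the stick capacity is something you supply yourself:

```python
import gzip
import tempfile
from pathlib import Path


def uncompressed_size(image_gz: Path, chunk: int = 1 << 20) -> int:
    """Stream through a gzip file and count the decompressed bytes."""
    total = 0
    with gzip.open(image_gz, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += len(block)
    return total


def fits_on_stick(image_gz: Path, stick_bytes: int) -> bool:
    """True if the decompressed image is no larger than the stick capacity."""
    return uncompressed_size(image_gz) <= stick_bytes


if __name__ == "__main__":
    # Tiny stand-in image so the sketch runs anywhere.
    demo = Path(tempfile.mkdtemp()) / "demo.img.gz"
    with gzip.open(demo, "wb") as f:
        f.write(b"\0" * 5000)
    print(uncompressed_size(demo), fits_on_stick(demo, 4096))
```

Streaming the file counts the real decompressed length, at the cost of reading the whole image once before writing it.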
-