Netgate 3100 failed to come back up after upgrade, PHP dumping core after fsck, help!
-
I hit upgrade in the web UI tonight and my Netgate 3100 failed to come back up. After finding a laptop and the right cable, I got to the console and saw a ufs_dirbad error. I ran fsck five times and got past it. This was frustrating enough, but now PHP is dumping core and filling up /var/run, and boot stalls from there. I tried clearing /var/run and running fsck again from the shell, but that didn't fix it. I've included a photo of the errors. Does anyone have any suggestions on how to fix this?
-
I wound up asking support for a new pfSense Plus image and reflashing the eMMC. Works fine now but I will be pretty worried about doing upgrades from now on.
-
What about installing it on an M.2 SSD?
You could then boot from there, and it would not be so hard
on the small eMMC storage inside.
-
@send9 The new install will have ZFS and thus Boot Environments, which should make recovery easier.
-
@steveits Unfortunately SG-3100 and any other 32-bit ARM Netgate system does not currently have ZFS (and since marked EOS, I'm certain it won't be getting it). ZFS is overall far less tested on 32 bit systems, and while it's possible to run it on 32 bit ARM, the bulk of code out there makes a lot of 64 bit assumptions. It can definitely work, but I doubt Netgate will be putting in the rigor to validate it can work safely on their hardware, and instead complete its lifecycle on UFS.
-
@rune-san Sorry send9 I keep forgetting that. We have a bunch of 3100s at clients and our office and I remember ZFS, and then remember it needs a reinstall and 64 bit.
-
@steveits No worries, I ran fstyp on mine and noted it's still UFS. Ah well, I have the recovery image sitting in the rack now and I turned on the config backup option as well.
-
@gertjan said in Netgate 3100 failed to come back up after upgrade, PHP dumping core after fsck, help!:
@send9 :
You saw :
Focus on the last 5 words ;)
The upgrade system does check for available space before it runs, but if it's close it can still fail during the upgrade, since the only space calculation it can do is the total space difference once the upgrade completes, not what a particular portion might end up using in the middle of unpacking itself during the upgrade.
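To make that concrete, here is a minimal sketch with made-up numbers. It is not the actual pfSense upgrade logic; the function names and every figure are hypothetical. It only shows how a check based on the final size difference can pass while the peak usage in the middle of the upgrade exceeds the free space.

```python
def passes_final_delta_check(free_mb, old_size_mb, new_size_mb):
    """Pre-flight check that only looks at the net growth after the upgrade."""
    return (new_size_mb - old_size_mb) <= free_mb


def peak_extra_usage_mb(steps):
    """Largest cumulative extra space used at any point during the upgrade.

    `steps` is a list of signed changes in MB (positive = files written,
    negative = files removed), in the order they happen.
    """
    used = 0
    peak = 0
    for change in steps:
        used += change
        peak = max(peak, used)
    return peak


# Hypothetical upgrade: download archive, unpack new files,
# remove old files, remove archive. Net change is +20 MB.
free_mb = 300
steps = [+200, +250, -230, -200]

final_ok = passes_final_delta_check(free_mb, old_size_mb=230, new_size_mb=250)
peak = peak_extra_usage_mb(steps)
print(final_ok, peak, peak <= free_mb)  # prints: True 450 False
```

In this invented case the net growth is only 20 MB, so the up-front check passes, but the upgrade briefly needs 450 MB with only 300 MB free.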
-
@gertjan I'm not sure if it's really because my drive was full from logs or whatever (could be), if that's what you mean. It looked to me like /var/run was being filled on boot by PHP core dumping. I could be wrong though.
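One way to tell the two cases apart is to look at what is actually sitting in /var/run. The following is a minimal sketch, assuming Python 3 is available on the box (that is an assumption, not something confirmed in this thread); the helper name `largest_files` is made up for illustration. It lists the biggest regular files in a directory, so large core files would stand out next to the small PID files.

```python
import os


def largest_files(directory, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs, biggest first."""
    sizes = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    size = entry.stat(follow_symlinks=False).st_size
                    sizes.append((size, entry.path))
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    sizes.sort(reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    target = "/var/run"
    try:
        for size, path in largest_files(target):
            print(f"{size:>12}  {path}")
    except OSError as err:
        print(f"cannot read {target}: {err}")
```

Running `df` afterwards on the same mount point would show whether the filesystem as a whole is full or only this directory is unusually large.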
-
@send9 said in Netgate 3100 failed to come back up after upgrade, PHP dumping core after fsck, help!:
It looked to me like /var/run
/var/run should only contain small PID files (1 KB or so, depending on the cluster size).
You've reinstalled, which is always a fast and smart way to be sure everything is OK. Reinstalling also gives you access to the new ZFS file system, which is also a plus.
Next time, before an upgrade or a package install, or just for fun: always check the space left on all partitions.
The newer 22.05 offers me a nice graphical view:
but I don't trust it ;) I dive into SSH (or the console) and play with the 'df' command.
And keep in mind: several packages can create huge (log) files. Suricata is one of them. If you use one of these packages, do not believe what they tell or promise you ;) Babysit these packages around the clock.
Or use something like this, which will alert you by mail when something goes wrong, such as a disk partition being xx % full.
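As a rough illustration of that kind of check, here is a minimal sketch, assuming Python 3 is available (an assumption, not something pfSense is confirmed to ship in this thread). The function names, the mount point list, and the 80 % threshold are all hypothetical. It only prints a warning; sending mail is left out. The percentage can differ slightly from the Capacity column of `df`, because `df` accounts for reserved space.

```python
import shutil


def usage_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def over_threshold(paths, threshold_percent):
    """Return (path, percent) for each path fuller than the threshold."""
    results = []
    for path in paths:
        percent = usage_percent(path)
        if percent > threshold_percent:
            results.append((path, percent))
    return results


if __name__ == "__main__":
    # Hypothetical mount points and threshold; adjust to what `df` shows.
    for path, percent in over_threshold(["/"], 80.0):
        print(f"WARNING: {path} is {percent:.1f}% full")
```

Something like this could be run from cron before an upgrade or on a schedule, with the list of paths taken from the mount points that `df` reports.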
-
@gertjan Thanks, that's good advice; I tend to prefer command line as well so I will definitely take a peek at df next time. Unfortunately, no ZFS for me, for reasons discussed above (no support on ARM32 chips).