SG-1100 firmware corruption - repeatedly...
-
We have deployed approximately 30 SG-1100s and a couple of SG-1000s throughout South Africa.
The country experiences regular load shedding (controlled blackouts). Sometimes when power is restored, the SG-1100 "cycle-boots" (keeps rebooting) due to file corruption.
Reloading the device with "pfSense-netgate-SG-1100-recovery-2.4.4-RELEASE-p3-aarch64.img.bz2" restores the device, and connectivity is restored. However, this is time-consuming and results in downtime for the site.
The SG-1000s have never failed, even when used to replace a "corrupted" SG-1100.
No alternative to the "firmware reload" has been offered by the local distributor.
The SG-1100 appears to be fragile compared to the SG-1000.
-
Use a (small) UPS with a USB port that lets the SG communicate with the UPS.
Use the NUT package to perform a clean shutdown on power loss.
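Before relying on NUT for the shutdown, it is worth confirming that the SG can actually read the UPS over USB. A minimal sketch, assuming NUT's `upsc` client is available on the firewall and that the UPS is defined under the name `ups` (a hypothetical name; use whatever was configured in the NUT package). `ups.status` is a standard NUT variable whose value contains flags such as OL (on line power), OB (on battery) and LB (low battery); the helper `describe_ups_status` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical UPS name; replace with the name configured in the NUT package.
UPS_NAME="${UPS_NAME:-ups}"

# Classify a NUT ups.status string such as "OL CHRG", "OB DISCHRG" or "OB LB".
# LB is checked first because it normally appears alongside OB.
describe_ups_status() {
    case " $1 " in
        *" LB "*) echo "low battery" ;;
        *" OB "*) echo "on battery" ;;
        *" OL "*) echo "on line power" ;;
        *)        echo "unknown status: '$1'" ;;
    esac
}

# Only query the UPS when the upsc client exists on this machine.
if command -v upsc >/dev/null 2>&1; then
    describe_ups_status "$(upsc "$UPS_NAME@localhost" ups.status 2>/dev/null)"
fi
```

Unplugging the UPS's input while running this should flip the report from "on line power" to "on battery". If `upsc` cannot reach the UPS, the status comes back empty and is reported as unknown, which is a sign NUT will not be able to trigger the shutdown either.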
File system problems => gone.
-
If you cannot guarantee the power in an installation, for whatever reason, you should enable RAM disks.
Doing that significantly reduces writes to the filesystem on the drive, which massively reduces the chance of causing damage when the power fails during a write.
https://docs.netgate.com/pfsense/en/latest/book/config/advanced-misc.html#ram-disk-settings
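One way to confirm the RAM disk setting has taken effect is to check what /tmp and /var are mounted on. A minimal sketch, assuming the RAM disks appear in FreeBSD's `mount` output (lines of the form `<device> on <mountpoint> (<fstype>, ...)`) either as tmpfs or as an md(4) memory device; the function name `check_ramdisks` is hypothetical.

```shell
#!/bin/sh
# Report whether /tmp and /var are memory-backed, reading FreeBSD-style
# "mount" output on stdin. Assumption: RAM disks are tmpfs or /dev/mdN.
check_ramdisks() {
    awk '
        $2 == "on" && ($3 == "/tmp" || $3 == "/var") {
            fstype = $4
            gsub(/[(),]/, "", fstype)          # "(tmpfs," -> "tmpfs"
            if (fstype == "tmpfs" || $1 ~ /^\/dev\/md[0-9]+$/)
                kind = "RAM disk"
            else
                kind = "disk-backed"
            print $3 ": " fstype " (" kind ")"
            seen[$3] = 1
        }
        END {
            # No separate mount means the directory lives on the root filesystem.
            if (!("/tmp" in seen)) print "/tmp: not a separate mount (root filesystem)"
            if (!("/var" in seen)) print "/var: not a separate mount (root filesystem)"
        }
    '
}

# Only parse the live mount table on FreeBSD, whose output format this expects.
if [ "$(uname -s)" = "FreeBSD" ]; then
    mount | check_ramdisks
fi
```

If either line says "disk-backed" or "root filesystem", writes to that directory are still hitting the flash.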
Steve
-
@stephenw10 Why aren't we experiencing the same fault with the SG-1000?
-
@BrianSA
It might use a different disk or disk controller that somehow handles power outages better. The FreeBSD filesystem is crap and handles recovery very poorly.
-
Hard to say if the configs are identical. I would expect both to recover equally well.
One thing you may be able to do if it gets stuck in a boot loop because of a filesystem issue is to run a filesystem check from single-user mode at the console:
- At the SG-1000/SG-1100 console press any key to interrupt the boot loader when you see:
'Hit [Enter] to boot immediately, or any other key for command prompt.' Note that this is the second keystroke prompt you will see while the device boots. Let the first one ("Hit any key to stop autoboot") pass without pressing anything.
At the 'loader>' prompt enter:
boot -s
That boots into single-user mode and asks for the path to the shell; just press Enter to reach the # prompt.
- At the # prompt run the following command:
fsck -y /
Run the fsck command at least three times, repeating it until no errors are reported, even if fsck claims the filesystem has been marked "clean".
- Reboot by running:
reboot
That will recover from almost all boot loops of this sort without having to reinstall, and it is something you may be able to do remotely if you have someone on site with a laptop and a cell phone.
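For the repeated fsck step, a small helper saves retyping the command at the # prompt. A sketch only: `fsck_passes` is a hypothetical name, it assumes the single-user shell is /bin/sh (the default offered at the shell-path question), and it deliberately does not interpret fsck's output, so read the last pass and run it again if fsck was still fixing things.

```shell
#!/bin/sh
# Run a command several times in a row, printing each pass's exit status.
# With no arguments it runs "fsck -y /" three times, per the steps above.
# Usage: fsck_passes [passes] [command ...]
fsck_passes() {
    passes=${1:-3}
    [ "$#" -gt 0 ] && shift
    # Default command when none is supplied after the pass count.
    [ "$#" -gt 0 ] || set -- fsck -y /
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "=== pass $i of $passes: $* ==="
        "$@"
        echo "=== pass $i exited with status $? ==="
        i=$((i + 1))
    done
}
```

Typing `fsck_passes` runs `fsck -y /` three times and `fsck_passes 5` runs it five times, while `fsck_passes 1 echo test` is a harmless dry run of the loop itself.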
Steve
-
I wouldn't necessarily recommend doing this, but you could add an entry to
/boot/loader.conf.local
(creating the file if it doesn't exist) that sets
pfsense.fsck.force=5
or so. Then on every boot it would run that many iterations of fsck to check and repair potential problems, even when the filesystem is marked clean. It would drastically slow down the boot process and is typically unnecessary, but it might at least help in some of these situations. It's definitely not something we'd ever ship enabled by default.
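If you do try the loader.conf.local setting, editing the file idempotently avoids stacking duplicate lines when the change is made more than once. A minimal sketch in /bin/sh; `set_fsck_force` and `clear_fsck_force` are hypothetical helper names, and the only line they write is the `pfsense.fsck.force=N` entry from the post above.

```shell
#!/bin/sh
# Set (or replace) the pfsense.fsck.force entry without duplicating it.
# Arguments: [file] [count]; defaults are /boot/loader.conf.local and 5.
set_fsck_force() {
    file=${1:-/boot/loader.conf.local}
    count=${2:-5}
    tmp="$file.tmp.$$"
    if [ -f "$file" ]; then
        # Keep every other setting; drop any previous pfsense.fsck.force line.
        grep -v '^pfsense\.fsck\.force=' "$file" > "$tmp"
    else
        : > "$tmp"
    fi
    echo "pfsense.fsck.force=$count" >> "$tmp"
    # Build the new contents in a temporary file, then move it into place.
    mv "$tmp" "$file"
}

# Remove the entry again, leaving the rest of the file untouched.
clear_fsck_force() {
    file=${1:-/boot/loader.conf.local}
    [ -f "$file" ] || return 0
    tmp="$file.tmp.$$"
    grep -v '^pfsense\.fsck\.force=' "$file" > "$tmp"
    mv "$tmp" "$file"
}
```

For example, `set_fsck_force /boot/loader.conf.local 5` adds or updates the entry, and `clear_fsck_force` removes it again if the slower boots are no longer worth it.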