Netgate 4200 - boot problems
-
Saturday morning and I woke up to my network being down. It appears that sometime overnight my SG-4200 had an issue, and now it won't boot. It's on a UPS, so I don't suspect a power issue. From the console, it appears that it can't find a boot partition, and it sits at "Start PXE over IPv4....". Eventually it times out and drops to a SHELL> prompt. Any tips for recovering from here?
-
If you enter the BIOS setup, do you see the boot device listed?
Was it booting from eMMC?
-
@stephenw10 My assumption was yes. It was booting from whatever was configured out of the box. I didn't modify any configuration after I purchased it back in March '24.
After reading this post, it sounds like a drive failure and I might need an NVMe drive, but I'm looking for options before I do. I've got a memstick ready to go too.
-
Try booting from the installer on a memstick and see if it shows the eMMC as an available target. If it's not there then you almost certainly have a failed drive unfortunately.
-
I see the first boot entry is da0p1, which should be the eMMC. I moved my USB up to positions 2 and 3 in the boot order.
On power-up, it skipped right over the eMMC and booted directly from my USB memstick. I tried to run the installer, but it said it could not locate a storage device.
Any other ideas before installing an NVMe drive?
-
If you've done a full power cycle there isn't much else that can be done. If the installer doesn't see it then it almost certainly isn't responding.
-
For what it's worth, here's a little more information for the thread. I disconnected the cables and noticed how hot the system was. I let it cool down for a while and plugged it back in to see if anything changed. It tried to boot from the eMMC, but it generated many errors. I tried to reinstall the software and got the same errors.
Not sure if this is something a recovery tool of some sort can fix.
I've got an NVMe drive now and will let you know when I'm back online.
-
Back online with an NVMe. However, my attempt to wipe the existing disk (and the references to it) was a total failure. No matter which commands I tried, none worked, because the system couldn't read from or write to the eMMC (da1) to "delete" it. The best I could do was keep it out of the boot process.
Even after installing the new NVMe, I'm still getting errors from da1 on bootup. I don't know if the boot process does some kind of sanity check on all the drives first, but once that's done, it boots from the new NVMe.
So, for now, I'm back online. I'm going to spend some time tomorrow exploring why I'm still getting drive issues.
Netgate got me on this one. 14 months in, and I had to dish out some extra $$ because I'm just outside their warranty window. I've had good luck in the past with an SG-1100, 3100, and 5100. The 4200 was the POS. Here's hoping I get a few more years out of it.
-
The external config loader tries to read da devices at boot to check for config files in root. That's probably what you're seeing there.
So it just throws errors when you try to gpart destroy or zpool labelclear it?
-
@pfsense16v said in Netgate 4200 - boot problems:
Netgate got me on this one. 14 months in and I had to dish out some extra $$ because I'm just outside their warranty window. I've had good luck in the past with an sg-1100, 3100, and 5100. The 4200 was the POS. Here's hoping I get a few more years out of it.
Maybe you've seen this massive thread on the general topic of ZFS wearing out SSDs, at least with the default configuration; if not, have a look. I'm linking to the bottom line, but there's a wealth of information above it in the thread:
https://forum.netgate.com/post/1214498
The bottom line shows basic config changes that should increase the service life of the SSD.
-
@stephenw10 Yes, sir. Of course, the commands to look up information worked, but any command I used to write over or delete would throw errors.
Side note: as I'm not too savvy with all the Linux commands, I used AI as my assistant to help troubleshoot. I gave it my scenario and even gave it my outputs, and it generated the custom commands I needed to troubleshoot my system. I found it funny that it too was perplexed that nothing was working. It also kept commenting on how these results indicated serious problems with the drive. Eventually, it recommended that we give up trying to delete the eMMC and move on with installing the NVMe.
I use AI daily, and it has been invaluable in this scenario. It feels like I'm chatting with tech support. I highly recommend it if you need help, especially late at night.
-
@Mission-Ghost Thank you. Reading it now.
-
@pfsense16v said in Netgate 4200 - boot problems:
As I'm not too savvy with all the Linux commands, I used AI as my assistant to help troubleshoot. I gave it my scenario and even gave it my outputs, and it generated the custom commands, which I needed to troubleshoot my system. I found it funny that it too was perplexed that nothing was working.
You might want to start by making it clear to the AI assistant that you are asking about a FreeBSD system rather than a Linux system…
-
I did.