Netgate 4200 - boot problems
-
Hello,
I decided to write this post because the boot problem I had has now happened for the second time.
Yesterday evening I shut down pfSense ("Halt System" from the menu), waited until it was done, and then switched off the power to the box. This morning I switched the power on and the Netgate started booting, but it didn't boot successfully. After a few minutes I checked the status LED on the front panel: the circle LED was solid orange and nothing was happening. I waited a few more minutes, but nothing changed.
So I used the ACPI power button on the back panel (graceful shutdown, hard power off (hold 10 s), power on) ... nothing. So I switched the power off and then on ... and then the Netgate booted successfully.
After the successful boot, I checked the logs and found some errors (please find attached screens). I wouldn't write this post if that were the first time ... but this is the second time with the same scenario.
Am I doing something wrong?
Previously I had a Netgate 2100 and I never had problems switching the power off/on after a graceful shutdown.
best regards
Tom
-
Hmm, none of those errors would prevent booting. They are common and usually harmless. The WiFi drivers are not used and Unbound starts anyway.
Are you booting from eMMC or SSD?
If it happens again, try connecting the serial console to see what's happening there.
Steve
-
@stephenw10
From SSD, but the Netgate 2100 (where I didn't have such problems) also had a built-in SSD.
-
The 2100 is very different hardware. But obviously both devices should boot normally. If there is nothing logged, then only checking the console might show what's happening. Can you replicate it?
-
I appreciate this reply adds nothing in terms of troubleshooting content, but I too have had boot issues on my Netgate 4200 after using 'Halt System', with the same symptoms as you (status LED stuck at solid orange).
I did capture the logs via serial but discarded them after I switched to safe boot mode (from the boot options menu I believe), booted successfully, then rebooted in normal mode, as I thought it was a one-off issue. I did try Single User mode first but that too failed to boot.
Now I know it's happened to others, I'll capture the serial output if/when it happens again, and I'll keep an eye on this thread.
If it helps with any troubleshooting, only ~20 lines of output were observed after the boot menu before it stalled.
EDIT: To add, I'm using the 4200 MAX model, so I am booting from SSD rather than eMMC.
-
Hmm, odd. So you were still able to access the BIOS setup? It sounds like it was trying to boot, but the LED never changed to blue?
-
@stephenw10 Yeah, that's right - it got to the boot menu, then the boot process started to scroll on the screen but stalled ~20 lines in, and it was always the same line it stalled on. I tried a full power drain too, but it still stalled at the same point.
I should have really saved the output from the session but as I say, I thought it was just a one-time oddity - sorry I don't have more info to provide, but if it ever happens again, I will make sure to save the output and post on the forums.
-
@stephenw10 Sorry for replying after almost one month (private reasons).
It happened again about one week ago. I connected to the console via USB, but there was no output there. The device just froze with the solid orange circle LED and nothing happened.
best regards
Tom
-
Mmm, I wouldn't expect any output if it's in standby, which is indicated by the orange LED.
Did it boot back normally after power cycling?
-
@stephenw10
After the second power off and on, yes.
-
Can you replicate the failure with the console already connected? That may show why it's going to standby.
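If it helps, one way to leave the console capturing unattended is a small timestamping logger on the machine attached to the console port. This is only a rough sketch: `log_console`, the `/dev/ttyUSB0` path and the `console.log` filename are made up for illustration, and it assumes the port has already been set to the console's serial settings (115200 8N1, as far as I know) by your OS or terminal tooling.

```python
import datetime
from typing import BinaryIO, Callable, TextIO


def log_console(
    src: BinaryIO,
    dst: TextIO,
    now: Callable[[], datetime.datetime] = datetime.datetime.now,
) -> int:
    """Copy lines from src to dst, prefixing each with a timestamp.

    Returns the number of lines written. A line is only logged once its
    newline arrives, so a partial final line may not be captured.
    """
    count = 0
    # readline() returns b"" at end of stream, which ends the loop.
    for raw in iter(src.readline, b""):
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        dst.write(f"{now().isoformat(timespec='seconds')} {text}\n")
        dst.flush()  # flush every line so nothing is lost on a stall
        count += 1
    return count


# Example use (device path and log name are assumptions, adjust for your OS):
# with open("/dev/ttyUSB0", "rb") as port, open("console.log", "a") as log:
#     log_console(port, log)
```

With that running before the power-on, the last timestamped line in the log should show where the boot stopped and when.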
-
Saturday morning and I wake up to my network down. It appears that sometime overnight, my SG-4200 had an issue and it won't boot up. It is on a UPS so I don't suspect a power issue. From the console, it appears that it can't find a boot partition and it sits on Start pxe over IPv4.... Eventually, it times out and gets to a SHELL> prompt. Any tips to recover from here?
-
If you enter the BIOS setup, do you see the boot device listed?
Was it booting from eMMC?
-
@stephenw10 My assumption was yes. It was booting from whatever was configured out-of-the-box. I didn't modify any configurations when I purchased it back in March '24.
After reading this post, it sounds like a drive failure and I might need an NVMe drive, but I'm looking for options before I do. I've got a memstick ready to go too.
-
Try booting from the installer on a memstick and see if it shows the eMMC as an available target. If it's not there then you almost certainly have a failed drive unfortunately.
-
If you've done a full power cycle there isn't much else that can be done. If the installer doesn't see it then it almost certainly isn't responding.
-
For what it is worth, a little bit more information for the thread. I disconnected the cables and noticed how hot the system was. I let it cool down for a while and plugged it back in to see if anything changed after a bit of a cool down. It tried to boot from the eMMC but it generated many errors. I tried to reinstall the software and got the same errors.
Not sure if this is something a recovery tool of some sort can fix.
I've got an NVMe now and will let you know when I'm back online.
-
Back online with an NVMe. However, the whole wiping of existing disks (and references to it) was a total failure. No matter which commands I tried to wipe the disk, none worked because it couldn't read/write to the eMMC (da1) drive to "delete" it. The best I could do was not allowing it to be part of the boot process.
Even after installing the new NVMe, I'm still getting errors from da1 on boot up. I don't know if during the boot it does some kind of sanity check on all the drives first but once it is done, it boots on to the new NVMe.
So, for now, I'm back online. I'm going to spend some time tomorrow exploring why I'm still getting drive issues.
Netgate got me on this one. 14 months in and I had to dish out some extra $$ because I'm just outside their warranty window. I've had good luck in the past with an sg-1100, 3100, and 5100. The 4200 was the POS. Here's hoping I get a few more years out of it.
-
The external config loader tries to read da devices at boot to check for config files in root. That's probably what you're seeing there.
So it just throws errors when you try to gpart destroy or zpool labelclear it?
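For reference, the two attempts being asked about could be sketched roughly like this. It is a hedged illustration only: `wipe_commands` and `run_wipe` are hypothetical helper names, `da1` is taken from the post above, and the default is a dry run that only prints the commands. Both commands are destructive if actually executed, so double-check the device name first.

```python
import shlex
import subprocess
from typing import List


def wipe_commands(disk: str) -> List[List[str]]:
    """Build the two wipe attempts for a disk name such as 'da1'."""
    if not disk.isalnum():
        raise ValueError(f"unexpected disk name: {disk!r}")
    return [
        ["gpart", "destroy", "-F", disk],               # drop the partition table
        ["zpool", "labelclear", "-f", f"/dev/{disk}"],  # clear ZFS labels
    ]


def run_wipe(disk: str, dry_run: bool = True) -> List[str]:
    """Print each command; run it only when dry_run is False (DESTRUCTIVE)."""
    lines = []
    for cmd in wipe_commands(disk):
        line = shlex.join(cmd)
        lines.append(line)
        print(line)
        if not dry_run:
            result = subprocess.run(cmd, capture_output=True, text=True)
            # Show the exit code and any error text, e.g. I/O errors
            # from a drive that no longer responds.
            print(f"  exit={result.returncode} {result.stderr.strip()}")
    return lines


run_wipe("da1")  # dry run: only prints the two commands
```

If both fail with read/write errors on a real run, that would fit with the drive itself no longer responding rather than anything left over in the partition table.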