SG-5100 takes over 20 minutes to boot after eMMC failure
-
The eMMC in my SG-5100 failed a couple of weeks ago. I managed to get things going again by installing an SSD, and I thought everything was working fine until I had to shut down the 5100 yesterday. Now the 5100 takes over 20 minutes (close to 30) to POST, whether from power off or after a warm reboot. The console displays an A2 code, followed by a B4 some 5-10 minutes later. Eventually the boot sequence continues and pfSense starts up without further issue.
I didn't experience this startup delay immediately after reinstalling pfSense to the SSD; it only started happening some time in the last few days. I wonder if the eMMC has degraded further, such that it is now sending some sort of false signal to the 5100's BIOS rather than producing the timeout errors I was initially seeing.
I've found at least a couple of threads here that discuss this problem but I haven't seen a resolution to it. How can I get the eMMC to be completely ignored by the BIOS? I see it as a boot option but I've disabled it everywhere I could find a reference to it in the BIOS… yet it still appears to be interfering with the boot process.
Help!
-
@hayescompatible What do you see on the console? What are your boot settings in the BIOS? How did you install pfSense after installing the M.2? Have you tried resetting the BIOS to defaults to see if that helps?
I have seen a failed eMMC, at its worst stage, affect the system's ability to boot (this is not limited to just the 5100).
-
@rcoleman-netgate I'll post screenshots of the console as I get them. I have some time this afternoon to reboot the router without affecting anyone else in the household.
I obtained the latest pfSense Plus image from Netgate support, installed to the SSD from a USB key and then restored my most recent pfSense backup. Prior to the SSD installation, the 5100 was complaining of timeout errors on boot when accessing the boot partition on the eMMC.
I was hoping there was a quicker way to get into the BIOS settings, or at least to reset them to factory defaults via jumpers on the motherboard, but I'm afraid I'll have to wait the half hour while it reboots and then possibly another half hour if the BIOS reset doesn't work…
-
@hayescompatible said in SG-5100 takes over 20 minutes to boot after eMMC failure:
installed to the SSD from a USB key
Did you install as UEFI or MBR?
I suspect it's a case of how the BIOS is looking for the devices.
There aren't any jumpers or anything you need on the board. The BIOS or the boot install is likely the culprit.
-
@rcoleman-netgate I installed as UEFI because otherwise, the SSD is not recognized as a boot device by the BIOS.
Here is the pfSense info as presented on the dashboard:
After I rebooted, I connected a keyboard and a serial cable to the 5100, jumped into the BIOS and loaded optimized defaults.
Here is the main BIOS screen as it looked in my console:
Here are the contents of the Boot screen:
Here's the Save & Exit screen mainly to show the available boot disks:
After saving the changes and resetting, the 5100 froze. I did not bother waiting 30 minutes for it to come up, but I did wait about 5 minutes before I pulled the power and turned it on again.
This time, the POST took about 30 seconds then the system came up. Progress?
However, upon rebooting pfSense, the 5100 just locks up solid. The lights on the igb0 and igb1 ports remain solid amber, no blinking. I left it like that for about an hour while I ran errands but the 5100 never came back up:
This is 100% repeatable. I even went to the trouble of reinstalling pfSense Plus from the ISO I got from Netgate support with the BIOS+UEFI scheme but I get the same result.
So to summarize:
- From cold boot (power off), the 5100 boots after about 30 seconds.
- From warm boot (restart from pfSense), the 5100 locks up solid.
-
Oh, and when I say it locks up solid, I mean after the 5100 actually restarts, like after I see these messages in the console:
Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-2' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-3' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-4' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-5' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-6' to stop... done
All buffers synced.
Uptime: 11m0s
ukbd0: detached
uhid0: detached
The console doesn't clear so I'm unclear as to whether it even POSTs.
-
@hayescompatible said in SG-5100 takes over 20 minutes to boot after eMMC failure:
However, upon rebooting pfSense, the 5100 just locks up solid. The lights on the igb0 and igb1 ports remain solid amber, no blinking. I left it like that for about an hour while I ran errands but the 5100 never came back up:
What's plugged into the USB-3(a) port?
-
@rcoleman-netgate said in SG-5100 takes over 20 minutes to boot after eMMC failure:
@hayescompatible said in SG-5100 takes over 20 minutes to boot after eMMC failure:
However, upon rebooting pfSense, the 5100 just locks up solid. The lights on the igb0 and igb1 ports remain solid amber, no blinking. I left it like that for about an hour while I ran errands but the 5100 never came back up:
What's plugged into the USB-3(a) port?
That was my keyboard.
-
@hayescompatible Try taking the CD/DVD out of the boot sequence and move the UEFI above the MBR options.
-
@rcoleman-netgate said in SG-5100 takes over 20 minutes to boot after eMMC failure:
@hayescompatible Try taking the CD/DVD out of the boot sequence and move the UEFI above the MBR options.
Tried that, same result (cold boot OK, warm boot locks up):
Save & Exit:
When I cold boot, the console clears and I see POST codes flash on the console, like A2, 99, B2 and maybe B4. On a warm boot, I see none of these (console doesn't even clear).
-
@hayescompatible said in SG-5100 takes over 20 minutes to boot after eMMC failure:
That was my keyboard.
Why do you have a keyboard connected directly? Is it always connected?
If so that differs from almost all other 5100 installs.
Steve
-
@stephenw10 said in SG-5100 takes over 20 minutes to boot after eMMC failure:
@hayescompatible said in SG-5100 takes over 20 minutes to boot after eMMC failure:
That was my keyboard.
Why do you have a keyboard connected directly? Is it always connected?
If so that differs from almost all other 5100 installs.
Steve
It was my own keyboard. I only had it connected because it’s easier than using my laptop keyboard to get into the BIOS settings on boot. In any event, this boot problem occurs whether or not a keyboard or anything else is connected to the console or USB ports.
-
@hayescompatible I am experiencing the same on my 5100 box. I also replaced my failed eMMC with an SSD. Did you find a solution to the problem?
-
@haraldinho said in SG-5100 takes over 20 minutes to boot after eMMC failure:
@hayescompatible I am experiencing the same on my 5100 box. I also replaced my failed eMMC with an SSD. Did you find a solution to the problem?
My "solution" is to halt the system, unplug it and then plug it back in whenever I need to reboot. This seems to be the only way to prevent the system from locking up. Cold boot is fine, warm boot locks up the 5100. I've played with a bunch of different BIOS settings but nothing seems to help, even with "optimized defaults".
The thread's gone quiet, so I can only assume that, since the 5100 is no longer sold, I'm SOL on this. I just wish I had known about the limitations of the eMMC beforehand so I could have installed an SSD sooner and avoided this.
-
Mmm, unfortunately I'm not aware of any solution to that issue. It's possible it could be solved by a future BIOS update but I've not seen anything there either.
-
@hayescompatible are you able to provide a link to the M.2 SATA SSD you purchased? I'm going to learn from your misfortune and install M.2 drives in my 5100s before I meet a similar fate. It really sucks that something like this can basically ruin perfectly good hardware.
-
@gabacho4 said in SG-5100 takes over 20 minutes to boot after eMMC failure:
@hayescompatible are you able to provide a link to the m.2 SATA you purchased? I’m going to learn from your sad misfortune and install m.2s in my 5100s before I meet a similar fate. It really sucks that something like this can basically ruin perfectly good hardware.
Here's what I got:
https://www.cendirect.com/main_en/tech-specs-TS128GMTS430S-JO357QQB.html
Link to the product page on Transcend's site:
https://ca.transcend-info.com/product/internal-ssd/mts430s
I got the 128 GB drive since it's the smallest size they offer.
Had I known how prone the eMMC is to failure, I would have installed this from the start when I bought the 5100 a couple of years ago. (I don't log much to the pfSense box; it's just the default logs and pfBlocker updates that likely do most of the writing to the drive.) Only while troubleshooting the boot slowdown after the eMMC failure did I find out that there is a utility that can report the health status of the eMMC:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html#emmc
Alas, too late for me.
-
@hayescompatible appreciate the info on the SSD and the eMMC monitoring. I'll look into that tomorrow, as I cannot afford for these to crap out, especially as one is a VPN endpoint 7000 miles away.
-
@hayescompatible just to be sure, the SSD that the 5100 takes is indeed SATA III (with 3 connection points versus 2)?
-
Well, for the very distant 5100, my results are:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
For my local box, it gives me:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
How worried should I be? It seems like the drive is normal, but the A and B values are at or above 100%.
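For anyone trying to read these numbers: as I understand the JEDEC eMMC 5.0 definition of these EXT_CSD fields, the two lifetime estimates use 0x01 through 0x0A for 0-10% up to 90-100% of the estimated lifetime used and 0x0B for "exceeded maximum estimated lifetime", while PRE_EOL_INFO uses 0x01 = normal, 0x02 = warning and 0x03 = urgent (based on consumption of reserved blocks). Below is a minimal sketch that decodes output in the format pasted above. The helper names (`decode_life_time`, `decode_report`) are hypothetical, and the value tables are my reading of the spec rather than anything from the Netgate docs, so check them against the datasheet for your part.

```python
import re

# Matches "[EXT_CSD_FIELD_NAME]: 0xNN" entries, whether they are on
# separate lines or run together on one line as in the pasted output.
FIELD_RE = re.compile(r"\[(EXT_CSD_[A-Z0-9_]+)\]:\s*(0x[0-9a-fA-F]+)")

# Assumed PRE_EOL_INFO meanings (JEDEC eMMC 5.0, as understood here).
PRE_EOL = {0x00: "not defined", 0x01: "normal", 0x02: "warning", 0x03: "urgent"}


def decode_life_time(value):
    """Decode DEVICE_LIFE_TIME_EST_TYP_A/B (assumed JEDEC eMMC 5.0 encoding)."""
    if value == 0x00:
        return "not defined"
    if 0x01 <= value <= 0x0A:
        low = (value - 1) * 10  # 0x01 -> 0-10%, 0x0A -> 90-100%
        return f"{low}-{low + 10}% of estimated lifetime used"
    if value == 0x0B:
        return "exceeded maximum estimated lifetime"
    return "reserved"


def decode_report(text):
    """Map each recognised EXT_CSD field in the text to a readable meaning."""
    result = {}
    for name, raw in FIELD_RE.findall(text):
        value = int(raw, 16)  # int() accepts the "0x" prefix with base 16
        if name.startswith("EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_"):
            result[name] = decode_life_time(value)
        elif name == "EXT_CSD_PRE_EOL_INFO":
            result[name] = PRE_EOL.get(value, "reserved")
    return result


# Sample taken verbatim from the output posted in this thread.
SAMPLE = (
    "eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b "
    "eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b "
    "eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01"
)

if __name__ == "__main__":
    for field, meaning in decode_report(SAMPLE).items():
        print(f"{field}: {meaning}")
```

If that reading of the spec is right, both boxes report the wear estimate as past its rated lifetime while the reserved-block indicator is still "normal". That matches the question's interpretation: not failed yet, but well into the range where moving to an SSD sooner rather than later makes sense.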