SG-4860 Crashing with umass0 disconnecting
-
Thanks for your response!
I was leaning toward a spotty drive, but I've never seen a storage device failure manifest itself in this way. A complete disconnect/reconnect seems odd; usually I'd see read/access errors or something, but these are the only errors in any log that seem out of the ordinary until services start failing completely. I have some concern that it's a device controller failure.
Is there any way to validate the storage issue prior to disassembling the box? I'm not so familiar with flash/SSD diagnostics.
The unit has been in a rack with tons of ventilation and max environmental temps around 22C. It's lived a pretty easy life, mechanically-speaking. I do have the occasional power fluctuation, but these disconnection events occur seemingly at random.
-
That device is the soldered-on eMMC storage device in the 4860 so there really isn't much to do in the way of diagnostics for it. Somehow the controller is losing contact with the storage. The fact that the device appears to disconnect despite being permanently connected is concerning because it likely means the device itself is failing in some way.
-
I don't have the device to disassemble at the moment but it looks like there are some SATA connections on the board based on photos online. Can I disable the internal eMMC device and use a SATA external drive instead?
-
There is an mSATA connector inside. You can install an mSATA drive and it will boot from there. You don't have to disable the eMMC.
-
if it's this one
i don't see the emmc soldered-on
are you sure?
maybe you only need to clean the contacts
-
That is an mSATA disk.
-
If that SG-4860 is still in warranty you should open a ticket with us: https://go.netgate.com
Steve
-
Thanks, but unfortunately I bought it in 2017.
I had a laptop platter drive and an mSATA cable lying around, so I installed it today to see if that remedies the issue. It's back up and running, so I'll monitor it. If it solves the problem, I'll probably install an SSD.
Thanks for your response and support. I've had great experience with everyone from Netgate.
If it's motherboard related, do you sell replacements?
-
If the eMMC has failed it would require a replacement board. If you open a ticket we can quote you for that.
I recommend going the mSATA route though; it will be a lot less expensive, and we have seen that work reliably in similar cases. A bad eMMC is not an indication that anything else on the board will fail.
Steve
-
@kiokoman Sorry, I should have mentioned it's actually the SG-4860-1U; I forgot there was a pretty significant hardware difference between the two. I don't have the device shown; on mine, everything is soldered onto the motherboard.
I do have three mSATA connectors in front of the CPU, however, and was able to add a new disk, install pfSense, and give it a test.
I expect at some point to see the same error messages, as the eMMC is still connected, but isn't being used for anything. Hopefully, it will keep working as normal afterward though. pfSense shows the correct size for the root disk so I know it's not using the eMMC as the system device.
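For anyone who wants to double-check the same thing from a shell rather than by disk size, here is a minimal Python sketch. It assumes the eMMC shows up as da0 (as it does on this unit) and that you paste in the text printed by `mount`. The function names and the sample line are made up for illustration, and it cannot tell which disk is in use if the root filesystem is mounted by label or is on ZFS.

```python
import re
from typing import Optional


def root_device(mount_output: str) -> Optional[str]:
    """Return the device mounted on / from `mount`-style text, or None."""
    for line in mount_output.splitlines():
        match = re.match(r"^(\S+) on / \(", line)
        if match:
            return match.group(1)
    return None


def uses_emmc(device: Optional[str], emmc_name: str = "da0") -> bool:
    """True if the device path is da0 or a slice/partition of it.

    Assumption: the eMMC is da0. Label-based paths (ufsid, gpt) and
    ZFS datasets are not resolved and simply return False.
    """
    if device is None:
        return False
    name = device.rsplit("/", 1)[-1]
    pattern = re.escape(emmc_name) + r"([sp]\d+[a-z]?)?"
    return re.fullmatch(pattern, name) is not None


# Illustrative sample only, not taken from the unit in this thread.
sample = "/dev/ada0s1a on / (ufs, local, noatime, soft-updates)\ndevfs on /dev (devfs, local)"
dev = root_device(sample)
print(dev, "uses eMMC:", uses_emmc(dev))
```

If the root device comes back as something under ada0, the system is running from the SATA/mSATA disk and the da0 disconnects should not touch the running system.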
-
You have three SATA connectors on the board, for using regular SATA drives. The 1U also has SATA power connectors on the PSU.
There is only one mSATA socket and two mPCIe sockets. Just FYI if you use that.
https://docs.netgate.com/pfsense/en/latest/solutions/sg-4860-1u/msata-installation.html
Steve
-
mSATA. They are like $20 on Amazon.
https://www.amazon.com/TCSUNBOW-MSATA-60GB-Solid-Machine/dp/B077YWJVXB/
By far your cheapest option and it will be "snappier" than it was on eMMC.
-
@stephenw10 yep, turns out I don't know what I'm talking about. :) I definitely used one of the SATA connectors with the laptop platter drive and the power supply connector. It's been up for about 24 hours now and working normally so far. No kernel messages since bootup completed.
If it seems to be okay for a couple weeks I'll probably order the mSATA device as I'm sure it will be faster than an old 320GB laptop hard drive.
Thanks again for everyone's help!
-
Turns out the kernel logged that same sequence of messages last night, but as expected, the device continued to operate normally as none of the services are relying on the eMMC. Seems like the motherboard is operating normally. Another week or so of solid operation, and I'll look for an mSATA.
Thanks again!
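In case it helps anyone tracking how often this happens, here is a rough Python sketch for pulling the disconnect events out of a saved copy of the system log. It assumes the kernel marks the event with "(disconnected)" on a umass0 line, which is the usual FreeBSD wording but may not match every release. The function name and the sample lines are invented for illustration and are not from my logs.

```python
import re
from typing import List

# Assumption: a dropped USB mass-storage device is logged on a line that
# mentions umass0 and ends with "(disconnected)".
DISCONNECT = re.compile(r"\bumass0:.*\(disconnected\)")


def disconnect_events(log_text: str) -> List[str]:
    """Return every log line that looks like a umass0 disconnect."""
    return [line for line in log_text.splitlines() if DISCONNECT.search(line)]


# Invented sample lines, for illustration only.
sample_log = "\n".join([
    "Jan  1 03:12:45 fw kernel: umass0: at uhub0, port 1, addr 1 (disconnected)",
    "Jan  1 03:12:47 fw kernel: umass0 on uhub0",
    "Jan  3 18:40:02 fw kernel: umass0: at uhub0, port 1, addr 1 (disconnected)",
])

events = disconnect_events(sample_log)
print(len(events), "disconnect events")
for line in events:
    print(line)
```

Comparing the timestamps on the matching lines against known power fluctuations would show whether the events really are random.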
-
when you have the time try to clean it with some isopropyl alcohol and a toothbrush, it does a fair job of getting rid of both water-based (oxide) and oil-based contaminants that can cause intermittent connection. if it's not enough a reballing/reflow would be necessary but for that you need a technical expert able to do it.
... or just ignore it and mount an msata
-
@kiokoman As mentioned above, on the SG-4860-1U that I have, there is no contact surface to clean - the eMMC is directly soldered to the PCB. Aside from trying to resolder it, there isn't much to do, and at that point it's too big a risk vs. the mSATA and ignoring kernel messages. I could probably change the config to avoid mounting da0 in the first place if it gets that frequent/irritating.