Crucial M4 SSD sleep bug - AHCI timeout after upgrade from 2.2.5 to 2.3.1
I built myself a pfSense box back in November to get some hands-on experience and replace my home router. I chose a Shuttle DS81 and used some parts I had lying around (i3 CPU, Crucial M4 SSD, 4 GB Kingston DDR3).
I think I installed 2.2.5 back then. Things had worked perfectly ever since. Last Saturday I decided to upgrade to 2.3.1, and that process went without errors.
The following Sunday my network was completely down. After a hard reboot things would work for a little while (usually less than 30 minutes). I swapped my old router back in and did some testing with the pfSense box.
I decided to set it up next to me with a monitor attached so I could hopefully see what happens. SMART data is fine and memtest reports no errors (8+ hours). After a while I saw ahcich1 timeouts (no BIOS settings had ever been changed). I tried the following, testing after each step:
- clean install 2.3.1
- update BIOS
- clean install 2.3.1
- fiddle around in BIOS
- reset BIOS and clean install 2.2.5
The problem remained.
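For anyone who would rather not sit next to the console waiting for the ahcich timeouts mentioned above, here is a minimal Python sketch that counts them per channel in saved kernel log text. The sample lines are illustrative and were not captured from my box, the exact message wording is an assumption, and `count_ahci_timeouts` is just a name I made up.

```python
import re
from collections import Counter

# Matches a kernel message from an AHCI channel that mentions a timeout.
# The exact wording of the message is an assumption; the pattern only
# relies on the "ahcichN:" prefix and the word "timeout".
AHCI_TIMEOUT = re.compile(r"\b(ahcich\d+):.*timeout", re.IGNORECASE)


def count_ahci_timeouts(log_text):
    """Return a Counter mapping channel name to number of timeout lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = AHCI_TIMEOUT.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Illustrative sample only, not real output from the box described here.
SAMPLE_LOG = """\
em0: link state changed to UP
ahcich1: Timeout on slot 5 port 0
ahcich1: Timeout on slot 6 port 0
"""

if __name__ == "__main__":
    for channel, hits in count_ahci_timeouts(SAMPLE_LOG).items():
        print(f"{channel}: {hits} timeout message(s)")
```

On the box itself, the text could come from the output of `dmesg` or from a saved copy of the system log instead of `SAMPLE_LOG`.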
As a last resort I checked for possible firmware updates for the SSD. I found that Crucial had released a critical update to correct an issue where the SSD would not wake from sleep. The release notes say:
Resolved a power-up timing issue that could result in a drive hang, resulting in an inability to communicate with the host computer. The hang condition would typically occur during power-up or resume from Sleep or Hibernate. Most often, a new power cycle will clear the condition and allow normal operations to continue. The failure mode has only been observed in factory test. The failure mode is believed to have been contained to the factory. This fix is being implemented for all new builds, for all form factors, as a precautionary measure. The fix may be implemented in the field, as desired, to prevent occurrence of this boot-time failure. To date, no known field returns have been shown to be related to this issue. A failure of this type would typically be recoverable by a system reset.
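If you want to check which firmware a drive is running before deciding whether to update, one option is to read the "Firmware Version" field that `smartctl -i` prints for ATA drives. Below is a rough Python sketch that pulls that field out of saved output. The sample text uses placeholder model and version strings (they are not the values from my drive), and `parse_firmware_version` is a hypothetical helper name.

```python
import re


def parse_firmware_version(smartctl_info):
    """Return the value of the 'Firmware Version' field, or None if absent."""
    match = re.search(
        r"^Firmware Version:\s*(\S+)", smartctl_info, re.MULTILINE
    )
    return match.group(1) if match else None


# Illustrative text only; the model and version strings are placeholders.
SAMPLE_INFO = """\
Device Model:     EXAMPLE-SSD
Firmware Version: 0001
"""

if __name__ == "__main__":
    print(parse_firmware_version(SAMPLE_INFO))
```

The returned string can then be compared by hand against the version named in the vendor's release notes.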
Right now my box has been up for 5 hours, so it's looking good. One thing I have to ask though: how did a software update trigger the SSD firmware bug? It ran fine for months. Was this one of those rare coincidences? Or were changes made to sleep behavior after version 2.2.5? (I went through the blog posts but couldn't find anything.)
pfSense 2.2.* is based on FreeBSD 10.1
pfSense 2.3.* is based on FreeBSD 10.3
So anything that changed in FreeBSD 10.2 and then 10.3 with regard to drivers etc. could trigger different behaviour.