Intel Atom C2xxx LPC failures
-
All the information available online leads me to believe that the issue is not limited to Cisco. Intel is very clear about AVR54 and the affected processors. It's safe to assume that the ADI/Netgate/pfSense boxes are affected too (as well as some Synology NAS units).
Unless there's some magic "firmware bullet", I'm convinced that most vendors will just ride this issue out, or at best manage it on a case-by-case basis. If Cisco could've pushed out a firmware fix, they would've done it in a heartbeat.
I'd be more than happy to receive replacement boards from Netgate/pfSense (for my 2240, 2440 and 4860s), but my hopes are very slim indeed.
The pfSense store shows a one-year manufacturer's warranty, and my boxes are already more than a year old. That absolves pfSense of the responsibility to replace my SG Series appliances, which by the way have not even failed yet.
Do I sleep comfortably knowing that my network sits on a time bomb? No.
Do I sleep comfortably knowing that my employer's network sits on a time bomb? No.
Perhaps ADI/Netgate/pfSense do not have the same level of clout with Intel as Cisco does, but I'd surely hope ADI/Netgate/pfSense will work out some sort of arrangement with Intel to reduce the impact on existing customers.
I would suggest that affected Rangely/Avoton customers should receive a heavy discount when buying from the pfSense Store again.
-
This issue has nothing to do with warranty, since it's a design flaw… a replacement or upgrade program is needed.
Personally I have my core network based on C2000, pfSense and Synology units.
There is an open thread on Synology. I also wrote to Supermicro in order to get a solution before failure occurs.
-
It's eerily quiet here …. :-\
-
iXsystems FreeNAS Mini at risk too.
-
I also wrote to Supermicro in order to get a solution before failure occurs.
Please let us know what they say… there are plenty here that have Supermicro boards with the Atom C2xxx processors on them.
-
I also contacted Supermicro yesterday. It seems that at least the European support did not know about the issue yet. I sent them an explanation and some links. Now they are checking with the PM of the motherboard.
-
Hmm, the Intel doc says stepping B0 is affected. My Supermicro board says:
CPU: Intel(R) Atom(TM) CPU C2758 @ 2.40GHz (2400.07-MHz K8-class CPU) Origin="GenuineIntel" Id=0x406d8 Family=0x6 Model=0x4d Stepping=8
From what I have gathered about steppings, the version normally consists of a letter followed by a number. So what could "8" mean?
I'd love to think that Cisco got the whole B0 stepping, but then again my results (and all the dmesg output I googled) are missing the letter… Any experts on this?
-
CPU: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz (2400.06-MHz K8-class CPU) Origin="GenuineIntel" Id=0x406d8 Family=0x6 Model=0x4d Stepping=8
Well, yeah, that is not very helpful at all.
-
I also contacted Supermicro (EU) and asked them if it only affects one specific stepping.
Apparently even they don't know if it only affects the B0 stepping, because Intel doesn't want to give out too many details.
The hardware update Supermicro has in place (or will have in place) is for all A1 motherboards though.
-
CPU: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz (2400.06-MHz K8-class CPU) Origin="GenuineIntel" Id=0x406d8 Family=0x6 Model=0x4d Stepping=8
Well, yeah, that is not very helpful at all.
Have a look here:
https://www-ssl.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c2000-family-spec-update.pdf
Page 15 Table 9
CPUID: 406D8
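For anyone wondering how the dmesg values relate to that CPUID number, here is a minimal sketch in Python that splits the `Id=` value into family, model and stepping using the standard x86 CPUID leaf 1 bit layout. The function names are made up for illustration. Note that the numeric stepping (8) is only an ID; the letter name (B0) has to be looked up in Intel's spec update table, it cannot be computed from the number.

```python
import re


def decode_cpuid_signature(sig: int) -> dict:
    """Split a CPUID leaf 1 EAX signature into family/model/stepping.

    Bit layout (standard x86): 3:0 stepping, 7:4 model, 11:8 family,
    19:16 extended model, 27:20 extended family.
    """
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # Extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # Extended model is only prepended for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


def signature_from_dmesg(line: str):
    """Pull the hex value after 'Id=' out of a FreeBSD-style dmesg CPU line.

    Hypothetical helper: returns None if the line has no Id= field.
    """
    match = re.search(r"\bId=0x([0-9a-fA-F]+)", line)
    return int(match.group(1), 16) if match else None


if __name__ == "__main__":
    line = 'Origin="GenuineIntel" Id=0x406d8 Family=0x6 Model=0x4d Stepping=8'
    sig = signature_from_dmesg(line)
    d = decode_cpuid_signature(sig)
    # prints: Family=0x6 Model=0x4d Stepping=8
    print(f"Family={d['family']:#x} Model={d['model']:#x} Stepping={d['stepping']}")
```

So the "8" in dmesg is just the low four bits of 0x406d8, which is the same value the spec update lists as CPUID 406D8.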
-
Oh, well. Thanks! :-X
Guess I will buy/build a new pfSense appliance and then RMA my board. Not cool at all.
-
Supermicro support told me to RMA my board to get a "reworked" one. They do not handle RMAs directly with the customer, though. I bought it from Amazon.de. So much for my reworked version…
-
We're still investigating internally, we'll put out an official response once we have enough information.
You can also follow some additional conversation on the topic here: https://www.reddit.com/r/PFSENSE/comments/5s8pwi/intel_c_series_processor_recalls_are_pf_official/
-
crap!!
CPU: Intel(R) Atom(TM) CPU C2558 @ 2.40GHz (2400.06-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x406d8 Family=0x6 Model=0x4d Stepping=8
-
FWIW, you probably don't need to go check your stepping. By Intel's data sheet, there has only been one stepping (B0) released to date for the Atom C2000 family.
-
And in case anyone missed it:
https://blog.pfsense.org/?p=2297
A very respectable response from Netgate.
-
@Jim:
Although most Netgate Security Gateway appliances will not experience this problem, we are committed to replacing or repairing products affected by this issue for a period of at least 3 years from date of sale, for the original purchaser.
That is a good post and I only have one related question now.
Does this mean that only devices that actually fail within the three years will be repaired/replaced or any devices with the susceptible CPU will be repaired/replaced within the three years regardless of whether they have actually suffered from the problem or not?
-
@Jim:
Although most Netgate Security Gateway appliances will not experience this problem, we are committed to replacing or repairing products affected by this issue for a period of at least 3 years from date of sale, for the original purchaser.
That is a good post and I only have one related question now.
Does this mean that only devices that actually fail within the three years will be repaired/replaced or any devices with the susceptible CPU will be repaired/replaced within the three years regardless of whether they have actually suffered from the problem or not?
For a lot of enterprise customers the replacement cost of a faulty device is insignificant. Spending $500 on a pfSense SG appliance or $5000 on a Cisco router isn't really the problem. The problem for them is unpredictability, and the impact and risk a component failure may produce. That's despite redundancy and the knowledge that failures will always happen.
Consider the potential downtime (and associated loss of business), change control, travel cost, overtime etc…
Cisco have chosen to pro-actively replace affected components because they do not want to expose their customers to any additional risk. The life expectancy of enterprise kit is approximately 3-5 years, because by then the technology will be technically superseded. I've certainly seen kit run 10+ years.
I'm a pfSense customer (for my employer and home) and my purchase decisions were made because of:
- quality Intel NICs in a purpose-built product
- a large community developing/supporting the software/pfSense project
- low power consumption
- a long life expectancy, certainly greater than 3 years
- no moving parts and fans
I am also keen to understand whether pfSense/Netgate will do a pro-active replacement (like Cisco) or whether this will be a "fix-on-fail" program.
"Fix-on-fail" means that pfSense is asking its customers to wear the risk mentioned above.
So, back to liontaur's question:
- will my 4 appliances be replaced within 3 years irrespective of fault?
- will my appliances only be replaced if they fail?
- what happens if my appliances fail after 3 years + 1 day?
My expectation as a consumer is that my appliances will last well beyond 3 years of operation.
-
A very respectable response from Netgate.
Opinions differ. :) 3 years is a pretty short clock for a fundamental design flaw.
^^^ When I wrote this I didn't realize the Netgate warranty was only 1 year; I was thinking of Supermicro's 3-year warranty and read it as a brush-off. My bad.
-
My expectation as a consumer is that my appliances will last well beyond 3 years of operation.
First, you'll need to appreciate that, while I know the modeled failure rates of the component in question, I can't release them.
Second, your appliance will, in all likelihood, last longer than three years. The majority of at-risk Netgate products will not experience this failure over their entire service lifetime.
Third, Cisco's offer isn't as "pro-active" as you suggest. A careful read of Cisco's Ts & Cs should reveal the truth.
Fourth, we feel we have a strong replacement policy, as it is not limited to the original warranty period or to systems covered by an existing support agreement, as others have announced. Considering the likelihood of the failure occurring, we feel our limited extended warranty is the best course of action, because it results in less overall inconvenience, downtime, and demands on our customers and partners.