C2758 "Clock Signal Component Issue"



  • Dear All,

    I am using multiple Supermicro C2758 boards as pfSense routers. On some of them, I am experiencing reboot issues: the devices hang in legacy boot, with the enclosed output ending at "Event timer "LAPIC" quality 600" and "ACPI APIC Table: <INTEL TIANO>". A second reboot usually goes through without issues. However, if I did not have CARP redundancy, this would have caused me more substantial problems.

    Does someone know if this is an indication of the "clock signal component issue"?

    Regards,

    Michael





  • Yes, I saw that on my C2558. I've already replaced it with a J1900, filed an RMA, and am waiting on Supermicro.


  • Moderator

    If this were a failure caused by the clock signal issue, the system wouldn't boot. That was stated both in various articles and in Cisco's statement about the issue. If that clock timer degrades and fails, the system won't boot AT ALL. So I wouldn't jump to conclusions about a reboot issue being the "big scary beast" everyone is so panicky about.

    Also, an RMA isn't really a great idea, as Supermicro itself doesn't have the mentioned "platform fix" ready. So you'd just get another board with the same potential fault.



  • @JeGr:

    Also, an RMA isn't really a great idea, as Supermicro itself doesn't have the mentioned "platform fix" ready. So you'd just get another board with the same potential fault.

    Perhaps not… I thought I read somewhere (the other C2xxx thread?) that pfSense was sent a batch of advance replacement motherboards to help them deal with this issue.  I don't know if those boards have a new C2xxx chip stepping or the board work-around Intel mentioned (but it'd be kind of useless to go through the expense of an advance replacement with the same known defect.)  It does imply, though, that they at least have the beginnings of a work-around (even if that work-around isn't quite in their RMA department.)

    As well, if someone RMAs a board with a known issue, and Supermicro knowingly sends back something with the same defect, they'd be violating the terms of their own warranty (which obligates them to repair the defect, replace with something that doesn't have the defect, or reimburse for the current value of the warrantied product.)  It'd also be really poor business practice, which would hurt their reputation significantly.


  • Moderator

    My contact at Supermicro - although located in Germany - told me last Friday that there aren't ANY fixed boards/SOCs available from either Intel or Supermicro as of yet, so an RMA would be superfluous ATM.

    JWT's comment was:

    Supermicro is supplying us with advanced stock so we can turn an RMA for this issue around in a day, rather than the timeline experienced by your friend.

    Nowhere does it say that it already contains a fixed SOC! AFAIK there is no fix yet from any vendor, so all they can offer now is to replace a board gone bad. Also, many vendors wrote (and a few I spoke with told me) that they don't even expect that thing to fail on their products anywhere within the given timeframe. One of them I spoke with has appliances and security devices with various C2000 chips that have been around for 3+ years, and of the very few failures none (nil!) actually had to do with that clock signal component. As of now, it seems to me like an issue that was blown way out of proportion by Cisco and the first article about it. But that's just my feeling.



  • @JeGr:

    Nowhere does it say that it already contains a fixed SOC!

    You are correct and I did make the assumption that the replacements would contain either a revised chip or the board work-around.

    However, I still wouldn't discourage people from RMA'ing boards that are currently under warranty iff they have alternate h/w to run in place of the board while it's being "repaired."  First, it obligates supermicro to repair, replace or reimburse the board if the RMA reason lists the intel defect.

    Second, supermicro only offers a 1 year warranty on the parts on these boards.  So, if you wait until the chip fails, you will likely be outside the warranty period and supermicro likely will refuse to cover the repair cost.  At least in the US, there'd be no recourse for a customer.

    On the other hand, if you RMA while still in the warranty period, supermicro sends back a board that does NOT resolve the known defect, and then it fails (from this specific defect) after the warranty period, you can rightfully claim that supermicro failed to meet their warranty obligation and therefore must cover the repair cost even after the 1 year.  (WARNING:  This statement is my opinion and I am NOT a lawyer.  You're an idiot if you take my opinion alone as sound legal advice.)

    Even if there were no valid legal claim in the above statement, there is most certainly an ethical claim, and I doubt Supermicro would want to pay the cost to their reputation if they started sending back "repaired" boards with a known defect.

    Keep in mind that supermicro lists THREE possible remedies to warranty claims:  repair, replacement, or reimbursement.  They might have to fall back to that last remedy if they are unable to repair/replace.

    Again, I'm NOT a lawyer.  This is all just my opinion.



  • @JeGr:

    My contact at Supermicro - although located in Germany - told me last Friday that there aren't ANY fixed boards/SOCs available from either Intel or Supermicro as of yet, so an RMA would be superfluous ATM.

    That's odd. I got this from supermicro.nl support a week ago:

    We have apparently already created a fix for this issue. Please request RMA for the motherboards, they will be reworked by RMA which fixes this issue.

    But support never did mention how long exactly this rework thing would take, so…



  • If RMAing for a repaired unit rather than a NEW replacement, be sure to record your serial numbers (chassis, board, etc.).  Maybe even mark it so that you can visually identify it as well.  The last thing you want to end up with is someone else's repaired unit.  Who knows what abuse they subjected it to.



  • Cisco have had a fix in place since the beginning of Dec 2016. They knew well before everybody else.

    http://www.cisco.com/c/en/us/support/web/clock-signal.html#~faqs

    If Cisco could rework the systems as early as 3 Dec 2016, they must have known about the issue well before then, considering it takes time to adjust production for the required fix on all affected platforms.

    Cisco have the expertise and resources to do their own independent research into C2000 failure rates and do not have to rely on Intel to feed them BS. Cisco would also have plenty of failed units to back up their research and their claims to make a case against Intel.

    It is very likely that Cisco discovered the flaw first and confronted Intel with their findings. Cisco don't just peddle C2000 CPUs, their UCS systems also use Intel - so Cisco have enough weight to throw around.

    Eventually Intel had to admit the flaw - and since board members high up need to have their arses covered, they had to announce the issue in the recent earnings call. Then they worded AVR54 so innocuously, hoping it would slip under the radar.



  • @NOYB:

    If RMAing for a repaired unit rather than a NEW replacement, be sure to record your serial numbers (chassis, board, etc.).  Maybe even mark it so that you can visually identify it as well.  The last thing you want to end up with is someone else's repaired unit.  Who knows what abuse they subjected it to.

    In the case of supermicro, you very well might get someone else's repaired unit…  I'd even think it would be more common to get a "refurb" than a repair if the repairs take more than a few hours to perform.



  • @garyd9:

    In the case of supermicro, you very well might get someone else's repaired unit…  I'd even think it would be more common to get a "refurb" than a repair if the repairs take more than a few hours to perform.

    I posted this in another thread, but a friend RMA'd a Supermicro C2758 last year for what we both suspect was the aforementioned issue.  In addition to the RMA taking 3 months, he got back the same board, as verified by serial number.

    I expect that now that the cat is out of the bag, RMAs will be quicker, but who knows how it will ultimately shake out.



  • @whosmatt:

    In addition to the RMA taking 3 months, he got back the same board, as verified by serial number.

    THREE MONTHS?

    From SM's warranty (available here: https://www.supermicro.com/support/Warranty/):

    If returned products are: a) within the warranty period, b) accompanied by the proper Return Materials Authorization ("RMA") and c) defective as determined by Supermicro; Supermicro will, at its option: 1) repair the defective product within 10 working days, 2) replace the defective product with a refurbished product or 3) issue a credit to the Customer for the current value of the product. For purposes of this Limited Warranty, "refurbished" means a product or part that has been returned to its original specifications.

    (note the "10 working days")  In addition, in regards to replacement, there is this:

    All product replacements are subject to quantity available in Supermicro stock. If Supermicro does not have stock to replace the returned products, Supermicro shall have at least 30 days to manufacture and replace the returned products.

    It sounds like your friend is considerably more patient than I am.  After the 10 working days, I'd have been on the phone with SM quoting their own warranty text to them.  If they hadn't repaired within 10 days, they should have sent a refurb.. or manufactured a brand new one for him/her.



  • @garyd9:

    @whosmatt:

    In addition to the RMA taking 3 months, he got back the same board, as verified by serial number.

    THREE MONTHS?

    From SM's warranty (available here: https://www.supermicro.com/support/Warranty/):

    If returned products are: a) within the warranty period, b) accompanied by the proper Return Materials Authorization ("RMA") and c) defective as determined by Supermicro; Supermicro will, at its option: 1) repair the defective product within 10 working days, 2) replace the defective product with a refurbished product or 3) issue a credit to the Customer for the current value of the product. For purposes of this Limited Warranty, "refurbished" means a product or part that has been returned to its original specifications.

    (note the "10 working days")  In addition, in regards to replacement, there is this:

    All product replacements are subject to quantity available in Supermicro stock. If Supermicro does not have stock to replace the returned products, Supermicro shall have at least 30 days to manufacture and replace the returned products.

    It sounds like your friend is considerably more patient than I am.  After the 10 working days, I'd have been on the phone with SM quoting their own warranty text to them.  If they hadn't repaired within 10 days, they should have sent a refurb.. or manufactured a brand new one for him/her.

    Yeah, I don't really know.  I asked him about it today after reading this post.  He was in contact with SM the entire time. The board went to Taiwan (from California) for repair.  Apparently he is indeed very patient.  It was an A1SRi-2758F IIRC.