Netgate Discussion Forum

    Intel Atom C2xxx LPC failures

    Hardware
    168 Posts 39 Posters 57.2k Views
    • MordyT

      @jwt:

      @MordyT:

      @MordyT:

      So when purchasing, they did the math and said "Chance of it breaking vs cost of HA - we will risk it". Now that the chance is significantly higher due to this specific Intel issue…

      And chances are we put them both in and turned them on the same day. Which means that they would probably fail at the same time (or very close to it).

      You're wrong here, and I'm bound to not explain why or how.

      Fair enough.
      If you can respond to this at least….
      I'm wrong on which part - the chance of it failing being significantly higher, or the chance that 2 systems put into service the same day will fail near each other?

      I understand that you may be restricted from revealing information - I don't hold it against you or Netgate. Part of the general frustration some of us have is that no one will reveal any information. In the past, when information like this has been withheld, it has usually turned out to be as bad as, if not worse than, what people were guessing. And all that does is further feed the general negative feelings.

      • VAMike

        @MordyT:

        They did a cost vs risk analysis. The information they used in that is now wrong - the likelihood of failure is higher now.

        You're just making stuff up. (Or, if there was a risk analysis, someone made up numbers to go into it.) There is no baseline failure rate, and the delta is unknown. So the math is something like "unknown * unknown = even more unknown". You're not basing this on any kind of real analysis, you're reacting to a scare story.

        Some of us don't believe that to be true (the equipment will be fine). As such, I have proposed another 2 options, namely a 5 year or better warranty for hardware that does have a clock signal failure (since it shouldn't happen according to netgate, this should be a no brainer), or a proactive replacement.

        I'd be interested to hear your reasoning as to why you or anyone else (really I want to hear from Netgate) are opposed to this.

        Because then the company is saddled with an ongoing responsibility to deal with incoming claims, whether valid or not. E.g., if someone static zaps their board 4 years from now, netgate is going to have to deal with the claim that the c2xxx bug was the problem. They're going to have to either maintain spares and just hand them to anyone who asks for one, or keep people around who remember how to deal with a long-obsolete board, or they'll have to just give people free new computers whenever they ask for one. For something that's unlikely (with an unknown magnitude) that's excessive for a small company to commit to.

        Repeat after me: cisco is only giving out free computers to people who are paying something around the parts value of a netgate firewall every year for maintenance. If I offered to replace your netgear routers proactively if you would agree to enter a 5 year $150/yr service contract, which would also cover future failures, would you take me up on that deal? Heck, if enough people say yes I'd actually consider it–there's a decent profit to be made.

        • gcu_greyarea

          Let's assume for a moment that I have redundant power, UPSes, hot spares, cold spares, followed all the best practices and have mature processes in place….

          All of the above considered - Netgate still sold me a unit with a component that is likely to fail prematurely.

          Even worse, Netgate is continuing to sell these units.

          It is not of Netgate's concern what I do with my SG series appliance - and how I use it. As far as Netgate is concerned I could use it as a coaster to put my beer on.

          What IS of Netgate's concern is that they have an unhappy customer whose beer coaster has a faulty CPU. And I want it fixed.

          The advertisement says "This system is designed for a long deployment lifetime." This is misleading because a key component of the system has - according to its manufacturer (intel) - a higher than projected failure rate, starting at around 18 months of use.

          Therefore the SG series products are not fit for purpose and the "lemon law" should apply.

          • chpalmer

            @gcu_greyarea:

            Netgate is continuing to sell these units.

            Says you.  I've seen no evidence of that either way.  Remember NDAs cover a lot of ground.

            Until I hear from them either way I'm going to withhold any judgement. AFAIC I'm not so sure that I won't get 100 years out of them until I see differently. So what if I'm wrong? I'll deal with that road when I get to it. I'm going to continue to put equipment out knowing what I know now and choose what I put out based on the knowledge I have. If that means I put something else out from someone else, that is my decision. That product might have an unknown bug which does show up and ruin my day in the summer of 2019, in which case the Netgate product would have still worked flawlessly.  I don't have a crystal ball and I don't pretend I could read it if I did.  I wish others would follow suit, but I digress.

            No one is forcing you to buy or distribute something you don't trust. But just because you don't trust it doesn't mean anyone else shouldn't.

            Personally I blame no one but Intel on this one.  I'm of the belief that this could be an attempt to limit the life of a product in order to increase future profits, and that they screwed up the math. But that's just me.

            I will hold companies like Arris and Linksys (among others) accountable for the PUMA6 debacle because that screwup was apparent from the get-go, and testing should have shown it.  Netgate and others could never have known this "fault" (thread subject) was possible from the documentation Intel provided, and I'm willing to take a little responsibility with that. My customers appreciate that and will not fault me for not understanding the Crystal Ball instructions.  8)

            :)

            Triggering snowflakes one by one..
            Intel(R) Core(TM) i5-4590T CPU @ 2.00GHz on an M400 WG box.

            • doktornotor Banned

              @gcu_greyarea:

              unhappy customer whose beer coaster has a faulty CPU. And I want it fixed.

              GOTO. Now, you'd better use one of these for any of your firewalls.

              • VAMike

                @gcu_greyarea:

                All of the above considered - Netgate still sold me a unit with a component that is likely to fail prematurely.

                You have no real basis for your hysteria. Please cite a credible source for "likely to fail prematurely".

                • luckman212 LAYER 8

                  I noticed ADI has a new 01.00.00.12 BIOS out for the RCC-VE platform.  I haven't tested it, and am not recommending you run out & flash it. Just posting this for informational purposes.  The release notes can be found in this pdf on Github.  But here's a nugget from the last page:

                  RELEASE ADI_RCCVE-01.00.00.12
                  Release Date: 03/01/2017

                  The versions of software components used in this release are:
                  • SageBIOS: SageBios_Mohon_Peak_292.
                  • FSP: RANGELEY_FSP_POSTGOLD3.
                  • microcode: M01406D8125 for B0 stepping.
                  • Descriptor: ADI unlocked

                  New Features
                  • Workaround for Intel C2000 Errata AVR.58
                  A software workaround for Intel C2000 Errata AVR.50 has been implemented in this release. The
                  workaround disables SERIRQ to prevent indeterminate interrupt behavior for systems that do not have
                  external pull up resistor on SERIRQ PIN.

                  • MordyT

                    @VAMike:

                    @MordyT:

                    They did a cost vs risk analysis. The information they used in that is now wrong - the likelihood of failure is higher now.

                    You're just making stuff up. (Or, if there was a risk analysis, someone made up numbers to go into it.) There is no baseline failure rate, and the delta is unknown. So the math is something like "unknown * unknown = even more unknown". You're not basing this on any kind of real analysis, you're reacting to a scare story.

                    This is veering off topic - you can speculate all you want on how we did our analysis, I won't get into that.

                    @VAMike:

                    @MordyT:

                    Some of us don't believe that to be true (the equipment will be fine). As such, I have proposed another 2 options, namely a 5 year or better warranty for hardware that does have a clock signal failure (since it shouldn't happen according to netgate, this should be a no brainer), or a proactive replacement.

                    I'd be interested to hear your reasoning as to why you or anyone else (really I want to hear from Netgate) are opposed to this.

                    Because then the company is saddled with an ongoing responsibility to deal with incoming claims, whether valid or not. E.g., if someone static zaps their board 4 years from now, netgate is going to have to deal with the claim that the c2xxx bug was the problem. They're going to have to either maintain spares and just hand them to anyone who asks for one, or keep people around who remember how to deal with a long-obsolete board, or they'll have to just give people free new computers whenever they ask for one. For something that's unlikely (with an unknown magnitude) that's excessive for a small company to commit to.

                    Repeat after me: cisco is only giving out free computers to people who are paying something around the parts value of a netgate firewall every year for maintenance. If I offered to replace your netgear routers proactively if you would agree to enter a 5 year $150/yr service contract, which would also cover future failures, would you take me up on that deal? Heck, if enough people say yes I'd actually consider it–there's a decent profit to be made.

                    Actually, if you offered me a SG-2440 for $100 a year (so $500 in 5 years) in a HaaS (Hardware as a service), I would consider it strongly. Heck, I already work with a company that does WaaS (Wireless as a Service) that does something similar.

                    To all those who say "well cisco is so expensive, blah blah blah… smartnet... "... well then maybe Netgate should charge more if more needs to be charged. Some of us would rather pay a premium for a premium product and not worry than not pay that premium and then have to worry. The clients I service made that call when they picked me - I'm certainly not the cheapest one around (not even close).

                    @chpalmer:

                    @gcu_greyarea:

                    Netgate is continuing to sell these units.

                    Says you.  I've seen no evidence of that either way.

                    Umm, you have seen no evidence that Netgate is continuing to sell these units?
                    May I redirect you to here? https://store.netgate.com/SG-2440.aspx

                    @VAMike:

                    @gcu_greyarea:

                    All of the above considered - Netgate still sold me a unit with a component that is likely to fail prematurely.

                    You have no real basis for your hysteria. Please cite a credible source for "likely to fail prematurely".

                    Absolutely. How about this pdf from Intel?
                    http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c2000-family-spec-update.pdf
                    And I quote

                    AVR54.
                    System May Experience Inability to Boot or May Cease Operation
                    Problem:
                    The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.
                    Implication:  If the LPC clock(s) stop functioning the system will no longer be able to boot

                    This PDF should establish the failure part. As for prematurely, well, the erratum wouldn't have been published if it was normal spec. I'm not sure where the 18 months number comes from that I have seen flying around… but I'm willing to bet its source is credible.
                    As for the likely part - the only thing I can point to is all the other people who are pointing to failed systems.

                    EDIT: More credible sources:

                    Intel's Robert Holmes Swan, the new CFO and executive vice president, stated:

                    "But secondly, and a little bit more significant, we were observing a product quality issue in the fourth quarter with slightly higher expected failure rates under certain use and time constraints…"

                    • chpalmer

                      @MordyT:

                      @chpalmer:

                      @gcu_greyarea:

                      Netgate is continuing to sell these units.

                      Says you.  I've seen no evidence of that either way.

                      Umm, you have seen no evidence that Netgate is continuing to sell these units?
                      May I redirect you to here? https://store.netgate.com/SG-2440.aspx

                      My point is - you're assuming they haven't implemented the workaround or similar..

                      • gcu_greyarea

                        https://blog.pfsense.org/?p=2297

                        "We apologize for the limited information available at this time. Due to confidentiality agreements, we are restricted in what we can discuss. We will communicate additional information as it becomes available.

                        As always, please be assured we will do the right thing for our customers at Netgate and the pfSense community."

                        So Netgate is unable to tell customers that their SG appliances have a fix for AVR54 because NDA's are in place with intel….? BS!

                        If Netgate had a fix - communicating this to customers would be top priority.

                        Just to be clear - I won't buy Netgate again!
                        The reason is not AVR54, but their attitude, lack of communication, lack of customer empathy and lack of ability to "put themselves into the customers' shoes".

                        • VAMike

                          @MordyT:

                          This is veering off topic - you can speculate all you want on how we did our analysis, I won't get into that.

                          Of course not, it's just some handwaving to make the process seem a lot more impressive than reality, while disregarding the fact that the only thing that changed was sensationalist reporting.

                          @VAMike:

                          Because then the company is saddled with an ongoing responsibility to deal with incoming claims, whether valid or not. E.g., if someone static zaps their board 4 years from now, netgate is going to have to deal with the claim that the c2xxx bug was the problem. They're going to have to either maintain spares and just hand them to anyone who asks for one, or keep people around who remember how to deal with a long-obsolete board, or they'll have to just give people free new computers whenever they ask for one. For something that's unlikely (with an unknown magnitude) that's excessive for a small company to commit to.

                          Repeat after me: cisco is only giving out free computers to people who are paying something around the parts value of a netgate firewall every year for maintenance. If I offered to replace your netgear routers proactively if you would agree to enter a 5 year $150/yr service contract, which would also cover future failures, would you take me up on that deal? Heck, if enough people say yes I'd actually consider it–there's a decent profit to be made.

                          Actually, if you offered me a SG-2440 for $100 a year (so $500 in 5 years) in a HaaS (Hardware as a service), I would consider it strongly. Heck, I already work with a company that does WaaS (Wireless as a Service) that does something similar.

                          Um, no, that's $150/yr on top of the hardware cost.
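
                          To put the two readings side by side, here is a minimal sketch of the five-year arithmetic. The $100/yr and $150/yr figures come from the posts above; the hardware price is a hypothetical placeholder, since nobody in the thread states one.

```python
def five_year_cost(hardware_price: float, annual_fee: float, years: int = 5) -> float:
    """Total outlay: up-front hardware price plus the yearly fee over the term."""
    return hardware_price + annual_fee * years


# MordyT's reading: hardware bundled into a $100/yr subscription, nothing up front.
haas_total = five_year_cost(hardware_price=0, annual_fee=100)

# VAMike's offer: $150/yr on top of the hardware cost.
# HYPOTHETICAL_HARDWARE_PRICE is an assumption for illustration only.
HYPOTHETICAL_HARDWARE_PRICE = 500.0
contract_total = five_year_cost(HYPOTHETICAL_HARDWARE_PRICE, annual_fee=150)

print(haas_total)      # 500
print(contract_total)  # 1250.0
```

                          Whatever the real hardware price is, the contract adds $750 over five years on top of it, which is the gap between the two posts.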

                          To all those who say "well cisco is so expensive, blah blah blah… smartnet... "... well then maybe Netgate should charge more if more needs to be charged. Some of us would rather pay a premium for a premium product and not worry than not pay that premium and then have to worry.

                          So, basically, you just didn't notice that you bought a product with no annual service contract fee? I'd say that most people who bought from netgate consciously chose not to buy into that business model because it's most certainly available from other vendors. Maybe that careful analysis that you don't want to talk about missed some fundamentals?

                          @VAMike:

                          @gcu_greyarea:

                          All of the above considered - Netgate still sold me a unit with a component that is likely to fail prematurely.

                          You have no real basis for your hysteria. Please cite a credible source for "likely to fail prematurely".

                          Absolutely. How about this pdf from Intel?
                          http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c2000-family-spec-update.pdf
                          And I quote

                          AVR54.
                          System May Experience Inability to Boot or May Cease Operation
                          Problem:
                          The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.
                          Implication:  If the LPC clock(s) stop functioning the system will no longer be able to boot

                          This PDF should establish the failure part. As for prematurely, well, the erratum wouldn't have been published if it was normal spec. I'm not sure where the 18 months number comes from that I have seen flying around… but I'm willing to bet its source is credible.

                          So, again, you don't actually have anything. The intel errata doesn't include any quantification of the failure rate. You don't know what sensationalist reporting told you it was 18 months before the hardware explodes, but you're sure it's credible. Right.

                          As for the likely part - the only thing I can point to is all the other people who are pointing to failed systems.

                          Really? Please cite. Because from everything in my experience and from what I've learned talking to people with very large c2xxx deployments, they are not failing in large numbers, even the ones which are 3+ years old.

                          EDIT: More credible sources:

                          Intel's Robert Holmes Swan, the new CFO and executive vice president, stated:

                          "But secondly, and a little bit more significant, we were observing a product quality issue in the fourth quarter with slightly higher expected failure rates under certain use and time constraints…"

                          I asked for a source for the assertion "a component that is likely to fail prematurely" and you quote "slightly higher expected failure rates under certain use and time constraints". "likely to fail" != "slightly higher…failure rates [with conditions]".

                          • chrcoluk

                            I agree with the comments that netgate should stick 2 fingers up at the NDA, they are the customer of intel and they want to keep your business, so intel will probably do jack about the NDA being breached.  I also think the NDA is technically illegal in various countries and contracts do not override law.

                            pfSense CE 2.8.1

                            • garyd9

                              @VAMike:

                              I asked for a source for the assertion "a component that is likely to fail prematurely" and you quote "slightly higher expected failure rates under certain use and time constraints". "likely to fail" != "slightly higher…failure rates [with conditions]".

                              I find it amusing that engineers are willing to blindly accept vague phrases such as "slightly higher expected failure rates…" without demanding something more specific.  This isn't directed at you, VAMike (your quote was just handy.)  It's just a general observation.  "slightly higher" can mean "0.00001% higher chance" or it can mean "40% higher chance."

                              I'm of the opinion that if it is "0.00001%", that Intel would come right out and say that in order to put their customers at ease.  The same train of thought suggests that the actual "higher chance" is much greater... at least a high enough percentage that Intel felt the need to hide the number  AND force anyone with an NDA to also hide the number.

                              Perhaps most telling is that not only is the issue costing Intel a significant enough amount of money that it has to be reported to shareholders, but also the speculation that this issue is causing delays in them ramping up the C3xxx lines.

                              Oh, and even netgate is using phrases that are so vague as to be meaningless (probably due to NDA.)  I believe someone from pfSense/Netgate posted "The majority of at-risk Netgate products will not experience this failure over their entire service lifetime."  That statement didn't include what "service lifetime" means, and uses the term "the majority."  If 49.9% of the units experience this failure, the statement would still be accurate.
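
                              The point about "the majority" can be checked mechanically. A minimal sketch, assuming "majority" is read literally as "more than half"; the function name and the sample failure fractions are illustrative, not figures from Intel or Netgate.

```python
def majority_statement_holds(failure_fraction: float) -> bool:
    """True if 'the majority of units will not fail' is literally accurate,
    i.e. strictly fewer than half of the units fail."""
    return failure_fraction < 0.5


# Illustrative fractions only; none of these are published failure rates.
for fraction in (0.0000001, 0.10, 0.499, 0.50):
    print(fraction, majority_statement_holds(fraction))
```

                              The statement stays true for every fraction below one half, so by itself it says nothing about whether the real rate is negligible or close to 50%.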

                              • VAMike

                                @garyd9:

                                @VAMike:

                                I asked for a source for the assertion "a component that is likely to fail prematurely" and you quote "slightly higher expected failure rates under certain use and time constraints". "likely to fail" != "slightly higher…failure rates [with conditions]".

                                I find it amusing that engineers are willing to blindly accept vague phrases such as "slightly higher expected failure rates…" without demanding something more specific.  This isn't directed at you, VAMike (your quote was just handy.)  It's just a general observation.  "slightly higher" can mean "0.00001% higher chance" or it can mean "40% higher chance."

                                I'm of the opinion that if it is "0.00001%", that Intel would come right out and say that in order to put their customers at ease.  The same train of thought suggests that the actual "higher chance" is much greater... at least a high enough percentage that Intel felt the need to hide the number  AND force anyone with an NDA to also hide the number.

                                Well, I'm less of a conspiracy theorist and assume that means that the actual number is highly dependent on other factors, and that a blanket figure is meaningless. But hey, maybe the illuminati really are manipulating this thing. I guess you could demand harder. You could threaten to hold your breath until you turn blue. You could complain a lot on the internet. But it seems unlikely that anything you do here will make intel release specific failure rates for your device. Accept that and plan accordingly. I intend to use my avoton gear until it reaches obsolescence or until it stops working, and I'm not losing any sleep over it. I could probably get supermicro to RMA, but I don't have any deployed as SPOFs without redundancy–so the vague possibility of failure is much less significant than the guaranteed headache of having to bring stuff down, pull it apart, and deal with shipping. If some new data comes out to suggest that failure is imminent, I'll reevaluate.

                                If you really just can't live with a known unknown, throw the thing out, buy yourself unknown unknowns, and move on with your life.

                                • garyd9

                                  @VAMike:

                                  But hey, maybe the illuminati really are manipulating this thing. I guess you could demand harder. You could threaten to hold your breath until you turn blue.

                                  Do you always respond to a different opinion with outrageous sarcasm?

                                  • VAMike

                                    @garyd9:

                                    Do you always respond to a different opinion with outrageous sarcasm?

                                    It really depends on how ridiculous people are being, and for how long, and how many times they stand the dead horse up to beat it again.

                                    • jwt Netgate

                                      @chrcoluk:

                                      I agree with the comments that netgate should stick 2 fingers up at the NDA, they are the customer of intel and they want to keep your business, so intel will probably do jack about the NDA being breached.  I also think the NDA is technically illegal in various countries and contracts do not override law.

                                      While it's true that NDAs (which are civil contracts) do not trump (criminal) law, most good NDAs (including the one in question) include provisions for notice to the disclosing party should a court compel disclosure.  This allows the disclosing party to seek a protection order prior to disclosure, and you're back to square one.

                                      That you're willing to voluntarily breach an agreement that you freely entered says something about you.

                                      • RobertLoblaw20381

                                        @luckman212:

                                        The workaround disables SERIRQ to prevent indeterminate interrupt behavior for systems that do not have external pull up resistor on SERIRQ PIN.

                                        Anyone else see this resistor on their Supermicro replacement boards?

                                        • garyd9

                                          @RobertLoblaw20381:

                                          @luckman212:

                                          The workaround disables SERIRQ to prevent indeterminate interrupt behavior for systems that do not have external pull up resistor on SERIRQ PIN.

                                          Anyone else see this resistor on their Supermicro replacement boards?

                                          If I knew what, exactly, to look for… and where... I'd look (and even take pictures.) I have a(nother) replacement board in front of me right now.

                                          • RobertLoblaw20381

                                            @garyd9:

                                            @RobertLoblaw20381:

                                            @luckman212:

                                            The workaround disables SERIRQ to prevent indeterminate interrupt behavior for systems that do not have external pull up resistor on SERIRQ PIN.

                                            Anyone else see this resistor on their Supermicro replacement boards?

                                            If I knew what, exactly, to look for… and where... I'd look (and even take pictures.) I have a(nother) replacement board in front of me right now.

                                            I too have both to look at…. The "what" is easy: pull-up resistors are those little black things all over the board. They look like this:

                                            http://www.galigear.com.au/image/cache/catalog/Misc/a3c5_35-500x500-500x500.jpg
                                            What actually gets written on it will vary, but it's a little black rectangular chip that's hand-soldered on the board. It may stand out more than other resistors since it's likely placed by hand rather than by a machine like the other resistors.

                                            The "where" is the hard part. Resistors are all over the board, so it's unclear where the pin is. I'm currently looking for a resistor that is present on the new board that isn't on the old board. It can be on the top or the bottom of the board.

                                            All that being said, if it can be located, anyone with a precision soldering iron can just put the resistor on there themselves without having to send anything back to SuperMicro.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.