Netgate Discussion Forum

    Intel Atom C2xxx LPC failures

    Hardware
      dopey

      @garyd9:

      Did you notice if it's a REV 1 board, or if they bumped the REV to 02?  Is there any sign of jumper wires on the board?

      I found nothing different other than the QA sticker.

      I sent an email asking for confirmation of the fix:

      Thanks for processing this.  I received the RMA on Friday.  There doesn't appear to be any distinguishing marking (rev bump or anything like that) to note whether or not the board has the platform-level workaround implemented for the Atom CPU flaw.  Is there any way to get some kind of confirmation that it actually has that workaround implemented?

      And this was the response:

      Hello
      The replacement has the issue fixed.

      Guess I just have to trust them :)

        baggar11

        Does the new replacement board have a different stepping if you check via command line?

        @dopey:

        @garyd9:

        Did you notice if it's a REV 1 board, or if they bumped the REV to 02?  Is there any sign of jumper wires on the board?

        I found nothing different other than the QA sticker.

          garyd9

          @baggar11:

          Does the new replacement board have a different stepping if you check via command line?

          I got my replacement last night, and I see NOTHING different whatsoever on the board (other than it's an obvious refurb that hasn't been handled as gently as my original).  The CPU stepping is also identical: Origin="GenuineIntel" Id=0x406d8 Family=0x6 Model=0x4d Stepping=8

          On my replacement (which was a cross-ship), I've had a few problems already.  I've had to clear CMOS a couple times to get it booting, and then it crashed (kernel crash) in the middle of booting pfsense, which then resulted in a corrupt filesystem (and we all know how poorly pfsense 2.3.x deals with that.)

          All of these issues COULD be related to the CMOS being whacked out.

          Since then, I pulled the CMOS battery, erased CMOS again (several times), reconfigured BIOS, completely reinstalled pfsense and restored a backup configuration.  So far, it doesn't seem to be doing anything bad… but it hasn't even been 24 hours since I got it working properly.
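For anyone wanting to repeat the stepping check from the command line: on pfSense (FreeBSD) the CPU line quoted above is printed at boot and is normally kept in /var/run/dmesg.boot, so something like `grep Origin /var/run/dmesg.boot` should show it. The sketch below is a hypothetical helper (not a pfSense tool) that decodes the raw `Id=` value, which is the CPUID leaf 1 signature, into the Family/Model/Stepping fields using Intel's documented rules for combining the base and extended fields. It only confirms what the boot line already says; it cannot tell you whether a board-level workaround is present.

```python
import re


def decode_cpuid(origin_line: str) -> dict:
    """Decode the Id=0x... field of a FreeBSD boot CPU line (hypothetical helper)."""
    match = re.search(r"Id=(0x[0-9a-fA-F]+)", origin_line)
    if match is None:
        raise ValueError("no Id=0x... field found in line")
    eax = int(match.group(1), 16)  # CPUID leaf 1 EAX version signature

    stepping = eax & 0xF            # bits 3:0
    model = (eax >> 4) & 0xF        # bits 7:4
    family = (eax >> 8) & 0xF       # bits 11:8
    ext_model = (eax >> 16) & 0xF   # bits 19:16
    ext_family = (eax >> 20) & 0xFF  # bits 27:20

    # Intel rules: extended family is added only when family == 0xF;
    # extended model is prepended when family is 0x6 or 0xF.
    display_family = family + ext_family if family == 0xF else family
    display_model = (ext_model << 4) + model if family in (0x6, 0xF) else model
    return {"family": display_family, "model": display_model, "stepping": stepping}


line = 'Origin="GenuineIntel"  Id=0x406d8  Family=0x6  Model=0x4d  Stepping=8'
info = decode_cpuid(line)
print(f"Family=0x{info['family']:x} Model=0x{info['model']:x} Stepping={info['stepping']}")
# prints: Family=0x6 Model=0x4d Stepping=8
```

Because the original and replacement boards both report Id=0x406d8, they carry the same silicon stepping, which is consistent with a fix applied at the board level rather than in the CPU.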

            dopey

            @baggar11:

            Does the new replacement board have a different stepping if you check via command line?

            The platform-level workaround doesn't change the CPU stepping.  I'm not sure if Intel is shipping any new silicon yet.

            @garyd9:

            On my replacement (which was a cross-ship), I've had a few problems already.  I've had to clear CMOS a couple times to get it booting, and then it crashed (kernel crash) in the middle of booting pfsense, which then resulted in a corrupt filesystem (and we all know how poorly pfsense 2.3.x deals with that.)

            All of these issues COULD be related to the CMOS being whacked out.

            Since then, I pulled the CMOS battery, erased CMOS again (several times), reconfigured BIOS, completely reinstalled pfsense and restored a backup configuration.  So far, it doesn't seem to be doing anything bad… but it hasn't even been 24 hours since I got it working properly.

            I didn't have any of those problems and mine's been in place since Friday afternoon without problems so far.

              pfcode

              My replacement is getting even worse: it keeps shutting itself down for no reason within minutes of rebooting.  I reinstalled my original board and it's working again. Obviously the replacement isn't fixed at all.

              Release: pfSense 2.4.3(amd64)
              M/B: Supermicro A1SRi-2558F
              HDD: Intel X25-M 160GB
              RAM: 2x 8GB Kingston ECC ValueRAM
              AP: Netgear R7000 (XWRT), Unifi AC Pro

                RobertLoblaw20381

                I also got a replacement from Supermicro. So far it's been good, no issues. I compared both boards side by side and see no physical differences on the board itself. I also don't see any QC sticker or anything. I did notice that they removed a sticker on the top of the LAN port (which previously had a serial number on it). I only know because I can still see some adhesive where the old sticker used to be. They replaced it with a similarly sized sticker that has the barcode, serial, and date (2/17). When I emailed them asking how I can tell the difference, I literally got the same "The replacement has the issue fixed." response. Very frustrating.

                One thing I did notice that is very different: my pfSense dashboard now says, under System:

                System Super Micro C2758
                Serial: zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz

                With my old board, it actually showed me the serial number. Now, it just shows what looks like a randomly generated UUID. I have no idea what caused that… Does anyone else have the same thing?

                  garyd9

                  @pfcode:

                  My replacement is getting even worse: it keeps shutting itself down for no reason within minutes of rebooting.  I reinstalled my original board and it's working again. Obviously the replacement isn't fixed at all.

                  Within 24 hours…  the replacement board started having NIC dropouts on all three of the I354 controllers in use.  When this happens, the switch reports that the cable is unplugged (and later plugged back in).

                  I called Supermicro and complained.  A lot.  (They wanted me to send them email.  I explained that email wasn't acceptable and wasn't going to make me go away.)  I ended up having to fill out another RMA.  I have to RMA this replacement board for another replacement board while the original RMA is left open (and the hold on my credit card is still in place).  Once I get a working board with the Intel issue supposedly resolved, I'll send back my original board and they'll release the credit card hold.

                  Damn annoying, but still better than having to wait until the original board completely fails before they'll replace it.

                  @pfcode, I'd suggest calling Supermicro's RMA department and complaining a bit...

                  Take care
                  Gary

                    VAMike

                    Yeah, so this is why I'm not rushing to replace my Supermicro Avoton gear…

                      Derelict LAYER 8 Netgate

                      @garyd9:

                      Damn annoying, but still better than having to wait until the original board completely fails before they'll replace it.

                      All boards, Clock Component or not, might fail. Yours might never fail.

                      Chattanooga, Tennessee, USA
                      A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                      DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                      Do Not Chat For Help! NO_WAN_EGRESS(TM)

                        garyd9

                        @Derelict:

                        All boards, Clock Component or not, might fail. Yours might never fail.

                        Absolutely true.  However, think of it another way:  You have a car with an expected life of 7 years.  The car manufacturer tells you that with regular maintenance and minor repairs, you can depend on the car lasting that long.

                        Then the car manufacturer discovers that one of their vendors gave them a part (which is 100% critical to the car, not easily replaced, etc, etc) that they've determined has a much "higher expected rate of failure after only 18 months."  No one tells you what "higher expected rate of failure" really means.  It might mean that after 18 months, the expected life drops to only 2 years… or perhaps 5 years.  Or, that only 1 in 50 million will have a shortened life expectancy.  You simply don't know, because the information is being purposely and deliberately hidden.

                        It's reasonable to expect that if there were only a very tiny concern, the vendor and/or car manufacturer would come out and say something like: "Don't worry about this!  The higher rate of failure only impacts one in every 5 million cars!"  That wasn't said... in fact, Intel is being VERY secretive about the whole thing (and forcing their direct customers to be very secretive as well).

                        So, being an IT-paranoid individual, you assume the worst.  (After all, you use pfSense... and that defines you as paranoid.  A non-paranoid person would just use some off-the-shelf gateway and not worry about firewalls, IPS/IDS, etc.)

                        Do you continue to drive the car, taking road trips, etc?  Or, do you first ask the car manufacturer to replace the defective part with a more reliable one?

                        Think of it in pfSense terms:  someone gives you a reliable link to a list of strongly suspected hacking IP addresses (a DNSBL).  You don't KNOW that malicious activity will come from all or any of those particular IPs.  It's even reasonable to believe that you'd get hacked from an IP not on that list before one that IS on the list.  Do you still install the DNSBL in Snort or pfBlockerNG?

                        The DNSBL is the news we have about the C2xxx chips.  Snort/pfBlockerNG is sending the board in for repair or replacement.  Any questions? ;)

                          garyd9

                          @garyd9:

                          Damn annoying, but still better than having to wait until the original board completely fails before they'll replace it.

                          Another thing…  The "replacement" board they sent me ran about 6 degrees Celsius HOTTER than my old/original board with exactly the same fans, fan speeds, location, etc.  The old/original board shows the CPU cores at around 34 degrees, and the replacement board was showing 40 degrees.  Room/ambient temperature is controlled at 72 degrees F (about 22 degrees C).

                          Well, the replacement board is removed, and packed up ready to get sent back.

                            dopey

                            @RobertLoblaw20381:

                            One thing I did notice that is very different, is now on my pfSense dashboard, under System it says:

                            System Super Micro C2758
                            Serial: zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz

                            My original board and this one both show a random UUID.  For all I know, it's the same UUID.

                            I'm feeling pretty lucky compared to everyone else here, then.  I've had no problems with my board; everything runs great, and CPU temps are right in line with what they've always been (22-25 °C).

                            And everything just works.

                              gcu_greyarea

                              Gary,

                              If you can afford it, go ahead and buy/build a new system based on another CPU platform. This entire C2000 saga is asking for trouble. You don't want a refurbished system that's been handled by only God knows whom….
                              You also don't want to sit on a time bomb.

                              Again, if you can afford it, get another production system. Then take the time required to have your C2000 repaired or refunded. It will still make a good system for testing/playing or cold standby.

                              Your purchase decision is your voting power.

                                athurdent

                                I think pfSense uses the serial number from the BIOS/DMI. If a serial is entered there, it will be displayed; otherwise you just get a generated ID. Maybe the old boards that displayed a serial simply had one entered in the BIOS.

                                BTW, my first C2758 board had to be exchanged because of heat problems when I initially bought it 2 years ago. Almost everything would SIGSEGV after a few hours of operation. So if your replacement board behaves oddly, I'd suggest getting it replaced again.

                                  doktornotor Banned

                                  @athurdent:

                                  BTW, my first C2758 board had to be exchanged because of heat problems …

                                  … and recycled for another happy customer 2 years later.  ;D :P

                                    awontroba

                                    A pervasive component failure, and the manufacturer's reaction to it, has a strong effect on the manufacturer's reputation, if nothing else with its current customers.

                                    Two personal examples.

                                    On the car analogy front.
                                    The BMW Mini 2002-2005 has a problem with its power steering pump, which intermittently fails, making the car very hard to control at any speed because of the effort needed to turn the steering wheel. At normal road speeds you suddenly lose the ability to take corners.

                                    In 2015 in the USA and Canada the regulators forced a recall. http://www.autonews.com/article/20151028/OEM11/151029806/bmw-recalls-86018-mini-cooper-models-for-power-steering-glitch

                                    In the UK, the regulator accepted BMW's statement that the problem did not affect safety. Hah! BMW would only replace the pump at a low cost if the Mini had been serviced throughout by a BMW dealer and you could deliver it to a BMW dealer with the fault present. Cost me £850 for a new pump.

                                    I will never buy a BMW car or bike again.

                                    On the Atom clock problem in FreeNAS Minis:

                                    https://support.ixsystems.com/index.php?/Knowledgebase/Article/View/289

                                    Will iXsystems replace my FreeNAS Mini motherboard under warranty if I experience this issue? What if my warranty is expired?
                                    iXsystems is proud to stand behind its products. We’re extending the warranty on all second generation FreeNAS Mini motherboards shipped before February 2017 to a total of three (3) years. Any FreeNAS Minis shipped in February and after will have our standard one year warranty and are completely free of this issue.

                                    So, my 18 month old FreeNAS Mini will be fixed if it develops this problem in the next 18 months, and my new FreeNAS Mini XL shipped in February should not suffer from it.

                                    iXsystems are taking the same stance with a recently announced problem with the BMC component (a firmware bug is wearing out the BMC flash).  https://support.ixsystems.com/index.php?/Knowledgebase/Article/View/287/60/asrock-rack-c2750d4i-bmc–watchdog-issue.

                                    I am happy with iXsystems as a supplier.

                                    Moving on to Netgate

                                    https://blog.pfsense.org/?p=2297

                                    A board level workaround has been identified for the existing production stepping of the component which resolves the issue.  This workaround is being cut into production as soon as possible after Chinese New Year.  Additionally, some of our products are able to be reworked post-production to resolve the issue.

                                    I recently bought an SG-2220 for my mother's home, and am keeping the ALIX box I used before as a backup. So I'll cross my fingers and hope that it doesn't die.

                                    I was considering replacing the APU box I use at home with an SG-2440 or SG-2860. Now, until Netgate confirm that shipping models will not suffer from the problem, I will not buy, and may well go the much cheaper barebone or self-build route.

                                      gcu_greyarea

                                      VW have a problem with their Direct Shift Gearbox (DSG) where the clutch discs wear out and the ride becomes very jerky. Related to this issue is a mechatronic box that overheats, breaks, and leaves the car without power.

                                      My car developed the dreaded gearbox shudder and jerkiness.

                                      • VW tells me there is no issue and I'm imagining it

                                      • VW installs a software update - problem goes away, but comes back soon

                                      • VW installs a new mechatronic box/controller - doesn't fix it

                                      • VW replaces the clutch as "good will" - problem is fixed

                                      • a woman dies as her VW loses power in Australia and is sandwiched between two trucks - VW acknowledge there's a problem - recall for the mechatronic controller box.

                                      • clutch issues remain

                                      • 20.000 km later my clutch dies again

                                      • another VW service station wants to charge me for the required software update

                                      • I explained that I already had the software update - no charges

                                      • VW confirms clutch out of tolerance and replaces clutch pack

                                      • 20.000 km later my clutch dies again

                                      • VW dealership blames clutch issue on driving behaviour

                                      • I explain that I know about the issue; clutch gets replaced under warranty

                                      • car has 60.000km, now third clutch

                                      I will not buy a VW again !

                                      The point is that customers value after sales care. It is equally if not more important than product features.
                                      At least VW didn't point the finger at the clutch manufacturer. I bought the car from VW and I do not care who they source their parts from.

                                      I bought my Netgate Appliance from the Netgate store. That's why Netgate has skin in the game. I bought from Netgate - not Intel.

                                        garyd9

                                        @gcu_greyarea:

                                        • 20.000 km later my clutch dies again

                                        I think I see your problem.  If you had driven 12,427 MILES instead of messing with those silly km's, you'd have had better luck with your clutch.  :P

                                        (I wonder if this relates to the thread titled "How is throughput measured?")

                                          MordyT

                                          I have been watching this thread closely since the bug was disclosed (and the thread at synology forums). Today - I'll throw my 2 cents in.
                                          I have skin in the game with both these vendors (as well as with Cisco) and was hoping to see more Cisco solutions than Netgate/Synology solutions.

                                          (Proactive replacement vs Reactive replacement)

                                          I do IT for a lot of businesses. Understand, then, that when I recommend something, I do it because I think it is the best solution for the client based on their needs. They trust me to do so, and to arm them with the facts so they can make the best decisions.

                                          I have been using pfSense for 5+ years now (started in the 1.x series). I have pfSense at home (on a NUC) and at multiple client sites (on official SG-2xxx units, on other custom builds, even Fireboxes, etc.). I probably have a 50/50 split on the hardware between store and custom. And the store hardware is always more expensive. But I buy from the store in order to support the project, for the same reason I have a Gold subscription.

                                          However, I can't jeopardize or destroy my reputation to support Netgate. When I told the client "even though it is more money, buying from the store is better since you will support the devs and get quality purpose-built hardware that should last longer than custom builds", they made that decision because I recommended it. And I recommended it because I trusted Netgate to provide what they advertised and to make it right if they didn't. A $500 firewall should last more than 18 months (or 36, or even 60; 5 years on a firewall or switch is normal in this industry, and 5 years is normal even for a PC with moving parts).

                                          I understand this isn't Netgate's fault. Netgate did what Netgate was supposed to do and Intel did not. They are at fault. But the relationship is between Intel and Netgate, not Intel and me. Just like the relationship between me and my client is mine, not Netgate's. And that is why I am holding Netgate responsible to make it right, and I hope they are doing the same with Intel (asking Intel to make it right to them). I'm going to have to make it right with my client; I doubt they will see the labor associated with a replacement as something they should pay for.

                                          So Netgate, please make this right. A 3-year warranty is nice, sure, but what happens in year 4? What happens when it fails and the client is down for days while a new one is shipped (hey HA people, the client didn't know they would need HA at the time of purchase. If they had known that there was an expected failure, they could have used that fact when deciding what to purchase)? And all the other scenarios that can and will play out. Who pays for my time to troubleshoot, install, configure, etc.? Who gets punished for the unexpected schedule change needed to replace a failed unit?

                                          You can have the best product in the world, but if your post-sales team sucks, your product will not sell well. Conversely, with a worse product, if your post-sales team makes it right, customers will come back (at least the first few times).

                                          So Netgate, please do a proactive replacement. Make Intel pay the cost of doing so (you can bet Cisco is doing that). Because if you don't, I know that I will be forced to choose between a product that makes me look bad when it fails and another product. I will also have to do the same with my NAS recommendations. And my reputation is more important to me than lining Netgate's (or anyone else's) pockets, as much as I love the project.

                                          /ball is in your court now

                                            athurdent

                                            @MordyT:

                                            What happens when it fails and the client is down for days while a new one is shipped (hey HA people, the client didn't know they would need HA at time of purchase. If they had known that there was an expected failure, they could have used that fact while determining what to purchase)

                                            So, why didn't they know? Someone should have told them. Hardware breaks, all the time, everywhere, suddenly. So there's always an expected failure.
                                            They have no spare or support contract for their device, they are down. This is not specific to some Intel issue.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.