Netgate 7100 freezes when temperature above 50°C
-
Okay I'm back with new thermal paste and correctly aligned heatsinks.
I ran a CPU benchmark for 45 minutes. It started out with a CPU temp of 40°C.
For the first half it ran on default settings at 100% CPU. The temperature went up to 58°C without crashes. So the orientation did improve things greatly, which also adds to my suspicion that a problem in the production process was to blame for my freezing problems.
I also tested running the "smart fan control" script which reduced the temperatures back to ~42°C even at 100% load. Here's the graph:
I will check the remaining 3 firewalls at my clients to see if the heatsinks are aligned wrong there too.
-
I run that script here and have never seen any issues with it. However, there is a risk that if it crashes or is otherwise killed for any reason, the fans will just remain at whatever speed they are running. That means if it was under very light load at the time, the fan speed may then be insufficient at higher loads.
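A minimal sketch of one way to guard against that failure mode, not the actual script: wrap the control loop so the fans are pushed to a safe speed whenever the loop exits. The names `run_with_failsafe`, `control_step` and `set_duty`, and the full-speed value of 255, are assumptions for illustration only.

```python
import signal
import time

FAILSAFE_DUTY = 255  # assumption: the value that means "full speed"


def _raise_exit(signum, frame):
    # Turn SIGTERM into an exception so the finally block below still runs.
    raise SystemExit(f"terminated by signal {signum}")


def run_with_failsafe(control_step, set_duty, failsafe_duty=FAILSAFE_DUTY,
                      interval=5.0, max_steps=None):
    """Call control_step() repeatedly and apply the duty value it returns.

    control_step and set_duty are hypothetical callables standing in for
    however the real script reads temperatures and drives the fans.
    """
    try:
        signal.signal(signal.SIGTERM, _raise_exit)
    except ValueError:
        # Signal handlers can only be installed from the main thread.
        pass
    steps = 0
    try:
        while max_steps is None or steps < max_steps:
            set_duty(control_step())
            steps += 1
            time.sleep(interval)
    finally:
        # Runs on normal exit, on an exception, and on SIGTERM.
        set_duty(failsafe_duty)
```

This only covers crashes and a normal terminate signal. A SIGKILL or a power-level hang cannot be caught from inside the process, so the risk described above is reduced rather than removed.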
As an alternative we do have a script that resets the lookup tables in the fan controller but leaves the controller in charge, independent of the CPU. It's not as good as actually maintaining the temperature though, since it still relies on the board sensors.
Do you have a temperature value for the CPU without the script running? Just to compare before and after remounting the heatsink?
Steve
-
@stephenw10 In my last screenshot, the part where the temperature rises was without the script.
I started the script between 19:25 and 19:33, and you can see it working and the temperature falling. The script works perfectly (well, except for the errors you get when the script tries to spin up the fans over a value of 256).
I don't really have a "before" benchmark with CPU load, but I will try to do one at another customer's location soon.
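The over-range error mentioned above could probably be avoided by clamping the requested fan value before it is written. A rough sketch, assuming the fan controller accepts an 8-bit duty value (0 to 255); that range, the function name `temp_to_duty`, and all thresholds here are assumptions, not taken from the actual script:

```python
PWM_MAX = 255  # assumption: 8-bit duty register, so 255 is the highest valid value


def temp_to_duty(temp_c, low_c=40.0, high_c=60.0, min_duty=64):
    """Map a CPU temperature to a fan duty value, clamped to a valid range.

    Linear ramp from min_duty at low_c up to PWM_MAX at high_c.
    All thresholds are illustrative defaults only.
    """
    span = high_c - low_c
    raw = min_duty + (temp_c - low_c) / span * (PWM_MAX - min_duty)
    # Clamp so the result never leaves [min_duty, PWM_MAX],
    # no matter how hot or cold the reading is.
    return max(min_duty, min(PWM_MAX, int(round(raw))))
```

With the clamp in place, a reading far above `high_c` just requests full speed instead of an out-of-range value.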
-
Ok great. I'd guess it's not dramatically different. Those numbers seem to indicate that.
-
I wanted to give an update on the matter.
I have opened and checked all of the Netgate boxes of my clients and my suspicions were correct. All clients who have experienced outages and random crashes indeed had the heatsinks mounted in the wrong direction (against the airflow), and all clients who had no problems had them in the correct orientation (same as in the pictures in the official documentation).
After fixing the heatsinks I had no more crashes, even with forced heat and burn-in tests. So this really was the cause of the crashes.
@stephenw10, please talk with your QA people about this.
Best wishes from Austria
-
I will do. Thanks for checking that.