Very high CPU temperature after CPU upgrade
-
Hi there,
I was looking for a low-TDP CPU to go in my Dell R210, so I replaced the stock i3-540 with a Xeon L3426 @1.87GHz, and ever since I've been seeing a very high CPU temperature:
[2.3.3-RELEASE][/root]: sysctl -a|grep "dev.cpu.*.temperature"
dev.cpu.3.temperature: 68.0C
dev.cpu.2.temperature: 66.0C
dev.cpu.1.temperature: 67.0C
dev.cpu.0.temperature: 65.0C
and it goes up to 78C, which seems unreasonably high to me for a 1.87GHz CPU. I also see these messages in dmesg, which I hadn't seen before:
[2.3.3-RELEASE][/root]: dmesg|grep -i temp
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
coretemp0: Tj(target) value 69 does not seem right.
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
coretemp1: Tj(target) value 69 does not seem right.
coretemp2: <CPU On-Die Thermal Sensors> on cpu2
coretemp2: Tj(target) value 69 does not seem right.
coretemp3: <CPU On-Die Thermal Sensors> on cpu3
coretemp3: Tj(target) value 69 does not seem right.
With the 3.06GHz i3 it was mostly 44-47C, but I was running v2.2.x back then, which is the only other difference now.
With ipmitool, most of the readings come back as Disabled, but Ambient Temp is only 13C:
[2.3.3-RELEASE][/root]: ipmitool sdr elist|grep -i temp
Temp            | 01h | ns | 3.1  | Disabled
Ambient Temp    | 0Eh | ok | 7.1  | 13 degrees C
Planar Temp     | 0Fh | ns | 7.1  | Disabled
CPU Temp Interf | 76h | ns | 7.1  | Disabled
Mem Overtemp    | 20h | ok | 34.1 | Correctable ECC
Temp            | 0Ah | ns | 8.1  | Disabled
Is it pfSense that can't read the temperature properly, or is something wrong with the CPU? Power consumption also seems a bit high to me for a 45W TDP CPU. Can anyone help?
-San
-
Have you tried updating the BIOS?
-
Have you tried updating the BIOS?
Yes. All the firmware, including the BIOS and BMC, has been updated to the latest version:
[2.3.3-RELEASE][/root]: ipmitool mc getsysinfo system_fw_version
1.10.0
-S
-
I hope you used quality thermal paste like Arctic Silver. Speaking of the BIOS:
I think you should go for OS Control, or play with Custom to get at your fans and memory with OS DBPM.
You may need to bump up the fans or tweak your memory settings. That should also solve the SpeedStep issue. ;)
That thing is running too hot. 58C is the max temp, if the source I read was right.
Double-check the temperature readout in the BIOS and test the memory; I think that's in there too.
-
@webtyro:
I hope you used quality thermal paste like Arctic Silver. Speaking of the BIOS:
I think you should go for OS Control, or play with Custom to get at your fans and memory with OS DBPM.
I used Arctic Silver only.
As for the BIOS, at the moment it's set to Custom with OS DBPM, Min Fan and Max memory. I haven't tried Max Fan, but whatever other settings I use, it never goes below 65C. I don't see any CPU temperature in the BIOS, but I'll have another look in an hour and let you know. What doesn't make sense to me is that with this same setup the i3 was just fine, even at its higher clock rate.
Thanks!
-S
-
Update the BIOS, check that the heatsink is properly mounted, use Arctic Silver.
-
Update the BIOS, check that the heatsink is properly mounted, use Arctic Silver.
:o Sunday morning, more coffee, dude. Well, 1 of 3 is not too bad. ;)
Kidding aside, I think you may be right. If it were me, I would look at it again. Same memory, lower power rating: something makes no sense. It should be cooler than the previous CPU. Sensor failure?
-
@webtyro:
Same memory, lower power rating: something makes no sense. It should be cooler than the previous CPU. Sensor failure?
Exactly, something doesn't make sense. I've checked (and re-checked) all the usual suspects and I'm running out of ideas. The thermal paste I'm using is about a year old, if that makes any difference. Also, during boot it always reports "coretemp1: Tj(target) value 69 does not seem right", regardless of the temperature. How can I find out whether something has actually failed? The CPU was bought off eBay, and I'm not sure whether I should send it back.
-San
-
It does make one wonder if the CPU is on its last legs. Being bought used off eBay makes it even more suspect.
Try it on another board if you have one.
The one thing about electronic chips and the like: too much heat is a sure sign of problems. That CPU won't last long at those temps, either.
-
If it's really running that hot, you'll burn your finger if you touch the heatsink. That might help tell you whether it's an accurate reading or not.
Check it with a meat thermometer.
The only time mine got that hot was when I was using offloading.
-
A couple of other thoughts... Did you properly clean the heatsink before reapplying the paste?
Why not try a live Linux distro and run a few tests? See how it fares with that.
Just a couple of thoughts. Good luck; that type of problem can be a real PITA!
-
The heatsink might not be sitting properly, e.g. if one screw is not tight enough. If it's sitting lopsided, that could happen.
Pull the motherboard out of the system and remount the heatsink out on the desk. Clean everything with alcohol and reapply Arctic Silver.
Could it be that the heatsink is too small for a quad core?
-
The old Intel Core i3-540 has a lithography of 32nm;
the new Xeon L3426 has a lithography of 45nm.
This means the die of the Xeon CPU is larger, and a larger die means more heat production.
More heat production means more cooling is needed to reach a lower temperature.
You can't see this difference in size, because both CPUs use the same metal cover plate over the die.
Grtz,
DeLorean
-
The old Intel Core i3-540 has a lithography of 32nm; the new Xeon L3426 has a lithography of 45nm.
I agree about the lithography: 45nm tends to generate more heat, but should it make that much of a difference? The whole point was to reduce power consumption. Even though it's a 45W TDP part, do you think I'd have been better off with the i3, taking everything into consideration? The CPU actually doesn't feel that hot at all when I touch it.
-S
-
I ran into this issue sometime last year when I was changing processors from an i5 to an i3 to save some power. After running through all the usual suspects, I started from scratch: I removed the motherboard, the CPU heatsink fan and the CPU, cleaned everything thoroughly, and blew the dust off the motherboard with air (keeping the CPU installed so that dust particles wouldn't get into the CPU connectors). I then removed the CPU and re-seated it. Since I had another CPU fan from a similar machine, I swapped it in and noticed that its heatsink fit more snugly than the earlier one. After booting up, my temps were down to the 27-31C range from 49-55C. Sometimes it's not the thermal paste but the heatsink fan clips, which may be slightly off on one side, that cause the high CPU temps.
-
The reported value is almost certainly incorrect.
The value is calculated from Tj-max and the number coming from the CPU registers. If the reported Tj-max is wrong, you will see high numbers.
Those boxes usually have pretty good cooling. If you have powerd enabled and the heatsink is mounted correctly, I'd be surprised to see a coretemp value that high.
For example:
dev.cpu.0.temperature: 32.0C
dev.cpu.0.coretemp.throttle_log: 0
dev.cpu.0.coretemp.tjmax: 98.0C
dev.cpu.0.coretemp.resolution: 1
dev.cpu.0.coretemp.delta: 66
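Steve's description can be checked against his own sample: the sensor reports how far below Tj-max the core is (the delta), so 98.0C - 66 gives the 32.0C shown. A minimal Python sketch that parses `sysctl` output in the `name: value` format shown above; the regex and the function names are illustrative assumptions, not part of any pfSense or FreeBSD tool:

```python
import re

# Sample output copied from the post above (dev.cpu.0 only).
SAMPLE = """\
dev.cpu.0.temperature: 32.0C
dev.cpu.0.coretemp.throttle_log: 0
dev.cpu.0.coretemp.tjmax: 98.0C
dev.cpu.0.coretemp.resolution: 1
dev.cpu.0.coretemp.delta: 66
"""

# ASSUMPTION: lines look like "dev.cpu.N.<name>: <number>[C]".
LINE = re.compile(r"^(dev\.cpu\.\d+\.[\w.]+):\s*([-\d.]+)C?$")


def parse_sysctl(text: str) -> dict:
    """Map each sysctl name to its numeric value, dropping the trailing 'C'."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def temperature_from_delta(tjmax: float, delta: float) -> float:
    """The sensor reports distance below Tj-max, so temperature = Tj-max - delta."""
    return tjmax - delta


vals = parse_sysctl(SAMPLE)
computed = temperature_from_delta(vals["dev.cpu.0.coretemp.tjmax"],
                                  vals["dev.cpu.0.coretemp.delta"])
print(computed, vals["dev.cpu.0.temperature"])
```

The same relation is why a wrong Tj-max shifts every reading: the delta is measured, but the Tj-max it is subtracted from is whatever the driver decided to use.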
Steve
-
Those boxes usually have pretty good cooling. If you have powerd enabled and the heatsink is mounted correctly, I'd be surprised to see a coretemp value that high.
I'm completely running out of ideas. The R210 has a screw-type CPU cooler: either it sits properly or it doesn't. I've reapplied the thermal paste a few times, and I don't believe I did something wrong every time. I've checked all the CPU and chassis fans, and they're all running okay. But whatever I do, it never goes below 58C. How/where do I check the tjmax value, or set it correctly?
It's a lovely little server and I really like that 45W TDP CPU, so I'd really like to see it working. Any idea how I can debug this further?
-S
-
The tjmax values all show as 100C:
dev.cpu.3.coretemp.tjmax: 100.0C
dev.cpu.2.coretemp.tjmax: 100.0C
dev.cpu.1.coretemp.tjmax: 100.0C
dev.cpu.0.coretemp.tjmax: 100.0C
Shouldn't it be around 70 for this Xeon?
That also reminds me of one thing: on every boot it says the Tj value of 69 is not right:
# dmesg | grep -i tj
coretemp0: Tj(target) value 69 does not seem right.
coretemp1: Tj(target) value 69 does not seem right.
coretemp2: Tj(target) value 69 does not seem right.
coretemp3: Tj(target) value 69 does not seem right.
69C seems right for a Xeon. Is it possible that pfSense is not actually recognizing it as a Xeon?
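The boot warning and the 100C tjmax are consistent with a driver that only trusts the Tj(target) value read from the CPU when it falls inside a plausible range, and otherwise keeps a default. A minimal Python sketch of that logic; the 70-110C window and the 100C fallback are assumptions based on the FreeBSD coretemp source as I recall it (verify against your version), and `choose_tjmax` is a hypothetical name, not a real API:

```python
from typing import Optional, Tuple

# ASSUMPTION: window and fallback recalled from FreeBSD's coretemp driver;
# check the source of the FreeBSD version your pfSense release is built on.
TJ_LOW, TJ_HIGH = 70, 110  # degrees C
FALLBACK_TJMAX = 100       # degrees C


def choose_tjmax(unit: int, tj_target: int) -> Tuple[int, Optional[str]]:
    """Return the Tj-max the driver would use and its boot warning, if any."""
    if TJ_LOW <= tj_target <= TJ_HIGH:
        return tj_target, None
    warning = f"coretemp{unit}: Tj(target) value {tj_target} does not seem right."
    return FALLBACK_TJMAX, warning


# The L3426 in this thread reports 69 on all four cores.
for unit in range(4):
    tjmax, warning = choose_tjmax(unit, 69)
    print(tjmax, warning)
```

If those assumptions hold, 69 misses the accepted window by a single degree, which would explain both the dmesg line and the 100C tjmax.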
-S
-
Yes, very likely, though that's usually something that is passed by the BIOS.
If it's assuming 100 for Tjmax and it's actually 70, then the temperature will read 30C too high, which seems about right.
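That arithmetic can be applied to the readings from the first post. Because coretemp reports Tj-max minus the delta from the CPU, swapping in a different Tj-max shifts every reading by the same amount. A minimal Python sketch, assuming a real Tj-max of 70C purely as a working guess (`REAL_TJMAX` and `corrected` are hypothetical names, and the actual figure for this CPU seems to be unpublished):

```python
ASSUMED_TJMAX = 100.0  # what coretemp currently uses (dev.cpu.N.coretemp.tjmax)
REAL_TJMAX = 70.0      # HYPOTHETICAL: a working guess, not a published Intel figure


def corrected(reported_c: float,
              assumed_tjmax: float = ASSUMED_TJMAX,
              real_tjmax: float = REAL_TJMAX) -> float:
    """Recover the delta the CPU reported, then re-apply it to the real Tj-max."""
    delta = assumed_tjmax - reported_c  # reported = assumed_tjmax - delta
    return real_tjmax - delta


# dev.cpu.3 .. dev.cpu.0 from the first post, plus the 78C peak.
for reading in (68.0, 66.0, 67.0, 65.0, 78.0):
    print(f"{reading:.1f}C reported -> {corrected(reading):.1f}C corrected")
```

With that guess the first post's readings would come out at 35-38C, peaking at 48C; the result is only as good as the `REAL_TJMAX` value.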
Unfortunately, there is no way of passing the correct value to coretemp as far as I know. There was a previous thread about a similar situation where I think the user ended up modifying the dashboard widget to subtract the appropriate value before displaying it.
Try to find out the real Tjmax value for your exact CPU, although for some reason it seems to be unpublished.
Steve
-
Unfortunately, there is no way of passing the correct value to coretemp as far as I know. There was a previous thread about a similar situation where I think the user ended up modifying the dashboard widget to subtract the appropriate value before displaying it.
Are you talking about this one? https://forum.pfsense.org/index.php?topic=75092.0
-S