CPU Temp - Atom D525



  • Hi all,  I have set up coretemp to display CPU temp as per this thread http://forum.pfsense.org/index.php/topic,39595.0.html .  When I run sysctl -a | grep temperature,  I get this result:

    dev.cpu.0.temperature: -1.0C
    dev.cpu.1.temperature: -1.0C
    dev.cpu.2.temperature: -1.0C
    dev.cpu.3.temperature: -1.0C

    This is clearly incorrect.  For the record, BIOS shows the CPU temp @ 28 Deg C.

    I tried mbmon but mbmon -d returns "No Hardware Monitor found!!"

    I am running an Intel Atom D525 CPU with a Jetway F99FL-525 Motherboard.  Has anyone got coretemp working properly with this CPU and mobo or similar?  Any ideas on how to fix?

    Thanks in advance.


  • Netgate Administrator

    Welcome to the forum!  :)

    Since the coretemp driver interrogates the thermal diode on the CPU directly, it shouldn't make much difference which motherboard you are using.

    Are you seeing the coretemp module loaded in the boot logs?

    Did you use the correct coretemp.ko 32 or 64 bit?

    Steve


  • Rebel Alliance Developer Netgate

    Wouldn't hurt to check/double check for a BIOS update. If coretemp is reading that temp directly and getting the wrong answer, it may be getting the incorrect info from the BIOS somewhere - it may be pulling the data from a different place than the BIOS itself shows.



  • How did you go with this ?
    I get the same readings of -1C for each core when the BIOS reports anything less than 30C, using coretemp.
    I tried turning off hyperthreading, but then coretemp shows just the 2 cores, but the same -1C.
    Adding the two core temps together when they do show a reading still doesn't add up to the temp shown in the BIOS.
    e.g. Core0 = 8C, Core1 = 11C, BIOS reports CPU temp 42C

    I have an AtomD510 in a Lanner FW-7535 and suspect it is overheating.

    PS First post here. Been playing with pfsense the last week and loving it.


  • Rebel Alliance Developer Netgate

    It's a little off, but I find it curious that the numbers are very close to their Fahrenheit equivalents.

    -1C ~= 30F

    Almost like coretemp thinks that the BIOS value is in F, so it converts it to C.
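
    To make the arithmetic behind that guess concrete, here is a minimal sketch of the Fahrenheit-to-Celsius conversion. The function name is made up for illustration and nothing here is taken from coretemp itself; it only shows what you would get if a BIOS Celsius figure were mistakenly treated as Fahrenheit.

```c
/*
 * Hypothetical helper (not part of coretemp): convert a value that is
 * assumed to be in degrees Fahrenheit into degrees Celsius.
 */
double fahrenheit_to_celsius(double f)
{
    return (f - 32.0) * 5.0 / 9.0;
}
```

    Under that assumption a BIOS reading of 30 would come out at roughly -1.1, and a reading of 42 at roughly 5.6, which is in the same region as the numbers reported above but not an exact match.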



  • You could be onto something there.
    Maybe that's why the values only go to -1C. That would be a reasonable assumption to make by a programmer that a running CPU would never go below -1F.



  • Absolutely right. I had a quick look at the values I had written down, and they work out as being read from the BIOS as F and being recalculated and displayed as C.
    I had even worked out that when it was showing -1 from coretemp, the BIOS was at 32 or lower.
    Hit me with a brick and I wouldn't notice sometimes  :D


  • Rebel Alliance Developer Netgate

    Are there any options in the BIOS that might alter the F/C display? I'm curious if that would affect the output of coretemp



  • A bit more info.
    I just jumped into the BIOS to see if I could make it report in F and can't but in the BIOS the temperature string is reported as "30 C/86 F" with a degree symbol before the C and F.
    I'd reckon that coretemp is reading the first value and seeing the F on the end of the string, and calculating, in this case 30F to -1C.

    Bed time now, thanks again for the insight.


  • Netgate Administrator

    This seems unlikely to be a units problem to me. In fact in this thread we have two users who are both seeing -1 reported by coretemp and 28 and 30 in the BIOS. Could (must in fact) be at different times though.

    Are either of you seeing something like this at bootup:

    
    coretemp0: <CPU On-Die Thermal Sensors> on cpu0
    coretemp0: Can not get Tj(target) from your CPU, using 100C.
    

    Seems to be a possible bug with the Atom D525.

    Are you using 64bit or 32?

    Steve

    Edit: Failing to detect the Tj value (the maximum junction temperature for the CPU) should not cause this behaviour, though it should be detected. And in fact this message wasn't introduced until 8.2, so forget that!


  • Netgate Administrator

    Looking at the code for coretemp, here is the interesting part:

    
    /*
     * Bit 31 contains "Reading valid"
     */
    if (((msr >> 31) & 0x1) == 1) {
        /*
         * Starting on bit 16 and ending on bit 22.
         */
        temp = sc->sc_tjmax - ((msr >> 16) & 0x7f);
    } else
        temp = -1;
    
    

    So a temperature reading of -1 indicates an invalid reading in the CPU MSR. Not sure what can be done about that. Are the MSRs setup by the BIOS?
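
    As a rough illustration of that logic outside the kernel, the same decode can be written as a small standalone function. This is only a sketch: the function name and the idea of passing the raw register value and Tj(max) as plain arguments are assumptions for the example, not how the driver is structured.

```c
#include <stdint.h>

/*
 * Hypothetical standalone version of the decode quoted above.
 * msr   : raw 64-bit value read from the thermal status MSR
 * tjmax : maximum junction temperature in degrees C (e.g. 100)
 * Returns the temperature in degrees C, or -1 if bit 31
 * ("Reading valid") is not set.
 */
int decode_core_temp(uint64_t msr, int tjmax)
{
    if (((msr >> 31) & 0x1) == 1) {
        /* Bits 16..22 hold the distance below Tj(max). */
        return tjmax - (int)((msr >> 16) & 0x7f);
    }
    return -1;
}
```

    With tjmax at 100, a register value that has bit 31 set and 63 in bits 16 to 22 decodes to 37, while any value with bit 31 clear decodes to -1 whatever the other bits contain.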

    Steve



  • While looking into monitoring for something else I found this on the lm-sensors site regarding coretemp.
    I'm not real good at working out what the C source code is doing, but does this sound possible ?

    From http://www.lm-sensors.org/wiki/FAQ/Chapter3#coretempreturnsunrealisticvalues

    coretemp returns unrealistic values

    The temperature value returned by the coretemp driver isn't absolute. It's a thermal margin from the critical limit, and the greater the margin, the worse the accuracy. It isn't really returning degrees Celsius. At high temperatures, the (small) thermal margin is almost expressed in degrees Celsius, but at low temperature, the (high) thermal margin is no longer expressed in actual degrees Celsius.

    So, if the temperature value reported by coretemp is unrealistically low, all it means is that you are far away from the critical limit so your systems are running totally fine and cool and you don't have to worry at all. Unfortunately, there is no way to improve the readings, this is a hardware limitation.

    Additionally, the critical limit value may be wrong on some CPU models. We may be able to address this problem over time, but again it's not really a problem in the first place. All that really matters is how far the measurement is from that limit. If the difference is above 40 pseudo degrees Celsius (again these are not real degrees Celsius!) then you're safe.


  • Netgate Administrator

    Hmm, interesting.

    That could well be true. It seems a little disappointing that Intel wouldn't correctly calibrate the thermal diode though.
    In fact the value returned as a sysctl is expressed in tenths of a kelvin but converted back to °C by the sysctl(8) program when you interrogate it. Hence the coretemp driver having this:

    
    temp = coretemp_get_temp(dev) * 10 + TZ_ZEROC;
    
    

    The actual value in °C is calculated by subtracting the register value (returned by the diode) from the maximum junction temperature:

    
    temp = sc->sc_tjmax - ((msr >> 16) & 0x7f);
    
    

    Hence if the junction value is incorrect it will return incorrect temperatures.
    The maximum junction temp appears to be 100°C for the D525 so it's probably correct.

    However if it is returning -1 I think it's safe to say that the 'valid reading' bit has not been set. Why that might be I'm not sure.
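
    A small sketch of why an invalid reading ends up printed as exactly -1.0C: the `* 10` in the driver line above suggests the sysctl stores tenths of a kelvin, and the tool converts back for display. The constant and function names below are invented for the example, and the exact zero offset used by the driver's TZ_ZEROC is an assumption here.

```c
/* Assumed value of 0 degrees C in tenths of a kelvin (the driver's TZ_ZEROC may differ slightly). */
#define ZERO_C_DECIKELVIN 2732

/* Hypothetical mirror of the driver side: degrees C to tenths of a kelvin. */
int celsius_to_decikelvin(int temp_c)
{
    return temp_c * 10 + ZERO_C_DECIKELVIN;
}

/* Hypothetical mirror of the display side: tenths of a kelvin back to degrees C. */
double decikelvin_to_celsius(int dk)
{
    return (dk - ZERO_C_DECIKELVIN) / 10.0;
}
```

    The -1 sentinel goes through the same conversion as a real temperature, so it becomes 2722 on the way out and -1.0 on the way back, which would match the output in the first post.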

    Steve



  • @wayner:

    I am running an Intel Atom D525 CPU with a Jetway F99FL-525 Motherboard.  Has anyone got coretemp working properly with this CPU and mobo or similar?  Any ideas on how to fix?

    I'm using this board with Bios release 2, and the temperatures show correctly, as per the instructions in the thread you linked to:

    
    sysctl -a | grep temperature
    dev.cpu.0.temperature: 37.0C
    dev.cpu.1.temperature: 38.0C
    
    
