WG x750e - automatic speed adjustment: mbmon going crazy



  • Hello all,

    I wrote a script to automatically adjust the fan speed depending on the CPU temperature:

    #!/bin/sh
    # Damien P - 26-May-2013
    # Usage auto_speed.sh [interval in seconds]
    
    #
    # Prerequisites:
    # Install of mbmon package:
    #       pkg_add -r ftp://ftp-archive.freebsd.org/pub/FreeBSD-Archive/ports/i386/packages-8.1-release/Latest/mbmon.tbz
    # Install of cpuburn to load CPU and test program
    #       pkg_add -r ftp://ftp-archive.freebsd.org/pub/FreeBSD-Archive/ports/i386/packages-8.1-release/Latest/cpuburn.tbz
    # Validate packages:
    #       rehash
    # To run the burn test:
    #       burnP6 &
    
    # Default check interval in seconds
    DefaultInterval=10
    
    # First temperature threshold and associated fan speed setting
    Temp1=35
    Fan1=20
    
    # Second temperature threshold and associated fan speed setting
    Temp2=40
    Fan2=28
    
    # Third temperature threshold and associated fan speed setting
    Temp3=45
    Fan3=35
    
    # Last temperature threshold and associated fan speed setting
    Temp4=60
    Fan4=FF
    
    # Check interval in seconds
    CheckInterval=${1:-$DefaultInterval}
    
    if ! [ -z "`echo $CheckInterval | tr -d '0-9'`" ]; then
            CheckInterval=$DefaultInterval
            echo "$1 is not a positive integer, using default value [$DefaultInterval]" 1>&2
    fi
    
    # Enter infinite loop
    while [ true ]; do
            # Get CPU temperature (in celsius)
            CpuTemp=`/usr/local/bin/mbmon -I -i -c1 -T2`
    
            if [ $CpuTemp -le $Temp1 ]; then
                    FanSpeed=$Fan1
            elif [ $CpuTemp -le $Temp2 ]; then
                    FanSpeed=$Fan2
            elif [ $CpuTemp -le $Temp3 ]; then
                    FanSpeed=$Fan3
            else
                    FanSpeed=$Fan4
            fi
            /usr/local/bin/WGXepc -f $FanSpeed
            sleep $CheckInterval
    done
    
    

    It works perfectly well for a couple of days until mbmon (see code for download location) starts showing a CPU temperature of 88 C:
    $ /usr/local/bin/mbmon -I -i -c1 -T2
    88
    From there, the fan speed is set to its maximum (really noisy  >:() and, although the CPU is almost idle and the air coming out of the box is really cool, mbmon still indicates 88 C.
    A reboot solves the issue but I do not want to have to do it every two days.

    Has anybody encountered the same issue?


    Damien


  • Netgate Administrator

    Hmm, well I doubt anyone else has experienced that exactly since you're the first to try it! ;)
    Anyway looks like interesting stuff. So you're saying that after ~2 days mbmon no longer reports the correct temperature value. Is that 2 days uptime? 2 days running the script? Does mbmon report the other values correctly?

    Do you end up with a large number of processes running or anything else odd?

    Since mbmon is just reading the SuperIO chip's values, you could try installing superiotool and reading the values with that instead. The output is not really readable in any useful way (without the datasheet) but you could see if it's a problem with the sensor rather than mbmon.

    Steve



  • Thanks for your quick and thoughtful reply :D
    @stephenw10:

    So you're saying that after ~2 days mbmon no longer reports the correct temperature value. Is that 2 days uptime? 2 days running the script?

    It has in fact been running for 6 days now and the auto_speed.sh script starts upon boot time.

    @stephenw10:

    Does mbmon report the other values correctly?

    I do not know. How does this look?

    $ /usr/local/bin/mbmon -I -c1

    Temp.= 25.0, 88.0, 67.0; Rot.=    0,    0,    0
    Vcore = 0.96, 2.26; Volt. = 3.41, 5.16, 12.59, -12.45, -2.18

    @stephenw10:

    Do you end up with a large number of processes running or anything else odd?

    This was my first thought too but the system is running smoothly without any duplicate processes and with a very low CPU load:

    last pid: 61328;  load averages:  0.01,  0.02,  0.00  up 6+03:56:54  15:41:26
    49 processes:  1 running, 48 sleeping
    CPU:  0.7% user,  1.1% nice,  2.6% system,  1.5% interrupt, 94.1% idle
    Mem: 45M Active, 17M Inact, 76M Wired, 120K Cache, 59M Buf, 1849M Free

    @stephenw10:

    Since mbmon is just reading the SuperIO chip's values, you could try installing superiotool and reading the values with that instead. The output is not really readable in any useful way (without the datasheet) but you could see if it's a problem with the sensor rather than mbmon.

    Thanks for the tip. However, I have a hard time understanding the output (mbmon output is included below, but I could not find anything matching it):

    $ superiotool
    superiotool r
    Found Winbond W83627HF/F/HG/G (id=0x52, rev=0x41) at 0x2e

    $  ( superiotool -d ; mbmon -I -c1 ) | tail -7

    LDN 0x0b (Hardware monitor)
    idx 30 60 61 70 f0
    val 01 02 90 00 00
    def 00 00 00 00 00

    Temp.= 25.0, 88.0, 67.0; Rot.=    0,    0,    0
    Vcore = 0.96, 2.26; Volt. = 3.41, 5.16, 12.59, -12.45, -2.18

    Here is what I got after a reboot:

    $ ( superiotool -d ; mbmon -I -c1 ) | tail -7
    LDN 0x0b (Hardware monitor)
    idx 30 60 61 70 f0
    val 01 02 90 00 00
    def 00 00 00 00 00

    Temp.= 25.0, 30.5,  3.0; Rot.= 675000, 225000, 11842
    Vcore = 0.96, 2.16; Volt. = 3.41, 5.16, 12.59, -12.45, -2.18

    superiotool output is exactly the same, unlike mbmon's.


    Damien


  • Netgate Administrator

    Ah, interesting, so mbmon, after some time or number of calls, stops reporting both the temperature and the fan speeds. I assume the fans were actually running at that point. You have seen how the fan speed cannot be read correctly by the SuperIO chip at speeds below ~BB. Since at that time they were running at FF they should have been correctly displayed. Potentially I could imagine a scenario where the fan speed reading goes into 7 figures and crashes something. Given enough readings this might happen.

    I don't have the SuperIO data sheet here but it's easily available. I remember reading through it for hours to work out what each register did and how to change it. Anyway, I believe you need to run the extended dump to see all the registers, including the fan speed and temperature output. One word of caution here, as it caught me out: on the XTM5 box, which has a different SuperIO chip, running the extended dump somehow locked the chip such that after running it I was no longer able to change any register. I don't remember that ever happening on the X-e box and since you're only reading values it shouldn't be an issue anyway.

    Steve



  • Adding the extended dump on my x750e only outputs one more useless line:
    Hardware monitor (0x0295)
    I guess I'll have to stick to a fixed fan speed and forget about having it automated.

    Damien


  • Netgate Administrator

    Hmm, that's no good then. Don't give up yet; automatic fan speed control would be great.  :)
    Let me fire up my X750e and see if I can replicate your results.

    Steve


  • Netgate Administrator

    Ok, I checked my box here and yes you're right superiotool does not show the relevant info for some reason. It works fine with some other chips.
    However it is the SuperIO chip that mbmon is reading and it is possible to read it directly, but it's not that straightforward.

    To read a value from the chip you set its index register to the required value and then read the value from the data register. In this case (and almost every case) those are at I/O space addresses 0x295 and 0x296 respectively. To do this when I was initially looking for the LED control I "wrote" (I use that term very loosely!) two programs, readio and writeio, which can be found at my Google site. These can be used to write to any address in I/O space so can obviously cause problems. There is no sanity checking or anything, you've been warned!  ;)

    Using those we can read values. So for example the 8bit temperature value from sensor 1 is at control register CR27 (the one value SuperIOtool skips past  :-\ ):

    [2.0.3-RELEASE][root@testbox.localdomain]/conf(347): ./writeio 0x295 0x27
    Setting 295 to 27
    [2.0.3-RELEASE][root@testbox.localdomain]/conf(348): ./readio 0x296
    Reading 296 :19
    
    

    It's hex value 0x19, 25 degrees C. We know that's correct from mbmon.
    Unfortunately sensor 1 is almost certainly not connected. Sensor 2 is the only one returning useful data (though I did play with sensor 3 a bit with interesting results). The data values for Sensor 2 are in 'Bank 1' which can only be accessed by first setting the 'bank select' register, 0x4e, to 1. So:

    [2.0.3-RELEASE][root@testbox.localdomain]/conf(349): ./writeio 0x295 0x4e
    Setting 295 to 4e
    [2.0.3-RELEASE][root@testbox.localdomain]/conf(350): ./writeio 0x296 0x01
    Setting 296 to 1
    [2.0.3-RELEASE][root@testbox.localdomain]/conf(351): ./writeio 0x295 0x50
    Setting 295 to 50
    [2.0.3-RELEASE][root@testbox.localdomain]/conf(352): ./readio 0x296
    Reading 296 :28
    
    

    We finally get the value 0x28, which is 40C. About what we'd expect. In fact the value is a 9-bit number, with the 9th bit offering 0.5C resolution, but these 8 MSBs are probably fine here (also it's stored as two's complement, so if the numbers go negative things get complicated.  ;)).
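    A minimal sketch of that conversion in sh, assuming the byte is passed as the bare hex digits that readio prints (for example 28). The function name hex_to_celsius is hypothetical, and the 0.5C ninth bit is ignored:

```shell
#!/bin/sh
# Convert an 8-bit hex reading (e.g. "28") to whole degrees Celsius.
# Values with the top bit set are treated as two's complement negatives.
# hex_to_celsius is an illustrative name, not part of readio or WGXepc.
hex_to_celsius() {
    raw=$(printf '%d' "0x$1")
    if [ "$raw" -ge 128 ]; then
        raw=$((raw - 256))
    fi
    echo "$raw"
}
```

    For example, hex_to_celsius 28 prints 40 and hex_to_celsius 19 prints 25.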

    If you don't set the bank select back to 0 mbmon will get confused the first time you run it but runs fine second time around.

    It would be interesting to give this a go when mbmon has freaked out to see if the SuperIO chip itself is still giving useful data. It would be easy enough to incorporate this into WGXepc if it does.

    Steve



  • I'm not too sure why you set register 0x295 to 0x50 once 0x296 has been properly set to 0x01 ?

    Nevertheless, I could confirm that executing these four commands returned the same value as mbmon, both when the CPU is idle and when it is supercharged with burnP6.
    I restarted my automated script and will patiently wait for it to go crazy again :P

    Thanks for your detailed explanation and time!

    Damien


  • Netgate Administrator

    @bigramon:

    I'm not too sure why you set register 0x295 to 0x50 once 0x296 has been properly set to 0x01 ?

    Because in order to read the actual temperature value, which is at register 50 on Bank1, you need to point the index register, 0x295, at it. It was previously still pointed to 0x4e, the bank select register.

    The first time I read through the data sheet and tried to comprehend how this all worked my mind nearly melted.  :D It still took a while the second time around. One thing to be aware of is that in order to save processing time/instructions many of the registers automatically increment the index register. This helps if you want to read in a series of values, say 0x20 to 0x2f, you just set the index to 0x20 and then read the data register. That automatically increments the index by 1 so now you just read the data register again and it will be the value of 0x21 etc. This means that if you set the Bank to 1, set the index to 0x50 and read the value at the data register you cannot just read the data again to see it has changed, the index register will now be pointing at 0x51. You have to set the index to 0x50 every time.
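    The sequence above can be wrapped in a small sh function so that the bank and index are set on every read. This is only a sketch under assumptions: the WRITEIO and READIO variables and the function name are hypothetical, readio is assumed to print a line like "Reading 296 :28", and writing 0x00 to the bank select register afterwards is assumed to be enough to keep mbmon happy:

```shell
#!/bin/sh
# Locations of the readio/writeio tools (assumed; adjust as needed).
WRITEIO=${WRITEIO:-./writeio}
READIO=${READIO:-./readio}

# Read sensor 2 (Bank 1, register 0x50) and print the raw hex byte.
# The index is rewritten on every call because a read may auto-increment it.
read_sensor2_hex() {
    "$WRITEIO" 0x295 0x4e > /dev/null   # point index at the bank select register
    "$WRITEIO" 0x296 0x01 > /dev/null   # select Bank 1
    "$WRITEIO" 0x295 0x50 > /dev/null   # point index at the temperature register
    value=$("$READIO" 0x296 |
        sed -n 's/^Reading .*:\([0-9a-fA-F][0-9a-fA-F]*\).*$/\1/p')
    "$WRITEIO" 0x295 0x4e > /dev/null   # point index back at bank select
    "$WRITEIO" 0x296 0x00 > /dev/null   # and return to Bank 0
    echo "$value"
}
```

    Like the tools themselves, this does no sanity checking and needs root to touch I/O space.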

    It's interesting that mbmon expects to find the bank select register at 0 and when it's not it has a hard time, perhaps a clue? Something else interacting with the SuperIO chip is leaving it in a state that confuses mbmon perhaps.

    As an aside I notice your values for sensor 3 are different from mine. Is 67C a common reading? My own shows mostly 127 with occasional 0 and even more occasional 2.0. Have you changed the CPU? Interestingly, reconfiguring it as a thermistor instead of a thermal diode produced much more reasonable values, though still clearly not real temperatures.

    Steve



  • Ah, it's (somewhat) clearer now  :)

    I did not have a look at the mbmon code yet so I do not know why it behaves like this.
    I can't wait though to see whether your trick with the registers will still work when mbmon doesn't anymore !

    I indeed invested massively ($10 including shipping) to replace the original Celeron 1.3GHz:
    Intel(R) Pentium(R) M processor 2.00GHz
    Current: 225 MHz, Max: 1500 MHz

    More horsepower and less electricity/heat  ;D

    Damien



    I added some detailed logging to understand what is going on and, after 11 days, mbmon was only sporadically returning incoherent values (around once a day, whereas measurements are made every 10 seconds).
    Maybe the multiple register manipulations helped  ???

    At this stage, I'd say the automatic fan speed adjustment is working !

    Thanks Steve for your input  ;D

    Damien


  • Netgate Administrator

    @bigramon:

    Maybe the multiple register manipulations helped  ???

    Plausible but sort of disappointing. Nothing to say it won't happen again or start doing something different.

    Steve



  • Well, time will tell. At some point, I will reboot the x750e and with the level of logging I have in place now, it won't escape me.

    Damien


  • Netgate Administrator

    I added the temperature reading to WGXepc so you can try that if mbmon starts acting up again. Though you'd have to parse its output for the value in your script somehow.
    http://forum.pfsense.org/index.php/topic,32013.msg346092.html#msg346092
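
    One way to do that parsing in sh, as a rough sketch: it assumes the output of WGXepc -t ends with the temperature alone on a line of digits, as in the sample output shown further down this thread. The function name is hypothetical:

```shell
#!/bin/sh
# Extract the temperature from `WGXepc -t` style output read on stdin.
# Assumes the value is the last line that consists only of digits.
# parse_wgxepc_temp is an illustrative name, not part of WGXepc.
parse_wgxepc_temp() {
    grep -E '^[0-9]+$' | tail -n 1
}
```

    In the script this could replace the mbmon call, for example CpuTemp=`/usr/local/bin/WGXepc -t | parse_wgxepc_temp`. An empty result should be treated as a failed read rather than compared against the thresholds.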

    Steve



  • This time, mbmon started producing bogus data (CPU temp around 9°C) after 24 days of uptime.
    Unfortunately, the new version of WGXepc provides the same measure.

    Let me know if you want me to try something else, otherwise, I'll just reboot the machine.

    Damien



  • So, is auto fan speed control daemon showing up in the next release as an option click-to-enable?


  • Netgate Administrator

    This script is specifically for the Watchguard firebox X-e boxes so it's very unlikely to be in a pfSense release.  Even if it were more generic it probably wouldn't ever be included as more than a package. Any script that can control the fan speeds in a box has the potential to cause damage by overheating the CPU. Usually in something with thermal fan control, like a laptop, the control is handled directly by the SuperIO chip such that it will continue to cool correctly even if the OS crashes. Unfortunately the SuperIO chip in the X-e box doesn't support this.

    @bigramon
    So reading the temperature with WGXepc gives the same value as mbmon. That implies the SuperIO chip is actually reporting the wrong value. Why could that be? It could be that the temperature offset register has been set somehow or that it's reading the wrong register for some reason. We could try investigating that but it will be quite involved. You could try using WGXepc in your script instead. Since it only reads one register (whereas mbmon reads all registers every time it's run) it may make a difference.

    Steve



  • Unfortunately, it did not work much better. After 2 weeks, I started getting temperatures of 255°C.
    When I tried mbmon from the command line, it reported an error stating it could not access the hardware.
    Everything went back to normal after a reboot.

    Damien


  • Netgate Administrator

    Ah, so what were you using in the script that didn't work? I forget where we left off.  ::)

    Steve



    Something goes wrong after a short while:

    • WGXepc does not properly set the fan speed anymore
    • WGXepc does  not report the proper temperature anymore
    • mbmon does not execute properly anymore.
    
    $ /usr/local/bin/WGXepc -f 50
    Found Firebox X-E
    Fanspeed set to 50
    
    $ /usr/local/bin/WGXepc -f
    Found Firebox X-E
    Fanspeed is ff
    
    $ /usr/local/bin/WGXepc -t
    Found Firebox X-E
    SuperIO sensor 2 reads:
    255
    
    $ /usr/local/bin/mbmon -I -i -c1 -T2
    No ISA-IO HWM available!!
    InitMBInfo: Unknown error: 0
    
    

    At this stage, I guess the best option would be to modify the WGXepc source code to allow it to keep executing in the background and set the fan speed depending on the temperature that is read.

    Damien


  • Netgate Administrator

    Ok, but was your script using mbmon or WGXepc to read the temperature? The difference between them is that mbmon reads a whole load of values every time it's run even if you only need one. To get all those values requires setting the SuperIO chip in various modes. It's possible that under certain conditions mbmon leaves the SuperIO chip in some error state. It may be possible to determine what the error state is and recover from it or to avoid it in the first place.

    Steve



  • My script was using only WGXepc.

    Damien


  • Netgate Administrator

    Damn!
    Well the only thing to do then is try to find out why the SuperIO chip is no longer responding usefully. Quite how to do that though….  ::)

    Steve



  • Did you guys ever resolve this? I have exactly the same problem. Running mbmon returns reasonable temps for a while and suddenly starts reporting this:

    Temp.= 255.0,  0.0,  0.0; Rot.=    0,    0,    0
    Vcore = 4.08, 4.08; Volt. = 4.08, 6.85, 15.50,  6.07,  5.11

    When I cancel mbmon and run it again, I get:

    ioctl(smb0:open): No such file or directory
    No Hardware Monitor found!!
    InitMBInfo: Bad file descriptor

    WGXepc -t reports good temps and then nothing but 255.
    Only way to fix this is to reboot.
    I'm assuming the SuperIO chip got itself hosed somehow. Is there any way to reset or reboot the chip?

    This is on two X-Core 550e boxes and the default SL6N7 Banias chips have been replaced with SL7EP Dothan chips according to https://doc.pfsense.org/index.php/PfSense_on_Watchguard_Firebox#Further_Enhancements_3.
    All dip switches set correctly. No other changes (powerd/speedstep) were made.


  • Netgate Administrator

    No, or at least I haven't heard from anyone who did. However reading back through the data sheet there appear to be a number of possible things we could try. Since the chip is just giving results that are registers full of all 1s we don't know if it's actually returning anything or if we're even talking to it properly. Though it would seem likely we are because under mbmon the voltage readings continue to come back as reasonable numbers.
    As a rather extreme option it looks like there is a register that can re-initialise the chip, back to its power-on defaults. However I've no way of knowing what registers are configured by the BIOS at boot so the results could be…. unpredictable.  ;)

    Steve



  • Thanks for responding, Steve. I posted a new topic because this one was old and I wasn't sure it was still active. I'm happy to stay in this one.

    This has happened on two boxes but I have another that works fine. I suppose it's possible that two of the Dothan chips I put in these boxes are creating this problem but it seems unlikely it would be two. I'm going to replace the replacement in one of them with another chip and wait for results.

    Is the data sheet you spoke of available electronically? Where can I get it?


  • Netgate Administrator

    Yes, it's available in many places such as here.

    Interacting with the chip manually for test purposes is a PITA.  ::) It involves writing many individual registers to read one value. For example:
    https://forum.pfsense.org/index.php?topic=43574.msg261279#msg261279
    The reset register will not require so much though. It would be interesting to install superiotool to see what has stopped and what is still readable when the chip enters its uncooperative state. We may get a clue.

    Steve



  • Thanks, Steve.
    I replaced the chip and have been running and watching mbmon for an hour. No problems. I guess it's possible I had two bad chips with the same symptoms. If the problem shows up again, I'll try your idea with superiotool. Until then, I'm moving on.
    Thanks, again.


  • Netgate Administrator

    Hard to see how changing the CPU could make much difference. A change in the temperature sensor perhaps?
    How that would affect the fan control though.

    Steve



  • Beats the crap out of me. Been watching mbmon for 2 1/2 hours and still working great.
    I noticed that when I took out the other chip, I had smeared on a lot of paste (Arctic Silver). I was much more conservative with the new chip.
    Could that have anything to do with this?



  • Spoke too soon. It took 3 hours but mbmon started showing this:

    Temp.= 255.0,  0.0,  0.0; Rot.=    0,    0,    0
    Vcore = 4.08, 4.08; Volt. = 4.08, 6.85, 15.50,   6.07,  5.11
    

    I guess I'll look into resetting the chip as you suggested.


  • Netgate Administrator

    Are you running bios B8? If so you can also get the temperature via ACPI on the dashboard or sysctl. Does that fail also?

    Steve



  • Looks like I'm running B6. Temps not available through ACPI.
    The superio chip should be a Winbond, right? On other pictures of x550e boards, you can clearly see the Winbond logo. On this machine, the chip is covered with a Phoenix Technologies sticker. How can I determine what chip I have?



  • This is what I get if I cancel mbmon and restart it:

    [2.1.5-RELEASE]/root(50): mbmon -d -S
    SMBus[Intel8XX(ICH/ICH2/ICH3/ICH4/ICH5/ICH6)] found, but No HWM available on it!!
    InitMBInfo: Device not configured
    

  • Netgate Administrator

    They all have the Phoenix sticker on the Winbond chip, nothing unusual there. Superiotool will identify the chip.

    Steve



  • Ok, here's the superiotool dump before the problem:

    superiotool r4.0-2827-g1a00cf0
    Found Winbond W83627HF/F/HG/G (id=0x52, rev=0x41) at 0x2e
    Register dump:
    idx 02 20 21 22 23 24 25 26  28 29 2a 2b 2c 2e 2f
    val ff 52 41 ff fe c0 00 00  00 00 fc c4 ff 00 ff
    def 00 52 NA ff 00 MM 00 00  00 00 7c c0 00 00 00
    LDN 0x00 (Floppy)
    idx 30 60 61 70 74 f0 f1 f2  f4 f5
    val 00 00 00 00 04 0e 00 ff  00 00
    def 01 03 f0 06 02 0e 00 ff  00 00
    LDN 0x01 (Parallel port)
    idx 30 60 61 70 74 f0
    val 01 03 78 07 04 38
    def 01 03 78 07 04 3f
    LDN 0x02 (COM1)
    idx 30 60 61 70 f0
    val 01 03 f8 04 00
    def 01 03 f8 04 00
    LDN 0x03 (COM2)
    idx 30 60 61 70 f0 f1
    val 01 02 f8 03 00 00
    def 01 02 f8 03 00 00
    LDN 0x05 (Keyboard)
    idx 30 60 61 62 63 70 72 f0
    val 01 00 60 00 64 01 00 80
    def 01 00 60 00 64 01 0c 80
    LDN 0x06 (Consumer IR)
    idx 30 60 61 70
    val 00 00 00 00
    def 00 00 00 00
    LDN 0x07 (Game port, MIDI port, GPIO 1)
    idx 30 60 61 62 63 70 f0 f1  f2
    val 01 00 00 00 00 00 00 00  00
    def 00 02 01 03 30 09 ff 00  00
    LDN 0x08 (GPIO 2, watchdog timer)
    idx 30 f0 f1 f2 f3 f5 f6 f6  f7
    val 00 ff ff ff 00 00 00 00  00
    def 00 ff 00 00 00 00 00 00  00
    LDN 0x09 (GPIO 3)
    idx 30 f0 f1 f2 f3
    val 00 ff ff ff 00
    def 00 ff 00 00 00
    LDN 0x0a (ACPI)
    idx 30 70 e0 e1 e2 e3 e4 e5  e6 e7 f0 f1 f3 f4 f6 f7  f9 fe ff
    val 00 00 00 00 14 00 00 00  00 00 00 af 32 00 00 00  00 00 00
    def 00 00 00 00 NA NA 00 00  00 00 00 00 00 00 00 00  00 00 00
    LDN 0x0b (Hardware monitor)
    idx 30 60 61 70 f0
    val 01 02 90 00 00
    def 00 00 00 00 00
    

    After mbmon starts failing as above, there is only one difference in the dump. A single byte in the hardware monitor:

    LDN 0x0b (Hardware monitor)
    idx 30 60 61 70 f0
    val 01 02 0b 00 00
    def 00 00 00 00 00
    

    Steve, how would you go about doing the reset you were talking about?

    EDIT: I assume the reset you mentioned was the initialization bit in the configuration register at 40h. I tried writing to this register with your readio/writeio tools but still got nothing back but FF. In fact, all reads are returning FF. Except for the results of the superiotool dump, I would think the chip had simply shut down.

    I can live without the temperature if I must. But I have no ability to change the fan speed now. I had it set up to automatically adjust the fan speed based on the CPU temp. Now I can neither determine the temp nor adjust the fans. The only way to fix this appears to be to reboot which is not acceptable.

    Would enabling speedstep or powerd make any difference?



  • EUREKA!

    Steve, you're gonna love this!

    After learning what the output from superiotool actually meant and reviewing the Winbond data sheet, I realized the only change in the two dumps was that the hardware monitor base address changed from 0290h to 020Bh. I wrote a little script using your readio/writeio tools to reset the base address. I never expected this to work but lo and behold it did! Everything started working again.
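
    For anyone following along, the arithmetic can be sketched in sh. The function names are hypothetical; the inputs are the idx 60 (high byte) and idx 61 (low byte) values from the hardware monitor section of the superiotool dump, and the +5/+6 offsets for the index and data ports are my reading of the data sheet, consistent with the 0x295/0x296 ports used earlier in this topic:

```shell
#!/bin/sh
# Combine the high (idx 60) and low (idx 61) bytes into the hardware
# monitor base address, printed as hex.
hwm_base() {
    printf '0x%04x\n' $(( (0x$1 << 8) | 0x$2 ))
}

# Print the index and data ports implied by that base address
# (assumed to be base + 5 and base + 6).
hwm_ports() {
    base=$(( (0x$1 << 8) | 0x$2 ))
    printf 'index=0x%x data=0x%x\n' $((base + 5)) $((base + 6))
}
```

    With the good dump, hwm_base 02 90 prints 0x0290 and hwm_ports 02 90 prints index=0x295 data=0x296. With the bad dump, hwm_base 02 0b prints 0x020b, so tools that still talk to 0x295/0x296 would no longer be addressing the monitor, which would be consistent with every read coming back as FF.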

    I guess I should modify WGXepc to check for and correct this but I don't have a FreeBSD platform to do it on. Maybe I'll try and do it on one of my Fireboxes.


  • Netgate Administrator

    Nice work. That seems odd though. I'll have to read the data sheet myself unless you can enlighten me. How was anything able to be read if the base address had changed? Only the extended registers changed? Why did it change?  :-
    Like you say it should be relatively easy to check that and set it back.  :)

    Hmm, just wondering if the base address changed due to some other piece of hardware requiring access to that address space. I didn't think it could change except at boot on the ISA bus but really I don't know. Perhaps even our own script is trying to access the space twice causing the shift.
    It might be better to read the base address and just use it rather than trying to change it back.

    Steve



  • Nothing was able to read from the hardware monitor but everything else seems to have been okay. I've been doing some thinking about that and looking at the WGXepc code.

    As I said, the only difference in the registers was device 0x0b (Hardware Monitor) register 0x61 changed from 0x90 to 0x0b. I run a daemon that uses WGXepc to continually check the temp and adjust the fan speed accordingly. The basic script came from https://forum.pfsense.org/index.php?topic=66129.msg360358#msg360358. bigramon was doing the same thing which started this topic.

    I'm working on the theory that WGXepc is writing 0x0b to the 0x61 register for device 0x0b. I can see how this could happen if port_out(EFDR, 0x0b) was done after a call to get_w83627_addr_port() but it does not look like that is what's happening.

    Any thoughts?



  • I decided I didn't want to modify WGXepc. It's not my code and I really don't know where any repository for it is. I believe this problem is caused by a bug in it, though. There is one issue I saw: get_w83627_addr_port() leaves the chip in Extended Function mode. The result of subsequent port I/O might be unpredictable at times.

    What I did instead, Steve, was to modify my daemon script to check for a temp of 255 and then use your writeio tool to reset the base address for the hardware monitor. Not the most elegant solution but it works reliably, so far.

    For anyone interested in this solution, here is the relevant section of the script:

       cpu_temp=`/usr/local/bin/WGXepc -t | sed '1,2d'`
    
       # Temperature of 255 means Winbond chip hardware monitor base address
       # has been hosed. Use writeio to reset the address. This has only been
       # tested on the X550e.
       if [ "$cpu_temp" -ge 255 ]
       then
           /usr/local/bin/writeio 0x2e 0x87 > /dev/null # Put Winbond into
           /usr/local/bin/writeio 0x2e 0x87 > /dev/null #   Extended Function mode.
           /usr/local/bin/writeio 0x2e 0x07 > /dev/null # Set logical device number
           /usr/local/bin/writeio 0x2f 0x0b > /dev/null #   to Hardware Monitor.
           /usr/local/bin/writeio 0x2e 0x60 > /dev/null # Reset
           /usr/local/bin/writeio 0x2f 0x02 > /dev/null #   monitor base
           /usr/local/bin/writeio 0x2e 0x61 > /dev/null #   address to
           /usr/local/bin/writeio 0x2f 0x90 > /dev/null #   0290h.
           /usr/local/bin/writeio 0x2e 0xaa > /dev/null # Exit Extended Function mode.
           continue
       fi
    
    

    The writeio tool you can get from Steve's google site. There is a link to it earlier in this topic.

    Thanks again for your help, Steve.

