Netgate Discussion Forum

    WG x750e - automatic speed adjustment: mbmon going crazy

    53 Posts 4 Posters 13.7k Views
    • bigramon

      Adding the extended dump on my x750e only outputs one more useless line:
      Hardware monitor (0x0295)
      I guess I'll have to stick to a fixed fan speed and forget about having it automated.

      Damien

      • stephenw10, Netgate Administrator

        Hmm, that's no good then. Don't give up yet, though; automatic fan speed control would be great.  :)
        Let me fire up my X750e and see if I can replicate your results.

        Steve

        • stephenw10, Netgate Administrator

          OK, I checked my box here and yes, you're right: superiotool does not show the relevant info for some reason. It works fine with some other chips.
          However, it is the SuperIO chip that mbmon is reading, and it is possible to read it directly, but it's not that straightforward.

          To read a value from the chip, you set its index register to the required register number and then read the value from the data register. In this case (and in almost every case) those are at I/O space addresses 0x295 and 0x296 respectively. When I was initially looking for the LED control I "wrote" (I use that term very loosely!) two programs to do this, readio and writeio, which can be found at my Google site. They can write to any address in I/O space, so they can obviously cause problems. There is no sanity checking or anything; you've been warned!  ;)

          Using those we can read values. For example, the 8-bit temperature value from sensor 1 is in control register CR27 (the one value superiotool skips past  :-\ ):

          [2.0.3-RELEASE][root@testbox.localdomain]/conf(347): ./writeio 0x295 0x27
          Setting 295 to 27
          [2.0.3-RELEASE][root@testbox.localdomain]/conf(348): ./readio 0x296
          Reading 296 :19
          
          

          That's hex 0x19, i.e. 25 degrees C, which we know is correct from mbmon.
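The index/data access used in that session can be simulated without touching real hardware. This is a minimal sketch: `MockSuperIO`, its `outb`/`inb` methods and `read_register` are hypothetical stand-ins for readio/writeio (whose source isn't shown here); only the port numbers 0x295/0x296 and the CR27 = 0x19 value come from this thread. Treating unimplemented registers as reading back 0xFF is an assumption of the model.

```python
"""Sketch of index/data register access, simulated in memory (no real I/O)."""

INDEX_PORT = 0x295  # write a register number here to select it
DATA_PORT = 0x296   # then read or write the selected register here


class MockSuperIO:
    """Hypothetical in-memory model of a SuperIO chip's index/data port pair."""

    def __init__(self, registers):
        self.registers = dict(registers)  # register number -> byte value
        self.index = 0                    # currently selected register

    def outb(self, port, value):
        """Write a byte to an I/O port (like ./writeio PORT VALUE)."""
        if port == INDEX_PORT:
            self.index = value & 0xFF
        elif port == DATA_PORT:
            self.registers[self.index] = value & 0xFF
        else:
            raise ValueError(f"unexpected port {port:#x}")

    def inb(self, port):
        """Read a byte from an I/O port (like ./readio PORT)."""
        if port != DATA_PORT:
            raise ValueError(f"unexpected port {port:#x}")
        # Assumption of this model: unimplemented registers read back as 0xFF.
        return self.registers.get(self.index, 0xFF)


def read_register(chip, reg):
    """Select `reg` through the index port, then read it from the data port."""
    chip.outb(INDEX_PORT, reg)
    return chip.inb(DATA_PORT)


if __name__ == "__main__":
    # Sensor 1's 8-bit temperature lives at CR27; 0x19 is the value read above.
    chip = MockSuperIO({0x27: 0x19})
    raw = read_register(chip, 0x27)
    print(f"CR27 = {raw:#04x} = {raw} degrees C")
```

On a real box the same two steps go to actual I/O space and need root, with none of the safety that a simulation gives you.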
          Unfortunately sensor 1 is almost certainly not connected. Sensor 2 is the only one returning useful data (though I did play with sensor 3 a bit with interesting results). The data values for Sensor 2 are in 'Bank 1' which can only be accessed by first setting the 'bank select' register, 0x4e, to 1. So:

          [2.0.3-RELEASE][root@testbox.localdomain]/conf(349): ./writeio 0x295 0x4e
          Setting 295 to 4e
          [2.0.3-RELEASE][root@testbox.localdomain]/conf(350): ./writeio 0x296 0x01
          Setting 296 to 1
          [2.0.3-RELEASE][root@testbox.localdomain]/conf(351): ./writeio 0x295 0x50
          Setting 295 to 50
          [2.0.3-RELEASE][root@testbox.localdomain]/conf(352): ./readio 0x296
          Reading 296 :28
          
          

          We finally get the value 0x28, which is 40C, about what we'd expect. In fact the value is a 9-bit number, with the 9th bit giving 0.5C resolution, but these 8 MSBs are probably fine here (also, it's stored as two's complement, so if the numbers go negative things get complicated.  ;)).
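The conversion from register bytes to degrees can be sketched as follows, following the description above (9 bits, two's complement, 0.5C per step). Where the ninth bit lives is an assumption not stated in the thread: this sketch takes it from bit 7 of the following register (0x51 in Bank 1), which should be checked against the chip's datasheet. `decode_temp8` covers the 8-MSB-only reading used above.

```python
"""Decode SuperIO temperature bytes into degrees C (sketch)."""


def decode_temp8(msb_byte):
    """8-bit reading: two's complement byte, whole degrees C."""
    b = msb_byte & 0xFF
    return b - 0x100 if b & 0x80 else b


def decode_temp9(msb_byte, lsb_byte):
    """9-bit reading in 0.5 C steps, stored as two's complement.

    msb_byte holds bits 8..1; bit 7 of lsb_byte is ASSUMED to hold bit 0
    (the 0.5 C bit), e.g. registers 0x50 and 0x51 in Bank 1.
    """
    raw = ((msb_byte & 0xFF) << 1) | ((lsb_byte >> 7) & 0x01)
    if raw & 0x100:      # sign bit of the 9-bit value
        raw -= 0x200     # convert to a negative number
    return raw / 2.0     # each step is 0.5 C


if __name__ == "__main__":
    print(decode_temp8(0x28))        # the 8-MSB value read above
    print(decode_temp9(0x28, 0x80))  # same reading with the half-degree bit set
    print(decode_temp9(0xFF, 0x80))  # a negative reading
```

Under this scheme a byte of 0xFF decodes as -1C, so a tool that prints it as 255 is showing the raw unsigned byte.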

          If you don't set the bank select back to 0, mbmon will get confused the first time you run it but runs fine the second time around.
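The full Bank 1 sequence, including putting the bank select back to 0 afterwards so mbmon isn't confused, might look like this in a simulation. `BankedSuperIO` and `read_banked` are hypothetical; treating only 0x50 to 0x5F as banked, and the bank number as the low three bits of 0x4E, are simplifying assumptions for the sketch rather than facts from this thread.

```python
"""Sketch: read a banked SuperIO register and restore Bank 0 afterwards."""

BANK_SELECT = 0x4E           # bank select register (from the thread)
BANKED = range(0x50, 0x60)   # assumption: only this window is banked


class BankedSuperIO:
    """Hypothetical in-memory model with a banked register window."""

    def __init__(self, shared, banks):
        self.shared = dict(shared)          # registers visible in every bank
        self.shared.setdefault(BANK_SELECT, 0)
        self.banks = {b: dict(r) for b, r in banks.items()}
        self.index = 0

    def _current_bank(self):
        return self.shared[BANK_SELECT] & 0x07

    def write_index(self, reg):             # like ./writeio 0x295 REG
        self.index = reg & 0xFF

    def write_data(self, value):            # like ./writeio 0x296 VALUE
        if self.index in BANKED:
            bank = self.banks.setdefault(self._current_bank(), {})
            bank[self.index] = value & 0xFF
        else:
            self.shared[self.index] = value & 0xFF

    def read_data(self):                    # like ./readio 0x296
        if self.index in BANKED:
            return self.banks.get(self._current_bank(), {}).get(self.index, 0xFF)
        return self.shared.get(self.index, 0xFF)


def read_banked(chip, bank, reg):
    """Select `bank`, read `reg`, and always switch back to Bank 0."""
    chip.write_index(BANK_SELECT)
    chip.write_data(bank)
    try:
        chip.write_index(reg)   # re-point the index: it still points at 0x4E
        return chip.read_data()
    finally:
        chip.write_index(BANK_SELECT)
        chip.write_data(0)      # leave the chip in the state mbmon expects


if __name__ == "__main__":
    chip = BankedSuperIO(shared={}, banks={0: {0x50: 0x11}, 1: {0x50: 0x28}})
    print(hex(read_banked(chip, 1, 0x50)))  # sensor 2 value from Bank 1
    print(chip.shared[BANK_SELECT])          # bank select after the read
```

The `try`/`finally` is a design choice: the bank is restored even if the read itself raises.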

          It would be interesting to give this a go when mbmon has freaked out to see if the SuperIO chip itself is still giving useful data. It would be easy enough to incorporate this into WGXepc if it does.

          Steve

          • bigramon

            I'm not too sure why you set register 0x295 to 0x50 once 0x296 has been properly set to 0x01 ?

            Nevertheless, I could confirm that executing these four commands returned the same value as mbmon, both when the CPU is idle and when it is supercharged with burnP6.
            I restarted my automated script and will patiently wait for it to go crazy again :P

            Thanks for your detailed explanation and time!

            Damien

            • stephenw10, Netgate Administrator

              @bigramon:

              I'm not too sure why you set register 0x295 to 0x50 once 0x296 has been properly set to 0x01 ?

              Because in order to read the actual temperature value, which is at register 0x50 in Bank 1, you need to point the index register, 0x295, at it. It was still pointing at 0x4e, the bank select register.

              The first time I read through the data sheet and tried to comprehend how this all worked my mind nearly melted.  :D It still took a while the second time around. One thing to be aware of is that, to save processing time/instructions, many of the registers automatically increment the index register. This helps if you want to read in a series of values, say 0x20 to 0x2f: you just set the index to 0x20 and then read the data register. That automatically increments the index by 1, so you just read the data register again and get the value of 0x21, and so on. This means that if you set the bank to 1, set the index to 0x50 and read the data register, you cannot just read the data register again to see if the value has changed; the index register will now be pointing at 0x51. You have to set the index to 0x50 every time.
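A small simulation shows why the index has to be re-pointed at 0x50 before every read. `AutoIncSuperIO`, `read_block` and `read_one` are hypothetical, and the model simplifies things by auto-incrementing after every data read, whereas the description above says only many of the registers do.

```python
"""Sketch: why an auto-incrementing index must be re-pointed before each read."""


class AutoIncSuperIO:
    """Hypothetical model: every data-port read advances the index by one."""

    def __init__(self, registers):
        self.registers = dict(registers)
        self.index = 0

    def set_index(self, reg):        # like ./writeio 0x295 REG
        self.index = reg & 0xFF

    def read_data(self):             # like ./readio 0x296
        value = self.registers.get(self.index, 0xFF)
        self.index = (self.index + 1) & 0xFF   # auto-increment after the read
        return value


def read_block(chip, start, count):
    """Set the index once and let auto-increment walk through `count` registers."""
    chip.set_index(start)
    return [chip.read_data() for _ in range(count)]


def read_one(chip, reg):
    """Read a single register, re-pointing the index first every time."""
    chip.set_index(reg)
    return chip.read_data()


if __name__ == "__main__":
    chip = AutoIncSuperIO({0x50: 0x28, 0x51: 0x80})
    chip.set_index(0x50)
    # The second read returns register 0x51, not 0x50 again.
    print(hex(chip.read_data()), hex(chip.read_data()))
    # Re-pointing the index gives register 0x50 both times.
    print(hex(read_one(chip, 0x50)), hex(read_one(chip, 0x50)))
```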

              It's interesting that mbmon expects to find the bank select register at 0 and has a hard time when it isn't; perhaps that's a clue? Something else interacting with the SuperIO chip may be leaving it in a state that confuses mbmon.

              As an aside, I notice your values for sensor 3 are different from mine. Is 67C a common reading? Mine shows mostly 127, with occasional 0 and even more occasional 2.0. Have you changed the CPU? Interestingly, reconfiguring it as a thermistor instead of a thermal diode produced much more reasonable values, though still clearly not real temperatures.

              Steve

              • bigramon

                Ah, it's (somewhat) clearer now  :)

                I did not have a look at the mbmon code yet so I do not know why it behaves like this.
                I can't wait, though, to see whether your trick with the registers still works when mbmon doesn't anymore!

                I indeed invested massively ($10 including shipping) to replace the original Celeron 1.3GHz:
                Intel(R) Pentium(R) M processor 2.00GHz
                Current: 225 MHz, Max: 1500 MHz

                More horsepower and less electricity/heat  ;D

                Damien

                • bigramon

                  I added some detailed logging to understand what is going on, and after 11 days mbmon was only sporadically returning incoherent values (around once a day, whereas measurements are taken every 10 seconds).
                  Maybe the multiple register manipulations helped  ???

                  At this stage, I'd say the automatic fan speed adjustment is working !

                  Thanks Steve for your input  ;D

                  Damien

                  • stephenw10, Netgate Administrator

                    @bigramon:

                    Maybe the multiple register manipulations helped  ???

                    Plausible but sort of disappointing. Nothing to say it won't happen again or start doing something different.

                    Steve

                    • bigramon

                      Well, time will tell. At some point, I will reboot the x750e and with the level of logging I have in place now, it won't escape me.

                      Damien

                      • stephenw10, Netgate Administrator

                        I added the temperature reading to WGXepc so you can try that if mbmon starts acting up again, though you'd have to parse its output for the value in your script somehow.
                        http://forum.pfsense.org/index.php/topic,32013.msg346092.html#msg346092
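Parsing that output in a script could be as simple as the sketch below. It assumes the `-t` output matches the sample posted further down this thread (a `Found Firebox X-E` line, a `SuperIO sensor 2 reads:` line, then the value in decimal on its own line). If the format differs, the function returns None rather than guessing. `parse_wgxepc_temp` and `read_wgxepc_temp` are hypothetical helpers, and `/usr/local/bin/WGXepc` is the path used in this thread.

```python
"""Sketch: extract the temperature from `WGXepc -t` output."""

import subprocess

WGXEPC = "/usr/local/bin/WGXepc"


def parse_wgxepc_temp(output):
    """Return the integer after a 'SuperIO sensor N reads:' line, or None."""
    lines = [line.strip() for line in output.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("SuperIO sensor") and line.endswith("reads:"):
            if i + 1 < len(lines) and lines[i + 1].isdigit():
                return int(lines[i + 1])
    return None


def read_wgxepc_temp():
    """Run WGXepc -t and parse it (needs the binary and, presumably, root)."""
    result = subprocess.run([WGXEPC, "-t"], capture_output=True, text=True,
                            check=True)
    return parse_wgxepc_temp(result.stdout)


if __name__ == "__main__":
    sample = "Found Firebox X-E\nSuperIO sensor 2 reads:\n40\n"
    print(parse_wgxepc_temp(sample))
```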

                        Steve

                        • bigramon

                          This time, mbmon started producing bogus data (CPU temp around 9°C) after 24 days of uptime.
                          Unfortunately, the new version of WGXepc provides the same measure.

                          Let me know if you want me to try something else, otherwise, I'll just reboot the machine.

                          Damien

                          • kejianshi

                            So, is an auto fan speed control daemon going to show up in the next release as a click-to-enable option?

                            • stephenw10, Netgate Administrator

                              This script is specifically for the Watchguard Firebox X-e boxes, so it's very unlikely to be in a pfSense release. Even if it were more generic it probably wouldn't ever be included as more than a package. Any script that controls the fan speeds in a box has the potential to cause damage by overheating the CPU. Usually in something with thermal fan control, like a laptop, the control is handled directly by the SuperIO chip so that it continues to cool correctly even if the OS crashes. Unfortunately the SuperIO chip in the X-e box doesn't support this.

                              @bigramon
                              So reading the temperature with WGXepc gives the same value as mbmon. That implies the SuperIO chip itself is reporting the wrong value. Why could that be? It could be that the temperature offset register has been set somehow, or that it's reading the wrong register for some reason. We could try investigating that but it will be quite involved. You could try using WGXepc in your script instead. Since it only reads one register (whereas mbmon reads all the registers every time it's run) it may make a difference.

                              Steve

                              • bigramon

                                Unfortunately, it did not work much better. After 2 weeks, I started getting temperatures of 255°C.
                                When I tried mbmon from the command line, it reported an error stating it could not access the hardware.
                                Everything went back to normal after a reboot.

                                Damien

                                • stephenw10, Netgate Administrator

                                  Ah, so what were you using in the script that didn't work? I forget where we left off.  ::)

                                  Steve

                                  • bigramon

                                    Something goes wrong after a short while:

                                    • WGXepc does not properly set the fan speed anymore
                                    • WGXepc does not report the proper temperature anymore
                                    • mbmon does not execute properly anymore.
                                    
                                    $ /usr/local/bin/WGXepc -f 50
                                    Found Firebox X-E
                                    Fanspeed set to 50
                                    
                                    $ /usr/local/bin/WGXepc -f
                                    Found Firebox X-E
                                    Fanspeed is ff
                                    
                                    $ /usr/local/bin/WGXepc -t
                                    Found Firebox X-E
                                    SuperIO sensor 2 reads:
                                    255
                                    
                                    $ /usr/local/bin/mbmon -I -i -c1 -T2
                                    No ISA-IO HWM available!!
                                    InitMBInfo: Unknown error: 0
                                    
                                    

                                    At this stage, I guess the best option would be to modify the WGXepc source code to allow it to keep executing in the background and set the fan speed depending on the temperature that is read.
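A background loop along those lines could keep the decision logic separate from the hardware access, which also makes it easy to fail safe when the chip starts returning nonsense like the 9°C and 255°C readings reported in this thread. Everything here is an assumption for illustration: the thresholds, the plausibility window, the 0x00-0xFF fan scale, and the `read_temp`/`set_fan` callables (which on a real box might wrap `WGXepc -t` and `WGXepc -f`) are hypothetical, not WGXepc's actual behaviour. Since the SuperIO chip won't take over fan control if the software stops, falling back to full speed on any bad reading seems the safer default. The 10-second default interval mirrors the polling rate mentioned earlier.

```python
"""Sketch: temperature-to-fan-speed policy with a fail-safe (hypothetical values)."""

import time

FAN_MIN = 0x40             # hypothetical quietest allowed setting
FAN_MAX = 0xFF             # full speed
T_LOW = 45.0               # at or below this, run at FAN_MIN (assumed)
T_HIGH = 70.0              # at or above this, run at FAN_MAX (assumed)
PLAUSIBLE = (15.0, 100.0)  # readings outside this window are treated as bogus


def choose_fan_speed(temp_c):
    """Map a temperature to a fan byte; bogus or missing readings get full speed."""
    if temp_c is None or not PLAUSIBLE[0] <= temp_c <= PLAUSIBLE[1]:
        return FAN_MAX
    if temp_c <= T_LOW:
        return FAN_MIN
    if temp_c >= T_HIGH:
        return FAN_MAX
    # Linear ramp between the two thresholds.
    frac = (temp_c - T_LOW) / (T_HIGH - T_LOW)
    return round(FAN_MIN + frac * (FAN_MAX - FAN_MIN))


def control_loop(read_temp, set_fan, interval_s=10.0, iterations=None):
    """Poll read_temp() and push the chosen speed to set_fan().

    Runs forever when iterations is None.
    """
    count = 0
    while iterations is None or count < iterations:
        try:
            temp = read_temp()
        except Exception:
            temp = None  # any read failure is treated as a bogus reading
        set_fan(choose_fan_speed(temp))
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_s)


if __name__ == "__main__":
    readings = iter([40.0, 57.5, 255, 9, None])
    control_loop(lambda: next(readings), lambda s: print(hex(s)),
                 interval_s=0, iterations=5)
```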

                                    Damien

                                    • stephenw10, Netgate Administrator

                                      OK, but was your script using mbmon or WGXepc to read the temperature? The difference between them is that mbmon reads a whole load of values every time it's run, even if you only need one. Getting all those values requires switching the SuperIO chip into various modes. It's possible that under certain conditions mbmon leaves the SuperIO chip in some error state. It may be possible to determine what the error state is and recover from it, or to avoid it in the first place.

                                      Steve

                                      • bigramon

                                        My script was using only WGXepc.

                                        Damien

                                        • stephenw10, Netgate Administrator

                                          Damn!
                                          Well the only thing to do then is try to find out why the SuperIO chip is no longer responding usefully. Quite how to do that though….  ::)

                                          Steve

                                          • sthames42

                                            Did you guys ever resolve this? I have exactly the same problem. Running mbmon returns reasonable temps for a while and suddenly starts reporting this:

                                            Temp.= 255.0,  0.0,  0.0; Rot.=    0,    0,    0
                                            Vcore = 4.08, 4.08; Volt. = 4.08, 6.85, 15.50,  6.07,  5.11

                                            When I cancel mbmon and run it again, I get:

                                            ioctl(smb0:open): No such file or directory
                                            No Hardware Monitor found!!
                                            InitMBInfo: Bad file descriptor

                                            WGXepc -t reports good temps and then nothing but 255.
                                            Only way to fix this is to reboot.
                                            I'm assuming the SuperIO chip got itself hosed somehow. Is there any way to reset or reboot the chip?

                                            This is on two X-Core 550e boxes and the default SL6N7 Banias chips have been replaced with SL7EP Dothan chips according to https://doc.pfsense.org/index.php/PfSense_on_Watchguard_Firebox#Further_Enhancements_3.
                                            All dip switches set correctly. No other changes (powerd/speedstep) were made.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.