Netgate Discussion Forum

    WG x750e - automatic speed adjustment: mbmon going crazy

    53 Posts 4 Posters 13.6k Views
    • stephenw10 (Netgate Administrator)

      @bigramon:

      Maybe the multiple register manipulations helped ???

      Plausible but sort of disappointing. Nothing to say it won't happen again or start doing something different.

      Steve

      • bigramon

        Well, time will tell. At some point I will reboot the x750e, and with the level of logging I now have in place, it won't escape me if it happens again.

        Damien

        • stephenw10 (Netgate Administrator)

          I added a temperature reading option to WGXepc (WGXepc -t), so you can try that if mbmon starts acting up again, though you'd have to parse its output for the value in your script somehow.
          http://forum.pfsense.org/index.php/topic,32013.msg346092.html#msg346092

          Steve
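For reference, a minimal POSIX sh sketch of that parsing step. It assumes the WGXepc -t output format shown later in this thread ("Found Firebox X-E", then "SuperIO sensor 2 reads:", then the value on its own line) and the /usr/local/bin/WGXepc path used there; the function names are hypothetical. A reading of 255, which later posts show is what comes back once the chip stops responding, is treated as a failed read rather than a temperature.

```shell
#!/bin/sh
# Sketch only: extract the temperature from "WGXepc -t" output.
# Assumption: the reading is the last field of the last non-empty line,
# as in the transcript posted in this thread:
#   Found Firebox X-E
#   SuperIO sensor 2 reads:
#   255

WGXEPC=${WGXEPC:-/usr/local/bin/WGXepc}

# parse_wgxepc_temp: read WGXepc -t output on stdin and print the value.
# Exits non-zero if the last token is not a plain decimal number, or if it
# is 255 (all bits set), the value seen once the chip stops responding.
parse_wgxepc_temp() {
    awk 'NF { last = $NF }
         END {
             if (last ~ /^[0-9]+$/ && last + 0 != 255) { print last; exit 0 }
             exit 1
         }'
}

# read_cpu_temp: run WGXepc and parse its output (needs root on the box).
read_cpu_temp() {
    "$WGXEPC" -t | parse_wgxepc_temp
}
```

Usage, run as root on the Firebox itself: temp=$(read_cpu_temp) || echo "bad temperature reading"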

          • bigramon

            This time, mbmon started producing bogus data (a CPU temperature of around 9°C) after 24 days of uptime.
            Unfortunately, the new version of WGXepc reports the same value.

            Let me know if you want me to try something else, otherwise, I'll just reboot the machine.

            Damien

            • kejianshi

              So, is the automatic fan speed control daemon going to show up in the next release as a click-to-enable option?

              • stephenw10 (Netgate Administrator)

                This script is specifically for the WatchGuard Firebox X-e boxes, so it's very unlikely to be in a pfSense release. Even if it were more generic, it probably wouldn't ever be included as more than a package. Any script that can control the fan speeds in a box has the potential to cause damage by overheating the CPU. Usually in something with thermal fan control, like a laptop, the control is handled directly by the SuperIO chip, so that it will continue to cool correctly even if the OS crashes. Unfortunately, the SuperIO chip in the X-e box doesn't support this.

                @bigramon
                So reading the temperature with WGXepc gives the same value as mbmon. That implies the SuperIO chip itself is reporting the wrong value. Why could that be? It could be that the temperature offset register has been set somehow, or that it's reading the wrong register for some reason. We could try investigating that, but it will be quite involved. You could try using WGXepc in your script instead. Since it reads only one register (whereas mbmon reads all registers every time it's run), it may make a difference.

                Steve
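Given that there is no hardware fallback on this box, a controlling script should fail towards full fan speed whenever the reading is missing or implausible. A minimal sh sketch of that rule follows. Every threshold and fan value in it is a placeholder (only 50 appears in this thread as an argument to WGXepc -f), and whether -f expects decimal or hex is not established here, so check the actual range before relying on any of it.

```shell
#!/bin/sh
# Sketch: choose a fan setting from a temperature, failing safe to full speed.
# Every number here is an illustrative placeholder, not a verified setting.
: "${FAN_LOW:=30}"        # placeholder
: "${FAN_MED:=50}"        # 50 is the value passed to WGXepc -f in this thread
: "${FAN_FULL:=99}"       # placeholder; check the real maximum for -f
: "${TEMP_MIN_SANE:=15}"  # degrees C; lower readings are treated as bogus
: "${TEMP_MED:=55}"       # degrees C, placeholder
: "${TEMP_HIGH:=70}"      # degrees C, placeholder

# choose_fan_speed TEMP: print the value to pass to "WGXepc -f".
choose_fan_speed() {
    case $1 in
        ''|*[!0-9]*)                 # empty or not a plain integer
            echo "$FAN_FULL"; return 0 ;;
    esac
    if [ "$1" -ge 255 ] || [ "$1" -lt "$TEMP_MIN_SANE" ]; then
        echo "$FAN_FULL"             # all-ones register (255) or implausibly cold
    elif [ "$1" -ge "$TEMP_HIGH" ]; then
        echo "$FAN_FULL"
    elif [ "$1" -ge "$TEMP_MED" ]; then
        echo "$FAN_MED"
    else
        echo "$FAN_LOW"
    fi
}
```

Usage: /usr/local/bin/WGXepc -f "$(choose_fan_speed "$temp")". Because unreadable, 255 and implausibly low values all map to full speed, both the bogus 9°C and 255°C readings reported in this thread would push the fan up rather than down.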

                • bigramon

                  Unfortunately, it did not work much better. After 2 weeks, I started getting temperatures of 255°C.
                  When I tried mbmon from the command line, it reported an error stating it could not access the hardware.
                  Everything went back to normal after a reboot.

                  Damien

                  • stephenw10 (Netgate Administrator)

                    Ah, so what were you using in the script that didn't work? I forget where we left off.  ::)

                    Steve

                    • bigramon

                      Something goes wrong after a short while:

                      • WGXepc no longer sets the fan speed properly
                      • WGXepc no longer reports the correct temperature
                      • mbmon no longer executes properly
                      
                      $ /usr/local/bin/WGXepc -f 50
                      Found Firebox X-E
                      Fanspeed set to 50
                      
                      $ /usr/local/bin/WGXepc -f
                      Found Firebox X-E
                      Fanspeed is ff
                      
                      $ /usr/local/bin/WGXepc -t
                      Found Firebox X-E
                      SuperIO sensor 2 reads:
                      255
                      
                      $ /usr/local/bin/mbmon -I -i -c1 -T2
                      No ISA-IO HWM available!!
                      InitMBInfo: Unknown error: 0
                      
                      

                      At this stage, I guess the best option would be to modify the WGXepc source code to allow it to keep executing in the background and set the fan speed depending on the temperature that is read.

                      Damien
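Short of changing the WGXepc C source, the same behaviour can be sketched as an outside polling loop. This is a hypothetical sh sketch, not code from this thread: it reads the temperature with WGXepc -t, writes a new setting only when it changes (fewer SuperIO accesses), and on an unusable reading logs once via logger(1) with the uptime and falls back to full speed. FAN_NORMAL, FAN_FULL, TEMP_HIGH and INTERVAL are placeholder values.

```shell
#!/bin/sh
# Hypothetical polling fan controller: an outside-script sketch, not a
# change to the WGXepc source. All numeric values are placeholders.
WGXEPC=${WGXEPC:-/usr/local/bin/WGXepc}
: "${FAN_NORMAL:=50}"   # placeholder setting for normal temperatures
: "${FAN_FULL:=99}"     # placeholder; check the real maximum for -f
: "${TEMP_HIGH:=70}"    # degrees C, placeholder
: "${INTERVAL:=30}"     # seconds between polls, placeholder

# reading_ok VALUE: succeed for a plain integer that is not 255 (all ones).
reading_ok() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ne 255 ]
}

# fan_for_temp TEMP: two-level placeholder mapping; substitute your own.
fan_for_temp() {
    if [ "$1" -ge "$TEMP_HIGH" ]; then
        echo "$FAN_FULL"
    else
        echo "$FAN_NORMAL"
    fi
}

# control_loop: poll forever. Needs root and the actual Firebox hardware.
control_loop() {
    last=''       # last setting written with -f
    faulted=0     # 1 while readings are unusable, so we log only once
    while :; do
        temp=$("$WGXEPC" -t 2>/dev/null | awk 'NF { v = $NF } END { print v }')
        if reading_ok "$temp"; then
            faulted=0
            wanted=$(fan_for_temp "$temp")
        else
            if [ "$faulted" -eq 0 ]; then
                logger -t wgfan "unusable temperature reading '$temp'; $(uptime)"
                faulted=1
            fi
            wanted=$FAN_FULL    # fail safe
        fi
        # Write only when the setting changes, to limit SuperIO accesses.
        if [ "$wanted" != "$last" ]; then
            "$WGXEPC" -f "$wanted" >/dev/null 2>&1 && last=$wanted
        fi
        sleep "$INTERVAL"
    done
}
```

Start it with control_loop & from a boot script. Since this thread reports that WGXepc -f also stops working once the chip is in this state, the timestamped log line is probably the more useful part; a reboot is still the only recovery reported here.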

                      • stephenw10 (Netgate Administrator)

                        OK, but was your script using mbmon or WGXepc to read the temperature? The difference between them is that mbmon reads a whole load of values every time it's run, even if you only need one. Getting all those values requires setting the SuperIO chip in various modes. It's possible that under certain conditions mbmon leaves the SuperIO chip in some error state. It may be possible to determine what the error state is and either recover from it or avoid it in the first place.

                        Steve
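One way to pin down when that error state starts is to classify mbmon output as it is captured. A minimal sh sketch, matching only the outputs posted in this thread (the "Temp.= 255.0, ..." line and the "No ISA-IO HWM available!!" / "No Hardware Monitor found!!" / "InitMBInfo:" errors); the assumption that mbmon -T2 prints a bare number on its own line is unverified, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: classify mbmon output into the states reported in this thread.
# Reads mbmon output (stdout and stderr) on stdin and prints one of:
#   ok, all-ones, no-hwm, unknown
mbmon_state() {
    awk '
        /No ISA-IO HWM available|No Hardware Monitor found|InitMBInfo:/ { bad = 1 }
        /^ *Temp\.=/ {
            # "Temp.= 255.0,  0.0,  0.0; Rot.= ..." -> first temperature
            line = $0
            sub(/^ *Temp\.= */, "", line)
            split(line, f, ",")
            t = f[1] + 0; seen = 1
        }
        # Assumed "mbmon -T2" style: a bare number on its own line.
        /^ *[0-9]+(\.[0-9]+)? *$/ { t = $1 + 0; seen = 1 }
        END {
            if (bad) print "no-hwm"
            else if (!seen) print "unknown"
            else if (t == 255) print "all-ones"
            else print "ok"
        }'
}
```

For example, mbmon -I -i -c1 -T2 2>&1 | mbmon_state, logging the result together with date and uptime whenever it is not ok.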

                        • bigramon

                          My script was using only WGXepc.

                          Damien

                          • stephenw10 (Netgate Administrator)

                            Damn!
                            Well, the only thing to do then is to try to find out why the SuperIO chip is no longer responding usefully. Quite how to do that, though…  ::)

                            Steve

                            • sthames42

                              Did you guys ever resolve this? I have exactly the same problem. Running mbmon returns reasonable temps for a while and suddenly starts reporting this:

                              Temp.= 255.0,  0.0,  0.0; Rot.=    0,    0,    0
                              Vcore = 4.08, 4.08; Volt. = 4.08, 6.85, 15.50,  6.07,  5.11

                              When I cancel mbmon and run it again, I get:

                              ioctl(smb0:open): No such file or directory
                              No Hardware Monitor found!!
                              InitMBInfo: Bad file descriptor

                              WGXepc -t reports good temps and then nothing but 255.
                              The only way to fix this is to reboot.
                              I'm assuming the SuperIO chip got itself hosed somehow. Is there any way to reset or reboot the chip?

                              This is on two X-Core 550e boxes whose default SL6N7 Banias CPUs have been replaced with SL7EP Dothan CPUs, as described in https://doc.pfsense.org/index.php/PfSense_on_Watchguard_Firebox#Further_Enhancements_3.
                              All DIP switches are set correctly. No other changes (powerd/SpeedStep) were made.

                              • stephenw10 (Netgate Administrator)

                                No, or at least I haven't heard from anyone who did. However, reading back through the data sheet, there appear to be a number of things we could try. Since the chip is just returning registers full of 1s, we don't know whether it's actually returning anything or whether we're even talking to it properly, though it seems likely that we are, because under mbmon the voltage readings continue to come back as reasonable numbers.
                                As a rather extreme option, it looks like there is a register that can re-initialise the chip back to its power-on defaults. However, I've no way of knowing which registers are configured by the BIOS at boot, so the results could be… unpredictable.  ;)

                                Steve
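Both symptoms reported earlier, "Fanspeed is ff" and a temperature of 255, are the same 8-bit value with every bit set (0xFF = 255 = 0b11111111). A small sh sketch to flag that value in a script; it assumes, from the transcript earlier in the thread, that WGXepc -f prints the fan register in hex and WGXepc -t prints the temperature in decimal, which has not been checked against the WGXepc source. Names are hypothetical.

```shell
#!/bin/sh
# Sketch: detect an 8-bit register that reads back as all ones (0xFF).
# Assumption from the transcript: "WGXepc -f" prints the fan register in
# hex ("Fanspeed is ff") while "WGXepc -t" prints the temperature in decimal.

# is_all_ones_byte VALUE BASE: BASE is "hex" or "dec"; succeeds if VALUE == 0xFF.
is_all_ones_byte() {
    case $2 in
        hex) case $1 in ''|*[!0-9a-fA-F]*) return 1 ;; esac
             [ $((0x$1)) -eq 255 ] ;;
        dec) case $1 in ''|*[!0-9]*) return 1 ;; esac
             [ "$1" -eq 255 ] ;;
        *)   return 2 ;;
    esac
}

# last_token: print the last whitespace-separated token of the input.
last_token() {
    awk 'NF { v = $NF } END { print v }'
}
```

For example: is_all_ones_byte "$(/usr/local/bin/WGXepc -f | last_token)" hex && echo "fan register reads all ones"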

                                • sthames42

                                  Thanks for responding, Steve. I posted a new topic because this one was old and I wasn't sure it was still active. I'm happy to stay in this one.

                                  This has happened on two boxes, but I have another that works fine. I suppose it's possible that the two Dothan chips I put in these boxes are causing this problem, but it seems unlikely that both would be bad. I'm going to swap the replacement chip in one of them for another chip and wait for results.

                                  Is the data sheet you spoke of available electronically? Where can I get it?

                                  • stephenw10 (Netgate Administrator)

                                    Yes, it's available in many places such as here.

                                    Interacting with the chip manually for test purposes is a PITA.  ::) It involves writing many individual registers to read one value. For example:
                                    https://forum.pfsense.org/index.php?topic=43574.msg261279#msg261279
                                    Using the reset register won't require as much, though. It would be interesting to install superiotool to see what has stopped and what is still readable when the chip enters its uncooperative state. We may get a clue.

                                    Steve
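A sketch of that comparison in sh: dump the registers while everything is working, dump them again once readings go to 255, and diff the two. It assumes superiotool is installed on the box and that its -d option dumps registers; the dump directory and function names are hypothetical.

```shell
#!/bin/sh
# Sketch: capture Super I/O register dumps so the "good" and "stuck" states
# can be compared. Assumes superiotool is installed and "-d" dumps registers;
# DUMP_DIR and the function names are hypothetical.
DUMP_DIR=${DUMP_DIR:-/root/superio-dumps}

# take_dump LABEL: save "superiotool -d" output as DUMP_DIR/LABEL-<epoch>.txt
# and print the path of the file written.
take_dump() {
    mkdir -p "$DUMP_DIR" || return 1
    out="$DUMP_DIR/$1-$(date +%s).txt"
    superiotool -d > "$out" 2>&1 || echo "superiotool failed; see $out" >&2
    echo "$out"
}

# compare_dumps GOOD BAD: show which register lines changed.
# Returns 0 if identical, 1 if they differ (the exit status of diff).
compare_dumps() {
    diff -u "$1" "$2"
}
```

Run take_dump good shortly after boot and take_dump bad once WGXepc -t starts returning 255, then pass the two printed paths to compare_dumps.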

                                    • sthames42

                                      Thanks, Steve.
                                      I replaced the chip and have been running and watching mbmon for an hour. No problems. I guess it's possible I had two bad chips with the same symptoms. If the problem shows up again, I'll try your idea with superiotool. Until then, I'm moving on.
                                      Thanks, again.

                                      • stephenw10 (Netgate Administrator)

                                        It's hard to see how changing the CPU could make much difference. A change in the temperature sensor, perhaps?
                                        How would that affect the fan control, though?

                                        Steve

                                        • sthames42

                                          Beats the crap out of me. Been watching mbmon for 2 1/2 hours and still working great.
                                          I noticed that when I took out the other chip, I had smeared on a lot of paste (Arctic Silver). I was much more conservative with the new chip.
                                          Could that have anything to do with this?

                                          • sthames42

                                            Spoke too soon. It took 3 hours but mbmon started showing this:

                                            Temp.= 255.0,  0.0,  0.0; Rot.=    0,    0,    0
                                            Vcore = 4.08, 4.08; Volt. = 4.08, 6.85, 15.50,   6.07,  5.11
                                            

                                            I guess I'll look into resetting the chip as you suggested.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.