Netgate Discussion Forum

    WG x750e - automatic speed adjustment: mbmon going crazy

    Hardware
    53 Posts 4 Posters 13.7k Views
      bigramon

      Something goes wrong after a short while:

      • WGXepc does not properly set the fan speed anymore
      • WGXepc does not report the proper temperature anymore
      • mbmon does not execute properly anymore.
      
      $ /usr/local/bin/WGXepc -f 50
      Found Firebox X-E
      Fanspeed set to 50
      
      $ /usr/local/bin/WGXepc -f
      Found Firebox X-E
      Fanspeed is ff
      
      $ /usr/local/bin/WGXepc -t
      Found Firebox X-E
      SuperIO sensor 2 reads:
      255
      
      $ /usr/local/bin/mbmon -I -i -c1 -T2
      No ISA-IO HWM available!!
      InitMBInfo: Unknown error: 0
      
      

      At this stage, I guess the best option would be to modify the WGXepc source code to allow it to keep executing in the background and set the fan speed depending on the temperature that is read.

      Damien
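A minimal sketch of the decision such a background loop would make on each pass, written in C since WGXepc is a C program. The thresholds, the fan values and the names `fan_value_for_temp` and `reading_is_invalid` are hypothetical, and how WGXepc interprets the argument to `-f` should be checked before feeding it these values. The one idea taken from the output above is that a reading of 255 (all bits set) means the sensor can no longer be trusted, so the sketch falls back to full speed rather than treating it as a real temperature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value reported once the hardware monitor stops answering (all bits set),
 * as in "SuperIO sensor 2 reads: 255" above. */
#define SENSOR_ALL_ONES 255

/* Hypothetical fail-safe: run the fans flat out when the reading is bad. */
#define FAN_FULL 0xff

/* Hypothetical fan curve: at or above min_temp_c, use fan_value.
 * Steps must be sorted by ascending temperature. */
struct fan_step {
    int min_temp_c;
    int fan_value;
};

static const struct fan_step fan_curve[] = {
    { 0,  0x30 },
    { 45, 0x50 },
    { 55, 0x80 },
    { 65, 0xc0 },
    { 75, FAN_FULL },
};

bool reading_is_invalid(int raw)
{
    return raw == SENSOR_ALL_ONES;
}

/* Map a temperature reading to the fan value for the next WGXepc -f call. */
int fan_value_for_temp(int temp_c)
{
    if (reading_is_invalid(temp_c))
        return FAN_FULL;

    int value = fan_curve[0].fan_value;
    for (size_t i = 0; i < sizeof fan_curve / sizeof fan_curve[0]; i++) {
        if (temp_c >= fan_curve[i].min_temp_c)
            value = fan_curve[i].fan_value;
    }
    return value;
}
```

A daemon would call this every few seconds with the value read by `WGXepc -t` and only run `WGXepc -f` when the result changes, which also keeps the number of SuperIO accesses down.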

        stephenw10 Netgate Administrator

        Ok, but was your script using mbmon or WGXepc to read the temperature? The difference between them is that mbmon reads a whole load of values every time it's run, even if you only need one, and getting all those values requires switching the SuperIO chip through various modes. It's possible that under certain conditions mbmon leaves the SuperIO chip in some error state. It may be possible to determine what that error state is and recover from it, or to avoid it in the first place.

        Steve

          bigramon

          My script was using only WGXepc.

          Damien

            stephenw10 Netgate Administrator

            Damn!
            Well, the only thing to do then is to try to find out why the SuperIO chip is no longer responding usefully. Quite how to do that, though…

            Steve

              sthames42

              Did you guys ever resolve this? I have exactly the same problem. Running mbmon returns reasonable temps for a while and suddenly starts reporting this:

              Temp.= 255.0,  0.0,  0.0; Rot.=    0,    0,    0
              Vcore = 4.08, 4.08; Volt. = 4.08, 6.85, 15.50,  6.07,  5.11

              When I cancel mbmon and run it again, I get:

              ioctl(smb0:open): No such file or directory
              No Hardware Monitor found!!
              InitMBInfo: Bad file descriptor

              WGXepc -t reports good temps and then nothing but 255.
              Only way to fix this is to reboot.
              I'm assuming the SuperIO chip got itself hosed somehow. Is there any way to reset or reboot the chip?

              This is on two X-Core 550e boxes, and the default SL6N7 Banias CPUs have been replaced with SL7EP Dothan CPUs following https://doc.pfsense.org/index.php/PfSense_on_Watchguard_Firebox#Further_Enhancements_3.
              All DIP switches are set correctly. No other changes (powerd/SpeedStep) were made.

                stephenw10 Netgate Administrator

                No, or at least I haven't heard from anyone who did. However, reading back through the data sheet, there appear to be a number of things we could try. Since the chip is just returning registers full of all 1s, we don't know whether it's actually returning anything or whether we're even talking to it properly. It seems likely that we are, though, because under mbmon the voltage readings continue to come back as reasonable numbers.
                As a rather extreme option, it looks like there is a register that can re-initialise the chip back to its power-on defaults. However, I've no way of knowing which registers are configured by the BIOS at boot, so the results could be… unpredictable.  ;)

                Steve

                  sthames42

                  Thanks for responding, Steve. I posted a new topic because this one was old and I wasn't sure it was still active. I'm happy to stay in this one.

                  This has happened on two boxes, but I have another that works fine. I suppose it's possible that the Dothan chips I put in these two boxes are causing the problem, but it seems unlikely that both would be bad. I'm going to replace the replacement chip in one of them with another chip and wait for results.

                  Is the data sheet you spoke of available electronically? Where can I get it?

                    stephenw10 Netgate Administrator

                    Yes, it's available in many places such as here.

                    Interacting with the chip manually for test purposes is a PITA. It involves writing many individual registers to read one value. For example:
                    https://forum.pfsense.org/index.php?topic=43574.msg261279#msg261279
                    The reset register will not require so much, though. It would be interesting to install superiotool to see what has stopped and what is still readable when the chip enters its uncooperative state. We may get a clue.

                    Steve
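To show why one value costs so many port accesses, here is a sketch of reading the hardware monitor base address (CR60/CR61 of logical device 0x0b) from a W83627HF whose configuration ports are at 0x2e/0x2f, which is where superiotool finds the chip later in this thread. The 0x87, 0x87 unlock and 0xaa lock writes and the logical-device select register 0x07 follow the Winbond data sheet's configuration procedure. The `port_io` indirection and the `fake_chip` model are hypothetical test scaffolding so the sequence can run without hardware; on FreeBSD the callbacks would instead wrap `inb`/`outb` after opening /dev/io.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EFER 0x2e    /* extended function enable register */
#define EFIR 0x2e    /* extended function index register (same port) */
#define EFDR 0x2f    /* extended function data register */
#define CR_LDN 0x07  /* logical device select */
#define LDN_HWM 0x0b /* hardware monitor */

/* Hypothetical indirection so the sequence can run against a model. */
struct port_io {
    void (*out)(void *ctx, uint16_t port, uint8_t val);
    uint8_t (*in)(void *ctx, uint16_t port);
    void *ctx;
};

static uint8_t cfg_read(const struct port_io *io, uint8_t index)
{
    io->out(io->ctx, EFIR, index);
    return io->in(io->ctx, EFDR);
}

/* Read the hardware monitor base address: seven writes and two reads. */
uint16_t read_hwm_base(const struct port_io *io)
{
    io->out(io->ctx, EFER, 0x87);    /* unlock: 0x87 written twice */
    io->out(io->ctx, EFER, 0x87);
    io->out(io->ctx, EFIR, CR_LDN);  /* select logical device... */
    io->out(io->ctx, EFDR, LDN_HWM); /* ...0x0b */
    uint8_t hi = cfg_read(io, 0x60); /* base address high byte */
    uint8_t lo = cfg_read(io, 0x61); /* base address low byte */
    io->out(io->ctx, EFER, 0xaa);    /* lock again */
    return (uint16_t)((hi << 8) | lo);
}

/* Hypothetical software model of the configuration interface. */
struct fake_chip {
    bool config_mode;
    int unlock_count;
    uint8_t index;
    uint8_t ldn;
    uint8_t regs[16][256];
    int writes;
};

static void fake_out(void *ctx, uint16_t port, uint8_t val)
{
    struct fake_chip *c = ctx;
    c->writes++;
    if (port == EFER && !c->config_mode) {
        c->unlock_count = (val == 0x87) ? c->unlock_count + 1 : 0;
        c->config_mode = (c->unlock_count == 2);
    } else if (port == EFER && val == 0xaa) {
        c->config_mode = false;
        c->unlock_count = 0;
    } else if (port == EFIR) {
        c->index = val;
    } else if (port == EFDR && c->config_mode) {
        if (c->index == CR_LDN)
            c->ldn = val & 0x0f;
        else
            c->regs[c->ldn][c->index] = val;
    }
}

static uint8_t fake_in(void *ctx, uint16_t port)
{
    struct fake_chip *c = ctx;
    if (port != EFDR || !c->config_mode)
        return 0xff; /* nothing drives the bus, so reads come back all ones */
    return c->index == CR_LDN ? c->ldn : c->regs[c->ldn][c->index];
}
```

The same counting applies to every temperature or fan access, which is why a tool that reads many values per run (like mbmon) touches the configuration interface far more often than one that reads a single register.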

                      sthames42

                      Thanks, Steve.
                      I replaced the chip and have been running and watching mbmon for an hour. No problems. I guess it's possible I had two bad chips with the same symptoms. If the problem shows up again, I'll try your idea with superiotool. Until then, I'm moving on.
                      Thanks, again.

                        stephenw10 Netgate Administrator

                        Hard to see how changing the CPU could make much difference. A change in the temperature sensor, perhaps?
                        But how would that affect the fan control?

                        Steve

                          sthames42

                          Beats the crap out of me. Been watching mbmon for 2 1/2 hours and still working great.
                          I noticed that when I took out the other chip, I had smeared on a lot of paste (Arctic Silver). I was much more conservative with the new chip.
                          Could that have anything to do with this?

                            sthames42

                            Spoke too soon. It took 3 hours but mbmon started showing this:

                            Temp.= 255.0,  0.0,  0.0; Rot.=    0,    0,    0
                            Vcore = 4.08, 4.08; Volt. = 4.08, 6.85, 15.50,   6.07,  5.11
                            

                            I guess I'll look into resetting the chip as you suggested.

                              stephenw10 Netgate Administrator

                              Are you running BIOS B8? If so, you can also get the temperature via ACPI on the dashboard or with sysctl. Does that fail also?

                              Steve

                                sthames42

                                Looks like I'm running B6. Temps not available through ACPI.
                                The SuperIO chip should be a Winbond, right? In other pictures of x550e boards you can clearly see the Winbond logo. On this machine, the chip is covered with a Phoenix Technologies sticker. How can I determine what chip I have?

                                  sthames42

                                  This is what I get if I cancel mbmon and restart it:

                                  [2.1.5-RELEASE]/root(50): mbmon -d -S
                                  SMBus[Intel8XX(ICH/ICH2/ICH3/ICH4/ICH5/ICH6)] found, but No HWM available on it!!
                                  InitMBInfo: Device not configured
                                  
                                    stephenw10 Netgate Administrator

                                    They all have the Phoenix sticker on the Winbond chip, nothing unusual there. Superiotool will identify the chip.

                                    Steve

                                      sthames42

                                      Ok, here's the superiotool dump before the problem:

                                      superiotool r4.0-2827-g1a00cf0
                                      Found Winbond W83627HF/F/HG/G (id=0x52, rev=0x41) at 0x2e
                                      Register dump:
                                      idx 02 20 21 22 23 24 25 26  28 29 2a 2b 2c 2e 2f
                                      val ff 52 41 ff fe c0 00 00  00 00 fc c4 ff 00 ff
                                      def 00 52 NA ff 00 MM 00 00  00 00 7c c0 00 00 00
                                      LDN 0x00 (Floppy)
                                      idx 30 60 61 70 74 f0 f1 f2  f4 f5
                                      val 00 00 00 00 04 0e 00 ff  00 00
                                      def 01 03 f0 06 02 0e 00 ff  00 00
                                      LDN 0x01 (Parallel port)
                                      idx 30 60 61 70 74 f0
                                      val 01 03 78 07 04 38
                                      def 01 03 78 07 04 3f
                                      LDN 0x02 (COM1)
                                      idx 30 60 61 70 f0
                                      val 01 03 f8 04 00
                                      def 01 03 f8 04 00
                                      LDN 0x03 (COM2)
                                      idx 30 60 61 70 f0 f1
                                      val 01 02 f8 03 00 00
                                      def 01 02 f8 03 00 00
                                      LDN 0x05 (Keyboard)
                                      idx 30 60 61 62 63 70 72 f0
                                      val 01 00 60 00 64 01 00 80
                                      def 01 00 60 00 64 01 0c 80
                                      LDN 0x06 (Consumer IR)
                                      idx 30 60 61 70
                                      val 00 00 00 00
                                      def 00 00 00 00
                                      LDN 0x07 (Game port, MIDI port, GPIO 1)
                                      idx 30 60 61 62 63 70 f0 f1  f2
                                      val 01 00 00 00 00 00 00 00  00
                                      def 00 02 01 03 30 09 ff 00  00
                                      LDN 0x08 (GPIO 2, watchdog timer)
                                      idx 30 f0 f1 f2 f3 f5 f6 f6  f7
                                      val 00 ff ff ff 00 00 00 00  00
                                      def 00 ff 00 00 00 00 00 00  00
                                      LDN 0x09 (GPIO 3)
                                      idx 30 f0 f1 f2 f3
                                      val 00 ff ff ff 00
                                      def 00 ff 00 00 00
                                      LDN 0x0a (ACPI)
                                      idx 30 70 e0 e1 e2 e3 e4 e5  e6 e7 f0 f1 f3 f4 f6 f7  f9 fe ff
                                      val 00 00 00 00 14 00 00 00  00 00 00 af 32 00 00 00  00 00 00
                                      def 00 00 00 00 NA NA 00 00  00 00 00 00 00 00 00 00  00 00 00
                                      LDN 0x0b (Hardware monitor)
                                      idx 30 60 61 70 f0
                                      val 01 02 90 00 00
                                      def 00 00 00 00 00
                                      

                                      After mbmon starts failing as above, there is only one difference in the dump, a single byte in the hardware monitor:

                                      LDN 0x0b (Hardware monitor)
                                      idx 30 60 61 70 f0
                                      val 01 02 0b 00 00
                                      def 00 00 00 00 00
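Comparing the two LDN 0x0b dumps register by register makes the change explicit. This sketch copies the values from the two dumps above; CR60 holds the high byte and CR61 the low byte of the hardware monitor base address, so the same register pair decodes to 0x0290 before the failure and 0x020b after it. The function and array names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* LDN 0x0b values from the two superiotool dumps, in "idx" row order. */
static const uint8_t hwm_idx[]    = { 0x30, 0x60, 0x61, 0x70, 0xf0 };
static const uint8_t hwm_before[] = { 0x01, 0x02, 0x90, 0x00, 0x00 };
static const uint8_t hwm_after[]  = { 0x01, 0x02, 0x0b, 0x00, 0x00 };

#define N_REGS (sizeof hwm_idx / sizeof hwm_idx[0])

/* Return the register index of the first value that differs, or -1. */
int first_changed_reg(const uint8_t *a, const uint8_t *b, size_t n,
                      const uint8_t *idx)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return idx[i];
    return -1;
}

/* CR60 is the high byte and CR61 the low byte of the monitor's I/O base. */
uint16_t hwm_base(uint8_t cr60, uint8_t cr61)
{
    return (uint16_t)((cr60 << 8) | cr61);
}
```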
                                      

                                      Steve, how would you go about doing the reset you were talking about?

                                      EDIT: I assume the reset you mentioned was the initialization bit in the configuration register at 40h. I tried writing to this register with your readio/writeio tools but still got nothing back but FF. In fact, all reads are returning FF. Except for the results of the superiotool dump, I would think the chip had simply shut down.
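One possible reason the CR40 write had no effect: the hardware monitor's own registers are reached through an index port at base+5 and a data port at base+6, which with the original base are 0x295/0x296. If the base had already moved to 0x020b when the readio/writeio commands were issued (this assumes they were still aimed at 0x295/0x296), nothing would answer at those ports and every read would float to 0xff. The sketch below does a read-modify-write of CR40 bit 7 (the initialization bit) through ports derived from whatever base is passed in. `hwm_request_init` and the `fake_hwm` model, which only decodes at a single base, are hypothetical, and the model does not emulate what the real chip resets.

```c
#include <stdint.h>

#define HWM_ADDR_OFF 5     /* index ("address") port = base + 5 */
#define HWM_DATA_OFF 6     /* data port = base + 6 */
#define HWM_CR_CONFIG 0x40 /* configuration register */
#define HWM_CFG_INIT 0x80  /* bit 7: initialization */

struct port_io {
    void (*out)(void *ctx, uint16_t port, uint8_t val);
    uint8_t (*in)(void *ctx, uint16_t port);
    void *ctx;
};

/* Set the initialization bit of CR40 through the ports for `base`. */
void hwm_request_init(const struct port_io *io, uint16_t base)
{
    uint16_t addr = (uint16_t)(base + HWM_ADDR_OFF);
    uint16_t data = (uint16_t)(base + HWM_DATA_OFF);

    io->out(io->ctx, addr, HWM_CR_CONFIG);
    uint8_t cr40 = io->in(io->ctx, data);
    io->out(io->ctx, addr, HWM_CR_CONFIG);
    io->out(io->ctx, data, (uint8_t)(cr40 | HWM_CFG_INIT));
}

/* Hypothetical model: the monitor only answers at the base it decodes. */
struct fake_hwm {
    uint16_t decoded_base;
    uint8_t index;
    uint8_t regs[256];
    int hits; /* accesses that actually reached the model */
};

static void fake_hwm_out(void *ctx, uint16_t port, uint8_t val)
{
    struct fake_hwm *h = ctx;
    if (port == h->decoded_base + HWM_ADDR_OFF) {
        h->index = val;
        h->hits++;
    } else if (port == h->decoded_base + HWM_DATA_OFF) {
        h->regs[h->index] = val;
        h->hits++;
    }
}

static uint8_t fake_hwm_in(void *ctx, uint16_t port)
{
    struct fake_hwm *h = ctx;
    if (port == h->decoded_base + HWM_DATA_OFF) {
        h->hits++;
        return h->regs[h->index];
    }
    return 0xff; /* no device at this port */
}
```

Against a model decoding at 0x020b, a call with base 0x0290 never reaches the chip, while the same call with base 0x020b sets bit 7.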

                                      I can live without the temperature if I must. But I have no ability to change the fan speed now. I had it set up to automatically adjust the fan speed based on the CPU temp. Now I can neither determine the temp nor adjust the fans. The only way to fix this appears to be a reboot, which is not acceptable.

                                      Would enabling speedstep or powerd make any difference?

                                        sthames42

                                        EUREKA!

                                        Steve, you're gonna love this!

                                        After learning what the output from superiotool actually meant and reviewing the Winbond data sheet, I realized the only change between the two dumps was the hardware monitor base address, which changed from 0290h to 020Bh. I wrote a little script using your readio/writeio tools to reset the base address. I never expected this to work, but lo and behold, it did! Everything started working again.

                                        I guess I should modify WGXepc to check for and correct this but I don't have a FreeBSD platform to do it on. Maybe I'll try and do it on one of my Fireboxes.
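A sketch of what that check-and-correct step could look like in C: re-enter extended function mode, select LDN 0x0b, write the base address back into CR60/CR61, then lock the chip again. The readio/writeio script itself was not posted, so this sequence is reconstructed from the data sheet's configuration procedure and the dumps in this thread rather than copied from it. `restore_hwm_base` and the `write_log` recorder are hypothetical; the recorder stands in for real `outb()` calls so the order of writes can be checked.

```c
#include <stddef.h>
#include <stdint.h>

#define EFER 0x2e /* extended function enable register */
#define EFIR 0x2e /* index register, same port */
#define EFDR 0x2f /* data register */

struct port_write {
    uint16_t port;
    uint8_t val;
};

/* Records writes instead of touching hardware; a real version would outb(). */
struct write_log {
    struct port_write w[16];
    size_t n;
};

static void log_out(struct write_log *log, uint16_t port, uint8_t val)
{
    if (log->n < sizeof log->w / sizeof log->w[0])
        log->w[log->n++] = (struct port_write){ port, val };
}

/* Put the hardware monitor (LDN 0x0b) back at `base`. */
void restore_hwm_base(struct write_log *log, uint16_t base)
{
    log_out(log, EFER, 0x87); /* enter extended function mode */
    log_out(log, EFER, 0x87);
    log_out(log, EFIR, 0x07); /* logical device select */
    log_out(log, EFDR, 0x0b); /* hardware monitor */
    log_out(log, EFIR, 0x60); /* base address, high byte */
    log_out(log, EFDR, (uint8_t)(base >> 8));
    log_out(log, EFIR, 0x61); /* base address, low byte */
    log_out(log, EFDR, (uint8_t)(base & 0xff));
    log_out(log, EFER, 0xaa); /* exit extended function mode */
}
```

In WGXepc this would naturally run only after reading CR60/CR61 and finding something other than the expected 0x0290.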

                                          stephenw10 Netgate Administrator

                                          Nice work. That seems odd, though. I'll have to read the data sheet myself unless you can enlighten me. How was anything able to be read if the base address had changed? Did only the extended registers change? Why did it change?
                                          Like you say, it should be relatively easy to check that and set it back.

                                          Hmm, just wondering if the base address changed because some other piece of hardware required access to that address space. I didn't think it could change except at boot on the ISA bus, but really I don't know. Perhaps even our own script is trying to access the space twice, causing the shift.
                                          It might be better to read the base address and just use it rather than trying to change it back.

                                          Steve
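Following the base address wherever it is, instead of forcing it back, would mean deriving the hardware monitor ports from CR60/CR61 on each run rather than hard-coding 0x295/0x296. A hypothetical sketch of that calculation is below. It assumes the chip decodes exactly base+5 and base+6 as programmed; the data sheet's alignment rule for CR60/CR61 should be checked before relying on an odd base such as 0x020b.

```c
#include <stdbool.h>
#include <stdint.h>

struct hwm_ports {
    uint16_t addr; /* index port: base + 5 */
    uint16_t data; /* data port: base + 6 */
};

static uint16_t base_from_cr(uint8_t cr60, uint8_t cr61)
{
    return (uint16_t)((cr60 << 8) | cr61);
}

/* Derive the monitor's ports from the current CR60/CR61 values on every
 * run, instead of hard-coding 0x295/0x296. */
struct hwm_ports hwm_ports_from_cr(uint8_t cr60, uint8_t cr61)
{
    uint16_t base = base_from_cr(cr60, cr61);
    struct hwm_ports p = { (uint16_t)(base + 5), (uint16_t)(base + 6) };
    return p;
}

/* Worth logging: the base no longer matches the value recorded at boot. */
bool hwm_base_moved(uint8_t cr60, uint8_t cr61, uint16_t boot_base)
{
    return base_from_cr(cr60, cr61) != boot_base;
}
```

Logging when the base moves would also help answer the "why did it change?" question, since the timestamps could be matched against whatever else was touching the ports.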

                                            sthames42

                                            Nothing was able to read from the hardware monitor but everything else seems to have been okay. I've been doing some thinking about that and looking at the WGXepc code.

                                            As I said, the only difference in the registers was device 0x0b (Hardware Monitor) register 0x61 changed from 0x90 to 0x0b. I run a daemon that uses WGXepc to continually check the temp and adjust the fan speed accordingly. The basic script came from https://forum.pfsense.org/index.php?topic=66129.msg360358#msg360358. bigramon was doing the same thing which started this topic.

                                            I'm working on the theory that WGXepc is writing 0x0b to the 0x61 register of device 0x0b. I can see how this could happen if port_out(EFDR, 0x0b) were done after a call to get_w83627_addr_port(), but it does not look like that is what's happening.

                                            Any thoughts?
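One mechanism consistent with this theory: the data port at 0x2f carries no register number, so a write to it goes to whichever register the shared index port at 0x2e last selected. If the index is still at 0x61 (for example after reading the base low byte) when a `port_out(EFDR, 0x0b)` meant to select LDN 0x0b arrives, CR61 becomes 0x0b and the base reads 0x020b, exactly the value in the dump. The same thing could happen if two processes using these ports ever overlap. The model below is a simplified, hypothetical illustration (no unlock/lock handling, only LDN 0x0b's registers stored) and does not claim this is what WGXepc actually does.

```c
#include <stddef.h>
#include <stdint.h>

#define EFIR 0x2e /* shared index port */
#define EFDR 0x2f /* data port: goes to whatever EFIR last selected */

/* Simplified model: unlock/lock (0x87/0xaa) is omitted and only the
 * registers of LDN 0x0b (hardware monitor) are stored. */
struct cfg_model {
    uint8_t index;
    uint8_t ldn;
    uint8_t hwm_regs[256];
};

struct io_op {
    uint16_t port;
    uint8_t val;
};

static void model_out(struct cfg_model *m, uint16_t port, uint8_t val)
{
    if (port == EFIR)
        m->index = val;
    else if (port == EFDR && m->index == 0x07)
        m->ldn = val;                /* logical device select */
    else if (port == EFDR && m->ldn == 0x0b)
        m->hwm_regs[m->index] = val; /* any other LDN 0x0b register */
    /* writes to other logical devices are not modelled */
}

void run_ops(struct cfg_model *m, const struct io_op *ops, size_t n)
{
    for (size_t i = 0; i < n; i++)
        model_out(m, ops[i].port, ops[i].val);
}

uint16_t model_hwm_base(const struct cfg_model *m)
{
    return (uint16_t)((m->hwm_regs[0x60] << 8) | m->hwm_regs[0x61]);
}

/* Writer A selects LDN 0x0b, then reader B points the index at CR61. */
static const struct io_op in_order[] = {
    { EFIR, 0x07 }, { EFDR, 0x0b }, { EFIR, 0x61 },
};

/* The same three writes, but B's index write lands between A's two. */
static const struct io_op interleaved[] = {
    { EFIR, 0x07 }, { EFIR, 0x61 }, { EFDR, 0x0b },
};
```

Starting from CR60/CR61 = 0x02/0x90, the in-order sequence leaves the base at 0x0290 while the interleaved one turns it into 0x020b. If that is the cause, serialising all access to 0x2e/0x2f (for example with a lock file shared by every script that calls WGXepc or mbmon) would be one way to test it.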

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.