Netgate Discussion Forum

    NUT suddenly stops working every app. 6 minutes

    UPS Tools
    29 Posts 3 Posters 782 Views
    • j.koopmann

      Hi,

      With the exception of updating to 24.11-RELEASE from the 23.x version, I have not changed anything substantial in my setup. However, all of a sudden UPS/NUT keeps failing; after a restart all is fine again. The log shows:

      Can't connect to UPS [Keller] (usbhid-ups-Keller): No such file or directory
      Found 1 UPS defined in ups.conf
      User local-monitor@127.0.0.1 logged into UPS [Keller]
      Poll UPS [Keller] failed - Driver not connected
      Communications with UPS Keller lost
      Connected to UPS [Keller]: usbhid-ups-Keller
      Communications with UPS Keller established
      

      The UPS is a USB-connected Eaton 9SX 3000i.

      This setup worked flawlessly for months, so my main suspicion is that the update triggered a bug. Is this a known issue, and is there perhaps a patch?

      Regards
      JP

      • Gertjan @j.koopmann

        @j-koopmann

        Check whether the UPS is actually connected.
        On the console or command line, type:

        usbconfig
        

        Do you see it listed?

        This means (to me) :

        @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

        Can't connect to UPS [Keller] (usbhid-ups-Keller): No such file or directory

        that the USB device wasn't known to the OS, so it was not attached (or, if attached: bad cable, bad USB host, flaky UPS firmware, etc.).

        pfSense packages don't receive 'official' patches.
        If needed, they are modified by their creator, and marked for 'to be upgraded'.
        That said, if you find a patch (redmine / github) you can copy paste a patch yourself.
        This is valid for the GUI part of the package.
        Binaries (executable) can't be patched.
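
        The usbconfig check suggested above can be scripted. What follows is only a minimal sketch: `ups_attached` is a hypothetical helper name, and the search patterns are assumptions, since the string an Eaton 9SX reports to the OS is not shown in this thread. The sample line only mirrors the format usbconfig prints for an attached UPS.

        ```shell
        # Hypothetical helper: reads usbconfig-style output on stdin and succeeds
        # if any line matches the given case-insensitive pattern.
        ups_attached() {
            grep -i -q -- "$1"
        }

        # Sample line in usbconfig's output format. On the firewall, pipe the real
        # output instead (the pattern 'eaton' is an assumption):
        #   usbconfig | ups_attached 'eaton'
        sample='ugen0.2: <Uninterruptible Power Supply American Power Conversion> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON (24mA)'

        if printf '%s\n' "$sample" | ups_attached 'power supply'; then
            echo "UPS is attached"
        else
            echo "UPS not found on the USB bus"
        fi
        ```

        Run at the moment NUT reports the failure, this would show whether the OS still lists the device.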

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • j.koopmann @Gertjan

          @Gertjan Thanks.

          I have to correct myself: it is not 6 minutes but 1-2 days after a restart of NUT. I will look for the correct log entries at the next failure.

          If it stops working, simply restarting NUT fixes it. I don't touch the hardware. So a hardware fault (cable disconnected) or a general USB-cannot-be-found failure is more or less out of the question if a simple service restart makes it happy again for the next hours to days.

          I can check the USB cable, but again: it is not as if someone is touching the devices during or near the failure, and it started with the pfSense update. I am not sure whether the NUT version was also updated between 23.x and 24.x.

          Will change cable once it happens again to see if this is related.

          • Gertjan @j.koopmann

            @j-koopmann

            I'm pretty sure no one disconnected your cable for a moment 😊
            I use a UPS myself, on a 4100:

            [25.03-BETA][root@pfSense.bhf.tld]root: usbconfig
            ....
            ugen0.2: <Uninterruptible Power Supply American Power Conversion> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON (24mA)
            

            and it's pretty rock solid.
            That is, as long as I change the battery, normally every 2 years (a bit more).

            I'm using the same NUT version and underlying upsmon software as you, and it plays well.

            When you see this :

            @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

            Poll UPS [Keller] failed - Driver not connected
            Communications with UPS Keller lost
            Connected to UPS [Keller]: usbhid-ups-Keller
            Communications with UPS Keller established

            the USB hardware, on the pfSense side or on the UPS side, dropped the logical connection.
            Something glitched somewhere.
            It is like a NIC: you'll see a link down and a link up event.
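
            To make the "link down / link up" reading of those log lines concrete, here is a small sketch that reports the most recent communication state found in a log. `last_comm_state` is a hypothetical helper, and it assumes the message wording stays exactly as in the log excerpt from the first post.

            ```shell
            # Hypothetical helper: reads NUT log lines on stdin and prints the most recent
            # communication state ("lost", "established", or "unknown" if neither appears).
            last_comm_state() {
                awk '
                    /Communications with UPS .* lost/        { state = "lost" }
                    /Communications with UPS .* established/ { state = "established" }
                    END { print (state == "" ? "unknown" : state) }
                '
            }

            # Log lines taken from the first post
            printf '%s\n' \
                'Poll UPS [Keller] failed - Driver not connected' \
                'Communications with UPS Keller lost' \
                'Connected to UPS [Keller]: usbhid-ups-Keller' \
                'Communications with UPS Keller established' |
                last_comm_state
            ```

            For the excerpt above this prints "established", because the "lost" event is followed by a reconnect.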

            My not so SPECIAL NUT settings are :

            4c29f953-7618-4a38-9142-c70d8bc5efc6-image.png

            Btw: if possible, test with another UPS. I'm pretty sure the issue goes away ^^
            I can recommend "APC" (not made in America of course, it was branded in the States ^^)

            • j.koopmann @Gertjan

              @Gertjan I am not happy buying a new UPS if the old one is working and has been doing so flawlessly for the past months. :-) Maybe I will try a different cable and USB port. Who knows...

              • Gertjan @j.koopmann

                @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                I am not happy buying a new UPS if the old one is working and has been doing so flawlessly for the past months.

                Not forcing you to get out and buy something just for testing.

                A UPS is a security device, and I was presuming you have more to protect than just pfSense.
                So you have at least 2 UPSs and can do the exchange trick, which is perfect for ruling things out.

                Btw: UPS batteries, even when they are not used often, do die. Give them 3 years and they switch roles: instead of being a security device in your house, they will set your house on fire (batteries overheating: even the fire brigade runs away from that).

                Again: USB connections (electrical) are not perfect.
                UPSMon (NUT) : as we use the same version, identical at a bit level, only our settings differ. For example : my USB driver is probably not the same as yours. You use the same USB driver as I do :

                d7eb77a5-e907-440d-baf8-d9d9041a3577-image.png

                see here : https://networkupstools.org/stable-hcl.html

                The USB chip used on the UPS side .... we'll never know,

                And maybe the internal power of your UPS is 'flaky'. (Capacitors go bad also ...)

                Normally, I change my UPSs after the second battery change, if they didn't go bad before.

                • j.koopmann @Gertjan

                  @Gertjan

                  @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                  Not forcing you to get out and buy something just for testing.

                  I know. :-)

                  @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                  A UPS is a security device, and I was presuming you have more to protect than just pfSense.

                  Yes, one server, a switch and other equipment, but this is a home installation, hence only one UPS. Moreover, the entire house is on a big PV-driven battery, so the UPS only has to cover a few minutes. Not worth buying two...

                  @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                  Btw: UPS batteries, even when they are not used often, do die. Give them 3 years,

                  I am aware. But the old UPS is only approx. 12 months old. :-) So I am not going to buy a new one just to rule this one out. Worst case: it is not working, or I have to restart NUT every x hours. --> Home installation.

                  @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                  Normally, I change my UPSs after the second battery change, if they didn't go bad before.

                  Agreed. Will do that in x years. :-)

                  Thanks for your help. Let me change cables and USB port and see if something changes.

                  • dennypage @j.koopmann

                    @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                    This setup worked flawlessly for months so my main suspicion is that the update triggered a bug.

                    Unplug and re-plug the USB cable to confirm you are not experiencing a simple permissions issue.

                    Also, had you installed a quirk by chance?

                    • j.koopmann @dennypage

                      @dennypage following up here from the other thread:

                      No quirks installed. How would this be a permissions issue if it is working after a restart of NUT and sometimes simply stops working out of the blue? I am happy to reattach the USB when it next happens.

                      • Gertjan @j.koopmann

                        @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                        and sometimes simply stops working out of the blue?

                        As said in the other thread: when you connect, for example, an RJ45 network cable from the pfSense WAN to the upstream LAN port of your ISP router, you presume the link stays up until one of the two devices goes down or is powered down.
                        If it doesn't, start ditching cables, NICs (drivers?) or even devices until your connection is stable.

                        USB connections are often less mission critical, but look at their physical connection method: just some metal plates against other metal plates, so the smallest pollution or whatever can make the connection 'bad'.
                        Swap cables.
                        Swap USB ports.
                        If the connection stays bad, swap the UPS.

                        Go back to the https://networkupstools.org/ web site, and this time check with the user forum first with the obvious question : "what is the best one ?"

                        Btw : I presume your USB connection going down isn't a NUT issue. The USB hardware signals the usb-hid driver that the USB connection went down. This gets signaled upwards, up until you see a line in the log : 'lost UPS'.
                        I can create this log line also with my pfSense NUT install - the very same as yours : I have to rip out the USB cable to see it.
                        I'm using a Netgate classic "4100" hooked up to an APC UPS. It's pretty rock solid.

                        • j.koopmann @Gertjan

                          @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                          @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                          and sometimes simply stops working out of the blue?

                          As said in the other thread: when you connect, for example, an RJ45 network cable from the pfSense WAN to the upstream LAN port of your ISP router, you presume the link stays up until one of the two devices goes down or is powered down.
                          If it doesn't, start ditching cables, NICs (drivers?) or even devices until your connection is stable.

                          If it does not stay up, however, I would see up/down messages in dmesg or the like and/or an increase in error counters.

                          USB connections are often less mission critical, but look at their physical connection method: just some metal plates against other metal plates, so the smallest pollution or whatever can make the connection 'bad'.
                          Swap cables.
                          Swap USB ports.
                          If the connection stays bad, swap the UPS.

                          All understood. But if the USB connection is physically causing problems and disconnects, the kernel would throw disconnect errors in dmesg, would it not?

                          Go back to the https://networkupstools.org/ web site, and this time check with the user forum first with the obvious question : "what is the best one ?"

                          So your answer is (again): Buy a better UPS. I am happy to switch from Eaton to APC at some point, or to do so earlier if I am 100% sure this is the problem. As stated before: this worked flawlessly for months, up until the very point I upgraded from pfSense 23.x to 24.x. On that day (which also changed the NUT packages) the trouble started. This is one hell of a coincidence, is it not?

                          Btw : I presume your USB connection going down isn't a NUT issue.

                          That assumes it is the USB connection that is going down. Other than the NUT daemon showing "a problem", there is no indication that the USB connection is going down. No error message points to this.

                          The USB hardware signals the usb-hid driver that the USB connection went down.

                          Which it would also signal in dmesg. It does not, meaning that either the USB hardware is not going down, or it goes down in a way the kernel does not recognize.

                          • dennypage

                            You are using 2.8.2_5, yes?

                            Do you have anything in the additional arguments to driver section?

                            Regarding USB physical issues, in addition to swapping USB ports and cables, you can also try adding a powered USB hub in-between.

                            • j.koopmann @dennypage

                              @dennypage

                              pkg info | grep nut
                              nut-2.8.2                      Network UPS Tools
                              pfSense-pkg-nut-2.8.2_4        Network UPS Tools
                              

                              And the GUI says 2.8.2_4 as well. pfSense 24.11-RELEASE (amd64).

                              How can I upgrade to 2.8.2_5?

                              Will look for a powered USB hub in my archive. Same question to you: Would a USB physical error that is so severe that the driver stops working not also result in some sort of disconnect message in dmesg?

                              Is there a way to increase the debugging level in usb-hid to get more info the next time it stops working?

                              • j.koopmann @dennypage

                                Do you have anything in the additional arguments to driver section?

                                [Keller]
                                driver=usbhid-ups
                                port=auto
                                

                                No. Not really other than port....

                                • dennypage @j.koopmann

                                  @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                                  pkg info | grep nut
                                  nut-2.8.2                      Network UPS Tools
                                  pfSense-pkg-nut-2.8.2_4        Network UPS Tools
                                  

                                  Those versions are fine with 2.7.2. You should see 2.8.2_5 when you upgrade to pfSense 2.8.0.

                                  FWIW, you should consider upgrading to 2.8.0, with its newer kernel and USB core, before spending a lot of time on this.

                                  Will look for a powered USB hub in my archive. Same question to you: Would a USB physical error that is so severe that the driver stops working not also result in some sort of disconnect message in dmesg?

                                  "Severe" isn't the right way to think of it. The behavior in question would be a momentary disconnect, or rejection of a basic USB communication. It may or may not be logged.

                                  Is there a way to increase the debugging level in usb-hid to get more info the next time it stops working?

                                  First start the service from the UI, and then kill the driver:

                                  killall usbhid-ups
                                  

                                  Then, run the driver with debug:

                                  /usr/local/libexec/nut/usbhid-ups -DDD -a YOUR_UPS_NAME_HERE
                                  

                                  See what error it shows. If it doesn't show enough, you can add more 'D's until it does, up to "-DDDDDD".
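
                                  Since the failure reportedly takes a day or two to reappear, it may help to keep the debug output in a file with wall-clock timestamps. This is a rough sketch only: `stamp_lines` is a hypothetical helper, the log path is an assumption, and the commented pipeline assumes the driver keeps running in the foreground while in debug mode.

                                  ```shell
                                  # Hypothetical helper: prefixes every line read on stdin with the current wall-clock time.
                                  stamp_lines() {
                                      while IFS= read -r line; do
                                          printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
                                      done
                                  }

                                  # Demonstration with dummy input:
                                  printf 'first line\nsecond line\n' | stamp_lines

                                  # On the firewall, the driver output would be piped through it
                                  # (UPS name from this thread, log path is an assumption):
                                  #   /usr/local/libexec/nut/usbhid-ups -DDD -a Keller 2>&1 | stamp_lines >> /root/usbhid-ups-debug.log
                                  ```

                                  The timestamps make it easier to line up the moment of failure with the system log.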

                                  Think about installing pfSense 2.8.0.

                                  [FYI, I am traveling with very limited availability for the next several weeks...]

                                  • Gertjan @j.koopmann

                                    @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                                    But if the USB connection is physically causing problems and disconnects, the kernel would throw disconnect errors in dmesg, would it not?

                                    No timestamp in 'dmesg' but I can see these :

                                    [25.03-BETA][root@pfSense.bhf.tld]/root: dmesg | grep '700U'
                                    ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0
                                    ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0 (disconnected)
                                    ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0
                                    ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0 (disconnected)
                                    ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0
                                    ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0 (disconnected)
                                    ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0
                                    

                                    (We had some minor power issues last week)

                                    If the USB connection itself is ok, but the other side goes bad logically, then I can imagine dmesg wouldn't show a hardware event.
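
                                    Counting those events can be scripted. A minimal sketch follows; `count_usb_disconnects` is a hypothetical helper, and it assumes the kernel marks detach events with the literal text "(disconnected)", as in the dmesg output above.

                                    ```shell
                                    # Hypothetical helper: reads dmesg-style text on stdin and prints how many
                                    # "(disconnected)" events mention the given device string.
                                    count_usb_disconnects() {
                                        grep -F -- "$1" | grep -c -F '(disconnected)'
                                    }

                                    # Three lines copied from the dmesg output above; on the firewall:
                                    #   dmesg | count_usb_disconnects '700U'
                                    printf '%s\n' \
                                        'ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0' \
                                        'ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0 (disconnected)' \
                                        'ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0' |
                                        count_usb_disconnects '700U'
                                    ```

                                    For the three sample lines this prints 1. A count of 0 after a NUT failure would suggest the kernel never saw the device detach.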

                                    @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                                    So your answer is (again): Buy a better UPS

                                    Because I can't ask you to open up the UPS so you can check what happens on the other side.
                                    Is the UPS operated by a dumb controller? Or something with a CPU? Does it have a serial console interface? Can you connect to it? If possible, this would help clarify the situation.

                                    The only thing I'm pretty sure about, as you and I use the same binaries on the pfSense side, is that the issue isn't called 'NUT'.

                                    What differs:
                                    My NUT settings versus your NUT settings (might be identical!).
                                    Me: a 4100. You use what? What USB chipset, etc.?
                                    Me: APC. You: something else.
                                    Our USB cables are not identical either ^^

                                    If it's only you using pfSense + NUT with a "Eaton 9SX 3000i." then there is little hope : you have to dive into it.

                                    Btw: from what I know, NUT, on a low level, opens a USB connection. NUT then sends "status request commands" to the UPS over what really looks like a serial or console connection.
                                    The UPS answers with ... what I've shown above with the upsc command.
                                    If any of the UPS replies gets misinterpreted because one character is off (misspelled, whatever), then the UPS driver might stall ... and time out.
                                    This would explain what you see, and this is, afaik, what the quirks solution can do for you (denny talked about these).

                                    To see this happening, you have to use the NUT doc and start the NUT processes in debug mode so you can see way more details. These could show you 'why' the communication stops.

                                    And that's the real issue : you have to investigate, and this might take "some time".
                                    It's that ... or, sorry, you opt for the 'change' option.
                                    I prefer of course that you find out what is really happening.
                                    Of course I hate the 'change it' solution ;)

                                    edit : lol, denny just posted how to start the driver in debug mode 👍 👏
                                    Now I see a lot of comm going on.
                                    I didn't know there was that much of a 'chat' between NUT and the UPS.

                                    • dennypage @Gertjan

                                      @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                                      If any of the UPS replies gets misinterpreted because one character is off (misspelled, whatever), then the UPS driver might stall ... and time out.
                                      This would explain what you see, and this is, afaik, what the quirks solution can do for you (denny talked about these).

                                      Quirks are used in FreeBSD to identify the device as a UPS to the kernel, which will prevent the kernel from attaching a default kernel driver to the USB device when it is discovered.

                                      Removing an already attached kernel driver requires root privileges, which is why you see some people use "user=root". They do this because their system is missing the appropriate quirk for the UPS.

                                      IMO, it's a better choice to install the missing quirk than to run the NUT driver as root.

                                      • j.koopmann @Gertjan

                                        @Gertjan

                                        @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                                        If the USB connection itself is ok, but the other side goes bad logically, then I can imagine dmesg wouldn't show a hardware event.

                                        Agreed. Possible, of course. If it did not coincide with my upgrade months ago, this would be my conclusion as well. :-)

                                        @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                                        Is the UPS operated by a dumb controller? Or something with a CPU? Does it have a serial console interface? Can you connect to it? If possible, this would help clarify the situation.

                                        I really cannot say what exactly the Eaton 9SX 3000i is using, but I would suspect some sort of controller. It also has an RS232 port, and apparently the mge-shut driver should work. I will have to check whether my firewall has RS232 so I can try to switch. Otherwise I would need a USB-to-serial interface.

                                        @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                                        If it's only you using pfSense + NUT with a "Eaton 9SX 3000i." then there is little hope : you have to dive into it.

                                        Yep. Trying to do so!

                                        @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                                        If any of the UPS replies gets misinterpreted because one character is off (misspelled, whatever), then the UPS driver might stall ... and time out.
                                        This would explain what you see, and this is, afaik, what the quirks solution can do for you (denny talked about these).

                                        To see this happening, you have to use the NUT doc and start the NUT processes in debug mode so you can see way more details. These could show you 'why' the communication stops.

                                        Exactly. Will try to do that next thing (might have to install screen first. :-) )

                                        @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                                        And that's the real issue : you have to investigate, and this might take "some time".

                                        Oh, that is not an issue. The issue is more to get better information on where to look and how to look, and denny just provided that. THANKS! And yes, there is quite a bit of chatter.

                                          • j.koopmann @dennypage

                                            @dennypage regarding quirks: I get that and I am running this with user=root at the moment. Do you suspect that a missing quirk is the problem? It would not make sense to me because then it should not work at all without a quirk. :-)

                                            I can't really go to 2.8.0 since I am on 24.11. I could upgrade to the 25.03 beta, but I have not seen anything in the changelog that would indicate USB or kernel related changes. 24.11 is already on

                                            15.0-CURRENT FreeBSD 15.0-CURRENT #0 plus-RELENG_24_11-n256407-1bbb3194162

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.