Netgate Discussion Forum

    NUT suddenly stops working every app. 6 minutes

      Gertjan @j.koopmann

      @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

      stopped working. It was signal 15 SIGTERM and after

      SIGTERM is sent to a process (NUT in this case) by the system (pfSense or FreeBSD) because some system event happened.
      Examples: an interface was set up, or vanished.
      (Still strange, as localhost, 127.0.0.1, always stays up while the OS is running.)
      Anyway, the "VPN" did something with the available interfaces, and all interface-bound processes like nginx, unbound, DHCP, etc. receive a 'terminate' and are restarted.
      The dreaded:

      Restarting packages.

      @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

      The entire stack was killed and came back... So back to the drawing board and restart the driver with -DDD to see if the real error resurfaces. However: It was ok for a long time now...

      So, all the NUT processes were restarted, but a working USB connection wasn't created?

      No "help me" PM's please. Use the forum, the community will thank you.
      Edit : and where are the logs ??

        j.koopmann @Gertjan

        @Gertjan said in NUT suddenly stops working every app. 6 minutes:

        SIGTERM is sent to a process - NUT in this case - by the system - pfSense or FreeBSD - because some system event happened.

        I might have hidden it too well due to the way I communicate: I have been in the business for more than 30 years... I am well aware of what a SIGTERM is. :-)

        @Gertjan said in NUT suddenly stops working every app. 6 minutes:

        Anyway, "VPN" did something with the available interface and all interface-bound processes like nginx, unbound, DHCP, etc. will receive a 'terminate' and will be restarted.

        I know. That was the conclusion today.

        @Gertjan said in NUT suddenly stops working every app. 6 minutes:

        So, all the NUT processes were restarted, but a working USB connection wasn't created?

        Whenever usbhid-ups is restarted, it ALWAYS establishes a working USB connection. Everything was working fine this morning. I was inspecting my -DDD log and just saw that it had stopped recording around 6:30, which led me to believe the original problem had resurfaced. But it was the VPN connection triggering the service restart.

        So I re-enabled debugging and am waiting for a "real" failure.

          j.koopmann @j.koopmann

          So far everything is stable without my changing anything else. Who knows why...

            j.koopmann @j.koopmann

            Started happening again (not that frequently but still...)

            Today I noticed it stopped around 6:30.

            upsmon 89561 - - Poll UPS [Keller] failed - Protocol error

            usbhid-ups was still running. I killed it and restarted it with -DDD. As always, it came up flawlessly and communicated with the UPS. BUT upsmon continued to show Protocol errors. That is new (to me). Once usbhid-ups is back again (started with -a Keller, of course), upsmon should be able to re-establish the connection to it. In this case it fails!

            After restarting upsmon and usbhid-ups, things come back. Why is upsmon showing a protocol error when usbhid-ups is clearly running, shows no errors, and has clearly established communication with the UPS?

              Gertjan @j.koopmann

              @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

              upsmon 89561 - - Poll UPS [Keller] failed - Protocol error

              I've tried to generate such a message myself.

              I started this on the console:

              tail -f /var/log/system.log
              

              and then disconnected the USB cable that connects pfSense to the UPS.

              This was the result:

              <38>1 2025-06-26T14:26:40.471014+02:00 bhf.tld sshd 32452 - - Accepted publickey for root from 192.168.1.6 port 51478 ssh2: RSA SHA256:t6AMtOQbd+vBU56dvmXq3xE+lWsMMoOUn9njUtgMwTQ
              <2>1 2025-06-26T14:27:09.833170+02:00 bhf.tld kernel - - - ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0 (disconnected)
              <28>1 2025-06-26T14:27:12.075594+02:00 bhf.tld usbhid-ups 88307 - - libusb1: Could not open any HID devices: no USB buses found
              <29>1 2025-06-26T14:27:12.075778+02:00 bhf.tld upsd 87637 - - Data for UPS [UPS] is stale - check driver
              <2>1 2025-06-26T14:27:13.652138+02:00 bhf.tld kernel - - - ugen0.2: <American Power Conversion Back-UPS XS 700U   FW:924.Z5 .I USB FW:Z5> at usbus0
              <29>1 2025-06-26T14:27:14.212641+02:00 bhf.tld upsd 87637 - - UPS [UPS] data is no longer stale
              

              And when checking the UPS:

              [Screenshot: UPS status]

              And the settings page: all was fine.

              I guess it's time to dive into the upsmon source to see what "Poll UPS [xxxx] failed - Protocol error" really means, and in what context this line is logged.

                j.koopmann @Gertjan

                @Gertjan I agree. I suppose you disconnected and then reconnected?

                When I restart via pfSense, then kill the driver and manually restart it with -DDD, upsmon either does not notice the problem or recovers automatically. That is why I am puzzled that exactly this did not happen this time. And yes, I suspect an error in upsmon, but I am not experienced enough to dig into the source code.

                  Gertjan @j.koopmann

                  @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                  then kill the driver

                  That can/will not happen in reality, right?!

                  It probably happens when processes get restarted because there was an interface event somewhere.
                  For me, that's against my "church rules". They say: as the admin, do everything, and more, so that interfaces always stay up. No exceptions allowed.
                  And I'm not allowed to just believe.
                  I have to fact-check myself, being my own big brother: click and see for yourself.
                  That's why I use a UPS, btw ^^

                  Edit: the stats have a granularity (?) of 300 seconds. That means shorter outages are possible and can go unnoticed in the stats.
                  But I'm a "good admin": I rarely look at the dashboard page. I look at the logs, as that is where the real fun is shown.

                    j.koopmann @Gertjan

                    @Gertjan said in NUT suddenly stops working every app. 6 minutes:

                    That can/will not happen in reality, right?!

                    I hope not. The only reason I'm doing it currently is to get more debug output and see whether the underlying problem is in the USB communication between the driver and the UPS (which was speculated, but so far I believe that is not the case) or somewhere else.

                    I just wanted to second your conclusion (at least I believe I do). If the USB connection goes away and comes back, upsmon is normally fine. Even if I kill the driver and bring it back online, upsmon is fine. Not so this morning, however.

                      Gertjan @j.koopmann

                      @j-koopmann

                      Normally, I don't check up on my 4100 all the time: as long as the PC applications that need a connection work fine, I can work (which doesn't include babysitting pfSense).

                      I do have this running:

                      [Screenshot]

                      = WinNut 2.0 client on my desktop.

                      And it does happen that it shows this:

                      [Screenshot] So I know I have to go to pfSense and start the UPS service again, as it failed 'for some reason'.
                      Now that I'm thinking about it, this happens more often than the hoped-for 'never' ^^

                        j.koopmann @Gertjan

                        @Gertjan Welcome to my world. I have this on my Unraid server and get the occasional "UPS is not responding", which led me to start this thread in the first place... :-)

                          Gertjan @j.koopmann

                          @j-koopmann

                          Up until now I always thought it was me: the UPS series of processes gets restarted, and because it's more than one thing (process) that gets restarted, I didn't give it a lot of thought. Happily enough, my power utility company is pretty stable (France here).
                          That said, it might also be me who causes it, as I'm messing around a lot with my pfSense box. And I do make mistakes... (more and more, btw).
                          Just yesterday, I was doing 'something' and lost all IPv4 traffic on all interfaces. IPv6 was OK. A nice test to see what works and what doesn't. Google, Netflix, most Microsoft sites: OK. Other sites like Expedia and other hotel-related productivity sites didn't show up anymore. DNS was doing great, and it showed me that everything goes over IPv6 by default, as IPv4 has become the 'fallback' network.
                          I quickly discovered that my phone using the carrier's 5G was OK (same operator), so it was 'me'.
                          Normally, I would say: great, a self-inflicted problem, let's debug that. But as my connection is also the company's connection, things went out of control rapidly. Clients came to the desk to say that my (free) Wi-Fi sucks... so I fired back that I was working on the quality/price ratio of the Wi-Fi, and the ones who had followed math beyond age 11 would know instantly that I was telling them to f up. Something that is, I know, neither a very mature nor a professional thing to do.
                          Anyway, I used Plan Z: I rebooted from a safe ZFS partition and was back up in no time, but this took away the opportunity to find out what happened (where I fcked up).

                            j.koopmann @Gertjan

                            @Gertjan I know what you mean. And same here. :-)

                            In this case, however, it is just a great annoyance. I could restart NUT every x hours to be on the safe side. And it's a first-world problem, as the UPS itself still works even if NUT is not there, and my PV battery, at least in summer, should be OK as a backup for many minutes to hours as well.

                            But... a service that should be reliably available and is not is an annoyance. :-) And I would love to understand the underlying problem. The more I debug this, the less it looks like "a USB cable problem", which was the first (understandable) suspicion.

                              dennypage @j.koopmann

                              @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                              Started happening again (not that frequently but still...)

                              Today I noticed it stopped around 6:30.

                              upsmon 89561 - - Poll UPS [Keller] failed - Protocol error

                              usbhid-ups was still running. I killed it and restarted it with -DDD. As always, it came up flawlessly and communicated with the UPS. BUT upsmon continued to show Protocol errors. That is new (to me). Once usbhid-ups is back again (started with -a Keller, of course), upsmon should be able to re-establish the connection to it. In this case it fails!

                              After restarting upsmon and usbhid-ups, things come back. Why is upsmon showing a protocol error when usbhid-ups is clearly running, shows no errors, and has clearly established communication with the UPS?

                              This is not related to usbhid-ups. The protocol error in question is between upsmon and upsd. usbhid-ups does not communicate directly with upsmon.

                              If you see this again, run upsc, which also talks directly with upsd, and confirm that you can see values for the UPS.
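upsmon, upsc and upsd talk over NUT's plain-text network protocol on TCP (port 3493 by default), so "Protocol error" concerns that conversation, not the USB link to the driver. The sketch below sends a `LIST VAR` request, roughly what `upsc Keller@localhost` does, and parses the reply. The function names are hypothetical, and it assumes upsd listens on 127.0.0.1:3493 and, as for upsc, needs no login to read variables.

```python
import socket

NUT_PORT = 3493  # upsd's default TCP port


def parse_list_var(lines, ups):
    """Turn upsd's reply to 'LIST VAR <ups>' into a {name: value} dict.

    Expected reply: 'BEGIN LIST VAR <ups>', then lines of the form
    'VAR <ups> <name> "<value>"', then 'END LIST VAR <ups>'.
    Escaped quotes inside values are not handled in this sketch.
    """
    prefix = f"VAR {ups} "
    variables = {}
    for line in lines:
        if line.startswith("ERR"):
            # e.g. 'ERR UNKNOWN-UPS' or 'ERR DATA-STALE'
            raise RuntimeError(f"upsd answered: {line}")
        if line.startswith(prefix):
            name, _, value = line[len(prefix):].partition(" ")
            variables[name] = value.strip().strip('"')
    return variables


def query_upsd(ups, host="127.0.0.1", port=NUT_PORT, timeout=5.0):
    """Ask upsd (not the driver) for all variables of one UPS."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(f"LIST VAR {ups}\n".encode("ascii"))
        lines = []
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            for raw in reader:
                line = raw.rstrip("\r\n")
                lines.append(line)
                if line.startswith("ERR") or line == f"END LIST VAR {ups}":
                    break
        return parse_list_var(lines, ups)
```

If `query_upsd("Keller")` (or upsc) returns values while upsmon keeps logging "Protocol error", that would point at upsmon's own session with upsd rather than at upsd or the driver.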

                              In general, this is indicative of a bug in NUT itself. What version of pfSense and NUT are you running? Do you see any other log entries for upsd or upsmon in the system log?

                                j.koopmann @dennypage

                                @dennypage said in NUT suddenly stops working every app. 6 minutes:

                                This is not related to usbhid-ups. The protocol error in question is between upsmon and upsd. usbhid-ups does not communicate directly with upsmon.

                                Agreed.

                                If you see this again, run upsc, which also talks directly with upsd, and confirm that you can see values for the UPS.

                                Good hint. Thanks. Will do!

                                In general, this is indicative of a bug in NUT itself. What version of pfSense and NUT are you running? Do you see any other log entries for upsd or upsmon in the system log?

                                Again: Agreed and my thesis from the start. :-)

                                24.11-RELEASE
                                nut package 2.8.2_4

                                  dennypage @j.koopmann

                                  @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                                  Again: Agreed and my thesis from the start. :-)

                                  24.11-RELEASE
                                  nut package 2.8.2_4

                                  I see that this information was discussed above. I've been traveling the past three plus weeks, so it's been hard to keep track of things. Sorry.

                                    dennypage @j.koopmann

                                    @j-koopmann Checking on the prior issue for a moment... Were you able to implement a quirk? Are you still having issues with usbhid-ups?

                                      j.koopmann @dennypage

                                      @dennypage I honestly never had a problem with usbhid-ups (or so I believe). Whenever I start it, it immediately recognizes my UPS. It is currently running as user nut, and I never installed a quirk.

                                      upsmon 75628 - - Signal 15: exiting
                                      upsd 24479 - - Signal 15: exiting
                                      

                                      But these are due to interface WAN IP changes that restart all services.

                                      At one point or another I get:

                                      upsmon 89561 - - UPS [Keller]: connect failed: Connection failure: Connection refused
                                      

                                      (even though usbhid-ups seems to be running), and then:

                                      <27>1 2025-06-26T08:48:50.206377+02:00 pfSenseHills.koopmann.local upsmon 89561 - - UPS [Keller]: connect failed: Connection failure: Connection refused
                                      <27>1 2025-06-26T08:48:55.275466+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Set password on [Keller] failed - got [PASSWORD 66da6bc012db26058161]
                                      <27>1 2025-06-26T08:49:00.279702+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:05.280772+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:10.285154+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:15.287439+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:20.292291+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:25.301818+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:30.303758+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:35.304766+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:40.308262+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:45.323762+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:50.325535+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:49:55.330127+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:50:00.337898+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:50:05.343880+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:50:10.367770+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:50:15.370222+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:50:20.473811+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <27>1 2025-06-26T08:50:25.475779+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
                                      <29>1 2025-06-26T08:50:25.475839+02:00 pfSenseHills.koopmann.local upsmon 89561 - - UPS Keller is unavailable
                                      <27>1 2025-06-26T08:50:30.480347+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
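To see how long such an episode lasts and how often upsmon retries, the log lines can be summarized. A sketch, assuming the RFC 5424 layout shown above (`<PRI>1 TIMESTAMP HOST APP PROCID MSGID SD MSG`) with a single-token structured-data field such as `-`; `parse_line` and `summarize` are hypothetical helpers. For the lines above it reports a gap of about 5 seconds, which matches upsmon's poll interval (POLLFREQ, 5 seconds by default).

```python
import re
from datetime import datetime

# RFC 5424 layout as it appears in the pfSense system log:
#   <PRI>1 TIMESTAMP HOST APP PROCID MSGID SD MSG
# Assumes SD (structured data) is a single token such as "-".
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d+)>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<sd>\S+) (?P<msg>.*)$"
)
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_line(line):
    """Split one log line into fields, or return None if it doesn't match."""
    m = SYSLOG_RE.match(line.strip())
    if not m:
        return None
    rec = m.groupdict()
    # PRI = facility * 8 + severity, e.g. <27> = daemon.err, <29> = daemon.notice
    rec["severity"] = SEVERITIES[int(rec["pri"]) % 8]
    rec["time"] = datetime.fromisoformat(rec["ts"])
    return rec


def summarize(lines, app="upsmon", needle="Protocol error"):
    """Count matching messages and the average gap between them (seconds)."""
    hits = [r for r in map(parse_line, lines)
            if r and r["app"] == app and needle in r["msg"]]
    if not hits:
        return {"count": 0}
    gaps = [(b["time"] - a["time"]).total_seconds() for a, b in zip(hits, hits[1:])]
    return {
        "count": len(hits),
        "first": hits[0]["time"],
        "last": hits[-1]["time"],
        "avg_gap_s": sum(gaps) / len(gaps) if gaps else None,
    }
```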
                                      
                                        dennypage @j.koopmann

                                        @j-koopmann said in NUT suddenly stops working every app. 6 minutes:

                                        <27>1 2025-06-26T08:48:50.206377+02:00 pfSenseHills.koopmann.local upsmon 89561 - - UPS [Keller]: connect failed: Connection failure: Connection refused
                                        <27>1 2025-06-26T08:48:55.275466+02:00 pfSenseHills.koopmann.local upsmon 89561 - - Set password on [Keller] failed - got [PASSWORD 66da6bc012db26058161]

                                        Okay, that is entertaining to say the least. Does "66da6bc012db26058161" happen to be the locally generated password for local-monitor?
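One way to answer that question from the two files requested below: compare the password on upsmon.conf's `MONITOR` line with the matching user's entry in upsd.users, and look up which user, if any, owns the string seen in the log. A sketch, assuming the standard formats (`[user]` sections with `password = ...` in upsd.users; `MONITOR <system> <powervalue> <username> <password> <type>` in upsmon.conf); the `local-monitor` user name comes from the question above, and the function names are hypothetical.

```python
def parse_upsd_users(text):
    """Return {username: password} from upsd.users-style text.

    Sections look like:
        [local-monitor]
            password = secret
            upsmon primary
    Lines without '=' (such as 'upsmon ...') are ignored here.
    """
    users, current = {}, None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (sketch: no '#' in passwords)
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            users.setdefault(current, None)
        elif current and "=" in line:
            key, _, value = line.partition("=")
            if key.strip().lower() == "password":
                users[current] = value.strip().strip('"')
    return users


def parse_monitor_lines(text):
    """Return [(system, username, password)] from upsmon.conf MONITOR lines."""
    entries = []
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 6 and fields[0] == "MONITOR":
            entries.append((fields[1], fields[3], fields[4].strip('"')))
    return entries


def check_credentials(users_text, upsmon_text, seen_password=None):
    """Does each MONITOR credential match upsd.users? Who owns seen_password?"""
    users = parse_upsd_users(users_text)
    report = [(system, user, users.get(user) == password)
              for system, user, password in parse_monitor_lines(upsmon_text)]
    owners = [u for u, p in users.items()
              if seen_password is not None and p == seen_password]
    return report, owners
```

Since both files hold live credentials, run this on the box itself and share only the True/False results, not the passwords.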

                                        The next time the protocol error happens, would you please report the start time of the upsd and upsmon processes, plus the mod time and contents of these two files:

                                        /usr/local/etc/nut/upsd.users 
                                        /usr/local/etc/nut/upsmon.conf
                                        

                                        After capturing that information, run

                                        /usr/local/sbin/upsd -c reload
                                        

                                        and see if the protocol error stops.

                                        If it doesn't, then run

                                        /usr/local/sbin/upsmon -c reload
                                        

                                        and see if that stops it.
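The information requested above can be collected in one go before running the reloads. A sketch, assuming a Python 3 interpreter is available on the box (the same data can be gathered by hand with `ps` and `ls -l`) and that `ps` supports the `lstart` keyword, which FreeBSD's and Linux's ps both do; `snapshot` and the other helpers are hypothetical names. upsd.users contains passwords, so redact them before posting the output.

```python
import os
import subprocess
import time

# Paths from the request above.
NUT_FILES = ["/usr/local/etc/nut/upsd.users", "/usr/local/etc/nut/upsmon.conf"]


def parse_ps_lstart(output, names=("upsd", "upsmon")):
    """Parse 'ps -A -o pid,lstart,comm' output into (pid, started, command).

    Assumes the C locale, where lstart looks like 'Thu Jun 26 06:31:02 2025'
    (five whitespace-separated fields).
    """
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 7:
            continue
        pid, started, comm = fields[0], " ".join(fields[1:6]), " ".join(fields[6:])
        if os.path.basename(comm) in names:
            rows.append((pid, started, comm))
    return rows


def process_start_times(names=("upsd", "upsmon")):
    """Run ps with a fixed locale so the lstart format is predictable."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid,lstart,comm"],
        capture_output=True, text=True, check=True,
        env=dict(os.environ, LC_ALL="C"),
    ).stdout
    return parse_ps_lstart(out, names)


def file_mtime(path):
    """Local modification time of a file, as 'YYYY-MM-DD HH:MM:SS'."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(os.stat(path).st_mtime))


def snapshot():
    """Print process start times, then each file's mtime and contents."""
    for pid, started, comm in process_start_times():
        print(f"{comm} (pid {pid}) started {started}")
    for path in NUT_FILES:
        print(f"--- {path} (modified {file_mtime(path)})")
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(fh.read())
```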

                                        Thanks.

                                          j.koopmann @dennypage

                                          @dennypage said in NUT suddenly stops working every app. 6 minutes:

                                          Okay, that is entertaining to say the least. Does "66da6bc012db26058161" happen to be the locally generated password for local-monitor?

                                          I have not the slightest idea! Never seen this before!!!!

                                          Rest: Will be delighted to do so! Thanks for the instructions.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.