NUT suddenly stops working every app. 6 minutes
-
Hi,
With the exception of updating to 24.11-RELEASE from the 23.x version, I have not changed anything substantial in my setup. However, all of a sudden UPS/NUT keeps failing:
After a restart all is fine.
Can't connect to UPS [Keller] (usbhid-ups-Keller): No such file or directory
Found 1 UPS defined in ups.conf
User local-monitor@127.0.0.1 logged into UPS [Keller]
Poll UPS [Keller] failed - Driver not connected
Communications with UPS Keller lost
Connected to UPS [Keller]: usbhid-ups-Keller
Communications with UPS Keller established
The UPS is a usb connected Eaton 9SX 3000i.
This setup worked flawlessly for months so my main suspicion is that the update triggered a bug. Is anything known, and is there maybe a patch?
Regards
JP
-
Check if the UPS is actually connected.
Console or command line :
Type usbconfig
Do you see it ?
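A minimal sketch of that check as a reusable shell function. The function name `usb_has_ups` and the default match pattern are assumptions made for illustration; the exact description string the Eaton reports is not shown in this thread, so adjust the pattern to whatever usbconfig prints for your device.

```sh
#!/bin/sh
# Read `usbconfig` output on stdin and succeed if any line looks like a UPS.
# usb_has_ups is a hypothetical helper; the default pattern is a guess and
# can be overridden by passing your own extended regex as the first argument.
usb_has_ups() {
    pattern="${1:-eaton|uninterruptible power}"
    # -E: extended regex, -i: ignore case, -q: no output, exit status only
    grep -Eiq "$pattern"
}
```

Usage would be along the lines of `usbconfig | usb_has_ups && echo "UPS visible to the OS" || echo "UPS not attached"`. Running it right after a failure, before restarting NUT, shows whether the OS still sees the device at that moment.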
This means (to me) :
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
Can't connect to UPS [Keller] (usbhid-ups-Keller): No such file or directory
that the USB device wasn't known to the OS - so not attached (if attached : bad cable, bad USB host, flaky UPS firmware, etc).
pfSense packages don't receive 'official' patches.
If needed, they are modified by their creator, and marked for 'to be upgraded'.
That said, if you find a patch (redmine / github) you can copy paste a patch yourself.
This is valid for the GUI part of the package.
Binaries (executables) can't be patched.
-
@Gertjan Thanks.
I have to correct myself: It is not 6 minutes. But it is 1-2 days after restart of NUT. I will look for the correct log entries at the next failure.
If it stops working simply restarting NUT fixes it. I don't touch the hardware. So a hardware (cable disconnected) or general USB-cannot-be-found failure is sort of out of the question if a simple service restart makes it happy again for the next hours to days.
I can check the USB cable but again: It is not as if someone is touching the devices during or near the failure, and it started with the pfSense update. Not sure if the NUT version was also updated between 23.x and 24.x.
Will change cable once it happens again to see if this is related.
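To find the relevant entries at the next failure, a small filter like the following sketch may help. It only matches the message texts quoted earlier in this thread; the function names `nut_events` and `count_lost` are made up for illustration, and the log location used in the usage note is an assumption.

```sh
#!/bin/sh
# Print only the NUT connection events quoted in this thread.
# Reads a log on stdin; nut_events is a hypothetical helper name.
nut_events() {
    grep -E "Can't connect to UPS|Connected to UPS|Poll UPS \[.*\] failed|Communications with UPS .* (lost|established)"
}

# Count how many times communication with the UPS was reported lost.
# Prints 0 if there are no such lines.
count_lost() {
    grep -Ec 'Communications with UPS .* lost'
}
```

Assuming NUT logs to the system log at /var/log/system.log (not confirmed in this thread), `nut_events < /var/log/system.log` lists the events with their timestamps, and `count_lost < /var/log/system.log` shows how often the connection dropped.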
-
I'm pretty sure no one disconnected your cable for a moment.
I use a UPS myself on a 4100 :
[25.03-BETA][root@pfSense.bhf.tld]root: usbconfig
....
ugen0.2: <Uninterruptible Power Supply American Power Conversion> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON (24mA)
and it's pretty rock solid.
That is, as long as I change the battery, normally every 2 years (a bit more).
I'm using the same NUT version and underlying upsmon software as you, and it plays well.
When you see this :
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
Poll UPS [Keller] failed - Driver not connected
Communications with UPS Keller lost
Connected to UPS [Keller]: usbhid-ups-Keller
Communications with UPS Keller established
the USB hardware, on the pfSense side or the UPS side, disconnected the logical connection.
Like something glitched somewhere.
Like a NIC : you'll see the link down and link up event.
My not so SPECIAL NUT settings are :
Btw : If possible : test with another UPS. Pretty sure the issue goes away ^^
I can recommend "APC" (not made in America of course, it was only branded in the States ^^)
-
@Gertjan I am not happy buying a new UPS if the old one is working and has been doing so flawlessly for the past months. :-) Maybe I will try a different cable and USB port. Who knows...
-
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
I am not happy buying a new UPS if the old one is working and has been doing so flawlessly for the past months.
Not forcing you to get out and buy something just for testing.
A UPS is a security device, and I was presuming you have more to protect than just pfSense.
So : you have at least 2 UPSs, so you can do the exchange trick, which is perfect to rule things out.
Btw : UPS batteries, even when they are not solicited often, do die - give them 3 years, and they switch roles : instead of being a security device in your house, they will set your house on fire (batteries overheating : even the fire squads run away from it).
Again : USB connections (electrical) are not perfect.
UPSMon (NUT) : as we use the same version, identical at a bit level, only our settings differ. For example : I thought my USB driver was probably not the same as yours, but you use the same USB driver as I do, see here : https://networkupstools.org/stable-hcl.html
The USB chip used on the UPS side .... we'll never know,
And maybe the internal power of your UPS is 'flaky'. (Capacitors go bad also ...)
Normally, I change my UPSs after the second battery change, if they didn't go bad before.
-
@Gertjan said in NUT suddenly stops working every app. 6 minutes:
Not forcing you to get out and buy something just for testing.
I know. :-)
@Gertjan said in NUT suddenly stops working every app. 6 minutes:
A UPS is a security device, and I was presuming you have more to protect than just pfSense.
Yes one server and a switch and other equipment but for a home installation only hence only one UPS. Moreover the entire house is on a big PV driven battery so the UPS only will have to cover for a few minutes. Not worth buying two...
@Gertjan said in NUT suddenly stops working every app. 6 minutes:
Btw : UPS batteries, even when they are not solicited often, do die - give them 3 years,
I am aware. But the old UPS is only approx. 12 months old. :-) So I am not going to buy a new one just to rule this one out. Worst case: It is not working or I have to restart NUT every x hours. --> Home installation.
@Gertjan said in NUT suddenly stops working every app. 6 minutes:
Normally, I change my UPSs after the second battery change, if they didn't go bad before.
Agreed. Will do that in x years. :-)
Thanks for your help. Let me change cables and USB port and see if something changes.
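To see whether a cable or port change actually makes a difference, one option is to poll the UPS regularly and record when it stops answering. The sketch below wraps NUT's `upsc` client; the function name `check_ups` is made up, and the address `Keller@localhost` is an assumption based on the UPS name and the 127.0.0.1 login in the log above. It also assumes `upsc` exits with a non-zero status when the driver is not connected.

```sh
#!/bin/sh
# Poll one UPS once and print a timestamped OK/FAIL line.
# $1: UPS address as NUT knows it, e.g. Keller@localhost (assumed here).
# check_ups is a hypothetical helper name, not part of NUT.
check_ups() {
    if status=$(upsc "$1" ups.status 2>&1); then
        echo "$(date '+%F %T') OK $status"
    else
        # On failure, $status holds the error text printed by upsc.
        echo "$(date '+%F %T') FAIL $status"
        return 1
    fi
}
```

Called once a minute (for example from a loop with `sleep 60`, appending to a file of your choice), this gives a timeline of failures that can be compared before and after swapping the cable or USB port.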
-
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
This setup worked flawlessly for months so my main suspicion is that the update triggered a bug.
Unplug and re-plug the USB cable to confirm you are not experiencing a simple permissions issue.
Also, had you installed a quirk by chance?