NUT suddenly stops working every app. 6 minutes
-
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
I get that and I am running this with user=root at the moment. Do you suspect that a missing quirk is the problem? It would not make sense to me because then it should not work at all without a quirk. :-)
If the UPS is momentarily disconnecting on the USB, it would be re-discovered by the kernel when it comes back and the default kernel driver would be installed. I don't remember if removing the kernel driver is handled by the NUT USB driver on reconnect, or only on initial connect. I am not able to look through the code at the current time.
I would certainly put the effort in to do a quirk and get rid of user=root.
See information on quirks here.
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
I can't really go to 2.8.0 since I am on the 24.11.
Sorry, for some reason I thought you were on 2.7.2. My bad.
-
@dennypage said in NUT suddenly stops working every app. 6 minutes:
If the UPS is momentarily disconnecting on the USB, it would be re-discovered by the kernel when it comes back and the default kernel driver would be installed. I don't remember if removing the kernel driver is handled by the NUT USB driver on reconnect, or only on initial connect. I am not able to look through the code at the current time.
Will take a look in a moment and do my best.
If, however, this is caused by a disconnect and the kernel rediscovers the device, that would almost certainly (99%) produce the corresponding messages in dmesg.
I have the driver running with -DDD and of course it is not showing any errors whatsoever. Will try to have it log to a logfile and run it in screen and then wait for the next failure to happen hoping to find anything that will help us (or rather me).
-
@dennypage I removed user = root and restarted. All is working fine, which means I do not need a quirk, does it not? However:
root 76470 0.1 0.1 18600 7828 - Ss 18:08 0:00.00 /usr/local/sbin/upsmon
nut 76794 0.1 0.1 18736 8268 - S 18:08 0:00.02 /usr/local/sbin/upsmon
nut 43869 0.0 0.0 13888 3740 - Ss 18:08 0:00.00 /usr/local/libexec/nut/usbhid-ups -a Keller
root 67123 0.0 0.2 26228 14152 - Ss 18:08 0:00.00 /usr/local/sbin/upsd -u root
Is this the normal output? Two upsmon one as nut one as root? And upsd -u root? I checked /usr/local/etc/nut and there is no reference to root in the *.conf files anymore...
-
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
@dennypage I removed user = root and restarted. All is working fine, which means I do not need a quirk, does it not? However:
root 76470 0.1 0.1 18600 7828 - Ss 18:08 0:00.00 /usr/local/sbin/upsmon
nut 76794 0.1 0.1 18736 8268 - S 18:08 0:00.02 /usr/local/sbin/upsmon
nut 43869 0.0 0.0 13888 3740 - Ss 18:08 0:00.00 /usr/local/libexec/nut/usbhid-ups -a Keller
root 67123 0.0 0.2 26228 14152 - Ss 18:08 0:00.00 /usr/local/sbin/upsd -u root
Is this the normal output? Two upsmon one as nut one as root? And upsd -u root? I checked /usr/local/etc/nut and there is no reference to root in the *.conf files anymore...
Yes, that is correct. The NUT driver, /usr/local/libexec/nut/usbhid-ups in this case, is the process that would need root privs to override the kernel driver.
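To double-check which user each NUT process runs as, the ps output above can be parsed mechanically. This is only a minimal sketch: it assumes Python 3 is available, that the listing has the usual eleven `ps aux` columns (user first, full command last), and the function name `nut_process_owners` is made up for this example.

```python
# Executable names taken from the ps listing in this thread.
NUT_EXECUTABLES = {"upsmon", "upsd", "usbhid-ups"}

def nut_process_owners(ps_output):
    """Return (user, command) for every NUT process found in ps aux style text."""
    owners = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its arguments.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # header fragments or truncated lines
        user, command = fields[0], fields[10]
        executable = command.split()[0].rsplit("/", 1)[-1]
        if executable in NUT_EXECUTABLES:
            owners.append((user, command))
    return owners

sample = """\
root 76470 0.1 0.1 18600 7828 - Ss 18:08 0:00.00 /usr/local/sbin/upsmon
nut 76794 0.1 0.1 18736 8268 - S 18:08 0:00.02 /usr/local/sbin/upsmon
nut 43869 0.0 0.0 13888 3740 - Ss 18:08 0:00.00 /usr/local/libexec/nut/usbhid-ups -a Keller
root 67123 0.0 0.2 26228 14152 - Ss 18:08 0:00.00 /usr/local/sbin/upsd -u root
"""

for user, command in nut_process_owners(sample):
    print(user, command)
```

The two upsmon entries are expected: upsmon normally keeps a privileged parent process so it can run the shutdown command, while an unprivileged child does the actual monitoring. The line that matters for the quirk question is usbhid-ups, which here runs as nut.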
-
@dennypage said in NUT suddenly stops working every app. 6 minutes:
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
I can't really go to 2.8.0 since I am on the 24.11.
Sorry, for some reason I thought you were on 2.7.2. My bad.
FWIW, you have boot environments, so testing with 25.03 Beta is pretty easy. Or you could wait for GA. But 24.11 -> 25.03 is not as big a step as 2.7.2 -> 2.8.0.
-
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
It also has a RS232
That might be a good 'plan B'.
Cables that convert a serial port to a USB port exist. NUT will allow you to use a serial port. That is, it's still a USB cable that connects, and that connection will enable a virtual 'serial' interface, available to pfSense and thus to NUT.
-
@Gertjan good theory. However.....
The Eaton does 9600 baud and the serial driver is fixed at 2400 baud...
So today at 6:30 the usbhid-ups stopped working. It was signal 15 SIGTERM and after debugging for a while it turned out to be due to a VPN connection going down, which triggered a suspected IP change:
php-fpm 66039 - - /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - x.x.x.x > y.y.y.y - Restarting packages.
The entire stack was killed and came back... So back to the drawing board and restart the driver with -DDD to see if the real error resurfaces. However: It was ok for a long time now...
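To separate these package restarts from a real driver failure, the system log can be searched for the rc.newwanip message quoted above. The following is a rough sketch, assuming Python 3 and that the message wording matches the line above; the function name `find_package_restarts` is hypothetical, and the log lines would have to come from wherever the system log is kept on the box.

```python
import re

# Pattern built from the log line quoted in this thread.
RESTART_RE = re.compile(
    r"/rc\.newwanip: .*IP change or dynamic WAN reconnection"
    r" - (\S+) > (\S+) - Restarting packages"
)

def find_package_restarts(lines):
    """Return (old_ip, new_ip, line) for every 'Restarting packages' event."""
    hits = []
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            hits.append((match.group(1), match.group(2), line.rstrip("\n")))
    return hits

sample_log = [
    "php-fpm 66039 - - /rc.newwanip: Netgate pfSense Plus package system has "
    "detected an IP change or dynamic WAN reconnection - x.x.x.x > y.y.y.y - "
    "Restarting packages.",
    "some unrelated log line",
]

for old_ip, new_ip, line in find_package_restarts(sample_log):
    print("package restart, IP change", old_ip, "->", new_ip)
```

If the time at which the -DDD log stops lines up with one of these events, the stop was caused by the package restart and not by the driver itself.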
-
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
stopped working. It was signal 15 SIGTERM and after
SIGTERM is sent to a process - NUT in this case - by the system - pfSense or FreeBSD - because some system event happened.
Examples: an interface was set up, or vanished.
(Still strange, as localhost - 127.0.0.1 - always stays up for the entire run time of the OS.)
Anyway, "VPN" did something with the available interfaces, and all interface-bound processes like nginx, unbound, DHCP, etc. will receive a 'terminate' and will be restarted.
The dreaded: Restarting packages.
@j-koopmann said in NUT suddenly stops working every app. 6 minutes:
The entire stack was killed and came back... So back to the drawing board and restart the driver with -DDD to see if the real error resurfaces. However: It was ok for a long time now...
So, all the NUT processes were restarted, but a working USB connection wasn't created?
-
@Gertjan said in NUT suddenly stops working every app. 6 minutes:
SIGTERM is sent to a process - NUT in this case - by the system - pfSense or FreeBSD - because some system event happened.
I might have hidden it too well due to the way I communicate: I have been in the business for > 30 years... I am well aware of what a SIGTERM is. :-)
@Gertjan said in NUT suddenly stops working every app. 6 minutes:
Anyway, "VPN" did something with the available interfaces, and all interface-bound processes like nginx, unbound, DHCP, etc. will receive a 'terminate' and will be restarted.
I know. That was the conclusion today.
@Gertjan said in NUT suddenly stops working every app. 6 minutes:
So, all the NUT processes were restarted, but a working USB connection wasn't created?
Whenever usbhid-ups is restarted it ALWAYS establishes a working USB connection. Everything was working fine this morning. I was inspecting my -DDD log and just saw that it stopped recording around 6:30 which led me to believe the original problem had resurfaced. But it was the VPN connection triggering the service restart.
So I re-enabled debugging and am waiting for a "real" failure.
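One way to tell a SIGTERM from a "real" failure next time is to record how the process ended. The snippet below is a sketch only, assuming Python 3 on a POSIX system; `describe_exit` is a made-up helper, and the child process is a harmless stand-in, not the real usbhid-ups driver.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable reason."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        return "killed by " + signal.Signals(-returncode).name
    if returncode == 0:
        return "exited normally"
    return "exited with status %d" % returncode

# Stand-in child: a Python process that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
child.terminate()  # sends SIGTERM, like the package restart does
child.wait()
print(describe_exit(child.returncode))
```

A wrapper like this around the driver started with -DDD would make it obvious in the log whether the process was terminated from outside or stopped on its own.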
-
So far everything is stable without me changing anything else. Who knows why....