NUT suddenly stops working every app. 6 minutes
- 
 Do you have anything in the additional arguments to driver section?
     [Keller]
     driver=usbhid-ups
     port=auto
 No. Not really, other than port....
- 
 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > pkg info | grep nut
 > nut-2.8.2 Network UPS Tools
 > pfSense-pkg-nut-2.8.2_4 Network UPS Tools

 Those versions are fine with 2.7.2. You should see 2.8.2_5 when you upgrade to pfSense 2.8.0. FWIW, you should consider upgrading to 2.8.0, with its newer kernel and USB core, before spending a lot of time on this.

 > Will look for a powered USB hub in my archive. Same question to you: Would a USB physical error that is so severe that the driver stops working not also result in some sort of disconnect message in dmesg?

 "Severe" isn't the right way to think of it. The behavior in question would be a momentary disconnect, or rejection of a basic USB communication. It may or may not be logged.

 > Is there a way to increase the debugging level in usb-hid to get more info the next time it stops working?

 First start the service from the UI, and then kill the driver:
     killall usbhid-ups
 Then, run the driver with debug:
     /usr/local/libexec/nut/usbhid-ups -DDD -a YOUR_UPS_NAME_HERE
 See what error it shows. If it doesn't show enough, you can add more 'D's until it does, up to "-DDDDDD".

 Think about installing pfSense 2.8.0.

 [FYI, I am traveling with very limited availability for the next several weeks...]
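
 The debug invocation above differs only in the UPS name and the number of 'D's, so a small helper can assemble it. This is a minimal sketch: the driver path and the -D / -a options are the ones quoted in this post, while the function name `nut_debug_cmd` and the 1..6 range check are illustrative assumptions. It only prints the command, it does not run it.

```shell
#!/bin/sh
# Print (do not execute) the usbhid-ups debug command for a UPS name and level.
# nut_debug_cmd is a hypothetical helper name; the path and flags come from the post.
nut_debug_cmd() {
    ups_name=$1
    level=${2:-3}          # default to -DDD, as suggested in the post
    case $level in
        [1-6]) ;;          # the post mentions up to "-DDDDDD"
        *) echo "debug level must be 1..6" >&2; return 1 ;;
    esac
    flags=
    i=0
    while [ "$i" -lt "$level" ]; do
        flags="${flags}D"
        i=$((i + 1))
    done
    printf '%s -%s -a %s\n' /usr/local/libexec/nut/usbhid-ups "$flags" "$ups_name"
}

# Example: nut_debug_cmd Keller 3
```

 After stopping the running driver with killall usbhid-ups, the printed line can be pasted into the console as-is.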
- 
 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > But if the USB connection is physically causing problems and disconnects, the kernel would throw disconnect errors on dmesg would it not?

 There are no timestamps in 'dmesg', but I can see these :
     [25.03-BETA][root@pfSense.bhf.tld]/root: dmesg | grep '700U'
     ugen0.2: <American Power Conversion Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5> at usbus0
     ugen0.2: <American Power Conversion Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5> at usbus0 (disconnected)
     ugen0.2: <American Power Conversion Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5> at usbus0
     ugen0.2: <American Power Conversion Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5> at usbus0 (disconnected)
     ugen0.2: <American Power Conversion Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5> at usbus0
     ugen0.2: <American Power Conversion Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5> at usbus0 (disconnected)
     ugen0.2: <American Power Conversion Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5> at usbus0
 (We had some minor power issues last week)

 If the USB connection itself is ok, but the other side goes bad logically, then I can imagine dmesg wouldn't show a hardware event.

 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > So your answer is (again): Buy a better UPS

 Because I can't ask you to open up the UPS, so you can check what happens at the other side.
 Is the UPS operated by a dumb controller ? Or something with a CPU ? Does it have a serial console interface ? Can you connect to it ? If possible, this would help clarify the situation.

 The only thing I'm pretty sure about, as you and I use the same binaries on the pfSense side, is that the issue isn't called 'NUT'. What differs :
 My NUT settings versus your NUT settings (might be identical !).
 Me : a 4100 - you use what ? What USB chipset etc ?
 Me : APC - you : something else.
 Our USB cables are not identical either ^^

 If it's only you using pfSense + NUT with an "Eaton 9SX 3000i", then there is little hope : you have to dive into it.

 Btw : from what I know, NUT, on a low level, opens a USB connection. NUT then sends, over what really looks like a serial or console connection, "status request commands" to the UPS.
 The UPS answers with ... what I've shown above with the upsc command.
 If any of the UPS replies gets misinterpreted, because one character is off (misspelled, whatever), then the UPS driver might get stalled .... and time out.
 This would explain what you see - and this is, afaik, what the quirks solution can do for you (denny talked about these).

 To see this happening, you have to use the NUT doc, and start the NUT processes in debug mode so you can see way more details. These could show you 'why' the communication stops. And that's the real issue : you have to investigate, and this might take "some time".
 It's that ... or, sorry, you opt for the 'change' option.
 I prefer of course that you find out what is really happening.
 Of course I hate the 'change it' solution ;)

 edit : lol, denny just posted how to start the driver in debug mode
 Now I see a lot of comm going on.
 I didn't know there was that much of a 'chat' between NUT and the UPS.
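
 Since dmesg carries no timestamps here, one thing that can still be compared between two checks is the number of disconnect events for the UPS. A minimal sketch, assuming the kernel prints "(disconnected)" on the ugen line as in the output quoted in this post; the function name `count_usb_disconnects` is made up for illustration.

```shell
#!/bin/sh
# Read kernel messages on stdin and print how many lines mention both the
# given device string and "(disconnected)". Always prints a number (0 if none).
count_usb_disconnects() {
    awk -v dev="$1" '
        index($0, dev) && index($0, "(disconnected)") { n++ }
        END { print n + 0 }
    '
}

# Example on the firewall: dmesg | count_usb_disconnects '700U'
```

 If the count has not changed between a working period and the next failure, a physical USB drop becomes a less likely explanation, although, as said earlier in the thread, not every hiccup is necessarily logged.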
- 
 @Gertjan said in NUT suddenly stops working every app. 6 minutes:
 > If any of the UPS replies gets misinterpreted, because one character is off (misspelled, whatever), then the UPS driver might get stalled .... and time out.
 > This would explain what you see - and this is, afaik, what the quirks solution can do for you (denny talked about these).

 Quirks are used in FreeBSD to identify the device as a UPS to the kernel, which will prevent the kernel from attaching a default kernel driver to the USB device when it is discovered. Removing an already attached kernel driver requires root privileges, which is why you see some people use "user=root". They do this because their system is missing the appropriate quirk for the UPS. IMO, it's a better choice to install the missing quirk than to run the NUT driver as root.
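
 For illustration, a quirk can be expressed as a loader tunable in the format documented in FreeBSD's usb_quirk(4): `hw.usb.quirk.N="VendorId ProductId LowRevision HighRevision UQ_QUIRK"`. The sketch below only formats such a line. The helper name `usb_quirk_line` is hypothetical, the vendor/product IDs in the example are placeholders that must be replaced with the values reported by `usbconfig dump_device_desc` for the actual UPS, and using UQ_HID_IGNORE plus a loader.conf.local file on pfSense are assumptions to verify against the quirks information referenced in this thread.

```shell
#!/bin/sh
# Format a loader tunable that tells the kernel to leave a HID device alone.
# Arguments: quirk slot index, vendor ID, product ID (hex, as shown by usbconfig).
# The revision range 0x0000..0xffff matches every device revision.
usb_quirk_line() {
    index=$1
    vendor=$2
    product=$3
    printf 'hw.usb.quirk.%s="%s %s 0x0000 0xffff UQ_HID_IGNORE"\n' \
        "$index" "$vendor" "$product"
}

# Example with placeholder IDs (verify yours first):
#   usb_quirk_line 0 0x0463 0xffff
```

 The resulting line would go into the loader configuration and take effect on the next boot, after which the driver should no longer need user=root to detach a kernel driver.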
- 
 @Gertjan said in NUT suddenly stops working every app. 6 minutes:
 > If the USB connection itself is ok, but the other side goes bad logically, then I can imagine dmesg wouldn't show a hardware event.

 Agreed. Possible of course. If it did not coincide with my upgrade months ago, this would be my conclusion as well. :-)

 @Gertjan said in NUT suddenly stops working every app. 6 minutes:
 > Is the UPS operated by a dumb controller ? Or something with a CPU ? Does it have a serial console interface ? Can you connect to it ? If possible, this would help clarify the situation.

 I really cannot say what exactly the Eaton 9SX 3000i is using, but I would suspect some sort of controller. It also has an RS232 port, and apparently the mge-shut driver should work. I will have to check if my firewall has RS232 and can try to switch. Otherwise I would need a USB to serial interface.

 @Gertjan said in NUT suddenly stops working every app. 6 minutes:
 > If it's only you using pfSense + NUT with an "Eaton 9SX 3000i", then there is little hope : you have to dive into it.

 Yep. Trying to do so!

 @Gertjan said in NUT suddenly stops working every app. 6 minutes:
 > If any of the UPS replies gets misinterpreted, because one character is off (misspelled, whatever), then the UPS driver might get stalled .... and time out.
 > This would explain what you see - and this is, afaik, what the quirks solution can do for you (denny talked about these).
 > To see this happening, you have to use the NUT doc, and start the NUT processes in debug mode so you can see way more details. These could show you 'why' the communication stops.

 Exactly. Will try to do that next (might have to install screen first. :-) )

 @Gertjan said in NUT suddenly stops working every app. 6 minutes:
 > And that's the real issue : you have to investigate, and this might take "some time".

 Oh, that is not an issue. It is more an issue to get better information on where to look and how to look, and denny just provided that. THANKS! And yes, there is quite a bit of chatter.
- 
 @dennypage regarding quirks: I get that and I am running this with user=root at the moment. Do you suspect that a missing quirk is the problem? It would not make sense to me because then it should not work at all without a quirk. :-)

 I can't really go to 2.8.0 since I am on the 24.11. I could upgrade to 25.03 beta but have not seen anything in the changelog that would indicate USB or kernel related changes. 24.11 already is on 15.0-CURRENT:
     FreeBSD 15.0-CURRENT #0 plus-RELENG_24_11-n256407-1bbb3194162
- 
 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > I get that and I am running this with user=root at the moment. Do you suspect that a missing quirk is the problem? It would not make sense to me because then it should not work at all without a quirk. :-)

 If the UPS is momentarily disconnecting on the USB, it would be re-discovered by the kernel when it comes back and the default kernel driver would be installed. I don't remember if removing the kernel driver is handled by the NUT USB driver on reconnect, or only on initial connect. I am not able to look through the code at the current time. I would certainly put the effort in to do a quirk and get rid of user=root. See information on quirks here.

 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > I can't really go to 2.8.0 since I am on the 24.11.

 Sorry, for some reason I thought you were on 2.7.2. My bad.
- 
 @dennypage said in NUT suddenly stops working every app. 6 minutes:
 > If the UPS is momentarily disconnecting on the USB, it would be re-discovered by the kernel when it comes back and the default kernel driver would be installed. I don't remember if removing the kernel driver is handled by the NUT USB driver on reconnect, or only on initial connect. I am not able to look through the code at the current time.

 Will take a look in a moment and do my best. If, however, this is caused by a disconnect and the kernel rediscovers the device, that would surely (99%) create the appropriate messages in dmesg. I have the driver running with -DDD and of course it is not showing any errors whatsoever. Will try to have it log to a logfile and run it in screen, and then wait for the next failure to happen, hoping to find anything that will help us (or rather me).
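
 For the logfile idea, it helps if every debug line carries a wall-clock time, so the moment the output stops can be matched against other logs. A minimal sketch under assumptions: the filter name `stamp_lines` and the log path in the usage comment are made up, and it assumes the driver writes its debug output to stderr/stdout while running in the foreground with -D flags.

```shell
#!/bin/sh
# Prefix every line read from stdin with the current local time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$line"
    done
}

# Example inside a screen session (log path is only a suggestion):
#   /usr/local/libexec/nut/usbhid-ups -DDD -a Keller 2>&1 | stamp_lines >> /root/usbhid-ups-debug.log
```

 Spawning date once per line is slow, but at the message rate of a UPS driver that should not matter; the last timestamp in the file then shows when the driver went quiet.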
- 
 @dennypage I removed user = root and restarted. All is working fine, which means I do not need a quirk, does it not? However:
     root 76470 0.1 0.1 18600  7828 - Ss 18:08 0:00.00 /usr/local/sbin/upsmon
     nut  76794 0.1 0.1 18736  8268 - S  18:08 0:00.02 /usr/local/sbin/upsmon
     nut  43869 0.0 0.0 13888  3740 - Ss 18:08 0:00.00 /usr/local/libexec/nut/usbhid-ups -a Keller
     root 67123 0.0 0.2 26228 14152 - Ss 18:08 0:00.00 /usr/local/sbin/upsd -u root
 Is this the normal output? Two upsmon, one as nut and one as root? And upsd -u root? I checked /usr/local/etc/nut and there is no reference to root in the *.conf files anymore...
- 
 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > @dennypage I removed user = root and restarted. All is working fine, which means I do not need a quirk, does it not? However:
 >     root 76470 0.1 0.1 18600  7828 - Ss 18:08 0:00.00 /usr/local/sbin/upsmon
 >     nut  76794 0.1 0.1 18736  8268 - S  18:08 0:00.02 /usr/local/sbin/upsmon
 >     nut  43869 0.0 0.0 13888  3740 - Ss 18:08 0:00.00 /usr/local/libexec/nut/usbhid-ups -a Keller
 >     root 67123 0.0 0.2 26228 14152 - Ss 18:08 0:00.00 /usr/local/sbin/upsd -u root
 > Is this the normal output? Two upsmon, one as nut and one as root? And upsd -u root? I checked /usr/local/etc/nut and there is no reference to root in the *.conf files anymore...

 Yes, that is correct. The NUT driver, /usr/local/libexec/nut/usbhid-ups in this case, is the process that would need root privs to override the kernel driver.
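
 To check at a glance which account each NUT process runs under, the ps listing can be reduced to user and binary. A minimal sketch, assuming `ps aux` column order as in the listing quoted above (user in column 1, command in column 11); the function name `nut_proc_users` is hypothetical.

```shell
#!/bin/sh
# Read `ps aux` output on stdin and print "user command" for the NUT daemons.
nut_proc_users() {
    awk '$11 ~ "/(upsmon|upsd|usbhid-ups)$" { print $1, $11 }'
}

# Example on the firewall: ps aux | nut_proc_users
```

 With the listing above, the line for usbhid-ups should show nut rather than root once user=root has been removed.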
- 
 @dennypage said in NUT suddenly stops working every app. 6 minutes:
 > @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > > I can't really go to 2.8.0 since I am on the 24.11.
 >
 > Sorry, for some reason I thought you were on 2.7.2. My bad.

 FWIW, you have boot environments, so testing with 25.03 Beta is pretty easy. Or you could wait for GA. But 24.11 -> 25.03 is not as big a step as 2.7.2 -> 2.8.0.
- 
 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > It also has a RS232

 That might be a good 'plan B'.
 Cables that convert a serial port to a USB port exist. NUT will allow you to use a serial port. That is, it's still a USB cable that connects, and that connection will enable a virtual 'serial' interface, available to pfSense and thus to NUT.
- 
 @Gertjan good theory. However..... The Eaton does 9600 baud and the serial driver is fixed at 2400 baud...

 So today at 6:30 the usbhid-ups stopped working. It was signal 15 (SIGTERM), and after debugging for a while it turned out to be due to a VPN connection going down, which triggered a suspected IP change:
     php-fpm 66039 - - /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - x.x.x.x > y.y.y.y - Restarting packages.
 The entire stack was killed and came back... So back to the drawing board and restart the driver with -DDD to see if the real error resurfaces. However: It was ok for a long time now...
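
 To rule this cause out quickly next time, the times of such package restarts can be pulled from the system log and compared with the time the driver stopped. A minimal sketch, assuming the log lines start with a priority/version field followed by an ISO timestamp (the layout of /var/log/system.log shown elsewhere in this thread); the function name `pkg_restart_times` is made up.

```shell
#!/bin/sh
# Read system log lines on stdin and print the timestamp (second field)
# of every line that reports "Restarting packages".
pkg_restart_times() {
    awk '/Restarting packages/ { print $2 }'
}

# Example on the firewall: pkg_restart_times < /var/log/system.log
```

 A restart time that matches the end of the debug log points to the package restart rather than to a USB or driver fault.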
- 
 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > stopped working. It was signal 15 SIGTERM and after

 SIGTERM is sent to a process - NUT in this case - by the system - pfSense or FreeBSD - because some system event happened.
 Examples : an interface was set up, or vanished.
 ( still strange, as localhost - 127.0.0.1 - actually always stays up during the run time of the OS )
 Anyway, "VPN" did something with the available interfaces, and all interface-bound processes like nginx, unbound, DHCP etc etc will receive a 'terminate' and will be restarted.
 The dreaded : Restarting packages.

 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > The entire stack was killed and came back... So back to the drawing board and restart the driver with -DDD to see if the real error resurfaces. However: It was ok for a long time now...

 So, all the NUT processes were restarted, but a working USB connection wasn't created ?
- 
 @Gertjan said in NUT suddenly stops working every app. 6 minutes:
 > SIGTERM is sent to a process - NUT in this case - by the system - pfSense or FreeBSD - because some system event happened.

 I might have hidden it too well due to the way I communicate: I have been in the business for > 30 years... I am well aware what a SIGTERM is. :-)

 @Gertjan said in NUT suddenly stops working every app. 6 minutes:
 > Anyway, "VPN" did something with the available interfaces, and all interface-bound processes like nginx, unbound, DHCP etc etc will receive a 'terminate' and will be restarted.

 I know. That was the conclusion today.

 @Gertjan said in NUT suddenly stops working every app. 6 minutes:
 > So, all the NUT processes were restarted, but a working USB connection wasn't created ?

 Whenever usbhid-ups is restarted it ALWAYS establishes a working USB connection. Everything was working fine this morning. I was inspecting my -DDD log and just saw that it stopped recording around 6:30, which led me to believe the original problem had resurfaced. But it was the VPN connection triggering the service restart. So I re-enabled debugging and am waiting for a "real" failure.
- 
 So far everything is stable without me changing anything else. Who knows why.... 
- 
 Started happening again (not that frequently but still...). Today I noticed it stopped around 6:30.
     upsmon 89561 - - Poll UPS [Keller] failed - Protocol error
 The usbhid-ups was still running. I killed it and restarted it with -DDD. As always it comes up flawlessly and communicates with the UPS. BUT upsmon continues to show Protocol errors. That is new (to me). Once usbhid-ups is back again (started with -a Keller of course), upsmon should be able to reestablish the connection to it. In this case it fails! After restarting upsmon and usbhid-ups, things come back. Why is upsmon showing a protocol error when usbhid-ups clearly is running, shows no errors and has a clearly established communication with the UPS?
- 
 @j-koopmann said in NUT suddenly stops working every app. 6 minutes:
 > upsmon 89561 - - Poll UPS [Keller] failed - Protocol error

 I've tried to generate such a message myself. I started this on the console :
     tail -f /var/log/system.log
 and then disconnected the USB cable that connects pfSense to the UPS. This was the result :
     <38>1 2025-06-26T14:26:40.471014+02:00 bhf.tld sshd 32452 - - Accepted publickey for root from 192.168.1.6 port 51478 ssh2: RSA SHA256:t6AMtOQbd+vBU56dvmXq3xE+lWsMMoOUn9njUtgMwTQ
     <2>1 2025-06-26T14:27:09.833170+02:00 bhf.tld kernel - - - ugen0.2: <American Power Conversion Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5> at usbus0 (disconnected)
     <28>1 2025-06-26T14:27:12.075594+02:00 bhf.tld usbhid-ups 88307 - - libusb1: Could not open any HID devices: no USB buses found
     <29>1 2025-06-26T14:27:12.075778+02:00 bhf.tld upsd 87637 - - Data for UPS [UPS] is stale - check driver
     <2>1 2025-06-26T14:27:13.652138+02:00 bhf.tld kernel - - - ugen0.2: <American Power Conversion Back-UPS XS 700U FW:924.Z5 .I USB FW:Z5> at usbus0
     <29>1 2025-06-26T14:27:14.212641+02:00 bhf.tld upsd 87637 - - UPS [UPS] data is no longer stale
 and when checking the UPS and the settings page, all was fine.

 I guess it's time to dive into the upsmon source file to see what "Poll UPS [xxxx] failed - Protocol error" really means, and what the context is where this line is logged.
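
 When watching the log for the next occurrence, it may help to narrow the stream to the NUT daemons and the kernel's USB attach/detach lines. A minimal sketch: the process names and the ugenX.Y pattern are taken from the log excerpt above, and the function name `nut_log_events` is hypothetical.

```shell
#!/bin/sh
# Keep only log lines from the NUT processes or about USB (ugenX.Y) devices.
# "|| true" keeps the exit status at 0 when nothing matches.
nut_log_events() {
    grep -E 'usbhid-ups|upsd|upsmon|ugen[0-9]+\.[0-9]+' || true
}

# Example on the firewall: tail -f /var/log/system.log | nut_log_events
```

 In the excerpt above this would drop the sshd line and keep the other five, which makes the order "disconnect, stale data, reconnect, data ok" easy to see.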
- 
 @Gertjan I agree. I suppose you disconnected and then reconnected? When I restart via pfSense and then kill the driver and manually restart it with -DDD, upsmon either does not notice the problem or recovers automatically. Which is why I am puzzled that exactly this did not happen this time. And yes, I suspect an error in upsmon. But I am not experienced enough to dig into the source code.

