NUT package (2.8.0 and below)
-
Hey guys. Mind if I join the party?
I upgraded from 2.6 CE to 23.01 Plus today, to get support for the 2.5Gbps NICs in my firewall.
Unfortunately after the upgrade, NUT started failing because it couldn't claim the USB device:
Can't claim USB device [051d:0003]@0/0: Other error
UPS is an APC Smart-UPS 1500.
Did a little searching, found it was a permission error, and eventually found this thread.
Looks like I found the right place.
I've gone back through this thread about a month and started reading.
Adding user=root to ups.conf got things going again. However, I'd call that a workaround. If I read right, looks like I need to wait for the next release of NUT for a real fix.
At the moment I am not using @dennypage's custom usbhid-ups. If my UPS does not stay online, I'll apply it and post results.
I'll be keeping an eye on this thread for new information.
-
@knight-of-ni said in NUT package:
Adding user=root to ups.conf got things going again. However, I'd call that a workaround. If I read right, looks like I need to wait for the next release of NUT for a real fix.
While the next release of nut is expected to address the CyberPower issue, it will not address the APC issue. The APC issue is a usb quirk issue, and this requires a new version of the kernel in pfSense to permanently resolve. I don't expect that to happen soon.
See here for further details of the APC issue. I recommend using the /boot/loader.conf.local solution if you can take a reboot.
-
-
@dennypage said in NUT package:
If you use the file, please post and let me know if it resolves an issue for you.
Your version of usbhid-ups has been working fine with my CyberPower for 48 hours now. Thanks.
-
@dennypage said in NUT package:
If you use the file, please post and let me know if it resolves an issue for you.
I was experiencing the same issue and using this version of usbhid-ups has resolved the issue. Many thanks, that was driving me NUTS (pun intended)
-
@dennypage Quick question: is the NUT package somehow dependent on other services, or integrated into a service hook, since 23.01? After upgrading I have NUT alert messages all over the place for things that are completely unrelated to the UPS.
E.g. resetting or reconfiguring a gateway, doing configurations on WAN or VPNs if the interface is assigned etc. all seem to trigger "problems" with NUT losing and regaining connection to the UPS. The attached UPS is an APC BackUPS via USB that ran well before without being trigger-happy with notifications when interface/gateway/routing things are happening. Now just touching some of those things seems to trigger a connection loss from NUT. Really confusing.
Cheers
\jens
-
@jegr said in NUT package:
E.g. resetting or reconfiguring a gateway, doing configurations on WAN or VPNs if the interface is assigned etc. all seem to trigger "problems" with NUT losing and regaining connection to the UPS.
pfSense restarts package services when WAN interfaces disconnect or reconnect. Yes, this is unnecessary for some services, such as NUT with a USB connection, but there is no way for pfSense to know which services actually need to be restarted. It's always been this way.
What you would expect to see is NUT restart once when the interface goes down, and once again when the interface comes back up. Whether or not you see NUT reporting a lost connection depends upon the order and speed of shutting down the various processes involved (usbhid-ups, upsd, upsmon).
-
@dennypage said in NUT package:
pfSense restarts package services when WAN interfaces disconnect or reconnect. Yes, this is unnecessary for some services, such as NUT with a USB connection, but there is no way for pfSense to know which services actually need to be restarted. It's always been this way.
That may very well be, but before 23.01 there were no problems with NUT overactively reporting down/ups at those times, whereas now they pop up almost every time someone is editing something on interfaces, routing, gateways etc.
Just wanting to check if there's anything that has changed while converting stuff to PHP 8.1 or anything. Wouldn't be the first :)
-
@jegr Nothing to do with PHP, it's actually a shell script.
If you go back through this thread you'll find mention of this issue five or more years ago. Some people see it a lot, some never see it. It just depends on how things happen on the box. If it is really bothering you, you could do a local patch for the rc script to add a sleep following the kill of upsmon. But only if it's really bothering you.
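For anyone who does want to try the local patch mentioned above, here is a minimal sketch of the idea. It is not taken from the pfSense NUT rc script: `wait_for_exit` is a hypothetical helper, and how the script obtains the upsmon PID is left as an assumption. Rather than a fixed sleep, it polls until the process has actually gone away or a timeout expires.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the pfSense NUT package.
# Waits until a process has exited, giving up after a timeout.
wait_for_exit() {
    # $1 = PID to watch, $2 = maximum number of seconds to wait
    pid="$1"
    remaining="$2"
    # kill -0 sends no signal; it only tests whether the PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1   # still alive after the timeout
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage inside a stop routine (upsmon_pid is an assumed variable):
#   kill "$upsmon_pid"
#   wait_for_exit "$upsmon_pid" 5
```

A plain `sleep 2` after the kill would do the same job with less code; the polling version just avoids waiting longer than needed.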
-
@dennypage Nah, I can live with that :) But that only adds to another point: packages should have more specific hooks, like the XY interface instead of "any" interface, as there's no need otherwise. Or OpenVPN, which gets restarted pretty often while listening on localhost, so it would be completely unfazed by any changes. Also it'd be nice to have those notifications selectable and expandable so we can get more/specific notifications and disable others :)
But I'll push that in another thread. Thanks for getting back :)
Cheers
\jens
-
@dennypage said in NUT package:
I've received several requests for the dev build of usbhid-ups, so I thought I would upload the file here.
For reference, the shasum and sha256sum checksums of the unzipped file are:
49ce9131502bfb8b789ee97b7fb3fc81fc9f8fff usbhid-ups
999a2653559dbc50ecc8ba592a67587b1e307a1495f6e8ebbd3d8e90e3967133 usbhid-ups
If you use the file, please post and let me know if it resolves an issue for you.
I finally got round to loading this file onto my system, and unfortunately it does not seem to work at all. Now instead of failing randomly after a period of time I get the failure immediately after restarting the UPS service.
Mar 30 18:25:59 upsmon 34059 Poll UPS [EatonUPS] failed - Driver not connected
Mar 30 18:25:54 upsmon 34059 Poll UPS [EatonUPS] failed - Driver not connected
Mar 30 18:25:49 upsmon 34059 Poll UPS [EatonUPS] failed - Driver not connected
Mar 30 18:25:44 upsmon 34059 Poll UPS [EatonUPS] failed - Driver not connected
Mar 30 18:25:39 upsmon 34059 Poll UPS [EatonUPS] failed - Driver not connected
Mar 30 18:25:38 php-cgi 35430 notify_monitor.php: Message sent to my@email.net OK
Mar 30 18:25:34 upsmon 34059 Poll UPS [EatonUPS] failed - Driver not connected
Mar 30 18:25:29 upsmon 34059 UPS EatonUPS is unavailable
Mar 30 18:25:29 upsmon 34059 Poll UPS [EatonUPS] failed - Driver not connected
Mar 30 18:25:29 upsd 35561 User local-monitor@::1 logged into UPS [EatonUPS]
Mar 30 18:25:27 php-cgi 35430 notify_monitor.php: Message sent to my@email.net OK
Mar 30 18:25:25 upsd 35561 Startup successful
I think this log sequence shows from startup.
Any further suggestions for a solution to this? Are there any diagnostics I can do to be helpful?
Thank you.
David
-
@davidir did you check to be sure the file permissions are correct after uploading it? This file from Denny Page is working without a hitch for me with a CyberPower UPS. I don't know if there are any other quirks with an Eaton UPS.
-
@davidir said in NUT package:
I think this log sequence shows from startup.
Any further suggestions for a solution to this?
What's missing from your log entries are the usbhid-ups entries. Those are what is needed to help further the investigation if you are having a problem with usbhid-ups.
You may want to confirm that it is installed correctly. Location and permissions should look like this:
[23.01-RELEASE][root@fw]/root: ls -l /usr/local/libexec/nut/usbhid-ups
-rwxr-xr-x 1 root wheel 250968 Jan 17 17:34 /usr/local/libexec/nut/usbhid-ups
[23.01-RELEASE][root@fw]/root:
Lastly, if you are still having an issue you may want to post your config.
-
@dennypage Just to be clear (in case I did something really stupid) here is what I have done:
- download file linked above 1678659799995-usbhid-ups.gz
- Use pfsense web /diag_command.php to upload file
Uploaded file to /tmp/1678659799995-usbhid-ups.gz.
- Stop UPS service on pfsense
- use putty to open shell on pfsense
- check existing file
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut: ls -l /usr/local/libexec/nut/usbhid-ups
-rwxr-xr-x 1 root wheel 202096 Jan 18 01:29 /usr/local/libexec/nut/usbhid-ups
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut:
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut:
- extract the .gz file downloaded
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut: gunzip -d /tmp/1678659799995-usbhid-ups.gz
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut: ls -l /tmp/1678659799995-usbhid-ups
-rw-r--r-- 1 root wheel 258936 Mar 31 17:08 /tmp/1678659799995-usbhid-ups
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut:
- rename old file, move new file into place
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut: mv usbhid-ups usbhid-ups.old
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut: mv /tmp/1678659799995-usbhid-ups ./usbhid-ups
- fix new file permissions
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut: chmod 0755 usbhid-ups
[23.01-RELEASE][admin@pfSense.irwazu.co.uk]/usr/local/libexec/nut: ls -l /usr/local/libexec/nut/usbhid-ups
-rwxr-xr-x 1 root wheel 258936 Mar 31 17:08 /usr/local/libexec/nut/usbhid-ups
- start the UPS service
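One step the list above does not include is checking the unzipped file against the SHA-256 checksum posted with the download. A rough sketch of how that could be done follows. The helper names `file_sha256` and `verify_sha256` are made up for illustration, and which hashing tool is present on a given box is an assumption, so the sketch tries `sha256`, `sha256sum`, and `shasum` in turn.

```shell
#!/bin/sh
# SHA-256 of the unzipped usbhid-ups, as posted earlier in the thread.
EXPECTED_SHA256="999a2653559dbc50ecc8ba592a67587b1e307a1495f6e8ebbd3d8e90e3967133"

# Hypothetical helper: print the SHA-256 of file $1 using whichever tool exists.
file_sha256() {
    if command -v sha256 >/dev/null 2>&1; then
        sha256 -q "$1"                        # FreeBSD: -q prints only the digest
    elif command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{print $1}'
    else
        shasum -a 256 "$1" | awk '{print $1}'
    fi
}

# Hypothetical helper: succeed only if file $1 hashes to digest $2.
verify_sha256() {
    [ "$(file_sha256 "$1")" = "$2" ]
}

# Usage, after gunzip and before moving the file into place:
#   verify_sha256 /tmp/1678659799995-usbhid-ups "$EXPECTED_SHA256" \
#       && echo "checksum OK" || echo "checksum MISMATCH, do not install"
```

A matching checksum only shows that the download is intact. It says nothing about whether the binary was built for the hardware it is being installed on.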
Config:
no additional driver arguments
Additional configuration lines for upsmon.conf: RUN_AS_USER root
Additional configuration lines for ups.conf: user=root
Additional configuration lines for upsd.conf:
LISTEN 127.0.0.1
LISTEN 192.168.200.254
These are there as I want to be able to access the UPS status from an external monitor, and this was one of the steps in the documentation I followed.
I am still not seeing any log lines from usbhid-ups, if I revert back to the original usbhid-ups file then these do show.
-
@davidir said in NUT package:
Additional configuration lines for upsmon.conf: RUN_AS_USER root
Additional configuration lines for ups.conf: user=root
FWIW, you should remove both of these lines. You don't want to run as root unless you really have to. If you are running as root to work around a missing quirk issue, I strongly recommend using the loader.conf.local approach instead.
Setting that aside...
You can try logging in to your box and running the driver by hand to confirm, but I suspect that I already know why it isn't working for you. The executable I posted is for 64-bit x86-64 (Intel or AMD) hardware... what hardware architecture are you on?
The reason I ask, is that my files look like so:
[23.01-RELEASE][root@fw]/root: ls -l /usr/local/libexec/nut/usbhid-ups*
-rwxr-xr-x 1 root wheel 258936 Apr 1 00:26 /usr/local/libexec/nut/usbhid-ups
-rwxr-xr-x 1 root wheel 250968 Jan 17 17:34 /usr/local/libexec/nut/usbhid-ups.org
[23.01-RELEASE][root@fw]/root:
My original usbhid-ups is 250,968 bytes, whereas yours is only 202,096 bytes. This suggests a different architecture. Are you on an ARM by chance?
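File size is only a hint. A more direct check is to compare what `file` reports for the binary with what `uname -m` reports for the box. The sketch below assumes `file` is available on the system, and `binary_matches_host` is a hypothetical helper that only does a coarse x86-64 versus ARM comparison.

```shell
#!/bin/sh
# Hypothetical helper: coarse check that a binary suits the host CPU family.
binary_matches_host() {
    # $1 = output of `file <binary>`, $2 = output of `uname -m`
    case "$2" in
        amd64|x86_64) echo "$1" | grep -q 'x86-64' ;;
        arm*|aarch64) echo "$1" | grep -Eq 'ARM|aarch64' ;;
        *)            return 2 ;;   # unknown host type, no verdict
    esac
}

# Usage on the firewall:
#   binary_matches_host "$(file /usr/local/libexec/nut/usbhid-ups)" "$(uname -m)" \
#       || echo "usbhid-ups does not look like it was built for this machine"
#
# Running the driver by hand with debug output (EatonUPS is the UPS name
# from the logs above) would also show the failure directly:
#   /usr/local/libexec/nut/usbhid-ups -DD -a EatonUPS
```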
-
Yes I have a Netgate 3100, CPU Type: ARM Cortex-A9 r4p1
@dennypage said in NUT package:
FWIW, you should remove both of these lines. You don't want to run as root unless you really have to. If you are running as root to work around a missing quirk issue, I strongly recommend using the loader.conf.local approach instead.
Thanks for the pointer, I will look into this.
-
Thank you @dennypage , I just used the described method to fix my Tripp Lite SMART750RM1U using the tripplite_usb driver with pfSense 23.01 running on a 6100.
usbconfig -d ugen0.2 dump_device_desc
ugen0.2: <Tripp Lite TRIPP LITE SMART750RM1U> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON (0mA)
bLength = 0x0012
bDescriptorType = 0x0001
bcdUSB = 0x0110
bDeviceClass = 0x0000 <Probed by interface class>
bDeviceSubClass = 0x0000
bDeviceProtocol = 0x0000
bMaxPacketSize0 = 0x0008
idVendor = 0x09ae
idProduct = 0x0001
bcdDevice = 0x000a
iManufacturer = 0x0001 <Tripp Lite >
iProduct = 0x0002 <TRIPP LITE SMART750RM1U >
iSerialNumber = 0x0000 <no string>
bNumConfigurations = 0x0001
#Two lines shown
usbconfig -d ugen0.2 show_ifdrv
ugen0.2: <Tripp Lite TRIPP LITE SMART750RM1U> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON (0mA)
ugen0.2.0: uhid0: <Tripp Lite TRIPP LITE SMART750RM1U, class 0/0, rev 1.10/0.0a, addr 2>
#boot time config
hw.usb.quirk.0="0x09ae 0x0001 0x0000 0xffff UQ_HID_IGNORE"
/boot/loader.conf.local
#live testing
usbconfig add_dev_quirk_vplh 0x09ae 0x0001 0x0000 0xffff UQ_HID_IGNORE
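For other UPS models, the same loader.conf.local line can be built from the idVendor and idProduct values that `usbconfig dump_device_desc` prints. A small sketch follows. `quirk_line` is a hypothetical helper, and it assumes that UQ_HID_IGNORE with the 0x0000 to 0xffff revision range is the right quirk for the device, as it was for the Tripp Lite above.

```shell
#!/bin/sh
# Hypothetical helper: print a loader.conf.local quirk entry.
quirk_line() {
    # $1 = quirk slot index, $2 = idVendor, $3 = idProduct (hex, as shown by usbconfig)
    printf 'hw.usb.quirk.%s="%s %s 0x0000 0xffff UQ_HID_IGNORE"\n' "$1" "$2" "$3"
}

# Reproduces the Tripp Lite entry from the post above.
quirk_line 0 0x09ae 0x0001
```

For the APC reported at the top of the thread as [051d:0003], the call would be `quirk_line 0 0x051d 0x0003`. Treat that as an untested assumption and try the `usbconfig add_dev_quirk_vplh` live test before committing it to /boot/loader.conf.local.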
-
Is the nut pfsense package set up to deal with power race conditions?
I'm trying to understand when /usr/local/etc/rc.d/shutdown.nut.sh gets executed.
My UPS will cancel a delayed shutdown if the power returns during the shutdown delay, so that would leave my firewall in a halted state, never to restart.
So should a shutdown delay ever be used with the pfsense nut package? Or is it best to shut down the UPS immediately when "/usr/local/sbin/upsdrvctl shutdown" gets called?
The nut FAQ has a recommendation for adding in a reboot after 120 seconds if the system is still up after the upsdrvctl shutdown command has been run.
https://networkupstools.org/docs/FAQ.html#_i_8217_m_facing_a_power_race
-
@stompro said in NUT package:
Is the nut pfsense package set up to deal with power race conditions?
No. Once a low battery situation has been declared, the pfSense system will perform a complete shutdown.
The "power race" situation discussed in the FAQ entry is rather antiquated, and was only pertinent to really dumb UPSs. The approach recommended in the FAQ required the OS to remain operational and not actually perform a complete shutdown. The expectation was that the secondary file systems would all be unmounted, and the root FS quiesced or remounted read-only to minimize damage to the file system. 20+ years ago this may have been a somewhat common approach to system shutdown, but no longer.
My UPS will cancel a delayed shutdown if the power returns during the shutdown delay, so that would leave my firewall in a halted state, never to restart.
Are you sure? Most all modern UPSs, once the kill command has been received, will carry forward with the disconnect of power regardless of whether or not mains power is present. Unless you have a very old and dumb UPS, I would not worry about it. If you do have such a UPS, I would seriously consider getting a new one.
-
@dennypage said in NUT package:
@stompro said in NUT package:
Is the nut pfsense package set up to deal with power race conditions?
No. Once a low battery situation has been declared, the pfSense system will perform a complete shutdown.
The "power race" situation discussed in the FAQ entry is rather antiquated, and was only pertinent to really dumb UPSs. The approach recommended in the FAQ required the OS to remain operational and not actually perform a complete shutdown. The expectation was that the secondary file systems would all be unmounted, and the root FS quiesced or remounted read-only to minimize damage to the file system. 20+ years ago this may have been a somewhat common approach to system shutdown, but no longer.
Are you sure about this? It seems like a better method than to hope and assume that the shutdown delay is set to a correct length of time to allow a machine to shut down before the power gets cut. Modern file systems are better, but there are tons of pfSense systems using UFS which is known to handle power cuts poorly.
I'm not sure if RAM disks get synced before the NUT shutdown happens... I'll have to check on that since I use the ramdisk feature on all my firewalls.
I'm pretty sure all my Debian-based systems use the official method of running a shutdown and then killing the power at the end of the shutdown. And it looks like they also support a "POWEROFF_WAIT=15m" option in /etc/nut/nut.conf to handle this type of power race condition.
See the systemd nutshutdown script https://github.com/networkupstools/nut/blob/master/scripts/systemd/nutshutdown.in
Looks like that was committed in 2018, so it seems like this style of system shutdown with NUT isn't something that went away 20+ years ago as I believe you are implying.
My UPS will cancel a delayed shutdown if the power returns during the shutdown delay, so that would leave my firewall in a halted state, never to restart.
Are you sure? Most all modern UPSs, once the kill command has been received, will carry forward with the disconnect of power regardless of whether or not mains power is present. Unless you have a very old and dumb UPS, I would not worry about it. If you do have such a UPS, I would seriously consider getting a new one.
Ha, I have a brand new Tripp Lite SMART750RM1U that does seem to have this problem. I'll test it again to make sure, but I also checked with Tripp Lite support and they said all their UPSs behave like that. If the power returns after the shutdown delay command is sent, the shutdown gets canceled. (I'm not sure how much I believe their support though, or my ability to communicate what I'm asking correctly.)
Here is a Debian bug from 2016 that mentions that their APC UPS (no model) has a similar quirk (the shutdown command is ignored when on mains power). So I don't know that this behavior is all that rare yet.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=835634
But I think you have answered my question: no, the pfSense NUT package doesn't handle power race conditions.
-
@stompro said in NUT package:
Are you sure about this? It seems like a better method than to hope and assume that the shutdown delay is set to a correct length of time to allow a machine to shut down before the power gets cut. Modern file systems are better, but there are tons of pfSense systems using UFS which is known to handle power cuts poorly.
Yes. The default kill delay of 20 seconds covers most non-NAS systems. The delay is usually controllable so you can adjust it if needed. See the description of variable offdelay in usbhid-ups.
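As a rough illustration of adjusting the delay: usbhid-ups takes `offdelay` and `ondelay` as driver settings in seconds, and the NUT documentation notes that ondelay should normally be greater than offdelay. The sketch below only prints the two lines after checking that relationship. `delay_settings` is a hypothetical helper, the values 60 and 90 are arbitrary examples, and the assumption is that the output gets pasted into the package's additional driver arguments.

```shell
#!/bin/sh
# Hypothetical helper: print usbhid-ups delay settings after a sanity check.
delay_settings() {
    # $1 = offdelay in seconds, $2 = ondelay in seconds
    if [ "$2" -le "$1" ]; then
        echo "ondelay ($2) should be greater than offdelay ($1)" >&2
        return 1
    fi
    printf 'offdelay = %s\nondelay = %s\n' "$1" "$2"
}

# Example values only; pick delays that suit how long the box takes to shut down.
delay_settings 60 90
```

Whether a particular UPS honours these values depends on the model, so it is worth confirming with a supervised test.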
UFS is an example of why the approach recommended in that FAQ entry is generally a bad idea because it requires that the UFS file system still be mounted when the power is cut. UFS is somewhat fragile and even if quiesced does not appreciate unclean shutdowns.
Regardless, the quiesce shutdown approach is not something that pfSense supports, let alone the pfSense NUT package. Even if it did, it still wouldn't be a good idea...
If you are using a UPS that will not obey the kill command if mains returns, it is still a much better approach to use a safe complete shutdown that requires manual intervention to recover rather than an unsafe shutdown that may result in damage to the root file system.
Let's put some numbers to this. Let's say we have a UPS with a 10 minute run-time before low battery, a 20 second delay on the kill command, and power events that last 0-4 hours with an even distribution.
About 4% of the time (10 minutes out of 240) there will be no issue because mains will be restored before shutdown begins. About 0.1% of the time (20 seconds out of 14,400), mains will be restored during the kill command delay, and the system will continue the shutdown and require external intervention to reboot. However, if the system does not use a complete shutdown, 96% of the time the system will still be running when power is cut, exposing the root file system to potential corruption. Even if we say that there is only a 1 in 100 chance (~1%) that the file system will experience corruption, it's still nearly a 10:1 win to use the full and complete shutdown approach.
Now let's look at the associated costs for those failure cases. In the 0.1% case, the cost of manual intervention is flipping a switch which a lay person can do. In the 1% case, the cost of manual intervention is an experienced system administrator spending significant time on the console fixing file system corruption or performing a complete re-install along with the associated data loss. When you look at the entire picture, it's a very clear and easy choice.
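The numbers above can be reproduced with a few lines of awk. This is only a sketch of the same back-of-envelope model: outage length uniformly distributed between 0 and 240 minutes, 10 minutes of runtime, a 20 second kill delay, and an assumed 1% chance of corruption per unclean power cut. `outage_odds` is a hypothetical helper name. Note that with the unrounded 20-second window (about 0.14% rather than 0.1%) the ratio works out closer to 7:1 than 10:1, which still points the same way.

```shell
#!/bin/sh
# Hypothetical helper: odds for the two shutdown strategies under a uniform outage model.
outage_odds() {
    # $1 = runtime before low battery (minutes)
    # $2 = kill command delay (seconds)
    # $3 = longest outage (minutes); outage length assumed uniform on [0, $3]
    # $4 = assumed chance of corruption when power is cut on a running system
    awk -v rt="$1" -v kd="$2" -v max="$3" -v pc="$4" 'BEGIN {
        p_no_shutdown = rt / max                 # mains back before shutdown starts
        p_stranded    = (kd / 60) / max          # mains back during the kill delay
        p_corrupt     = (1 - p_no_shutdown) * pc # quiesce approach: cut while running
        printf "no_shutdown=%.4f\n", p_no_shutdown
        printf "stranded=%.4f\n",    p_stranded
        printf "corrupt=%.4f\n",     p_corrupt
        printf "ratio=%.1f\n",       p_corrupt / p_stranded
    }'
}

outage_odds 10 20 240 0.01
```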
One last log to throw on the fire... with a dumb UPS and the quiesce and reboot approach, there is a significant exposure if there is a second power event shortly after the system decides to reboot. There is something like a 45 second window during which the UPS will likely power off before the system even gets to the point of starting NUT, let alone completing another shutdown. With UFS, the chance of corruption in this case is much higher than 1 in 100. Yes, I know... see variable ondelay in usbhid-ups.
Ha, I have a brand new Tripp Lite SMART750RM1U that does seem to have this problem.
Well, that is unfortunate. I can't speak to your specific model, but I used baby Tripp Lites for several years and did not have that problem. I still have a leftover ECO model under my desk for my workstation.
My "main" UPSs have generally been APCs, and they have obeyed power kill with mains live. On one occasion I really wished that they didn't because I was doing NUT testing without sufficient precaution and accidentally took out all my servers at once. Yes, I know... really stupid.
-