-
@dennypage said in NUT Package (2.8.1 and above):
@pfpv said in NUT Package (2.8.1 and above):
I respectfully disagree about Synology. It's just a server that passes UPS messages. It's very stable, does it properly, has worked for years and I haven't seen any complaints. I chose to connect my UPS to Synology because in my opinion it is the most critical piece of equipment to be properly shut down, and it provides a NUT server for other equipment.
I don't disagree that there is a bug in NUT, and I will be looking at that shortly. That said...
I'm in my third generation of Synology equipment. Fifteen plus years. I have handled a number of support issues arising from Synology's NUT implementation, both my own and others'. Their NUT implementation started out very straightforward, but over time it has evolved, becoming more and more specialized and complex. With DSM 7.2, it's gotten to the point that I don't consider it to be completely reliable, and view it as a primary of last resort.
I will point out one thing that you said that you may wish to reconsider. You indicate that the NAS is the most important thing to have a proper shutdown. I agree with this general sentiment. However, by running the NAS as the NUT primary, you are actually incurring higher risk for the NAS rather than less.
The NUT primary does not initiate a shutdown until all the associated secondaries have logged out of the primary. Assuming a default polling interval of 5 seconds, a pfSense or Linux system will take something on the order of 10-15 seconds before they log out, and another 30-90 seconds to complete a shutdown. This means that the NAS will not begin its shutdown until 10-15 seconds after the UPS declares a low battery. Depending upon configuration and current activity, a Synology usually takes over 2 minutes to complete a shutdown. If the UPS is off in calculating remaining runtime, you run the risk of exhausting the battery before the NAS has completed its shutdown.
If you reverse this situation and use pfSense or a Linux system as the primary, then the NAS will begin its shutdown within 5 seconds. Not only does this give a wider margin of safety for the NAS, it can give an increased margin of safety for the other systems as well. When the NAS shuts down, there is suddenly a lot less load on the UPS, which gives more time for the other systems to complete their shutdown even if the estimated remaining runtime was incorrect.
The relevant passage from upsmon.conf:
# Also, since the primary system stays up the longest, it suffers higher risks
# of ungraceful shutdown if the estimation of remaining runtime (or of the
# time it takes to shut down this system) was guessed wrong. By consequence,
# the "secondary" systems typically monitor the power environment state
# through the 'upsd' processes running on the remote (often "primary") systems
# and do not directly interact with an UPS (no local NUT drivers are running
# on the secondary systems). As such, secondaries typically shut down as
# soon as there is a sufficiently long power outage, or a low-battery alert
# from the UPS, or a loss of connection to the primary while the power was
# last known to be missing.
As a general rule, you want to have systems that represent the highest UPS load and/or longest shutdown time as secondaries, and a system that represents lower load and is fast to shut down as the primary.
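To put rough numbers on the timing argument, here is a minimal sketch in POSIX shell arithmetic. It only uses the approximate figures quoted in the post (5 second polling, up to 15 seconds for secondaries to log out, a little over 2 minutes for the Synology to shut down). The variable and function names are made up for illustration, and nothing here talks to NUT.

```shell
#!/bin/sh
# Illustrative arithmetic only; the numbers are the rough figures from the post.

POLL_INTERVAL=5        # seconds: default polling interval assumed in the post
SECONDARY_LOGOUT=15    # seconds: upper end of the 10-15 s logout estimate
NAS_SHUTDOWN=120       # seconds: lower bound of "over 2 minutes" for the NAS

# Seconds from the low-battery declaration until the NAS is fully down when
# the NAS is the NUT primary: it waits for the secondaries to log out first.
nas_down_as_primary() {
    echo $((SECONDARY_LOGOUT + NAS_SHUTDOWN))
}

# Same figure when the NAS is a secondary: it starts shutting down within
# one polling interval.
nas_down_as_secondary() {
    echo $((POLL_INTERVAL + NAS_SHUTDOWN))
}

echo "NAS as primary:   $(nas_down_as_primary) s of battery needed"
echo "NAS as secondary: $(nas_down_as_secondary) s of battery needed"
```

The absolute difference is small, but it comes out of exactly the margin that is at risk when the UPS overestimates its remaining runtime, and it does not count the reduced load once the NAS is off.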
I am not done reading the thread but this morning I had my second shutdown of pfSense. Same behavior as OP. I don't have a synology but a Proxmox machine hosting a Debian/Docker server.
In both events, everything stayed on except pfSense, which shut down. No router, no LAN.
Going through the rest of the thread.
-
@NinthWave said in NUT Package (2.8.1 and above):
I am not done reading the thread but this morning I had my second shutdown of pfSense. Same behavior as OP. I don't have a synology but a Proxmox machine hosting a Debian/Docker server.
In both events, everything stayed on except pfSense, which shut down. No router, no LAN.
Circa post 104 there are replacement executables that you may use until the next drop of NUT.
-
@dennypage said in NUT Package (2.8.1 and above):
@ha11oga11o said in NUT Package (2.8.1 and above):
With pfSense 2.7.0 and nut 2.8.2 i still have huge problems with Riello 2200.
Something is off with the version numbers here. pfSense-pkg-nut version 2.8.2 requires pfSense version 23.09/2.7.1 or above. Are you sure you are not running pfSense-pkg-nut version 2.8.0?
Edit: In the other thread you started, it appears that you haven't updated anything since Nov 10th. The pattern you are describing is characteristic of version 2.8.0. Did you go through any of the later posts in that thread? @Unoptanio indicates success with the Riello using version 2.8.2 of the pfSense-pkg-nut on pfsense version 2.7.1.
Thanks!
But where is one to copy these files? I looked under /usr/bin and /usr/local/bin to no avail.
-
@dennypage said in NUT Package (2.8.1 and above):
@NinthWave said in NUT Package (2.8.1 and above):
I am not done reading the thread but this morning I had my second shutdown of pfSense. Same behavior as OP. I don't have a synology but a Proxmox machine hosting a Debian/Docker server.
In both events, everything stayed on except pfSense, which shut down. No router, no LAN.
Circa post 104 there are replacement executables that you may use until the next drop of NUT.
Thanks. I did not see it at first.
Are you aware of a reason why there are no subthreads in the Netgate community?
Using Google, I typed "pfsense + the_push_message_when_pfsense_shutdown"
But had I looked under "NUT Package (2.8.1 and above)", I might not have read all 122 messages.
A subthread for this issue would have been greatly appreciated.
-
@NinthWave said in NUT Package (2.8.1 and above):
I looked under /usr/bin and /usr/local/bin to no avail.
/usr/local/sbin and /usr/local/libexec/nut.
-
@NinthWave said in NUT Package (2.8.1 and above):
Are you aware of a reason why there are no subthreads in the Netgate community?
Sorry, can't help you there. I've no association with the forum other than as a user of it.
-
@dennypage Thank you for the detailed information. I had issues with NUT and replaced it with apcupsd, which resolved the issue.
-
@dennypage said in NUT Package (2.8.1 and above):
@NinthWave said in NUT Package (2.8.1 and above):
I looked under /usr/bin and /usr/local/bin to no avail.
/usr/local/sbin and /usr/local/libexec/nut.
Thank you
I have copied the files into their respective directories.
When I restart the NUT service from the GUI, I get:
Status Alert: The UPS requires attention
/var/log/nut is empty
[EDIT 09:14 EST]
I reinstalled the package from GUI in pfSense's package manager:
The usbhid-ups did not get refreshed
The upsmon got refreshed
The system is working.
So either I did something wrong while copying the files, or upsmon has a bug, or I missed a step.
I should maybe mention that I extracted the .gz in Windows and then copied it to pfSense.
-
@NinthWave You probably had the wrong file permissions. Be sure to match the previous permissions. It should look like this:
[23.09.1-RELEASE][root@fw]/root: ls -l /usr/local/sbin/upsmon* /usr/local/libexec/nut/usbhid-ups*
-rwxr-xr-x 1 root wheel 333728 Dec 27 10:14 /usr/local/libexec/nut/usbhid-ups
-rwxr-xr-x 1 root wheel 287088 Nov 1 00:57 /usr/local/libexec/nut/usbhid-ups.org
-rwxr-xr-x 1 root wheel 87760 Dec 27 10:13 /usr/local/sbin/upsmon
-rwxr-xr-x 1 root wheel 68904 Nov 1 00:57 /usr/local/sbin/upsmon.org
[23.09.1-RELEASE][root@fw]/root:
-
@dennypage said in NUT Package (2.8.1 and above):
@NinthWave You probably had the wrong file permissions. Be sure to match the previous permissions. It should look like this:
[23.09.1-RELEASE][root@fw]/root: ls -l /usr/local/sbin/upsmon* /usr/local/libexec/nut/usbhid-ups*
-rwxr-xr-x 1 root wheel 333728 Dec 27 10:14 /usr/local/libexec/nut/usbhid-ups
-rwxr-xr-x 1 root wheel 287088 Nov 1 00:57 /usr/local/libexec/nut/usbhid-ups.org
-rwxr-xr-x 1 root wheel 87760 Dec 27 10:13 /usr/local/sbin/upsmon
-rwxr-xr-x 1 root wheel 68904 Nov 1 00:57 /usr/local/sbin/upsmon.org
[23.09.1-RELEASE][root@fw]/root:
The first time I did it, I forgot to stop the service before copying the files. It might have been that.
Also, I did not check the file permissions the first time, so I used 777.
Now that I had reinstalled the package and you showed the original file permissions, I used 755.
The daemon restarted correctly.
Thanks
-
-
@dennypage Thanks. I have SB-5100 so I would like to deploy the files you provided to mitigate my issues.
Can you help me out and describe how to perform the actual replacement? I appreciate your help.
-
-
@markster
How to patch NUT until the next package is available, in order to avoid unexpected shutdowns
In short:
- Download those two archives to your PC, or directly to your pfSense (over SSH)
@dennypage said in NUT Package (2.8.1 and above):
For those of you that are on amd64 based systems (Intel or AMD), and are severely affected by the shutdown on calibration/self-test issue, attached are replacement versions of upsmon and usbhid-ups that you can use until the update is published.
Note that these files are for amd64 systems only. I do not have a build/test system for arm. Sorry!
- If you downloaded them to a PC, copy them to pfSense using Diagnostics > Command Prompt > Upload File, WinSCP, or similar
- Extract the archives. You will have two files: "usbhid-ups" and "upsmon"
- Stop the NUT service from within the pfSense GUI
- Move the files into the appropriate directories and make sure they have the appropriate file permissions (chmod 755)
@dennypage said in NUT Package (2.8.1 and above):
[23.09.1-RELEASE][root@fw]/root: ls -l /usr/local/sbin/upsmon* /usr/local/libexec/nut/usbhid-ups*
-rwxr-xr-x 1 root wheel 333728 Dec 27 10:14 /usr/local/libexec/nut/usbhid-ups
-rwxr-xr-x 1 root wheel 287088 Nov 1 00:57 /usr/local/libexec/nut/usbhid-ups.org
-rwxr-xr-x 1 root wheel 87760 Dec 27 10:13 /usr/local/sbin/upsmon
-rwxr-xr-x 1 root wheel 68904 Nov 1 00:57 /usr/local/sbin/upsmon.org
[23.09.1-RELEASE][root@fw]/root:
- Restart the service and voilà
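The move-and-chmod step can be sketched as a small shell helper. This is only a sketch under assumptions: `replace_binary` is a hypothetical name, not part of NUT or pfSense, and it assumes you are root, the NUT service is stopped, and the extracted files are in the current directory. It keeps the packaged binary as `.org`, matching the listing above.

```shell
#!/bin/sh
# Sketch only: replace_binary is a hypothetical helper, not part of NUT or pfSense.
# Usage: replace_binary NEW_FILE INSTALLED_PATH
replace_binary() {
    new="$1"
    dest="$2"
    # Keep the packaged binary as INSTALLED_PATH.org, but never overwrite an
    # existing backup with an already-patched file.
    if [ -f "$dest" ] && [ ! -e "$dest.org" ]; then
        cp -p "$dest" "$dest.org" || return 1
    fi
    cp "$new" "$dest" || return 1
    # 755 matches the -rwxr-xr-x permissions shown in the listing.
    chmod 755 "$dest"
}

# Assumed usage on pfSense (paths taken from the listing in this thread):
#   replace_binary ./upsmon     /usr/local/sbin/upsmon
#   replace_binary ./usbhid-ups /usr/local/libexec/nut/usbhid-ups
```

To roll back, copy the `.org` file over the replacement, or reinstall the package from the pfSense package manager.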
-
@NinthWave Thank you for your assistance and help. I just did the update and things started fine.
I will monitor. Hopefully we get an official patch soon.
-
@dennypage, I wonder why it's taking so long to update the official package when this case is so straightforward. I know you are not the one who can do it.
-
Hi guys,
I have a very similar issue; however, not during self-tests or anything like that, but during "normal" power failures too.
I get about the same error messages, though. I can confirm that the UPS is in a good state and the batteries are fine as well. I also tried a different OS with an older NUT version, where the problem does not occur.
The thing is, once power from the wall is lost, I get the following messages before pfSense and all slaves shut down almost immediately, within roughly 30 seconds (with >2h of runtime left according to the UPS!):
Jan 25 18:33:08 snmp-ups 53155 Requesting UPS [Eaton_5PX2200] to power off, as/if handled by its driver by default (may exit), due to socket protocol request
Jan 25 18:33:03 upsmon 89442 Auto logout and shutdown proceeding
Jan 25 18:33:03 upsmon 89442 Executing automatic power-fail shutdown
Jan 25 18:33:03 upsd 22669 Client local-monitor@127.0.0.1 set FSD on UPS [Eaton_5PX2200]
Jan 25 18:33:03 upsmon 89442 UPS [Eaton_5PX2200] is reported as (administratively) OFF
Jan 25 18:33:03 upsmon 89442 UPS Eaton_5PX2200: administratively OFF or asleep
Jan 25 18:33:03 upsmon 89442 UPS Eaton_5PX2200 on battery
The UPS logs are as follows:
2024/01/25 18:32:52 Normal AC NOK
2024/01/25 18:32:52 ABM state resting
2024/01/25 18:32:52 UPS on battery
2024/01/25 18:32:53 System shutdown in 2 h 17 mn 16 s
2024/01/25 18:32:53 Outlet group 1 shutdown in 2 h 49 mn 35 s
2024/01/25 18:32:53 Outlet group 2 shutdown in 2 h 49 mn 35 s
2024/01/25 18:32:53 ABM state Off
2024/01/25 18:32:53 Normal AC voltage out of tolerance
I have replaced the two files mentioned in the above posts; however, I don't think that solved the issue, because this happens neither during self-test nor during calibration. This is a quite normal condition.
In addition: I updated my pfSense box from 2.7.0 to 2.7.2 just one or two weeks ago. About 3 weeks before that, I had a power outage of about 2 hours, which was handled as usual: no unexpected shutdowns, everything ran through.
Now with this version, it takes under a minute and everything goes black.
I ran upsc on a different machine, and it seems that during a power outage the Eaton 5PX reports the status "OFF":
ups.status: OB OFF OB
Any ideas?
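For anyone wanting to repeat that check, a minimal shell sketch follows. `has_status_flag` is a hypothetical helper name, and HOST is a placeholder for whichever machine runs upsd. The UPS name is the one from the logs above.

```shell
#!/bin/sh
# has_status_flag STATUS FLAG
# Succeeds if FLAG is one of the space-separated tokens in STATUS.
# Hypothetical helper for illustration; not part of NUT.
has_status_flag() {
    for token in $1; do
        [ "$token" = "$2" ] && return 0
    done
    return 1
}

# Assumed usage while the UPS is unplugged from the wall:
#   status=$(upsc Eaton_5PX2200@HOST ups.status)
#   has_status_flag "$status" OFF && echo "UPS reports OFF: $status"
```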
-
@GeneralGresi said in NUT Package (2.8.1 and above):
Jan 25 18:33:08 snmp-ups 53155 Requesting UPS [Eaton_5PX2200] to power off, as/if handled by its driver by default (may exit), due to socket protocol request
Jan 25 18:33:03 upsmon 89442 Auto logout and shutdown proceeding
Jan 25 18:33:03 upsmon 89442 Executing automatic power-fail shutdown
Jan 25 18:33:03 upsd 22669 Client local-monitor@127.0.0.1 set FSD on UPS [Eaton_5PX2200]
Jan 25 18:33:03 upsmon 89442 UPS [Eaton_5PX2200] is reported as (administratively) OFF
Jan 25 18:33:03 upsmon 89442 UPS Eaton_5PX2200: administratively OFF or asleep
Jan 25 18:33:03 upsmon 89442 UPS Eaton_5PX2200 on battery
This appears to be the failsafe when losing connection to the UPS while on battery. This could either be a communication issue between the host and the UPS, or a bug in the snmp-ups driver.
I haven't seen the bug with the snmp-ups driver, but of course that doesn't mean it isn't there. There is a pending update to the NUT package. If you want to try an updated file in the interim, I've attached a newer snmp-ups to this post.
-
Thanks for the fast reply.
I replaced the snmp-ups file with the new one and restarted the service; however, the issue is still the same.
I unplug the UPS for testing purposes, wait about 30-60 Seconds and the machine shuts down.
I was able to gather a screenshot from the status page:
This looks nearly the same as with a NUT 2.8.0 installation on a Debian machine during a power loss situation, however, without the forced shutdown flag.
(nut 2.8.0: ups.status: OB OFF OB)
As this has now happened in 5 consecutive test runs, I would rule out the communication loss theory, especially because everything else, including the Debian boxes, stays alive, even if I let a longer time pass (as it should be).
I don't know much about the internals of NUT; however, I'd say it's the OFF flag coming from the UPS via SNMP in a power loss situation?
As the UPS doesn't report any error in its web interface, I would not see the UPS itself as the reason.
-
@GeneralGresi said in NUT Package (2.8.1 and above):
As this has now happened in 5 consecutive test runs, I would rule out the communication loss theory, especially because everything else, including the Debian boxes, stays alive, even if I let a longer time pass (as it should be).
I don't know much about the internals of NUT; however, I'd say it's the OFF flag coming from the UPS via SNMP in a power loss situation?
You cannot put too much into the Debian boxes not noticing or caring. They may still be on NUT 2.7.4. NUT 2.8.0 and above are much more attentive and sensitive to some UPS notifications.
Only other thing I can think of is to explore the new OFFDURATION configuration option. This would go into the Additional configuration lines for upsmon.conf section.
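For reference, a hedged sketch of what such a line could look like. My understanding of the upsmon.conf documentation for NUT 2.8.1 and later is that OFFDURATION takes the number of seconds a UPS must keep reporting OFF before upsmon treats that feed as lost (built-in default 30), that 0 makes the effect immediate, and that a negative value disables it. Please verify this against the upsmon.conf shipped with your package. The value below is an illustrative assumption, not a recommendation made in this thread.

```
# Entered under "Additional configuration lines for upsmon.conf" in the pfSense NUT settings.
# Assumption: a negative value makes upsmon ignore a persistent OFF status.
OFFDURATION -1
```

Disabling the check means upsmon also stops reacting to a UPS that really has been switched off administratively.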
-
@dennypage said in NUT Package (2.8.1 and above):
For those of you that are on amd64 based systems (Intel or AMD), and are severely affected by the shutdown on calibration/self-test issue, attached are replacement versions of upsmon and usbhid-ups that you can use until the update is published.
Note that these files are for amd64 systems only. I do not have a build/test system for arm. Sorry!
I seem to be having the shutdown-on-calibration issue on an APC Back-UPS ES 850G2.
Event log:
Jan 27 19:40:41 localhost usbhid-ups[580]: ups_status_set: seems that UPS [ups] is in OL+DISCHRG state now. Is it calibrating or do you perhaps want to set 'onlinedischarge' option? Some UPS models (e.g. CyberPower UT series) emit OL+DISCHRG when offline.
Jan 27 19:40:43 localhost usbhid-ups[580]: ups_status_set: seems that UPS [ups] is in OL+DISCHRG state now. Is it calibrating or do you perhaps want to set 'onlinedischarge' option? Some UPS models (e.g. CyberPower UT series) emit OL+DISCHRG when offline.
...
Jan 27 19:40:46 localhost upsmon[95193]: Host sync timer expired, forcing shutdown
Jan 27 19:40:46 localhost upsmon[95193]: Executing automatic power-fail shutdown
Jan 27 19:40:46 localhost upsmon[95193]: Auto logout and shutdown proceeding
...
Jan 27 19:40:51 localhost shutdown[20307]: power-down by root:
Jan 27 19:40:51 localhost usbhid-ups[580]: sock_connect: enabling asynchronous mode (auto)
Jan 27 19:40:51 localhost kernel:
Jan 27 19:40:51 localhost kernel: Netgate pfSense Plus is now shutting down ...
@dennypage
Should I use the replacement files?