Possible memory leak in nut package?
-
I've been dealing with my Netgate SG-3100 running out of some area of kernel memory every couple of weeks (the last stretch took 28 days) and having to jump through hoops to get it safely rebooted. This last time, it wouldn't even spawn a
shutdown -r now
command from a console shell. I've documented my issue to date under the General pfSense category, but it's looking like it might be an issue with nut or a driver it's using: https://forum.netgate.com/topic/154284/pfsense-running-out-of-memory-and-locking-up
Specifically, if you look at the log messages I posted yesterday (7/6/2020) in that topic, there are references to running out of kstack memory.
Jul 5 01:14:36 pfSense upsmon[90689]: Communications with UPS ups established
Jul 5 01:14:36 pfSense kernel: vm_thread_new: kstack allocation failed
Jul 5 01:14:36 pfSense upsmon[16358]: Can't invoke wall: Cannot allocate memory
Jul 5 01:14:44 pfSense kernel: vm_thread_new: kstack allocation failed
Jul 5 01:15:30 pfSense kernel: vm_thread_new: kstack allocation failed
Jul 5 01:17:08 pfSense kernel: vm_thread_new: kstack allocation failed
Jul 5 01:18:45 pfSense kernel: vm_thread_new: kstack allocation failed
Jul 5 01:19:11 pfSense kernel: vm_thread_new: kstack allocation failed
Jul 5 01:19:44 pfSense upsd[93206]: Data for UPS [ups] is stale - check driver
Jul 5 01:19:46 pfSense upsd[93206]: UPS [ups] data is no longer stale
Jul 5 01:20:31 pfSense upsd[93206]: Data for UPS [ups] is stale - check driver
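For anyone who wants to check whether they're hitting the same failure, counting those kernel messages in the system log shows how often it happens. On 2.4.x the logs are circular, so they have to be read through clog (this assumes the stock log location):

clog /var/log/system.log | grep -c 'kstack allocation failed'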
Someone speculated that something in nut, or in a driver nut uses, could be causing this issue.
~Dan
-
Hello, I seem to be running into the same thing, and I can confirm that it appears to be related to running nut.
I have 9 SG-3100s that don't have UPSes, so they don't run nut.
And 6 that are hooked up to CyberPower UPSes and use nut with the usbhid-ups driver to monitor the UPS. Two of those firewalls locked up today, out of memory, after 13 days of uptime.
I am using RAM disks on all of them, so I probably have less available kmem than the original poster, which would explain why my lockups happen faster.
Sample of sysctl vm.kmem_map_free on firewalls not running nut. All have RAM disks set up for /tmp and /var:
2.4.4-RELEASE-p3 vm.kmem_map_free: 110714880
2.4.5-RELEASE-p1 vm.kmem_map_free: 3776512
2.4.5-RELEASE-p1 vm.kmem_map_free: 4608000
Sample of values for 2.4.5-RELEASE-p1 SG-3100s that are running nut:
vm.kmem_map_free: 4722688 (Nut enabled)
vm.kmem_map_free: 1785856 (Nut disabled???)
Here is the value from a 2.4.5-RELEASE-p1 SG-2220 with the same amount of system RAM, but a vastly different number (presumably because the SG-2220 is a 64-bit amd64 box while the SG-3100 is 32-bit ARM, so the kernel address space limits are very different).
vm.kmem_map_free: 1999176794
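For anyone who wants to watch the trend rather than compare one-off snapshots, a simple shell loop on the firewall will log the value over time. A rough sketch (the interval and log path are arbitrary):

while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(sysctl -n vm.kmem_map_free)" >> /root/kmem_free.log
    sleep 300
done

If nut (or something it drags in) is leaking, the logged value should drop steadily instead of hovering around a baseline.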
I'm also running arpwatch on the firewalls that have nut. I just tried disabling arpwatch, and that freed up quite a bit of kmem, so another reason I hit this issue faster may be that arpwatch uses more kmem.
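If it really is a kernel-side leak, comparing vmstat -m (kernel malloc statistics) and vmstat -z (UMA zone statistics) snapshots taken a day or so apart should show which consumer keeps growing. A rough sketch (file paths are just examples):

vmstat -m > /root/vmstat_m.before
vmstat -z > /root/vmstat_z.before
# ...wait a day or so, then:
vmstat -m > /root/vmstat_m.after
vmstat -z > /root/vmstat_z.after
diff /root/vmstat_m.before /root/vmstat_m.after
diff /root/vmstat_z.before /root/vmstat_z.after

Whichever malloc type or zone keeps climbing between the two snapshots is the likely culprit.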