Netgate Discussion Forum

    pfSense running out of memory and locking up

    General pfSense Questions
    • D
      DannyBoy2k @stephenw10
      last edited by

      @stephenw10, thank you for the suggestion. I didn't set up a RAM disk myself and, as far as I can tell, there isn't one currently on the system that might have been set up by a package or something.

      [2.4.5-RELEASE][admin@pfSense.localdomain]/root: df -h
      Filesystem                     Size    Used   Avail Capacity  Mounted on
      /dev/ufsid/5cdd38aef2899872    6.9G    1.0G    5.3G    16%    /
      devfs                          1.0K    1.0K      0B   100%    /dev
      /dev/msdosfs/FATBOOT0           34M    2.0M     32M     6%    /boot/u-boot
      /dev/md0                       3.4M    116K    3.0M     4%    /var/run
      devfs                          1.0K    1.0K      0B   100%    /var/dhcpd/dev
      

      I'm going to try to keep a closer eye on the logs to see, if this happens again, when and what the point of failure might be.

      ~Dan

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        You don't need to be using RAM disks to see that change. It was applied to all arm installs in order to accommodate RAM disks better. You might be hitting a side effect of that.

        Steve

        • D
          DannyBoy2k
          last edited by

          Well, I didn't apply these changes and it locked up again today, 28 days later. When I logged in on the console, I could do nothing: no command I ran from the shell could be spawned due to lack of memory, including "shutdown -r now". I ended up having to power cycle the box and pray the file system didn't get corrupted.

          I've put the suggested fix into place and rebooted. Will see if this occurs again.

          • GertjanG
            Gertjan
            last edited by Gertjan

            Hi,

            I'm using a UPS (with NUT) myself. No connection issues, as I'm using an off-the-shelf APC UPS, a rather classic model.
            Memory usage (2 GB max) doesn't change.

            Except for pfBlockerNG-devel, which could eat up a lot of memory, I'm not using anything special:

            acme 	0.6.8 	
            Avahi 	2.1_1 	
            Cron 	0.3.7_4 	
            freeradius3 	0.15.7_16 	
            Notes 	0.2.9_2 	
            nut 	2.7.4_7 	
            openvpn-client-export 	1.4.23_1 	
            pfBlockerNG-devel 	2.2.5_33 	
            RRD_Summary 	2.0 	
            Shellcmd 	1.0.5_1 	
            System_Patches
            

            What happens when you stop or remove NUT/UPS?
            If needed, go bare-bones and build up from there.

            No "help me" PM's please. Use the forum, the community will thank you.
            Edit : and where are the logs ??

            • D
              DannyBoy2k
              last edited by DannyBoy2k

              I was trying to figure out how to list the packages from the command line as it appears you did, but haven't found it yet. This is all I show as installed from the web GUI:

              GUI packages

              I'd rather not disable nut, since my box wouldn't shut down properly in the event of a power outage. Same as you, I'm just using an off-the-shelf UPS, the CyberPower CP685AVRG. I can't decide if nut is actually the problem or a symptom of whatever is going on. It does seem to lose connectivity a few times a day, but regains it quickly. That cycling seems to release (or maybe not?) resources and reacquire them. I have wondered if it might not actually be releasing anything and is consuming more and more over time, but I have no proof of that. Memory usage has remained low according to the GUI.

              ~Dan

              • D
                DannyBoy2k
                last edited by DannyBoy2k

                Interestingly, the log messages are slightly different from last time. @stephenw10 , would these indicate that your suggested fix might indeed be the problem:

                Jul  5 01:14:36 pfSense upsmon[90689]: Communications with UPS ups established
                Jul  5 01:14:36 pfSense kernel: vm_thread_new: kstack allocation failed
                Jul  5 01:14:36 pfSense upsmon[16358]: Can't invoke wall: Cannot allocate memory
                Jul  5 01:14:44 pfSense kernel: vm_thread_new: kstack allocation failed
                Jul  5 01:15:30 pfSense kernel: vm_thread_new: kstack allocation failed
                Jul  5 01:17:08 pfSense kernel: vm_thread_new: kstack allocation failed
                Jul  5 01:18:45 pfSense kernel: vm_thread_new: kstack allocation failed
                Jul  5 01:19:11 pfSense kernel: vm_thread_new: kstack allocation failed
                Jul  5 01:19:44 pfSense upsd[93206]: Data for UPS [ups] is stale - check driver
                Jul  5 01:19:46 pfSense upsd[93206]: UPS [ups] data is no longer stale
                Jul  5 01:20:31 pfSense upsd[93206]: Data for UPS [ups] is stale - check driver
                

                And when I was trying to reboot from the web GUI:

                Jul  6 11:08:57 pfSense php-fpm[365]: /diag_reboot.php: Stopping all packages.
                Jul  6 11:08:57 pfSense kernel: vm_thread_new: kstack allocation failed
                Jul  6 11:08:57 pfSense php-fpm[365]: /diag_reboot.php: The command '/usr/local/etc/rc.d/nut.sh stop' returned exit code '2', the output was 'stopping NUT /usr/local/etc/rc.d/nut.sh: Cannot fork: Cannot allocate memory'
                Jul  6 11:08:57 pfSense kernel: vm_thread_new: kstack allocation failed
                Jul  6 11:08:58 pfSense kernel: vm_thread_new: kstack allocation failed
                Jul  6 11:08:58 pfSense php-fpm[365]: /diag_reboot.php: The command 'nohup /etc/rc.reboot > /dev/null 2>&1 &' returned exit code '-1', the output was ''
                

                No logs are present from when I was trying to reboot from the console, but I was seeing the same kind of messages echoed. It couldn't spawn any of the commands I was issuing.
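To see when the failures start and whether they track the UPS reconnects, the log lines above can be tallied per day. This is only a rough sketch: `count_per_day` is a made-up helper name, and it assumes syslog-style lines like the ones quoted in this post.

```shell
# Tally log lines matching a fixed string, grouped by day.
# Reads syslog-style lines ("Mon DD HH:MM:SS host tag: message") on stdin.
# count_per_day is a hypothetical helper name, not a pfSense tool.
count_per_day() {
    awk -v pat="$1" '
        index($0, pat) {
            day = $1 " " $2
            if (!(day in count)) order[++n] = day   # keep first-seen order
            count[day]++
        }
        END { for (i = 1; i <= n; i++) print order[i] ": " count[order[i]] }
    '
}

# Example use on pfSense 2.4.x, where system.log is a circular log read with clog:
#   clog /var/log/system.log | count_per_day "kstack allocation failed"
#   clog /var/log/system.log | count_per_day "is stale"
```

If the "is stale" counts climb for days before the first kstack failure shows up, that would at least be consistent with a slow leak tied to the reconnects; it would not prove it.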

                ~Dan

                • bmeeksB
                  bmeeks
                  last edited by bmeeks

                  My bet is the upsd driver. The message says your system is running out of kstack memory. This is special, reserved memory for the kernel stack. I think that is a fixed allocated chunk of memory, so it very well may not show up as being "consumed" in the Dashboard memory consumption indicator. Or stated another way, it is probably part of the base system memory area and the entire block is accounted for once and processes use small bits of that preallocated chunk when running.

                  From what I understand from the limited research I did, that block is not expandable. So when a process consumes enough of it, that will kill the kernel because other processes that need some space can't get it.

                  My guess is the upsd driver is crashing in some fashion (or some portion of it is crashing) and leaking kstack memory each time it crashes. After enough days of crashing, all of the kstack memory is consumed via those "leaks".

                  I know it can be dangerous, especially if power at your location is flaky, but I would test with nut removed and the UPS unplugged from the USB port to see if the kstack errors go away. It will take several days to know.

                  • GertjanG
                    Gertjan @DannyBoy2k
                    last edited by

                    @DannyBoy2k said in pfSense running out of memory and locking up:

                    from the command line as it appears you did

                    I have to disappoint you here.

                    I copied the following part of the GUI dashboard with my mouse:

                    f6e9ef83-9fdc-4c99-a96b-31434752b8f6-image.png
                    I copied the text, and used the format tool

                    3b01dce6-4f74-450d-8ebd-83e11c9b36ac-image.png

                    to make it readable for humans
                    (and thus using 190 bytes of storage instead of several kilobytes for the image).

                    Btw : command line version :

                    ls -al /usr/local/pkg
                    

                    will list what you have on your pfSense.
                    It's a bit messy, but can be useful.

                    The directory listing doesn't show which packages are up to date, actually activated, etc.
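As an alternative to the directory listing, FreeBSD's `pkg info` prints installed packages with their versions, and pfSense add-on packages carry a `pfSense-pkg-` name prefix. A small sketch that reduces that output to "name version" pairs (`pfsense_packages` is a made-up helper name):

```shell
# Print pfSense add-on packages as "name version".
# Reads `pkg info` style lines on stdin: "pfSense-pkg-<name>-<version>  <comment>".
# pfsense_packages is a hypothetical helper name.
pfsense_packages() {
    awk '$1 ~ /^pfSense-pkg-/ {
        pkg = $1
        sub(/^pfSense-pkg-/, "", pkg)
        # The version is everything after the last hyphen.
        i = length(pkg)
        while (i > 0 && substr(pkg, i, 1) != "-") i--
        if (i > 1) print substr(pkg, 1, i - 1), substr(pkg, i + 1)
    }'
}

# Example use from the pfSense shell:
#   pkg info | pfsense_packages
```

Unlike the directory listing, this shows the installed version of each package, though it still does not say whether a package is enabled in the GUI.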

                    • bmeeksB
                      bmeeks @Gertjan
                      last edited by

                      @Gertjan:
                      Are you running nut on an SG-3100? I tried a number of times to get it running on an SG-3100 with a CyberPower UPS and was never successful in getting the UPS to be recognized.

                      • GertjanG
                        Gertjan @bmeeks
                        last edited by

                        @bmeeks said in pfSense running out of memory and locking up:

                        @Gertjan:

                        You mean @DannyBoy2k

                        • bmeeksB
                          bmeeks @Gertjan
                          last edited by bmeeks

                          @Gertjan said in pfSense running out of memory and locking up:

                          @bmeeks said in pfSense running out of memory and locking up:

                          @Gertjan:

                          You mean @DannyBoy2k

                          No, I was asking you since you mentioned nut running okay. Not trying to change the thread topic, but wondering if the SG-3100 due to its ARM architecture acts weird with some peripherals.

                          @DannyBoy2k has it sort of running, but with the serious issue he posted about.

                          • D
                            DannyBoy2k @bmeeks
                            last edited by

                            @bmeeks, yes, I was able to get nut running with the CyberPower just using the usb driver. It's just that it seems to occasionally (maybe once a day) need to restart/reconnect to it.

                            ~Dan

                            • GertjanG
                              Gertjan @bmeeks
                              last edited by

                              @bmeeks said in pfSense running out of memory and locking up:

                              No, I was asking you

                              I'm using NUT (pfSense) on bare-bones Intel PCs from the last decade, with APC UPSs connected only through their USB ports.

                              • bmeeksB
                                bmeeks @DannyBoy2k
                                last edited by

                                @DannyBoy2k said in pfSense running out of memory and locking up:

                                @bmeeks, yes, I was able to get nut running with the CyberPower just using the usb driver. It's just that it seems to occasionally (maybe once a day) need to restart/reconnect to it.

                                ~Dan

                                Okay, but it appears not to be running well. It should not disconnect. I was never able to get it to work, so for now that firewall is running on the UPS but is "blind" to battery exhaustion. Not ideal!

                                Mentioned this in your thread to say perhaps there are issues with the USB driver for UPS/nut that manifest themselves in various ways.

                                • bmeeksB
                                  bmeeks @Gertjan
                                  last edited by bmeeks

                                  @Gertjan said in pfSense running out of memory and locking up:

                                  @bmeeks said in pfSense running out of memory and locking up:

                                  No, I was asking you

                                  I'm using NUT (pfSense) on bare-bones Intel PCs from the last decade, with APC UPSs connected only through their USB ports.

                                  Ah! I've never had issues with my Intel-based firewalls and have used both APC and other UPS boxes. The SG-3100 was the first one to ever kick my butt! It's also the first ARM architecture firewall I've encountered.

                                  • D
                                    DannyBoy2k @bmeeks
                                    last edited by

                                    @bmeeks, thank you for the thoughts. I posted a message in the pfSense Packages category to see if it leads anywhere:
                                    https://forum.netgate.com/topic/155094/possible-memory-leak-in-nut-package

                                    ~Dan

                                    • bmeeksB
                                      bmeeks
                                      last edited by

                                      Review my first post in this thread where I mention the kstack memory allocation error. My bet is still on the USB driver for the UPS being the problem. If you can, disable that driver completely and see if stability returns. Might take a month to be sure since you went as far as 28 or 29 days between lockups.

                                      • S
                                        serbus @bmeeks
                                        last edited by

                                        @bmeeks said in pfSense running out of memory and locking up:

                                        My guess is the upsd driver is crashing in some fashion (or some portion of it is crashing) and leaking kstack memory each time it crashes. After enough days of crashing, all of the kstack memory is consumed via those "leaks".

                                        I know it can be dangerous, especially if power at your location is flaky, but I would test with nut removed and the UPS unplugged from the USB port to see if the kstack errors go away. It will take several days to know.

                                        Hello!

                                        I use a Raspberry Pi running upsd, with the Netgate boxes attaching to it via upsmon (Remote NUT Server). This could be a workaround while exploring local upsd issues.
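For anyone trying this, the NUT side of such a setup looks roughly like the fragment below. It is a sketch under assumptions: the address `192.0.2.10`, the user `monuser`, and the password are placeholders, and the UPS name `ups` is taken from the logs earlier in the thread. On pfSense the package GUI writes the upsmon part when the UPS type is set to a remote NUT server, so only the Pi side is normally edited by hand.

```text
# On the Pi: upsd.conf (listen on the LAN; 3493 is the default NUT port)
LISTEN 0.0.0.0 3493

# On the Pi: upsd.users (account the firewall logs in with; placeholder values)
[monuser]
    password = CHANGE_ME
    upsmon slave

# On the firewall: the equivalent upsmon.conf line (NUT 2.7.4 syntax)
MONITOR ups@192.0.2.10 1 monuser CHANGE_ME slave
```

With this layout the USB driver runs only on the Pi, so the firewall never touches the UPS over USB at all.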

                                        John

                                        Lex parsimoniae

                                        • S
                                          serbus @bmeeks
                                          last edited by

                                          @bmeeks said in pfSense running out of memory and locking up:

                                          The SG-3100 was the first one to ever kick my butt!

                                          Hello!

                                          With a sample size of one...

                                          https://forum.netgate.com/topic/154674/nut-and-apc-smart-ups-750-rm-usb

                                          John

                                          • bmeeksB
                                            bmeeks @serbus
                                            last edited by bmeeks

                                            @serbus said in pfSense running out of memory and locking up:

                                            @bmeeks said in pfSense running out of memory and locking up:

                                            The SG-3100 was the first one to ever kick my butt!

                                            Hello!

                                            With a sample size of one...

                                            https://forum.netgate.com/topic/154674/nut-and-apc-smart-ups-750-rm-usb

                                            John

                                            I tried a number of things with that SG-3100, and never did get the UPS properly recognized. It is something to do with device file permissions I suspect. I did not want to dive into a bunch of repetitive reboots and tinkering with the base OS at the time. I've never had any issues at all with either nut or apcupsd on several iterations of Intel-based hardware with pfSense. That particular SG-3100 is currently serving duty as a church firewall.

                                            I found another post or two here in the past about assigning specific permissions to one or more of the /dev pseudo-files/directories that get created for peripherals, but as I said above I did not want to get off into those weeds.

                                            The ARM architecture of the SG-1000, SG-1100 and SG-3100 appliances has turned out to be, shall we just say, "interesting" ... ☺. Some legacy C source code programs that run fine on Intel hardware will crash on the ARM stuff due to memory alignment errors. Other subtle differences in the internal architecture can also contribute to "weirdness" with some software on the ARM devices.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.