Netgate Discussion Forum

    pfSense running out of memory and locking up

    General pfSense Questions
    35 Posts 6 Posters 3.8k Views
    • bmeeksB
      bmeeks @Gertjan
      last edited by bmeeks

      @Gertjan said in pfSense running out of memory and locking up:

      @bmeeks said in pfSense running out of memory and locking up:

      @Gertjan:

      You mean @DannyBoy2k

      No, I was asking you since you mentioned nut running okay. Not trying to change the thread topic, but wondering if the SG-3100 due to its ARM architecture acts weird with some peripherals.

      @DannyBoy2k has it sort of running, but with the serious issue he posted about.

      • D
        DannyBoy2k @bmeeks
        last edited by

        @bmeeks , yes, I was able to get nut running with the CyberPower just using the usb driver. It's just that it seems to occasionally (maybe once a day) need to restart/reconnect to it.

        ~Dan

        • GertjanG
          Gertjan @bmeeks
          last edited by

          @bmeeks said in pfSense running out of memory and locking up:

          No, I was asking you

          I'm using NUT (pfSense) on bare-bones Intel PCs from the last decade - APC UPSs only, using their "USB" ports.

          No "help me" PM's please. Use the forum, the community will thank you.
          Edit : and where are the logs ??

          • bmeeksB
            bmeeks @DannyBoy2k
            last edited by

            @DannyBoy2k said in pfSense running out of memory and locking up:

            @bmeeks , yes, I was able to get nut running with the CyberPower just using the usb driver. It's just that it seems to occasionally (maybe once a day) need to restart/reconnect to it.

            ~Dan

            Okay, but it appears not to be running well. It should not disconnect. I was never able to get it to work, so I have that firewall running on the UPS for now but "blind" to battery exhaustion. Not ideal!

            Mentioned this in your thread to say perhaps there are issues with the USB driver for UPS/nut that manifest themselves in various ways.

            • bmeeksB
              bmeeks @Gertjan
              last edited by bmeeks

              @Gertjan said in pfSense running out of memory and locking up:

              @bmeeks said in pfSense running out of memory and locking up:

              No, I was asking you

              I'm using NUT (pfSense) on bare-bones Intel PCs from the last decade - APC UPSs only, using their "USB" ports.

              Ah! I've never had issues with my Intel-based firewalls and have used both APC and other UPS boxes. The SG-3100 was the first one to ever kick my butt! It's also the first ARM architecture firewall I've encountered.

              • D
                DannyBoy2k @bmeeks
                last edited by

                @bmeeks , thank you for the thoughts. I posted a message in the pfSense Packages category to see if it leads anywhere:
                https://forum.netgate.com/topic/155094/possible-memory-leak-in-nut-package

                ~Dan

                • bmeeksB
                  bmeeks
                  last edited by

                  Review my first post in this thread where I mention the kstack memory allocation error. My bet is still on the USB driver for the UPS being the problem. If you can, disable that driver completely and see if stability returns. Might take a month to be sure since you went as far as 28 or 29 days between lockups.

                  • S
                    serbus @bmeeks
                    last edited by

                    @bmeeks said in pfSense running out of memory and locking up:

                    My guess is the upsd driver is crashing in some fashion (or some portion of it is crashing) and leaking kstack memory each time it crashes. After enough days of crashing, all of the kstack memory is consumed via those "leaks".

                    I know it can be dangerous, especially if power at your location is flaky, but I would test with nut removed and the UPS unplugged from the USB port to see if the kstack errors go away. It will take several days to know.

                    Hello!

                    I use a Pi running upsd, with the Netgates attaching to it with upsmon (Remote NUT Server). This could be a workaround while exploring local upsd issues.
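If you try the remote-server route, it may help to confirm from the firewall that the remote upsd answers before relying on it. This is only a rough sketch: the UPS name `ups` and host `pi.example.lan` are made-up placeholders, and it assumes the standard NUT client `upsc` is available on the firewall.

```shell
#!/bin/sh
# Sketch only: "ups" and "pi.example.lan" below are hypothetical names.

# Return 0 if a NUT ups.status string contains the OL (on line) flag.
# Typical status strings look like "OL", "OL CHRG" or "OB LB".
ups_is_online() {
    case " $1 " in
        *" OL "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example use against a remote upsd (run manually, not at load time):
#   status=$(upsc ups@pi.example.lan ups.status) && ups_is_online "$status" \
#       && echo "UPS reachable and on line power"
```

If `upsc` cannot reach the remote server at all, it exits non-zero, so the `&&` chain in the example stops before the status check.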

                    John

                    Lex parsimoniae

                    • S
                      serbus @bmeeks
                      last edited by

                      @bmeeks said in pfSense running out of memory and locking up:

                      The SG-3100 was the first one to ever kick my butt!

                      Hello!

                      With a sample size of one...

                      https://forum.netgate.com/topic/154674/nut-and-apc-smart-ups-750-rm-usb

                      John

                      Lex parsimoniae

                      • bmeeksB
                        bmeeks @serbus
                        last edited by bmeeks

                        @serbus said in pfSense running out of memory and locking up:

                        @bmeeks said in pfSense running out of memory and locking up:

                        The SG-3100 was the first one to ever kick my butt!

                        Hello!

                        With a sample size of one...

                        https://forum.netgate.com/topic/154674/nut-and-apc-smart-ups-750-rm-usb

                        John

                        I tried a number of things with that SG-3100, and never did get the UPS properly recognized. It is something to do with device file permissions I suspect. I did not want to dive into a bunch of repetitive reboots and tinkering with the base OS at the time. I've never had any issues at all with either nut or apcupsd on several iterations of Intel-based hardware with pfSense. That particular SG-3100 is currently serving duty as a church firewall.

                        I found another post or two here in the past about assigning specific permissions to one or more of the /dev pseudo files/directories that get created for peripherals, but as I said above I did not want to get off into those weeds.

                        The ARM architecture of the SG-1000, SG-1100 and SG-3100 appliances has turned out to be, shall we just say, "interesting" ... ☺. Some legacy C source code programs that run fine on Intel hardware will crash on the ARM stuff due to memory alignment errors. Other subtle differences in the internal architecture can also contribute to "weirdness" with some software on the ARM devices.

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          If there is some memory leak I would expect to be able to see it somewhere before it actually locks up.

                          The first place I'd look is the Monitoring Graphs for System - Memory. When we have seen bugs like that before, you can see the usage ramp up there.

                          Steve

                          • D
                            DannyBoy2k @stephenw10
                            last edited by

                            @stephenw10 , I am almost always at 7% of 2020 MiB when I log into the web GUI. I'm barely using the features of this box. A couple of VLANs, DNS, DHCP. That's about it. The only indication I've ever found of something being wrong is the system logs I've pasted above.

                            ~Dan

                            • stephenw10S
                              stephenw10 Netgate Administrator
                              last edited by

                              Hmm. Well I guess if it is kernel memory that's harder to see.... try checking the output of sysctl vm.kmem_map_free.

                              The first thing you will see is that on the 32-bit ARM system that value is much smaller than on other architectures, so it is far easier to hit an issue. See if that value decreases over time.
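To watch that value over days rather than checking by hand, something along these lines could be run from cron. It is a minimal sketch: the function names and the log path in the example are illustrative assumptions, not anything pfSense ships.

```shell
#!/bin/sh
# Sketch only: function names and the example log path are assumptions.

# Extract the number from a line such as "vm.kmem_map_free: 141295616".
parse_kmem_free() {
    printf '%s\n' "$1" | awk -F': ' '{ print $2 }'
}

# Difference in bytes between an older sample ($1) and a newer one ($2).
# A negative result means free kernel memory shrank between the samples.
kmem_delta() {
    echo $(( $2 - $1 ))
}

# Append one timestamped sample to the log file named in $1.
log_kmem_sample() {
    line=$(sysctl vm.kmem_map_free) || return 1
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(parse_kmem_free "$line")" >> "$1"
}

# Example (e.g. hourly from cron):
#   log_kmem_sample /root/kmem_free.log
```

Comparing the first and last logged values with `kmem_delta` gives a quick sense of whether the free amount is trending down.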

                              Steve

                              • D
                                DannyBoy2k @stephenw10
                                last edited by DannyBoy2k

                                @stephenw10 , I explored the web GUI a bit more and found the Status: Monitoring section. This seems interesting, but I have no idea what it means. I mean, I can see that all free memory suddenly became unavailable, but no idea why.
                                (Image: Memory for 1 month)

                                I ran your suggested command:

                                [2.4.5-RELEASE][admin@pfSense.localdomain]/root: sysctl vm.kmem_map_free
                                vm.kmem_map_free: 141295616
                                

                                I'll try to log in every now and again and continue to monitor.

                                ~Dan

                                • D
                                  DannyBoy2k
                                  last edited by

                                  Here is a higher fidelity snapshot around the period of interest. It appears it just happened very suddenly, not ramping up over time.
                                  (Image: Higher fidelity image)

                                  ~Dan

                                  • stephenw10S
                                    stephenw10 Netgate Administrator
                                    last edited by

                                    That is it failing to log anything, likely when it exhausted the kernel memory.

                                    However before that you can see the free memory ramping down. If you click on the orange 'free' button to de-select it that will show the other data in more detail.
                                    Any idea what happened on June 17th to free some memory?

                                    Steve

                                    • D
                                      DannyBoy2k @stephenw10
                                      last edited by

                                      @stephenw10 , unfortunately, no. I really don't do anything on this box; I just let it do its thing. I pretty much only log in when I suddenly lose Internet connectivity. What's interesting is, based on these graphs, I'm in a pretty bad state for several days before the box ceases to route.

                                      Here is the graph without the free memory line:
                                      (Image: Graph Without Free Memory)

                                      ~Dan

                                      • D
                                        DannyBoy2k @serbus
                                        last edited by

                                        @serbus , thank you for putting out this idea. I'm definitely considering it.

                                        Cheers,
                                        Dan

                                        • S
                                          stompro
                                          last edited by

                                          Hello, just a me-too post.

                                          SG-3100 firewalls running 2.4.5-P1.

                                          I have 9 SG-3100 boxes that don't run NUT; they do use ramdisks. No problems with those.

                                          I have another 6 SG-3100 boxes that have NUT set up with a USB CyberPower OR500.

                                          This morning I noticed that two that were installed on the same day, 13 days ago, both failed an automated config backup because ssh failed. I was able to reboot one of them, the other I'm still fighting with. These are 45 and 60 miles away, so I cannot just power cycle them.

                                          I'm trying to figure out how to tell which processes are using kmem.
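One possible starting point: as far as I know, FreeBSD does not attribute kernel memory to individual processes, but `vmstat -m` lists kernel malloc usage by allocation type (and `vmstat -z` lists the UMA zones). The sketch below ranks the `vmstat -m` types by their MemUse column. The function name is made up, and whether a kstack leak would show up in this view at all is an open question.

```shell
#!/bin/sh
# Sketch only: top_kmem_types is a hypothetical helper name.

# Read `vmstat -m` output on stdin and print the N types (default 10)
# with the largest MemUse, as "<KiB> <type name>" lines.
top_kmem_types() {
    awk '
        {
            # MemUse is the first field that looks like "120K". The type
            # name may contain spaces, and InUse sits just before MemUse.
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^[0-9]+K$/) {
                    name = $1
                    for (j = 2; j < i - 1; j++) name = name " " $j
                    kb = $i
                    sub(/K$/, "", kb)
                    print kb, name
                    break
                }
            }
        }
    ' | sort -rn | head -n "${1:-10}"
}

# Example:
#   vmstat -m | top_kmem_types 15
```

Running it periodically and comparing the top entries between runs should show which allocation types, if any, keep growing.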

                                          Josh

                                          Hardware used: Alix 2D13 X 10, APU2D4 X 10, SG-2200 X 10, SG-2440 X 4

                                          • S
                                            stompro
                                            last edited by

                                            Speculating here, I had one of the SG-3100 boxes run into the IPv6 bogons issue, where it couldn't load the bogonsv6 table because it didn't have enough memory, even after upping the max table entries value... so I'm wondering if the bogons tables use kmem also? Maybe my setup of ramdisks + arpwatch + nut didn't leave enough kmem for the bogon table refresh, which is why it didn't matter how much I increased max table entries.

                                            I'm wondering if this was triggered after a month because the bogonsv6 table gets reloaded via cron once a month, and takes 2x the memory for a reload (I read that in one of the threads about the bogonsv6 table reload issue).
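To at least rule the table-entries limit in or out, a rough check like the following might help. It is a sketch under assumptions: it takes the "2x during reload" figure from the post above at face value and applies it to the entry count, and the helper names are made up. It says nothing about kmem itself.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the 2x factor is the
# figure recalled in the post, not a measured value.

# Read `pfctl -sm` output on stdin and print the table-entries hard limit.
table_entries_limit() {
    awk '$1 == "table-entries" { print $NF }'
}

# Return 0 if twice the current entry count ($1) fits within the limit ($2).
reload_fits() {
    [ $(( $1 * 2 )) -le "$2" ]
}

# Example:
#   entries=$(pfctl -t bogonsv6 -T show | wc -l)
#   limit=$(pfctl -sm | table_entries_limit)
#   reload_fits "$entries" "$limit" || echo "reload may exceed table-entries limit"
```

If this check passes and the reload still fails, that would point away from the table-entries setting and toward memory, which fits the observation that raising the limit made no difference.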

                                            I've since disabled ipv6 on my SG3100 boxes, so maybe that will take care of this for me? I didn't have it disabled on the two that locked up today.

                                            <someone upvote me so I can get enough reputation to change my old signature>

                                            Josh

                                            Hardware used: Alix 2D13 X 10, APU2D4 X 10, SG-2200 X 10, SG-2440 X 4

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.