Netgate Discussion Forum

    Fault tolerance on return of power

    Official Netgate® Hardware
    21 Posts 4 Posters 1.3k Views
    • F
      fg @stephenw10

      @stephenw10

      Hey Stevo,

      I did as you suggested. My 3100 was using 31% of its 2 GB of memory. The RAM disk recommendation was a minimum of 40 for /tmp and 60 for /var, so I set 100 MB for /tmp and 150 MB for /var (keeping /var 50% larger). I set the backups for DHCP/logs/RRD/captive-portal data to every 4 hours, and rebooted.

      Everything came up roses except the pfBlockerNG daemons hadn't restarted. I tried to add a Wake-on-LAN widget and the 3100 crashed like a bull in the bullring when the estoque is delivered. :0)

      I had to PuTTY in and turn back time.

      It did come back up fine with all the daemons running. This has been my problem with pfSense for many years now: it is not a fault-tolerant system. Between this example and any number of updates for pfSense Community and pfSense Plus, I've had to get back up to speed with the product and restore what was broken on-site. I don't think I can rely on offsite management. I'm not that good. Snort has always been an issue.

      When you leave the system alone it runs a long time. And presumably well but since I don't touch it I'm of course presuming.

      Thanks for your input. I'm going to try one more time and see what happens. But if you've got any secrets, I'd appreciate you sharing them with me.

      • F
        fg @stephenw10

        @stephenw10

        Well, I tried 3 times more with the last try using the minimum 40 and 60 megs.

        This looks like the culprit.

        RAMDISK.PNG

        Something about an uncaught error on line 76 of interfaces.inc, which matches what I experienced.

        • stephenw10S
          stephenw10 Netgate Administrator

          Hmm, you simply enabled ramdisks and then rebooted and it failed to come back up each time?

          Or is that a forced power cycle each time?

          • MaxK 0M
            MaxK 0 @fg

            @fg I have an APC SCL500RM1UC UPS. A USB cable connects it to my 3100, and I have the apcupsd package installed on the 3100. A HALT is issued to the 3100 when the UPS's remaining runtime reaches 3 minutes.

            My 3100 has always rebooted successfully after a power failure.

            • S
              SteveITS Galactic Empire @fg

              @fg https://docs.netgate.com/pfsense/en/latest/config/advanced-misc.html#ram-disk-sizes

              “The suggested sizes on the page are an absolute minimum and often much larger sizes are required.”

              There’s a note on the settings page also. On a 3100 we use 128 and 512 MB, as I recall. Of course it depends entirely on what is using it. Some pfBlocker lists don’t fit in 1 GB.

              Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
              When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
              Upvote 👍 helpful posts!

              • F
                fg @stephenw10

                @stephenw10 It would come back up far enough that you could get a terminal over the console cable to fix things, but the web GUI wouldn't work even after restarting it, and internet access went offline. That kind of goes with the uncaught error on line 76 of interfaces.inc.

                  • S
                    SteveITS Galactic Empire @fg

                    @fg Wait, so you're saying if you pull power you get that error at boot every time? That's not normal. We had a bunch of 3100s in the field and I'm sure some have lost power over the years.

                    What happens if you Diagnostics/Reboot?

                    Pulling power isn't great but it's not usually fatal unless there is file system corruption from that incident.
                    https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-check.html

                    • F
                      fg @MaxK 0

                      @MaxK-0

                      This might be my solution.

                      I guess the sequence is: the power goes out, and the apcupsd package monitors the UPS so that when three minutes are left it halts the 3100. But the 3100 stays powered on, right? Halt doesn't cut power, so the battery keeps draining until it is finally used up. Then, when the power comes back on, the UPS sends a "wake-on-LAN" signal through the USB cable and the 3100 comes back on? What if the power comes back before the UPS is completely drained, while the 3100 is still lit up in halt? Does the "wake-on-LAN" via the USB cable still restart the 3100?

                      Thank you for your help on this.

                      • S
                        SteveITS Galactic Empire @fg

                        @fg The “Hibernate UPS on powerfail” checkbox I mentioned above should turn off the UPS after the pfSense shutdown happens. That way power is cut to the 3100. Then when power returns the UPS turns on and the 3100 has power again.

                        ebd00ddf-13ff-4f6f-bb8d-43a1f8dc36e6-image.png

                        I believe that defaults to unchecked.

                        WOL works across the network. I don't think Netgate routers support WOL. (One can send a WOL packet to a MAC address from pfSense, IIRC.)

                        • F
                          fg @SteveITS

                          @SteveITS I agree with you. I did go with 100 and 150 before going back down to 40 and 60; I was testing to isolate the issue. Like I said, last night my memory usage was 31% and right now it's 15%. I would have increased the memory reservation further, pending results.

                          I wonder why the error I saw was reported on line 76 of interfaces.inc. Do the interfaces cut off when too much memory is allotted?

                          • F
                            fg @SteveITS

                            @SteveITS Thanks. You've cleared it up very well for me. Take care if I don't come back to pick your brains some more.

                            • S
                              SteveITS Galactic Empire @fg

                              @fg I don't know about the interface error you're seeing.

                              As of a few versions (years) ago, pfSense uses tmpfs for RAM disks, so it no longer preallocates the RAM; it is allocated when files are written. We used 128 and 512 MB on 3100s without issue, but YMMV of course. (Like I wrote above, at least one big pfBlocker list uses well over 1 GB.)
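
Since the right sizes depend on what is writing to the RAM disks, one way to check is to look at how full /tmp and /var actually are. The snippet below is only a minimal sketch: it assumes a Python 3 interpreter is available on the firewall (not guaranteed on every install), and `ramdisk_usage` is a hypothetical helper name, not part of pfSense.

```python
import shutil


def ramdisk_usage(paths=("/tmp", "/var")):
    """Return {path: (used_mib, total_mib, percent_used)} for each path.

    The default paths are the two RAM disk mount points discussed in
    this thread; pass other paths to inspect different filesystems.
    """
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        used_mib = usage.used / 2**20
        total_mib = usage.total / 2**20
        # Guard against a zero-sized filesystem to avoid dividing by zero.
        percent = 100.0 * usage.used / usage.total if usage.total else 0.0
        report[path] = (round(used_mib, 1), round(total_mib, 1), round(percent, 1))
    return report


if __name__ == "__main__":
    for path, (used, total, pct) in ramdisk_usage().items():
        print(f"{path}: {used} MiB used of {total} MiB ({pct}%)")
```

If either path sits close to 100% after the packages have finished loading their lists, that would be a hint the RAM disk is undersized for that workload.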

                              • stephenw10S
                                stephenw10 Netgate Administrator @fg

                                @fg said in Fault tolerance on return of power:

                                It kind of goes with the uncaught error regarding line 76 in the file interfaces.inc.

                                But just to be clear you are seeing that error after simply rebooting? Or after forcibly power cycling it?

                                Because that should never happen by simply rebooting it.

                                And, yes, if a UPS does not remove power to the 3100 it will not boot up after being halted. It needs to be power cycled at that point.
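
To make the halt-versus-power-cycle point concrete, here is a toy model of a single outage in Python. It is a sketch under stated assumptions, not how apcupsd is implemented: the names `simulate_outage`, `Outcome`, `hibernate_ups`, and `battery_drained` are made up for illustration, the 3-minute threshold is taken from the setup described earlier in the thread, and the model assumes the router boots on its own whenever power is removed and then restored.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    halted: bool                # did the router receive a halt during the outage?
    power_removed: bool         # did the router ever lose power after halting?
    running_after_return: bool  # is the router up once mains power is back?


def simulate_outage(runtime_minutes, halt_threshold=3,
                    hibernate_ups=False, battery_drained=False):
    """Model one outage.

    runtime_minutes: remaining-runtime readings (in minutes) taken while
        on battery, up to the moment mains power returns.
    hibernate_ups: models the "Hibernate UPS on powerfail" checkbox, i.e.
        the UPS switches its output off after the shutdown.
    battery_drained: the battery ran flat before mains power returned.
    """
    # A halt is issued once remaining runtime reaches the threshold.
    halted = any(r <= halt_threshold for r in runtime_minutes)
    if not halted:
        # Mains came back first; the router never stopped running.
        return Outcome(False, False, True)

    # A halted router only restarts if its power is cut and then restored.
    power_removed = hibernate_ups or battery_drained
    return Outcome(True, power_removed, power_removed)


if __name__ == "__main__":
    print(simulate_outage([10, 5, 3], hibernate_ups=True))
    print(simulate_outage([10, 5, 3], hibernate_ups=False))
```

In this model the problem case is the second call: the router halts, mains power returns before the battery is exhausted, and nothing ever power cycles the router, so it stays down until someone does that by hand.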

                                • F
                                  fg @SteveITS

                                  @SteveITS

                                  Thanks Steve. I'm going to be taking the 3100 up north for that property and going back to Community Edition on an Atom 27xx board I built some time ago. More hardware capacity.

                                  I like the 3100, but the 16 GB RAM Atom will do me better short term until I get the 4xxx Netgate. ;-)

                                  Thanks for all your help.

                                  • F
                                    fg @stephenw10

                                    @stephenw10

                                    Well, to be clear... it was on rebooting that I got the error message. It has been explained to me that tmpfs(?) has taken over RAM disk duties. With only 2 GB I'm more comfortable going back to a Community appliance I built. I'm taking the 3100 to a cabin in the woods and letting time sort out which Netgate I might get. Thanks for all the help. Couldn't be more appreciated.

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.