Netgate Discussion Forum

    25.07 upgrade on Netgate 4100 gets rolled back

    Problems Installing or Upgrading pfSense Software
    • JeGr LAYER 8 Moderator @stephenw10

      @stephenw10 said in 25.07 upgrade on Netgate 4100 gets rolled back:

      /var is a shared mount point. Do you not see that?

      You wouldn't see the upgrade log but that is only written out to /conf after the upgrade script completes.

      No, I only had logs of the old system coming back up. We'll have hands on the box in about 15 hours and will hopefully see more then.

      But the system log I got showed nothing of the other snapshot booting up or throwing errors. That's why I was curious whether there are any commands I can use to get into the other BE and check on that.

      Don't forget to upvote 👍 those who kindly offered their time and brainpower to help you!

      If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

      • NOCling

        I ran into this with my 2100 on a normal reboot with 24.11.
        With pfBlockerNG and a RAM disk, the default boot verification time is too short.

        I increased it massively: 1800 for my 6100 and 3000 for my 2100.

        Netgate 6100 & Netgate 2100

        • stephenw10 Netgate Administrator

          You can mount the BE by simply running: bectl mount <be_name>. It will show you a mount point in /tmp.

          Just be sure to run bectl unmount <be_name> when you're done.
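A minimal sketch of that mount/inspect/unmount cycle, assuming a shell on the firewall with bectl available. The helper name with_be_mounted and the /tmp/be_inspect.XXXXXX template are made up for illustration. It passes an explicit mount point to bectl mount (the optional second argument) rather than parsing the printed path, and it always unmounts again, even if the inspection command fails.

```shell
# Hypothetical helper: mount a boot environment, run a command inside it,
# then unmount again regardless of how the command exited.
# Usage: with_be_mounted <be_name> <command> [args...]
with_be_mounted() {
    be_name="$1"
    shift
    # Create our own mount point so we do not have to parse bectl's output.
    mnt=$(mktemp -d /tmp/be_inspect.XXXXXX) || return 1
    if ! bectl mount "$be_name" "$mnt" >/dev/null; then
        rmdir "$mnt"
        return 1
    fi
    status=0
    # Run in a subshell so the caller's working directory is left untouched.
    ( cd "$mnt" && "$@" ) || status=$?
    bectl unmount "$be_name"
    rmdir "$mnt" 2>/dev/null || true
    return "$status"
}
```

For example, with_be_mounted <be_name> ls -l would list the root of that BE (bectl list shows the available names). The command runs with the BE's root as its working directory, so use relative paths to stay inside the mounted BE.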

          • JeGr LAYER 8 Moderator @stephenw10

            @stephenw10 After having a look at the device remotely, it isn't obvious what is happening. The logs show the reboot from 24.11, then ~13 min of nothing, then the boot back into 24.11. So it seems 25.07 never gets to the stage where it actually writes any logs. Whatever takes up those 10-12 min after the install leaves no trace.

            We've arranged to have someone hands-on on site ASAP who can give us a serial console via LTE or another uplink, so we can see what happens after the first reboot. So far we have no real idea what is going on. It's also strange because this is just one 4100; other 4100s have installed and rebooted fine. So there's no real indicator right now.

            • stephenw10 Netgate Administrator

              Hmm, strange indeed.

              • JeGr LAYER 8 Moderator @stephenw10

                @stephenw10 said in 25.07 upgrade on Netgate 4100 gets rolled back:

                Hmm, strange indeed.

                To follow up on that: we finally got hands-on access on site at the US locations where this happened, and it boiled down to two points:

                1. One box had the above problem because it had two large old snapshots and was an old 4100 with very little eMMC storage. After cleaning up those snapshots, the update went through fine. Funny, though, that it didn't hang while installing but rather when booting the new snapshot. But OK.

                2. The second box took more tries, but the problem was ...
                  drumroll
                  pfBlockerNG!
                  The root cause was the misbehavior mentioned multiple times (e.g. in this post): pfBNG creates useless audit snapshots of empty config.xml diffs, and the audit bug somehow caused more than the configured number of configs to be stored.
                  The box in question had 121,387 config-<timestamp>.xml files in the /cf/conf/backup directory, adding up to around 1.5G. Disk space wasn't the problem, though. The new snapshot booted but couldn't access /cf/conf or /cf/conf/backup, because the process working on that directory failed: it held too many files, which broke some shell script magic.

                After seeing the boot break at that point, we booted back into the old snapshot, deleted the failed update snapshot, and deleted all old backups in /cf/conf/backup so that only the configured last 50 backups remained. We then re-ran the update, which went through without a problem.
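The manual cleanup of /cf/conf/backup could be sketched roughly like this. It is an assumption-laden example, not what was actually run on that box: prune_backups is a hypothetical helper, the default of 50 mirrors the configured count mentioned in the post, and it assumes the timestamps in config-<timestamp>.xml all have the same number of digits, so that sorting by name equals sorting by age. It streams file names through find instead of expanding a shell glob, which avoids building one huge argument list for a directory with 100k+ entries.

```shell
# Hypothetical helper: keep only the newest N config backups in a directory.
# Usage: prune_backups [dir] [keep]
# Set DRY_RUN=1 to print what would be deleted instead of deleting it.
prune_backups() {
    dir="${1:-/cf/conf/backup}"
    keep="${2:-50}"
    # find emits one path per line, so the shell never expands a giant glob.
    find "$dir" -maxdepth 1 -type f -name 'config-*.xml' \
        | sort -r \
        | tail -n +"$((keep + 1))" \
        | while IFS= read -r f; do
            if [ -n "${DRY_RUN:-}" ]; then
                echo "would delete: $f"
            else
                rm -f -- "$f"
            fi
        done
}
```

Running it first as DRY_RUN=1 prune_backups /cf/conf/backup 50 shows the candidates without touching anything. Files that don't match config-*.xml are left alone.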

                So it was not directly an update problem but a bug in pfBNG plus config history audit management that created thousands of backup files (without cleaning them up), which made the /cf/conf subtree unavailable while upgrading/booting into it.

                Hope that helps!

                Cheers!

                • stephenw10 Netgate Administrator

                  Urgh, painful. Thanks for following up.

                  Yup, that backup config bug is resolved in current versions, but that doesn't help at upgrade time.

                  • SteveITS Rebel Alliance @JeGr

                    It'd be helpful/preventative, I think, if the upgrade did a quick check before upgrading: "are there more than ___ backup config files in the directory?" Not sure if that should be more than the configured number, or more than 500, or what, but a few thousand is enough to cause a long (10m?) page load and eventual timeout when loading the config history page as it tries to delete them. Perhaps the warning could link to a troubleshooting document page.
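Something like the following could serve as that check. Purely a sketch: count_backups and check_backup_count are hypothetical names, the 500 default is just the number floated in the post, and nothing here is part of the actual pfSense upgrade code.

```shell
# Hypothetical helper: count config backups without expanding a shell glob.
count_backups() {
    dir="${1:-/cf/conf/backup}"
    find "$dir" -maxdepth 1 -type f -name 'config-*.xml' | wc -l | tr -d ' '
}

# Hypothetical pre-upgrade check: warn and return non-zero when the number
# of backups exceeds a limit (500 is an illustrative default only).
check_backup_count() {
    dir="${1:-/cf/conf/backup}"
    limit="${2:-500}"
    n=$(count_backups "$dir")
    if [ "$n" -gt "$limit" ]; then
        echo "WARNING: $n config backups in $dir (limit $limit); prune before upgrading" >&2
        return 1
    fi
    echo "OK: $n config backups in $dir"
}
```

An upgrade wrapper could call check_backup_count and refuse to continue (or just print the warning) when it returns non-zero.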

                    Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                    When upgrading, allow 10-15 minutes to reboot, or more depending on packages, and device or disk speed.
                    Upvote 👍 helpful posts!

                    • JeGr LAYER 8 Moderator @SteveITS

                      @SteveITS Indeed. Also, when running a full update like firmware update -> 25.07, that could perhaps be an additional sanity check to perform, as would a check for old snapshots or for less than xy GB of free disk space. Both of those (too many snapshots, too much disk space in use) as well as the file overflow were things we stumbled upon with multiple customers who ran into problems when upgrading their boxes. After the first few, it was easy to spot on subsequent customers. Even my own homebrew box had the file overflow without me noticing; I just thought it strange that it used 3.4G of disk space when a normal installation would be around ~2G without snapshots. Only then did I remember: oh snap, I'm running pfB too and haven't applied the hotfix for the file overflow that we were testing...

                      So perhaps those 3 cases would make for a few additional easy pre-flight checks for future updates :)
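The disk space part of such a pre-flight check might look like this. A sketch only: free_kb and has_free_space are made-up helper names, and the 1 GiB default threshold is an arbitrary placeholder, not a figure from Netgate. It reads the "Available" column of POSIX-format df output.

```shell
# Hypothetical helper: print the available space (in KiB) on the filesystem
# that holds the given path, using POSIX-format df output (-P) in 1K blocks.
free_kb() {
    df -Pk "${1:-/}" | awk 'NR == 2 { print $4 }'
}

# Hypothetical pre-flight check: succeed only if at least need_kb KiB are free.
# The 1048576 KiB (1 GiB) default is an illustrative placeholder.
has_free_space() {
    path="${1:-/}"
    need_kb="${2:-1048576}"
    avail=$(free_kb "$path")
    if [ "$avail" -ge "$need_kb" ]; then
        echo "OK: ${avail} KiB free on $path"
    else
        echo "WARNING: only ${avail} KiB free on $path (want ${need_kb})" >&2
        return 1
    fi
}
```

On a ZFS install, old boot environments and snapshots count against the same pool, so a low number here is also a hint to look at bectl list before upgrading.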

                      Cheers

                      • SteveITS Rebel Alliance @JeGr

                        @JeGr I think (?) it tries to check space, but it's not uncommon to see posts about upgrades that failed for space reasons. Maybe it needs a larger free space check.

                        We have one client with an old 2440 that I recently upgraded through several versions successfully, but it's at 94% full because of all the old files, and I don't think I want to try 25.11 remotely. :-/

                        • stephenw10 Netgate Administrator

                          Hmm, I agree. Let me see what we can do here.

                          • vronp @stephenw10

                            @stephenw10 What would help at upgrades? :-)

                            I have a 4200 and am having the same problem, presumably. I have pfblockerng installed.

                            I'm also seeing:

                            ld-elf.so.1: Shared object "libmd.so.7" not found, required by "pfSense-repoc"

                            I have a ticket open at Netgate and they want me to do a USB upgrade. That didn't feel right to me so I started searching and found this thread.

                            • vronp @stephenw10

                              @stephenw10

                              Any ideas on this? BTW, my memory usage: 30% of 3890 MiB on a 4200.

                              • SteveITS Rebel Alliance @vronp

                                @vronp said in 25.07 upgrade on Netgate 4100 gets rolled back:

                                ld-elf.so.1: Shared object "libmd.so.7" not found, required by "pfSense-repoc"

                                That's different, see
                                https://forum.netgate.com/topic/198754/ld-elf.so.1-shared-object-libmd.so.7-not-found-required-by-pfsense-repoc

                                But too many old config files can be a problem also, sure. How much free disk space do you have?

                                • stephenw10 Netgate Administrator @vronp

                                  @vronp said in 25.07 upgrade on Netgate 4100 gets rolled back:

                                  I'm also seeing:

                                  ld-elf.so.1: Shared object "libmd.so.7" not found, required by "pfSense-repoc"

                                  That's just an ugly error; it should not prevent upgrading. If you run pfSense-repoc-static -N at the CLI, it should succeed as expected, and that's what the upgrade uses.

                                  • vronp @stephenw10

                                    @stephenw10

                                    Thanks. I also discovered 15,000 files in /cf/conf/backup

                                    It seems I need to clean that up. Is there a limit setting for backups there or is this the pfblockerng bug that was mentioned?

                                    • SteveITS Rebel Alliance @vronp

                                      @vronp The default is 30, I believe. Fixed in 25.07:
                                      https://docs.netgate.com/pfsense/en/latest/releases/25-07.html#configuration-backend

                                      pfB just makes it worse by generating one per cron job (default per hour).

                                      Diagnostics > Configuration History will time out while it tries to delete them all; just refresh every time it does. Or delete them manually.

                                      • vronp @SteveITS

                                        @SteveITS
                                        28% of 3890 MiB
                                        28% of 4.6G (zfs)

                                        I also just found 15,000 files in /cf/conf/backup

                                        • stephenw10 Netgate Administrator

                                          This is a bug. It should be limited to 30 backups there. The bug was that it only pruned the backups when the user visited the Diag > Backup&Restore page. If you visit that page, it will try to prune them. It might take a while if you have 15K files! It's fixed in 25.07.

                                          • vronp @SteveITS

                                            @SteveITS Thank you. I'm going to try to run an upgrade again as I'm hoping that the problem described above (copied below) is the cause of my problem even though I only have 15,000 files in that directory.

                                            "The box in question had 121,387 config-<timestamp>.xml files in /cf/conf/backup directory that accounted to around 1.5G in files. But it wasn't the disk space that were the problem but somehow the snapshot booted and wouldn't be able to access /cf/conf or cf/conf/backup because the process that tried to do something didn't succeed as the directory in question had too many files that broke some shell script magic."

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.