    Unable to upgrade SG-1100 appliance to 23.01 - Kernel panic

    Installation and Upgrades
    • msoutullo

      Hello,

      I'm trying to update my SG-1100 appliance to the latest 23.01 version, but it's proving to be a challenge. When I kick off the update from either the GUI or the CLI, the machine automatically reboots while fetching the kernel package. Nothing appears in the update log in the /conf partition.

      Suspecting something, I opened a serial console connection, started the update from there, and found this kernel panic:

      [62/202] Fetching dhcpleases6-0.1_3.pkg: 100%   10 KiB  10.3kB/s    00:01
      [63/202] Fetching php81-posix-8.1.11.pkg: 100%   14 KiB  14.4kB/s    00:01
      [64/202] Fetching pfSense-u-boot-2100-20210930_1.pkg: 100%  315 KiB 322.8kB/s    00:01
      [65/202] Fetching iftop-1.0.p4.pkg: 100%   41 KiB  42.2kB/s    00:01
      [66/202] Fetching pfSense-kernel-pfSense-23.01.pkg:  72%   27 MiB   2.4MB/s    00:05 ETA
      panic: solaris assert: dmu_buf_hold_array(os, object, offset, size, 0, ((char *)(uintptr_t)__func__), &numbufs, &dbp) == 0 (0x5 == 0x0), file: /var/jenkins/workspace/pfSense-img-build/BUILD_NODE/aarch64/OS_MAJOR_VERSION/freebsd12/PLATFORM/aws/sources/FreeBSD-src
      cpuid = 0
      time = 1676589573
      Uptime: 6m45s
      Automatic reboot in 15 seconds - press a key on the console to abort
      --> Press a key on the console to reboot,
      --> or switch off the system now.
      Rebooting...
      

      I had a look at the ZFS pool:

      [22.05-RELEASE][root@pfsense]/root: zpool status
        pool: pfSense
       state: ONLINE
      status: Some supported features are not enabled on the pool. The pool can
              still be used, but some features are unavailable.
      action: Enable all features using 'zpool upgrade'. Once this is done,
              the pool may no longer be accessible by software that does not support
              the features. See zpool-features(7) for details.
        scan: none requested
      config:
      
              NAME         STATE     READ WRITE CKSUM
              pfSense      ONLINE       0     0     0
                mmcsd0s3a  ONLINE       0     0     0
      
      errors: No known data errors
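      A minimal sketch, assuming Python 3 on an admin workstation (nothing in this thread uses it): the hypothetical `devices_with_errors` helper below scans the NAME/STATE/READ/WRITE/CKSUM table of `zpool status` output and keeps only rows with a nonzero counter, so a clean pool like the one above yields an empty result.

```python
import re

# Matches one row of the NAME/STATE/READ/WRITE/CKSUM table that
# `zpool status` prints under "config:". Assumption: counters are plain
# integers; zpool may abbreviate large counts (e.g. "1.2K"), and such
# rows are simply not matched by this sketch.
ROW = re.compile(
    r"^\s*(\S+)\s+"
    r"(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)\s+"
    r"(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def devices_with_errors(status_text: str) -> dict:
    """Hypothetical helper: map each pool/vdev name to its
    (read, write, cksum) counters, keeping only rows where any
    counter is nonzero."""
    flagged = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, blank, or non-table line
        counts = tuple(int(match.group(i)) for i in (3, 4, 5))
        if any(counts):
            flagged[match.group(1)] = counts
    return flagged


if __name__ == "__main__":
    # Config table copied from the scrub output posted later in this thread.
    sample = """\
        NAME         STATE     READ WRITE CKSUM
        pfSense      ONLINE       0     0   253
          mmcsd0s3a  ONLINE       0     0   506
"""
    print(devices_with_errors(sample))
    # {'pfSense': (0, 0, 253), 'mmcsd0s3a': (0, 0, 506)}
```

      The state names in the pattern are the usual vdev states; anything else (headers, `state:` lines, error listings) falls through and is ignored.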
      

      Could it be that the update tool is trying to download a nonexistent AWS package for my architecture and build?

      Thanks!

      • jimp (Rebel Alliance Developer, Netgate)

        This bit suggests it's a problem with the filesystem or disk:

        panic: solaris assert: dmu_buf_hold_array(os, object, offset, size, 0, ((char *)(uintptr_t)__func__), &numbufs, &dbp) == 0 (0x5 == 0x0), file: /var/jenkins/workspace/pfSense-img-build/BUILD_NODE/aarch64/OS_MAJOR_VERSION/freebsd12/PLATFORM/aws/sources/FreeBSD-src
        

        The path in there is a red herring: that's where it was built, not what it was installing.

        You might try running zpool scrub pfSense as mentioned in the docs, but ZFS is usually pretty resilient against that kind of issue.

        If you have direct access to the system, a fresh install would be a more reliable way to ensure you have a clean and consistent filesystem.

        If it were hardware, you'd usually see a lot more chatter on the console about the disk device, so make sure you're monitoring and logging the console output for a while. Since it could be a disk issue, you can't always trust the logs on disk: if the disk stops responding, the system can't keep writing the logs out either.

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        • msoutullo (replying to @jimp)

          @jimp many thanks for replying.

          You're absolutely right. The path in question belongs to your CI system that compiles and builds the source. How naive of me!

          Also, it looks like my filesystem has several permanent errors. I don't think I can recover them, so I'll kick off a recovery process.

          • msoutullo (follow-up to own post)

            @jimp I couldn't attach the result of zpool scrub pfSense earlier, so here it is:

            # zpool status -v
              pool: pfSense
             state: ONLINE
            status: One or more devices has experienced an error resulting in data
                    corruption.  Applications may be affected.
            action: Restore the file in question if possible.  Otherwise restore the
                    entire pool from backup.
               see: http://illumos.org/msg/ZFS-8000-8A
              scan: scrub repaired 0 in 0 days 00:00:56 with 253 errors on Thu Jan  1 01:03:54 1970
            config:
            
                    NAME         STATE     READ WRITE CKSUM
                    pfSense      ONLINE       0     0   253
                      mmcsd0s3a  ONLINE       0     0   506
            
            errors: Permanent errors have been detected in the following files:
            (...)
                    pfSense/tmp:<0x0>
                    pfSense/var:<0x0>
                    pfSense/ROOT/default:<0x0>
                    pfSense/ROOT/default/var_cache_pkg:<0x0>
                    pfSense/var/db:<0x0>
                    pfSense/var/log:<0x0>
                    pfSense/ROOT/default/cf:<0x0>
                    pfSense/ROOT/default/var_db_pkg:<0x0>
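            A minimal sketch, assuming Python 3 on an admin workstation: the hypothetical `scrub_summary` helper below pulls the repaired amount and the error count out of the `scan:` line, which is the quickest way to tell a clean scrub from one like the above. The regex is an assumption based on the output format shown here, not a documented interface.

```python
import re
from typing import Optional, Tuple

# Assumption: the "scan:" line looks like the one above, e.g.
# "scrub repaired 0 in 0 days 00:00:56 with 253 errors on ...".
# Other ZFS releases may format the repaired amount and duration
# differently, so the pattern only anchors on "scrub repaired",
# " in " and "with N errors".
SCAN = re.compile(r"scrub repaired (\S+) in .*? with (\d+) errors")


def scrub_summary(status_text: str) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: return (repaired, errors) for the last
    completed scrub, or None if no such line is present."""
    match = SCAN.search(status_text)
    if match is None:
        return None  # e.g. "scan: none requested"
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    scan_line = ("  scan: scrub repaired 0 in 0 days 00:00:56 "
                 "with 253 errors on Thu Jan  1 01:03:54 1970")
    print(scrub_summary(scan_line))  # ('0', 253)
```

            A nonzero error count after a scrub that repaired nothing, as here, points to data ZFS could not fix from redundancy; on a single-device pool there is no second copy to repair from.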
            
            • msoutullo

              @jimp Hmm, quite weird. Following the procedure in the online docs with the latest recovery image (pfSense-plus-compat-recovery-23.01-RELEASE-aarch64.img.gz) provided by the support team, I cannot complete a fresh install. What am I doing wrong?

              My understanding is that the recovery procedure wipes the contents of the disk partition and writes a fresh image, right? Yet when the SG-1100 boots up, it still comes up with the same 22.05.12 firmware and the filesystem is still corrupted.

              • jimp (Rebel Alliance Developer, Netgate)

                Unfortunately, that might mean the storage has gone read-only as a failure mode, but it's not certain. Keep working with TAC; they should be able to help determine what is happening.

                • msoutullo (replying to @jimp)

                  Hi @jimp

                  Yes, the eMMC has gone read-only. Fortunately, thanks to a recent boot environment, I can still use the unit with 22.05 and a recent configuration.

                  The kernel panics I posted here were simply a symptom of the eMMC being worn out. Frankly, it's a bit of a shame: very capable hardware, but poor endurance.

                  • william.mandell (replying to @msoutullo)

                    This post is deleted!
                    • jimp (Rebel Alliance Developer, Netgate)

                      If you installed an older version of pfSense Plus and then upgraded, it's possible the disk is using an older ZFS version, but in most cases it's not necessary to upgrade it right away. Upgrading generally enables new features, but most of the time they aren't things we activate or need right away.

                      Taking care of that step is on our radar, though, since there are some new features we may want to leverage. But we need to make sure we have code that handles the whole process properly, since upgrading the ZFS version on the disk may also mean having to rewrite the boot code and so on.

                      tl;dr: It's normal and not a concern. If you want to upgrade it, you can try, but just in case, I'd take a backup and have install media for 23.01 handy. A fresh install of 23.01 would use the new version natively.

                      EDIT: One thing you may want to consider is that upgrading might affect the ability to boot into the older version with a ZFS Boot Environment. I haven't tried that myself so I can't say for certain but it's something to factor in.

                      • william.mandell (replying to @jimp)

                        This post is deleted!
                        • jimp (Rebel Alliance Developer, Netgate, replying to @william.mandell)

                          @william-mandell said in Unable to upgrade SG-1100 appliance to 23.01 - Kernel panic:

                          @jimp

                          No, Jim, sorry, but no. Not an older version of pfSense Plus at all.

                          I was on the Newest version of pfSense Plus+ that was the release #, 'freshly' and (*natively) installed before > and updated to 23.01 obviously after 23.01 came out, but before I saw that it was being 'blocked' because of some problems going on.

                          Those statements are contradictory. Either you installed 23.01 directly or you were on 22.05 and upgraded to 23.01.

                          That said, I checked around and apparently it's a known issue that the 1100 and 2100 recovery installers are using an older ZFS version in the disk images.

                          While you could run zpool upgrade -a to upgrade the pool, if the loader wasn't properly updated when you upgraded to 23.01, the system may not boot afterward. If you reimaged it with 23.01 rather than upgrading, then it's safe to run.

                          How and why would I want to boot into the older version's ZFS boot environment? It has ZFS now. The 'old' environment always said it had an 'error' and shouldn't even be running, but TAC said y'all had a special version, so it's fine.

                          If you upgraded from 22.05 to 23.01 and had a problem on 23.01, you could use the boot environment to boot back into 22.05 without reinstalling.

                          Can I ask again: how EXACTLY do I run a program that uses your special 'chip' on board to verify and authenticate that my system is running authentic pfSense+ software? Does it need to be 'on'? Not sure if it's the same chip, but whichever one displays whether it is on or not has always been off. I thought that chip was for VPN? Anyway, I mean the trademarked pc sense chip to verify and authenticate the software running.

                          The chip that handles authenticity is used by the device automatically when accessing the package repositories for packages and updates. That is the "thoth" security chip.

                          VPN acceleration on 1100/2100 is handled by a different function, the SafeXcel cryptographic accelerator, which is unrelated.
