Unable to upgrade SG-1100 appliance to 23.01 - Kernel panic
I'm trying to update my SG-1100 appliance to the latest 23.01 version, but it's proving to be a challenge. When I kick off the update from either the GUI or the CLI, the machine automatically reboots while fetching the kernel package. Nothing appears in the update log within the /conf partition.
Suspicious, I opened a serial connection and kicked off the update from there, and this is the kernel panic I found.
[62/202] Fetching dhcpleases6-0.1_3.pkg: 100% 10 KiB 10.3kB/s 00:01
[63/202] Fetching php81-posix-8.1.11.pkg: 100% 14 KiB 14.4kB/s 00:01
[64/202] Fetching pfSense-u-boot-2100-20210930_1.pkg: 100% 315 KiB 322.8kB/s 00:01
[65/202] Fetching iftop-1.0.p4.pkg: 100% 41 KiB 42.2kB/s 00:01
[66/202] Fetching pfSense-kernel-pfSense-23.01.pkg: 72% 27 MiB 2.4MB/s 00:05 ETA
panic: solaris assert: dmu_buf_hold_array(os, object, offset, size, 0, ((char *)(uintptr_t)__func__), &numbufs, &dbp) == 0 (0x5 == 0x0), file: /var/jenkins/workspace/pfSense-img-build/BUILD_NODE/aarch64/OS_MAJOR_VERSION/freebsd12/PLATFORM/aws/sources/FreeBSD-src
cpuid = 0
time = 1676589573
Uptime: 6m45s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.
Rebooting...
I had a look at the ZFS pool:
[22.05-RELEASE][root@pfsense]/root: zpool status
  pool: pfSense
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(7) for details.
  scan: none requested
config:

    NAME         STATE     READ WRITE CKSUM
    pfSense      ONLINE       0     0     0
      mmcsd0s3a  ONLINE       0     0     0

errors: No known data errors
Could it be that the update tool is trying to download a nonexistent aws package for my architecture and build?
This bit suggests it's a problem with the filesystem or disk:
panic: solaris assert: dmu_buf_hold_array(os, object, offset, size, 0, ((char *)(uintptr_t)__func__), &numbufs, &dbp) == 0 (0x5 == 0x0), file: /var/jenkins/workspace/pfSense-img-build/BUILD_NODE/aarch64/OS_MAJOR_VERSION/freebsd12/PLATFORM/aws/sources/FreeBSD-src
The path bit in there is a red herring, that's where it was built from, not what it was installing.
You might try a zpool scrub pfSense as mentioned in the docs, but usually ZFS is pretty resilient against that kind of issue.
If you have direct access to the system, a fresh install would be a more reliable way to ensure you have a clean and consistent filesystem.
Usually if it were hardware you'd see a lot more chatter on the console about the disk device, so you might want to make sure you're monitoring/logging the console output for a while. Since it could be a disk issue, you can't always trust the logs on disk: if the disk stops responding, it can't keep writing the logs out either.
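For reference, the scrub suggested above could be started and checked roughly like this. This is a minimal sketch: `scrub_and_report` is a hypothetical helper name, and the pool name `pfSense` is taken from the `zpool status` output earlier in the thread. Note that scrub is a `zpool` subcommand.

```shell
# Start a scrub on the given pool, then print its status.
# The pool name defaults to "pfSense", as shown in the zpool status output above.
scrub_and_report() {
  pool="${1:-pfSense}"
  # Returns immediately; the scrub itself keeps running in the background.
  zpool scrub "$pool" || return 1
  # -v also lists files with permanent errors, if any have been found.
  zpool status -v "$pool"
}

# Usage (on the appliance, as root):
#   scrub_and_report pfSense
```

Re-running `zpool status -v pfSense` later shows the final result on the `scan:` line once the scrub completes.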
@jimp many thanks for replying.
You're absolutely right. The path in question corresponds to your CI system that compiles and builds the source. So innocent I was!
Also, it looks like my FS has several permanent errors. I don't think I could recover them either, I'll kick off a recovery process.
@jimp I couldn't attach the result of zpool scrub pfSense earlier.
# zpool status -v
  pool: pfSense
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:00:56 with 253 errors on Thu Jan 1 01:03:54 1970
config:

    NAME         STATE     READ WRITE CKSUM
    pfSense      ONLINE       0     0   253
      mmcsd0s3a  ONLINE       0     0   506

errors: Permanent errors have been detected in the following files:
    (...)
    pfSense/tmp:<0x0>
    pfSense/var:<0x0>
    pfSense/ROOT/default:<0x0>
    pfSense/ROOT/default/var_cache_pkg:<0x0>
    pfSense/var/db:<0x0>
    pfSense/var/log:<0x0>
    pfSense/ROOT/default/cf:<0x0>
    pfSense/ROOT/default/var_db_pkg:<0x0>
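The CKSUM column in that output is the quickest signal that data on the device is failing verification. As a rough sketch, a small filter like the one below could pull out the nonzero checksum counts from `zpool status` output. `cksum_errors` is a hypothetical helper, and it assumes plain integer counts (large counts that `zpool` abbreviates, such as `1.2K`, are skipped).

```shell
# Print "<name> <cksum>" for every row of the config table with a nonzero
# CKSUM count. Reads `zpool status` output on stdin.
cksum_errors() {
  awk '
    # Header row of the config table.
    $1 == "NAME" && $NF == "CKSUM" { in_config = 1; next }
    # A blank line ends the table.
    in_config && NF == 0 { in_config = 0 }
    # Data rows: NAME STATE READ WRITE CKSUM (plain integers only).
    in_config && NF == 5 && $5 ~ /^[0-9]+$/ && $5 + 0 > 0 { print $1, $5 }
  '
}

# Usage (on the appliance):
#   zpool status pfSense | cksum_errors
```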
@jimp hmm, quite weird. When following the procedure in the online docs with the latest recovery image (pfSense-plus-compat-recovery-23.01-RELEASE-aarch64.img.gz) provided by the support team, I cannot complete a fresh install. What am I doing wrong?
My understanding is that the recovery procedure wipes the contents of the disk and writes the fresh image, right? When the SG-1100 boots up it comes up with the same 22.05.12 firmware and the FS is still corrupted.
Unfortunately, that might mean the storage has gone read-only as a failure mode, but it's not certain. Keep working with TAC; they should be able to help determine what is happening.
Yes, the eMMC has gone read-only. Fortunately, thanks to a recent boot environment, I can still use the unit with 22.05 and a recent config.
Those kernel panics I posted here were literally a symptom of the eMMC being worn out. Frankly, it's a bit of a shame: very capable hardware, but poor storage endurance.
jimp Rebel Alliance Developer Netgate last edited by jimp
If you installed with an older version of pfSense Plus and upgraded then it's possible the disk is using an older ZFS version, but it's not necessary to upgrade it right away in most cases. Upgrading generally enables new features but most of the time they aren't things we activate or need right away.
Taking care of that step is on our radar, though, since there are some new features we may want to leverage. But we need to make sure we have code that handles the whole process properly since upgrading the ZFS version on the disk may also mean having to rewrite the boot code and so on.
tl;dr it's normal and not a concern, if you want to upgrade it, you can try, but just in case I'd take a backup and have install media for 23.01 handy. Fresh install of 23.01 would use the new version natively.
EDIT: One thing you may want to consider is that upgrading might affect the ability to boot into the older version with a ZFS Boot Environment. I haven't tried that myself so I can't say for certain but it's something to factor in.
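Before deciding whether to upgrade, it can help to confirm what the pool currently reports. Running `zpool upgrade` with no arguments only lists pools that do not have all supported features enabled; it changes nothing. The sketch below is a hypothetical helper (`features_pending` is not a pfSense tool) that checks `zpool status` output for the "features are not enabled" notice.

```shell
# Reads `zpool status` output on stdin and reports whether it contains the
# "features are not enabled" notice. Purely informational; changes nothing.
features_pending() {
  if grep -q "features are not enabled"; then
    echo "upgrade available"
  else
    echo "up to date"
  fi
}

# Usage (on the appliance):
#   zpool status pfSense | features_pending
```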
@william-mandell said in Unable to upgrade SG-1100 appliance to 23.01 - Kernel panic:
No Jim , sorry, but no. Not an older version of pfsense plus at all.
I was on the Newest version of pfSense Plus+ that was the release #, 'freshly' and (*natively) installed before > and updated to 23.01 obviously after 23.01 came out, but before I saw that it was being 'blocked' because of some problems going on.
Those statements are contradictory. Either you installed 23.01 directly or you were on 22.05 and upgraded to 23.01.
That said, I checked around and apparently it's a known issue that the 1100 and 2100 recovery installers are using an older ZFS version in the disk images.
While you could run zpool upgrade -a and let it upgrade that, if it didn't properly update the loader when you updated to 23.01 then it may not boot properly after. If you did reimage it with 23.01 and not an upgrade, then it's safe to run.
How and why would I want to boot it to the older version(s) ZFS boot environment , its has ZFS now. The 'old' environment always said it has an 'error' and shouldnt even running, but TAC said y'all had a special version, so it's fine.
If you upgraded from 22.05 to 23.01 and had a problem on 23.01, you could use the boot environment to boot back into 22.05 without reinstalling.
Can I ask again, how EXACTLY do I run a program that uses your special Chip' on board to verify and authenticate that my system is running authentic pfSense+ software. Does it need to be 'on'? Not sure if it's the same chip but whichever one displays whether it is on or not has always been off. Thought that chip was for VPN? Anyway, the trademarked pc sense chip to verify and authenticate - the software running.
The device that handles the authenticity part is used by the device when accessing the package repositories for packages and updates (all automatic). That is the "thoth" security chip.
VPN acceleration on 1100/2100 is handled by a different function, the SafeXcel cryptographic accelerator, which is unrelated.