Problems upgrading/installing 24.11 on SG-5100
-
Greetings folks,
I've tried some googling, and finally decided to see if anyone else has better google-fu than I do.
After a seemingly smooth upgrade from 24.03 to 24.11, I thought all was well. Later that night the router locked up: DHCP died, SSH to a static address failed, and even the local console was dead. Only pings were working.
So I went back and tried a fresh install of 24.11 from USB, and it failed. Then I tried a fresh USB install of 24.03, and that also failed. I've even gone back to the previous install media for 24.03 and got the same result.
Install failed.
[1/12] Extracting pciids-20240920: ..... done
[2/12] Installing libpci-3.13.0...
[2/12] Extracting libpci-3.13.0: .......... done
[3/12] Installing e2fsprogs-libuuid-1.47.1...
[3/12] Extracting e2fsprogs-libuuid-1.47.1: .......... done
[4/12] Installing aws-sdk-php83-3.273.3...
[4/12] Extracting aws-sdk-php83-3.273.3: .......... done
[5/12] Installing zip-3.0_2...
[5/12] Extracting zip-3.0_2: .......... done
[6/12] Installing flashrom-1.3.0_4...
[6/12] Extracting flashrom-1.3.0_4: .......... done
[7/12] Installing pfSense-pkg-WireGuard-0.2.9...
[7/12] Extracting pfSense-pkg-WireGuard-0.2.9: .......... done
[8/12] Installing drm-515-kmod-5.15.160...
[8/12] Extracting drm-515-kmod-5.15.160: .......... done
[9/12] Installing pfSense-pkg-ipsec-profile-wizard-1.2.4...
[9/12] Extracting pfSense-pkg-ipsec-profile-wizard-1.2.4: ......... done
[10/12] Installing pfSense-pkg-Netgate_Firmware_Upgrade-23.05.01...
[10/12] Extracting pfSense-pkg-Netgate_Firmware_Upgrade-23.05.01: .......... done
[11/12] Installing pfSense-pkg-aws-wizard-0.12...
[11/12] Extracting pfSense-pkg-aws-wizard-0.12: ....... done
[12/12] Installing pfSense-plus-24.11...
[12/12] Extracting pfSense-plus-24.11: ... done
=====
Message from drm-515-kmod-5.15.160:
--
The drm-515-kmod port can be enabled for amdgpu (for AMD GPUs starting with the HD7000 series / Tahiti) or i915kms (for Intel APUs starting with HD3000 / Sandy Bridge) through kld_list in /etc/rc.conf.
radeonkms for older AMD GPUs can be loaded and there are some positive reports if EFI boot is NOT enabled.
For amdgpu: kld_list="amdgpu"
For Intel: kld_list="i915kms"
For radeonkms: kld_list="radeonkms"
Please ensure that all users requiring graphics are members of the "video" group.
pfSense Post Installation setup
mount_msdosfs: /dev/ada0p1: Invalid argument
Error: Failed to run the post installation script.
Any ideas? I'm going to try and go back to 23.x next.
Attached log file: install-log.txt
-LamaZ
-
Sooo frustrating. I went back to the 23.09.1 installer (pfSense-plus-memstick-serial-23.09.1-RELEASE-amd64.img), and after it finished and prompted me to reboot, it STILL rebooted into 24.11-RELEASE!
-LamaZ
-
How do I get out of 24.11 hell? I can't reboot out of it. I set the boot environment to 24.03 and it still forces me back. See the screenshot below of my Boot Environments taunting me.
Clicking the play button option didn't work either:
-LamaZ
-
What are you booting from here?
The installer should remove all ZFS BEs from the target drive. So possibly it installed to the eMMC and you are booting from the SSD?
Another possibility is that the eMMC has gone read-only. Though that is not the usual failure mode for it.
Steve
-
@stephenw10 Thanks for the tip. The eMMC died many years ago. I thought I had disabled it in the BIOS; I'll double-check. I definitely targeted the hard disk (/dev/ada0) with the installer.
Edit: It is the M.2 drive. No luck so far.
-
AFAIK there is no way to disable the eMMC. It's a problem on the 5100.
Perhaps the SSD has gone read-only? That would take a lot of writes, but it is a much more common failure mode for an SSD.
-
@stephenw10 OK, I could see something like ntopng writing a lot to disk. Question: how would I go about confirming that the SSD going read-only is the culprit if I can't reach SSH, the webConfigurator, or the console once it locks up? Is an rsyslog server needed, or is there some other way to diagnose this?
To be clear, the thought is that the install/upgrade process does so many consecutive writes that the SSD goes into read-only mode. Did I get that right?
PS - Thanks for confirming that the eMMC can't be nuked on the 5100. I was going crazy trying to figure out how to disable it permanently.
Thanks!
-LamaZ
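On the rsyslog question: pfSense can forward its logs to a remote syslog server (remote logging is set under Status > System Logs > Settings), which may capture the last messages before a lockup even if the local disk can no longer be written. Any syslog daemon works. As a minimal stand-in, here is a Python sketch that assumes remote logging over UDP is pointed at this machine on port 5514; the port, the output file name, and the function names `format_record` and `serve` are hypothetical choices, not anything pfSense defines.

```python
import socket
from datetime import datetime, timezone

LISTEN_ADDR = ("0.0.0.0", 5514)  # hypothetical port; 514 is standard but usually needs root
LOG_PATH = "pfsense-remote.log"  # hypothetical output file


def format_record(payload: bytes, sender: str) -> str:
    """Turn one raw syslog datagram into a single timestamped line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} {sender} {text}\n"


def serve(addr=LISTEN_ADDR, path=LOG_PATH) -> None:
    """Receive syslog datagrams forever, appending each one to the log file."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(addr)
        with open(path, "a", encoding="utf-8") as out:
            while True:
                payload, (host, _port) = sock.recvfrom(8192)
                out.write(format_record(payload, host))
                out.flush()  # flush every message so the newest lines reach disk promptly
```

Call `serve()` on the receiving machine and leave it running; after the next lockup, the tail of `pfsense-remote.log` shows the last messages the firewall managed to send.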
-
First check the SMART data. It should have an estimated wear level value.
[24.11-RELEASE][admin@5100.stevew.lan]/root: smartctl -a /dev/ada0
smartctl 7.4 2023-08-01 r5530 [FreeBSD 15.0-CURRENT amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     NT-32
Serial Number:    987032300377
LU WWN Device Id: 5 000000 000000000
Firmware Version: 1.095.06
User Capacity:    32,017,047,552 bytes [32.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Dec 3 00:58:51 2024 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (   32) seconds.
Offline data collection capabilities: (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine recommended polling time:    (   1) minutes.
Extended self-test routine recommended polling time: (   1) minutes.
SCT capabilities:              (0x0039) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 0
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0013   100   100   050    Pre-fail  Always       -       0
  7 Unknown_SSD_Attribute   0x000b   100   100   050    Pre-fail  Always       -       0
  8 Unknown_SSD_Attribute   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       46130
 10 Unknown_SSD_Attribute   0x0013   100   100   050    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       79
167 Unknown_Attribute       0x0022   100   100   000    Old_age   Always       -       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
169 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       262146
170 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0012   134   134   000    Old_age   Always       -       3560591852425
175 Program_Fail_Count_Chip 0x0013   100   100   010    Pre-fail  Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   020    Pre-fail  Always       -       69
187 Reported_Uncorrect      0x0032   000   000   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       62
194 Temperature_Celsius     0x0022   075   075   030    Old_age   Always       -       25 (0 60 0 30 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
231 Unknown_SSD_Attribute   0x0033   070   070   005    Pre-fail  Always       -       30
240 Unknown_SSD_Attribute   0x0013   100   100   050    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18383081362
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       359301512

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%            18192  -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
So there, 'Unused_Rsvd_Blk_Cnt_Tot' is above zero, which means it still has spare blocks to use if others fail. Other drives often report more useful values.
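To pull the wear-related numbers out of that table without reading it by eye, a small parser helps. This is a hedged sketch, not a smartmontools tool: the function names are hypothetical, and it assumes Total_LBAs_Written counts 512-byte units (matching the reported sector size). Some vendors use other units for that attribute, so treat the result as an estimate.

```python
import re

# One attribute row: ID, name, flag, then VALUE WORST THRESH TYPE UPDATED WHEN_FAILED,
# then the leading integer of RAW_VALUE (e.g. "25" from "25 (0 60 0 30 0)").
ROW = re.compile(r"^\s*\d+\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(?:\S+\s+){6}(\d+)")


def parse_attributes(smartctl_text: str) -> dict[str, int]:
    """Map ATTRIBUTE_NAME -> raw value from the `smartctl -a` attribute table."""
    attrs: dict[str, int] = {}
    for line in smartctl_text.splitlines():
        m = ROW.match(line)
        if m and not m.group(1).startswith("Unknown"):  # unnamed IDs would collide
            attrs[m.group(1)] = int(m.group(2))
    return attrs


def wear_summary(attrs: dict[str, int], capacity_bytes: int, lba_bytes: int = 512) -> dict:
    """Spare blocks left, decimal TB written, and number of full-drive writes.

    Assumes Total_LBAs_Written is counted in lba_bytes-sized units.
    """
    written = attrs["Total_LBAs_Written"] * lba_bytes
    return {
        "spare_blocks": attrs.get("Unused_Rsvd_Blk_Cnt_Tot"),
        "tb_written": written / 1e12,
        "drive_writes": written / capacity_bytes,
    }
```

Fed the output above with a capacity of 32,017,047,552 bytes, it reports 69 spare blocks, about 9.4 TB written, and roughly 294 full writes of the 32 GB drive, under the 512-byte assumption.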
-
@stephenw10 said in Problems upgrading/installing 24.11 on SG-5100:
smartctl -a /dev/ada0
You, sir, win! Here is my least favorite line in the output:
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
No failed Attributes found.
Ordering another disk ASAP.
-LamaZ
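A FAILED verdict like this is easy to miss if nobody runs smartctl by hand. As a hedged sketch (not a pfSense feature), a script can check it through smartctl's exit status, whose bits are documented in the smartctl(8) man page; bit 3 is set when the health check reports the disk failing. The messages below paraphrase the man page, the function names are hypothetical, and /dev/ada0 is simply the device from this thread.

```python
import subprocess

# Exit-status bits from the smartctl(8) man page (paraphrased).
BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_status(code: int) -> list[str]:
    """List the smartctl exit-status bits that are set in `code`."""
    return [msg for bit, msg in BITS.items() if code & (1 << bit)]


def check(device: str = "/dev/ada0") -> list[str]:
    """Run `smartctl -H` on the device and return any problems its exit status signals."""
    result = subprocess.run(["smartctl", "-H", device], capture_output=True, text=True)
    return decode_status(result.returncode)
```

It could be run periodically, for example from cron, treating any non-empty result from `check()` as an alert.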
-
Yikes!
-
@stephenw10 I can't thank you enough. Seriously. I've been going stir-crazy for the past couple of days.
Question, should we rename the title of this thread? I don't want to give other SG-5100 owners the wrong impression when they see the title.
-LamaZ
-
I think it's probably fine to leave it. Not many people come looking for problems before they encounter them.
-
@stephenw10 said in Problems upgrading/installing 24.11 on SG-5100:
I think it's probably fine to leave it. Not many people come looking for problems before they encounter them.
Uhhh, I do! That is how I ended up in this thread. I have an SG-5100 and wanted to see what others' experience has been upgrading to 24.11.
I suspect (and hope) it will not be as much of an issue for me, since I went with a somewhat larger SSD. It shows about 1.5 years of life left.
@LamaZ - What size was your drive?
Phizix
-
I could change the title? The drive failure here isn't actually 5100-specific.
-
@stephenw10 - I totally agree, but that is what originally pulled me into the thread. I wouldn't bother changing the thread title.
I also think that my wear rate is actually slower, so I expect the 1.5-year estimate to be more like 3 years.
Phizix
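The "1.5 years might really be 3" reasoning is a linear extrapolation: if wear tracks writes, remaining life scales inversely with the future write rate, so halving the rate doubles the estimate. A minimal sketch, assuming linear wear and a drive that reports a percent-of-endurance-used figure (not every drive does); the function name and the example numbers in the test are hypothetical, not taken from anyone's drive in this thread.

```python
def years_remaining(wear_used_pct: float, hours_so_far: float, rate_factor: float = 1.0) -> float:
    """Linear estimate of remaining SSD life in years.

    wear_used_pct: percent of rated endurance consumed so far (strictly between 0 and 100).
    hours_so_far: power-on hours it took to consume that much.
    rate_factor: future write rate relative to the past (0.5 = half as many writes).
    """
    if not 0 < wear_used_pct < 100:
        raise ValueError("wear_used_pct must be between 0 and 100")
    if hours_so_far <= 0 or rate_factor <= 0:
        raise ValueError("hours_so_far and rate_factor must be positive")
    per_hour = wear_used_pct / hours_so_far  # average wear per power-on hour so far
    hours_left = (100 - wear_used_pct) / (per_hour * rate_factor)
    return hours_left / (24 * 365)
```

For a drive that used half its endurance in one year of power-on time, this gives about 1 year left at the same rate and about 2 years at half the rate.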