Netgate 1100 Can't Update from 23.05.1 to 23.09 or clean install
-
This started with trying to install the 23.09 update.
I first tried running the install from the Web UI. It ran until it got to "Checking integrity" and then hung. A few minutes later everything came back online.
Next, I connected to the console and ran the update. Same issue, though this time I got the full logs. It turns out the device panics:
[89/91] Fetching php82-filter-8.2.11.pkg: . done
[90/91] Fetching py311-libzfs-1.1.2023020700.pkg: . done
[91/91] Fetching php82-pdo_sqlite-8.2.11.pkg: . done
Checking integrity...panic: VERIFY0(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)) failed (0 == 5)
cpuid = 1
time = 1699299913
KDB: enter: panic
[ thread pid 6 tid 100186 ]
Stopped at kdb_enter+0x44: undefined f907c27f
db:0:kdb.enter.default> textdump set
textdump set
db:0:kdb.enter.default> capture on
db:0:kdb.enter.default> run pfs
db:1:pfs> bt
Tracing pid 6 tid 100186 td 0xffff0000c75be800
db_trace_self() at db_trace_self
db_stack_trace() at db_stack_trace+0x11c
db_command() at db_command+0x358
db_script_exec() at db_script_exec+0x1a4
db_command() at db_command+0x358
db_script_exec() at db_script_exec+0x1a4
db_script_kdbenter() at db_script_kdbenter+0x58
db_trap() at db_trap+0xf4
kdb_trap() at kdb_trap+0x284
handle_el1h_sync() at handle_el1h_sync+0x10
I reached out to TAC Lite support and was told to look into ZFS boot environments & confirm disk space. It looks like I have plenty of disk space. Additionally, I was provided with the 23.09 image file.
So, I tried the clean install approach. I connected to the console again, ran through the installation process, selected ZFS, and confirmed the install. The install appeared to be successful; however, after power cycling, I was dumped back into pfSense 23.05.1. I tried the install a second time, selecting UFS, and saw the same behaviour. I also confirmed the image signature and re-flashed my USB drive, which reported success.
I have let TAC Lite support know the above, but I'm curious if anyone else has any ideas as I'm thoroughly stumped right now.
Any thoughts?
-
Hmm, so it boots the recovery image OK and appears to flash the image as expected?
-
@strigona I have seen the same on an SG-1100. It turned out the eMMC (flash disk) was “dead”. You could still read all data from it, but writes were never actually persisted. They were accepted and seemed committed, but only because of the ZFS cache. If you actually read the disk blocks, they had never changed.
It became very obvious when reflashing the device. The reflash went just fine, but it still booted up on the old image and old config. Nothing I did could change any blocks on the eMMC (online or offline).
-
@strigona So sorry, but your device is dead, unless you can still use it at its current firmware and config level.
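For anyone wanting to check for the "writes accepted but never persisted" symptom described above, here is a minimal sketch of a marker-file test in Python. It is only an illustration, not a Netgate tool: the function names `write_marker` and `marker_persisted` are made up, and it assumes Python is available on the box. The idea is to write a random token, force it to disk, power cycle, and then see whether the token is still there. The check only means something after a full power cycle, since a read before that can be served from cache.

```python
import os
import secrets
from pathlib import Path


def write_marker(path: Path) -> str:
    """Write a random token to `path`, ask the OS to flush it, and return it."""
    token = secrets.token_hex(16)
    with open(path, "w") as f:
        f.write(token)
        f.flush()
        os.fsync(f.fileno())  # request a flush to stable storage
    return token


def marker_persisted(path: Path, expected: str) -> bool:
    """Return True if `path` exists and still holds the expected token."""
    try:
        return path.read_text().strip() == expected
    except FileNotFoundError:
        return False


# Hypothetical usage: call write_marker() on a path of your choosing, note the
# returned token, power cycle the device, then call marker_persisted() with
# the same path and token.
```

If the check fails after a power cycle even though the write reported success, that would be consistent with the dead-eMMC behaviour described above, though it does not prove the cause by itself.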
-
@keyser Yep, support just replied after reviewing the logs and said the same thing. Thankfully it's still operational in effectively a read-only state I guess, but I'm living on borrowed time and am now trying to decide on what hardware to purchase. It's for home and I'm not super keen on buying another Netgate 1100 if I'm going to run into the same issue in a few years and the Netgate 2100 is probably out of my budget.
-
@strigona I realize this is too late but for future reference or others there is a doc page on write life:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
There is also a separate list of packages where an SSD is recommended or required due to writing:
https://www.netgate.com/supported-pfsense-plus-packages
(side note: it'd be handy if this were documented in the package description text or somewhere more obvious than a non-docs web page)
In my experience one can tone down logging significantly, and/or use a RAM disk (though the 1100 has only 1 GB of RAM). For instance, we always disable logging of the default block rules, since we can just enable it if desired when troubleshooting something. Also, in Suricata/Snort, for some reason there is an "Enable HTTP Log" setting that logs all HTTP requests by default.
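To give a feel for why write volume matters, here is a rough back-of-the-envelope sketch in Python. The function name and the numbers are hypothetical placeholders, not specs for the 1100's eMMC, and the calculation ignores write amplification and wear-levelling details, so treat it as an order-of-magnitude illustration only.

```python
def estimated_lifetime_years(endurance_tb: float, written_gb_per_day: float) -> float:
    """Rough flash lifetime: total rated write endurance / daily write volume.

    endurance_tb:       total data the device is assumed to tolerate, in TB
                        (placeholder value; check the real part's datasheet)
    written_gb_per_day: average data written per day, in GB
    """
    if written_gb_per_day <= 0:
        raise ValueError("written_gb_per_day must be positive")
    days = (endurance_tb * 1000) / written_gb_per_day  # 1 TB = 1000 GB
    return days / 365


# Hypothetical example: 10 TB of endurance at 10 GB/day is 1000 days,
# i.e. roughly 2.7 years. Halving the daily writes doubles the estimate.
```

The takeaway is simply that lifetime scales inversely with daily writes, which is why turning down logging or moving logs to a RAM disk can make a real difference.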
-
@SteveITS Yeah I definitely shot myself in the foot here and had turned up logging when troubleshooting something a while back & didn't turn logging down. I'm sure I burned through a lot of my disk life with that. Live & learn.
-
You may be able to run from USB instead. But it would need to use either a different ZFS pool name or UFS, to avoid conflicting with the read-only eMMC.