Using "find /" brings pfSense down
-
@patient0 The name was from a download I had, but it was just representative. I tried other names that I was looking for yesterday and find also locked up. I am using ZFS. I am not familiar with ZFS, but I've just tried:
[2.7.2-RELEASE][root@pfSense.howitts.co.uk]/root: zpool scrub pfSense
[2.7.2-RELEASE][root@pfSense.howitts.co.uk]/root: zpool status -v pfSense
  pool: pfSense
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
  scan: scrub in progress since Fri Apr 18 10:57:11 2025
        1.64G / 1.64G scanned, 1007M / 1.64G issued at 67.1M/s
        0B repaired, 59.92% done, 00:00:10 to go
config:

        NAME        STATE     READ WRITE CKSUM
        pfSense     ONLINE       0     0     0
          ada0p4    ONLINE     207    33   410

errors: List of errors unavailable: pool I/O is currently suspended
How should I proceed?
-
@NickJH Hmm. Now:
[2.7.2-RELEASE][root@pfSense.howitts.co.uk]/root: zpool status -v pfSense
  pool: pfSense
 state: SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC
  scan: scrub in progress since Fri Apr 18 10:57:11 2025
        1.64G / 1.64G scanned, 1007M / 1.64G issued at 1.98M/s
        0B repaired, 59.92% done, no estimated completion time
config:

        NAME        STATE     READ WRITE CKSUM
        pfSense     ONLINE       0     0     0
          ada0p4    ONLINE     207    33   410

errors: List of errors unavailable: pool I/O is currently suspended
Stuck at 59.92%?
-
@NickJH I don't have experience with ZFS repair. Hopefully someone else can chime in.
If you search the internet for 'pfSense zfs suspended' it returns quite a few results.
What does SMART (pfSense Doc: S.M.A.R.T. Hard Disk Status) report for your disk? (See also reddit: How can I check the health, run diagnostics on the SSD in my pfSense machine?) Is it in a healthy state?
-
@patient0 SMART data looks OK. I've performed a Short Test and it was fine:
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      8769         -
Is it worth doing a long test?
I did a forced power-off/power-on (as the console was unresponsive) then a controlled reboot before doing those tests.
Zpool status doesn't look good:
[2.7.2-RELEASE][root@pfSense.howitts.co.uk]/root: zpool status -v pfSense
  pool: pfSense
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:41:55 with 29 errors on Fri Apr 18 11:39:06 2025
config:

        NAME        STATE     READ WRITE CKSUM
        pfSense     ONLINE       0     0     0
          ada0p4    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /var/cache/pkg/pfSense-base-2.7.2~0135c9e036.pkg
        //usr/local/share/pfSense/base.txz
        //usr/sbin/wpa_supplicant
        //lib/libzpool.so.2
        //usr/local/include/boost/fusion/algorithm/transformation/detail/preprocessed/zip50.hpp
        //usr/local/include/php/Zend/zend_weakrefs.h
        //usr/bin/ztest
        //usr/lib/clang/15.0.7/lib/freebsd/libclang_rt.fuzzer-x86_64.a
        //usr/lib/libpmc.so.5
        //usr/local/lib/libpkg.so.4
        //usr/local/share/examples/dnsmasq/dnslist/dnslist.pl
        //usr/local/lib/libboost_test_exec_monitor.a
        //usr/share/syscons/fonts/koi8-rb-8x16.fnt
        //usr/local/lib/perl5/5.34/mach/CORE/libperl.so.5.34.1
        //usr/lib/clang/16/lib/freebsd/libclang_rt.stats-x86_64.a
        //usr/local/include/boost/fusion/container/vector/detail/cpp03/vector40.hpp
        //usr/lib/libc.a
        //usr/local/lib/python3.11/xml/dom/__pycache__/expatbuilder.cpython-311.opt-2.pyc
        //usr/lib/librpcsvc.a
        //usr/include/xlocale/_stdlib.h
If I can't find a way of repairing it, I'll investigate a reload and restore. I have a recent configuration backup.
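For anyone wanting to work with that list of damaged files (for example, to compare it against a fresh install), here is a minimal sketch that pulls just the paths out of `zpool status -v` output. The function name `list_corrupt_files` is made up for illustration, and it assumes zpool prints one path per line after the "Permanent errors" heading, as it did above.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool status -v` output on stdin and print
# only the paths listed after the "Permanent errors have been detected"
# line. Assumes one path per line, with blank lines possibly in between.
list_corrupt_files() {
    awk '
        /Permanent errors have been detected/ { in_list = 1; next }
        in_list && NF > 0 { sub(/^[ \t]+/, ""); print }
    '
}

# Usage on the affected box (not run automatically here):
#   zpool status -v pfSense | list_corrupt_files > /root/corrupt-files.txt
```

The paths are printed exactly as zpool reports them, including the doubled leading slash.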
-
@NickJH Hmm. This is my gateway but pfSense only has an online installer. With some searching, I've found a link to the old offline installer - https://forum.netgate.com/topic/188105/ce-edition-download/9. I hope it is good.
Could I install onto another system and copy the corrupt files across? I am not sure it is very safe to do that.
-
@NickJH said in Using "find /" brings pfSense down:
With some searching, I've found a link to the old offline installer - https://forum.netgate.com/topic/188105/ce-edition-download/9. I hope it is good.
They are still good, the offline 2.7.2 installer, yes.
Could I install onto another system and copy the corrupt files across?
A reinstall with the offline installer will be faster, I'd assume, than copying files over from another installation (never tried that).
Have you had power cuts or multiple forced reboots? If yes, then that is a likely reason why the file system got corrupted.
If not, then you really have to do a low-level check of the SSD.
-
@patient0 I have had multiple forced reboots. I am using a Chinese N100 system, but I don't know how relevant that is. I have just changed from Static to DHCP and that didn't go well yesterday. I did a few hard resets in order to try to get online, but I think I had the problem before that. Some of the hard resets hung and had to be reset again.
What is the best way of doing a low level check of the SSD (mSATA)? A long SMART test?
-
@NickJH FYI in the online installer one configures WAN and it downloads what it needs.
If you think the disk is fine I’d reinstall. However I’d also check for memory problems because it sounds like the data on disk was corrupted somehow.
As https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-check.html notes there isn’t a file system repair for ZFS as it’s not typically needed.
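Since there is no fsck-style repair, the closest equivalent is a scrub followed by a look at the pool state. A rough sketch, assuming the pool is named `pfSense` as in the output earlier in this thread; `pool_state` is a hypothetical helper name, and the zpool commands are shown only as comments so nothing is started by accident.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print the
# value of its "state:" line (e.g. ONLINE, DEGRADED, SUSPENDED).
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Usage on the affected box (pool name "pfSense" is taken from this thread):
#   zpool scrub pfSense                     # starts the scrub in the background
#   zpool status -v pfSense | pool_state    # re-run until the scrub finishes
#   zpool clear pfSense                     # what the SUSPENDED status text suggests
```

If the state keeps coming back as SUSPENDED, the pool is still failing I/O and the scrub will not make progress.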
If pfSense is running it has SMART tests in https://docs.netgate.com/pfsense/en/latest/monitoring/status/smart.html.
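The same tests can also be run from the shell with smartctl. A minimal sketch, assuming the disk is `/dev/ada0` (inferred from the `ada0p4` vdev in the zpool output, so check it first); `latest_selftest_status` is a hypothetical helper, and it assumes the test description is two words, as in "Short offline".

```sh
#!/bin/sh
# Hypothetical helper: read `smartctl -l selftest` output on stdin and
# print the status text of the most recent self-test (the "# 1" row).
# Assumes a two-word test description such as "Short offline".
latest_selftest_status() {
    awk '$1 == "#" && $2 == "1" {
        out = ""
        # Fields 3-4 are the description; the status text runs up to
        # the "NN%" remaining column.
        for (i = 5; i <= NF && $i !~ /%$/; i++)
            out = out (out ? " " : "") $i
        print out
    }'
}

# Usage on the affected box (device name is an assumption):
#   smartctl -t long /dev/ada0      # start the extended self-test
#   smartctl -l selftest /dev/ada0 | latest_selftest_status
```

The long test runs in the background on the drive itself, so the log needs checking again once the estimated time has passed.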
-
Yeah, that does seem like a drive issue to me; find should never lock up like that, or take very long at all:
[2.7.2-RELEASE][admin@cedev-3.stevew.lan]/root: time find / -name config-pfSense.howitts.co.uk-20250317222358.xml
0.029u 0.313s 0:00.95 34.7% 39+184k 3143+0io 2pf+0w
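If you want to repeat the search without risking the console, one option is to put a time limit on it. This is only a sketch: `find_with_timeout` is a made-up wrapper, the 30 second default is arbitrary, and `timeout` cannot kill a process that is stuck in the kernel waiting on suspended pool I/O, so it may not help in the worst case.

```sh
#!/bin/sh
# Hypothetical wrapper: search DIR for NAME without crossing into other
# filesystems (-xdev), giving up after SECS seconds (default 30).
find_with_timeout() {
    dir="$1"
    name="$2"
    secs="${3:-30}"
    timeout "$secs" find "$dir" -xdev -name "$name"
}

# Usage (not run automatically here):
#   find_with_timeout / config-pfSense.howitts.co.uk-20250317222358.xml 30
```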
-
@stephenw10 Thanks.
The long SMART test was clean, so, once my son is back at school next week, I'll try a reinstall, bootstrapping it with my latest config.xml - https://docs.netgate.com/pfsense/en/latest/backup/restore-during-install.html.
When I take it down I'll check the seating of the mSATA before reloading the software. If it fails again, I'll look at replacing the disk.