ZFS pool degraded - no dashboard warning?
-
Maybe I'm missing a configuration step, but in case I'm not: it would be useful to be able to display a ZFS pool degradation warning on the dashboard should one happen.
I've had one of my two mirrored SATADOMs fail. SMART status shows green for the one remaining drive, while zpool status shows the pool as degraded:

[2.4.2-RELEASE][root@pfSense.local.lan]/root: zpool status zroot
  pool: zroot
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        zroot                    DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            ada0p4               ONLINE       0     0     0
            1900527549686068822  REMOVED      0     0     0  was /dev/ada1p4
-
It has been said quite a few times: The devs see ZFS in pfSense as experimental and there is no GUI integration yet.
If you want to manage ZFS, install the cron package to schedule scrubs and use the mailreport package to send you status information.
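As a rough illustration of what such a scheduled check could look like, here is a minimal sketch. It relies on `zpool status -x`, which prints "all pools are healthy" when nothing is wrong. The helper name `zfs_health_summary` is made up for this example and is not part of pfSense or either package.

```shell
#!/bin/sh
# Sketch only: zfs_health_summary is a hypothetical helper, not part of pfSense.

# Classify the text printed by `zpool status -x`.
# Prints "OK" when ZFS reports every pool healthy, "PROBLEM" otherwise.
zfs_health_summary() {
    case "$1" in
        "all pools are healthy") echo "OK" ;;
        *) echo "PROBLEM" ;;
    esac
}

# Query the real pools only where the zpool binary exists.
if command -v zpool >/dev/null 2>&1; then
    status="$(zpool status -x 2>&1)"
    if [ "$(zfs_health_summary "$status")" = "PROBLEM" ]; then
        # Print the details so a cron job or mail report can pick them up.
        printf '%s\n' "$status"
    fi
fi
```

Saved to a file of your choosing, a script like this could be run from a cron entry or, assuming your mailreport version lets you include command output, added to a report. A weekly scrub would simply be a cron entry running `zpool scrub zroot`.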
-
There's something funky going on with my box which I can't get to the bottom of.
4am: weekly 'smartctl -t long' tests both drives. No problems reported.
9am: drive drops out of array, no errors reported on ZFS array.

Mar 11 10:02:34 php-fpm 12021 /status_logs.php: Session timed out for user 'admin' from: 192.168.20.101
Mar 11 10:00:39 php [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Mar 11 10:00:00 php [pfBlockerNG] Starting cron process.
Mar 11 09:12:40 kernel (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
Mar 11 09:12:40 kernel (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:12:40 kernel (aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Mar 11 09:12:40 kernel ahcich3: is 00000000 cs 00002000 ss 00000000 rs 00002000 tfd 80 serr 00000000 cmd 0000cd17
Mar 11 09:12:40 kernel ahcich3: Poll timeout on slot 13 port 0
Mar 11 09:12:29 kernel ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Mar 11 09:11:52 ZFS vdev is removed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
Mar 11 09:11:52 kernel (ada1:ahcich3:0:0:0): Periph destroyed
Mar 11 09:11:52 ZFS vdev state changed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
Mar 11 09:11:52 ZFS vdev state changed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
Mar 11 09:11:52 ZFS vdev state changed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
Mar 11 09:11:52 kernel (ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
Mar 11 09:11:52 kernel (ada1:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:11:52 kernel (ada1:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 a8 50 6c dd 40 00 00 00 00 00 00
Mar 11 09:11:52 kernel (ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
Mar 11 09:11:52 kernel (ada1:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
Mar 11 09:11:52 kernel (ada1:ahcich3:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
Mar 11 09:11:52 kernel ahcich3: is 00000000 cs 00001000 ss 00001000 rs 00001000 tfd 80 serr 00000000 cmd 0000cc17
Mar 11 09:11:52 kernel ahcich3: Timeout on slot 12 port 0
Mar 11 09:11:22 kernel (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
Mar 11 09:11:22 kernel (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:11:22 kernel (aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Mar 11 09:11:22 kernel ahcich3: is 00000000 cs 00000800 ss 00000000 rs 00000800 tfd 80 serr 00000000 cmd 0000cb17
Mar 11 09:11:22 kernel ahcich3: Poll timeout on slot 11 port 0
Mar 11 09:11:05 kernel ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Mar 11 09:10:34 kernel (ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
Mar 11 09:10:34 kernel (ada1:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:10:34 kernel (ada1:ahcich3:0:0:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 00 00 00 00 00 00
Mar 11 09:10:34 kernel ahcich3: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000 cmd 0000ca17
Mar 11 09:10:34 kernel ahcich3: Timeout on slot 10 port 0
Mar 11 09:10:04 kernel (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
Mar 11 09:10:04 kernel (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:10:04 kernel (aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Mar 11 09:10:04 kernel ahcich3: is 00000000 cs 00000200 ss 00000000 rs 00000200 tfd 80 serr 00000000 cmd 0000c917
Mar 11 09:10:04 kernel ahcich3: Poll timeout on slot 9 port 0
Mar 11 09:09:53 kernel ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Mar 11 09:09:16 kernel (ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
Mar 11 09:09:16 kernel (ada1:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:09:16 kernel (ada1:ahcich3:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
Mar 11 09:09:16 kernel ahcich3: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 80 serr 00000000 cmd 0000c817
Mar 11 09:09:16 kernel ahcich3: Timeout on slot 8 port 0
Mar 11 09:08:46 kernel (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
Mar 11 09:08:46 kernel (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:08:46 kernel (aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Mar 11 09:08:46 kernel ahcich3: is 00000000 cs 00000080 ss 00000000 rs 00000080 tfd 80 serr 00000000 cmd 0000c717
Mar 11 09:08:46 kernel ahcich3: Poll timeout on slot 7 port 0
Mar 11 09:08:35 kernel ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Mar 11 09:07:58 kernel (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
Mar 11 09:07:58 kernel (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:07:58 kernel (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Mar 11 09:07:58 kernel ahcich3: is 00000000 cs 00000040 ss 00000000 rs 00000040 tfd 80 serr 00000000 cmd 0000c617
Mar 11 09:07:58 kernel ahcich3: Timeout on slot 6 port 0
Mar 11 09:07:28 kernel ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Mar 11 09:06:56 kernel (aprobe0:ahcich3:0:0:0): Retrying command
Mar 11 09:06:56 kernel (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:06:56 kernel (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Mar 11 09:06:56 kernel ahcich3: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 80 serr 00000000 cmd 0000c517
Mar 11 09:06:56 kernel ahcich3: Timeout on slot 5 port 0
Mar 11 09:06:26 kernel ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Mar 11 09:05:55 kernel ada1: <SATA SSD S9FM02.1> s/n B4500756172400100699 detached
Mar 11 09:05:55 kernel ada1 at ahcich3 bus 0 scbus3 target 0 lun 0
Mar 11 09:05:55 kernel (aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
Mar 11 09:05:55 kernel (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:05:55 kernel (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Mar 11 09:05:55 kernel ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 00000000 cmd 0000c417
Mar 11 09:05:55 kernel ahcich3: Timeout on slot 4 port 0
Mar 11 09:05:25 kernel ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Mar 11 09:04:53 kernel (aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
Mar 11 09:04:53 kernel (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:04:53 kernel (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Mar 11 09:04:53 kernel ahcich3: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd 80 serr 00000000 cmd 0000c317
Mar 11 09:04:53 kernel ahcich3: Timeout on slot 3 port 0
Mar 11 09:04:23 kernel ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Mar 11 09:03:52 kernel (aprobe0:ahcich3:0:0:0): Retrying command
Mar 11 09:03:52 kernel (aprobe0:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:03:52 kernel (aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Mar 11 09:03:52 kernel ahcich3: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd 80 serr 00000000 cmd 0000c217
Mar 11 09:03:52 kernel ahcich3: Timeout on slot 2 port 0
Mar 11 09:03:22 kernel ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Mar 11 09:02:50 kernel (ada1:ahcich3:0:0:0): Retrying command
Mar 11 09:02:50 kernel (ada1:ahcich3:0:0:0): CAM status: Command timeout
Mar 11 09:02:50 kernel (ada1:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 a8 50 6c dd 40 00 00 00 00 00 00
Mar 11 09:02:50 kernel ahcich3: is 00000000 cs 00000000 ss 00000002 rs 00000002 tfd 40 serr 00000000 cmd 0000c117
Mar 11 09:02:50 kernel ahcich3: Timeout on slot 1 port 0
Mar 11 09:00:38 php [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Mar 11 09:00:00 php [pfBlockerNG] Starting cron process.
Mar 11 08:00:38 php [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Mar 11 08:00:00 php [pfBlockerNG] Starting cron process.
Mar 11 07:00:38 php [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Mar 11 07:00:00 php [pfBlockerNG] Starting cron process.
Mar 11 06:13:16 pkg-static pfSense-repo upgraded: 2.4.3.a.20180309.0651 -> 2.4.3.a.20180310.0802
Mar 11 06:00:38 php [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Mar 11 06:00:00 php [pfBlockerNG] Starting cron process.
Mar 11 05:00:44 php [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Mar 11 05:00:00 php [pfBlockerNG] Starting cron process.
Mar 11 04:00:42 php [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Mar 11 04:00:00 php [pfBlockerNG] Starting cron process.
Mar 11 03:00:39 php [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Mar 11 03:00:00 php [pfBlockerNG] Starting cron process.

Not sure if this is hardware, FreeBSD or pfSense related at this stage.
-
Looks like a failing disk to me.
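One quick way to sanity-check that reading is to count the AHCI timeouts per channel in the system log. If they all land on a single channel (ahcich3 in the log above), that is consistent with one failing device or port rather than a system-wide problem, though it cannot tell the two apart. A minimal sketch, where `count_ahci_timeouts` is a made-up helper name:

```shell
#!/bin/sh
# Sketch only: count_ahci_timeouts is a hypothetical helper for illustration.

# Read syslog text on stdin and print "<channel> <count>" for every AHCI
# channel that logged a "Timeout on slot" or "Poll timeout on slot" line.
count_ahci_timeouts() {
    grep -E 'ahcich[0-9]+: (Poll timeout|Timeout) on slot' |
        sed -E 's/.*(ahcich[0-9]+):.*/\1/' |
        sort | uniq -c | awk '{print $2, $1}'
}

# Example (assumes the circular log format used by pfSense 2.4.x):
#   clog /var/log/system.log | count_ahci_timeouts
```

If other channels show up in the output too, the controller, power or firmware would be worth a closer look before blaming the one disk.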