ZFS pool degraded - no dashboard warning?



  • Maybe I'm missing a configuration step, but in case I'm not: it would be useful if the dashboard displayed a warning when a ZFS pool becomes degraded.
    I've had one of my two mirrored SATADOMs fail. SMART status shows green for the one remaining drive, while zpool status reports the pool as degraded.

    
    [2.4.2-RELEASE][root@pfSense.local.lan]/root: zpool status zroot
      pool: zroot
     state: DEGRADED
    status: One or more devices has been removed by the administrator.
    	Sufficient replicas exist for the pool to continue functioning in a
    	degraded state.
    action: Online the device using 'zpool online' or replace the device with
    	'zpool replace'.
      scan: none requested
    config:
    
    	NAME                     STATE     READ WRITE CKSUM
    	zroot                    DEGRADED     0     0     0
    	  mirror-0               DEGRADED     0     0     0
    	    ada0p4               ONLINE       0     0     0
    	    1900527549686068822  REMOVED      0     0     0  was /dev/ada1p4
    
    




  • It has been said quite a few times: the devs see ZFS in pfSense as experimental, and there is no GUI integration yet.

    If you want to manage ZFS, install the cron package to schedule scrubs and use the mailreport package to send you status information.
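
As a rough illustration of the kind of check a cron job could run, here is a minimal sketch. It assumes Python 3.7+ is available on the box and relies on `zpool list -H -o name,health`, which prints one tab-separated line per pool. The function names `unhealthy_pools` and `check_pools` are made up for this example and are not part of pfSense or any package.

```python
import subprocess


def unhealthy_pools(listing):
    """Return (name, health) pairs for every pool that is not ONLINE.

    `listing` is the text printed by `zpool list -H -o name,health`:
    one pool per line, with the two fields separated by a tab.
    """
    problems = []
    for line in listing.splitlines():
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # blank or unexpected line, skip it
        name, health = fields[0], fields[1].strip()
        if health != "ONLINE":
            problems.append((name, health))
    return problems


def check_pools():
    """Query the live system and return 1 if any pool needs attention.

    Hypothetical helper: it only prints a warning. How the warning
    reaches you (mail, notification, ...) is left open here.
    """
    listing = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True, text=True, check=True,
    ).stdout
    problems = unhealthy_pools(listing)
    for name, health in problems:
        print(f"WARNING: pool {name} is {health}")
    return 1 if problems else 0
```

A script ending in `raise SystemExit(check_pools())` could then be scheduled through the cron package. Keeping the parsing in `unhealthy_pools` separate from the `zpool` call makes it possible to try the logic on pasted output without touching a real pool.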



  • There's something funky going on with my box which I can't get to the bottom of.

    4am: weekly 'smartctl -t long' test runs on both drives. No problems reported.
    9am: drive drops out of array, no errors reported on ZFS array.

    
    Mar 11 10:02:34	php-fpm	12021	/status_logs.php: Session timed out for user 'admin' from: 192.168.20.101
    Mar 11 10:00:39	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Mar 11 10:00:00	php		[pfBlockerNG] Starting cron process.
    Mar 11 09:12:40	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
    Mar 11 09:12:40	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:12:40	kernel		(aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
    Mar 11 09:12:40	kernel		ahcich3: is 00000000 cs 00002000 ss 00000000 rs 00002000 tfd 80 serr 00000000 cmd 0000cd17
    Mar 11 09:12:40	kernel		ahcich3: Poll timeout on slot 13 port 0
    Mar 11 09:12:29	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    Mar 11 09:11:52	ZFS		vdev is removed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
    Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): Periph destroyed
    Mar 11 09:11:52	ZFS		vdev state changed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
    Mar 11 09:11:52	ZFS		vdev state changed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
    Mar 11 09:11:52	ZFS		vdev state changed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
    Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
    Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 a8 50 6c dd 40 00 00 00 00 00 00
    Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
    Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
    Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
    Mar 11 09:11:52	kernel		ahcich3: is 00000000 cs 00001000 ss 00001000 rs 00001000 tfd 80 serr 00000000 cmd 0000cc17
    Mar 11 09:11:52	kernel		ahcich3: Timeout on slot 12 port 0
    Mar 11 09:11:22	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
    Mar 11 09:11:22	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:11:22	kernel		(aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
    Mar 11 09:11:22	kernel		ahcich3: is 00000000 cs 00000800 ss 00000000 rs 00000800 tfd 80 serr 00000000 cmd 0000cb17
    Mar 11 09:11:22	kernel		ahcich3: Poll timeout on slot 11 port 0
    Mar 11 09:11:05	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    Mar 11 09:10:34	kernel		(ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
    Mar 11 09:10:34	kernel		(ada1:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:10:34	kernel		(ada1:ahcich3:0:0:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 00 00 00 00 00 00
    Mar 11 09:10:34	kernel		ahcich3: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000 cmd 0000ca17
    Mar 11 09:10:34	kernel		ahcich3: Timeout on slot 10 port 0
    Mar 11 09:10:04	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
    Mar 11 09:10:04	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:10:04	kernel		(aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
    Mar 11 09:10:04	kernel		ahcich3: is 00000000 cs 00000200 ss 00000000 rs 00000200 tfd 80 serr 00000000 cmd 0000c917
    Mar 11 09:10:04	kernel		ahcich3: Poll timeout on slot 9 port 0
    Mar 11 09:09:53	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    Mar 11 09:09:16	kernel		(ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
    Mar 11 09:09:16	kernel		(ada1:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:09:16	kernel		(ada1:ahcich3:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
    Mar 11 09:09:16	kernel		ahcich3: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 80 serr 00000000 cmd 0000c817
    Mar 11 09:09:16	kernel		ahcich3: Timeout on slot 8 port 0
    Mar 11 09:08:46	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
    Mar 11 09:08:46	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:08:46	kernel		(aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
    Mar 11 09:08:46	kernel		ahcich3: is 00000000 cs 00000080 ss 00000000 rs 00000080 tfd 80 serr 00000000 cmd 0000c717
    Mar 11 09:08:46	kernel		ahcich3: Poll timeout on slot 7 port 0
    Mar 11 09:08:35	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    Mar 11 09:07:58	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
    Mar 11 09:07:58	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:07:58	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
    Mar 11 09:07:58	kernel		ahcich3: is 00000000 cs 00000040 ss 00000000 rs 00000040 tfd 80 serr 00000000 cmd 0000c617
    Mar 11 09:07:58	kernel		ahcich3: Timeout on slot 6 port 0
    Mar 11 09:07:28	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    Mar 11 09:06:56	kernel		(aprobe0:ahcich3:0:0:0): Retrying command
    Mar 11 09:06:56	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:06:56	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
    Mar 11 09:06:56	kernel		ahcich3: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 80 serr 00000000 cmd 0000c517
    Mar 11 09:06:56	kernel		ahcich3: Timeout on slot 5 port 0
    Mar 11 09:06:26	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    Mar 11 09:05:55	kernel		ada1: <SATA SSD S9FM02.1> s/n B4500756172400100699 detached
    Mar 11 09:05:55	kernel		ada1 at ahcich3 bus 0 scbus3 target 0 lun 0
    Mar 11 09:05:55	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
    Mar 11 09:05:55	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:05:55	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
    Mar 11 09:05:55	kernel		ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 00000000 cmd 0000c417
    Mar 11 09:05:55	kernel		ahcich3: Timeout on slot 4 port 0
    Mar 11 09:05:25	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    Mar 11 09:04:53	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
    Mar 11 09:04:53	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:04:53	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
    Mar 11 09:04:53	kernel		ahcich3: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd 80 serr 00000000 cmd 0000c317
    Mar 11 09:04:53	kernel		ahcich3: Timeout on slot 3 port 0
    Mar 11 09:04:23	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    Mar 11 09:03:52	kernel		(aprobe0:ahcich3:0:0:0): Retrying command
    Mar 11 09:03:52	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:03:52	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
    Mar 11 09:03:52	kernel		ahcich3: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd 80 serr 00000000 cmd 0000c217
    Mar 11 09:03:52	kernel		ahcich3: Timeout on slot 2 port 0
    Mar 11 09:03:22	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
    Mar 11 09:02:50	kernel		(ada1:ahcich3:0:0:0): Retrying command
    Mar 11 09:02:50	kernel		(ada1:ahcich3:0:0:0): CAM status: Command timeout
    Mar 11 09:02:50	kernel		(ada1:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 a8 50 6c dd 40 00 00 00 00 00 00
    Mar 11 09:02:50	kernel		ahcich3: is 00000000 cs 00000000 ss 00000002 rs 00000002 tfd 40 serr 00000000 cmd 0000c117
    Mar 11 09:02:50	kernel		ahcich3: Timeout on slot 1 port 0
    Mar 11 09:00:38	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Mar 11 09:00:00	php		[pfBlockerNG] Starting cron process.
    Mar 11 08:00:38	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Mar 11 08:00:00	php		[pfBlockerNG] Starting cron process.
    Mar 11 07:00:38	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Mar 11 07:00:00	php		[pfBlockerNG] Starting cron process.
    Mar 11 06:13:16	pkg-static		pfSense-repo upgraded: 2.4.3.a.20180309.0651 -> 2.4.3.a.20180310.0802
    Mar 11 06:00:38	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Mar 11 06:00:00	php		[pfBlockerNG] Starting cron process.
    Mar 11 05:00:44	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Mar 11 05:00:00	php		[pfBlockerNG] Starting cron process.
    Mar 11 04:00:42	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Mar 11 04:00:00	php		[pfBlockerNG] Starting cron process.
    Mar 11 03:00:39	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Mar 11 03:00:00	php		[pfBlockerNG] Starting cron process.
    

    Not sure if this is hardware, FreeBSD or pfSense related at this stage.
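
To separate the disk events from the hourly pfBlockerNG entries, something like the following sketch could help. It assumes the log is pasted as tab-separated text, newest entry first, as above. `events_for` and `SAMPLE` are hypothetical names used only for this illustration, and the sample lines are copied from the log above.

```python
def events_for(log_text, needle):
    """Return (timestamp, process, message) tuples whose message
    contains `needle`, ordered oldest first.

    Assumes each line looks like "timestamp<TAB>process<TAB>pid<TAB>message"
    (the pid field may be empty) and that the log is listed newest first.
    """
    events = []
    for line in log_text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 3:
            continue  # not a log entry in the expected layout
        timestamp, process, message = parts[0], parts[1], parts[-1]
        if needle in message:
            events.append((timestamp, process, message))
    events.reverse()  # input is newest first, so flip to chronological order
    return events


# A few lines taken from the log above, newest first.
SAMPLE = (
    "Mar 11 09:11:52\tZFS\t\tvdev is removed, pool_guid=7630167079469879448 "
    "vdev_guid=1900527549686068822\n"
    "Mar 11 09:05:55\tkernel\t\tada1 at ahcich3 bus 0 scbus3 target 0 lun 0\n"
    "Mar 11 09:02:50\tkernel\t\tahcich3: Timeout on slot 1 port 0\n"
    "Mar 11 09:00:00\tphp\t\t[pfBlockerNG] Starting cron process.\n"
)

for timestamp, process, message in events_for(SAMPLE, "ahcich3"):
    print(timestamp, process, message)
```

Filtering on the channel name (`ahcich3`) rather than the device name (`ada1`) keeps the `aprobe0` retries in view, since those entries no longer mention `ada1` after the device has detached.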


  • Netgate

    Looks like a failing disk to me.