Netgate Discussion Forum

    ZFS pool degraded - no dashboard warning?

    Off-Topic & Non-Support Discussion
    4 Posts 3 Posters 995 Views
    • q54e3w

      Maybe I'm missing a configuration step, but in case I'm not: it would be useful if the dashboard displayed a warning when a ZFS pool becomes degraded.
      I've had one of my two mirrored SATADOMs fail. SMART status shows green for the one remaining drive, but zpool status reports the pool as degraded.

      
      [2.4.2-RELEASE][root@pfSense.local.lan]/root: zpool status zroot
        pool: zroot
       state: DEGRADED
      status: One or more devices has been removed by the administrator.
      	Sufficient replicas exist for the pool to continue functioning in a
      	degraded state.
      action: Online the device using 'zpool online' or replace the device with
      	'zpool replace'.
        scan: none requested
      config:
      
      	NAME                     STATE     READ WRITE CKSUM
      	zroot                    DEGRADED     0     0     0
      	  mirror-0               DEGRADED     0     0     0
      	    ada0p4               ONLINE       0     0     0
      	    1900527549686068822  REMOVED      0     0     0  was /dev/ada1p4
      
      

      [Attached image: smartstatus.png]

      • Grimson (Banned)

        It has been said quite a few times: the devs see ZFS in pfSense as experimental, and there is no GUI integration yet.

        If you want to manage ZFS, install the cron package to schedule scrubs and use the mailreport package to send you status information.
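
As a rough illustration of the kind of check you could schedule, here is a minimal sketch in plain sh. The function name report_unhealthy is made up for this example, and how you wire it into the cron or mailreport packages is left as an assumption. It only relies on `zpool list -H -o name,health`, which prints one tab-separated "pool health" line per pool.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): reads "<pool> <health>" lines,
# as printed by `zpool list -H -o name,health`, from stdin.
# Prints a warning for every pool whose health is not ONLINE and
# returns 1 if any such pool was seen, 0 otherwise.
report_unhealthy() {
    rc=0
    while read -r pool health; do
        # Skip blank lines.
        [ -z "$pool" ] && continue
        if [ "$health" != "ONLINE" ]; then
            echo "WARNING: ZFS pool $pool is $health"
            rc=1
        fi
    done
    return "$rc"
}

# Intended use (run from cron, or anything that mails its output):
#   zpool list -H -o name,health | report_unhealthy
```

For the pool in the first post this would print a single warning line for zroot. A scrub can be scheduled the same way with `zpool scrub zroot`.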

        • q54e3w

          There's something funky going on with my box which I can't get to the bottom of.

          4am: weekly long SMART self-test ('smartctl -t long') runs on both drives. No problems reported.
          9am: drive drops out of the array; no errors reported on the ZFS array.

          
          Mar 11 10:02:34	php-fpm	12021	/status_logs.php: Session timed out for user 'admin' from: 192.168.20.101
          Mar 11 10:00:39	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
          Mar 11 10:00:00	php		[pfBlockerNG] Starting cron process.
          Mar 11 09:12:40	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
          Mar 11 09:12:40	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:12:40	kernel		(aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
          Mar 11 09:12:40	kernel		ahcich3: is 00000000 cs 00002000 ss 00000000 rs 00002000 tfd 80 serr 00000000 cmd 0000cd17
          Mar 11 09:12:40	kernel		ahcich3: Poll timeout on slot 13 port 0
          Mar 11 09:12:29	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
          Mar 11 09:11:52	ZFS		vdev is removed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
          Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): Periph destroyed
          Mar 11 09:11:52	ZFS		vdev state changed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
          Mar 11 09:11:52	ZFS		vdev state changed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
          Mar 11 09:11:52	ZFS		vdev state changed, pool_guid=7630167079469879448 vdev_guid=1900527549686068822
          Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
          Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 a8 50 6c dd 40 00 00 00 00 00 00
          Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
          Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
          Mar 11 09:11:52	kernel		(ada1:ahcich3:0:0:0): DSM TRIM. ACB: 06 01 00 00 00 40 00 00 00 00 01 00
          Mar 11 09:11:52	kernel		ahcich3: is 00000000 cs 00001000 ss 00001000 rs 00001000 tfd 80 serr 00000000 cmd 0000cc17
          Mar 11 09:11:52	kernel		ahcich3: Timeout on slot 12 port 0
          Mar 11 09:11:22	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
          Mar 11 09:11:22	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:11:22	kernel		(aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
          Mar 11 09:11:22	kernel		ahcich3: is 00000000 cs 00000800 ss 00000000 rs 00000800 tfd 80 serr 00000000 cmd 0000cb17
          Mar 11 09:11:22	kernel		ahcich3: Poll timeout on slot 11 port 0
          Mar 11 09:11:05	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
          Mar 11 09:10:34	kernel		(ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
          Mar 11 09:10:34	kernel		(ada1:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:10:34	kernel		(ada1:ahcich3:0:0:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 00 00 00 00 00 00
          Mar 11 09:10:34	kernel		ahcich3: is 00000000 cs 00000400 ss 00000000 rs 00000400 tfd 80 serr 00000000 cmd 0000ca17
          Mar 11 09:10:34	kernel		ahcich3: Timeout on slot 10 port 0
          Mar 11 09:10:04	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
          Mar 11 09:10:04	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:10:04	kernel		(aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
          Mar 11 09:10:04	kernel		ahcich3: is 00000000 cs 00000200 ss 00000000 rs 00000200 tfd 80 serr 00000000 cmd 0000c917
          Mar 11 09:10:04	kernel		ahcich3: Poll timeout on slot 9 port 0
          Mar 11 09:09:53	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
          Mar 11 09:09:16	kernel		(ada1:ahcich3:0:0:0): Error 5, Periph was invalidated
          Mar 11 09:09:16	kernel		(ada1:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:09:16	kernel		(ada1:ahcich3:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
          Mar 11 09:09:16	kernel		ahcich3: is 00000000 cs 00000100 ss 00000000 rs 00000100 tfd 80 serr 00000000 cmd 0000c817
          Mar 11 09:09:16	kernel		ahcich3: Timeout on slot 8 port 0
          Mar 11 09:08:46	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
          Mar 11 09:08:46	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:08:46	kernel		(aprobe0:ahcich3:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
          Mar 11 09:08:46	kernel		ahcich3: is 00000000 cs 00000080 ss 00000000 rs 00000080 tfd 80 serr 00000000 cmd 0000c717
          Mar 11 09:08:46	kernel		ahcich3: Poll timeout on slot 7 port 0
          Mar 11 09:08:35	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
          Mar 11 09:07:58	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
          Mar 11 09:07:58	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:07:58	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
          Mar 11 09:07:58	kernel		ahcich3: is 00000000 cs 00000040 ss 00000000 rs 00000040 tfd 80 serr 00000000 cmd 0000c617
          Mar 11 09:07:58	kernel		ahcich3: Timeout on slot 6 port 0
          Mar 11 09:07:28	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
          Mar 11 09:06:56	kernel		(aprobe0:ahcich3:0:0:0): Retrying command
          Mar 11 09:06:56	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:06:56	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
          Mar 11 09:06:56	kernel		ahcich3: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 80 serr 00000000 cmd 0000c517
          Mar 11 09:06:56	kernel		ahcich3: Timeout on slot 5 port 0
          Mar 11 09:06:26	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
          Mar 11 09:05:55	kernel		ada1: <SATA SSD S9FM02.1> s/n B4500756172400100699 detached
          Mar 11 09:05:55	kernel		ada1 at ahcich3 bus 0 scbus3 target 0 lun 0
          Mar 11 09:05:55	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
          Mar 11 09:05:55	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:05:55	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
          Mar 11 09:05:55	kernel		ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 00000000 cmd 0000c417
          Mar 11 09:05:55	kernel		ahcich3: Timeout on slot 4 port 0
          Mar 11 09:05:25	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
          Mar 11 09:04:53	kernel		(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
          Mar 11 09:04:53	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:04:53	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
          Mar 11 09:04:53	kernel		ahcich3: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd 80 serr 00000000 cmd 0000c317
          Mar 11 09:04:53	kernel		ahcich3: Timeout on slot 3 port 0
          Mar 11 09:04:23	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
          Mar 11 09:03:52	kernel		(aprobe0:ahcich3:0:0:0): Retrying command
          Mar 11 09:03:52	kernel		(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:03:52	kernel		(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
          Mar 11 09:03:52	kernel		ahcich3: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd 80 serr 00000000 cmd 0000c217
          Mar 11 09:03:52	kernel		ahcich3: Timeout on slot 2 port 0
          Mar 11 09:03:22	kernel		ahcich3: AHCI reset: device not ready after 31000ms (tfd = 00000080)
          Mar 11 09:02:50	kernel		(ada1:ahcich3:0:0:0): Retrying command
          Mar 11 09:02:50	kernel		(ada1:ahcich3:0:0:0): CAM status: Command timeout
          Mar 11 09:02:50	kernel		(ada1:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 a8 50 6c dd 40 00 00 00 00 00 00
          Mar 11 09:02:50	kernel		ahcich3: is 00000000 cs 00000000 ss 00000002 rs 00000002 tfd 40 serr 00000000 cmd 0000c117
          Mar 11 09:02:50	kernel		ahcich3: Timeout on slot 1 port 0
          Mar 11 09:00:38	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
          Mar 11 09:00:00	php		[pfBlockerNG] Starting cron process.
          Mar 11 08:00:38	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
          Mar 11 08:00:00	php		[pfBlockerNG] Starting cron process.
          Mar 11 07:00:38	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
          Mar 11 07:00:00	php		[pfBlockerNG] Starting cron process.
          Mar 11 06:13:16	pkg-static		pfSense-repo upgraded: 2.4.3.a.20180309.0651 -> 2.4.3.a.20180310.0802
          Mar 11 06:00:38	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
          Mar 11 06:00:00	php		[pfBlockerNG] Starting cron process.
          Mar 11 05:00:44	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
          Mar 11 05:00:00	php		[pfBlockerNG] Starting cron process.
          Mar 11 04:00:42	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
          Mar 11 04:00:00	php		[pfBlockerNG] Starting cron process.
          Mar 11 03:00:39	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
          Mar 11 03:00:00	php		[pfBlockerNG] Starting cron process.
          

          Not sure if this is hardware, FreeBSD or pfSense related at this stage.

          • Derelict (LAYER 8, Netgate)

            Looks like a failing disk to me.

            Chattanooga, Tennessee, USA
            A comprehensive network diagram is worth 10,000 words and 15 conference calls.
            DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
            Do Not Chat For Help! NO_WAN_EGRESS(TM)

            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.