Notifications for ZFS status
-
I'm aware of https://redmine.pfsense.org/issues/9226, but I wanted to raise this because I just recovered from both SSDs in a mirror dying on the same day. There wasn't a peep from the standard pfSense software, nor from that script, which I had configured (and which had previously alerted me to a capacity issue caused by excessive retention from ntopng).
Not sure what's gone on with the hardware, but one drive's controller doesn't respond and the other drive can't be imported. I made some recovery attempts with UFS Explorer, but it couldn't see the newest boot environment at all, so I had to revert to a backup and recreate the few config changes made after that backup.
The things that script checks for are a great start. Drives going offline or SMART status changes should trigger an alert.
Booting from the pfSense installer image made me wish that image included a few tools, such as smartmontools, to aid with recovery.
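For illustration, here is a rough sketch of how a SMART check result could be turned into alert text. It relies on the exit-status bitmask described in the smartctl(8) man page; the function names and the idea of calling `smartctl -H` from Python are my own assumptions, not something the existing script or pfSense does.

```python
import subprocess
from typing import List

# Bit meanings follow the EXIT STATUS section of the smartctl(8) man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command failed, or checksum error",
    3: "SMART status reports DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_smartctl_status(status: int) -> List[str]:
    """Return a human-readable message for every bit set in the exit status."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items() if status & (1 << bit)]


def check_device(device: str) -> List[str]:
    """Run `smartctl -H` on a device and decode its exit status.

    Hypothetical helper: the device path is supplied by the caller.
    """
    result = subprocess.run(["smartctl", "-H", device], capture_output=True, text=True)
    return decode_smartctl_status(result.returncode)
```

An empty list means smartctl reported nothing wrong. Note that a drive whose controller has stopped responding would most likely show up as "device open failed" rather than as a failing health status.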
-
If you have SMART enabled and it fails, I'd expect to see some sort of alert. Were you able to see anything logged before it failed?
-
@stephenw10 No emails were emitted; it just stopped routing and didn't come back when rebooted.
If a drive went offline and the pool changed to DEGRADED, I would only have learned about it from that script after the weekly cron job next ran successfully.
Something needs to detect the pool transitioning away from ONLINE or a drive going offline.
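To make that concrete, here is a minimal sketch of the detection part, assuming the health column from `zpool list -H -o name,health` is what we want to watch. The function names are mine, and how the alert gets delivered (email, pfSense notification, etc.) is left out entirely.

```python
import subprocess
from typing import Dict


def parse_zpool_health(output: str) -> Dict[str, str]:
    """Parse the tab-separated output of `zpool list -H -o name,health`."""
    pools = {}
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        pools[parts[0].strip()] = parts[1].strip()
    return pools


def unhealthy_pools(pools: Dict[str, str]) -> Dict[str, str]:
    """Return only the pools whose health is anything other than ONLINE."""
    return {name: health for name, health in pools.items() if health != "ONLINE"}


def current_health() -> Dict[str, str]:
    """Query the live system; -H gives script-friendly output without headers."""
    result = subprocess.run(
        ["zpool", "list", "-H", "-o", "name,health"],
        capture_output=True, text=True, check=True,
    )
    return parse_zpool_health(result.stdout)
```

Calling `unhealthy_pools(current_health())` would return an empty dict on a healthy system and something like `{"pfSense": "DEGRADED"}` once a mirror member drops out. This is still polling, so it only narrows the window compared to a weekly run.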
-
Hmm, do the drives now fail a SMART check? I guess one doesn't respond at all....
This seems like something that has been raised before. Unsurprisingly. I thought there was an open ticket for it.....
-
Oh I'm thinking of the bug you linked.

Here this didn't help because weekly tests were not frequent enough. Some active service that triggered on status change would be better I agree. However that script looks pretty light apart from the scrub every time it runs. You could run that far more frequently if you removed that.
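As a rough sketch of what a lighter, scrub-free check could do when run every few minutes: remember the last seen health of each pool and only report when it changes. The state file location and the function names here are assumptions for illustration; this is not how the linked script works.

```python
import json
from typing import Dict, List, Tuple


def load_state(path: str) -> Dict[str, str]:
    """Load the previously recorded pool health, or {} on the first run."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_state(path: str, state: Dict[str, str]) -> None:
    """Persist the current pool health for the next run."""
    with open(path, "w") as fh:
        json.dump(state, fh)


def detect_transitions(
    previous: Dict[str, str], current: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (pool, old_health, new_health) for every pool whose health changed.

    A pool that was known before but is absent now is reported as MISSING;
    a pool seen for the first time has old_health UNKNOWN.
    """
    changes = []
    for pool in sorted(set(previous) | set(current)):
        old = previous.get(pool, "UNKNOWN")
        new = current.get(pool, "MISSING")
        if old != new:
            changes.append((pool, old, new))
    return changes
```

Each cron run would load the old state, gather the current health (for example from `zpool list -H -o name,health`), send a notification for each returned transition, then save the new state. The first run reports every pool as UNKNOWN to its current state, which doubles as a test that notifications work.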
-
@stephenw10 said in Notifications for ZFS status:
Here this didn't help because weekly tests were not frequent enough. Some active service that triggered on status change would be better I agree. However that script looks pretty light apart from the scrub every time it runs. You could run that far more frequently if you removed that.
It would be ideal for the base OS to be watching for status change. I just wanted to put that thought out there -- polling consumes extra power, isn't ever frequent enough, etc.
I'm not sure whether this would have actually helped in my situation, as I don't know if there was enough time between the two drive failures to do anything to rebuild the pool.
-
While I'm thinking about ZFS-related things, I have just enough modifications to pfSense that require backup that a built-in ZFS send/recv solution would be a good addition.
If you're wondering what those non-packaged mods are:
- https://github.com/clara-j/pfsense_zfs_check
- wpa_supplicant setup for AT&T fiber
- https://github.com/LeonStraathof/pfsense-speedtest-widget
I think all of these could be handled if there was a designated directory that was included in backups and/or packed into config.xml.
-
Consider using :

it permits you to store the files you've added yourself in the main pfSense config.xml. As this file is already backed up regularly (right?), you have everything in one place.
ZFS is pretty well protected against "shoot yourself in the foot" situations like shutting down the system by ripping out the power (get a UPS anyway!).
pfSense shouldn't really contain any important data except for the above-mentioned config.xml.
Also: apply the Keep It Simple rule. The day you need to set up or reinstall again, you are probably doing it to regain Internet access, and KIS helps you do that ASAP.
You don't want to take any complicated steps at that moment.
@ohmantics said in Notifications for ZFS status:
https://github.com/LeonStraathof/pfsense-speedtest-widget
Speed tests shouldn't be executed on a router (firewall). You should 'speedtest' through your router/firewall.
-
@Gertjan said in Notifications for ZFS status:
Speed tests shouldn't be executed on a router (firewall).
Generally that's true but not always. For example my WAN here is ~80Mbps and that really presents no problem for the client on the firewall itself.
Also, if you understand the limitation, it can still be useful for comparative results even if it can't reach the full rate, or to confirm you are not limited to some much lower rate.
But, yes, to test the full bandwidth possible with the pfSense install you should test through it and not to or from it.

-
@Gertjan said in Notifications for ZFS status:
Consider using :

it permits you to store the files you've added yourself in the main pfSense config.xml. As this file is already backed up regularly (right?), you have everything in one place.
I've been using pfSense for over a decade and I never noticed this package. Looks like an excellent solution to preserve some of these customizations. Kinda a shame that none of the recipes online for modifying pfSense use this.
@ohmantics said in Notifications for ZFS status:
https://github.com/LeonStraathof/pfsense-speedtest-widget
Speed tests shouldn't be executed on a router (firewall). You should 'speedtest' through your router/firewall.
This is just a convenience widget, not set up as a cron job. It gives one data point among many to consider when debugging performance problems. I don't see an issue with that.