Data Compression: Is it Worth it?



  • As I organize my ever-growing data collection, I am beginning to wonder whether data compression is worth it. Of course I have compressed file types, but I've never used compression on an entire volume. Would I see a significant increase in writable disk space, let's say anything above 5%, by compressing an entire volume of uncompressed file types?



  • How much more fun will the compression cause if you have to try to recover your data?

    What type of data is going to be on the volume and is it already compressed?



  • Based on ZFS numbers, compression is quite useful.



  • @stan-qaz:

    How much more fun will the compression cause if you have to try to recover your data?

    My data backup system is becoming more robust every few months. Soon I'll have offsite backups of my server. But still, you raise a good question and I don't have an answer.

    @stan-qaz:

    What type of data is going to be on the volume and is it already compressed?

    The data would be uncompressed formats, of course. In my case, .wav, .iso, and other image formats.

    @Harvy66:

    Based on ZFS numbers, compression is quite useful.

    Good to know. I do plan to switch over to a ZFS system within this year.



  • A wise old man with a beard once told me it depends on the data type: anywhere from 99% compression to next to none.

    As in, for example: you have an *.mkv movie in H264; compression won't do much for that, as it is already compressed. The same apparently goes for *.jpg. As I recall the story, it was this: if the bits/bytes in the actual file can be summarized by a mathematical formula, compression will work, since the redundant bits/bytes are removed and just the formula is written in their place.
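
    A rough way to see that effect for yourself is the minimal sketch below. It assumes Python's standard zlib module as a stand-in compressor, and it uses random bytes as a stand-in for already-compressed data such as H264 or JPEG; the function name is just made up for illustration.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 6)) / len(data)


# Highly redundant input: one short pattern repeated many times.
redundant = b"ABCD" * 250_000

# Assumption: random bytes behave roughly like already-compressed media,
# because neither has obvious repetition left for the compressor to remove.
random_like = os.urandom(1_000_000)

print(f"redundant:   {compressed_fraction(redundant):.4f}")
print(f"random-like: {compressed_fraction(random_like):.4f}")
```

    The repeated pattern should shrink to a tiny fraction of its size, while the random buffer should stay at about its original size or grow slightly.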



  • With good backups, recovery isn't an issue for what is actually backed up, but we found that the recovery company we used at work did not have much success with compressed data. It's been a few years now, so the specifics escape me, but it would be worth a call and a discussion if you anticipate using someone for data recovery.

    A lot of image formats are already well compressed, and an ISO file may also have much of its content already compressed. You might want to do some simple testing by compressing some chunks of your data to see just how well it does.
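
    Something like the following could serve as that simple test. It is only a sketch: zlib stands in for whatever algorithm the filesystem would actually use, the function name and the 1 MiB sample size are arbitrary choices of mine, and it assumes a chunk that doesn't shrink would simply be stored as-is.

```python
import zlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # sample up to 1 MiB from the start of each file


def estimate_savings(root: str, chunk_size: int = CHUNK_SIZE) -> float:
    """Return the fraction of sampled bytes saved by zlib (0.0 = nothing saved)."""
    original = 0
    compressed = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            chunk = handle.read(chunk_size)
        if not chunk:
            continue  # skip empty files
        original += len(chunk)
        # Assumption: a chunk that does not shrink is stored uncompressed,
        # so it never counts as more than its original size.
        compressed += min(len(chunk), len(zlib.compress(chunk, 6)))
    if original == 0:
        return 0.0
    return 1.0 - compressed / original
```

    Calling estimate_savings() on a directory holding a representative sample of the data returns a fraction such as 0.08 for 8%, which can be compared directly against the 5% threshold from the first post. Since only the start of each file is sampled, treat the result as a rough estimate.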

    ZFS has a lot of good features, but if you are waiting a year and are on Linux, BTRFS might be worth considering: your distribution may support it while not supporting ZFS because of ZFS's licensing issues with Linux.

    Today, for my use, uncompressed ext4 is good enough; adding a drive for more space when needed is still the easy path.



  • BTRFS is not going to be production ready for a long time. It also has a lot of logic design issues that make it less than ideal for sysadmins. It wasn't designed by sysadmins, so its developers don't know the kinds of issues sysadmins have to deal with.

    One example is how volumes are managed. In BTRFS, each volume is separate from the others. This means you can have the same volume mounted at multiple points in your file system hierarchy. The BTRFS developers specifically designed it this way: they didn't want dependencies between volumes, which gives some "cool" flexibility.

    What this means:

    1. You can't atomically snapshot across volumes. If you have one volume under another volume and you snapshot the parent volume, the snapshot does not include the child volume. You have to write a script to traverse the parent volume for any child volumes and snapshot them separately, but this is not transactional, so the snapshots can get out of sync with each other if someone writes data during this window. It also means the snapshots are attached to the children directly, and not to the parent, but because a volume can have multiple parents, anyone else looking at the volume may get confused as to why there is a snapshot.

    2. Volumes do not know about each other; they must be specifically mounted. With ZFS, mounts are metadata in the FS, so they are actually part of the snapshots, and you don't need to remount on every reboot. Don't lose your BTRFS mount scripts, since mounts are not part of the snapshots.

    3. Since the same volume may be mounted under multiple parents, or a volume can even be mounted under itself, making it recursive, it is nearly impossible to figure out how large a volume really is. BTRFS has a hard time knowing how much space is actually used by a given volume. The tools let you know how much free space is available at the drive level, but they can't tell you what is using that space. Some tools can make educated guesses, but they can never really know, and the answer could be off by a large factor.

    There are a lot of other issues, but these stood out the most when I read about BTRFS a long while ago. These "features" are baked in and would require a major rewrite of BTRFS, from what I understand. Someone might come up with an interesting solution, but BTRFS was not designed for sysadmins, so stuff like atomic snapshots and knowing how much space a volume is consuming was an afterthought that might be logically impossible.

    BTRFS also has some strange failure cases and corruption. ZFS does not need fsck because the FS is either in a stable state or it is beyond repair. Very binary, because it is a transactional FS. BTRFS has a fsck tool because it is not truly a transactional FS and it can be left in unstable/corrupted states that must be repaired.

    ZFS is not perfect, but it is designed well, simple to understand, and just works. There are bugs to be found, but it is 15+ years ahead of BTRFS in bug fixes. An FS is not a simple thing.

    As for compression, the original topic, the ZFS people said they recommend just leaving compression on, even for volumes where you store pre-compressed data. The CPU overhead is negligible, unless you have a really weak CPU, like past-gen Intel Atoms.

    P.S. Some of my BTRFS info could be outdated. Please correct me if it is no longer valid.


  • LAYER 8 Netgate

    On this dataset, I find it totally worth it.

    ![Screen Shot 2015-01-17 at 11.13.47 AM.png](/public/imported_attachments/1/Screen Shot 2015-01-17 at 11.13.47 AM.png)



  • @Harvy66:

    BTRFS is not going to be production ready for a long time, it also has a lot of logic design issues that make it less than ideal for sysadmins. It wasn't designed by sysadmins, so they don't know the kinds of issues sysadmins have to deal with.

    Thanks for that information, it is interesting stuff to think about and I've filed a copy away for review prior to my next Linux upgrade. OpenSuse Linux is shipping BTRFS as the default filesystem for some partition types today in version 13.2 and they intend to fully move to it at some point. They have jumped too early before on technical decisions, recently on systemd, so I am a bit leery of their choices at this point and don't wish to have another mess like I did when they went to Reiser (spelling?) as a file system and then it went away.

    I don't see ZFS as a realistic option for my Linux needs due to the non-support by Linux distributions but since I'm not wedded to Linux as an operating system you make some interesting points that recommend an OS that offers native support for ZFS. For someone wedded to Linux the use of the ZFS format from the project might be a good option but it seems like it would be a lot safer to just go with a BSD and native ZFS.



  • @stan-qaz:

    I hear the recent FreeBSD implementation of ZFS supports feature flags, which allow upgrades without changing the current ZFS, so you can properly snapshot and roll back your OS. Say you have 10.1 installed: snapshot before upgrading to 10.2, and if something goes wrong, just switch back. You could even maintain two snapshots and reboot into the other, or possibly run it in a jail.

    Always back up your stuff :p

