Native File System Compression on Mac OS X

Disk drive capacities are huge these days and many file formats already use compression, yet there is still a need for native compression in the file system. Apple added file compression support to HFS+ in Snow Leopard (10.6). Perhaps the limited capacity of the initial SSDs in the MacBook Air was a motivating factor. Prior to 10.6, portions of the OS used compression on system files, like man pages and log files, but those methods were not entirely transparent.

We recently did a head-to-head compression comparison between HFS+ and ZFS. This was not an exhaustive comparison and we intentionally used an OS image as our working set since HFS+ compression is currently only recommended for system files. Other working sets will likely have different outcomes. We also acknowledge that there are other file systems on OS X, for example NTFS, that have native compression support.

Our comparison tests began with a stock 10.6 OS install. We then inflated the data (using the ditto command) to form our baseline working set. The same disk drive was used for each test, and we used ditto again to copy the working set into each file system. ZFS was configured as a single-drive zpool. The table below summarizes the results.
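For readers who want to repeat the copy steps, here is a minimal sketch of how the two ditto invocations might be scripted. The post does not say which flags we passed, so treat the flag choice as an assumption: `--hfsCompression` is the documented ditto(1) option for compressing on the destination, and `--nopreserveHFSCompression` is assumed here as the way to get an inflated copy. The function names are hypothetical.

```python
import subprocess
from typing import List


def build_ditto_command(src: str, dst: str, compress: bool) -> List[str]:
    """Return the argv for a ditto copy with HFS+ compression on or off.

    Assumption: the inflated baseline is produced by telling ditto not to
    preserve existing HFS+ compression; the original test may have differed.
    """
    # --hfsCompression asks ditto to compress content on the destination.
    # --nopreserveHFSCompression asks ditto not to carry over existing
    # HFS+ compression, so the copy ends up inflated.
    flag = "--hfsCompression" if compress else "--nopreserveHFSCompression"
    return ["ditto", flag, src, dst]


def run_ditto(src: str, dst: str, compress: bool) -> None:
    """Run the copy. Only meaningful on Mac OS X 10.6 or later."""
    subprocess.run(build_ditto_command(src, dst, compress), check=True)
```

Building the argument list separately from running it makes it easy to inspect the exact command before touching a disk.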

Observations

HFS+ does an excellent job compressing the system files in the working set. It yielded an impressive 43% overall reduction. For users with limited drive capacity this is clearly a big win.

We observed that the HFS+ yield was better than the ZFS yield (43% versus 35%) for the same compression algorithm (gzip-5). This makes sense since HFS+ can store multiple compressed objects in a single attribute b-tree node. One possible downside of using metadata instead of file data for compression storage is a performance impact if the active set of compressed files is large. Only a limited amount of metadata can be cached, whereas file data caching typically has more resources available.
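To get a rough feel for what a gzip-5 yield looks like on your own files, the sketch below compresses a directory tree chunk by chunk with Python's zlib at level 5 and reports the percentage saved. It is only an approximation: the 64 KB chunk size and the function names are assumptions made for illustration, and it ignores all on-disk overhead (b-tree nodes, block rounding, metadata), so it will not reproduce the HFS+ or ZFS figures above.

```python
import os
import zlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size, for illustration only


def compressed_size(data: bytes, level: int = 5, chunk_size: int = CHUNK_SIZE) -> int:
    """Total size after compressing each chunk independently with zlib."""
    total = 0
    for start in range(0, len(data), chunk_size):
        total += len(zlib.compress(data[start:start + chunk_size], level))
    return total


def reduction_percent(original: int, compressed: int) -> float:
    """Space saved, as a percentage of the original size."""
    if original == 0:
        return 0.0
    return 100.0 * (original - compressed) / original


def tree_reduction(root: str, level: int = 5) -> float:
    """Estimate the overall reduction for every regular file under root."""
    original = 0
    compressed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files hold data.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                data = handle.read()
            original += len(data)
            compressed += compressed_size(data, level)
    return reduction_percent(original, compressed)
```

Calling `tree_reduction("/some/path")` on a copy of a system folder gives a single percentage that can be compared, loosely, with the yields quoted above.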

The stock uncompressed ZFS footprint is impressive if you consider that all the data has checksums, all the metadata is replicated (at least twice) and ZFS has a much larger storage capacity. You get much more and it costs less in terms of disk footprint.

HFS+ and ZFS Native Compression

Summary

Both file systems use proven gzip compression in a transparent fashion. HFS+ compression is currently limited to system files and is opt-in per object. ZFS compression is enabled per file system, but it will opt out (per file) if the yield is not significant. ZFS also has a few more compression choices and includes data integrity in the price of admission.
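The opt-out behavior can be illustrated with a small sketch: compress a chunk of data, and keep the compressed form only if it saves enough space. This is not the ZFS implementation. The 12.5% threshold, the chunk granularity, and the function names are assumptions chosen for the example.

```python
import os
import zlib
from typing import Tuple

MIN_SAVING = 0.125  # assumed threshold: keep compression only if it saves 12.5%


def store_chunk(chunk: bytes, level: int = 6) -> Tuple[bool, bytes]:
    """Return (is_compressed, payload) for one chunk of data."""
    packed = zlib.compress(chunk, level)
    # Opt out when the yield is too small to be worth the decompression cost.
    if len(packed) <= len(chunk) * (1 - MIN_SAVING):
        return True, packed
    return False, chunk


def load_chunk(is_compressed: bool, payload: bytes) -> bytes:
    """Recover the original bytes from a stored chunk."""
    return zlib.decompress(payload) if is_compressed else payload


if __name__ == "__main__":
    for label, data in (("text-like", b"man page " * 500), ("random", os.urandom(4096))):
        flag, payload = store_chunk(data)
        print(label, "compressed" if flag else "stored raw", len(data), "->", len(payload))
```

Highly repetitive data is stored compressed, while already-compressed or random data falls back to raw storage, so readers never pay to decompress data that did not shrink.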

16 Comments
  1. Perhaps strange question.

    Is the ZFS or Z-410 PPC-compatible? My NAS drive with ZFS crashed, probably not the disk.

    Yours Uffe

  2. HFS+ compression is not exactly limited to system objects: afsctool (available at http://web.me.com/brkirch/brkirchs_Software/afsctool/afsctool.html ) will compress whatever you want, at any gzip compression level (I’ve done so for many third-party apps in /Applications, and nothing broke!). The source code is included.

    • Thanks for the tip Dave. AFAIK, Apple hasn’t yet formally supported generic HFS+ compression. We haven’t seen an API/KPI or any mention in the HFS Plus Volume Format document. As with any undocumented feature, caveat emptor.

      • ditto(1) also supports HFS+ compression, with the --hfsCompression command-line option.

        --hfsCompression
        When copying files or extracting content from an archive, if the destination is an HFS+ volume that supports compression, all the content will be compressed if appropriate. This
        is only supported on Mac OS X 10.6 or later, and is only intended to be used in installation and backup scenarios that involve system files. Since files using HFS+ compression are
        not readable on versions of Mac OS X earlier than 10.6, this flag should not be used when dealing with non-system files or other user-generated content.

  3. …would love to see the CPU cost for various compression options in ZFS. Perhaps some beta testers can set up and benchmark throughput on a ramdisk zpool :)

  4. I am currently obligated to buy multiple Terabytes of storage. Are there any suggestions as to how to approach the purchase with an eye to using your ZFS file system in the future?

  5. So very happy to see continuing progress on this, and looking forward to the beta and future improvements as well (particularly security related). One thing I would be curious to see is a modern impression of performance penalties under heavy load for the various compression schemes. Comparisons (under Solaris for example) of lzjb to gzip such as here (http://don.blogs.smugmug.com/tag/lzjb/) found some pretty massive CPU utilization with gzip, even though it did produce superior compression. In applying compression system wide it is probably worth testing this, although perhaps optimizations made since that time have negated the issue to a large extent. If you are able to utilize hardware acceleration, perhaps via GPU or other add-ons for heavier users, that too would of course change the picture.

    Thank you for your hard work, and looking forward to seeing more in the coming months!

  6. Interesting. Was deduplication in force with this test? I find myself wondering if there is a dual benefit to being able to leverage dedup along with standard gzip compression.

    For archival purposes, e.g. family photo galleries made up of scans and tons of raw files, the gzip-9 option along with dedup would be amazing for keeping archives live and accessible.

    W.

    • No, dedup wasn’t enabled for this experiment. My guess is that the system file working set won’t have a ton of duplicate data.

  7. Hey this project is really cool, I am a PC user and Z-410 is enough to make me consider a change to Mac. Would love to hear more about the project in regular blogs.

  8. I have been messing with the beta Linux ZFS code, and I have a question. Is ZFS really a filesystem? I don’t see a mkfs.zfs under Linux. Is that a missing feature? Does your ZFS have the ability to format a drive as ZFS? Then is this ZFS formatted drive mountable as a ZFS volume?

  9. Z410 sounds like a very promising technology and I look forward to more information.

    The intro material I have seen (Ars), however, suggests this is for users with a super-high priority on data integrity. As a foundation for a laptop user who has gotten suckered into using an SSD for a boot drive, that sounds like the right mix, leading to a synchronized, clean image when I get home.

    But this first post, oddly, is about what seems a tangential feature, one that is somewhat irrelevant to the main thrust of Z410. I understand gzip might be great for all sorts of text files or perhaps Word documents, but it seems few of the admins who will want Z410 have compression (since the whole notion is that disk is cheap) anywhere near the top of their priorities.

  10. Which hardware do you use for the ZFS disk setup? I found no FireWire enclosures that mount the disks as JBOD with 4 or more drives. Help wanted.

    M.

  11. How about a post discussing ZFS on modern 4k sector disks?

    • Excellent idea. We have already started an investigation into Advanced Format drives and their impact on ZFS.