Storing your stuff with clever filesystems: ZFS and tmpfs

The filesystem is a a critical component of just about any operating system, however it’s often overlooked. When setting up a new server, the default filesystem options are often ticked and never thought about again. However, there exist a couple of filesystems which can provide some extraordinary features and speed. I’m talking about ZFS and tmpfs.

ZFS was originally developed by Oracle for their Solaris operating system, but has now been open-sourced and is freely available on linux. Tmpfs is a temporary file system which uses system memory to provide fast temporary storage for files. Together, they can provide outstanding reliability and speed for not very much effort.

Hard disk capacity has increased exponentially over the last 50 years. In the 1960s, you could rent a 5MB hard disk from IBM for the equivalent of $130,000 per month. Today you can buy for less than $600 a 12TB disk – a 2,400,000 times increase.

As storage technology has moved on, the filesystems which sit on top of them ideally need to be able to access the full capacity of those ever increasing disks. Many relatively new, or at least in-use, filesystems have serious limitations. Akin to “640K ought to be enough for anybody”, the likes of the FAT32 filesystem supports files which are at most 4GB on a chunk of disk (a partition) which can be at most 16TB. Bear in mind that arrays of disks can provide a working capacity of many times that of a single disk. You can buy the likes of a supermicro sc946ed shelf which will add 90 disks to your server. In an ideal world, as you buy bigger disks you should be able to pile them into your computer and tell your existing filesystem to make use of them, your filesystem should grow and you won’t have to remember a different drive letter or path depending on the hardware you’re using.

ZFS is a 128-bit file system, which means a single installation maxes out at 256 quadrillion zettabytes. All metadata is allocated dynamically so there isn’t the need to pre-allocate inodes and directories can have up to 2^48 (256 trillion) entries. ZFS provides the concept of “vdevs” (virtual devices) which can be a single disk or redundant/striped collections of multiple disks. These can be dynamically added to a pool of vdevs of the same type and your storage will grow onto the fresh hardware.

A further consideration is that both disks of the “spinning rust” variety and SSDs are subject to silent data corruption, i.e. “bit rot”. This can be caused by a number of factors even including cosmic rays, but the consequence is read errors when it comes time to retrieve your data. Manufacturers are aware of this and buried in the small print for your hard disk will be values for “unrecoverable read errors” i.e. data loss. ZFS works around this by providing several mechanisms:

Checksums for each block of data written.
Checksums for each pointer to data.
Scrub – Automatically validates checksums when the system is idle.
Multiple copies – Even if you have a single disk, it’s possible to provide redundancy by setting a copies=n variable during filesystem creation.
Self-healing – When a bad data block is detected, ZFS fetches the correct data from a redundant copy and replaces it with the correct data.

An additional bonus to ZFS is its ability to de-duplicate data. Should you be working with a number of very similar files, on a normal filesystem, each file will take up space proportional to the amount of data that’s contained. As ZFS keeps checksums of each block of data, it’s able to determine if two blocks contain identical data. ZFS therefore provides the ability to have multiple pointers to the same file and only store the differences between them.

ZFS also provides the ability to take a point in time snapshot of the entire filesystem and roll it back to a previous time. If you’re a software developer, got a package that has 101 dependencies and you need to upgrade it? Afraid to upgrade it in case it breaks things horribly? Working on code and you want to roll back to a previous version? ZFS snapshots can be run with cron or manually and provide a version of the filesystem which can be used to extract previous versions of overwritten or deleted files or used to roll everything back to a point in time when it worked.

Similar to deduplication, a snapshot won’t take up any disk extra space until the data starts to change.

The other filesystem worth mentioning is tmpfs. Tmpfs takes part of the system memory and turns it into a usable filesystem. This is incredibly useful for systems which create huge numbers of temporary files and attempt to re-read them. Tmpfs is also just about as fast as a filesystem can be. Compared to a single SSD or a RAID array of six disks, tmpfs blows them out of the water speed wise.

Creating a tmpfs filesystem is simple:
First create your mountpoint for the disk:

mkdir /mnt/ramdisk

Then mount it. The options are saying make it 1GB in size, it’s of type tmpfs and to mount it at the previously created mount point:

mount –t tmpfs -o size=1024m tmpfs /mnt/ramdisk

At this point, you can use it like any other filesystem:

df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1  218G 128G   80G  62% /
/dev/sdb1  6.3T 2.4T  3.6T  40% /spinnyrust
tank       946G 3.5G  942G   1% /tank
tmpfs      1.0G 391M  634M  39% /mnt/ramdisk

Author

Eoin Malins

View all posts

Oxford Protein Informatics Group

or "OPIG" to friends

Storing your stuff with clever filesystems: ZFS and tmpfs

Author