Review of btrfs, Linux’s perpetually half-finished file system




We do not recommend allowing btrfs to directly manage a complex set of disks, floppy or otherwise.

Btrfs, short for “B-Tree File System” and frequently pronounced “butter” or “butter eff ess,” is the most advanced file system in the mainline Linux kernel. In some ways, btrfs simply aims to supplant ext4, the default file system for most Linux distributions. But btrfs also aims to provide next-generation features that break the simple “file system” mold, combining the functionality of a RAID array manager, a volume manager, and more.

We have good news and bad news on this front. The good news is that btrfs is a perfectly cromulent single-disk replacement for ext4. But if you’re hoping to replace ZFS, or a more complex stack built on discrete RAID management, volume management, and a simple file system, the picture isn’t so rosy. While the btrfs project has fixed many of the glaring issues it launched with in 2009, other issues remain essentially unchanged 12 years later.

History

Chris Mason is the founding developer of btrfs, which he began working on in 2007 while employed at Oracle. This leads many people to believe that btrfs is an Oracle project; it is not. The project belonged to Mason rather than to his employer, and it remains a community project unencumbered by corporate ownership to this day. In 2009, btrfs 1.0 was accepted into the mainline Linux kernel with version 2.6.29.

Although btrfs went mainline in 2009, it wasn’t actually production ready. For the next four years, creating a btrfs file system would greet the admin who dared to mkfs a btrfs with the following deliberately scary message, and proceeding required a non-default answer of Y:

Btrfs is a new filesystem with extents, writable snapshotting,
support for multiple devices and many more features.

Btrfs is highly experimental, and THE DISK FORMAT IS NOT YET
FINALIZED. You should say N here unless you are interested in
testing Btrfs with non-critical data.

Since Linux users are Linux users, many chose to ignore this warning and, unsurprisingly, a lot of data was lost. This four-year beta may have had a lasting impact on the btrfs developer community, which in my experience tended to fall back on “well, it’s all beta anyway” whenever users reported problems. That attitude persisted well after mkfs.btrfs lost its scary dialog at the end of 2013.

It has now been almost eight years since the “experimental” tag was removed, but many age-old btrfs issues remain unanswered and unchanged. So we’ll say it one more time: as a single-disk file system, btrfs has been stable and has for the most part performed well for years. But the deeper you dig into the new features btrfs offers, the shakier the ground you walk on, and those features are what we’re focusing on today.

Features

Btrfs has only one real competitor in the Linux and BSD file system space: OpenZFS. It’s almost impossible to avoid comparing and pitting btrfs against OpenZFS, as the Venn diagram of their respective feature sets is little more than a single, slightly lumpy circle. But we will avoid directly comparing and contrasting the two as much as possible. If you are an OpenZFS administrator, you already know the differences; and if you are not, the comparisons aren’t very useful to you.

In addition to being a simple single-disk file system, btrfs offers multiple-disk topologies (RAID), volume-managed storage (think Linux’s Logical Volume Manager), atomic copy-on-write snapshots, asynchronous incremental replication, automatic healing of corrupted data, and inline compression.
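
To make those features a little more concrete, here is a minimal shell sketch of the snapshot, replication, and compression workflow. The device names, mount points, subvolume paths, and remote host are all hypothetical, and the exact options you want will vary by distribution and kernel version.

# Mount a btrfs volume with transparent zstd compression (device and path are examples)
mount -o compress=zstd /dev/sdb1 /mnt/pool

# Take an atomic, read-only copy-on-write snapshot of a subvolume
btrfs subvolume snapshot -r /mnt/pool/data /mnt/pool/.snapshots/data-today

# Replicate that snapshot to another machine incrementally and asynchronously;
# -p names an older snapshot already present on both sides
btrfs send -p /mnt/pool/.snapshots/data-yesterday /mnt/pool/.snapshots/data-today |
  ssh backuphost btrfs receive /backup/pool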

Comparison with legacy storage

If you wanted to build a system with similar functionality without btrfs or ZFS, you would need a stack of discrete layers: mdraid at the bottom for RAID, LVM above it for snapshots, and then a file system such as ext4 or xfs as the topping on your storage sundae.
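
For reference, standing up that legacy stack looks roughly like the sketch below. The device names, array level, and volume sizes are purely illustrative.

# RAID layer: a three-disk RAID5 array built with mdadm (device names are examples)
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

# Volume management layer: LVM on top of the array
pvcreate /dev/md0
vgcreate tank /dev/md0
lvcreate -L 500G -n data tank

# File system layer: plain ext4 on the logical volume
mkfs.ext4 /dev/tank/data
mount /dev/tank/data /mnt/data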

Unfortunately, an mdraid + LVM + ext4 storage stack still ends up missing some of btrfs’ theoretically most compelling features. LVM offers atomic snapshots but no direct snapshot replication. Neither ext4 nor xfs offers inline compression. And while mdraid can offer data healing if you layer in the dm-integrity target, doing so is pretty painful.

The dm-integrity target defaults to an extremely collision-prone crc32 hash algorithm; it requires the target devices to be completely overwritten during initialization; and it also requires every block of a replaced disk to be completely overwritten after a failure, above and beyond the full-disk write already required at initialization.
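
As a rough illustration, layering dm-integrity underneath an mdraid mirror with a stronger hash might look something like the following. The device names are hypothetical, and integritysetup’s options vary somewhat between cryptsetup releases, so treat this as a sketch rather than a recipe.

# Format each backing disk for dm-integrity (this step wipes and fully writes the device)
integritysetup format /dev/sdb --integrity sha256
integritysetup open /dev/sdb int-sdb --integrity sha256
integritysetup format /dev/sdc --integrity sha256
integritysetup open /dev/sdc int-sdc --integrity sha256

# Build the mdraid mirror on the integrity-protected mappings instead of the raw disks
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/mapper/int-sdb /dev/mapper/int-sdc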

In short, you really can’t replicate btrfs’ promised feature set with a legacy storage stack. To get that feature set, you need either btrfs or ZFS.

Btrfs multidisk topologies

Now that we’ve seen where a legacy storage stack falls short, it’s time to look at where btrfs itself stumbles. The first place we’ll look is btrfs’ multiple-disk topologies.

Btrfs offers five multiple-disk topologies: btrfs-raid0, btrfs-raid1, btrfs-raid10, btrfs-raid5, and btrfs-raid6. Although the documentation tends to refer to these topologies more simply (for example, raid1 rather than btrfs-raid1), we strongly recommend keeping the prefix in mind, because in some cases these topologies differ radically from their conventional counterparts.

Topology: conventional version vs. btrfs version

RAID0: Conventional: simple stripe; lose any disk, lose the array. Btrfs: simple stripe; lose any disk, lose the array.
RAID1: Conventional: simple mirror; all data blocks on disk 0 and disk 1 are identical. Btrfs: guaranteed redundancy; copies of every block are kept on two separate devices.
RAID10: Conventional: striped mirror sets; for example, one stripe across three pairs of mirrored disks. Btrfs: striped mirror sets; for example, one stripe across three pairs of mirrored disks.
RAID5: Conventional: diagonal-parity RAID; single parity (one parity block per stripe), fixed stripe width. Btrfs: diagonal-parity RAID; single parity (one parity block per stripe), variable stripe width.
RAID6: Conventional: diagonal-parity RAID; double parity (two parity blocks per stripe), fixed stripe width. Btrfs: diagonal-parity RAID; double parity (two parity blocks per stripe), variable stripe width.

As you can see above, btrfs-raid1 differs quite radically from its conventional analogue. To understand how, let’s consider a hypothetical “mutt” collection of disks of mismatched sizes. If we have one 8T disk, three 4T disks, and one 2T disk, it’s difficult to build a useful conventional RAID array out of them. For example, a conventional RAID5 or RAID6 would have to treat them all as 2T drives, producing at most 8T of usable storage after parity.

However, btrfs-raid1 offers a very interesting premise. Since it does not statically pair disks together, it can use the entire disk collection without waste. Whenever a block is written to a btrfs-raid1, it is written identically to two separate disks, any two separate disks. Since there are no fixed pairings, btrfs-raid1 is free to fill all of the disks at roughly the same rate, in proportion to their free capacity.
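
Here is a sketch of what that looks like in practice, using hypothetical device names for the mixed collection above:

# Create a btrfs-raid1 across five mismatched disks (one 8T, three 4T, one 2T)
mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde
mount /dev/sda /mnt/pool

# Show how much usable space the raid1 profile actually yields on this collection
btrfs filesystem usage /mnt/pool

With these particular disks, the raid1 profile should be able to keep two copies of roughly 11T of data, about half of the 22T total, rather than being limited by the smallest disk in the set.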

The btrfs-raid5 and btrfs-raid6 topologies are somewhat similar to btrfs-raid1 in that, unlike their conventional counterparts, they can handle mismatched disk sizes by dynamically varying the stripe width as the smaller disks fill up. Neither btrfs-raid5 nor btrfs-raid6 should be used in production, however, for reasons we’ll cover shortly.
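
If you already have data sitting on a btrfs-raid5 or btrfs-raid6 array, btrfs can convert it to a different profile in place with a balance operation. A minimal sketch, assuming the array is mounted at /mnt/pool and has enough free space to hold the new profile:

# Convert both data and metadata from their current profile to raid1, in place
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool

# Check on the conversion's progress from another terminal
btrfs balance status /mnt/pool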

The btrfs-raid10 and btrfs-raid0 topologies are much closer to their conventional counterparts, and in most cases they can be seen as direct replacements sharing the same strengths and weaknesses.
