Software Engineering Blog

Tips and tricks from the life of a system administrator.

SSD caching using Linux

BTRFS + bcache + RAID5

Windows users are used to distributing their folders across several disks, but as a Linux user I just want to keep everything neatly in my home folder. The only problem is that the SSD does not provide enough (affordable) space to store all my data on fast storage.

To work around this limitation, I mounted my large HDDs into folders under /home/felix/. This left me with the same problem as the Windows users described above. Furthermore, this solution is not suitable for a multi-user environment.
Wouldn't it be nice to combine the advantages of SSD and HDD, and I don't just mean buying a consumer-grade SSHD? My idea is to build a RAID5 from 3 HDDs and cache all of them with bcache.

The target audience of this blog post is experienced Linux users who are familiar with the command line. Hence, all commands are posted without sudo, so that readers do not overwrite their existing filesystems by blindly copying the lines.

bcache

bcache is a block device cache: it uses a fast SSD to cache one or more slower block devices and is optimized to detect sequential access. Hence, I expect to get the high IOPS of an SSD combined with high throughput when reading large files, as such sequential transfers bypass the cache entirely. In addition, the relatively small cache is not thrashed by large file transfers. The cache can run in write-through mode (the default; writes are only acknowledged once they have reached the HDDs) or in write-back mode (writes are also buffered on the SSD), which has to be enabled explicitly. In my opinion the only disadvantage of bcache is that it cannot be set up in place, i.e. existing disks have to be reformatted.
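
The sequential-I/O detection and the cache modes described above are exposed per device in sysfs. The following read-only sketch assumes the usual layout /sys/block/bcacheN/bcache/ and prints the active cache_mode (the entry shown in square brackets, e.g. "[writethrough] writeback writearound none") together with sequential_cutoff, the size above which sequential I/O bypasses the cache. The helper name active_cache_mode is hypothetical.

```shell
#!/usr/bin/env bash
# Read-only: print the active cache mode and the sequential cutoff of every
# bcache device. Assumes the sysfs layout /sys/block/bcacheN/bcache/.

# Extract the bracketed (active) entry from a cache_mode line such as
# "[writethrough] writeback writearound none".
active_cache_mode() {
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

for dir in /sys/block/bcache*/bcache; do
  [ -e "$dir/cache_mode" ] || continue   # glob matched nothing: no bcache device
  mode=$(active_cache_mode < "$dir/cache_mode")
  cutoff=$(cat "$dir/sequential_cutoff")
  echo "$dir: cache_mode=$mode sequential_cutoff=$cutoff"
done
```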

RAID5

As the mean time between failures of a pool decreases massively with each disk added, I use RAID5 to keep my data safe. RAID5 needs at least 3 HDDs, and a single HDD can fail without data loss. A nice side effect is that throughput is higher than with a single disk, as the data is striped across all disks and accessed in parallel.
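
To make the trade-off concrete: in RAID5 the space of one disk is consumed by parity, so n equal disks yield (n - 1) disks of usable space, i.e. 6 TB (roughly 5.5 TiB as reported by df) for the 3 x 3 TB setup below. The sketch also shows the XOR principle that lets a single lost block be rebuilt. It is a toy illustration with made-up numbers, not btrfs's on-disk format; btrfs allocates RAID5 per chunk, so the free space it reports can differ slightly.

```shell
#!/usr/bin/env bash
# Usable capacity of a RAID5 array of equal-sized disks:
# the space of one disk is consumed by the (distributed) parity.
raid5_usable() {
  local disks=$1 size=$2
  echo $(( (disks - 1) * size ))
}

# Toy parity demo: the parity block is the XOR of the data blocks,
# so any single lost block equals the XOR of the two remaining ones.
d1=165 d2=60
parity=$(( d1 ^ d2 ))
recovered_d2=$(( d1 ^ parity ))   # pretend the disk holding d2 failed

echo "3 x 3 TB in RAID5 -> $(raid5_usable 3 3) TB usable"
echo "d2=$d2 recovered_d2=$recovered_d2"
```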

Be aware: RAID is no substitute for backups!

Setup

Requirements

  • RAID5
  • Snapshots
  • Easily extendable
  • SSD cacheable

Environment & Hardware

  • 3 x 3TB HDD (7200rpm)
  • 128GB SSD with OS and cache partition
  • Ubuntu 15.10

If you want to try this setup on a different OS, make sure your kernel version is at least 3.19. Otherwise you might end up with a corrupted filesystem due to a bug in older kernels.
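
A quick way to check this requirement is to compare the major.minor part of `uname -r` against 3.19, as in the following sketch. The function name kernel_at_least is hypothetical; distribution suffixes such as "-16-generic" are stripped before comparing.

```shell
#!/usr/bin/env bash
# Succeeds if kernel release $1 is at least version $2 (major.minor only).
# Suffixes such as "-16-generic" are stripped before comparing.
kernel_at_least() {
  local release=$1 minimum=$2 major minor min_major min_minor
  IFS=. read -r major minor _ <<< "${release%%-*}"
  IFS=. read -r min_major min_minor _ <<< "$minimum"
  (( major > min_major || (major == min_major && minor >= min_minor) ))
}

if kernel_at_least "$(uname -r)" 3.19; then
  echo "kernel $(uname -r): OK"
else
  echo "kernel $(uname -r) is older than 3.19, upgrade first" >&2
fi
```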

But wait: why not use ZFS with an SSD as L2ARC and ZIL cache? First, ZFS is not natively available on Ubuntu before version 16.04. Furthermore, the memory requirements of the filesystem are considerable. Another aspect is that ZFS cannot change settings such as the RAID level "on the fly".

Layout

/dev/sda1 ------------------------ EXT4 /
/dev/sda2 ----------------+
                          |
/dev/sdb -- /dev/bcache0 -|
/dev/sdc -- /dev/bcache1 -|- BTRFS /home
/dev/sdd -- /dev/bcache2 -|

Commands

Be careful: these commands reflect the setup above. Your disk labels will almost certainly differ from mine. Before making any changes to the filesystem, back up your data to a separate disk and remove that disk from the system.

# create the cache (SSD partition) and the backing devices (HDDs)
make-bcache -C /dev/sda2
make-bcache -B /dev/sdb /dev/sdc /dev/sdd

# attach the cache: determine the cache set UUID (cset.uuid)
bcache-super-show /dev/sda2 | grep cset.uuid
echo <csetuuid> > /sys/block/bcache0/bcache/attach
echo <csetuuid> > /sys/block/bcache1/bcache/attach
echo <csetuuid> > /sys/block/bcache2/bcache/attach

# create btrfs with RAID5 for data (-d) and metadata (-m)
mkfs.btrfs -d raid5 -m raid5 /dev/bcache0 /dev/bcache1 /dev/bcache2

# mount the filesystem to copy the home folder
mount /dev/bcache0 /mnt

# if you trust your SSD, enable write-back caching
# (a plain "echo ... > glob" fails with "ambiguous redirect"; tee writes to every file)
echo writeback | tee /sys/block/bcache*/bcache/cache_mode
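
The attach step above still requires copying the UUID by hand. The following sketch automates it. It assumes that bcache-super-show prints the field as "cset.uuid" followed by whitespace and the UUID. The helper names and the SYSFS_ROOT override (default /sys, only there so the loop can be tried against a scratch directory) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: attach the cache set to all backing devices without copying the
# UUID by hand. Run as root on a real system.

# Print the cache set UUID from bcache-super-show output read on stdin.
# Assumption: the line looks like "cset.uuid<whitespace><uuid>".
extract_cset_uuid() {
  awk '$1 == "cset.uuid" { print $2 }'
}

# Write the UUID to the attach file of every bcache device.
# SYSFS_ROOT is a hypothetical override (default /sys) for dry tests.
attach_all() {
  local uuid=$1 root=${SYSFS_ROOT:-/sys} attach
  for attach in "$root"/block/bcache*/bcache/attach; do
    [ -e "$attach" ] || continue   # glob matched nothing: no bcache devices
    echo "$uuid" > "$attach"
    echo "attached $uuid via $attach"
  done
}

# Usage:
#   uuid=$(bcache-super-show /dev/sda2 | extract_cset_uuid)
#   attach_all "$uuid"
```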

Benchmarks

I will definitely provide some I/O benchmarks, but currently I do not have the time to design and run them :( However, subjectively the performance is much higher: the time until the desktop is fully loaded decreased from around 1 minute to approx. 15 seconds.
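
Until proper benchmarks exist, the hit and miss counters that bcache keeps per device give at least an objective hint whether the cache is effective. This read-only sketch assumes the counters under /sys/block/bcacheN/bcache/stats_total/ (cache_hits, cache_misses). The helper hit_ratio is hypothetical and computes an integer percentage.

```shell
#!/usr/bin/env bash
# Read-only sketch: report the accumulated cache hits and misses of every
# bcache device. Assumes /sys/block/bcacheN/bcache/stats_total/.

# Integer percentage of hits; prints 0 if nothing has been counted yet.
hit_ratio() {
  local hits=$1 misses=$2
  if (( hits + misses == 0 )); then
    echo 0
  else
    echo $(( 100 * hits / (hits + misses) ))
  fi
}

for stats in /sys/block/bcache*/bcache/stats_total; do
  [ -d "$stats" ] || continue   # glob matched nothing: no bcache device
  hits=$(cat "$stats/cache_hits")
  misses=$(cat "$stats/cache_misses")
  echo "$stats: hits=$hits misses=$misses hit_ratio=$(hit_ratio "$hits" "$misses")%"
done
```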

Further Information

Comments

Comment by Cyril LOUSTEAU

Hey there, M. Mößbauer, I was reading your post with much interest. A question, if I may: is your setup still in working order? I read on https://wiki.ubuntu.com/ServerTeam/InstallOnBcache, quote: "I format both ext4 (had a recent issue with bcache + btrfs)".
Are you still happy with btrfs and bcache? I'd like to try it myself on an Ubuntu Server 16.04 at home, just for sport. Thank you for your time and articles.
Cheers, Cyril.

Reply by Felix Mößbauer

Hi Cyril,
the setup is still working without any problems and the performance is great. However, unlike the setup in your link, I do not use btrfs and bcache for the root filesystem. I would definitely not recommend that, as bringing the system back after a crash takes far too long for me. If you have to do a recovery, you can't just mount the partition from an Ubuntu live system. First you have to install the appropriate packages and try to reassemble the cache. That is not exactly brilliant...

Another point worth mentioning is that even a bcached disk does not deliver the performance of an SSD. My recommendation is to split the SSD into two partitions: one for the OS (30 GB should be sufficient) and the remaining space for bcache.
As your setup is just for sport, try it and share your experience. I guess the problem mentioned in the article is a known bug that was fixed at least a year ago (https://btrfs.wiki.kernel.org/index.php/Gotchas, section "Historical references").

Best, Felix

Comment by Heiri

Hello,
I read this blog post with great attention and applied it.
I have the impression that bcache hides a bit too much from me: I switched off one hard disk of the array. Btrfs does not report a "missing disk". Also,

btrfs device delete missing /dev/bcache0

tells me: "ERROR: not a btrfs filesystem: /dev/bcache0"
(It doesn't work with /dev/sd* either.)
Do you have any information on this?

Reply by Felix Mößbauer

Hello Heiri,

bcache by itself doesn't hide anything; it merely provides a block device. The btrfs filesystem is then built from the three block devices bcache[0-2].

A peculiarity of btrfs is that, even with RAID 5, a pool can only be mounted normally if it is not degraded (i.e. all devices are working). If a disk fails during operation, the pool remains mounted.

To mount a degraded pool, "degraded" has to be passed as a mount option:

mount -o degraded /dev/bcache0 /home

Here bcache0 has to be replaced by a device of the pool that is still present, i.e. if bcache0 has failed, simply use bcache1. See also https://wiki.ubuntuusers.de/Installieren_auf_Btrfs-Dateisystem/#Probleme-und-Abhilfe

Once the pool is mounted again, the missing disk can be removed with `btrfs device delete missing /home` or replaced with `btrfs replace`.

I hope this helps.

Best regards,
Felix
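
The recovery steps from the reply above can be sketched as a dry run that only prints the commands unless RUN=1 is set. All device names, the devid and the mount point are hypothetical placeholders; the devid of the failed disk can be looked up with `btrfs filesystem show`. Note that `btrfs device delete` expects the mount point of the filesystem as its last argument, which likely explains the "not a btrfs filesystem: /dev/bcache0" error in the question.

```shell
#!/usr/bin/env bash
# Dry-run sketch of recovering a btrfs RAID5 pool after one disk failed.
# All values below are hypothetical placeholders.
SURVIVOR=/dev/bcache1   # any still-working member of the pool
MOUNTPOINT=/home
MISSING_DEVID=1         # devid of the failed disk, see `btrfs filesystem show`
NEW_DEV=/dev/bcache3    # replacement disk, already prepared as a bcache device

# Print each command instead of executing it, unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

run mount -o degraded "$SURVIVOR" "$MOUNTPOINT"
# Either rebuild onto a new disk (restores redundancy) ...
run btrfs replace start "$MISSING_DEVID" "$NEW_DEV" "$MOUNTPOINT"
# ... or, without a spare and with enough free space on the remaining
# disks, drop the missing disk. The last argument is the mount point.
# run btrfs device delete missing "$MOUNTPOINT"
```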

