Advanced Linux SSD caching for hard drives on Debian Jessie – bcache and enhanceIO

Please note, I am keeping this post for myself as a reminder, so if you sometimes feel that it is getting too general, that part is probably more of a reminder for me.

Also note that I am organizing this post as I go, spending some time every day building my new MySQL machine, so this is an ACTIVE work in progress.

—————————————————
First note: the benefit of RAID 0 on the SSDs is more about wear than it is about speed.
Here is why and how.

1- All modern SSDs use wear leveling to prolong their life. bcache does nothing in this regard, so using thumb drives (which usually lack wear leveling) as a cache is not very smart.
2- Your SSD has to be larger than the active data set. Otherwise bcache (or any other cache) will keep evicting data and writing new data in its place, wearing out the SSD. If the data that is regularly accessed is about 500GB of your 3TB disk and your cache is larger than 500GB, odds are the hard disk will serve that data once and then spin down, because everything we need is already on the SSD, which reduces the wear to almost nothing. If you do not have a large enough SSD, odds are you will extend their life by combining several in a RAID setup, either software or hardware, it makes no difference.
3- Larger disks generally live longer, with or without RAID. The reason is arithmetically simple: the larger the disk, the more flash banks it has, so the longer it takes for any given flash chip to get its turn for a write again, because the others need to get their turn first. Hence the longer useful life.
4- Erase commands are expensive, and even with a workaround, TRIM support means nothing on a busy server.
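To put rough numbers on points 2 and 3, here is a back-of-the-envelope sketch. It assumes ideal wear leveling and a write amplification of 1, which real drives do not achieve, and the 3000 program/erase cycles and 100GB of cache writes per day are illustrative figures, not measurements. The function name ssd_life_days is made up for this note.

```shell
# ssd_life_days CAPACITY_GB PE_CYCLES DAILY_WRITES_GB
# Idealized estimate: every cell is written equally often, so the total
# write budget is capacity * P/E cycles, spread over the daily write volume.
ssd_life_days() {
    echo $(( $1 * $2 / $3 ))
}

ssd_life_days 120 3000 100    # one 120GB SSD: prints 3600
ssd_life_days 240 3000 100    # two 120GB SSDs in RAID 0: prints 7200
ssd_life_days 480 3000 100    # one 480GB SSD: prints 14400
```

Under these assumptions the estimate scales linearly with total flash capacity, which is the whole argument for a larger SSD or a RAID 0 of smaller ones.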
—————————————————
My own setup

1- My computer has 6 sata ports (ASUS P9X79).
2- I prefer never to group spinning hard drives in one metal case that has no vibration isolation, so the mainboard SATA ports will host the 5 SSDs and the main hard drive (2TB Black).
3- I have two 2-port PCIe eSATA cards (Silicon Image, supporting port multipliers).
4- One SATA port multiplier (Silicon Image) which, even though it has a RAID function, is probably more reliable in JBOD mode. This is where the 5 spinning disks other than the main disk will live.

Most data on the 6 spinning disks is only stored and seldom accessed. Furthermore, most of this data does not get cached because it is accessed sequentially.

The boot partition of the (spinning) hard drive is not cached, the second partition is. This is safer, as the system can boot anyway and then initialize the RAID array and bcache with no unforeseen effects. Boot time is not so important, as the PC is always on and we reboot it once a year.

—————————————————
Note for self:
1- What are the possibilities of 2 cache layers, one with write-back and the other only a read cache?
Possible advantages:
You can use less reliable flash thumb drives with no wear leveling mechanism as an extra cache layer, where failure will not cause data corruption but will leave more space on the SSDs.
Overcache problem considerations?
The SSD will initially hold the same cached data as the thumb drive, but as the hit rate on the thumb drive increases, the SSD will evict that data, so we will only have 1 copy.
The SSD with write-back can make writing a hell of a lot faster for database data.
bcache only allows one SSD per spinning partition, you can not combine them, while the other methods can! And since those layers are read-only caches, you could even RAID the thumb drives.
Caching software available
Device mapping
bcache (part of the new kernels)
dm-cache (part of the new kernels)
Flashcache (Facebook’s)
EnhanceIO (Fork of flashcache)

EnhanceIO can be attached to any block device on the fly, even when the device is already mounted.
dm-cache is faster, but bcache is safer

If so…
we will probably be using bcache in FIFO mode, and EnhanceIO in LRU (least recently used) mode.
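For the difference between the two policies, here is a toy simulation. It is only an illustration of FIFO versus LRU eviction on a tiny cache, not how bcache or EnhanceIO implement them, and simulate_cache is a made-up name.

```shell
# simulate_cache POLICY SIZE BLOCK...
# Prints the number of cache hits for the given access sequence.
# POLICY is "fifo" or "lru"; SIZE is the number of blocks the cache holds.
simulate_cache() {
    policy=$1
    size=$2
    shift 2
    cache=""    # cached block ids separated by spaces, oldest first
    hits=0
    for blk in "$@"; do
        case " $cache " in
            *" $blk "*)
                hits=$((hits + 1))
                if [ "$policy" = lru ]; then
                    # LRU: a hit makes the block the newest entry again.
                    rest=""
                    for c in $cache; do
                        [ "$c" = "$blk" ] || rest="$rest $c"
                    done
                    cache="$rest $blk"
                fi
                ;;
            *)
                count=0
                for c in $cache; do count=$((count + 1)); done
                if [ "$count" -ge "$size" ]; then
                    # Cache full: drop the oldest entry (first in the list).
                    rest=""
                    skipped=0
                    for c in $cache; do
                        if [ "$skipped" -eq 0 ]; then skipped=1; continue; fi
                        rest="$rest $c"
                    done
                    cache=$rest
                fi
                cache="$cache $blk"
                ;;
        esac
    done
    echo "$hits"
}

simulate_cache fifo 2 a b a c a b    # prints 1
simulate_cache lru  2 a b a c a b    # prints 2
```

In this sequence block "a" keeps being reused: FIFO evicts it anyway because it was inserted first, while LRU keeps it because it was touched recently.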

Also take a look at https://github.com/Feh/nocache
—————————————————
You probably noticed that there are not many tutorials and instructions on bcache for Debian online. The reason is that the user tools are not packaged for Debian, not even for Jessie. That is the simple part though: the hard part has already been done at the kernel level, so you don't need to recompile the kernel, you only need instructions on how to set up bcache!

So here is a step by step, no theory and no explanations, just the procedure.

bcache, the simple and safer way

So here is how I set up bcache on my Linux server for MySQL usage.

1- Create 2 partitions on my 2TB disk, one of 200GB and the other of 1800GB (with fdisk or parted, you choose).

2- Delete the first 200GB partition, leaving only the other.

3- Run the Debian installer and instruct it to use the continuous free space, which is now where the small first partition was.

apt-get install git make gcc pkg-config uuid openssl util-linux uuid-dev libblkid-dev

git clone https://github.com/g2p/bcache-tools.git

cd bcache-tools

make

make install

-------------------------------------
install -m0755 make-bcache bcache-super-show /usr/sbin/
install -m0755 probe-bcache bcache-register /lib/udev/
install -m0644 69-bcache.rules /lib/udev/rules.d/
install -m0644 -- *.8 /usr/share/man/man8/
install -D -m0755 initramfs/hook /usr/share/initramfs-tools/hooks/bcache
install -D -m0755 initcpio/install /usr/lib/initcpio/install/bcache
install -D -m0755 dracut/module-setup.sh /lib/dracut/modules.d/90bcache/module-setup.sh

-----------------------------------------------------

Now, use wipefs to delete the filesystem signature on the big partition (/dev/sda2) so that we can format it as bcache rather than ext4.

wipefs -a /dev/sda2

Take note of the UUIDs that come out of the following 2 commands.

Why? Caching devices are the SSDs, backing devices are the normal SATA disks. The following 2 commands tell the kernel that one is a backing device and the other is a caching device. You can have as many backing and caching devices as you want, but pay attention to this: you can not use 2 SSDs to cache one disk, although you can use 1 SSD to cache multiple disks. So for example, you can have 2 SSDs caching 5 disks, where 1 SSD caches 3 disks and the other caches 2.

Also, this is Debian with the udev rules installed above, so registering the devices with the kernel is done for you automatically and you will have /sys/fs/bcache/<cache set UUID> off the bat. That does not apply to all Linux systems; some might require you to register each device manually with: echo /dev/sdX > /sys/fs/bcache/register

First step, initialize my 1.8TB partition as a bcache backing device (-B means backing device). Use the same partition you just wiped; device names depend on your partition layout.

make-bcache -B /dev/sda5

mkfs.ext4 /dev/bcache0

Then initialize the SSD (120GB Intel SSD) as a bcache caching device (-C means caching device).

make-bcache -C /dev/sdb1

--------------------

UUID: 4e9aed54-bf48-4d7a-b5b2-b041f2a811f8
Set UUID: 13544b4e-99de-42a0-905a-c6efbe669151
version: 0
nbuckets: 228944
block_size: 1
bucket_size: 1024
nr_in_set: 1
nr_this_dev: 0
first_bucket: 1

----------------------------
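If you would rather not copy the UUID by hand, a small filter can pull the "Set UUID" line out of output like the above. This is a sketch that assumes the output format shown here (tabs or spaces after the colon); extract_set_uuid and the log file name are made up for this note.

```shell
# extract_set_uuid: read make-bcache output on stdin and print only the
# cache set UUID (the value of the "Set UUID:" line).
extract_set_uuid() {
    sed -n 's/^Set UUID:[[:space:]]*//p'
}

# Demo with the two UUID lines from the output above.
printf 'UUID: 4e9aed54-bf48-4d7a-b5b2-b041f2a811f8\nSet UUID: 13544b4e-99de-42a0-905a-c6efbe669151\n' \
    | extract_set_uuid    # prints 13544b4e-99de-42a0-905a-c6efbe669151

# Typical use: save the output when formatting, then read the UUID back.
#   make-bcache -C /dev/sdb1 | tee make-bcache.log
#   extract_set_uuid < make-bcache.log
```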

Now we have a backing device and a caching device, so let's connect them together. This step is called "attaching".

First,

cd /sys/fs/bcache/

ls

The ls command lists the cache set UUID as a directory (it is the "Set UUID" printed by make-bcache above). Attaching means writing that UUID to /sys/block/bcache0/bcache/attach.
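The attach step can be sketched as a small helper. The attach, cache_mode and state files are the standard bcache sysfs interface; the function name attach_cache and its optional third argument (an alternative sysfs root, only useful for trying the function out without touching the real /sys) are assumptions made up for this note.

```shell
# attach_cache CSET_UUID [BCACHE_DEV] [SYSFS_ROOT]
# Attach the cache set CSET_UUID to the backing device BCACHE_DEV
# (default bcache0). SYSFS_ROOT defaults to /sys.
attach_cache() {
    cset_uuid=$1
    bdev=${2:-bcache0}
    sysfs=${3:-/sys}
    # The cache set must already be registered, i.e. visible in /sys/fs/bcache.
    if [ ! -d "$sysfs/fs/bcache/$cset_uuid" ]; then
        echo "cache set $cset_uuid is not registered" >&2
        return 1
    fi
    echo "$cset_uuid" > "$sysfs/block/$bdev/bcache/attach"
}

# Usage (as root), with the Set UUID from the make-bcache output above:
#   attach_cache 13544b4e-99de-42a0-905a-c6efbe669151
#   cat /sys/block/bcache0/bcache/state                    # check the result
#   echo writeback > /sys/block/bcache0/bcache/cache_mode  # optional
```

bcache defaults to writethrough, so the last line is only needed if you want the write-back behaviour discussed earlier for database data.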

Now, mount the bcache partition (check that you have bcache0 and not bcache1, for example).

mount /dev/bcache0 /hds/cacheddisk
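Since the bcache number is not guaranteed to stay the same, a persistent mount is safer by filesystem UUID. A minimal sketch, assuming the ext4 filesystem and the /hds/cacheddisk mount point used above; fstab_line is a made-up helper, and the nofail option is my assumption so that the machine still boots if the cached disk is missing.

```shell
# fstab_line FS_UUID MOUNTPOINT [FSTYPE]
# Print an /etc/fstab entry that mounts the filesystem by UUID.
fstab_line() {
    printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$1" "$2" "${3:-ext4}"
}

# Usage (as root); blkid prints the UUID of the filesystem on /dev/bcache0:
#   fstab_line "$(blkid -s UUID -o value /dev/bcache0)" /hds/cacheddisk >> /etc/fstab
```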



One thought on “Advanced Linux SSD caching for hard drives on Debian Jessie – bcache and enhanceIO”

  1. Thank you for this. I am trying to get bcache in Debian Jessie and I think this solves it. I’m glad I won’t need to customize my kernel.

    Something to keep in mind is that bcache can use a btrfs device as a backing device so you can dynamically grow and expand your backing device with multiple SSDs.
