24 Nov 2014
Today we discuss how to choose your cache settings wisely in Ceph (docs.ceph.com/en/latest/rbd/index.html). The cache discussed below belongs to the user-space implementation of the Ceph block device (i.e., librbd).
Right after migrating to Ceph we observed that the memory usage of every virtual machine had doubled.
After some research we found a configuration error in ceph.conf. The option rbd cache size sets the amount of cache reserved for every virtual disk, so the total amount of RAM reserved for cache is given by
total amount of RAM = (number of virtual disks) x (rbd cache size)
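As a quick worked example of the formula above, the sketch below multiplies a disk count by a per-disk cache size. The function name rbd_cache_total_mib and the figure of 10 disks are illustrative assumptions, not numbers from our cluster.

```shell
# Total RAM that librbd may reserve for cache, in MiB.
# $1 = number of virtual disks, $2 = rbd cache size in bytes
rbd_cache_total_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

# Hypothetical example: 10 virtual disks with a 64 MiB cache each
rbd_cache_total_mib 10 67108864   # prints 640
```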
Reasoning that a standard hard drive usually has 64 MB of cache, we settled on the following configuration:
rbd cache = true
rbd cache size = 67108864 # (64MB)
rbd cache max dirty = 50331648 # (48MB)
rbd cache target dirty = 33554432 # (32MB)
rbd cache max dirty age = 2
rbd cache writethrough until flush = true
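The three size options depend on each other: as far as I know, rbd cache target dirty must stay below rbd cache max dirty, which in turn must stay below rbd cache size. A minimal sketch of that sanity check follows; the helper name check_rbd_cache_sizes is made up for illustration.

```shell
# Succeeds when target dirty < max dirty < cache size (all in bytes).
# $1 = rbd cache size, $2 = rbd cache max dirty, $3 = rbd cache target dirty
check_rbd_cache_sizes() {
    [ "$3" -lt "$2" ] && [ "$2" -lt "$1" ]
}

# Values from the configuration above
if check_rbd_cache_sizes 67108864 50331648 33554432; then
    echo "cache sizes are consistent"
else
    echo "cache sizes are inconsistent"
fi
```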
Now push the configuration to the virtualization hosts:
$ ceph-deploy --overwrite-conf config push host1 host2
To apply the settings without restarting the virtual machine, use KVM live migration. After the virtual machine is migrated from host1 to host2, the new process picks up the modified Ceph configuration.
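How the migration is triggered depends on your management layer. The sketch below assumes the virtual machines are managed by libvirt and uses virsh; the function name migrate_vm, the domain name vm1 and the DRY_RUN switch are hypothetical. Other stacks have their own migration commands.

```shell
# Live-migrate a VM so that the new QEMU process re-reads ceph.conf.
# ASSUMPTION: the VM is managed by libvirt; vm1 and host2 are placeholders.
# Set DRY_RUN=1 to print the command instead of running it.
migrate_vm() {
    ${DRY_RUN:+echo} virsh migrate --live "$1" "qemu+ssh://$2/system"
}

# Show what would be executed, without touching any VM
DRY_RUN=1 migrate_vm vm1 host2
```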
If you are not sure about the sizes, keep Ceph's default cache values and simply enable the cache:
rbd cache = true
rbd cache writethrough until flush = true
20 Nov 2014
STEP 1:
Choose a disk with the best IOPS/price ratio. See the list of disks ordered by IOPS (Input/Output Operations Per Second) performance: en.wikipedia.org/wiki/IOPS.
Quite often SSDs are released with firmware bugs or with non-optimal configurations, so before putting the system into production check for the latest firmware version.
STEP 2:
Check that AHCI (Advanced Host Controller Interface) is enabled and working:
$ sudo dmesg | grep -i ahci
ahci 0000:00:11.0: version 3.0
ahci 0000:00:11.0: irq 43 for MSI/MSI-X
ahci 0000:00:11.0: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
Check whether your controller supports AHCI:
$ sudo lshw | grep -i ahci
product: 82801JI (ICH10 Family) SATA AHCI Controller
capabilities: storage msi pm ahci_1.0 bus_master cap_list emulated
configuration: driver=ahci latency=0
Quite often AHCI is disabled in the BIOS; in that case reboot and enable it.
I observed unstable disk behavior without AHCI enabled, and even an inability to execute TRIM correctly.
Identify the available SATA modes (e.g. SATA-II at 3 Gbps gives a theoretical raw limit of 375 MB/s):
$ sudo dmesg | grep SATA
ahci 0000:00:11.0: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA mode
ata1: SATA max UDMA/133 abar m1024@0xfddffc00 port 0xfddffd00 irq 43
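The 375 MB/s figure comes from dividing the 3 Gbit/s line rate by 8 bits per byte. SATA transmits data with 8b/10b encoding, so only 8 of every 10 bits on the wire are payload and the usable rate is closer to 300 MB/s. A small sketch of the arithmetic, with function names that are purely illustrative:

```shell
# Convert a SATA link speed in Gbps (integer) to MB/s.
# Raw rate: every bit on the wire counted, 8 bits per byte.
sata_raw_mb_s() {
    echo $(( $1 * 1000 / 8 ))
}

# Usable rate: 8b/10b encoding spends 10 wire bits per payload byte.
sata_usable_mb_s() {
    echo $(( $1 * 1000 / 10 ))
}

sata_raw_mb_s 3      # prints 375
sata_usable_mb_s 3   # prints 300
```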
Check what the disk supports:
$ sudo hdparm -I /dev/sda | grep SATA
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Ideally the disk supports SATA Rev 3.0 or higher, which provides a 6 Gbps link.
STEP 2.5: Pause
STEP 3:
Check that disk TRIM (wikipedia.org/wiki/Trim) works fine:
$ sudo hdparm -I /dev/sda | grep -i trim
* Data Set Management TRIM supported (limit 8 blocks)
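When checking many hosts it can help to turn this into a yes/no test. The sketch below only looks for the "TRIM supported" text in the hdparm output; the function name supports_trim is an illustrative assumption.

```shell
# Succeeds when `hdparm -I` output read from stdin reports TRIM support.
supports_trim() {
    grep -qi 'TRIM supported'
}

# Demonstration with the sample line shown above; on a real system use:
#   sudo hdparm -I /dev/sda | supports_trim
if printf '%s\n' '* Data Set Management TRIM supported (limit 8 blocks)' | supports_trim; then
    echo "TRIM is supported"
else
    echo "TRIM is NOT supported"
fi
```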
It is very important to have TRIM functioning. Without TRIM the disk speed will degrade over time, because the SSD will have to erase cells before every write operation.
Let's test it simply by executing fstrim:
$ sudo fstrim -v /
/: 98147174400 bytes were trimmed
Let's check whether TRIM is really doing what it should:
$ sudo wget -O /tmp/test_trim.sh "https://sites.google.com/site/lightrush/random-1/checkiftrimonext4isenabledandworking/test_trim.sh?attredirects=0&d=1"
$ sudo chmod +x /tmp/test_trim.sh
$ sudo /tmp/test_trim.sh <tempfile> 50 /dev/sdX
If TRIM is working properly, the output of the last command should be a bunch of zeros. Thanks to Nicolay Doytchev for this script.
STEP 4:
Be sure to format the disk with an SSD-friendly file system: EXT4, F2FS, BTRFS, XFS, or any other from the list "File systems optimized for flash memory".
If it is not, you had better migrate it at least to EXT4 with the help of the manual "Migrating a live system from ext3 to ext4 filesystem", or consider a complete re-installation of the operating system.
STEP 5:
Mount the disk with the correct parameters:
$ cat /etc/fstab
/dev/pve/data /var/lib/vz ext4 discard,noatime,commit=600,defaults 0 1
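To double-check that the options we care about are really present, the option string can be tested in a script. This is a minimal sketch; has_mount_option is a hypothetical helper, and the string is copied from the fstab line above.

```shell
# Succeeds when option $2 appears in the comma-separated option list $1.
has_mount_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Options as they appear in the fstab line above
opts="discard,noatime,commit=600,defaults"
for opt in discard noatime; do
    if has_mount_option "$opts" "$opt"; then
        echo "$opt: present"
    else
        echo "$opt: missing"
    fi
done

# On a live system the active options can be read with:
#   findmnt -no OPTIONS /var/lib/vz
```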
Check whether the disk is identified as non-rotational (sudo is not needed here, and "sudo for" fails because for is a shell keyword):
$ for f in /sys/block/sd?/queue/rotational; do printf '%s is ' "$f"; cat "$f"; done
/sys/block/sda/queue/rotational is 0
If you see 1 for an SSD, there is some problem with the kernel or AHCI.
Next, check that the deadline scheduler is selected for our brand new SSD drive:
$ for f in /sys/block/sd?/queue/scheduler; do printf '%s is ' "$f"; cat "$f"; done
/sys/block/sda/queue/scheduler is noop [deadline] cfq
If not, execute the following (the redirection has to run as root, hence tee):
$ echo deadline | sudo tee /sys/block/sda/queue/scheduler
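The active scheduler is the name printed inside square brackets. If you want to check it from a script, a small sketch like the following extracts it; the function name active_scheduler is an illustrative assumption.

```shell
# Print the active scheduler, i.e. the name inside [brackets].
# $1 = contents of /sys/block/<dev>/queue/scheduler
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

active_scheduler "noop [deadline] cfq"   # prints deadline

# On a real host, for every sd? disk:
#   for f in /sys/block/sd?/queue/scheduler; do
#       echo "$f: $(active_scheduler "$(cat "$f")")"
#   done
```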
References