Ceph client data striping
Storage devices have throughput limitations, which impact performance and scalability. As a result, storage systems often support striping to increase throughput and performance. Striping is the storing of sequential pieces of information across multiple storage devices. The most common form of data striping comes from RAID. The RAID type most similar to Ceph’s striping is RAID 0, or a striped volume. Ceph’s striping offers the throughput of RAID 0 striping, the reliability of n-way RAID mirroring, and faster recovery.
Ceph Clients that write directly to the Ceph Storage Cluster through librados must perform the striping and parallel I/O themselves to obtain these benefits.
The simplest Ceph striping format involves a stripe count of 1 object. Ceph Clients write stripe units to a Ceph Storage Cluster object until the object is at its maximum capacity, and then create another object for additional stripes of data. The simplest form of striping may be sufficient for small block device images, or for small S3 or Swift objects. However, because each object is filled before the next one is started, this simple form does not take full advantage of Ceph’s ability to distribute data across placement groups, and consequently does not improve performance very much. The following diagram depicts the simplest form of striping:
In the following diagram, client data is striped across an object set (object set 1 in the diagram) consisting of 4 objects, where the first stripe unit is stripe unit 0 in object 0, and the fourth stripe unit is stripe unit 3 in object 3. After writing the fourth stripe unit, the client determines whether the object set is full. If the object set is not full, the client begins writing a stripe to the first object again (object 0 in the diagram). If the object set is full, the client creates a new object set (object set 2 in the diagram) and begins writing the first stripe, starting with stripe unit 16, in the first object of the new object set (object 4 in the diagram).
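The walk through the diagram can be sketched in a few lines of Python. This is an illustrative model only, not Ceph client code: the function name `locate_stripe_unit` is hypothetical, and the value of 4 stripe units per object is inferred from the description (stripe unit 16 is the first unit of object set 2, so object set 1 holds 16 units across its 4 objects).

```python
def locate_stripe_unit(unit_index, stripe_count=4, units_per_object=4):
    """Map a stripe unit index to (object set, object, slot in object).

    Hypothetical helper for illustration. Object sets are numbered from 1
    and objects from 0, matching the numbering used in the description.
    units_per_object=4 is an assumption inferred from the text.
    """
    units_per_set = stripe_count * units_per_object
    set_index, within_set = divmod(unit_index, units_per_set)
    # Consecutive stripe units go to consecutive objects in the set, and
    # the client wraps back to the first object after the last one.
    slot, column = divmod(within_set, stripe_count)
    obj = set_index * stripe_count + column
    return set_index + 1, obj, slot


for unit in (0, 3, 4, 15, 16):
    print(unit, locate_stripe_unit(unit))
# 0  -> (1, 0, 0)  first stripe unit, object 0
# 3  -> (1, 3, 0)  fourth stripe unit, object 3
# 4  -> (1, 0, 1)  set not full, back to object 0
# 15 -> (1, 3, 3)  last unit that fits in object set 1
# 16 -> (2, 4, 0)  new object set, first object is object 4
```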
Three important variables determine how Ceph stripes data:
- Object Size
  Objects in the Ceph Storage Cluster have a maximum configurable size, such as 2 MB or 4 MB. The object size should be large enough to accommodate many stripe units, and should be a multiple of the stripe unit.
  Important: IBM recommends a safe maximum value of 16 MB.
- Stripe Width
  Stripes have a configurable unit size, for example 64 KB. The Ceph Client divides the data it will write to objects into equally sized stripe units, except for the last stripe unit. A stripe width should be a fraction of the Object Size so that an object may contain many stripe units.
- Stripe Count
  The Ceph Client writes a sequence of stripe units over a series of objects determined by the stripe count. The series of objects is called an object set. After the Ceph Client writes to the last object in the object set, it returns to the first object in the object set.
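To make the interaction of the three variables concrete, the following sketch maps a byte offset in the client data to an object and an offset inside that object. It is a minimal model under stated assumptions, not the librados or RBD API: the class `StripeLayout`, its field names, and the `locate` method are hypothetical, `stripe_unit` stands for the unit size described under Stripe Width, and MB and KB are treated as powers of 1024.

```python
from dataclasses import dataclass

KiB = 1024
MiB = 1024 * KiB


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical container for the three striping variables (in bytes)."""
    object_size: int   # Object Size
    stripe_unit: int   # unit size described under Stripe Width
    stripe_count: int  # Stripe Count: objects per object set

    def __post_init__(self):
        if min(self.object_size, self.stripe_unit, self.stripe_count) <= 0:
            raise ValueError("all striping variables must be positive")
        # The object size should be a multiple of the stripe unit.
        if self.object_size % self.stripe_unit != 0:
            raise ValueError("object size must be a multiple of the stripe unit")

    def locate(self, offset):
        """Return (object index, byte offset inside that object)."""
        units_per_object = self.object_size // self.stripe_unit
        unit_index, within_unit = divmod(offset, self.stripe_unit)
        # Each stripe spans one stripe unit in every object of the set.
        stripe_row, column = divmod(unit_index, self.stripe_count)
        # Once every object in the set is full, a new object set begins.
        object_set, row_in_object = divmod(stripe_row, units_per_object)
        object_index = object_set * self.stripe_count + column
        return object_index, row_in_object * self.stripe_unit + within_unit


layout = StripeLayout(object_size=4 * MiB, stripe_unit=64 * KiB, stripe_count=4)
print(layout.locate(64 * KiB))   # second stripe unit lands in object 1
print(layout.locate(256 * KiB))  # fifth stripe unit wraps back to object 0
print(layout.locate(16 * MiB))   # object set is full, so object 4 begins
```

With `stripe_count=1` the same arithmetic reduces to the simplest format: each object is filled to its object size before the next object is created.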