Storage strategies

A storage strategy is a method of storing data that serves a particular use case. Creating storage strategies for IBM Storage Ceph clusters includes creating CRUSH hierarchies, estimating the number of placement groups, determining which type of storage pool to create, and managing pools.

For example, if you need to store volumes and images for a cloud platform like OpenStack, you might choose to store data on reasonably performant SAS drives with SSD-based journals. By contrast, if you need to store object data for an S3-compliant or Swift-compliant gateway, you might choose something more economical, like SATA drives. IBM Storage Ceph accommodates both scenarios in the same cluster. However, you need a means of providing the SAS/SSD storage strategy to the cloud platform (for example, Glance and Cinder in OpenStack) and a means of providing SATA storage for your object store.

From the perspective of the Ceph client, interacting with the Ceph storage cluster is simple. Storage strategies are invisible to the Ceph client in all but storage capacity and performance.

The pool interface enables the Ceph client to select one of the defined storage strategies. The interaction consists of two steps, as shown in the sketch that follows:
  1. Connect to the cluster.
  2. Create a pool I/O context.
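The following is a minimal sketch of these two steps, using the librados Python bindings. The configuration path, pool name, and object name are assumptions for illustration; the pool must already exist and map to one of the defined storage strategies.

    import rados

    # Step 1: Connect to the cluster (assumes a standard ceph.conf and client keyring).
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Step 2: Create an I/O context for a pool. The pool name is the only hint the
    # client has of the underlying storage strategy (for example, SSD- or SATA-backed).
    ioctx = cluster.open_ioctx('mypool')   # 'mypool' is a hypothetical pool name

    # All subsequent reads and writes go through the I/O context.
    ioctx.write_full('hello-object', b'hello from the client')
    print(ioctx.read('hello-object'))

    ioctx.close()
    cluster.shutdown()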

Figure 1 shows the logical data flow starting from the client into the IBM Storage Ceph cluster.

Figure 1. Ceph storage architecture data flow
Storage strategies include:
  • Storage media (hard disk drives, SSDs, and so on).
  • The CRUSH maps that set up performance and failure domains for the storage media.
  • The number of placement groups.
  • The pool interface.
IBM Storage Ceph supports multiple storage strategies. Use cases, cost/benefit performance tradeoffs, and data durability are the primary considerations that drive storage strategies.
Use cases
Ceph provides massive storage capacity, and it supports numerous use cases. For example, the Ceph Block Device client is a leading storage backend for cloud platforms like OpenStack that provides limitless storage for volumes and images with high-performance features like copy-on-write cloning. Likewise, Ceph can provide container-based storage for OpenShift environments. By contrast, the Ceph Object Gateway client is a leading storage backend for cloud platforms that provides RESTful S3-compliant and Swift-compliant object storage for objects like audio, bitmap, video, and other data.
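As an illustration of the block use case, the following sketch uses the python-rbd and python-rados bindings to create a parent image, snapshot it, and make a copy-on-write clone. The pool name, image names, and snapshot name are hypothetical, and the image must use a format that supports layering (the default in current releases).

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('volumes')   # hypothetical pool name

    rbd_inst = rbd.RBD()
    # Create a 10 GiB parent image in format 2, which supports layering.
    rbd_inst.create(ioctx, 'golden-image', 10 * 1024**3, old_format=False)

    # Clones are made from a protected snapshot of the parent image.
    with rbd.Image(ioctx, 'golden-image') as image:
        image.create_snap('base')
        image.protect_snap('base')

    # Copy-on-write clone: the new volume shares unmodified data with the parent.
    rbd_inst.clone(ioctx, 'golden-image', 'base', ioctx, 'vm-volume-1')

    ioctx.close()
    cluster.shutdown()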
Cost/benefit performance tradeoff
Bigger is better. High durability is better. However, there is a price for each superlative quality, and a corresponding cost/benefit tradeoff. Consider the following use cases from a performance perspective:
  • SSDs can provide fast storage for relatively small amounts of data and journaling. Storing a database or object index might benefit from a pool of fast SSDs, but prove too expensive for other data.
  • SAS drives with SSD journaling provide fast performance at an economical price for volumes and images.
  • SATA drives without SSD journaling provide inexpensive storage with lower overall performance.
When you create a CRUSH hierarchy of OSDs, you need to consider the use case and an acceptable cost/performance tradeoff.
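For illustration, the following sketch uses the python-rados mon_command interface to create a replicated CRUSH rule that is restricted to SSD-class OSDs and a pool that uses it. The rule name and pool name are assumptions; the equivalent ceph CLI commands (ceph osd crush rule create-replicated and ceph osd pool set) achieve the same result.

    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Create a replicated CRUSH rule that places each copy on a different host,
    # drawing only from devices of class "ssd" under the "default" root.
    cmd = json.dumps({
        "prefix": "osd crush rule create-replicated",
        "name": "fast-ssd",     # hypothetical rule name
        "root": "default",
        "type": "host",         # failure domain
        "class": "ssd",         # device class
    })
    cluster.mon_command(cmd, b'')

    # Create a pool and point it at the new rule, so clients that use this
    # pool land on the SSD-backed portion of the hierarchy.
    cluster.create_pool('cloud-volumes')   # hypothetical pool name
    cmd = json.dumps({
        "prefix": "osd pool set",
        "pool": "cloud-volumes",
        "var": "crush_rule",
        "val": "fast-ssd",
    })
    cluster.mon_command(cmd, b'')

    cluster.shutdown()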
Durability
In large-scale clusters, hardware failure is an expectation, not an exception. However, data loss and service interruption remain unacceptable. For this reason, data durability is important. Ceph addresses data durability with multiple deep copies of an object or with erasure coding and multiple coding chunks. Multiple copies or multiple coding chunks present an additional cost/benefit tradeoff: it is cheaper to store fewer copies or coding chunks, but it might lead to the inability to service write requests in a degraded state. Generally, one object with two more copies (that is, size = 3) or two coding chunks might allow a cluster to service writes in a degraded state while the cluster recovers. The CRUSH algorithm aids this process by ensuring that Ceph stores the copies or coding chunks in different locations within the cluster, so that the failure of a single storage device or node does not lead to the loss of all the copies or coding chunks needed to reconstruct the data.
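To make the tradeoff concrete, the following sketch compares the raw-capacity overhead and failure tolerance of three-way replication (size = 3) with an erasure-coded layout of four data chunks plus two coding chunks (k=4, m=2). The figures are illustrative assumptions, not sizing recommendations.

    # Compare storage overhead and fault tolerance of replication vs. erasure coding.
    usable_tb = 100  # amount of client data to store, in TB (assumed)

    # Replicated pool: size = 3 keeps three full copies of every object.
    replica_size = 3
    replicated_raw_tb = usable_tb * replica_size
    replicated_failures_tolerated = replica_size - 1

    # Erasure-coded pool: k data chunks plus m coding chunks per object.
    k, m = 4, 2
    erasure_raw_tb = usable_tb * (k + m) / k
    erasure_failures_tolerated = m

    print(f"Replication (size={replica_size}): {replicated_raw_tb} TB raw, "
          f"tolerates {replicated_failures_tolerated} simultaneous failures")
    print(f"Erasure coding (k={k}, m={m}): {erasure_raw_tb:.0f} TB raw, "
          f"tolerates {erasure_failures_tolerated} simultaneous failures")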
You can capture use cases, cost/benefit performance tradeoffs, and data durability in a storage strategy and present it to a Ceph client as a storage pool.
Important: Ceph’s object copies or coding chunks make RAID obsolete. Do not use RAID because IBM Storage Ceph already handles data durability. A degraded RAID has a negative impact on performance and recovering data by using RAID is substantially slower than using deep copies or erasure coding chunks.