Ceph architecture
An IBM Storage Ceph cluster is a distributed data object store designed to provide excellent performance, reliability, and scalability. An IBM Storage Ceph cluster can transform your organization’s IT infrastructure and your ability to manage vast amounts of data, especially for cloud computing platforms like Red Hat Enterprise Linux OSP. The cluster scales to thousands of clients accessing petabytes to exabytes of data.
Distributed object stores are the future of storage, because they accommodate unstructured data, and because clients can use modern object interfaces and legacy interfaces simultaneously.
For example:
- APIs in many languages (C/C++, Java, Python)
- RESTful interfaces (S3/Swift)
- Block device interface
- Filesystem interface
At the heart of every Ceph deployment is the IBM Storage Ceph cluster. It consists of three types of daemons:
- Ceph OSD Daemon: Ceph OSDs store data on behalf of Ceph clients. Additionally, Ceph OSDs use the CPU, memory, and networking of Ceph nodes to perform data replication, erasure coding, rebalancing, recovery, monitoring, and reporting functions.
- Ceph Monitor: A Ceph Monitor maintains a master copy of the IBM Storage Ceph cluster map with the current state of the cluster. Monitors require high consistency, and use Paxos to ensure agreement about the state of the cluster.
- Ceph Manager: The Ceph Manager maintains detailed information about placement groups, process metadata, and host metadata in lieu of the Ceph Monitor, which significantly improves performance at scale. The Ceph Manager handles execution of many of the read-only Ceph CLI queries, such as placement group statistics. The Ceph Manager also provides the RESTful monitoring APIs.
Ceph client interfaces read data from and write data to the IBM Storage Ceph cluster. Clients need the following data to communicate with the cluster:
- The Ceph configuration file, or the cluster name (usually ceph) and the monitor address.
- The pool name.
- The user name and the path to the secret key.
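The items in the list above can be gathered in one place before connecting. The following is a minimal sketch, assuming the Python `rados` bindings are installed and a cluster is reachable; the `ClientSettings` class, the `write_and_read` helper, and every path, pool, and user value are illustrative placeholders, not values from this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientSettings:
    """Data a client needs to reach the cluster (all defaults are placeholders)."""
    conffile: str = "/etc/ceph/ceph.conf"                  # Ceph configuration file
    pool: str = "mypool"                                   # hypothetical pool name
    user: str = "client.admin"                             # user name in "type.id" form
    keyring: str = "/etc/ceph/ceph.client.admin.keyring"   # path to the secret key

    def rados_kwargs(self) -> dict:
        """Keyword arguments to pass to rados.Rados()."""
        return {
            "conffile": self.conffile,
            "name": self.user,
            "conf": {"keyring": self.keyring},
        }


def write_and_read(settings: ClientSettings, object_name: str, data: bytes) -> bytes:
    """Store one object in the pool and read it back (needs a live cluster)."""
    import rados  # imported lazily so ClientSettings works without the bindings

    cluster = rados.Rados(**settings.rados_kwargs())
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(settings.pool)
        try:
            ioctx.write_full(object_name, data)
            # read() returns at most 8192 bytes unless a length is given
            return ioctx.read(object_name)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Keeping the settings separate from the connection code makes it possible to check them without contacting a cluster.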
Ceph clients maintain object IDs and the names of the pools where they store the objects. However, they do not need to maintain an object-to-OSD index or communicate with a centralized object index to look up object locations. Instead, a Ceph client retrieves the latest copy of the cluster map from a Ceph Monitor, then provides an object name and pool name to librados, which uses the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to compute the object’s placement group and the primary OSD for storing and retrieving data. The Ceph client connects to the primary OSD, where it can perform read and write operations. There is no intermediary server, broker, or bus between the client and the OSD.
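The idea of computing an object's location instead of looking it up can be illustrated with a small sketch. It is not the CRUSH algorithm: it substitutes SHA-256 for Ceph's own hash function and uses rendezvous (highest-random-weight) hashing in place of CRUSH's hierarchy-aware placement rules. All function names and parameters here are hypothetical.

```python
import hashlib


def _hash(*parts: str) -> int:
    """Stable 64-bit hash of the given strings (stand-in for Ceph's own hash)."""
    digest = hashlib.sha256("/".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def placement_group(pool: str, object_name: str, pg_num: int) -> int:
    """Map an object name to a placement group number within its pool."""
    return _hash(pool, object_name) % pg_num


def osds_for_pg(pool: str, pg: int, osd_ids: list[int], replicas: int) -> list[int]:
    """Rank OSDs for a placement group; the first entry acts as the primary.

    Rendezvous hashing is used here as a simplification, NOT real CRUSH.
    """
    ranked = sorted(
        osd_ids,
        key=lambda osd: _hash(pool, str(pg), str(osd)),
        reverse=True,
    )
    return ranked[:replicas]


def locate(pool: str, object_name: str, pg_num: int,
           osd_ids: list[int], replicas: int = 3) -> tuple[int, list[int]]:
    """Compute (placement group, ordered OSD list) without any lookup table."""
    pg = placement_group(pool, object_name, pg_num)
    return pg, osds_for_pg(pool, pg, osd_ids, replicas)


if __name__ == "__main__":
    pg, osds = locate("mypool", "object-1", pg_num=128, osd_ids=list(range(6)))
    print(f"placement group {pg}, primary OSD {osds[0]}, replicas {osds[1:]}")
```

Because every client runs the same deterministic computation over the same cluster map, each one arrives at the same primary OSD without consulting a central index.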
Ceph OSDs store all data as objects in a flat namespace. There are no hierarchies of directories. An object has a cluster-wide unique identifier, binary data, and metadata consisting of a set of name/value pairs.
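As a rough illustration of this object model, the sketch below represents an object as an identifier, binary data, and name/value metadata, held in a single flat mapping. The `StoredObject` and `FlatStore` names are hypothetical, and the code models only the concept, not how an OSD stores data on disk.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str                                           # unique identifier
    data: bytes = b""                                        # binary data
    metadata: dict[str, str] = field(default_factory=dict)   # name/value pairs


class FlatStore:
    """Flat namespace: one mapping keyed by object ID, no directory hierarchy."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, obj: StoredObject) -> None:
        self._objects[obj.object_id] = obj

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def ids(self) -> list[str]:
        return sorted(self._objects)


if __name__ == "__main__":
    store = FlatStore()
    # The slashes are part of an opaque ID; they do not create directories.
    store.put(StoredObject("photos/2024/cat.jpg", b"\xff\xd8", {"owner": "alice"}))
    store.put(StoredObject("report", b"quarterly numbers"))
    print(store.ids())
```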
For more information, see the IBM Storage Ceph architecture chapter, within the IBM Storage Ceph Concepts and Architecture Guide Redpaper publication.