CRUSH rules
CRUSH rules define how a Ceph client selects buckets and the primary OSD within them to store objects, and how the primary OSD selects buckets and the secondary OSDs to store replicas or coding chunks.
For example, you might create one rule that selects two target OSDs backed by SSDs for two object replicas, and another rule that selects three target OSDs backed by SAS drives in three different data centers for three replicas.
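As an illustration, the following Python sketch renders the two example rules as text in the rule syntax described next. It is not a Ceph tool. `RuleSpec` and `render_rule` are hypothetical names, and the root bucket name `default`, the rule IDs, the `host` failure domain for the SSD rule, and a custom device class named `sas` are all assumptions. Ceph assigns the classes `hdd`, `ssd`, and `nvme` automatically, so a `sas` class would have to be created and assigned to the OSDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleSpec:
    """Hypothetical container for the values a simple CRUSH rule needs."""
    name: str
    rule_id: int
    root: str                    # bucket name passed to `step take`
    failure_domain: str          # bucket type passed to `step chooseleaf ... type`
    device_class: Optional[str] = None
    rule_type: str = "replicated"
    min_size: int = 1
    max_size: int = 10
    num: int = 0                 # 0 means "as many buckets as the pool size"


def render_rule(spec: RuleSpec) -> str:
    """Render a single take/chooseleaf/emit rule in CRUSH map text form."""
    take = f"step take {spec.root}"
    if spec.device_class:
        take += f" class {spec.device_class}"
    # firstn for replicated pools, indep for erasure-coded pools
    mode = "firstn" if spec.rule_type == "replicated" else "indep"
    lines = [
        f"rule {spec.name} {{",
        f"    id {spec.rule_id}",
        f"    type {spec.rule_type}",
        f"    min_size {spec.min_size}",
        f"    max_size {spec.max_size}",
        f"    {take}",
        f"    step chooseleaf {mode} {spec.num} type {spec.failure_domain}",
        "    step emit",
        "}",
    ]
    return "\n".join(lines)


# Two replicas on SSD-backed OSDs, each on a different host (pool size 2).
ssd_pair = RuleSpec("ssd_pair", 1, root="default", failure_domain="host",
                    device_class="ssd")
# Three replicas on SAS-backed OSDs, each in a different data center (pool size 3).
# "sas" is a hypothetical custom device class.
sas_three_dc = RuleSpec("sas_three_dc", 2, root="default",
                        failure_domain="datacenter", device_class="sas")

print(render_rule(ssd_pair))
print(render_rule(sas_three_dc))
```

Because `num` is 0, each `chooseleaf` step selects as many buckets as the pool has replicas, so the same rule text works whether the pool size is 2 or 3.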
A rule takes the following form; Table 1 describes each value:
rule <rulename> {
    id <unique number>
    type [replicated | erasure]
    min_size <min-size>
    max_size <max-size>
    step take <bucket-name> [class <class-name>]
    step [choose|chooseleaf] [firstn|indep] <N> type <bucket-type>
    step emit
}
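The header fields `type`, `min_size`, and `max_size` form the rule mask, and `<N>` in a `choose` or `chooseleaf` step is interpreted relative to the pool size: 0 means the pool size, a positive value means that many buckets, and a negative value means that many fewer than the pool size. The Python sketch below models only those two behaviors. `RuleMask`, `rule_applies`, and `resolve_num` are hypothetical names, not part of any Ceph API.

```python
from dataclasses import dataclass


@dataclass
class RuleMask:
    """Hypothetical container for the rule-mask fields of a rule header."""
    rule_type: str  # "replicated" or "erasure"
    min_size: int
    max_size: int


def rule_applies(mask: RuleMask, pool_type: str, pool_size: int) -> bool:
    """Return True if the pool type matches and min_size <= pool_size <= max_size."""
    return mask.rule_type == pool_type and mask.min_size <= pool_size <= mask.max_size


def resolve_num(num: int, pool_size: int) -> int:
    """Translate the <N> argument of a choose/chooseleaf step into a bucket count.

    num > 0  -> select num buckets
    num == 0 -> select pool_size buckets
    num < 0  -> select pool_size + num buckets (|num| fewer than pool_size)
    A non-positive result is clamped to 0, meaning the step selects nothing.
    """
    if num > 0:
        return num
    return max(pool_size + num, 0)


mask = RuleMask("replicated", min_size=1, max_size=10)
print(rule_applies(mask, "replicated", 3))  # True
print(rule_applies(mask, "erasure", 3))     # False: wrong pool type
print(resolve_num(0, 3), resolve_num(-1, 3), resolve_num(2, 3))  # 3 2 2
```

This is why rules commonly use `step chooseleaf firstn 0 type host`: the 0 lets one rule serve pools of any size within the `min_size` to `max_size` range.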
| Value | Description | Type | Required | Example/Default |
|---|---|---|---|---|
| `id` | A unique whole number that identifies the rule. Purpose: A component of the rule mask. | Integer | Yes | Default: `0` |
| `type` | Specifies whether the rule is for a replicated pool or an erasure-coded pool. Purpose: A component of the rule mask. Valid values: `replicated`, `erasure`. | String | Yes | Default: `replicated` |
| `min_size` | If a pool makes fewer replicas than this number, CRUSH does not select this rule. Purpose: A component of the rule mask. | Integer | Yes | Default: `1` |
| `max_size` | If a pool makes more replicas than this number, CRUSH does not select this rule. Purpose: A component of the rule mask. | Integer | Yes | Default: `10` |
| `step take <bucket-name> [class <class-name>]` | Takes a bucket name and begins iterating down the tree. If a device class is specified, devices that do not belong to that class are excluded. Purpose: A component of the rule. | N/A | Yes | Examples: `step take data`, `step take data class ssd` |
| `step choose firstn <num> type <bucket-type>` | Selects the given number of buckets of the given type. The number is usually the number of replicas in the pool (that is, the pool size): `0` selects as many buckets as the pool size, a positive number selects that many buckets, and a negative number selects that many fewer than the pool size. Purpose: A component of the rule. Prerequisite: Follows `step take` or `step choose`. | N/A | N/A | Example: `step choose firstn 1 type row` |
| `step chooseleaf firstn <num> type <bucket-type>` | Selects a set of buckets of `<bucket-type>` and chooses a leaf node (an OSD) from the subtree of each bucket in the set. The number of buckets in the set is usually the number of replicas in the pool (that is, the pool size), and `<num>` is interpreted as for `step choose`. Purpose: A component of the rule. Using this step removes the need to select a device in two steps. Prerequisite: Follows `step take` or `step choose`. | N/A | N/A | Example: `step chooseleaf firstn 0 type row` |
| `step emit` | Outputs the current value and empties the stack. Typically used at the end of a rule, but can also be used to pick from different trees in the same rule. Purpose: A component of the rule. Prerequisite: Follows `step choose` or `step chooseleaf`. | N/A | N/A | Example: `step emit` |
| `firstn` versus `indep` | Controls the replacement strategy CRUSH uses when OSDs are marked down in the CRUSH map. Use `firstn` for rules used with replicated pools and `indep` for rules used with erasure-coded pools. | N/A | N/A | See the example that follows this table. |

Example of firstn versus indep: a placement group (PG) is stored on OSDs 1, 2, 3, 4, 5, and OSD 3 goes down. With firstn mode, CRUSH selects 1 and 2, then selects 3 but discovers that it is down, so it retries, selects 4 and 5, and then goes on to select a new OSD, 6. The final mapping changes from 1, 2, 3, 4, 5 to 1, 2, 4, 5, 6. With indep mode on an erasure-coded pool, CRUSH attempts to select the failed OSD 3, tries again, and picks 6, so the mapping changes from 1, 2, 3, 4, 5 to 1, 2, 6, 4, 5. Because indep keeps every surviving OSD in its original position, it suits erasure-coded pools, where each position holds a different chunk. In a replicated pool every copy is identical, so the shift in order that firstn causes does not require moving data between the surviving OSDs.

Important: A given CRUSH rule can be assigned to multiple pools, but a single pool cannot have multiple CRUSH rules.
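The difference between the two replacement strategies can be reproduced with a toy model of the OSD 3 scenario described in this section. The Python sketch below is a simplification, not CRUSH itself. Real CRUSH derives candidate OSDs by hashing through the bucket algorithms, whereas here the candidate orders are fixed, hypothetical lists chosen to match the scenario. `choose_firstn` and `choose_indep` are illustrative names.

```python
from typing import Iterable, List, Optional, Sequence, Set


def choose_firstn(candidates: Iterable[int], down: Set[int], size: int) -> List[int]:
    """firstn: take the first `size` usable OSDs from a single ordered stream.

    Skipping a down OSD shifts every later replica one position forward.
    """
    chosen: List[int] = []
    for osd in candidates:
        if osd in down or osd in chosen:
            continue
        chosen.append(osd)
        if len(chosen) == size:
            break
    return chosen


def choose_indep(slot_streams: Sequence[Sequence[int]], down: Set[int]) -> List[Optional[int]]:
    """indep: every position has its own candidate stream.

    A down OSD is replaced only in its own position; other positions keep
    their OSD. Simplification: collisions are checked only against earlier
    positions, and a position with no usable OSD is left as None.
    """
    chosen: List[Optional[int]] = []
    for stream in slot_streams:
        pick = next((osd for osd in stream if osd not in down and osd not in chosen), None)
        chosen.append(pick)
    return chosen


# Hypothetical candidate orders chosen to reproduce the OSD 3 scenario.
FIRSTN_STREAM = [1, 2, 3, 4, 5, 6, 7]
INDEP_STREAMS = [[1, 7], [2, 8], [3, 6], [4, 9], [5, 10]]

print(choose_firstn(FIRSTN_STREAM, down={3}, size=5))  # [1, 2, 4, 5, 6]
print(choose_indep(INDEP_STREAMS, down={3}))           # [1, 2, 6, 4, 5]
```

With firstn, OSDs 4 and 5 move up one position. With indep, they stay at positions 4 and 5 and only position 3 is refilled. That stability is what an erasure-coded pool needs, because each position corresponds to a specific chunk.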