CRUSH rules

CRUSH rules define how a Ceph client selects buckets and the primary OSD within them to store objects, and how the primary OSD selects buckets and the secondary OSDs to store replicas or coding chunks.

For example, you might create one rule that selects a pair of target OSDs backed by SSDs for two object replicas, and another rule that selects three target OSDs backed by SAS drives in different data centers for three replicas.

A rule takes the following form. Table 1 describes each value:

rule <rulename> {
    id <unique number>
    type [replicated | erasure]
    min_size <min-size>
    max_size <max-size>
    step take <bucket-name> [class <class-name>]
    step [choose|chooseleaf] [firstn|indep] <num> type <bucket-type>
    step emit
}
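As a minimal sketch of how the template might be filled in, the following Python helper renders a concrete rule as text. The helper name render_rule, the rule name ssd_replicated, the root bucket default, and the bucket type host are all assumptions made for illustration; substitute the names from your own CRUSH map.

```python
def render_rule(name, rule_id, rule_type, min_size, max_size, steps):
    """Return CRUSH rule text that follows the template shown above.

    This is a hypothetical helper for illustration; it is not part of Ceph.
    """
    lines = [
        f"rule {name} {{",
        f"    id {rule_id}",
        f"    type {rule_type}",
        f"    min_size {min_size}",
        f"    max_size {max_size}",
    ]
    # Each step is written on its own line, in the order it is evaluated.
    lines.extend(f"    step {step}" for step in steps)
    lines.append("}")
    return "\n".join(lines)


# Assumed names: a root bucket called "default" and a bucket type called "host".
ssd_rule = render_rule(
    name="ssd_replicated",
    rule_id=1,
    rule_type="replicated",
    min_size=1,
    max_size=10,
    steps=[
        "take default class ssd",
        "chooseleaf firstn 0 type host",
        "emit",
    ],
)
print(ssd_rule)
```

The rendered rule takes the SSD-class devices under the assumed root, picks one leaf OSD under each of as many hosts as the pool has replicas, and emits the result.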
Table 1. CRUSH rule values
Value Description Type Required Example/Default
id A unique whole number for identifying the rule.

Purpose: A component of the rule mask.

Type: Integer. Required: Yes. Default: 0
type Describes whether the rule applies to a replicated pool or an erasure-coded pool.

Purpose: A component of the rule mask.

Valid values: replicated, erasure

Type: String. Required: Yes. Default: replicated
min_size If a pool makes fewer replicas than this number, CRUSH will not select this rule.

Purpose: A component of the rule mask.

Type: Integer. Required: Yes. Default: 1
max_size If a pool makes more replicas than this number, CRUSH will not select this rule.

Purpose: A component of the rule mask.

Type: Integer. Required: Yes. Default: 10
step take <bucket-name> [class <class-name>] Takes a bucket name, and begins iterating down the tree.

Purpose: A component of the rule.

Type: N/A. Required: Yes. Examples: step take data; step take data class ssd
step choose firstn <num> type <bucket-type> Selects the given number of buckets of the given type. The number is usually the number of replicas in the pool (that is, pool size).
  • If <num> == 0, choose pool-num-replicas buckets (all available).
  • If <num> > 0 && <num> < pool-num-replicas, choose that many buckets.
  • If <num> < 0, choose pool-num-replicas - |<num>| buckets.

Purpose: A component of the rule.

Prerequisite: Follows step take or step choose.

Type: N/A. Required: N/A. Example: step choose firstn 1 type row
step chooseleaf firstn <num> type <bucket-type> Selects a set of buckets of <bucket-type> and chooses a leaf node from the subtree of each bucket in the set. The number of buckets in the set is usually the number of replicas in the pool (that is, pool size).
  • If <num> == 0, choose pool-num-replicas buckets (all available).
  • If <num> > 0 && <num> < pool-num-replicas, choose that many buckets.
  • If <num> < 0, choose pool-num-replicas - |<num>| buckets.

Purpose: A component of the rule. Usage removes the need to select a device using two steps.

Prerequisite: Follows step take or step choose.

Type: N/A. Required: N/A. Example: step chooseleaf firstn 0 type row
step emit Outputs the current value and empties the stack. Typically used at the end of a rule, but might also be used to pick from different trees in the same rule.

Purpose: A component of the rule.

Prerequisite: Follows step choose.

Example: step emit
firstn versus indep Controls the replacement strategy CRUSH uses when OSDs are marked down in the CRUSH map. Use firstn for rules that serve replicated pools and indep for rules that serve erasure-coded pools.

Type: N/A. Required: N/A.

Example: A PG is stored on OSDs 1, 2, 3, 4, 5, and OSD 3 goes down.
  • With firstn, CRUSH selects 1 and 2, then selects 3 but discovers that it is down, so it retries and selects 4 and 5, and then selects a new OSD, 6. The final CRUSH mapping changes from 1, 2, 3, 4, 5 to 1, 2, 4, 5, 6.
  • With indep on an erasure-coded pool, CRUSH attempts to select the failed OSD 3, tries again, and picks 6. The final CRUSH mapping changes from 1, 2, 3, 4, 5 to 1, 2, 6, 4, 5.
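The difference between the two modes in the firstn versus indep example can be sketched in Python. This is a toy model of the resulting mapping only, not the CRUSH algorithm itself: the function names remap_firstn and remap_indep and the explicit list of spare OSDs are assumptions made for illustration.

```python
def remap_firstn(mapping, down, spares):
    """Toy model of firstn: drop failed OSDs, shift survivors left,
    then append replacements at the end of the list."""
    survivors = [osd for osd in mapping if osd not in down]
    needed = len(mapping) - len(survivors)
    return survivors + list(spares[:needed])


def remap_indep(mapping, down, spares):
    """Toy model of indep: every surviving OSD keeps its position and
    only the failed positions are refilled. None marks a position for
    which no spare was available."""
    spare_iter = iter(spares)
    return [next(spare_iter, None) if osd in down else osd for osd in mapping]


original = [1, 2, 3, 4, 5]
print(remap_firstn(original, down={3}, spares=[6]))  # [1, 2, 4, 5, 6]
print(remap_indep(original, down={3}, spares=[6]))   # [1, 2, 6, 4, 5]
```

With firstn, OSDs 4 and 5 move to new positions, which is acceptable for replicas because every replica holds the same data. With indep, OSDs 4 and 5 stay where they are, which matters for erasure-coded pools because each position holds a specific chunk.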
Important: A CRUSH rule can be assigned to multiple pools, but a single pool cannot have multiple CRUSH rules.
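Two of the numeric conventions in Table 1 can be checked with a short Python sketch: how the min_size and max_size values in the rule mask decide whether a rule applies to a pool, and how <num> in a step choose or step chooseleaf line translates into a bucket count. The function names and the dictionary layout are hypothetical, and the sketch covers only the cases that Table 1 describes.

```python
def rule_matches(rule, pool_type, pool_size):
    """Return True if the rule mask accepts a pool of this type and size.

    The rule is skipped when the pool makes fewer replicas than min_size
    or more replicas than max_size.
    """
    return (
        rule["type"] == pool_type
        and rule["min_size"] <= pool_size <= rule["max_size"]
    )


def resolve_num(num, pool_size):
    """Translate <num> from a choose or chooseleaf step into a bucket count."""
    if num == 0:
        return pool_size      # as many buckets as the pool has replicas
    if num > 0:
        return num            # exactly that many buckets
    return pool_size + num    # num is negative: pool_size - |num|


rule = {"type": "replicated", "min_size": 1, "max_size": 10}
print(rule_matches(rule, "replicated", 3))   # True
print(rule_matches(rule, "replicated", 12))  # False
print(resolve_num(0, 3))                     # 3
print(resolve_num(-1, 3))                    # 2
```

For a pool of size 3, a step such as step chooseleaf firstn 0 type row therefore selects three buckets, while firstn -1 selects two.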