Ceph File System snapshot schedules

A Ceph File System (CephFS) can schedule snapshots of a file system directory. The scheduling of snapshots is managed by the Ceph Manager, and relies on Python Timers. The snapshot schedule data is stored as an object in the CephFS metadata pool, and at runtime, all the schedule data lives in a serialized SQLite database.

Important: The scheduler is precisely based on the specified time to keep snapshots apart when a storage cluster is under normal load. When the Ceph Manager is under a heavy load, it’s possible that a snapshot might not get scheduled immediately, resulting in a slightly delayed snapshot. If this happens, then the next scheduled snapshot acts as if there was no delay. Scheduled snapshots that are delayed do not cause drift in the overall schedule.

Scheduling snapshots for a Ceph File System (CephFS) is managed by the snap_schedule Ceph Manager module. This module provides an interface to add, query, and delete snapshot schedules, and to manage the retention policies. This module also implements the ceph fs snap-schedule command, with several subcommands to manage schedules, and retention policies. All subcommands take the CephFS volume path and subvolume path arguments to specify the file system path when using multiple Ceph File Systems. Not specifying the CephFS volume path, the argument defaults to the first file system listed in the fs_map, and not specifying the subvolume path argument defaults to nothing.

Snapshot schedules are identified by the file system path, the repeat interval, and the start time. The repeat interval defines the time between two subsequent snapshots. The interval format is a number plus a time designator: h(our), d(ay), or w(eek). For example, having an interval of 4h, means one snapshot every four hours. The start time is a string value in the ISO format, %Y-%m-%dT%H:%M:%S, and if not specified, the start time uses a default value of last midnight. For example, you schedule a snapshot at 14:45, using the default start time value, with a repeat interval of 1h, the first snapshot will be taken at 15:00.

Retention policies are identified by the file system path, and the retention policy specifications. Defining a retention policy consist of either a number plus a time designator or a concatenated pair in the format of COUNTTIME_PERIOD. The policy ensures that a number of snapshots are kept, and the snapshots are at least for a specified time period apart. The time period designators are: h(our), d(ay), w(eek), M(onth), Y(ear), and n. The n time period designator is a special modifier, which means to keep the last number of snapshots regardless of timing. For example, 4d means keeping four snapshots that are at least one day, or longer apart from each other.

For more information about snapshots, see Ceph File System snapshots.