Strategies for disaster protection

IBM Spectrum Protect provides strategies to protect data if a disaster occurs. These strategies include node replication to a remote site, storage pool protection, database backups, moving backup tapes offsite, and device replication to a standby server.

Replication to a remote site

Node replication is the process of incrementally copying data from one server to another server. The server from which client data is replicated is called a source replication server. The server to which client data is replicated is called a target replication server. For the purposes of disaster protection, the target replication server is at a remote site. A replication server can function as a source server, a target server, or both. You use replication processing to maintain the same level of files on the source and the target servers.

Node replication provides immediate availability of data through failover. Although node replication protects most of the metadata, this approach does not provide adequate protection against database damage. You can provide more comprehensive protection by using storage pools to store data backups.

Advantages

Failover so that data is available immediately if a disaster occurs.
Incremental replication, which results in fast transmission of data.
Electronic transfer of data.
Protection of both data and most metadata.

Disadvantages

Both data and metadata must be recovered.
Data on the source server must be replicated again from the remote site.

Figure 1 shows the node replication process to a remote site.

Figure 1. Node replication process

When client data is replicated, data that is not on the target server is copied to the target server. When replicated data exceeds the retention limit, the target server automatically removes the data from the target server. To maximize data protection, you synchronize the local server and the remote server; for example, Site B replicates data from Site A and Site A replicates data from Site B. As part of replication processing, client data that was deleted from the source server is also deleted from the target server.
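The copy-and-delete behavior described above can be illustrated with a small sketch. This is a toy model under stated assumptions, not the product's algorithm: servers are plain Python dictionaries, and the `replicate` function name is hypothetical.

```python
from __future__ import annotations


def replicate(source: dict[str, bytes], target: dict[str, bytes]) -> dict[str, int]:
    """Toy model of one incremental replication pass (hypothetical helper)."""
    copied = 0
    for name, data in source.items():
        # Copy only the objects that are not yet on the target.
        if name not in target:
            target[name] = data
            copied += 1
    # Propagate deletions: objects removed from the source leave the target too.
    stale = [name for name in target if name not in source]
    for name in stale:
        del target[name]
    return {"copied": copied, "deleted": len(stale)}


source = {"a.txt": b"1", "b.txt": b"2"}
target = {"a.txt": b"1", "old.txt": b"0"}
stats = replicate(source, target)
print(stats)  # {'copied': 1, 'deleted': 1}
```

A second pass over the same data copies and deletes nothing, which is why incremental replication keeps transmission short after the first run.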

IBM Spectrum Protect provides the following replication functions:

You can define policies for the target server in the following ways:
- Identical policies on the source server and target server.
- Different policies on the source server and target server to meet different business requirements.
If a disaster occurs and the source server is not available, clients can recover data from the target server. If the source server cannot be recovered, you can direct clients to store data on the target server. When an outage occurs, the clients that are backed up to the source server can automatically fail over to restore their data from the target server.
You can use replication processing to recover damaged files from storage pools. You must replicate the client data to the target server before the file damage occurs. Subsequent replication processes detect damaged files on the source server and replace the files with undamaged files from the target server.
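The damaged-file recovery in the last item can be sketched as follows. This is an illustration only: detecting damage by comparing SHA-256 checksums is an assumption made for the example, and `checksum` and `repair_from_target` are hypothetical names, not server functions.

```python
from __future__ import annotations

import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def repair_from_target(
    source: dict[str, bytes],
    expected: dict[str, str],
    target: dict[str, bytes],
) -> list[str]:
    """Replace damaged source files with undamaged replicas (toy model)."""
    repaired = []
    for name, digest in expected.items():
        data = source.get(name)
        if data is not None and checksum(data) == digest:
            continue  # the source copy is intact
        replica = target.get(name)
        # Repair only if the replica itself is undamaged, that is, it was
        # replicated before the damage occurred.
        if replica is not None and checksum(replica) == digest:
            source[name] = replica
            repaired.append(name)
    return repaired


expected = {"report.dat": checksum(b"good")}
source = {"report.dat": b"corrupt"}
target = {"report.dat": b"good"}
print(repair_from_target(source, expected, target))  # ['report.dat']
```

If the target holds only a copy that was replicated after the damage, the function repairs nothing, which mirrors the requirement to replicate before the damage occurs.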

Role of replication in disaster protection

If a disaster occurs, you can recover replicated data from the remote site and maintain the same level of files on the source and target servers. You use replication to achieve the following objectives:

Control network throughput by scheduling node replication at specific times.
Recover data after a site loss.
Recover damaged files on the source server.

Storage pool protection

As part of a disaster recovery strategy, ensure that a backup copy of data in storage pools is available at a remote site.

Advantages

Fast recovery and rebuild of the source system.

Disadvantages

Only data is protected; metadata is not protected.
For each storage pool, you must define the storage medium.

You use different techniques to protect against the permanent loss of data that is stored in container storage pools and in FILE and DISK storage pools.

Directory-container storage pools

If you do not need to replicate all the data that is contained in a client node, you use container-copy storage pools to protect some directory-container storage pools. By protecting a directory-container storage pool, you do not use resources that replicate existing data and metadata, which improves server performance.

The preferred method is to protect the directory-container storage pool before you replicate the client node. When node replication is started, the data extents that are already replicated through storage pool protection are skipped, which reduces the replication processing time. If the data in a directory-container storage pool becomes damaged, you can repair the data from a copy in a container-copy storage pool.
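To see why protecting the storage pool first shortens node replication, consider this minimal sketch. It assumes that data extents can be identified by simple string IDs; the `plan_replication` function is hypothetical and only models the skip decision.

```python
from __future__ import annotations


def plan_replication(
    node_extents: set[str], protected_extents: set[str]
) -> tuple[set[str], set[str]]:
    """Split a node's extents into those to send and those to skip (toy model)."""
    to_skip = node_extents & protected_extents  # already copied by pool protection
    to_send = node_extents - protected_extents  # still missing on the target
    return to_send, to_skip


node_extents = {"e1", "e2", "e3", "e4"}
protected = {"e1", "e2", "e3"}
to_send, to_skip = plan_replication(node_extents, protected)
print(sorted(to_send), sorted(to_skip))  # ['e4'] ['e1', 'e2', 'e3']
```

In this example, only one of four extents remains to be sent when node replication starts.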

Container-copy storage pools

You protect directory-container storage pools by copying the data in the directory-container storage pool to container-copy storage pools. Use container-copy storage pools to create up to two tape copies of a directory-container storage pool. The tape copies can be stored onsite or offsite. Damaged data in directory-container storage pools can be repaired by using container-copy storage pools. Container-copy storage pools provide an alternative to using a replication server to protect data in a directory-container storage pool.

Storage pools that are associated with FILE and DISK device classes

For storage pools that are associated with FILE and DISK device classes, you use node replication to maintain a node-consistent copy of the data at the target server. The data copy can be directly restored from the target server to the storage pools.

Database backups

You use database backups to recover your system following database damage. Also, database backup operations must be used to prevent Db2® from running out of archive log space. Database backup operations are not part of node replication. A database backup can be full, incremental, or snapshot. To provide for disaster recovery, a copy of the database backups must be stored offsite. To restore the database, you must have the backup volumes for the database. You can restore the database from backup volumes by either a point-in-time restore or a most current restore operation.

Point-in-time restore

Use point-in-time restore operations for situations such as disaster recovery or to remove the effects of errors that can cause inconsistencies in the database. Restore operations for the database that use snapshot backups are a form of point-in-time restore operation. The point-in-time restore operation includes the following actions:

Removes and re-creates the active log directory and archive log directory that are specified in the dsmserv.opt file.
Restores the database image from backup volumes to the database directories that are recorded in a database backup or to new directories.
Restores archive logs from backup volumes to the overflow directory.
Uses log information from the overflow directory up to a specified point in time.

Most current restore

If you want to recover the database to the time when the database was lost, recover the database to the most current state. The most current restore operation includes the following actions:

Restores a database image from the backup volumes to the database directories that are recorded in a database backup or to new directories.
Restores archive logs from backup volumes to the overflow directory.
Uses log information from the overflow directory and archive logs from the archive log directory.

The most current restore does not remove and re-create the active log directory or archive log directory.
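The difference between the two restore operations can be modeled in a few lines. This is a conceptual sketch, not the restore procedure itself: the database is a dictionary, log records carry a hypothetical integer timestamp, and all names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    timestamp: int  # hypothetical logical clock value
    key: str
    value: str


def _apply(image: dict[str, str], records: list[LogRecord]) -> dict[str, str]:
    db = dict(image)  # start from the restored database image
    for rec in sorted(records, key=lambda r: r.timestamp):
        db[rec.key] = rec.value
    return db


def point_in_time_restore(
    image: dict[str, str], overflow_logs: list[LogRecord], until: int
) -> dict[str, str]:
    # Replay only the restored log records at or before the requested time.
    return _apply(image, [r for r in overflow_logs if r.timestamp <= until])


def most_current_restore(
    image: dict[str, str],
    overflow_logs: list[LogRecord],
    archive_dir_logs: list[LogRecord],
) -> dict[str, str]:
    # Replay the restored logs plus the logs still in the archive log directory.
    return _apply(image, overflow_logs + archive_dir_logs)


image = {"policy": "v1"}
overflow = [LogRecord(10, "policy", "v2"), LogRecord(20, "policy", "v3")]
archive = [LogRecord(30, "policy", "v4")]
print(point_in_time_restore(image, overflow, until=10))  # {'policy': 'v2'}
print(most_current_restore(image, overflow, archive))    # {'policy': 'v4'}
```

The point-in-time variant stops at the requested time and discards later changes. The most current variant also depends on the surviving archive log directory, which is why that operation does not remove and re-create it.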

Alternative methods for disaster protection

In addition to replication, storage pool protection, and database backups, you can also use the following methods to protect data and implement disaster recovery with IBM Spectrum Protect:

Sending backup tapes to a remote site: Data is backed up to tape at scheduled times by the source server. The tapes are sent to a remote site. If a disaster occurs, the tapes are returned to the site of the source server and the data is restored on the source clients. Offsite copies of data on backup tape can also help you to recover from ransomware attacks.
Multisite appliance replication to a standby server: In the multisite appliance configuration, the source appliance is replicated to a remote server in a SAN architecture. In this configuration, if the client hardware at the original site is damaged, the source device can be replicated from the standby server at the remote site. This configuration provides disk-based backup and restore operations.

Comparison of protection configuration strategies

Consider the following potential data-loss scenarios:

Database data is damaged: protect against loss of data in the database by using onsite database backup.
Storage pool data is damaged: protect against loss of data in storage pools by using onsite copy storage pools or node replication.
Disaster scenario where both the onsite database and storage pools are lost: protect against a full disaster by using node replication and both offsite database backup and storage pool backup copies.

The following possible configurations address the most common data protection scenarios:

Configurations for damage protection only

Implement database backup operations onsite with an optional container-copy storage pool onsite to protect data in directory-container storage pools.
Implement database backup operations onsite and node replication onsite.

Configurations for disaster recovery and damage protection

Implement database backup operations offsite with container-copy storage pools offsite to protect data in directory-container storage pools.
Implement database backup operations onsite and node replication offsite with an optional container-copy storage pool onsite for faster recovery of damaged data.
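As a planning aid, the four configurations above can be encoded and checked against the site-loss scenario. The sketch below is one reading of the lists in this topic, not a product feature. The `ProtectionConfig` class and its rule for disaster coverage are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProtectionConfig:
    name: str
    db_backup: str                           # "onsite" or "offsite"
    container_copy: Optional[str] = None     # "onsite", "offsite", or None
    node_replication: Optional[str] = None   # "onsite", "offsite", or None

    def covers_site_disaster(self) -> bool:
        # Assumed rule: a site loss is survivable if data is replicated
        # offsite, or if both the database backup and the container-copy
        # storage pool are kept offsite.
        if self.node_replication == "offsite":
            return True
        return self.db_backup == "offsite" and self.container_copy == "offsite"


CONFIGS = [
    ProtectionConfig("onsite db backup, onsite container copy", "onsite",
                     container_copy="onsite"),
    ProtectionConfig("onsite db backup, onsite replication", "onsite",
                     node_replication="onsite"),
    ProtectionConfig("offsite db backup, offsite container copy", "offsite",
                     container_copy="offsite"),
    ProtectionConfig("onsite db backup, offsite replication", "onsite",
                     container_copy="onsite", node_replication="offsite"),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: disaster recovery = {cfg.covers_site_disaster()}")
```

The first two entries correspond to the damage-protection-only configurations, and the last two correspond to the configurations for disaster recovery and damage protection.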