Add Storage Pool
Create a storage pool, which represents a set of disk or tape volumes, directories, or cloud-based storage space.
The following pages are shown in the wizard:
- Type page
- Select a storage pool type from one of the following categories:
- General
- Select this option to create various general-purpose pools. This type of storage pool can
represent tape volumes, disk directories, or cloud storage. The general-purpose storage pools use
either container-based storage or traditional volume-based storage. Container-based
storage provides optimized inline data deduplication and compression. Container storage pools
create logical containers for storage pool data. The containers are stored in file system
directories or in cloud storage. Traditional volume-based storage uses device classes
to identify the storage devices that can be used to store data that is backed up to the pool.
If you select the General storage pool type, you are later prompted to select the specific type of container storage pool or traditional volume-based storage pool.
You can select the following types of container storage pools:
- Directory
- Select this option to configure a directory-container storage pool. In this type of
storage pool, containers are created in one or more file system directories that you identify during
configuration. The file system directories map to one or more disk devices. By using
directory-container storage pools, you remove the need for volume reclamation, which can improve
server performance and reduce the cost of storage hardware.
Data that is stored in directory-container storage pools uses either inline data deduplication or client-side data deduplication. Inline data deduplication in a directory-container storage pool is different from the data deduplication that is available for storage pools that use the FILE device class. Although both approaches provide server-side data deduplication, directory-container pools are easier to manage and typically provide the best performance.
If you plan to replicate client data, for example in a multisite disk configuration, use directory-container pools to reduce the amount of data that is transferred between replication sites.
- On-premises cloud
- A cloud-container storage pool represents object-based cloud storage. Select this option to configure a cloud-container storage pool when the physical location of the cloud is on premises. An on-premises cloud is managed by internal IT staff in your data center; examples include IBM Cloud® Object Storage and other certified S3 providers. Data that is stored in cloud-container storage pools uses either inline data deduplication or client-side data deduplication. You can configure a cloud-container storage pool to temporarily store data in one or more local file system directories during data ingestion. The data is then moved from local storage to the cloud.
- Off-premises cloud
- Select this option to configure a cloud-container storage pool when the physical location of the cloud is off premises in a vendor-supplied cloud. For example, IBM Cloud Object Storage, Amazon S3, Microsoft Azure, or Google Cloud Storage. Data that is stored in cloud-container storage pools uses either inline data deduplication or client-side data deduplication. You can configure a cloud-container storage pool to temporarily store data in one or more local file system directories during data ingestion. The data is then moved from local storage to the cloud.
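As a sketch of the equivalent administrative command, a cloud-container storage pool can also be defined with the DEFINE STGPOOL command. The pool name, URL, bucket name, and credentials below are placeholders, not values from this document:

```
/* Define a cloud-container pool for an S3-compatible endpoint (hypothetical values) */
define stgpool cloudpool stgtype=cloud cloudtype=s3 cloudurl=https://s3.example.com identity=myaccesskey password=mysecretkey bucketname=mybucket
```

Local staging directories for data ingestion can then be added with the DEFINE STGPOOLDIRECTORY command, as described on the Local Storage page.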
You can select the following types of traditional volume-based storage pools:
- Disk (primary)
- Select this option to configure a primary storage pool for storage on disk. The storage pool is a set of volumes that the server uses to store backup versions of files, files that are archive copies, and files that are migrated. Depending on the device class that you select during configuration, data is stored in random access disk blocks or in sequential volumes. Select a DISK device class to create a random-access storage pool, which stores data in random access disk blocks. Select a FILE device class to create a sequential-access storage pool, which stores data in sequential volumes. A FILE device class specifies one or more file system directories in which the server can create files to store client data. A file is a form of sequential-access media.
- Tape (primary)
- Select this option to configure a primary sequential-access storage pool for storage on tape or in a virtual tape library (VTL). During configuration, select a device class that represents the tape storage device or VTL. The wizard defines a sequential-access storage pool for the volumes that the server uses to store backup versions of files, files that are archive copies, and files that are migrated.
- Tape (copy)
- Select this option to configure a copy storage pool for data that is backed up to
tape from primary storage pools. During configuration, select a device class that represents the
tape storage device or VTL. The wizard defines a copy storage pool for storing copies of files that
are in primary storage pools. Copy storage pools are used only to back up the data that is stored in
primary storage pools. A copy storage pool cannot be a destination for a backup copy group, an
archive copy group, or a management class.
Use copy storage pools to have a copy of active and inactive data that you can restore to a primary storage pool after a disaster or outage. You can move the volumes of copy storage pools off site and still have the server track the volumes. Moving these volumes off site provides a means of recovering from an onsite disaster.
- Retention
- Select this option to configure a retention storage pool. A retention storage pool
can be used only for storing retention set data on tape or in cloud object storage.
- If you are storing retention sets on tape, a retention storage pool represents 3592 tape devices, LTO tape devices, or StorageTek drives.
- If you are storing data in cloud object storage, a retention storage pool represents a supported cloud object storage environment (https://www.ibm.com/support/pages/ibm-spectrum-protect-cloud-object-storage-support).
A retention storage pool has an associated retention-copy storage rule, which is automatically created when you define the pool. The retention-copy storage rule runs once each day to copy retention set data from primary storage to the retention storage pool. You specify the daily start time for the retention-copy storage rule when you configure the retention storage pool.
- Object Client
- Select this option to configure a cold-data-cache storage pool. A cold-data-cache storage pool consists of one or more file system directories on disk. It is used only by object clients as a temporary staging area for sequential volumes during tape backup and restore operations. It is an intermediary storage pool between the object client and a tape device or VTL. It is linked to the primary sequential-access storage pool that represents the tape device or VTL. During configuration, you identify one or more existing file system directories for temporary disk storage and you identify the primary sequential-access storage pool that represents the tape device or VTL.
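For reference, the volume-based pool types described above map to variations of the DEFINE STGPOOL command. The device class and pool names in this sketch are hypothetical:

```
/* FILE device class and a sequential-access disk pool (hypothetical names) */
define devclass fileclass devtype=file directory=/tsm/filedev maxcapacity=10g mountlimit=20
define stgpool filepool fileclass maxscratch=100
/* Primary tape pool and a copy pool that uses the same hypothetical LTO device class */
define stgpool tapepool ltoclass maxscratch=200
define stgpool tapecopy ltoclass pooltype=copy maxscratch=50
/* Back up the primary tape pool to the copy pool */
backup stgpool tapepool tapecopy
```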
- Identity page
- Specify a name for the storage pool and the server on which to define the storage pool.
If you are adding a retention storage pool, you also specify the following information:
- Collocation
When collocation is enabled for a retention storage pool that specifies a tape-based device class, the server tries to keep files for each entity of the selected type on a minimal number of tape volumes. If the retention storage pool specifies a cloud device class, specifying collocation has no effect.
The value that you select for this property affects how a retention set's data is spread across tape volumes. Therefore, it affects the time that it takes to write a retention set to tape and the time it takes to restore data if necessary. In general, there is a tradeoff between the performance of writing a retention set to tape and the performance of restoring data from the retention set. This value affects the number of tape volumes that hold a particular retention set's data. If data from a retention set that is stored offsite must be restored, the value therefore affects the number of volumes that must be retrieved from the offsite location.
This property can have the following values:
- NODE: Data is collocated by client node. The server attempts to put data for each client node on as few volumes as possible. For each client node, the server first attempts to use a volume that already contains data from the same client node. If no volume already contains data from the same client node, the server attempts to use an empty volume or the volume with the most available free space.
- FILESPACE: Data is collocated by each file space of each client node. File spaces of a client node represent individual virtual machines. The server attempts to place data for each file space on as few volumes as possible. For each file space, the server first attempts to use a volume that already contains data from the same file space. If no volume already contains files from the same file space, the server attempts to use an empty volume, or a volume that contains data from the same client node. If no volumes fit those criteria, the server attempts to use the volume with the most available free space.
- GROUP: Data is collocated by collocation group, which is a group of client nodes or a group of file spaces on a particular client node. The server attempts to put data for the client nodes or the file spaces that belong to the same collocation group on as few volumes as possible. However, because collocation is at the group level, data for individual client nodes or file spaces can be spread across the volumes. If you specify this value, but do not define any collocation groups, or if you do not add nodes or file spaces to a collocation group, data is collocated by node.
- NO: Collocation is not enabled. The server attempts to use all available space on each volume before it selects a new volume. The server writes data to volumes without regard for keeping data for a particular entity together. Although this process can provide fuller use of individual volumes, data for individual entities might be spread across many volumes. If data needs to be restored from a retention set, more tape mounts might be required.
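Collocation can also be changed after the wizard by using the UPDATE STGPOOL command. This sketch assumes that the COLLOCATE parameter applies to a hypothetical retention pool named retpool:

```
/* Collocate retention set data by file space (hypothetical pool name) */
update stgpool retpool collocate=filespace
```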
A retention storage pool has an associated retention-copy storage rule, which is created when the storage pool is created. The storage rule defines a daily processing window during which retention sets can be copied to the storage pool. Specify the following information to define this processing window. You can later modify the processing window for the storage rule from the Details notebook for the storage rule.
- Daily start time
- Schedule the daily start time so that it has minimal impact on other scheduled tasks, such as backup or replication operations, that require server and network resources.
- Max run time
- Specifies how long the server processes the storage rule, or whether processing continues until all eligible data is copied. To copy all eligible data each day when the storage rule runs, select the No limit check box. When no limit is specified, processing runs to completion. To limit the processing window for the storage rule, specify a maximum run time. When a maximum run time is specified, the server stops copying data when the limit is reached. You might want to set a maximum run time to prevent storage rule processing from affecting the performance of other scheduled tasks, such as backup or replication operations.
- Directories page (shown only for directory-container pools)
-
Specify existing file system directories for disk storage. The storage pool allocates space in these directories for containers and writes data to the container. When a container is full, a new container is dynamically allocated. In this way, directories can grow to use more file system space.
Enter a fully qualified path name that conforms to the syntax that is used by the server operating system. For example, enter c:\temp\dir1\ for Microsoft Windows or /tmp/dir1/ for UNIX.
For directory-container pools, the directories define the storage pool size. The pool size is determined by the capacity of the directory file systems.
Data that is written to a container pool is distributed across the directories that you specify. If you distribute data across the available directories, it can improve I/O performance, but only if the storage pool directories map to different physical disks. You must specify at least one storage pool directory. Each additional physical disk that you identify can increase parallelism and improve I/O performance.
Container storage pools do not use device classes, so you must ensure that the disk devices have similar availability, storage, and performance characteristics.
Restriction: Because storage pool directories can grow to use more space on the file system, do not specify a directory that is on the same physical file system as the server database and logs. You can later add storage pool directories by going to the Directories page for the storage pool.
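As an example of the equivalent commands, a directory-container pool with two directories might be defined as follows. The pool name and paths are hypothetical:

```
/* Define a directory-container pool and assign two directories (hypothetical names) */
define stgpool dirpool stgtype=directory
define stgpooldirectory dirpool /tsm/dir1,/tsm/dir2
```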
- Cold Data Cache page (shown only for cold-data-cache pools)
-
A cold data cache is intermediate storage on disk between an object client and a primary tape storage pool. It is a set of one or more file system directories for temporarily holding object data during tape backup and recovery operations. The object data is stored in sequential volumes in the file system directories.
An object client can copy infrequently accessed data, or cold data, to physical tape media or to a VTL. When an object client copies cold data, the data is first stored in the cold data cache. The data is then migrated, without a migration delay, to the primary tape storage pool that represents the physical tape media or VTL. After the data is migrated to tape, it is deleted from the cold data cache.
The cold data cache is also used as a staging area for restoring cold data to the object client. During restore operations, the data is copied to the cold data cache. The data remains in the cold data cache for a period that is specified by the object client. Data is restored to the object client from the cold data cache, and not directly from the tape or VTL.
To create the cold data cache, specify one or more existing file system directories for disk storage. Enter a fully qualified path name that conforms to the syntax that is used by the server operating system. For example, enter c:\temp\dir1\ for Microsoft Windows or /tmp/dir1/ for UNIX. If the server needs to allocate a scratch volume, it creates a new file in one of the specified directories. To optimize performance, if you specify multiple directories, ensure that the directories correspond to separate physical volumes. Although the cold data cache is temporary storage, it must be large enough to hold the data that is copied from the object client before the data is migrated to tape. It must also be large enough to hold data during restore operations for the period that is specified by the object client.
- Protect Pool page (shown only for directory-container pools in Version 8.1.12 and earlier, when replication is enabled)
- Storage pool protection copies backup data to the target server without the
associated metadata that is used to manage the data. As a result, data can be transferred more
quickly than with client replication alone, which transfers both data and metadata.
Running storage pool protection typically improves replication performance and enables the automatic repair of damaged files on the source server.
Restriction: Until client replication runs and the metadata is synchronized, data that was transferred by storage pool protection cannot be restored from the target server. The Operations Center creates up to four administrative schedules to run storage pool protection at times when client replication is not scheduled to run. These schedules are named PROTECT1 - PROTECT4.
Tips:
- If only one directory-container pool is defined on the target server, the Operations Center uses this pool for storage pool protection. To use a different protection pool, close the wizard, create the new pool, and start the wizard again.
- This page is not shown if the server is configured for replication but a standard schedule does not exist. To use this wizard to configure storage pool protection for a directory-container pool, you must use a single replication schedule named REPLICATE, with a value specified for the MAXRUNTIME parameter.
- Container Copy page (shown only for directory-container pools in Version 8.1.10 and earlier, when a tape device is available and replication is not configured)
- Container-copy storage pools provide a tape-based alternative to using replication to protect
directory-container storage pools.
Restriction: Tape copies from container-copy storage pools can be used to repair minor to moderate storage pool damage. For example, you can repair damaged containers or directories. You can use the following methods for complete disaster recovery protection:
- Copy storage rules that back up data from container storage pools to tape storage.
- Replication to directly restore client data from the target server if the source server is unavailable.
For information about using tape copies for disaster recovery protection, see Determining whether to use container-copy storage pools for disaster protection in the IBM Spectrum Protect documentation.
Tips:
- For servers that are running version 8.1.11 or later, the Operations Center introduces copy storage rules as an alternative method of backing up client data in directory-container storage pools. If the storage pool is defined on a V8.1.11 or later server, use copy storage rules for copying data to tape. Copy storage rules provide better recovery-time objectives because they eliminate the requirement to restore the directory-container storage pool before you can restore data to clients. If the spoke server is running version 8.1.10 or earlier, use container-copy storage pools.
- If you create multiple container-copy pools on the same server, all the pools use the same
schedule settings.
To change the schedule after you create a container-copy pool, use the UPDATE SCHEDULE command. The schedule that is created by the Operations Center is named CONTAINER_COPY.
- By default, tape reclamation is enabled for the new storage pool. You can change this setting on the Properties page for the storage pool.
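For example, to shift the daily start time of the CONTAINER_COPY administrative schedule that the Operations Center creates, you can use the UPDATE SCHEDULE command; the start time shown here is only an illustration:

```
/* Move the container-copy schedule to 01:00 (example time) */
update schedule container_copy type=administrative starttime=01:00
```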
- Copy Storage Rule (shown only for directory-container pools in Version 8.1.11 and later)
- Copy storage rules define your organization's policies for copying client data to tape storage.
Data is backed up to directory-container storage pools and then copied to tape. By specifying copy
storage rules, you can control the scheduling of data backups to tape and the amount of data that is
copied. To create a copy storage rule, ensure that the spoke server is version 8.1.11 or later.
Otherwise, create a container-copy storage pool.
If you add a copy storage rule, you can specify the following information:
- Backup this pool
- You can specify one of the following options for the directory-container storage pool:
- Use a new copy rule
- Use a new copy rule to back up data to the directory-container storage pool.
- Use an existing copy rule
- Use an existing copy rule to back up data to the directory-container storage pool.
- Do not backup
- Do not back up the data to the directory-container storage pool.
- Overflow page (shown only for directory-container pools)
-
Directory-container pools store data in one or more file system directories. If the directories use all the available file system space, no further data can be stored. To avoid backup interruptions, select another storage pool to which data is written when the directory-container pool is full.
Restrictions:
- To prevent the overflow pool from also using all available space on a file system, the overflow pool cannot be another directory pool. Only cloud pools and primary storage pools with a random-access or sequential-access device class are listed.
- If you select a volume-based pool, data is not automatically deduplicated in the overflow pool. This can affect performance in a replicated environment.
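An overflow destination can also be assigned after the wizard. This sketch assumes that the NEXTSTGPOOL parameter designates the overflow pool for a directory-container pool; the pool names are placeholders:

```
/* Route data to an overflow pool when the directory-container pool is full (hypothetical names) */
update stgpool dirpool nextstgpool=overflowpool
```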
- Credentials page (shown only for cloud-container pools)
- Specify the connection information that the server uses to transfer data to and from cloud
storage. From the Cloud Connection list, either select an existing connection
that already contains the credentials or select the option to create a new cloud connection. If you
are creating a connection, then, depending on the Cloud type that you select,
the following properties might be available:
- Bucket name
- The term bucket refers to an Amazon S3 or Google Cloud Storage bucket or an IBM Cloud Object Storage vault. For the Amazon S3 or IBM Cloud Object Storage cloud types, you can enter the name of an existing bucket or allow the Operations Center to create a new bucket. For the Google Cloud Storage cloud type, you can enter the name of an existing bucket or create a new bucket with your cloud provider.
- Region
- For the Amazon S3 cloud type, the region that you select can affect the cost and performance of backing up data. Select the correct region for your geographical location and verify the associated URL. If the correct URL is not shown, select Other (Specify) and manually enter the URL. Do not include a bucket name in the URL.
- SAS token
- For the Microsoft Azure cloud type, you must specify a shared access signature (SAS) token to grant the storage pool permission to your storage account. In this field, supply the token string that you obtained from Microsoft Azure. Although the server verifies that the SAS token is valid when the storage pool is defined, neither the server nor the Operations Center monitors the expiration date of the token. When the SAS token expires, the storage pool loses access to your storage account. To prevent the storage pool from losing access to your storage account, specify a new valid SAS token before the current token expires. You can specify a new SAS token on the Properties page for the storage pool. To avoid having to frequently create new SAS tokens, you can configure tokens with later expiration dates.
- Cloud read cache
- A cloud read cache can help improve performance when you restore a large amount of data from the
cloud-container storage pool. When a cloud read cache is enabled, the system can download a copy of
cloud-container object data that is being restored, and then stage a copy of that data locally in
the cloud cache. As a result, the cloud cache can store both ingested data and read cache
data.
The On and On, Prefer ingest settings both enable a cloud read cache. However, with the On, Prefer ingest setting, if the system detects a low-space condition when data is ingested, read cache data is removed from the cloud cache so that the space can instead be used for ingested data.
- Cloud data lock
- Cloud data lock uses the Write Once Read Many (WORM) capabilities of your cloud storage
provider to help prevent containers from being deleted or overwritten. When cloud data lock is
enabled, containers that are written to the cloud-container storage pool are considered immutable,
or locked, for the duration that you specify. This locking is designed to protect the objects from
unintended changes or deliberate tampering, and can help satisfy data retention requirements.
Cloud data lock applies only to data after it is written to cloud storage. To improve backup performance, a local directory might be assigned to the storage pool to temporarily store data on disk before the data is moved to cloud storage. While the data is temporarily stored in this local directory cache, data lock is not applied.
When you enable cloud data lock, you must also specify a lock duration. The lock duration specifies the number of days that each object must remain locked after it is written to the storage pool. If you do not set a lock duration that is greater than 0, cloud data lock is effectively not enabled. When the server writes a container to cloud storage, a lock expiration date for the container is calculated based on the lock duration. You can view the lock expiration date for all containers from the Containers page of the storage pool's Details notebook. You can later update the lock duration for the storage pool from the Properties page of the storage pool's Details notebook.
For the Amazon S3 cloud type, cloud data lock uses the S3 Object Lock feature. The bucket that you specify by using the Add Storage Pool wizard must already exist and S3 Object Lock must be enabled for the bucket. S3 Object Lock can be enabled only when the bucket is created, and you cannot later disable S3 Object Lock. An Object Expiration rule must not be assigned to the bucket.
You should define a default retention period for the bucket that specifies the same number of days as the lock duration. Although the bucket's default retention period and the lock duration should match, the server calculates and specifies the lock expiration for a container object explicitly, based solely on the lock duration.
The retention mode that is specified on the bucket determines the rigidity of the object lock. If governance mode is specified for the bucket, then some users with special permissions can delete or overwrite container objects. If compliance mode is specified for the bucket, then no users are able to delete or overwrite objects until the lock expires.
For more information about the Object Lock feature, see the Amazon S3 documentation.
For on-premises cloud pools, you can select from a list of cloud service providers or use another service provider that is validated for use with IBM Spectrum Protect. To use another service provider, you must specify an existing bucket name in the Add Storage Pool wizard. For information about validated cloud service providers, see the support website for your operating system.
By default, the storage pool is created with read/write access. You can change this setting by using the UPDATE STGPOOL command after you complete the wizard.
For more information about configuring cloud storage, see Configuring a cloud-container storage pool for data storage in the IBM Spectrum Protect documentation.
- Local Storage page (shown only for cloud-container pools)
-
Specify existing file system directories for disk storage. The storage pool allocates space in these directories for containers and writes data to the container. When a container is full, a new container is dynamically allocated. In this way, directories can grow to use more file system space.
Enter a fully qualified path name that conforms to the syntax that is used by the server operating system. For example, enter c:\temp\dir1\ for Microsoft Windows or /tmp/dir1/ for UNIX.
For cloud-container pools, the directories define the size of a disk cache that is used to optimize data transfer to the cloud. The cache size is determined by the capacity of the directory file systems. The data is only held temporarily on disk. After the data is transferred to the cloud, it is deleted from the directory cache. For this reason, less file system capacity is required for cloud-container pools than is required for directory-container pools.
Data that is written to a container pool is distributed across the directories that you specify. If you distribute data across the available directories, it can improve I/O performance, but only if the storage pool directories map to different physical disks. If the cloud-container pool is not being used as a target pool for tiering, specify at least one storage pool directory. Each additional physical disk that you identify can increase parallelism and improve I/O performance.
Container storage pools do not use device classes, so you must ensure that the disk devices have similar availability, storage, and performance characteristics.
Restriction: Because storage pool directories can grow to use more space on the file system, do not specify a directory that is on the same physical file system as the server database and logs. You can later add storage pool directories to the disk cache by going to the Directories page for the storage pool.
- Protect encryption keys page (shown only for cloud-container pools and retention storage pools that write data to cloud storage)
- You must encrypt storage pool data in off-premises cloud-container pools. You can optionally
encrypt data in on-premises cloud-container pools.
Encryption is managed by the server, and requires no administrator intervention. The master encryption-key file is automatically included in server database backups to provide disaster recovery protection.
Restriction: Before you can enable encryption, you must specify a device class for server database backups. You can specify a device class by using the SET DBRECOVERY command. Schedule database backups to run on a regular basis. For instructions about backing up the server database and related files, see Defining schedules for server maintenance activities in the IBM Spectrum Protect documentation.
Although you do not have to manage encryption, you must manage the password that is used to protect the master encryption-key file. After you specify this password, the password is required to restore a server database backup.
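A minimal sketch of the prerequisite setup, assuming a device class named dbbackclass already exists for database backups (the class name and schedule values are placeholders):

```
/* Designate the device class for database backups */
set dbrecovery dbbackclass
/* Schedule a daily full database backup (example time) */
define schedule backup_db type=administrative cmd="backup db devclass=dbbackclass type=full" active=yes starttime=21:00 period=1 perunits=days
```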
Tip: If you encrypt storage pool data, it can significantly affect performance. Enable encryption for on-premises cloud pools only if it is necessary.
- Device Class page (shown only for retention storage pools and certain volume-based pools)
- A device class identifies the type of storage device that can be used by a storage pool. Device
classes are collections of similar storage devices.
To create a new device class or change existing settings, use the DEFINE DEVCLASS and UPDATE DEVCLASS commands.
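For example, hypothetical FILE and LTO device classes might be defined as follows; the names, paths, and library are placeholders:

```
/* Sequential-access disk storage backed by file system directories (hypothetical values) */
define devclass bigfileclass devtype=file directory=/tsm/filedev maxcapacity=50g mountlimit=32
/* LTO tape drives in a previously defined library named tapelib */
define devclass ltoclass devtype=lto library=tapelib format=drive
```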
- Migration page (shown only for certain volume-based pools)
- You can optionally migrate data from a primary storage pool to one or more additional primary
pools to create a storage hierarchy. For example, data can be initially stored on disk and later
migrated to lower-cost tape storage.
By default, migration starts when capacity usage reaches 90% and stops at 70%. You can change this setting on the Properties page for the storage pool.
Files are migrated to the next storage pool in the hierarchy by using the MIGRATE STGPOOL command. Files are also written to this next storage pool if they exceed the maximum size that is specified for the primary storage pool.
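These settings correspond to UPDATE STGPOOL parameters and the MIGRATE STGPOOL command; the pool names in this sketch are hypothetical:

```
/* Set the migration destination and thresholds (hypothetical pool names) */
update stgpool diskpool nextstgpool=tapepool highmig=90 lowmig=70
/* Manually drain the disk pool to its next pool */
migrate stgpool diskpool lowmig=0
```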
Restriction: You can select only primary storage pools with a data format of NATIVE or NONBLOCK as a migration destination.
- Storage Pool Encryption page (shown only for retention pools that write data to cloud storage and directory-container pools)
- You can specify whether the server encrypts client data before the server writes the data to the
storage pool. If you enable encryption, client data is encrypted by the server by using 256-bit
Advanced Encryption Standard (AES) data encryption. If you do not enable encryption, be aware that
unencrypted data does not have data privacy and integrity protections against unauthorized users who
gain access to the data.
You can later enable or disable encryption for the storage pool from the Properties page for a storage pool.
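Encryption can likewise be toggled from the command line; this sketch assumes a directory-container pool with a hypothetical name:

```
/* Enable server-side AES-256 encryption for the pool (hypothetical pool name) */
update stgpool dirpool encrypt=yes
```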
- Copy Storage Pool page (shown only for certain volume-based pools)
- Copy storage pools back up primary storage pools to provide a means to recover from disasters or media failures.
Tip: Even if data is simultaneously written to the copy pool when data is backed up to the primary pool, you must still regularly back up the primary pool. If you choose not to specify a daily backup time in the wizard, ensure that the primary pool is backed up regularly to maintain a valid copy of its data.
After you complete the wizard, you can change the storage pool or specify more options by using the UPDATE STGPOOL command. You can also change some settings on the Properties page for the storage pool.
For more information about storage pools and using IBM Spectrum Protect commands, see the IBM Spectrum Protect documentation.