Directory-container storage pools FAQs

Question & Answer

Question

Tip: IBM Spectrum Protect was previously known as IBM Tivoli Storage Manager.

What is a container type storage pool? Are there different types of container storage pools?
How are DIRECTORY type containers different from DISK or SEQUENTIAL type volumes?
Can I move data out of a directory-container storage pool?
Can I use existing commands or utilities to move data out of a directory-container storage pool?
Can I move data within a directory-container storage pool?
How can I store existing data to a directory-container pool by using replication? Are there other techniques?
How can I protect storage pool data if I cannot use the BACKUP STGPOOL command with this type of storage pool?
Can I reclaim tape volumes in a container-copy storage pool?
Do I need to issue the PROTECT STGPOOL command if I replicate all nodes in a directory-container storage pool?
Can I issue a REPLICATE NODE command if the target server is a version 7.1.2 or earlier server?
What is inline data deduplication?
Does the inline data deduplication use the same data deduplication methods as client-side data deduplication?
Do I need to reorganize the database tables if I use directory-container storage pools?
Do I need to reorganize database indexes if I use directory-container storage pools?
Can I reduce the number of servers I use by migrating data to a directory-container pool? How do I calculate how many servers I need for the hardware that I use?
Why do I see two different types of containers stored in my filesystem(s)?
Why do directory container sizes vary?
Can I generate node-level data deduplication reports for directory-container storage pools?
Is it possible to get detailed space savings information through the server SUMMARY SQL table for the directory-container storage pools?
Is there a background housekeeping process for directory-container storage pools similar to existing deduplicated storage pools (SHOW DEDUPDELETEINFO)?
Is there a DELETE CONTAINER command, or some equivalent, for removing storage from a directory-container storage pool?
How does the storage pool REUSEDELAY parameter affect data extents in a container storage pool?
Do directory-container type storage pools complete any defragmentation of containers?
Is there a first failure data capture file I can look at if I am experiencing some type of failure?
Is there any guidance document available to set up a container storage pool? Specifically, documents around provisioning proper system and storage and system tuning?
Can I enable at-rest encryption for an existing directory-container storage pool?
When should I schedule the ENCRYPT STGPOOL operation?
How many processes should be used during the ENCRYPT STGPOOL operation?
How does directory-container storage pool encryption affect my replication target storage pool?
What are the benefits of server-side at-rest encryption vs client-side encryption?

Answer

Q1. What is a container type storage pool? Are there different types of container storage pools?

A container storage pool is a type of storage pool that is designed specifically for data deduplication. Data is stored in containers and is deduplicated either at the source (client) or inline during the server ingest phase. With container storage pools, you do not need to post-process data to deduplicate it. There are two types of container storage pools:

Cloud-container: Cloud based storage pools. For the FAQs for the cloud-container storage pool, see Cloud-container storage pools FAQs.
Directory-container: Directory based storage pools that are defined by using the DEFINE STGPOOLDIRECTORY command.

Q2. How are DIRECTORY type containers different from DISK or SEQUENTIAL type volumes?

Directory-containers are hybrids of DISK and SEQUENTIAL type volumes that offer the advantages of both without some of the disadvantages. Directory-containers are managed like sequential type file volumes but do not have the disadvantage of reclamation. Data extents are stored in a random fashion, as on DISK volumes, but without the disadvantages of a static footprint and a lack of variable sizing.

Q3. Can I move data out of a directory-container storage pool?

No. There is no utility available to move data out of a container storage pool. This capability is being considered by IBM Spectrum Protect product management for the future roadmap.

Q4. Can I use existing commands or utilities to move data out of a directory-container storage pool?

No, but these, or similar, capabilities are being considered by IBM Spectrum Protect product management for the future roadmap. Traditional move and copy features are not supported for directory-container storage pools. You cannot use the following commands with directory-container storage pools:

BACKUP STGPOOL
EXPORT/IMPORT
GENERATE BACKUPSET
MOVE DATA
MOVE NODEDATA
MIGRATE STGPOOL

Q5. Can I move data within a directory-container storage pool?

Yes. The MOVE CONTAINER command can be used to move data from one container to another container. A new container is created for the output, so there must be enough free space to allocate the entire size of the source container. This command can be useful to expand the storage pool size or to move data from one storage device to another.

Q6. How can I store existing data to a directory-container pool by using replication? Are there other techniques?
You can convert a primary storage pool that uses a FILE device class, a tape device class, or a virtual tape library (VTL) to a container storage pool. Data that is stored in a container storage pool can use inline data deduplication, inline compression, and at-rest encryption. For more information about storage pool conversion best practices and recommendations, see the following technote: https://www.ibm.com/support/pages/node/555549

Q7. How can I protect storage pool data if I cannot use the BACKUP STGPOOL command with this type of storage pool?

Use the PROTECT STGPOOL command to protect data that is stored in directory-container storage pools. The data extents are sent to a target server through an underlying replication method. Both deduplicated data and metadata extents are protected and copied to the target storage pool. You cannot protect the inventory with the PROTECT STGPOOL command. You must still issue the REPLICATE NODE command to ensure data protection in case of a failover.

Q8. Can I reclaim tape volumes in a container-copy storage pool?

You can reclaim tape volumes in a container-copy storage pool without running a protection operation. Similarly, you can protect a container-copy storage pool without running a reclamation operation.

Q9. Do I need to issue the PROTECT STGPOOL command if I replicate all nodes in a directory-container storage pool?

As part of a disaster recovery strategy, ensure that a backup copy of data in storage pools is available at a remote site. It is highly recommended that you issue the PROTECT STGPOOL command regardless of the replication strategy that is in place. The PROTECT STGPOOL command is a storage-level protection mechanism that allows damaged extents to be repaired without any interaction with the inventory. If the PROTECT STGPOOL command is issued before the REPLICATE NODE command, the replication process completes faster. This is an added benefit; the primary goal is to protect the storage in case of a local damage scenario.

Tip: The REPAIR STGPOOL command can be used to recover damaged data extents from a replication pair. This is a similar feature to the RESTORE STGPOOL command in DISK and SEQUENTIAL type storage pools.

For local storage protection, you can repair the data from a copy in a container-copy storage pool. Container-copy storage pools are used only to protect the data that is stored in directory-container storage pools. For more information about repairing and recovering data, see Repairing and recovering data in directory-container storage pools.

Q10. Can I issue a REPLICATE NODE command if the target server is a version 7.1.2 or earlier server?

No. A V7.1.2 or earlier server can replicate node data to a V7.1.3 server but a V7.1.3 server cannot replicate to an earlier version server.

Q11. What is inline data deduplication?

Inline data deduplication is a server-side data deduplication method that automatically deduplicates data as it is ingested into the server. Inline data deduplication processes data that is not already deduplicated on the client.

Q12. Does the inline data deduplication use the same data deduplication methods as client-side data deduplication?

Yes, inline server data deduplication uses the same data deduplication algorithms that are used by client-side data deduplication and by existing server-side data deduplication with the IDENTIFY process. Thus, the data extents that are identified match the data extents that are found by any previous data deduplication processing. Data extents that are identified by using client-side data deduplication can match data extents that are identified by inline data deduplication.

Q13. Do I need to reorganize the database tables if I use directory-container storage pools?

Possibly. While the likelihood of needing a table reorganization is much lower with the directory-container storage pools, the inventory component of the server might still become fragmented over time. If database space must be reclaimed, you might need to reorganize the following two database tables:

BACKUP_OBJECTS
ARCHIVE_OBJECTS

For more information about database reorganization, see https://www.ibm.com/support/pages/node/410123

Q14. Do I need to run offline index reorganization if I use directory-container storage pools?

No. The server completes inline index reorganizations with a CLEANUP ONLY technique. No other type of index reorganization is required.

Q15. Can I reduce the number of servers I use by migrating data to a directory-container pool? How do I calculate how many servers I need for the hardware that I use?

You might be able to reduce the number of servers that you need. For provisioning and sizing information, see IBM Spectrum Protect Blueprints.

Q16. Why do I see two different types of containers stored in my filesystem(s)?

The directory-container storage pool separates non-deduplicated data extents from the deduplicated data extents. The deduplicated extents are stored in xxxxxxx.dcf type containers and non-deduplicated data extents (encrypted data and metadata) are stored in xxxxxxx.ncf type containers. You do not need to manage containers because the server manages containers automatically.

Q17. Why do directory container sizes vary?

The size of a container depends on several factors. The first one is the type of container that you have. A non-deduplicated container is empty initially and is then immediately expanded as data is written to the container. A non-deduplicated container is always much smaller than a deduplicated container. A deduplicated container is always pre-allocated to a predetermined size.

The second factor is the predetermined size of a container. The predetermined size of a container is based on:

The type of container
The amount of free space that is available in the assigned storage pool directories

There must be a balance between the container size and how many containers can be concurrently opened to avoid using all the free space. The maximum size for any container is 10 GB.

A new container is created whenever the size of data to be stored exceeds the available free space in any existing containers for the storage pool.
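
The container-creation rule above can be expressed as a small check. This is only a toy model of the stated rule, not the server's actual allocation logic; the function name and the representation of containers as a list of free-space values are assumptions made for illustration.

```python
from typing import Sequence


def needs_new_container(free_space_per_container: Sequence[int], data_size: int) -> bool:
    """Return True when no existing container has enough free space for the data.

    free_space_per_container is a hypothetical list of free bytes, one entry
    per existing container in the storage pool.
    """
    return all(free < data_size for free in free_space_per_container)


# Two existing containers with 50 and 200 free bytes; 100 bytes of data fit in the second.
print(needs_new_container([50, 200], 100))
# Neither container has room for 300 bytes, so a new container would be needed.
print(needs_new_container([50, 200], 300))
```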

Q18. Can I generate node-level data deduplication reports for directory-container storage pools?

Yes. With container storage pools, you can generate detailed data deduplication reports at a node or file space level. For more information, see the following commands:

Q19. Is it possible to get detailed space savings information through the server SUMMARY SQL table for the directory-container storage pools?

Yes, with container storage pools, you can view information about space savings in the SUMMARY table. The following fields were added to the BACKUP and ARCHIVE activities for data that is stored in a container storage pool:

BYTES_PROTECTED: <Bytes that have been protected>
BYTES_WRITTEN: <Actual bytes sent from the client>
DEDUP_SAVINGS: <Savings from deduplication processing>
COMP_SAVINGS: <Savings from client-side compression>
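
As a rough illustration of how these fields might be combined, the following Python sketch computes an overall savings percentage for one SUMMARY row. The SummaryRow class, the function name, and the formula (deduplication plus compression savings divided by protected bytes) are assumptions for illustration only; they are not part of the product.

```python
from dataclasses import dataclass


@dataclass
class SummaryRow:
    """Hypothetical holder for one BACKUP or ARCHIVE row of the SUMMARY table."""
    bytes_protected: int
    bytes_written: int
    dedup_savings: int
    comp_savings: int


def savings_percent(row: SummaryRow) -> float:
    """Combined savings as a percentage of the protected bytes (assumed formula)."""
    if row.bytes_protected <= 0:
        return 0.0
    saved = row.dedup_savings + row.comp_savings
    return 100.0 * saved / row.bytes_protected


# Made-up numbers, used only to show the calculation.
row = SummaryRow(bytes_protected=1_000_000, bytes_written=400_000,
                 dedup_savings=450_000, comp_savings=150_000)
print(f"{savings_percent(row):.1f}% saved")
```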

Q20. Is there a background housekeeping process for directory-container storage pools similar to existing deduplicated storage pools (SHOW DEDUPDELETEINFO)?

Container storage pools use a deletion process that serves a similar purpose but operates very differently. All directory-container and cloud-container storage pools use the same deletion process, which examines the data in each storage pool and each container to identify data extents that are eligible for deletion. This deletion process runs continuously in the background and analyzes deduplicated data extents for eligibility based on reference counts and the value of the REUSEDELAY parameter. When a data extent is eligible for deletion, it is removed from the container storage pool, and any servers that protect the data extents are updated during the next PROTECT STGPOOL operation.

Monitor this background deletion process by using the QUERY EXTENTUPDATES command.

Q21. Is there a DELETE CONTAINER command, or some equivalent, for removing storage from a directory-container storage pool?

Yes. To remove storage from a directory-container storage pool, complete the following steps:
1. Issue the AUDIT CONTAINER command and specify the ACTION=MARKDAMAGED parameter to mark all data extents in the container as damaged.
2. Then, issue the AUDIT CONTAINER command again and specify the ACTION=REMOVEDAMAGED parameter to remove the files from the database that reference the damaged data extent.
For more information, see the AUDIT CONTAINER command.
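
The two steps above can be sketched as a small Python helper that builds the administrative commands in the required order. The helper name and the container path are hypothetical, and the sketch assumes that the container name directly follows AUDIT CONTAINER; verify the exact syntax in the AUDIT CONTAINER command reference before use.

```python
from typing import List


def container_removal_commands(container_name: str) -> List[str]:
    """Return the two AUDIT CONTAINER commands in the order the steps are listed."""
    name = container_name.strip()
    if not name:
        raise ValueError("container_name must not be empty")
    return [
        f"AUDIT CONTAINER {name} ACTION=MARKDAMAGED",    # step 1: mark extents as damaged
        f"AUDIT CONTAINER {name} ACTION=REMOVEDAMAGED",  # step 2: remove referencing files
    ]


# Hypothetical container path, used only for illustration.
for command in container_removal_commands("/tsmpool/dir01/00/0000000000000001.dcf"):
    print(command)
```

The helper only builds command text; it does not contact a server, so the output can be reviewed before it is issued from an administrative client.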

Q22. How does the storage pool REUSEDELAY parameter affect data extents in a container storage pool?

The REUSEDELAY parameter specifies how long data extents that are no longer referenced remain eligible for reuse before they are removed from the storage system. The data deduplication process for directory-container and cloud-container storage pools checks the last date and time that a data extent was referenced to determine whether the data extent can be used again. If the data extent is still used by another object, or if the data extent is within the reuse delay period, the data extent is reused when the data deduplication process matches it. The REUSEDELAY parameter is set in days and can be increased if longer reuse periods are required. Longer reuse periods mean that the data in the storage pool is retained longer before it is deleted due to lack of use. Although a value of zero days is allowed, it is not recommended in most circumstances. A value of zero forces the data deduplication process to store more data than it otherwise would, and can decrease backup, replication, and protection performance. Use this value only if you need to clear large amounts of data from the storage pool and you are not running any of these storage operations.

The REUSEDELAY period should span at least one database backup so that if the database is restored during this time, an audit of the containers is not required. In other words, the interval between database backups must be no longer than the REUSEDELAY value.

Tip: Do not specify a value of 0 for the REUSEDELAY parameter. If you specify the REUSEDELAY=0 parameter on the DEFINE STGPOOL or UPDATE STGPOOL command, all of the deduplicated extents that are no longer referenced are deleted from a directory-container storage pool.
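
The reuse rule described in this answer can be sketched in Python as follows. This is a simplified model under stated assumptions, not the server's implementation: the function name, the parameters, and the treatment of the boundary (an extent exactly at the end of the delay period is treated as expired) are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def extent_can_be_reused(reference_count: int,
                         last_referenced: Optional[datetime],
                         reuse_delay_days: int,
                         now: datetime) -> bool:
    """Model of the rule: reuse an extent if it is still referenced or within the reuse delay."""
    if reference_count > 0:
        return True  # still used by another object
    if last_referenced is None:
        return False  # never referenced, nothing to keep
    return now - last_referenced < timedelta(days=reuse_delay_days)


now = datetime(2024, 1, 10)
two_days_ago = datetime(2024, 1, 8)
print(extent_can_be_reused(0, two_days_ago, 3, now))  # within a 3-day reuse delay
print(extent_can_be_reused(0, two_days_ago, 0, now))  # REUSEDELAY=0 leaves no reuse window
```

In this model, a reuse delay of 0 means that an unreferenced extent is never eligible for reuse, which matches the tip above about extents being deleted as soon as they are no longer referenced.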

Q23. Do directory-container type storage pools complete any type of container housekeeping?

Yes, it is possible that directory-container storage pools might need to complete container defragmentation. This should be a rare occurrence because the free space handling of containers usually allows for proper reuse throughout the container's lifecycle. However, there might be a fringe case where a defragmentation process is started to recover space from a container. This is a registered process and can be viewed by using the QUERY PROCESS command.

Q24. Is there a first failure data capture file I can look at if I am experiencing some type of failure?

Yes, the IBM Spectrum Protect server records FFDC information in a rolling log of 10 files. These can be found in the instance directory, typically where the dsmserv.opt file is located, and have the following naming convention:

dsmffdc.log
dsmffdc.log1
dsmffdc.log2

These logs are valuable to the support team and they might also contain information that an administrator can use to solve a problem without the intervention of the support teams.
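
To collect these files for review, a short Python sketch such as the following could list them from the instance directory. The function name is hypothetical, and the pattern assumes that the remaining rolling files continue the numeric-suffix naming shown above.

```python
import re
from pathlib import Path

# Matches dsmffdc.log and dsmffdc.log<N>; the numeric suffix pattern is an assumption.
FFDC_NAME = re.compile(r"^dsmffdc\.log(\d*)$")


def ffdc_logs(instance_dir):
    """Return FFDC log paths in instance_dir, ordered by numeric suffix (no suffix first)."""
    found = []
    for path in Path(instance_dir).iterdir():
        match = FFDC_NAME.match(path.name)
        if match and path.is_file():
            suffix = int(match.group(1)) if match.group(1) else 0
            found.append((suffix, path))
    return [path for _, path in sorted(found, key=lambda item: item[0])]
```

Calling ffdc_logs with the path of the server instance directory returns the matching files in suffix order; other files in the directory, such as dsmserv.opt, are ignored.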

Q25. Is there any guidance document available to set up a container storage pool? Specifically, documents around provisioning proper system and storage and system tuning?

Yes, we have blueprint documentation available for container type storage pools. For more information, see IBM Spectrum Protect Blueprints.

For configuring a storage pool, see the following information:

Q26. Can I enable at-rest encryption for an existing directory-container storage pool?
Ensure that your database backups include a copy of the master encryption key for the server that is used to encrypt storage pool data. By default, the BACKUP DB command protects the master encryption key. You can view whether database backups include a copy of the server master encryption key by issuing the QUERY DB command.

Use the DEFINE STGPOOL or UPDATE STGPOOL command to define a directory-container storage pool with encryption enabled. Updating an existing directory-container storage pool to use encryption encrypts new client data that is ingested by the server. However, existing data that was already ingested cannot be encrypted.

Q27. When should I schedule the ENCRYPT STGPOOL operation?
Schedule the ENCRYPT STGPOOL operation around data movement operations such as storage pool protection, node replication, or container movement. The server prevents an ENCRYPT STGPOOL operation and data movement operations from running concurrently. However, overlapping a storage pool encryption operation with other operations such as client ingest has minimal impact on server performance and client ingest performance. Additionally, the storage pool encryption operation does not need to complete in a single run. You can define an administrative schedule to start and stop the operation over time.

Q28. How many processes should be used during the ENCRYPT STGPOOL operation?
The number of processes should match the number of cores that are available on your IBM Spectrum Protect server.

Q29. How does directory-container storage pool encryption affect my replication target storage pool?
The IBM Spectrum Protect Server maintains encrypted data on the target storage pool if the target storage pool supports encryption. Storage pool encryption is available on IBM Spectrum Protect Version 8.1.2 and later. If encryption is disabled on the target storage pool, the source data that is encrypted remains encrypted on the target storage pool. Data that was not encrypted on the source storage pool is not encrypted on the target storage pool.

The following table shows how data is encrypted on V8.1.2 and earlier source and target servers:

Server	V8.1.2 target server with encryption enabled	V8.1.2 target server with encryption disabled	V8.1.1 and earlier target servers
V8.1.2 source server	Encrypted data extents are sent as-is to the target server. Non-encrypted data extents are encrypted on the target server.	Encrypted data extents are sent as-is to the target server. Non-encrypted data extents are sent as-is to the target server.	Server-encrypted data extents are decrypted on the source server before the data extents are sent to the target server. Non-encrypted data extents are sent as-is to the target server.
V8.1.1 and earlier source servers	Data extents are encrypted on the target server.	Data extents are sent as-is to the target server.	Data extents are sent as-is to the target server.
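
The table above can be read as a small decision function. The following Python sketch encodes the six cells; the function and parameter names are hypothetical, and "supports encryption" is taken to mean a server at V8.1.2, the level the table refers to. It models server-side storage pool encryption only.

```python
def encrypted_on_target(source_supports_encryption: bool,
                        target_supports_encryption: bool,
                        target_encryption_enabled: bool,
                        encrypted_on_source: bool) -> bool:
    """Return True if a data extent ends up server-encrypted on the target storage pool."""
    if not target_supports_encryption:
        # V8.1.1 and earlier targets: encrypted extents are decrypted on the
        # source before they are sent; other extents are sent as-is.
        return False
    if target_encryption_enabled:
        # Encrypted extents arrive as-is; non-encrypted extents are encrypted on the target.
        return True
    # Encryption disabled on the target: extents are stored exactly as they were sent.
    return source_supports_encryption and encrypted_on_source


# V8.1.2 source with an encrypted extent, V8.1.2 target with encryption disabled.
print(encrypted_on_target(True, True, False, True))
```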

Q30. What are the benefits of server-side at-rest encryption vs client-side encryption?

Server-side encryption reduces the client-side processing that would otherwise be used to encrypt data on the client. Server-side encryption also saves space by allowing server-side deduplication and compression to take effect before the data is encrypted.
