APAR status
Closed as program error.
Error description
A replication job in IBM Spectrum Protect Plus can fail with the following error shown in the job log:

  CTGGA0583: Exception occurred in post processing for replication. Backup Error. Unable to determine value in getSnapshots

The error is more likely to occur on large replication jobs for SLAs that contain a large number of protected resources (virtual machines, databases, and so on). The error can occur after the job has already been running for several hours.

Further examination of the Virgo log associated with the job shows that the cause of the failure is an error returned from the vSnap server while trying to collect snapshot information:

  VSnap Call GET https://<vsnap>:8900/api/volume/<id>/snapshot time Taken 300160 ms reason : org.springframework.web.client.HttpServerErrorException: 500 INTERNAL SERVER ERROR
  Status: 500 {"error":{"message":"Failed to collect snapshot information","type":"SnapshotInfoError"}}

In the vSnap logs, the failure is observed to be caused by a timeout of a "zfs list" command:

  ERROR pid-xxxxx vsnap.linux.system Timed out (300 seconds) waiting for command to complete: zfs list -t snapshot -o name,guid

The problem occurs when the vSnap server is under heavy I/O load during a large replication job. IBM Spectrum Protect Plus makes repeated API calls to the vSnap server to collect snapshot information. For each API call, the vSnap server queries snapshot information from the storage pool. When the pool is under heavy I/O load and contains a large number of snapshots, collecting the snapshot properties can take longer than the 300-second command timeout, which causes the API call to fail.
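The three log signatures quoted above can be searched for together when triaging a failed replication job. The following Python sketch is illustrative only and is not an IBM-provided tool. The function names and the signature labels are hypothetical, and the caller is assumed to supply the log lines, because log file locations are not stated in this APAR.

```python
import re

# Signatures taken from the messages quoted in the error description.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "job_log": re.compile(r"CTGGA0583"),
    "virgo_log": re.compile(r"Failed to collect snapshot information"),
    "vsnap_log": re.compile(
        r"Timed out \(\d+ seconds\) waiting for command to complete: zfs list"
    ),
}


def classify_line(line):
    """Return the labels of all signatures that match one log line."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(line)]


def scan(lines):
    """Count signature matches across an iterable of log lines."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name in classify_line(line):
            counts[name] += 1
    return counts
```

If all three counts are nonzero for the same job window, the failure is consistent with the symptom described in this APAR. A match on the job log message alone does not confirm it.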
Local fix
- Modify SLA schedules to avoid overlap of multiple large jobs if possible.
- Modify advanced options for the vSnap server and lower the value for option 'Concurrent stream limit for replication'. For example, lower it to 3 from the default value of 5.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus levels 10.1.6, 10.1.7, 10.1.8.     *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply the fixing level when available. This problem is       *
* projected to be fixed in IBM Spectrum Protect Plus level     *
* 10.1.8.ifix2 and 10.1.9. Note that this is subject to change *
* at the discretion of IBM.                                    *
****************************************************************
Problem conclusion
An improved caching mechanism has been introduced on vSnap servers to minimize the amount of metadata that must be read from the storage pool. When IBM Spectrum Protect Plus repeatedly queries snapshot information during replication jobs, responses are returned from the cache, so the vSnap server can respond quickly without rereading the same information from the storage pool.

Depending on the release, the caching mechanism may be disabled by default. In that case, it can be enabled manually on each vSnap server with the following command:

  vsnap system pref set --name resourceListCacheAutoInit --value true

The vSnap server must be restarted after enabling this setting.
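The general idea behind such a cache can be illustrated with a short Python sketch. This is not the vSnap implementation, whose internals are not described in this APAR. The class name, the time-to-live policy, and the loader callback are all assumptions made for illustration. The loader stands in for an expensive query such as the "zfs list" call.

```python
import time


class SnapshotListCache:
    """Hypothetical time-to-live cache in front of a slow snapshot query."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # callable: volume_id -> list of snapshots
        self._ttl = ttl_seconds    # assumed expiry policy, not from the APAR
        self._clock = clock
        self._entries = {}         # volume_id -> (expires_at, snapshots)

    def get(self, volume_id):
        """Return cached snapshots, calling the loader only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(volume_id)
        if entry is not None and entry[0] > now:
            return entry[1]
        snapshots = self._loader(volume_id)
        self._entries[volume_id] = (now + self._ttl, snapshots)
        return snapshots

    def invalidate(self, volume_id):
        """Drop the cached entry, for example after a snapshot is created or deleted."""
        self._entries.pop(volume_id, None)
```

With a cache of this shape, repeated queries for the same volume within the expiry window are answered from memory, so the storage pool is read once instead of once per API call.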
Temporary fix
Comments
APAR Information
APAR number
IT36087
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A16
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-03-02
Closed date
2021-08-27
Last modified date
2021-08-27
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
vSnap ZFS
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
31 January 2024