APAR status
Closed as program error.
Error description
A copy or archive job to a Spectrum Protect server or a cloud object storage server fails with the following error:

ERROR,,ddmmyy,hh-mm-ss,2,CTGGA0309,Copy failed for snapshot (ID: 39) from source [server: aaa.bbb.ccc.ddd volume: spp_1037_2166_16d67d06ff1__group0_95_ snapshot: spp_1037_2187_2_16e0e592756] to target [server: eee.fff.ggg.hhh volume: a5737436dc434289af6ee3672128ab26]. Error: TransferError: Transfer failed: Stalled

This error can occur when the offload speed between the source vSnap host and the target Spectrum Protect server is around 15 MB/s. At that speed the vSnap offloads about one 16 MB object per second, but the throttling logic does not act efficiently and allows several 16 MB objects to be added to the cache within the same one-second period. The cache fills up immediately, and the code stops filling the cache so that it has time to offload it. After this "cache filled up/transfer paused to empty the cache" sequence has happened more than 20 times (a hard-coded value), the offload is aborted because the write requests to the device are too slow.

IBM Spectrum Protect Plus Versions Affected: IBM Spectrum Protect Plus 10.1.x

Initial Impact: Medium

Additional Keywords: SPP, SPPlus, TS002622033
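The stall sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the vSnap implementation: the function name `simulate`, the cache size expressed in object slots, and the rule that filling resumes only once the cache has fully drained are all hypothetical. Only the limit of 20 pauses and the rate of about one object uploaded per second come from the description.

```python
from dataclasses import dataclass

MAX_PAUSES = 20  # "more than 20 times (hard coded value)" per the description


@dataclass
class Result:
    aborted: bool   # True if the pause limit was exceeded
    pauses: int     # number of "cache filled up / transfer paused" events
    uploaded: int   # objects uploaded when the simulation ended


def simulate(total_objects, cache_slots, fill_per_sec,
             upload_per_sec=1, max_pauses=MAX_PAUSES):
    """Second-by-second toy model of cache filling versus uploading.

    Assumptions (not from the APAR): the cache holds `cache_slots`
    objects, and a paused writer resumes only when the cache is empty.
    """
    cached = uploaded = pauses = 0
    queued = total_objects
    paused = False
    while uploaded < total_objects:
        # Upload side: the slow link drains about one object per second.
        sent = min(cached, upload_per_sec)
        cached -= sent
        uploaded += sent
        if paused and cached == 0:
            paused = False  # cache drained, filling may resume
        # Fill side: add objects unless the writer is currently paused.
        if not paused and queued > 0:
            added = min(fill_per_sec, cache_slots - cached, queued)
            cached += added
            queued -= added
            if cached == cache_slots and queued > 0:
                paused = True
                pauses += 1
                if pauses > max_pauses:
                    return Result(True, pauses, uploaded)
    return Result(False, pauses, uploaded)


if __name__ == "__main__":
    # Burst filling (several objects per second) versus paced filling.
    print(simulate(total_objects=200, cache_slots=4, fill_per_sec=4))
    print(simulate(total_objects=200, cache_slots=4, fill_per_sec=1))
```

In this model, letting several objects enter the cache in the same second makes the cache hit its limit on every cycle, so the pause counter keeps climbing until the job is aborted. Pacing the fill rate to match the upload rate never fills the cache, and the same transfer completes.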
Local fix
There are two possible workarounds:

1. If no replication from the source vSnap to another vSnap is done, set the following preference to slow down the offload speed and avoid filling up the cache:
- vsnap system pref set --name cloudOffloadRate --value 67108864
If replicating from this vSnap (and the network connection with that vSnap is faster than the connection to the SP server), this setting will also slow down the replication speed. If the results are not as expected, reset cloudOffloadRate to its default setting by using the following command:
- vsnap system pref clear --name cloudOffloadRate

2. Adjust the caching heuristics by running the following commands on the vSnap. This does not affect replication, but it may not be as effective.
- vsnap system pref set --name cloudThrottleObjectsRatio --value 0.5
- vsnap system pref set --name cloudThrottleObjectsPoll --value 15
These parameters can also be reset to their default values by using the following commands:
- vsnap system pref clear --name cloudThrottleObjectsRatio
- vsnap system pref clear --name cloudThrottleObjectsPoll
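The apply and reset commands for both workarounds can be kept together in a small helper so that each change is paired with its reset. This is a minimal sketch: the helper names `pref_set`, `pref_clear`, `WORKAROUNDS` and `render` are hypothetical, and nothing is executed. The preference names and values are copied from the workarounds above. The value 67108864 equals 64 x 1024 x 1024; the unit is not stated in this record.

```python
# Builds the vsnap command lines from the workarounds; does not run them.

OFFLOAD_RATE = 67108864  # value from the APAR text (64 * 1024 * 1024)


def pref_set(name, value):
    """Return the argv list for 'vsnap system pref set'."""
    return ["vsnap", "system", "pref", "set",
            "--name", name, "--value", str(value)]


def pref_clear(name):
    """Return the argv list for 'vsnap system pref clear'."""
    return ["vsnap", "system", "pref", "clear", "--name", name]


# Workaround number -> commands to apply it and to reset to defaults.
WORKAROUNDS = {
    1: {
        "apply": [pref_set("cloudOffloadRate", OFFLOAD_RATE)],
        "reset": [pref_clear("cloudOffloadRate")],
    },
    2: {
        "apply": [
            pref_set("cloudThrottleObjectsRatio", 0.5),
            pref_set("cloudThrottleObjectsPoll", 15),
        ],
        "reset": [
            pref_clear("cloudThrottleObjectsRatio"),
            pref_clear("cloudThrottleObjectsPoll"),
        ],
    },
}


def render(option, action):
    """Return the command lines for a workaround as printable strings."""
    return [" ".join(argv) for argv in WORKAROUNDS[option][action]]


if __name__ == "__main__":
    for option in (1, 2):
        for action in ("apply", "reset"):
            for line in render(option, action):
                print(f"workaround {option} [{action}]: {line}")
```

The argv lists could be run on the vSnap host by whatever mechanism is already in use there; that step is deliberately left out of the sketch.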
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.4 and 10.1.5.           *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in IBM Spectrum Protect Plus levels    *
* 10.1.5 patch1 and 10.1.6. Note that this is subject to       *
* change at the discretion of IBM.                             *
****************************************************************
Problem conclusion
When copying data from vSnap to a Spectrum Protect (SP) repository server, the data is initially written to a local cache area on the vSnap server and then uploaded to SP. If the network link between vSnap and SP is slow, the local cache area can fill up quickly. In this case, vSnap throttles writes into the cache until some data has been uploaded successfully and the cache usage drops again. Due to a bug in the throttling logic, the transfer can remain stuck for an extended period, which causes vSnap to treat the transfer as interrupted and the job to fail. The problem has been resolved by fixing the throttling logic in vSnap so that the data transfer continues correctly.
Temporary fix
Comments
APAR Information
APAR number
IT31282
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A14
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-01-07
Closed date
2020-02-12
Last modified date
2020-04-09
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
30 January 2024