APAR status
Closed as program error.
Error description
During copy to standard object storage (cloud or repository server), if the job is cancelled or if there are connectivity errors during data upload, SPP attempts to abort the operation by removing the virtual cloud devices that have been mounted on the vSnap server. In some cases, this results in processes hanging on the vSnap server.

One or more of the following symptoms can be seen:
- Other jobs of all types (backup, restore, replication, copy) against the same vSnap server can fail or hang.
- Running "ps aux" on the vSnap server shows many hung "zpool" or "zfs" processes in "D" state.
- In /var/log/messages on the vSnap server, stack traces of various hung commands are repeatedly logged. Many of the stack traces contain lines that mention "txg_wait_synced".

Initial Impact: High
IBM Spectrum Protect Plus Versions Affected: IBM Spectrum Protect Plus 10.1.5
Additional Keywords: SPP SPPlus TS003483077
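The symptoms in the error description can be checked with a short script. The following is a minimal sketch, not an IBM-supplied tool: it assumes the default column layout of "ps aux" on Linux, and the function names and the sample process listing (PIDs, pool name, commands) are made up for illustration.

```python
import os
import subprocess


def find_hung_zfs_processes(ps_output):
    """Return (pid, stat, command) for zpool/zfs processes in "D" state.

    Expects text in the default `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    hung = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # 11th field keeps the full command
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or malformed row
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        exe = os.path.basename(command.split()[0])
        # STAT beginning with "D" means uninterruptible sleep.
        if stat.startswith("D") and exe in ("zpool", "zfs"):
            hung.append((pid, stat, command))
    return hung


def count_txg_wait_lines(log_text):
    """Count log lines that mention txg_wait_synced."""
    return sum(1 for line in log_text.splitlines() if "txg_wait_synced" in line)


# Made-up sample data, used only to illustrate the parsing.
SAMPLE_PS = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1201  0.0  0.1  12345  2048 ?        D    10:01   0:00 zpool list -H
root      1202  0.0  0.1  12345  2048 ?        Ds   10:02   0:00 /usr/sbin/zfs destroy pool/x
root      1300  0.0  0.1  12345  2048 ?        S    10:03   0:00 zfs list
vsnap     1400  0.1  0.5  99999  4096 ?        D    10:04   0:00 python3 worker.py
"""

if __name__ == "__main__":
    # On a live vSnap server, inspect the real process table and system log.
    live = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    for pid, stat, command in find_hung_zfs_processes(live):
        print(pid, stat, command)
    try:
        with open("/var/log/messages", errors="replace") as log:
            print("txg_wait_synced lines:", count_txg_wait_lines(log.read()))
    except OSError as exc:
        print("could not read /var/log/messages:", exc)
```

A few processes briefly in "D" state can be normal under load. The APAR describes many zpool or zfs processes that stay hung, together with repeated stack traces, so the output is best treated as a hint and compared across several runs.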
Local fix
Reboot the vSnap Server.
Problem summary
USERS AFFECTED:
IBM Spectrum Protect Plus level 10.1.5.

PROBLEM DESCRIPTION:
See Error Description.

RECOMMENDATION:
Apply fixing level when available. This problem is currently projected to be fixed in IBM Spectrum Protect Plus level 10.1.5.2204 and 10.1.6. Note that this is subject to change at the discretion of IBM.
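To check whether an installed level is at or beyond the projected fixing level, the dotted level strings can be compared numerically. The following is a sketch under assumptions: it assumes the installed level is available as a dotted numeric string such as "10.1.5.2204", the function names are hypothetical, and the projected fixing level itself is subject to change as noted in the recommendation.

```python
# Projected fixing level from the recommendation (subject to change).
PROJECTED_FIX_LEVEL = (10, 1, 5, 2204)


def parse_level(level):
    """Turn a dotted level string such as "10.1.5.2204" into a tuple of ints."""
    return tuple(int(part) for part in level.strip().split("."))


def at_or_beyond_projected_fix(level):
    """Return True if the level is 10.1.5.2204 or later, including 10.1.6.

    Tuples compare element by element, so (10, 1, 6) sorts after
    (10, 1, 5, 2204) and a bare (10, 1, 5) sorts before it.
    """
    return parse_level(level) >= PROJECTED_FIX_LEVEL


if __name__ == "__main__":
    for level in ("10.1.5", "10.1.5.2204", "10.1.6"):
        print(level, at_or_beyond_projected_fix(level))
```

The check only answers whether a level is at or beyond the projected fix. It does not decide whether an earlier release is affected, because the APAR lists only 10.1.5 as affected.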
Problem conclusion
When a copy operation to object storage fails or is cancelled or aborted, vSnap attempts to remove the virtual cloud devices that have been mounted on the vSnap server. Depending on the amount and nature of partial I/O operations that were in flight at the time, the unmount of the cloud pool could fail. The hung I/O operations on that cloud pool then caused other I/O operations to hang, including some against the main local storage pool.

The problem has been resolved by improving the way vSnap removes the cloud pool while aborting an operation. The forcible unmount process now handles any in-flight I/O operations, which ensures that cloud pools are always cleanly removed without causing hangs.
Temporary fix
Comments
APAR Information
APAR number
IT32474
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A15
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-04-06
Closed date
2020-04-09
Last modified date
2020-04-09
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSNQFQ","label":"IBM Spectrum Protect Plus"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"A15","Line of Business":{"code":"LOB26","label":"Storage"}}]
Document Information
Modified date:
31 January 2024