IBM Support

IT32474: FAILURE OR CANCELLATION OF COPY TO OBJECT STORAGE CAUSES HANGS ON VSNAP SERVER

APAR status

  • Closed as program error.

Error description

  • During a copy to standard object storage (cloud or repository
    server), if the job is cancelled or connectivity errors occur
    during data upload, IBM Spectrum Protect Plus (SPP) attempts to
    abort the operation by removing the virtual cloud devices that
    are mounted on the vSnap server. In some cases, this leaves
    processes hanging on the vSnap server.
    One or more of the following symptoms can be seen:
      • Other jobs of all types (backup, restore, replication, copy)
        against the same vSnap server can fail or hang.
      • Running "ps aux" on the vSnap server shows many hung "zpool" or
        "zfs" processes in "D" state.
      • In /var/log/messages on the vSnap server, stack traces of
        various hung commands are repeatedly logged. Many of the stack
        traces contain lines that mention "txg_wait_synced".
    Initial Impact: High
    IBM Spectrum Protect Plus Versions Affected:
    IBM Spectrum Protect Plus 10.1.5
    Additional Keywords: SPP SPPlus TS003483077
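
The "ps aux" and /var/log/messages symptoms described above could be checked with a small script. The following is a minimal sketch, not an IBM-provided tool: the function names and the sample "ps aux" text are hypothetical, and it assumes the usual Linux "ps aux" column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND).

```python
from typing import List, Tuple

# Commands named in the symptom list; matching on the executable's base name
# is an assumption made for this sketch.
HUNG_COMMANDS = {"zpool", "zfs"}


def find_hung_zfs_processes(ps_output: str) -> List[Tuple[int, str]]:
    """Return (pid, command line) for zpool/zfs processes whose STAT starts with D."""
    hung = []
    for line in ps_output.splitlines():
        # Split into the 10 fixed columns plus the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skips the header row and malformed lines
        pid, stat, command = fields[1], fields[7], fields[10]
        executable = command.split()[0].rsplit("/", 1)[-1]
        if stat.startswith("D") and executable in HUNG_COMMANDS:
            hung.append((int(pid), command))
    return hung


def count_txg_wait_synced(log_text: str) -> int:
    """Count log lines that mention txg_wait_synced."""
    return sum(1 for line in log_text.splitlines() if "txg_wait_synced" in line)


# Made-up sample input for illustration only; not output from a real vSnap server.
SAMPLE_PS = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      4121  0.0  0.0  12345  1024 ?        D    10:02   0:00 zpool list -H
root      4188  0.0  0.0  12345  1024 ?        D    10:03   0:00 /usr/sbin/zfs get all
root      4200  0.1  0.2  54321  2048 ?        Ss   09:00   0:01 /usr/sbin/sshd -D
"""
```

On a vSnap server, the text passed to find_hung_zfs_processes would be the captured output of "ps aux", and the text passed to count_txg_wait_synced would be the contents of /var/log/messages. A non-empty result from the first function together with a non-zero count from the second would be consistent with the symptoms listed here, but it is not proof that this APAR is the cause.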
    

Local fix

  • Reboot the vSnap Server.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus level 10.1.5.                      *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in IBM Spectrum Protect Plus level     *
    * 10.1.5.2204 and 10.1.6. Note that this is subject to change  *
    * at the discretion of IBM.                                    *
    ****************************************************************
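
The level information above can be read as a simple range check. The sketch below is illustrative only: it assumes that level strings are dotted numeric values, that the fourth component (for example 2204) is a build or patch number, and that the projected fixing levels stated in this APAR are the ones that shipped. The function names are hypothetical.

```python
from typing import Tuple

# Levels taken from this APAR: affected at 10.1.5, projected fix in
# 10.1.5.2204 and in 10.1.6.
FIRST_AFFECTED = (10, 1, 5)
FIXED_PATCH = (10, 1, 5, 2204)


def parse_level(text: str) -> Tuple[int, ...]:
    """Turn a dotted level such as '10.1.5.2204' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(level: str) -> bool:
    """True for 10.1.5 levels below the 10.1.5.2204 fixing level."""
    # Tuple comparison is lexicographic, so (10, 1, 5) < (10, 1, 5, 2204)
    # and (10, 1, 6) is not below the fixing level.
    return FIRST_AFFECTED <= parse_level(level) < FIXED_PATCH
```

Levels earlier than 10.1.5 are reported as not affected only because this APAR does not list them; the check says nothing about whether they carry the same defect.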
    

Problem conclusion

  • When a copy operation to object storage fails or is
    cancelled/aborted, vSnap attempts to remove the virtual cloud
    devices that have been mounted on the vSnap server. Depending on
    the number and nature of partial I/O operations that were in
    flight at the time, the unmount of the cloud pool could fail.
    The I/O operations left hung on that cloud pool then caused
    other I/O operations to hang, including some against the main
    local storage pool.
    
    The problem has been resolved by improving the way vSnap removes
    the cloud pool while aborting an operation. The forcible unmount
    process has been improved in order to handle any in-flight I/O
    operations. This ensures that cloud pools are always cleanly
    removed without causing hangs.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT32474

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A15

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-04-06

  • Closed date

    2020-04-09

  • Last modified date

    2020-04-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

Document Information

Modified date:
31 January 2024