IBM Support

IT38973: LONG DURATION FOR MAINTENANCE JOBS PROCESSING CLOUD OR ARCHIVE SNAPSHOTS DUE TO LONG LASTING DELETION VSNAP API CALLS

APAR status

  • Closed as program error.

Error description

  • Copy to Cloud or Archive was introduced in IBM Spectrum
    Protect Plus version 10.1.3.
    Once they have expired, copy or archive snapshots are deleted
    by the Maintenance job.
    
    During that deletion, a live check happens to ensure there are
    no clones present on any other vSnap.
    
    This live check includes a LIST call to the cloud repository.
    Once the check is complete, the rest of the snapshot deletion
    happens as a background session which does not affect runtime of
    the Maintenance job.
    
    If this LIST call takes a long time, it extends the duration
    of the Maintenance job.
    In that case, the Maintenance job can take anywhere from a few
    minutes to several hours, depending on the LIST call duration
    and the number of cloud or archive snapshots to process.
    To see how long it takes to delete a single cloud or archive
    snapshot, review the virgo log found in the job log bundle:
    
    .. Vsnap Call ::DELETE https://<vSnapHost>:8900/api/partner/<PartnerId>/snapshot/<SnapshotId>?partner_type=archive&action=delete time Taken 97950 ms
    ==> 97950 ms = 1 minute 38 seconds
    
    The same request is logged on the vSnap in /opt/vsnap/log/uwsgi.log:
    
    [pid: 6276 .. DELETE /api/partner/<PartnerId>/snapshot/<SnapshotId>?partner_type=archive&action=delete => generated xxx bytes in
    
    More details are in /opt/vsnap/log/vsnap.log; search for the
    <SnapshotId>:
    
    [<date> 08:50:26,363] INFO pid-6276 vsnap.api    API request started: DELETE /partner/<PartnerId>/snapshot/<SnapshotId>?partner_type=archive&action=delete | Body: None
    [<date> 08:50:26,503] INFO pid-6276 vsnap.core    In cloud_snapshot_delete with part_id <PartnerId> and version <SnapshotId> and vol_id None
    [<date> 08:50:26,503] INFO pid-6276 vsnap.cloud.core    In delete snapshot for <SnapshotId>
    [<date> 08:50:26,519] INFO pid-6276 vsnap.common.remote    Checking if partner id <PartnerId> has dependent clones
    [<date> 08:50:26,648] INFO pid-6276 vsnap.cloud.util    Retrieving clones for volume id <VolumeId> version <SnapshotId>
    ==> delay of 1 min 38 sec is here <==
    [<date> 08:52:04,303] INFO pid-6276 vsnap.cloud.core    Deleting snapshot <SnapshotId> from partner <PartnerId>
    ...
    
    IBM Spectrum Protect Plus Versions Affected:
    IBM Spectrum Protect Plus 10.1.3 and later
    
    Additional Keywords: SPP, SPPLUS, TS007298874, offload
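
The per-snapshot timing described in the error description can be collected from a virgo log with a small script. The following is a minimal sketch, not an IBM-provided tool. It assumes the log lines follow the "Vsnap Call ::DELETE <url> time Taken <n> ms" shape quoted in the error description; the function names are hypothetical.

```python
import re

# Assumed line shape, taken from the excerpt in the error description:
#   .. Vsnap Call ::DELETE <url> time Taken <n> ms
# The optional whitespace before "time" tolerates lines where the URL and
# the duration text run together.
DELETE_CALL = re.compile(r"Vsnap Call\s*::DELETE\s+(\S+?)\s*time Taken (\d+) ms")


def delete_call_durations_ms(lines):
    """Return the duration in ms of every vSnap DELETE call found in lines."""
    durations = []
    for line in lines:
        match = DELETE_CALL.search(line)
        if match:
            durations.append(int(match.group(2)))
    return durations


def to_min_sec(duration_ms):
    """Convert milliseconds to (minutes, seconds), rounded to the nearest second."""
    total_seconds = (duration_ms + 500) // 1000
    return divmod(total_seconds, 60)


if __name__ == "__main__":
    sample = [
        ".. Vsnap Call ::DELETE https://vsnap.example:8900/api/partner/p1/"
        "snapshot/s1?partner_type=archive&action=delete time Taken 97950 ms",
    ]
    for ms in delete_call_durations_ms(sample):
        minutes, seconds = to_min_sec(ms)
        print(f"{ms} ms = {minutes} min {seconds} s")
```

Summing the returned durations gives a rough estimate of how much of the Maintenance job runtime was spent waiting on these DELETE calls.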
    

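The delay in vsnap.log shows up as a gap between two consecutive entries for the same request. As a rough sketch, the largest gap can be located programmatically. This assumes each entry starts with a timestamp of the form "[YYYY-MM-DD HH:MM:SS,mmm]" (the date format is an assumption, since the excerpt only shows "<date>") and that the input has already been filtered to one snapshot ID or pid. The function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed timestamp prefix; the date layout is a guess, only the
# "HH:MM:SS,mmm" part is visible in the log excerpt.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the largest gap
    between consecutive timestamped lines, or None if fewer than two match."""
    entries = []
    for line in lines:
        match = STAMP.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            entries.append((stamp, line))

    best = None
    for (t0, before), (t1, after) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best
```

Applied to the excerpt in the error description, the largest gap would fall between the "Retrieving clones" entry and the "Deleting snapshot" entry.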
Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus versions 10.1.4, 10.1.5, 10.1.6,   *
    * 10.1.7 and 10.1.8                                            *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in IBM Spectrum Protect Plus level     *
    * 10.1.9. Note that this is subject to change at the           *
    * discretion of IBM.                                           *
    ****************************************************************
    

Problem conclusion

  • As part of deletion of cloud/archive recovery points, a live
    check for clones was performed by making a call to the Spectrum
    Protect repository server or cloud endpoint. If the call is slow
    to complete, it introduces a delay during the Maintenance job.
    As the number of recovery points being deleted increases, the
    delays add up. This issue has been resolved by implementing code
    fixes to perform the clone check in a more optimized manner.
    Rather than performing it live during the Maintenance job, the
    check is performed in a background session on the vSnap as part
    of the space reclamation process. The end result is a reduction
    in runtime of the Maintenance job. Note that the reduction in
    job runtime is only noticeable if the original problem was
    present, that is, if the Spectrum Protect repository server or
    cloud endpoint was slow to respond to certain queries.
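
The effect of moving the clone check out of the Maintenance job can be illustrated with a small simulation. This is a conceptual sketch only and does not represent the vSnap implementation: `slow_clone_check` is a hypothetical stand-in for the slow LIST call, and a worker thread stands in for the background session.

```python
import queue
import threading
import time


def slow_clone_check(snapshot_id, delay_s):
    """Hypothetical stand-in for the slow LIST call; reports no clones."""
    time.sleep(delay_s)
    return False


def delete_live(snapshot_ids, delay_s):
    """Live check: the job waits for every clone check before moving on."""
    start = time.monotonic()
    for snapshot_id in snapshot_ids:
        slow_clone_check(snapshot_id, delay_s)
    return time.monotonic() - start


def delete_deferred(snapshot_ids, delay_s):
    """Deferred check: the job only queues work; a worker runs the checks."""
    work = queue.Queue()
    reclaimed = []

    def worker():
        while True:
            snapshot_id = work.get()
            if snapshot_id is None:  # sentinel: no more work
                break
            if not slow_clone_check(snapshot_id, delay_s):
                reclaimed.append(snapshot_id)

    background = threading.Thread(target=worker)
    background.start()

    start = time.monotonic()
    for snapshot_id in snapshot_ids:
        work.put(snapshot_id)
    job_runtime = time.monotonic() - start  # the job is done at this point

    work.put(None)
    background.join()  # wait only so the simulation can report its results
    return job_runtime, reclaimed
```

In this model the job runtime of the deferred variant no longer depends on the check delay or on the number of snapshots, while the same checks still run to completion in the background.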
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT38973

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A18

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-11-04

  • Closed date

    2021-11-09

  • Last modified date

    2021-11-09

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • vSnap
    

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels


Document Information

Modified date:
31 January 2024