IBM Support

IT38311: VSNAP SERVER MAKES LARGE NUMBER OF UNNECESSARY CALLS TO IBM COS VAULTS WITH RETENTION ENABLED

APAR status

  • Closed as program error.

Error description

  • When IBM Spectrum Protect Plus is configured to copy data to IBM
    Cloud Object Storage (ICOS) vaults that have retention enabled
    (also referred to as immutable object storage or WORM), vSnap
    servers can make a large number of batch DELETE API requests to
    the ICOS endpoint in an attempt to delete objects that are
    locked by retention.
    
    The following symptoms can be seen when this problem is present.
    
    
    The vSnap server can show heavy CPU usage even when no jobs are
    active in IBM Spectrum Protect Plus.
    Examining the process activity on the vSnap using "top" shows a
    large number of python3 processes.
    Running "ps -ef" shows that the python3 processes are associated
    with the vsnap-maint service, for example:
    root 8366 30201 14 Apr07 ? 02:30:19 /opt/vsnap/venv/bin/python3 /opt/vsnap/lib/vsnap/service/maintenance/maint
    root 8377 30201 13 Apr07 ? 02:24:40 /opt/vsnap/venv/bin/python3 /opt/vsnap/lib/vsnap/service/maintenance/maint
    root 8388 30201 14 Apr07 ? 02:26:12 /opt/vsnap/venv/bin/python3 /opt/vsnap/lib/vsnap/service/maintenance/maint
    
    On the ICOS side, examination of the access logs shows a large
    number of incoming DELETE requests, most of which fail with
    status code 451, which is reported when objects are locked by
    retention and cannot be deleted.
    
    A secondary symptom is that in some cases, the ICOS system can
    be overwhelmed by the large number of delete requests and this
    can result in other PUT or POST requests failing or timing out.
    If this occurs, copy jobs targeting that ICOS endpoint can fail
    with the following error seen in the job log:
    
    ERROR,CTGGA0309,Copy failed for snapshot <details> Error:
    TransferError: Transfer failed: Failed to upload object to
    <endpoint>. Reason: InternalError: We encountered an internal
    error. Please try again. status code: 500.
    
    The problem occurs because certain metadata objects maintained
    by IBM Spectrum Protect Plus are updated frequently.
    Since objects cannot be updated directly when they are locked, a
    new updated copy of the metadata is uploaded and the previous
    copy becomes a candidate for deletion.
    
    For non-metadata objects, IBM Spectrum Protect Plus keeps track
    of the retention settings on the vault and expires data only
    after the retention period has passed.
    But for metadata objects that are frequently updated as part of
    routine copy operations, the vSnap server attempts to delete the
    older metadata as soon as a newer copy is uploaded.
    If the older copy is still locked by retention, ICOS rejects the
    deletion request, but the vSnap server keeps retrying it on a
    frequent basis.
    
    As the number of pending objects grows, especially on vaults
    that have a long retention period such as several months, the
    vSnap server accumulates a large backlog of metadata objects
    that are pending deletion.
    Even though ICOS keeps rejecting the deletion requests, the
    large number of calls results in heavy resource usage on both
    the vSnap server and the ICOS system.
    
    IBM Spectrum Protect Plus Versions Affected:
    IBM Spectrum Protect Plus 10.1.3 and later.
    
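The process symptom described above can be checked with a short script.
The following is a minimal sketch, not an IBM-provided tool: it counts
the lines of "ps -ef" output that reference the maintenance script path
shown in the example. The function name, the constant name and the
sample text are illustrative assumptions; only the path is taken from
the output above. The APAR does not state how many processes indicate
the problem, so the count has to be judged against normal activity on
the vSnap server.

```python
# Path of the maintenance script, as it appears in the "ps -ef" example above.
MAINT_PATH = "/opt/vsnap/lib/vsnap/service/maintenance/maint"


def count_maint_processes(ps_output: str) -> int:
    """Count the lines of `ps -ef` output whose command references MAINT_PATH.

    Hypothetical helper for illustration; it only inspects text and does
    not query or change the vSnap server.
    """
    return sum(1 for line in ps_output.splitlines() if MAINT_PATH in line)


# Sample input assembled from the process listing in this APAR,
# plus one unrelated line to show that it is ignored.
SAMPLE_PS_OUTPUT = "\n".join([
    "root 8366 30201 14 Apr07 ? 02:30:19 /opt/vsnap/venv/bin/python3 " + MAINT_PATH,
    "root 8377 30201 13 Apr07 ? 02:24:40 /opt/vsnap/venv/bin/python3 " + MAINT_PATH,
    "root 8388 30201 14 Apr07 ? 02:26:12 /opt/vsnap/venv/bin/python3 " + MAINT_PATH,
    "root 1 0 0 Apr07 ? 00:00:05 /usr/lib/systemd/systemd",
])

if __name__ == "__main__":
    print(count_maint_processes(SAMPLE_PS_OUTPUT))
```

On a live system, the text passed to count_maint_processes could be
captured with subprocess.run(["ps", "-ef"], capture_output=True,
text=True).stdout.
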

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus level 10.1.3, 10.1.4, 10.1.5,      *
    * 10.1.6, 10.1.7 and 10.1.8                                    *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in IBM Spectrum Protect Plus level     *
    * 10.1.9. Note that this is subject to change at the           *
    * discretion of IBM.                                           *
    ****************************************************************
    

Problem conclusion

  • The vsnap-maint service has been enhanced to have better
    awareness of retention-enabled vaults in IBM Cloud Object
    Storage. When vsnap-maint detects that the vault has retention
    enabled, it will no longer make frequent attempts to delete
    metadata objects that are locked by retention. Instead, the
    service detects the retention value of the vault and then
    schedules the pending session to be retried only after the
    appropriate number of days has passed.
    
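The scheduling idea in the conclusion above can be illustrated as
follows. This is a sketch under stated assumptions and not the
vsnap-maint implementation: the function names are hypothetical, the
retention value is assumed to be expressed in whole days, and the lock
is assumed to start at the time the older metadata copy was uploaded.
It shows only the difference between retrying frequently and deferring
the delete attempt until the retention period has passed.

```python
from datetime import datetime, timedelta, timezone


def next_delete_attempt(uploaded_at: datetime, retention_days: int) -> datetime:
    """Return the earliest time at which a delete attempt is worth making.

    Hypothetical helper: assumes the object is locked for `retention_days`
    days starting from `uploaded_at`.
    """
    return uploaded_at + timedelta(days=retention_days)


def should_attempt_delete(now: datetime, uploaded_at: datetime,
                          retention_days: int) -> bool:
    """True only once the assumed retention period has elapsed."""
    return now >= next_delete_attempt(uploaded_at, retention_days)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the APAR.
    uploaded = datetime(2021, 9, 1, tzinfo=timezone.utc)
    retention = 90
    print(next_delete_attempt(uploaded, retention).isoformat())
    print(should_attempt_delete(uploaded + timedelta(days=1), uploaded, retention))
```

With this approach a locked object costs one deferred delete attempt
instead of a stream of requests that ICOS rejects with status code 451.
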

Temporary fix

Comments

APAR Information

  • APAR number

    IT38311

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A18

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-09-09

  • Closed date

    2021-09-29

  • Last modified date

    2021-09-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • vSnap    Offload
    

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Document Information

Modified date:
31 January 2024