IBM Support

IT33586: COPY TO CLOUD OR REPOSITORY FAILS WITH IMPORT TIMEOUT OR OVERLAPPING BACKUP JOB FAILS WITH "UNABLE TO GET STORAGE VOLUME"

APAR status

  • Closed as program error.

Error description

  • During incremental copy to cloud or repository server, the
    existing cloud pool is imported so that incremental changes can
    be written to it. Due to slow reads from the cloud or repository
    server, the import process can time out or fail.
    
    The job log shows any one of the following error messages:
    
    CTGGA0309,Copy failed for snapshot <snapshot details>. Error:
    Exception: Failed to create gateway device: Could not find
    device path for serial <serial>
    
    CTGGA0309,Copy failed for snapshot <snapshot details>. Error:
    Exception: Failed to create/import offload pool: Command timed
    out: <zpool or zfs command>
    
    Instead of the errors above, the following alternative symptom
    may be seen.
    
    Due to slow reads from the cloud or repository server, the
    import process can take a long time. During this step, the
    import process holds a global filesystem lock. Backup or copy
    operations running concurrently may issue a "zpool list"
    command to retrieve the list of pools. That command has to wait
    for the import process to release the lock, so if the import is
    slow, the "zpool list" command can time out and cause the backup
    or copy job to fail.
    
    The job log shows any one of the following error messages:
    
    PoolInfoError: Failed to collect pool details for <pool ID>
    
    Fail to get the volume with volume id <ID>, Unable to get
    storage volume
    
    Further investigation of the vSnap logs shows that the "zpool
    list" command timed out, and the kernel stack of the timed-out
    process shows:
    
    [<ffffffffc09cbe21>] spa_open_common+0x61/0x5d0 [zfs]
    [<ffffffffc09cc40d>] spa_get_stats+0x4d/0x330 [zfs]
    [<ffffffffc0a254a9>] zfs_ioc_pool_stats+0x39/0x90 [zfs]
    [<ffffffffc0a2ee9d>] zfsdev_ioctl+0x65d/0x6c0 [zfs]
    [<ffffffff8ce5fbc0>] do_vfs_ioctl+0x3a0/0x5a0
    [<ffffffff8ce5fe61>] SyS_ioctl+0xa1/0xc0
    [<ffffffff8d38dede>] system_call_fastpath+0x25/0x2a
    [<ffffffffffffffff>] 0xffffffffffffffff
    
    Versions Affected: 10.1.*
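
The two groups of job-log messages above can be told apart with a simple text scan. The following Python sketch is illustrative only and is not an IBM tool: the message fragments are copied from this APAR, while the symptom labels and the classify_job_log function are hypothetical. It assumes that each message occupies a single line of the actual job log.

```python
from typing import Iterable

# Message fragments copied from the job-log examples in this APAR.
# The symptom labels are hypothetical names used only in this sketch.
SIGNATURES = {
    "import-timeout": (
        "Failed to create gateway device: Could not find device path for serial",
        "Failed to create/import offload pool: Command timed out",
    ),
    "concurrent-job-failure": (
        "PoolInfoError: Failed to collect pool details for",
        "Unable to get storage volume",
    ),
}


def classify_job_log(lines: Iterable[str]) -> dict[str, list[int]]:
    """Return, for each symptom, the 1-based numbers of the lines that match it."""
    hits: dict[str, list[int]] = {symptom: [] for symptom in SIGNATURES}
    for number, line in enumerate(lines, start=1):
        text = " ".join(line.split())  # collapse runs of whitespace
        for symptom, fragments in SIGNATURES.items():
            if any(fragment in text for fragment in fragments):
                hits[symptom].append(number)
    return hits


if __name__ == "__main__":
    # Sample lines; the snapshot name, command, and volume ID are made up.
    sample = [
        "CTGGA0309,Copy failed for snapshot snap-1. Error: Exception: "
        "Failed to create/import offload pool: Command timed out: zpool import",
        "Backup job started",
        "Fail to get the volume with volume id 42, Unable to get storage volume",
    ]
    print(classify_job_log(sample))
    # {'import-timeout': [1], 'concurrent-job-failure': [3]}
```

A hit in the first group points at the import itself timing out; a hit in the second group points at a job that failed because it ran alongside a slow import.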
    
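The lock-contention symptom described above can be reproduced in miniature with Python threads. This is a toy model, not vSnap or ZFS code: a threading.Lock stands in for the global filesystem lock, a sleep stands in for the slow cloud reads during the import, and a lock acquisition with a timeout stands in for the "zpool list" command. All names and durations are hypothetical.

```python
import threading
import time

# Hypothetical stand-in for the global filesystem lock held during pool import.
global_fs_lock = threading.Lock()


def slow_import(import_seconds: float, lock_held: threading.Event) -> None:
    """Model a pool import that holds the lock while reading slowly from the cloud."""
    with global_fs_lock:
        lock_held.set()             # tell the caller the lock is now held
        time.sleep(import_seconds)  # stands in for slow cloud or repository reads


def list_pools(timeout_seconds: float) -> str:
    """Model "zpool list": it needs the same lock and gives up after a timeout."""
    if not global_fs_lock.acquire(timeout=timeout_seconds):
        return "timed out"
    try:
        return "ok"                 # a real listing would read pool state here
    finally:
        global_fs_lock.release()


def run_scenario(import_seconds: float, list_timeout: float) -> str:
    """Start an import, then run a concurrent listing while the import holds the lock."""
    lock_held = threading.Event()
    importer = threading.Thread(target=slow_import, args=(import_seconds, lock_held))
    importer.start()
    lock_held.wait()                # ensure the import owns the lock first
    result = list_pools(list_timeout)
    importer.join()
    return result


if __name__ == "__main__":
    print(run_scenario(import_seconds=0.05, list_timeout=1.0))  # ok
    print(run_scenario(import_seconds=1.0, list_timeout=0.05))  # timed out
```

With a short import, the listing simply waits and then succeeds. Once the import holds the lock longer than the caller is willing to wait, the listing times out, which corresponds to the backup or copy job failure described above.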

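The kernel stack quoted above can also be checked programmatically. The sketch below parses text in the format of /proc/<pid>/stack on Linux (reading that file usually requires root) and reports whether the three ZFS frames from this APAR's trace are all present. Treating those frames as the signature of a blocked "zpool list" is an assumption drawn from the description above, and the function names are hypothetical.

```python
# Frames taken from the stack trace quoted in this APAR.
BLOCKED_LIST_FRAMES = ("spa_open_common", "spa_get_stats", "zfs_ioc_pool_stats")


def frame_names(stack_text: str) -> set[str]:
    """Extract function names from lines such as
    '[<ffffffffc09cbe21>] spa_open_common+0x61/0x5d0 [zfs]'."""
    names = set()
    for line in stack_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1].split("+", 1)[0])  # drop the +offset/size suffix
    return names


def looks_like_blocked_zpool_list(stack_text: str) -> bool:
    """True if every frame in BLOCKED_LIST_FRAMES appears in the stack."""
    return set(BLOCKED_LIST_FRAMES) <= frame_names(stack_text)
```

For example, the contents of /proc/<pid>/stack for a hung zpool process can be passed to looks_like_blocked_zpool_list; a True result only suggests this APAR's pattern and does not confirm it.
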
Local fix

  • N/A
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus level 10.1.5 and 10.1.6.           *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See ERROR DESCRIPTION                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply the fixing level when available. This problem is       *
    * currently projected to be fixed in IBM Spectrum Protect Plus *
    * level 10.1.7. Note that this is subject to change at the     *
    * discretion of IBM.                                           *
    ****************************************************************
    

Problem conclusion

  • IBM Spectrum Protect Plus uses an incremental-forever approach
    to store cloud copies. During an incremental copy operation, the
    previous copy of the cloud pool is mounted and the changed data
    is written to it. Mounting the cloud pool requires reading back
    some amount of data and metadata written to the cloud.
    
    The root cause of the problem described in this APAR is that
    reads from the cloud disk were slow. When a cloud pool was
    mounted for an incremental copy, the slow reads made the pool
    take much longer to import. In some cases, an internal
    filesystem lock held during the import prevented other
    operations from running, which caused concurrent backup or copy
    jobs to fail.
    
    Read performance from the cloud has been improved by reading
    data in larger chunks and by caching it more effectively.
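
The incremental-forever approach described above can be illustrated with an in-memory model in which a pool is a mapping from block index to block contents. This is a simplified sketch, not how IBM Spectrum Protect Plus tracks changes: it finds changed blocks by comparing the previous and current versions directly, and it leaves out the mount/import step that this APAR is about. The function and type names are hypothetical.

```python
BlockMap = dict[int, bytes]  # block index -> block contents (hypothetical model)


def incremental_copy(cloud_pool: BlockMap, previous: BlockMap, current: BlockMap) -> int:
    """Update an existing copy in place with only the blocks that changed
    between two source versions; return the number of blocks written."""
    written = 0
    for index, block in current.items():
        if previous.get(index) != block:   # new or modified block
            cloud_pool[index] = block
            written += 1
    for index in previous.keys() - current.keys():
        cloud_pool.pop(index, None)        # block removed from the source
    return written


if __name__ == "__main__":
    first = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
    cloud_pool = dict(first)               # the initial full copy
    second = {0: b"aaaa", 1: b"BBBB", 2: b"cccc", 3: b"dddd"}
    print(incremental_copy(cloud_pool, first, second))  # 2
    print(cloud_pool == second)                          # True
```

Only the changed and new blocks are written, which is why each incremental copy needs the existing cloud pool to be available, and therefore imported, first.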
    
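The fix, reading in larger chunks with better caching, can be sketched as a read-through cache that turns many small reads into a few large, aligned fetches. The APAR gives no details of the actual change, so the chunk size, the LRU eviction policy, and the ChunkedReadCache class below are all assumptions made for illustration. The example counts calls to the slow backend rather than measuring time.

```python
from collections import OrderedDict
from typing import Callable


class ChunkedReadCache:
    """Serve small reads from large, aligned chunks fetched from a slow
    backend, keeping the most recently used chunks in an LRU cache."""

    def __init__(self, fetch: Callable[[int, int], bytes], chunk_size: int, max_chunks: int) -> None:
        self.fetch = fetch            # fetch(offset, length) -> bytes from the slow store
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks
        self.chunks: "OrderedDict[int, bytes]" = OrderedDict()
        self.backend_calls = 0

    def _get_chunk(self, index: int) -> bytes:
        if index in self.chunks:
            self.chunks.move_to_end(index)     # mark as most recently used
            return self.chunks[index]
        data = self.fetch(index * self.chunk_size, self.chunk_size)
        self.backend_calls += 1
        self.chunks[index] = data
        if len(self.chunks) > self.max_chunks:
            self.chunks.popitem(last=False)    # evict the least recently used chunk
        return data

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self.chunk_size)
            piece = self._get_chunk(index)[start:start + (end - offset)]
            if not piece:
                break                          # reached the end of the backing data
            out += piece
            offset += len(piece)
        return bytes(out)


if __name__ == "__main__":
    backing = bytes(range(256)) * 64           # 16 KiB of sample data

    def fetch_from_backing(offset: int, length: int) -> bytes:
        return backing[offset:offset + length]  # stands in for a slow cloud read

    for chunk_size in (512, 4096):
        cache = ChunkedReadCache(fetch_from_backing, chunk_size, max_chunks=8)
        for offset in range(0, 8192, 256):     # 32 sequential 256-byte reads
            assert cache.read(offset, 256) == backing[offset:offset + 256]
        print(chunk_size, cache.backend_calls)
    # 512 16
    # 4096 2
```

In this example, 32 small sequential reads cost 16 backend round trips with 512-byte chunks but only 2 with 4096-byte chunks. When each round trip to a cloud or repository server is slow, fewer and larger reads would shorten the import, and with it the time the lock is held.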

Temporary fix

Comments

APAR Information

  • APAR number

    IT33586

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A15

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-07-17

  • Closed date

    2020-11-18

  • Last modified date

    2020-11-18

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

  • Business unit

    IBM Infrastructure w/TPS (BU058)

  • Product

    IBM Spectrum Protect Plus (SSNQFQ)

  • Platform

    Platform Independent (PF025)

  • Version

    A15

  • Line of business

    Storage (LOB26)

Document Information

Modified date:
31 January 2024