IBM Support

IT35884: COPY TO IBM SPECTRUM PROTECT SERVER STOPS WITH 'CTGGA0309 ... COULD NOT FIND DEVICE PATH FOR SERIAL <XXXX>'

APAR status

  • Closed as program error.

Error description

  • A copy operation from IBM Spectrum Protect Plus to the IBM
    Spectrum Protect repository server can stop with the following
    messages in the job log:
    
    SUMMARY,<timestamp>,CTGGA2398,Starting job for policy <SLAName>
                                  id -> <JobID>. IBM Spectrum
                                  Protect Plus version 10.1.7-3043.
    ...
      ERROR,<timestamp>,CTGGA0309,Copy failed for snapshot (ID:
                                  <SnapshotID>) from source
                                  [server: <vSnapAddress> volume:
                                  <SourcevSnapVolume> snapshot:
                                  <SnapshotName>] to target
                                  [server: <vSnapAddress> volume:
                                  <TargetVolumeName>]. Error:
                                  Exception: Failed to create
                                  gateway device: Could not find
                                  device path for serial
                                  <CloudDeviceSerial>
      ERROR,<timestamp>,CTGGA0310,Skipping remaining snapshots for
                                  volume <vSnapAddress>:
                                  <SourcevSnapVolume> due to
                                  unrecoverable error for vSnap
                                  session <OffloadSessionID>
    
    and with the following messages in the vSnap replication log:
    
    [<timestamp>] INFO    pid-<xxxx> vsnap.common.model Session
                                     <OffloadSessionID>: message =
                                     Preparing cloud gateway device
    ...
    [<timestamp>] INFO    pid-<xxxx> vsnap.target Creating bdg
                                     backing store named offload_
                                     <OffloadPoolName> with
                                     cfgstring poc@<xxx>@<yyy>@16,
                                     max_data_area_mb=128,hw_block_
                                     size=4096,hw_max_sectors=2048
    ...
    [<timestamp>] INFO    pid-<xxxx> vsnap.linux.system Executing
                                     command: vsnap_targetcli
                                     /loopback/naa.<zzzz>/luns
                                     create /backstores/user:bdg/
                                     offload_<OffloadPoolName>
    ...
    [<timestamp>] INFO    pid-<xxxx> vsnap.cloud.driver Getting
                                     device path by serial,
                                     attempt 5
    [<timestamp>] ERROR   pid-<xxxx> vsnap.linux.system Timed out
                                     (10 seconds) waiting for
                                     command to complete:
                                     /lib/udev/scsi_id --page 0x80
                                     --whitelisted --device
                                     /dev/sd<x>
    ...
    [<timestamp>] WARNING pid-<xxxx> vsnap.cloud.driver Could not
                                     determine serial for sd<x>,
                                     skipping it
    [<timestamp>] WARNING pid-<xxxx> vsnap.cloud.driver Could not
                                     get device path by serial:
                                     Failed to find device with
                                     serial <CloudDeviceSerial>
    
    This occurs during an incremental copy operation, when the vSnap
    server tries to mount the virtual cloud device and then import
    the vSnap cloud pool from it.
    As soon as the vSnap server attaches the cloud device at the start
    of the copy operation, the Linux operating system detects that a
    new disk has been attached and tries to read its partition table.
    At the same time, the vSnap offload process sends SCSI inquiries
    to the device to determine its serial number.
    
    If the IBM Spectrum Protect object agent is slow to respond to
    read requests during this time, these SCSI inquiries can time out
    (in the log above, /lib/udev/scsi_id times out after 10 seconds)
    and cause the copy to fail.
    In most cases, though not necessarily all, the slow read
    responses from the IBM Spectrum Protect object agent can be
    confirmed in the gwdriver<ID>.log file for that offload
    operation, which is located in the vSnap log directory
    /opt/vsnap/log.
    
    Messages of the following type indicate that read responses are
    timing out:
    WARN: ReadPart(<xxxxxx>/<yyyyyyy>/<zzzzzz>:<aaa>:<bbb>) failed,
          reason (RequestCanceled: request context canceled)
    
    IBM Spectrum Protect server APAR IT35592 addresses the slow read
    responses from the object agent. This APAR improves the behaviour
    of the vSnap server when the IBM Spectrum Protect object agent is
    slow to respond to requests.
    
    IBM Spectrum Protect Plus Versions Affected:
    IBM Spectrum Protect Plus 10.1.3 and higher
    
    Initial Impact: High
    
    Additional Keywords: SPP, SPPLUS, TS003888479, SP, offload
    
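    Both symptoms described above can be checked with a small script.
    The sketch below is a minimal, hypothetical helper (not part of
    IBM Spectrum Protect Plus or vSnap). It assumes that each job log
    and gwdriver log message is written on a single line, with the
    wrapping in the excerpts above being display formatting only, and
    that the message texts match the ones quoted in this APAR.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical patterns based on the messages quoted in this APAR.
# Job log: CTGGA0309 copy failure caused by the missing device path.
COPY_FAILURE = re.compile(
    r"CTGGA0309,.*could not find device path for serial\s+(\S+)", re.IGNORECASE
)
# gwdriver<ID>.log: a read from the object agent that was canceled.
CANCELED_READ = re.compile(r"ReadPart\(.*\) failed.*RequestCanceled")


def find_copy_failures(lines: Iterable[str]) -> List[str]:
    """Return the cloud device serials named in matching CTGGA0309 errors."""
    serials = []
    for line in lines:
        match = COPY_FAILURE.search(line)
        if match:
            serials.append(match.group(1))
    return serials


def count_canceled_reads(lines: Iterable[str]) -> int:
    """Count ReadPart failures whose reason is RequestCanceled."""
    return sum(1 for line in lines if CANCELED_READ.search(line))


def scan_gwdriver_logs(log_dir: str = "/opt/vsnap/log") -> Dict[str, int]:
    """Map each gwdriver*.log file in log_dir to its canceled-read count."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return {}
    counts = {}
    for path in sorted(directory.glob("gwdriver*.log")):
        with path.open(errors="replace") as handle:
            counts[path.name] = count_canceled_reads(handle)
    return counts


if __name__ == "__main__":
    for name, count in scan_gwdriver_logs().items():
        print(f"{name}: {count} canceled ReadPart request(s)")
```

    A non-zero count in the gwdriver log for the failing offload
    session, together with a CTGGA0309 "could not find device path"
    error in the job log, points to the slow object agent reads that
    this APAR describes.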

Local fix

  • The problem can be mitigated by increasing the vSnap server read
    timeout for cloud objects from 1 minute to 4 minutes (240 seconds):
    
    vsnap system pref set --name cloudIOReadTimeout --value 240
    
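    The --value argument in the local fix is given in seconds (240
    seconds is 4 minutes). The sketch below shows that conversion in a
    hypothetical Python wrapper that builds the same vsnap command
    line. It only prints the command by default and runs it when
    dry_run is False, assuming the vsnap CLI is on the PATH of the
    vSnap server. The wrapper and its function names are illustrative
    only.

```python
import shlex
import subprocess
from typing import List

# Preference name and command taken from the local fix above; the value is in
# seconds (the APAR raises the timeout from 1 minute to 240 seconds = 4 minutes).
PREF_NAME = "cloudIOReadTimeout"


def read_timeout_command(minutes: int) -> List[str]:
    """Build the vsnap CLI arguments that set the cloud read timeout."""
    if minutes <= 0:
        raise ValueError("timeout must be a positive number of minutes")
    seconds = minutes * 60
    return ["vsnap", "system", "pref", "set",
            "--name", PREF_NAME, "--value", str(seconds)]


def apply_read_timeout(minutes: int = 4, dry_run: bool = True) -> str:
    """Return the command as a shell string; run it only when dry_run is False."""
    cmd = read_timeout_command(minutes)
    if not dry_run:
        # Assumes this runs on the vSnap server with the vsnap CLI on PATH.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    print(apply_read_timeout())
```

    With the default of 4 minutes, the wrapper produces exactly the
    command shown in the local fix.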

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus levels 10.1.3, 10.1.4, 10.1.5,     *
    * 10.1.6 and 10.1.7.                                           *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply the fixing level when available. This problem is       *
    * projected to be fixed in IBM Spectrum Protect Plus level     *
    * 10.1.8. Note that this is subject to change at the           *
    * discretion of IBM.                                           *
    ****************************************************************
    

Problem conclusion

  • A code fix was implemented in vSnap to improve the handling of
    timeouts when the cloud endpoint or repository server is slow to
    respond to read requests during the initial stage of the copy
    job. In most cases, copy jobs now succeed instead of failing.
    In extreme cases, copy jobs can still fail if the cloud endpoint
    or repository server continues to be very slow to respond.
    
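    The vSnap replication log in the error description shows the step
    that fails: the cloud driver looks up the gateway device by its
    serial, runs /lib/udev/scsi_id --page 0x80 --whitelisted --device
    /dev/sd<x> with a 10-second timeout, skips any device whose
    inquiry times out, and retries over several attempts. The sketch
    below models that lookup in Python. It is not vSnap's code, and
    this APAR does not describe how the fix changes it; the per-attempt
    timeout that grows on each retry is a hypothetical illustration of
    tolerating a slow backend. It also assumes that the serial being
    searched for appears as a substring of the scsi_id output.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

# Command line taken from the vSnap replication log in this APAR.
SCSI_ID = ["/lib/udev/scsi_id", "--page", "0x80", "--whitelisted", "--device"]

# A query takes (device, timeout in seconds) and returns the serial text or None.
Query = Callable[[str, float], Optional[str]]


def query_serial(device: str, timeout: float) -> Optional[str]:
    """Read the unit serial number (VPD page 0x80); None on timeout or error."""
    try:
        result = subprocess.run(SCSI_ID + [device], capture_output=True,
                                text=True, timeout=timeout, check=True)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        return None
    return result.stdout.strip() or None


def find_device_by_serial(serial: str, devices: Sequence[str],
                          query: Query = query_serial, attempts: int = 5,
                          base_timeout: float = 10.0, delay: float = 2.0,
                          sleep: Callable[[float], None] = time.sleep,
                          ) -> Optional[str]:
    """Return the first device whose reported serial contains `serial`.

    A device that times out is skipped for the current pass only. Each new
    attempt (hypothetically) allows a longer timeout, so a backend that is
    slow at first can still answer before all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        timeout = base_timeout * attempt
        for device in devices:
            found = query(device, timeout)
            if found is not None and serial in found:
                return device
        if attempt < attempts:
            sleep(delay)
    return None
```

    The query function and sleep are parameters so that a test can
    replace scsi_id and time.sleep with fakes, for example a query that
    only answers for the cloud device once the timeout reaches 20
    seconds.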

Temporary fix

Comments

APAR Information

  • APAR number

    IT35884

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A16

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-02-12

  • Closed date

    2021-03-25

  • Last modified date

    2021-03-25

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

  • Business unit

    IBM Infrastructure w/TPS (BU058)

  • Product

    IBM Spectrum Protect Plus (SSNQFQ)

  • Platform

    Platform Independent (PF025)

  • Version

    A16

  • Line of business

    Storage (LOB26)

Document Information

Modified date:
31 January 2024