IBM Support

IT31840: CTGGA2086 - vSnap service crashes or restarts when deleting a file share or deleting a volume in SPP 10.1.5

APAR status

  • Closed as program error.

Error description

  • After the Spectrum Protect Plus server is upgraded from 10.1.4 to
    10.1.5, the vSnap server appears to crash and the vSnap service
    is restarted. File shares cannot be deleted at the end of a
    backup job, and volumes cannot be deleted during a maintenance
    job. The same problem also occurs when you try to delete the
    share or the volume manually from the vSnap CLI. The vSnap server
    does not respond, and the message 'connection refused' is
    reported.
    
    The following messages appear in the Spectrum Protect Plus job log:
    
     INFO,CTGGA0634,Using storage volume spp_1006_2103_12a3456789b
       on controller vsnap.IBM.com for backup
     WARN,CTGGA1924,Resource access exception caught:
       org.springframework.web.client.ResourceAccessException:
       I/O error on DELETE request for "https://vsnap.IBM.com:8900/
       api/share/7771": vsnap.IBM.com:8900 failed to respond; nested
       exception is org.apache.http.NoHttpResponseException:
       vsnap.IBM.com:8900 failed to respond
     WARN,UNABLE_REMOVE_CLIENT_ACCESS_VOLUME
     WARN,CTGGA1924,Resource access exception caught:
       org.springframework.web.client.ResourceAccessException:
       I/O error on GET request for "https://vsnap.IBM.com:8900/
       api/share": Connect to vsnap.IBM.com:8900 [vsnap.IBM.com/
       ww.xxx.yy.zz] failed: Connection refused; nested exception
       is org.apache.http.conn.HttpHostConnectException: Connect
       to vsnap.IBM.com:8900 [vsnap.IBM.com/ww.xxx.yy.zz] failed:
       Connection refused (Connection refused)
     WARN,CTGGA1924,Resource access exception caught:
       org.springframework.web.client.ResourceAccessException:
       I/O error on POST request for "https://vsnap.IBM.com:8900/
       api/volume/45/share": Connect to vsnap.IBM.com:8900 [vsnap.
       IBM.com/ww.xxx.yy.zz] failed: Connection refused; nested
       exception is org.apache.http.conn.HttpHostConnectException:
       Connect to vsnap.IBM.com:8900 [vsnap.IBM.com/ww.xxx.yy.zz]
       failed: Connection refused (Connection refused)
     ERROR,CTGGA2086,Storage exception - Failed to create storage
       share on volume spp_1006_2103_12a3456789b null
     ERROR,CTGGA2403,Backup of vm IBM-VM failed target storage
       volume name spp_123456789a_V6000. Error: Failed to create
       storage share on volume spp_1006_2103_12a3456789b null
     ERROR,CTGGA0076,Unprotected vm: IBM-VM. Last error: [Volume
       already created however unable to access storage server] {1}
    
    In the vSnap server log, the delete NFS share command fails
    because of an invalid mount point, and the next message shows
    that the vSnap API server process restarted:
    
     [<timestamp>] INFO pid-xxxx vsnap.share Deleting share id 7717
     [<timestamp>] WARNING pid-xxxx vsnap.share Detected invalid
       mountpoint for volume id 63: /dev/mapper/centos-root
     [<timestamp>] INFO pid-yyyy vsnap.api
       ========== API server process started ==========
    
    If the underlying ZFS dataset has become unmounted for any
    reason, the delete share command first attempts to stop any
    process that might still be locking the share.
    Because no ZFS dataset is mounted at the mount point, the mount
    point is treated as a regular directory.
    As a result, the process kill command also unexpectedly kills
    processes that use the root partition.
    This kills the vSnap service, which is then restarted
    automatically, as seen in the vSnap log above.
    Because this is not a crash, the /var/crash directory is empty:
    no kernel crash dump is generated, since the OS itself continues
    to run.
    
    
    IBM Spectrum Protect Versions Affected:
    IBM Spectrum Protect Plus 10.1.5
    
    | MDVREGR 10.1.5.0-TIV_5737SPLUS |
    
    Initial Impact: Medium
    
    Additional Keywords: SPP, SPPLUS, TS003253326
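
The symptom described above can be spotted in the vSnap server log by looking for an invalid-mountpoint warning that is followed by the API server startup banner. The Python sketch below is a hypothetical helper, not an IBM tool: the regular expressions are modeled only on the sample log lines shown in this APAR, the real log format and the log file location may differ, and the function name is illustrative.

```python
import re
from typing import List, Tuple

# Hypothetical patterns modeled on the sample vSnap log lines in this APAR.
# \s+ also tolerates a message that has been wrapped across lines.
INVALID_MOUNTPOINT = re.compile(
    r"Detected invalid\s+mountpoint for volume id (\d+):\s+(\S+)"
)
API_SERVER_STARTED = re.compile(r"API server process started")


def find_restarts_after_invalid_mountpoint(log_text: str) -> List[Tuple[str, str]]:
    """Return (volume_id, reported_mountpoint) for each invalid-mountpoint
    warning that is followed, anywhere later in the text, by an API server
    startup banner."""
    hits: List[Tuple[str, str]] = []
    for match in INVALID_MOUNTPOINT.finditer(log_text):
        # Only search for a restart banner after the warning itself.
        if API_SERVER_STARTED.search(log_text, match.end()):
            hits.append((match.group(1), match.group(2)))
    return hits
```

The "anywhere later" rule is a deliberately simple heuristic; on a long log it may pair a warning with an unrelated, later restart, so matches should be checked against the timestamps.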
    

Local fix

  • NA
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus level 10.1.3, 10.1.4, and 10.1.5.  *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply the fixing level when it is available. This problem is *
    * currently projected to be fixed in IBM Spectrum Protect Plus *
    * level 10.1.5 patch1 and 10.1.6. Note that this is subject to *
    * change at the discretion of IBM.                             *
    ****************************************************************
    

Problem conclusion

  • When a volume or a file share is deleted, vSnap first tries to
    kill any processes that are using that volume. It does this by
    invoking the Linux 'fuser' command to kill any processes using
    the volume mount point. Under certain conditions, the volume
    might not be mounted at the time of deletion. In that case, the
    volume has no mount point of its own, and the 'fuser' command
    accidentally kills processes that use the parent mount point,
    which is the root (operating system) mount point. This can cause
    some services to crash.
    
    The problem has been resolved by adding a check for the volume
    mount point during volume or share deletion. The 'fuser' command
    is skipped if the volume is not mounted, thus avoiding the
    problem of accidentally killing unrelated processes.
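
The fix described above amounts to a guard in front of the 'fuser' call. The Python sketch below only illustrates that idea and is not vSnap's code: the function names are hypothetical, and the 'fuser -k -m' options are an assumption, because this APAR does not say which options vSnap passes. With '-m', fuser acts on the whole file system that contains the given path, so on a directory that is not a mount point it acts on the parent (here, root) file system, which matches the failure described above.

```python
import os
import subprocess
from typing import List, Optional


def build_fuser_kill_command(mountpoint: str) -> Optional[List[str]]:
    """Return the fuser command for a mounted volume, or None when nothing
    is mounted at `mountpoint` (hypothetical helper)."""
    # os.path.ismount() is False for a plain directory or a missing path.
    # Running `fuser -m` on such a directory would target every process on
    # the parent file system, for example the root file system.
    if not os.path.ismount(mountpoint):
        return None
    # Assumed options: -k kills the matching processes, -m treats the path
    # as a file on a mounted file system.
    return ["fuser", "-k", "-m", mountpoint]


def kill_volume_users(mountpoint: str) -> bool:
    """Kill processes using the volume before deletion; return False when
    the step was skipped because the volume is not mounted."""
    command = build_fuser_kill_command(mountpoint)
    if command is None:
        return False  # Volume not mounted: skip fuser entirely.
    # fuser exits non-zero when no process matched; that is not an error here.
    subprocess.run(command, check=False)
    return True
```

Keeping the command construction separate from its execution makes the guard easy to test without killing anything. The sketch does not close the small window in which a volume could be unmounted between the check and the 'fuser' call.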
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT31840

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2020-02-27

  • Closed date

    2020-03-25

  • Last modified date

    2020-03-25

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

  • Product: IBM Spectrum Protect Plus (SSNQFQ)
  • Version: A10
  • Platform: Platform Independent
  • Business unit: IBM Infrastructure w/TPS
  • Line of business: Storage

Document Information

Modified date:
30 January 2024