APAR status
Closed as program error.
Error description
After upgrading the Spectrum Protect Plus server from 10.1.4 to 10.1.5, the vSnap server appears to crash and the vSnap service is restarted. File shares cannot be deleted at the end of a backup job, and volumes cannot be deleted during a maintenance job. The same problem also occurs when attempting to delete the share or the volume manually from the vSnap CLI. The vSnap server does not respond, and the message 'connection refused' is posted.

The following can be seen in the Spectrum Protect Plus job log:

INFO,CTGGA0634,Using storage volume spp_1006_2103_12a3456789b on controller vsnap.IBM.com for backup
WARN,CTGGA1924,Resource access exception caught: org.springframework.web.client.ResourceAccessException: I/O error on DELETE request for "https://vsnap.IBM.com:8900/api/share/7771": vsnap.IBM.com:8900 failed to respond; nested exception is org.apache.http.NoHttpResponseException: vsnap.IBM.com:8900 failed to respond
WARN,UNABLE_REMOVE_CLIENT_ACCESS_VOLUME
WARN,CTGGA1924,Resource access exception caught: org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://vsnap.IBM.com:8900/api/share": Connect to vsnap.IBM.com:8900 [vsnap.IBM.com/ww.xxx.yy.zz] failed: Connection refused; nested exception is org.apache.http.conn.HttpHostConnectException: Connect to vsnap.IBM.com:8900 [vsnap.IBM.com/ww.xxx.yy.zz] failed: Connection refused (Connection refused)
WARN,CTGGA1924,Resource access exception caught: org.springframework.web.client.ResourceAccessException: I/O error on POST request for "https://vsnap.IBM.com:8900/api/volume/45/share": Connect to vsnap.IBM.com:8900 [vsnap.IBM.com/ww.xxx.yy.zz] failed: Connection refused; nested exception is org.apache.http.conn.HttpHostConnectException: Connect to vsnap.IBM.com:8900 [vsnap.IBM.com/ww.xxx.yy.zz] failed: Connection refused (Connection refused)
ERROR,CTGGA2086,Storage exception - Failed to create storage share on volume spp_1006_2103_12a3456789b null
ERROR,CTGGA2403,Backup of vm IBM-VM failed target storage volume name spp_123456789a_V6000. Error: Failed to create storage share on volume spp_1006_2103_12a3456789b null
ERROR,CTGGA0076,Unprotected vm: IBM-VM. Last error: [Volume already created however unable to access storage server] {1}

In the vSnap server log, the delete NFS share command fails because of an invalid mount point, and the next message shows that the vSnap server restarted:

[<timestamp>] INFO pid-xxxx vsnap.share Deleting share id 7717
[<timestamp>] WARNING pid-xxxx vsnap.share Detected invalid mountpoint for volume id 63: /dev/mapper/centos-root
[<timestamp>] INFO pid-yyyy vsnap.api ========== API server process started ==========

When the delete share command is started, the code first attempts to stop any process that might still be locking the share. If the underlying zfs dataset has become unmounted for any reason, the mount point has no underlying zfs dataset and is detected as a regular directory. The process kill command therefore also unexpectedly kills some processes that are using the root partition. This kills the vSnap service, which is then restarted automatically, as seen in the vSnap log above. Because this is not a crash, the /var/crash directory is empty; no kernel crash dump is generated because the OS itself continues to run.

IBM Spectrum Protect Versions Affected: IBM Spectrum Protect Plus 10.1.5
| MDVREGR 10.1.5.0-TIV_5737SPLUS |
Initial Impact: Medium
Additional Keywords: SPP, SPPLUS, TS003253326
Local fix
NA
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.3, 10.1.4, and 10.1.5.  *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in IBM Spectrum Protect Plus level     *
* 10.1.5 patch1 and 10.1.6. Note that this is subject to       *
* change at the discretion of IBM.                             *
****************************************************************
Problem conclusion
When a volume or a file share is deleted, vSnap first tries to kill any processes that are using that volume by invoking the Linux 'fuser' command against the volume mount point. Under certain conditions, the volume may not be mounted at the time of deletion, in which case the volume has no distinct mount point. The 'fuser' command then accidentally kills processes that are using the parent mount point, which is the root (operating system) mount point. This can cause some services to crash. The problem has been resolved by adding a check for the volume mount point during volume/share deletion. The 'fuser' command is skipped if the volume is not mounted, thus avoiding the problem of accidentally killing unrelated processes.
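
The mount point check described above can be illustrated with a minimal Python sketch. This is not the vSnap source code: the function names, the example path, and the exact 'fuser' options (-k to kill, -m to match processes using the mounted file system) are assumptions chosen for illustration only. The point it shows is that the kill step runs only when the path is a real mount point, so a plain directory on the root file system never reaches 'fuser'.

```python
import os
import subprocess
from typing import Callable, List


def build_fuser_command(mount_point: str) -> List[str]:
    # Assumed invocation: -k kills matching processes, -m selects
    # processes that use the file system mounted at the given path.
    return ["fuser", "-k", "-m", mount_point]


def kill_volume_users(
    mount_point: str,
    is_mounted: Callable[[str], bool] = os.path.ismount,
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Kill processes using a volume, but only if it is really mounted.

    Returns True if the kill command was issued, False if it was skipped.
    """
    if not is_mounted(mount_point):
        # The path is only a directory on the parent (root) file system.
        # Running fuser -m here would match processes on that parent
        # file system, so the kill step is skipped entirely.
        return False
    run(build_fuser_command(mount_point), check=False)
    return True


if __name__ == "__main__":
    # Hypothetical path for illustration; not a real vSnap volume path.
    issued = kill_volume_users("/mnt/example-volume")
    print("fuser issued" if issued else "volume not mounted, fuser skipped")
```

The is_mounted and run parameters exist only so that the behavior can be exercised without killing real processes; a caller would normally rely on the defaults.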
Temporary fix
Comments
APAR Information
APAR number
IT31840
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A10
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2020-02-27
Closed date
2020-03-25
Last modified date
2020-03-25
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
30 January 2024