APAR status
Closed as program error.
Error description
Intermittently, jobs can stop because of insufficient available memory on the IBM Spectrum Protect Plus vSnap server version 10.1.7. When the workload is too high, or the server is sized too small to complete the applied jobs, some jobs might stop with a message indicating that the vSnap server is unreachable. The problem can also surface, though less frequently, when the host has sufficient resources.

The following example messages might be seen (in this case, for a replication job for a database application):

SUMMARY,<timestamp>,CTGGA2398,Starting job for policy <PolicyName> (ID:<PolicyID>). id -> <JobID>. IBM Spectrum Protect Plus version 10.1.7-3043.
...
ERROR,<timestamp>,CTGGA1986,Unable to resolve database storage. Reason Could not connect to storage <vSnapHost>. Be sure storage is reachable.
ERROR,<timestamp>,CTGGA1847,Unable to determine backup policy name from recovery points. Cannot proceed with job
ERROR,<timestamp>,CTGGA1953,Error during copy. Reason: DB_REPLICATION_EXCEPTION_OCCURRED

In the virgo log '/opt/virgo/serviceability/logs/log.log', corresponding messages such as the following are seen for some HTTPS requests to the vSnap host:

[<timestamp>] INFO .. Vsnap Call https://<vSnapHost>:8900/api/system method GET
[<timestamp>] INFO .. VSnap Call GET https://<vSnapHost>:8900/api/system time Taken 1338 ms
[<timestamp>] INFO .. reason : org.springframework.web.client.HttpClientErrorException: 401 UNAUTHORIZED
[<timestamp>] INFO .. Status: :: 401
[<timestamp>] ERROR .. Unable to resolve database storage. Reason Could not connect to storage <vSnapHost>. Be sure storage is reachable.
In the vSnap log, the root cause (insufficient memory) is shown:

[<timestamp>] ERROR pid-xxxx vsnap.api Traceback (most recent call last):
  File "/src/workspace/vsnap/api/core/common.py", line 51, in decorated
  File "/src/workspace/vsnap/common/util.py", line 228, in check_api_priv
  File "/src/workspace/vsnap/linux/system.py", line 406, in run_shell_command
  File "/usr/lib64/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

Depending on the fluctuating workload, when sufficient memory is available again, the API on the vSnap host can again fulfill HTTPS requests and jobs can complete.

IBM Spectrum Protect Plus messaging should report the actual root cause of the failure.

IBM Spectrum Protect Plus Versions Affected: IBM Spectrum Protect Plus 10.1.7

| MDVREGR 10.1.6 5737SPLUS |

Initial Impact: Medium

Additional Keywords: SPP, SPPLUS, TS004708004, memory, sizing
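The symptoms described above (CTGGA1986 in the job log, a 401 response in the virgo log, and Errno 12 in the vSnap log) can be checked together when triaging a failed job. The following is a minimal Python sketch, not an IBM-provided tool. The function names, the marker table, and the way log lines are supplied are assumptions made for illustration; only the matched strings are taken from the example messages in this APAR.

```python
# Hypothetical triage helper; not part of IBM Spectrum Protect Plus.
# The marker strings are copied from the example messages in this APAR.
MARKERS = {
    "job_log_storage_unreachable": "CTGGA1986",
    "virgo_log_unauthorized": "401 UNAUTHORIZED",
    "vsnap_log_out_of_memory": "[Errno 12] Cannot allocate memory",
}


def find_symptoms(lines):
    """Return the sorted names of all APAR symptoms found in the log lines."""
    found = set()
    for line in lines:
        for name, marker in MARKERS.items():
            if marker in line:
                found.add(name)
    return sorted(found)


def matches_apar(lines):
    """True only when the root-cause marker from the vSnap log is present."""
    return "vsnap_log_out_of_memory" in find_symptoms(lines)


if __name__ == "__main__":
    sample = [
        "ERROR,<timestamp>,CTGGA1986,Unable to resolve database storage.",
        "[<timestamp>] INFO .. reason : HttpClientErrorException: 401 UNAUTHORIZED",
        "OSError: [Errno 12] Cannot allocate memory",
    ]
    print(find_symptoms(sample))
    print(matches_apar(sample))
```

The job log and virgo log markers alone are not treated as a match, because a storage connection error or a 401 response can have other causes; the sketch relies on the Errno 12 entry in the vSnap log as the distinguishing sign.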
Local fix
1. Ensure the vSnap server is sized following the best practices listed in the Blueprints:
https://www.ibm.com/support/pages/ibm-spectrum-protect-plus-blueprints

2. To recover, run the following command on the vSnap host as the 'serveradmin' user:

sudo systemctl restart vsnap-api

This command might take a long time to complete.

OR

Reboot the vSnap host.
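Before restarting the service, it can help to confirm that the vSnap host is actually short on memory. The following is a minimal Python sketch, assuming a Linux host whose /proc/meminfo exposes the MemTotal and MemAvailable fields. The function names and the 5 percent threshold are illustrative assumptions, not IBM sizing guidance.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(text):
    """Return MemAvailable / MemTotal, or None if either field is missing."""
    info = parse_meminfo(text)
    total = info.get("MemTotal")
    available = info.get("MemAvailable")
    if not total or available is None:
        return None
    return available / total


def is_memory_low(text, threshold=0.05):
    """True when less than `threshold` of total memory is available.

    The default threshold is an arbitrary example value.
    """
    fraction = available_fraction(text)
    return fraction is not None and fraction < threshold
```

On the vSnap host, the input text could be obtained by reading the file /proc/meminfo and passing its contents to is_memory_low.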
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.7.                      *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See Error Description.                                       *
****************************************************************
* RECOMMENDATION:                                              *
* Apply the fixing level when available. This problem was      *
* fixed in IBM Spectrum Protect Plus levels 10.1.7 ifix2 and   *
* 10.1.8. Note that this is subject to change at the           *
* discretion of IBM.                                           *
****************************************************************
Problem conclusion
A memory leak in the vSnap API process used for handling PAM authentication caused RAM usage to increase over time. Eventually, new memory allocations failed, which resulted in authentication failures when the IBM Spectrum Protect Plus server made API requests to the vSnap server. The problem has been resolved by correcting the memory leak.
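The conclusion above describes a memory allocation failure that surfaced to the caller as an authentication failure. The following Python sketch illustrates, under stated assumptions, how a service could separate the two cases so that the real cause is reported. It is not the vSnap implementation or the actual fix: check_credentials, AuthResult, run_check, and the status codes chosen are hypothetical. Only the errno value 12 (ENOMEM) corresponds to the traceback in the error description.

```python
import errno
from collections import namedtuple

# Hypothetical result type: an HTTP-like status code plus a readable reason.
AuthResult = namedtuple("AuthResult", ["status", "reason"])


def check_credentials(run_check):
    """Call run_check() and map its outcome to a status and reason.

    run_check is a hypothetical callable that returns True or False and may
    raise OSError, for example when a helper process cannot be spawned.
    """
    try:
        ok = run_check()
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            # Resource exhaustion is a server-side fault, not bad credentials.
            return AuthResult(503, "insufficient memory on server")
        raise  # any other OS error is left for the caller to handle
    if ok:
        return AuthResult(200, "authenticated")
    return AuthResult(401, "invalid credentials")
```

With this separation, a client that receives the memory-related status can report a resource problem on the storage host instead of a generic "storage unreachable" or authorization message.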
Temporary fix
Comments
APAR Information
APAR number
IT35569
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A17
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-01-18
Closed date
2021-02-12
Last modified date
2021-02-12
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
Document Information
Modified date:
31 January 2024