APAR status
Closed as program error.
Error description
Under high workload in the K8s / OCP cluster, some REST requests or responses between the IBM Spectrum Protect Plus server and the BAAS agent pod might be lost. If this happens too often, the backup or restore operation may hang. The operation cannot be canceled while it is in this state.
To diagnose this problem, download the job log file (click "Download .zip") and check the virgo log file for an increasing number of callback exceptions. If the count reaches the maximum of 10, the operation is affected by the problem described here. The messages look like this:
[2022-03-30T17:26:28.426Z] INFO pool-181-thread-1 c.catalogic.ecx.remoteexecutor.impl.RemoteExecutorImplRestAgent 1645498341081 Callback exceptions count: 10
Note: K8s is only affected when using Ingress.
Affected versions: IBM Spectrum Protect Plus 10.1.10 and later
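The log check described above can be sketched as a small script. This is a minimal illustration, not an IBM-provided tool: the function names are hypothetical, and the regular expression assumes that the message text matches the sample line shown in this APAR.

```python
import re
from typing import Iterable, Optional

# Assumption: the message format matches the sample line in this APAR,
# i.e. the text "Callback exceptions count:" followed by an integer.
CALLBACK_RE = re.compile(r"Callback exceptions count:\s*(\d+)")

# Maximum callback exception count stated in the error description.
MAX_CALLBACK_EXCEPTIONS = 10


def max_callback_exceptions(lines: Iterable[str]) -> Optional[int]:
    """Return the highest callback exception count found, or None if absent."""
    highest = None
    for line in lines:
        match = CALLBACK_RE.search(line)
        if match:
            count = int(match.group(1))
            if highest is None or count > highest:
                highest = count
    return highest


def limit_reached(lines: Iterable[str]) -> bool:
    """Return True if any log line reports the maximum count or more."""
    highest = max_callback_exceptions(lines)
    return highest is not None and highest >= MAX_CALLBACK_EXCEPTIONS
```

To use it, open the virgo log file extracted from the downloaded .zip archive and pass the file object (or a list of its lines) to `limit_reached`. The location of the virgo log inside the archive is not stated in this APAR, so the path must be supplied by the user.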
Local fix
Ensure that the ingress controller reliably routes the REST requests and responses between the IBM Spectrum Protect Plus server and the BAAS agent pod by increasing the number of controllers.
Problem summary
****************************************************************
* USERS AFFECTED:                                              *
* IBM Spectrum Protect Plus level 10.1.10                      *
****************************************************************
* PROBLEM DESCRIPTION:                                         *
* See ERROR DESCRIPTION                                        *
****************************************************************
* RECOMMENDATION:                                              *
* Apply fixing level when available. This problem is currently *
* projected to be fixed in IBM Spectrum Protect Plus level     *
* 10.1.10.2 and 10.1.11.                                       *
* Note that this is subject to change at the discretion of     *
* IBM.                                                         *
****************************************************************
Problem conclusion
The code has been enhanced to handle the situation more gracefully. However, the backup or restore operation may still appear to hang. In this case, restart the BAAS agent pod; IBM Spectrum Protect Plus then finishes the operation. To avoid the problem in general, ensure that the ingress controller reliably routes the REST requests and responses between the IBM Spectrum Protect Plus server and the BAAS agent pod by increasing the number of controllers.
Temporary fix
Comments
APAR Information
APAR number
IT40454
Reported component name
SP PLUS
Reported component ID
5737SPLUS
Reported release
A1A
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-03-31
Closed date
2022-04-28
Last modified date
2022-04-28
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Modules/Macros
Agent k8s ocp
Fix information
Fixed component name
SP PLUS
Fixed component ID
5737SPLUS
Applicable component levels
[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSNQFQ","label":"IBM Spectrum Protect Plus"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"A1A","Line of Business":{"code":"LOB26","label":"Storage"}}]
Document Information
Modified date:
01 February 2024