IBM Support

IT40454: CONTAINER BACKUP OR RESTORE MAY HANG DUE TO CONNECTION LOSS

APAR status

  • Closed as program error.

Error description

  • Under high workload in the K8s or OCP cluster, some REST requests
    or responses between the IBM Spectrum Protect Plus server and the
    BAAS agent pod might be lost.
    If this happens too often, the backup or restore operation may
    hang. The operation cannot be canceled in this state.
    
    To diagnose this problem, download the job log file (click
    "Download .zip") and check the virgo log file for an increasing
    number of callback exceptions.
    If the count reaches the maximum of 10, the problem described
    here has occurred. The messages look like:
    [2022-03-30T17:26:28.426Z] INFO  pool-181-thread-1
    c.catalogic.ecx.remoteexecutor.impl.RemoteExecutorImplRestAgent
    1645498341081 Callback exceptions count: 10
    
    Note: K8s is only affected when using Ingress.
    
    Affected versions: IBM Spectrum Protect Plus 10.1.10 and later
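
The diagnostic check described above can be sketched in Python. This is a minimal sketch, not an IBM-provided tool: the regular expression is derived only from the sample message shown in this APAR, and the function names are hypothetical. It reports the highest "Callback exceptions count" found in the lines of a virgo log and compares it with the maximum of 10 stated above.

```python
import re
from typing import Iterable, Optional

# Pattern derived from the sample message in this APAR (an assumption
# about the general message format, not a documented log schema).
CALLBACK_COUNT = re.compile(r"Callback exceptions count:\s*(\d+)")

# Maximum stated in this APAR.
MAX_CALLBACK_EXCEPTIONS = 10


def max_callback_exceptions(lines: Iterable[str]) -> Optional[int]:
    """Return the highest callback exception count seen, or None if none found."""
    highest = None
    for line in lines:
        match = CALLBACK_COUNT.search(line)
        if match:
            count = int(match.group(1))
            if highest is None or count > highest:
                highest = count
    return highest


def indicates_problem(lines: Iterable[str]) -> bool:
    """True if the count reached the maximum, which indicates this problem."""
    highest = max_callback_exceptions(lines)
    return highest is not None and highest >= MAX_CALLBACK_EXCEPTIONS
```

To use it, open the virgo log file extracted from the downloaded .zip and pass the file object to `indicates_problem`. The location of the log inside the archive is not specified in this APAR.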
    

Local fix

  • Ensure that the ingress controller reliably routes the REST
    requests and responses between the IBM Spectrum Protect Plus
    server and the BAAS agent pod by increasing the number of
    controllers.
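
As an illustration of increasing the number of controllers on K8s, the following sketch builds a `kubectl scale` command. It assumes the ingress controller runs as a Deployment; the deployment name and namespace are placeholders that depend on how the controller was installed, and the helper name is hypothetical. On OCP, ingress controller replicas are managed through the IngressController resource rather than by scaling a Deployment directly, so this sketch does not apply there as written.

```python
from typing import List


def build_scale_command(deployment: str, namespace: str, replicas: int) -> List[str]:
    """Build a kubectl command that scales a Deployment to `replicas` pods.

    `deployment` and `namespace` are installation-specific assumptions;
    this function only builds the argument list and does not run it.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        "kubectl", "scale", "deployment", deployment,
        "--namespace", namespace,
        f"--replicas={replicas}",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` once the name and namespace have been verified against the cluster.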
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * IBM Spectrum Protect Plus level 10.1.10                      *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See ERROR DESCRIPTION                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in IBM Spectrum Protect Plus level     *
    * 10.1.10.2 and 10.1.11.                                       *
    * Note that this is subject to change at the discretion of     *
    * IBM.                                                         *
    ****************************************************************
    

Problem conclusion

  • The code has been enhanced to handle the situation more
    gracefully. However, the backup or restore operation may still
    appear to hang. In this case, restart the BAAS agent pod. After
    that, Spectrum Protect Plus will finish the operation.
    
    To avoid this in general, ensure that the ingress controller
    reliably routes the REST requests and responses between the IBM
    Spectrum Protect Plus server and the BAAS agent pod by
    increasing the number of controllers.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT40454

  • Reported component name

    SP PLUS

  • Reported component ID

    5737SPLUS

  • Reported release

    A1A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2022-03-31

  • Closed date

    2022-04-28

  • Last modified date

    2022-04-28

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Modules/Macros

  • Agent    k8s      ocp
    

Fix information

  • Fixed component name

    SP PLUS

  • Fixed component ID

    5737SPLUS

Applicable component levels

  • Business unit

    IBM Infrastructure w/TPS (BU058)

  • Product

    IBM Spectrum Protect Plus (SSNQFQ)

  • Platform

    Platform Independent (PF025)

  • Version

    A1A

  • Line of business

    Storage (LOB26)

Document Information

Modified date:
01 February 2024