IBM Support

"Phase: Error" status for Event Manager

Troubleshooting


Problem

The Installed Operator for Event Manager reports a high-level status, which is determined from the status of the NOIFormation, CEMFormation and ASMFormation resources.
The ideal status is "OK".
However, when the Phase is "Error", it can be difficult to determine the root cause of the problem.

Symptom

There are two ways to see the Phase setting:
1. In the UI, go to
Operators->Installed Operators->IBM Cloud Pak for AIOps Event Manager->All Instances
and look at the Phase setting in the Status column.
2. On the command line, run one of the following commands to see the Phase setting:
oc describe noi
or
oc describe noihybrid
In addition to the Phase setting, the output shows a Message, as in the following example:
  Message:  Templating failed 1.6.9/ibm-netcool-prod/charts/ibm-hdm-analytics-dev/charts/ibm-redis/values.yaml:template: values.yaml:91:39: executing "values.yaml" at <include "umbrella-chart.ibm-redis.affinities" .>: error calling include: template: no template "umbrella-chart.ibm-redis.affinities" associated with template "baseibm-netcool-prod"
  Phase:    Error
  Versions:
    Available:
      Versions:
        1.6.2,1.6.3,1.6.3.1,1.6.3.2,1.6.3.3,1.6.4,1.6.5,1.6.5.1,1.6.6,1.6.7,1.6.8,1.6.9
    Reconciled:  1.6.9
The important point to note is that the Message might not be the root cause of the Phase error.
This output is also collected by the must gather, in the .desc files under PROD_NAMESPACES/releasename/noihybrid or PROD_NAMESPACES/deploymentname/noi.
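When the describe output is long, a small filter can pull out just the top-level Message and Phase lines. The following is a minimal sketch, not an IBM-provided tool: summarize_phase is a hypothetical helper name, and it assumes that these two fields are indented by exactly two spaces, as in the example output above.

```shell
# Print the top-level Message and Phase lines from "oc describe noi",
# "oc describe noihybrid", or a saved .desc file read on stdin.
# Assumes these fields are indented by exactly two spaces, as in the example;
# more deeply indented Phase lines (such as per-component ones) are skipped.
summarize_phase() {
  awk '/^  (Message|Phase):/ { sub(/^ +/, ""); print }'
}
```

For example, run oc describe noi | summarize_phase, or pass a .desc file from the must gather to summarize_phase on standard input.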

Cause

For NOI or NOIHybrid, the output of the "oc describe" command displays the Phase status and, if the Phase is Error, reports the first error message from the NOI Operator pod as the cause of the problem.
This can be misleading, because the first error message is not necessarily the root cause of the problem.
A Phase of Error means that something went wrong with the templating of the deployment artifacts.
Not all error messages are significant in this regard.

Diagnosing The Problem

Resolving The Problem

To find the real root cause of the Phase error, investigate the output for the CEM, ASM and NOI formations by using the following commands:
oc describe asmformation
oc describe cemformation
oc describe noiformation
This output is also collected by the must gather, in the .desc files in the cemformation, asmformation and noiformation folders under PROD_NAMESPACES/noihybrid or PROD_NAMESPACES/noi.
Any errors in the CEM or ASM formation are propagated to NOI.
Sometimes NOI reports an error because of a transient issue in CEM or ASM. In this case, restarting the NOI Operator resolves the Phase error in NOI.
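The three commands can also be run in one pass. In the following sketch, describe_formations is a hypothetical helper, and the OC variable is an assumption added for convenience: it defaults to oc and can be set to, for example, "oc -n <namespace>" to target the Event Manager namespace.

```shell
# Run "oc describe" for each formation, with a header before each one.
# OC is deliberately left unquoted so that a value such as "oc -n <namespace>"
# is split into separate arguments; it defaults to plain "oc".
describe_formations() {
  local kind
  for kind in asmformation cemformation noiformation; do
    echo "=== ${kind} ==="
    ${OC:-oc} describe "${kind}"
  done
}
```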
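One way to restart the NOI Operator is to delete its pod so that the pod's controller creates a replacement. This sketch assumes the operator pod is managed by a controller such as a Deployment; restart_noi_operator is a hypothetical helper, and the pod name must be looked up first (for example, with oc get pods), because it differs between installations.

```shell
# Restart the NOI Operator by deleting its pod; the pod's controller
# then creates a replacement pod. The pod name is passed as the first
# argument because it is installation-specific.
# OC defaults to "oc" and can include flags such as "-n <namespace>".
restart_noi_operator() {
  local pod="${1:?usage: restart_noi_operator <noi-operator-pod-name>}"
  ${OC:-oc} delete pod "${pod}"
}
```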
However, if the error is not transient, investigate the root cause by looking at the output for the CEM, ASM and NOI formations.
To do this, look at the Status for each component.
The output below shows the status of some of the subcomponents of CEMFormation. Some of these subcomponent phases are OK, but others are Reconciling.
The non-OK components are CronJobs and Deployments. The next step is to look at the logs for the failing subcomponents, under the PROD_NAMESPACES/releasename directory.
For example, the logs for deployments are collected in the must gather output under PROD_NAMESPACES/releasename/deployment.apps, and the logs for statefulsets are under PROD_NAMESPACES/releasename/statefulset.apps.
Status:
  Components:
    Kind:  ClusterRole
    Status:
      Phase:  OK
    Kind:     ClusterRoleBinding
    Status:
      Phase:  OK
    Kind:     ConfigMap
    Status:
      Phase:  OK
    Kind:     CronJobs
    Status:
      Phase:  Reconciling
    Kind:     Deployments
    Status:
      Phase:  Reconciling
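To list only the components that are not OK, the Kind and Phase lines can be paired up with awk. This is a sketch that assumes each component appears as a Kind: line followed by its own Phase: line, as in the example above; non_ok_components is a hypothetical helper name.

```shell
# Print "<Kind> <Phase>" for each component whose Phase is not OK.
# Reads "oc describe asmformation|cemformation|noiformation" output on stdin.
# Assumes every "Kind:" line is followed by that component's "Phase:" line.
non_ok_components() {
  awk '
    $1 == "Kind:" { kind = $2 }
    $1 == "Phase:" && kind != "" {
      if ($2 != "OK") print kind, $2
      kind = ""   # reset so a later top-level Phase line is not misattributed
    }
  '
}
```

Applied to the example output above, the filter prints "CronJobs Reconciling" and "Deployments Reconciling", which are the subcomponents whose logs to check next.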

Document Location

Worldwide

Product: Netcool Operations Insight
Category: NOI Netcool Operations Insights
Line of Business: Automation
Platform: Platform Independent
Case number: TS013994520
Version: 1.6.0, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.6.6, 1.6.7, 1.6.8, 1.6.9, 1.6.10

Document Information

Modified date:
07 November 2023

UID

ibm17051235