IBM Support

Error when enabling Metric Anomaly Detection for Event Manager on OCP

Troubleshooting


Problem

I tried to set up Metric Anomaly Detection for NOI 1.6.6 by following the steps in https://www.ibm.com/docs/en/noi/1.6.6?topic=analytics-training-metric-data
When I ran step 6, it failed with the following error:
pod noi/enablemetrictraining terminated (Error)
The command I ran for step 6 was:
kubectl run enablemetrictraining -it --restart=Never --env=LICENSE=accept --command=true  \
--overrides='{"apiVersion":"v1", "spec":{"imagePullSecrets":[{"name":"noi-registry-secret"}]}}' \
--image=$CONTAINER_IMAGE --image-pull-policy=Always enableTrainingSchedule.sh \
-- -r $RELEASE -a metric-manager-anomaly-detection -t cfd95b7e-3bc7-4006-a4a8-a73a79c71255
I then checked the pod status with the following command:
oc get pod | grep enable

enablemetrictraining        0/1  Error      0  10m5s
enablesinglemetrictraining  0/1  Completed  0  15m8s
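If you want to pick out the status of one pod from this listing in a script, a minimal sketch could look like the following. It assumes the default `oc get pod` column layout (NAME, READY, STATUS, RESTARTS, AGE), and `pod_status` is a hypothetical helper name, not part of NOI.

```shell
# Hypothetical helper: reads `oc get pod` output on stdin and prints the
# STATUS column (third field) of the pod whose NAME matches $1.
# Assumes the default column layout: NAME READY STATUS RESTARTS AGE.
pod_status() {
  awk -v name="$1" '$1 == name { print $3 }'
}

# Usage sketch (NAMESPACE is an assumption; set it to your NOI namespace):
#   oc get pod -n "$NAMESPACE" | pod_status enablemetrictraining
```

For the listing above, this prints `Error` for `enablemetrictraining`.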

Diagnosing The Problem

First, verify that the training was scheduled successfully by running step 2 in https://www.ibm.com/docs/en/noi/1.6.6?topic=analytics-training-metric-data:

TRAINER_IP=$(oc get svc|grep ibm-hdm-analytics-dev-trainer|awk '{print $3}')
oc rsh $(oc get po |grep spark-slave|head -1|awk '{print $1}') curl -X GET --header 'Content-Type: application/json' \
 --header 'Accept: application/json' --header 'X-TenantID: cfd95b7e-3bc7-4006-a4a8-a73a79c71255' \
 "http://${TRAINER_IP}:8080/1.0/training/analytics/metric-manager-anomaly-detection/schedule"
If the training was scheduled successfully, the command returns the following output:
{"retrainingIntervalMinutes":1440,"enabled":true}
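To script this check, a minimal sketch might test the response body for `"enabled":true`. This is not a full JSON parser: it assumes the compact response shown above (optional whitespace around the colon is tolerated), and `schedule_enabled` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds (exit status 0) only if the schedule
# response passed as $1 contains "enabled":true.
# Pattern match only; assumes the compact JSON returned by the step 2 GET.
schedule_enabled() {
  printf '%s\n' "$1" | grep -Eq '"enabled"[[:space:]]*:[[:space:]]*true'
}

# Example with the expected response from step 2:
if schedule_enabled '{"retrainingIntervalMinutes":1440,"enabled":true}'; then
  echo "Training is scheduled"
fi
```

In practice you would capture the output of the step 2 `oc rsh ... curl` command into a variable and pass that variable to the helper.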

Also check the output of step 6. If the training was scheduled successfully, it includes the following message:

Schedule Set correctly
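If the interactive output of step 6 is no longer on screen, one option is to read it back from the terminated pod's log with `oc logs` before deleting the pod. The sketch below assumes that the step 6 output is written to the container log and that `NAMESPACE` holds your NOI namespace; `schedule_message_logged` is a hypothetical helper name.

```shell
# Hypothetical helper: reads log text on stdin and succeeds only if it
# contains the confirmation message from step 6.
schedule_message_logged() {
  grep -q 'Schedule Set correctly'
}

# Usage sketch (run before deleting the pod; NAMESPACE is an assumption):
#   oc logs enablemetrictraining -n "$NAMESPACE" | schedule_message_logged \
#     && echo "Schedule was set; the pod error can be ignored"
```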

Resolving The Problem

If the training was scheduled successfully, you can safely ignore the pod error. As a workaround, remove the failed pod with the following command:

oc delete pod enablemetrictraining -n <namespace>

The pod error is fixed in NOI 1.6.8.
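On versions earlier than NOI 1.6.8, the workaround can be combined with the schedule check so that the pod is deleted only when the training is confirmed as scheduled. This is a sketch under stated assumptions: `cleanup_training_pod` is a hypothetical wrapper, its first argument is the response body from the step 2 GET, its second argument is your NOI namespace, and the match is a simple pattern rather than a JSON parser.

```shell
# Hypothetical wrapper: deletes the failed enablemetrictraining pod only if
# the step 2 schedule response ($1) reports "enabled":true.
# $2 is the NOI namespace (an assumption; substitute your own).
cleanup_training_pod() {
  response=$1
  namespace=$2
  if printf '%s\n' "$response" | grep -Eq '"enabled"[[:space:]]*:[[:space:]]*true'; then
    oc delete pod enablemetrictraining -n "$namespace"
  else
    # Not scheduled: keep the pod so its log can still be inspected.
    echo "Training is not scheduled; keep the pod and check its logs" >&2
    return 1
  fi
}
```

Keeping the pod when the check fails is deliberate: its log is the main evidence for why step 6 did not set the schedule.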

Document Location

Worldwide

Product: IBM Watson AIOps (Event Manager / NOI > Metric Anomaly Detection (MAD)); Platform: Platform Independent; Version: All Versions; Case number: TS011829247

Product Synonym

MAD

Document Information

Modified date:
02 April 2023

UID

ibm16955035