Troubleshooting
Problem
I am installing Metric Anomaly Detection for NOI 1.6.6 by following the steps at https://www.ibm.com/docs/en/noi/1.6.6?topic=analytics-training-metric-data
When I ran step 6, the pod terminated with an error:
pod noi/enablemetrictraining terminated (Error)
The step I ran was:
kubectl run enablemetrictraining -it --restart=Never --env=LICENSE=accept --command=true \
--overrides='{"apiVersion":"v1", "spec":{"imagePullSecrets":[{"name":"noi-registry-secret"}]}}' \
--image=$CONTAINER_IMAGE --image-pull-policy=Always enableTrainingSchedule.sh \
-- -r $RELEASE -a metric-manager-anomaly-detection -t cfd95b7e-3bc7-4006-a4a8-a73a79c71255
The pod status was:
oc get pod | grep enable
enablemetrictraining 0/1 Error 0 10m5s
enablesinglemetrictraining 0/1 Completed 0 15m8s
Diagnosing The Problem
First, verify that the training was scheduled successfully by running step 2 in https://www.ibm.com/docs/en/noi/1.6.6?topic=analytics-training-metric-data:
TRAINER_IP=$(oc get svc|grep ibm-hdm-analytics-dev-trainer|awk '{print $3}')
oc rsh $(oc get po |grep spark-slave|head -1|awk '{print $1}') curl -X GET --header 'Content-Type: application/json' \
--header 'Accept: application/json' --header 'X-TenantID: cfd95b7e-3bc7-4006-a4a8-a73a79c71255' \
"http://${TRAINER_IP}:8080/1.0/training/analytics/metric-manager-anomaly-detection/schedule"
If the training was scheduled successfully, the command returns the following output:
{"retrainingIntervalMinutes":1440,"enabled":true}
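The check on this response can be scripted. The following is a minimal sketch, not part of the product: the helper name schedule_enabled is hypothetical, and the sample response is copied from the expected output above. It only looks for "enabled":true in the text and does not validate the rest of the JSON.

```shell
# Hypothetical helper: succeeds when the schedule response reports "enabled":true.
schedule_enabled() {
  printf '%s' "$1" | grep -Eq '"enabled"[[:space:]]*:[[:space:]]*true'
}

# Sample response copied from the expected output; in practice, capture the
# output of the curl command from step 2 into this variable instead.
response='{"retrainingIntervalMinutes":1440,"enabled":true}'

if schedule_enabled "$response"; then
  echo "training schedule is enabled"
else
  echo "training schedule is NOT enabled"
fi
```

A plain grep is used here so the sketch does not assume that a JSON tool such as jq is installed on the host.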
In addition, the output of step 6 should contain the following message if the training was scheduled successfully:
Schedule Set correctly
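If the terminal output of step 6 is no longer available, the same message might still be retrievable from the logs of the failed pod. The following is a sketch under that assumption; the helper name log_confirms_schedule is hypothetical, and the sample text stands in for real pod output.

```shell
# Hypothetical helper: reads log text on stdin and succeeds when it contains
# the confirmation message printed by step 6.
log_confirms_schedule() {
  grep -Fq 'Schedule Set correctly'
}

# Sample log text standing in for real pod output.
sample_log='Schedule Set correctly'

if printf '%s\n' "$sample_log" | log_confirms_schedule; then
  echo "schedule confirmed in log output"
else
  echo "confirmation message not found"
fi

# Against a live cluster (assumes the pod logs are still retained;
# replace <namespace> with the namespace of the NOI deployment):
#   oc logs enablemetrictraining -n <namespace> | log_confirms_schedule
```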
Resolving The Problem
If the training was scheduled successfully, you can ignore the pod error. The workaround is to use the following command to remove the problematic pod:
oc delete pod enablemetrictraining -n <namespace>
The fix for the pod error is in NOI 1.6.8.
Document Location
Worldwide
Product Synonym
MAD
Document Information
Modified date:
02 April 2023
UID
ibm16955035