Troubleshooting
Problem
The related events training algorithm is not deterministic; that is, two training runs over exactly the same alerts can produce different numbers of groups (and different policies).
Symptom
The number of Temporal Grouping and Temporal Patterns policies is not consistent between training runs, even when the event data in Cassandra is unchanged.
Diagnosing The Problem
Run training repeatedly, for example by calling the training service directly:
curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header 'X-TenantID: cfd95b7e-3bc7-4006-a4a8-a73a79c71255' -d '{ "properties": { "patterns.enabled": "true" , "ea.policies.deploy": false , "patterns.deploy": "false" , "patterns.outputRaw": true, "runner.spark.writeOutToDir": "/opt/spark/work/MissingPatterns1" }}' 'http://172.30.90.33:8080/1.0/training/train/related-events'
{"_executionTime":209,"response":"driver-20230508112211-0158"}
where 172.30.90.33 is the cluster IP of the training service (oc get service | grep train). The response contains the ID of the Spark driver submission for the training job.
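To make the POST easy to replay for each run, the service IP can be captured in a variable first. This is a convenience sketch only; the variable name TRAIN_IP is introduced here, and it assumes that oc get service | grep train matches exactly one service with the CLUSTER-IP in the third column:

# Sketch: capture the training service cluster IP so the POST can be replayed per run
TRAIN_IP=$(oc get service | grep train | awk '{print $3}')
echo "Training service IP: ${TRAIN_IP}"

curl -X POST \
  --header 'Content-Type: application/json' \
  --header 'Accept: application/json' \
  --header 'X-TenantID: cfd95b7e-3bc7-4006-a4a8-a73a79c71255' \
  -d '{ "properties": { "patterns.enabled": "true", "ea.policies.deploy": false, "patterns.deploy": "false", "patterns.outputRaw": true, "runner.spark.writeOutToDir": "/opt/spark/work/MissingPatterns1" }}' \
  "http://${TRAIN_IP}:8080/1.0/training/train/related-events"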
Compare the number of policies between runs. Replace cluster_release_name below with the release name of your cluster:
export ADMIN_PASSWORD=$(oc get secret cluster_release_name-systemauth-secret -o jsonpath --template '{.data.password}' | base64 --decode)
export POLICY_URL="https://$(oc get route | grep policies | awk '{print $2}')$(oc get route | grep policies | awk '{print $3}')/system/v1/cfd95b7e-3bc7-4006-a4a8-a73a79c71255/policies/system"
env | grep ADMIN_PASSWORD
env | grep POLICY_URL
curl -X GET --header 'Content-Type: application/json' --header 'Accept: application/json' --user system:${ADMIN_PASSWORD} -vk ${POLICY_URL} > ALL_POLICIES.run2
curl -X GET --header 'Content-Type: application/json' --header 'Accept: application/json' --user system:${ADMIN_PASSWORD} -vk ${POLICY_URL} | grep related-events > ALL_RELATED_POLICIES.run2
curl -X GET --header 'Content-Type: application/json' --header 'Accept: application/json' --user system:${ADMIN_PASSWORD} -vk ${POLICY_URL} | grep pattern > ALL_PATTERN_POLICIES.run2
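After at least two training runs, saving the output of the first run with a .run1 suffix and the second with .run2 as above, a quick comparison shows whether the policy sets differ. This is a sketch only; it assumes the dump files follow the naming used in the commands above:

# Sketch: compare policy dumps from two training runs
for f in ALL_POLICIES ALL_RELATED_POLICIES ALL_PATTERN_POLICIES; do
  if diff -q "${f}.run1" "${f}.run2" > /dev/null; then
    echo "${f}: identical between runs"
  else
    echo "${f}: DIFFERS between runs (symptom of non-deterministic training)"
  fi
done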
Resolving The Problem
The problem is resolved in NOI 1.6.9.
For large data sets, set the number of Spark worker pods to 6 by scaling the spark-slave deployment to replicas=6.
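As a sketch, the scaling can be done with oc. The exact deployment name carries the release prefix, so the grep lookup below is an assumption to verify against your cluster:

# Sketch: scale the Spark worker (spark-slave) deployment to 6 replicas.
# The deployment name includes the release prefix, so look it up first.
SPARK_SLAVE=$(oc get deployment | grep spark-slave | awk '{print $1}')
oc scale deployment "${SPARK_SLAVE}" --replicas=6

# Verify the new replica count
oc get deployment "${SPARK_SLAVE}"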
Document Location
Worldwide
[{"Type":"MASTER","Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSTPTP","label":"Netcool Operations Insight"},"ARM Category":[{"code":"a8m0z0000001jZTAAY","label":"NOI Netcool Operations Insights-\u003ECNEA Cloud Native Event Analytics"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]
Document Information
Modified date:
04 September 2023
UID
ibm17028501