About probable cause ranking

This AI algorithm identifies the alert with the greatest probability of being the cause of an incident.

It is enabled by default, so that information about the probable cause of a problem is automatically included in your incident details.

Description

Probable cause is a set of approaches and customizations that analyzes alerts and topologies together, and then uses that understanding to try to determine the root cause of a problem. After the probable cause alert is identified, action can be taken on the resource that is generating that alert to quickly resolve the incident.

A number of techniques are used to make a probable cause determination. These are derived from topological relationships and natural language calculations. The natural language model classifies alerts into the golden signal classes, based on their summary. This classification is then combined with the topological information (the path calculated between the affected resources on which the alerts exist) to determine the rank of each alert within the context of the other alerts in the incident.

  • Keyword in the Summary: For example, if the word "down" appears, it might be more indicative of the root cause than the word "up".

  • Severity: An alert with the highest severity is normally more likely to indicate the root cause.

  • First to occur: Turned off by default. The first alert to arrive in a group of alerts can often be seen as the root cause of the problem, but this is not always the case.

  • Inherited root causes: Root causes that were previously identified by IBM Tivoli Network Manager (ITNM), and CauseWeight values that exist from scope-based alert grouping in Netcool, can be inherited. (For more information, see the Event Grouping section.)

  • Scoring config: The following options contribute to probable cause determination:

    • PathCalculationEnabled: This option uses topological data to calculate the path between sources. You must create the topological resource group and, for that resource group, set the alert correlation to true. This setting allows alerts to be correlated by topological correlation. When a problem is encountered in the system, a path is calculated between all the alerts in the group. The path with the fewest hops is used. After the path is calculated, a path score is calculated based on the edge types in the system.
    • wordEnabled: You can add or remove words from the system with an API. If a specified word is in the summary field of an alert, the word score is added.
      • Another option exists for case sensitivity.
      • The higher the score, the more likely a problem is to be the root cause.
      • The word score is the sum of the scores of all matching words. For example, if two words are found in the summary, and one word has a value of 100 and the other has a value of 10, the total word score is 110.
    • SeverityEnabled: The problem severity is accounted for. By default, the score is the severity multiplied by 10.
    • causeScoreEnabled: If the CauseWeight field is present in the details section of an alert, the value in CauseWeight is added to the total score as the causeScore. If this field is not present in the details section, there is no effect. This field cannot be changed.
    • itnmproccessing: If this option is turned on and ITNM marked an alert as the root cause (NmosCauseType=1), then itnmcauseweight is added to the score.
    • firstBoost: If firstBoost is set to true, the amount that is specified in firstBoostWeight is added to the score of the alert that has the lowest firstoccurrence time.
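
The way the scoring options above add up can be illustrated with a minimal sketch. This is not the product's implementation: the Alert class, the CONFIG dictionary layout, the word list, and the values chosen for itnmCauseWeight and firstBoostWeight are all hypothetical. The sketch also assumes that the CauseWeight value is added directly to the total and that NmosCauseType is stored in the alert details. Path scoring is left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Alert:
    """Hypothetical alert shape used only for this illustration."""
    summary: str
    severity: int
    first_occurrence: float
    details: dict = field(default_factory=dict)


# Hypothetical configuration. Option names mirror the list above;
# the word list and weights are illustrative values, not defaults.
CONFIG = {
    "wordEnabled": True,
    "caseSensitive": False,
    "words": {"down": 100, "failed": 10},
    "severityEnabled": True,
    "severityMultiplier": 10,  # "severity multiplied by 10"
    "causeScoreEnabled": True,
    "itnmProcessing": True,
    "itnmCauseWeight": 50,
    "firstBoost": True,
    "firstBoostWeight": 25,
}


def word_score(summary, words, case_sensitive):
    """Sum the scores of every configured word found in the summary."""
    if not case_sensitive:
        summary = summary.lower()
        words = {w.lower(): s for w, s in words.items()}
    tokens = set(re.findall(r"\w+", summary))
    return sum(score for word, score in words.items() if word in tokens)


def score_alert(alert, config, is_first):
    """Add up the contributions of each enabled scoring option."""
    total = 0
    if config["wordEnabled"]:
        total += word_score(alert.summary, config["words"], config["caseSensitive"])
    if config["severityEnabled"]:
        total += alert.severity * config["severityMultiplier"]
    # Assumption: the CauseWeight value itself is added to the total.
    if config["causeScoreEnabled"] and "CauseWeight" in alert.details:
        total += alert.details["CauseWeight"]
    # Assumption: NmosCauseType is available in the alert details.
    if config["itnmProcessing"] and alert.details.get("NmosCauseType") == 1:
        total += config["itnmCauseWeight"]
    if config["firstBoost"] and is_first:
        total += config["firstBoostWeight"]
    return total


def rank_alerts(alerts, config):
    """Return (score, alert) pairs, highest score first."""
    first = min(alerts, key=lambda a: a.first_occurrence)
    scored = [(score_alert(a, config, a is first), a) for a in alerts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


alerts = [
    Alert("Link down, login failed", severity=5, first_occurrence=100.0),
    Alert("Interface up", severity=2, first_occurrence=50.0,
          details={"NmosCauseType": 1, "CauseWeight": 5}),
]
ranked = rank_alerts(alerts, CONFIG)
```

In this sketch the first alert gets a word score of 110 (100 + 10) plus a severity score of 50. The second alert gets no word score but collects the severity, CauseWeight, ITNM, and firstBoost contributions.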

The techniques work together to determine the root cause. No single technique is sufficient on its own, because an alert might or might not have an associated topology.
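
For the topological part, the "shortest number of hops" idea described under PathCalculationEnabled can be sketched as a breadth-first search over the resource graph. This is an illustration only: the hop_count function and the edge list are hypothetical, edges are assumed to be undirected, and the edge-type path score is not modeled because the source does not describe how it is computed.

```python
from collections import deque


def hop_count(edges, source, target):
    """Return the fewest hops between two resources, or None if unconnected."""
    # Build an undirected adjacency map (assumption: direction is ignored).
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    if source == target:
        return 0

    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical topology: each pair is a relationship between two resources.
edges = [("app", "pod"), ("pod", "node"), ("node", "switch"), ("app", "db")]
hops = hop_count(edges, "app", "switch")
```

An alert on a resource with no path to the others returns None here, which corresponds to the case where an alert has no associated topology and the other techniques must carry the ranking.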

Prerequisites

For topology and natural language probable cause, the alert data must have topological data and must also be topologically correlated. For more information, see Topological grouping.

Viewing rankings

Probable cause rankings, from 1 (highest) to 3 (lowest) in order of probability, can be viewed in the Incident overview UI.

Figure. Probable cause Incidents Overview

Rankings, including those greater than 3, are also displayed in the Alerts tab of the Incidents viewer.

Figure. Probable cause Alerts tab

Rankings can also appear in ChatOps notifications. (For more information, see Probable cause in ChatOps.)

Language support

For information about supported languages for this algorithm, see Language support.