About event grouping
Three AI algorithms are provided that group events together and present the groups within a single incident.
Description
The three algorithms are the following:
Temporal grouping
This AI algorithm is an unsupervised learning algorithm that groups alerts that are discovered to co-occur over time. When a problem arises, there are typically multiple parts of a system or environment that are impacted. When alerts in different areas co-occur, it makes sense to look at them together and treat them as one problem to try to determine what might have happened.
Grouping co-occurring alerts together reduces the number of tickets and incidents opened and the number of people looking at the same problem, thereby significantly reducing noise in your monitoring systems. It helps you to understand the context of an issue so you can prioritize, triage, and resolve it more quickly.
To train this algorithm, complete some initial setup tasks. This AI algorithm also needs at least three months of historical event data to learn alert patterns and discover groups. For more information, see Setting up training for temporal grouping.
When this algorithm is enabled, related alerts are grouped based on when they occur. Site reliability engineers (SREs) and other users responsible for application and service availability can view the details in the Alert Viewer, as described in About alerts.
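The idea of learning co-occurrence from historical data can be illustrated with a small sketch. This is not the product's algorithm, which is not described here; it is a simplified stand-in that counts how often pairs of alert keys fire in the same time bucket and then merges frequently co-occurring keys into groups. The function names, the `window_seconds` bucket size, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def learn_cooccurring_pairs(history, window_seconds=300, min_support=3):
    """Return pairs of alert keys seen in the same time bucket at least min_support times.

    history: iterable of (epoch_seconds, alert_key). Bucket size and threshold
    are illustrative assumptions, not product settings.
    """
    buckets = {}
    for timestamp, alert_key in history:
        buckets.setdefault(timestamp // window_seconds, set()).add(alert_key)

    pair_counts = Counter()
    for keys in buckets.values():
        for pair in combinations(sorted(keys), 2):
            pair_counts[pair] += 1

    return {pair for pair, count in pair_counts.items() if count >= min_support}


def group_alerts(alert_keys, related_pairs):
    """Merge alert keys into groups (connected components over the learned pairs)."""
    parent = {key: key for key in alert_keys}

    def find(key):
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path compression
            key = parent[key]
        return key

    for a, b in related_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for key in alert_keys:
        groups.setdefault(find(key), set()).add(key)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical history: db_latency and api_errors fire together three times.
history = [
    (0, "db_latency"), (30, "api_errors"),
    (9000, "db_latency"), (9060, "api_errors"),
    (18000, "db_latency"), (18100, "api_errors"), (18200, "disk_full"),
]
pairs = learn_cooccurring_pairs(history)
groups = group_alerts(["db_latency", "api_errors", "disk_full"], pairs)
```

In this sketch, `disk_full` stays in its own group because it co-occurred with the other alerts only once, which is below the assumed support threshold.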
Topological grouping
This AI algorithm is an unsupervised learning algorithm that groups alerts based on the resource groups in which those alerts occur. For example, if you have a resource group made up of all the resources within a given Kubernetes namespace, then any alerts on pods, microservices, or other resources in that namespace are grouped together in a single topological group.
Topological grouping helps you understand when alerts are connected based on their topology, providing valuable context for why related alerts might occur together.
When this algorithm is enabled, related alerts are grouped based on their topology. Site reliability engineers (SREs) and other users responsible for application and service availability can view the details in the Alert Viewer, as described in About alerts.
Note:
- Topology resources in the same group do not need to be connected for alert grouping to happen.
- Alerts that target topology resources in the same application (service) are also grouped.
- Alerts must occur within a 15-minute time window to be correlated.
Topological grouping requires a topological group creator.
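The grouping rules in the note can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the function name and data shapes are hypothetical, and the 15-minute window is measured here from the first alert in each group, which is one possible reading of the rule. Resources only need to belong to the same resource group; no connection between them is checked.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)  # correlation window stated in the note


def group_by_topology(alerts, resource_groups, window=WINDOW):
    """Group alerts whose resources share a resource group and fall in one window.

    alerts: list of (datetime, resource_name, summary) tuples (hypothetical shape).
    resource_groups: {group_name: set_of_resource_names}.
    """
    resource_to_groups = {}
    for name, members in resource_groups.items():
        for resource in members:
            resource_to_groups.setdefault(resource, []).append(name)

    open_buckets = {}  # group_name -> most recent bucket for that group
    result = []
    for alert in sorted(alerts, key=lambda a: a[0]):
        timestamp, resource, _summary = alert
        # Alerts on resources outside every group are left ungrouped.
        for name in resource_to_groups.get(resource, []):
            bucket = open_buckets.get(name)
            if bucket is None or timestamp - bucket["start"] > window:
                bucket = {"group": name, "start": timestamp, "alerts": []}
                open_buckets[name] = bucket
                result.append(bucket)
            bucket["alerts"].append(alert)
    return result


# Hypothetical resource group built from a Kubernetes namespace.
resource_groups = {"namespace:payments": {"pod-a", "svc-b"}}
alerts = [
    (datetime(2024, 1, 1, 10, 0), "pod-a", "CrashLoopBackOff"),
    (datetime(2024, 1, 1, 10, 10), "svc-b", "High latency"),
    (datetime(2024, 1, 1, 10, 40), "pod-a", "OOMKilled"),
    (datetime(2024, 1, 1, 10, 5), "pod-z", "Not in any group"),
]
topology_groups = group_by_topology(alerts, resource_groups)
```

Here the first two alerts land in one group because they occur 10 minutes apart in the same namespace, while the 10:40 alert starts a new group.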
Scope-based grouping
This AI algorithm is an unsupervised learning algorithm that automatically groups alerts relating to an incident if they have the same defined scope and occur during the same period of time. A scope can be used to identify where events originate based on a common attribute, for example, the location of a server room.
By understanding when alerts are related based on both time and location, you can more quickly diagnose incidents.
When this algorithm is enabled, related alerts are grouped based on their scope. Site reliability engineers (SREs) and other users responsible for application and service availability can view the details in the Alert Viewer, as described in About alerts.
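As a rough illustration of grouping on a shared scope and time period, the following sketch groups alerts that carry the same value for a scope attribute and occur close together in time. The field names (`id`, `time`, `location`), the function name, and the `window_seconds` value are assumptions made for this example; the actual scope definition and time period are configured in the product and are not specified here.

```python
def group_by_scope(alerts, scope_field="location", window_seconds=600):
    """Group alerts that share a scope value and occur within the same period.

    alerts: list of dicts with "id", "time" (epoch seconds), and optionally the
    scope field. The window is measured from the first alert in each group.
    """
    groups = []
    open_groups = {}  # scope value -> most recent group for that scope
    for alert in sorted(alerts, key=lambda a: a["time"]):
        scope = alert.get(scope_field)
        if scope is None:
            continue  # alerts without a scope are not grouped by this sketch
        group = open_groups.get(scope)
        if group is None or alert["time"] - group["start"] > window_seconds:
            group = {"scope": scope, "start": alert["time"], "alerts": []}
            open_groups[scope] = group
            groups.append(group)
        group["alerts"].append(alert["id"])
    return groups


# Hypothetical alerts scoped by server room.
scoped_alerts = [
    {"id": "a1", "time": 0, "location": "room-1"},
    {"id": "a2", "time": 120, "location": "room-1"},
    {"id": "a3", "time": 130, "location": "room-2"},
    {"id": "a4", "time": 5000, "location": "room-1"},
    {"id": "a5", "time": 140},
]
scope_groups = group_by_scope(scoped_alerts)
```

In this example, `a1` and `a2` share a group, `a3` is grouped separately because its scope differs, and `a4` starts a new group because it falls outside the assumed period.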
Prerequisites
The three event grouping algorithms require event data. In the case of the Topological grouping algorithm, topological data is also required. This topological data must be correlated with the event data.
For alerts to be correlated, the affected resources must exist within the same topological group or template.
Event data
In most cases, your event data is loaded into IBM Cloud Pak® for AIOps from an IBM® Netcool® Operations Insight®-based data source, using one of the methods listed in the following table.
Data loading method | For more details, see: |
---|---|
Connecting on-premises probes | Connecting on-premises probes to the in-cluster ObjectServer |
Connecting an external ObjectServer | Connecting an external ObjectServer into the system using an XML Gateway |
You can also load event data into the system using the generic custom and Kafka methods.
- For more information on the custom connection, see Creating custom connections.
- For more information on the Kafka connection, see Creating Kafka connections.
Topological data
Load topology data into IBM Cloud Pak® for AIOps by configuring observers. For more information, see Defining observer jobs.
Language support
For information about supported languages for these algorithms, see Language support.