Suppress alerts

Some alerts might not be important for monitoring your organization's environment. For alerts that do not need to be viewed or acted on, use suppression policies to reduce noise for your operations teams by suppressing unwanted alerts. When an alert is suppressed in Cloud Pak for AIOps, the alert.suppressed flag is set to true. The alert is still present in the system and can be viewed in the Alert Viewer, but the alert is filtered out of the view by default. Incident creation and runbook policies do not run for suppressed alerts. Scope-based grouping policies continue to run.

Limitations

Alert insight conditions have limitations, and these limitations depend on the insight type.

  • Anomaly insights are available on all metric anomaly alerts and on certain log anomaly alerts.
  • Only a subset of log anomaly alerts identify an exception or error code.
  • The statistical baseline log anomaly algorithm typically identifies these exceptions or error codes.

You can also give feedback to Cloud Pak for AIOps to indicate whether an alert was correctly identified, and then add a suppression policy to suppress alerts that received low votes. For more information about giving feedback, see Alert details.

Enabling X in Y alert rate suppression

Many alerts can represent problems where performance thresholds were breached, for example, a switch that reaches its maximum bandwidth capacity during a peak period. These temporary threshold breaches are often expected, and the business policy might be to ignore them unless the breach is sustained for a prolonged but specific period. However, if such alerts are sustained beyond these specified time periods, operators must take action. In this classic X in Y alert rate scenario, the operator is only interested in alerts that occur more than X times in Y minutes. When the alert rate breaches the threshold, the described alerts should not be suppressed. Until the threshold is breached, the alerts do not need to be acted upon.

In suppression policies, you enable X in Y alert rate suppression under actions in the policy editor. Select Suppress the described alerts until, and then specify your values for X (number of alerts) and Y (minutes).

Attention: Be aware of which alert states contribute to the X count. The X value is incremented for both deduplicated problem alerts and deduplicated resolution alerts. Therefore, if a deduplicated alert flaps 10 times (alternating between open and clear), X reaches 20, not the 10 that you would get by counting only the problem alerts.
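The counting behavior described above can be illustrated with a minimal sketch. It is not the product's implementation: the class name RateSuppressor, the method observe, and the trailing-window logic are hypothetical, and it assumes that an alert stays suppressed until more than X occurrences fall within the last Y minutes, with problem and resolution occurrences both counted.

```python
from collections import deque


class RateSuppressor:
    """Hypothetical X in Y sketch: suppress until more than x occurrences
    are seen within a trailing window of y minutes."""

    def __init__(self, x: int, y_minutes: float):
        self.x = x
        self.window_seconds = y_minutes * 60
        self.timestamps = deque()

    def observe(self, timestamp: float) -> bool:
        """Record one occurrence (problem or resolution, both count).

        Returns True while the alert should remain suppressed.
        """
        self.timestamps.append(timestamp)
        # Discard occurrences that are older than the trailing window.
        while timestamp - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) <= self.x


# A deduplicated alert that flaps 10 times produces 20 occurrences
# (10 open and 10 clear), here spaced 20 seconds apart.
suppressor = RateSuppressor(x=15, y_minutes=10)
results = [suppressor.observe(i * 20.0) for i in range(20)]
```

In this sketch, with X set to 15 the first 15 occurrences leave the alert suppressed and the 16th lifts suppression, even though only 8 of those occurrences are problem alerts. This is why the X value needs to account for resolution alerts as well.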

About this task

In this example, preproduction, development, and test environments are monitored by the same monitoring tools as the production environments. The development and test environments naturally have ongoing changes that might trigger various warnings and notifications all the time. These warnings and notifications are relayed to Cloud Pak for AIOps as alerts. However, these alerts do not require action as they are from resources that change constantly as part of the preproduction work. The monitoring tools are usually set up to avoid sending such alerts to the operations teams. However, if such alert information is sent to Cloud Pak for AIOps, for example, due to a misconfiguration, then the operations team might face unnecessary noise.

To save your operations team from being distracted by alerts from such environments, you can create a suppression policy to suppress alerts that come from the monitored resources that make up these environments.

Important:

  • Unlike other Cloud Pak for AIOps policy templates, suppression policies contain a single condition set.
  • The supported Anomaly insights properties for suppression policies are displayed in the tree view of the Property field. To display the available options, click Add condition > Alert insight and place the cursor in the Property field. For more information, see Supported Anomaly insights properties.

Example

  1. Click the navigation icon to go to the main navigation menu.

  2. In the main navigation menu, click Operate > Automations.

  3. Click Create policy.

  4. Click the Suppress alerts tile.

  5. Enter a name in Policy name, for example, "Suppress events from preproduction environment". You can also add an explanation of the policy in Description to help you and others understand its purpose, for example, "Suppress alerts from hosts in the preproduction environment to prevent incidents from being created for development and test hosts".

    User-created suppression policies are always the first user policies to be run. The order is set to 0 and cannot be changed.

  6. To define the conditions for alerts to activate alert suppression, click Add condition and select Alert property or Alert insight attributes. For this example, Alert property is used.

  7. For each attribute, select values for Property, Operator, Matches, and Value. For Alert insight attributes, also select an Insight type value, such as Anomaly or Business criticality.

    For this example, select alert.resource.hostname from the Property list. You can type hostname, and the property list shows all alert properties that contain the text hostname. Then, select alert.resource.hostname.

  8. From the Operator list, select starts with.

  9. From the Matches list, select any of.

  10. In the Value field, enter dev and select String: dev. In the same field, enter test and select String: test.

Figure. Suppression policy example

  11. Select Always suppress the described alerts, or select Suppress the described alerts until and specify either the number of identical alerts or the number of alerts that occur within a time window, in minutes.
  12. Click Create policy.

New and updated policies can take 1 to 3 minutes to take effect.
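The condition that this example builds (alert.resource.hostname, starts with, any of, dev and test) can be sketched as follows. This only illustrates the matching logic and is not how Cloud Pak for AIOps evaluates policies: the helper names get_property and starts_with_any_of, the nested dictionary layout of the alert, and the case-sensitive comparison are all assumptions.

```python
def get_property(alert: dict, path: str):
    """Resolve a dotted path such as 'alert.resource.hostname'.

    Assumes the alert is a nested dictionary; returns None if any
    segment of the path is missing.
    """
    parts = path.split(".")
    if parts[0] == "alert":
        parts = parts[1:]  # the leading 'alert' names the object itself
    value = alert
    for part in parts:
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def starts_with_any_of(alert: dict, path: str, values: list) -> bool:
    """Operator 'starts with' combined with Matches 'any of'."""
    actual = get_property(alert, path)
    return isinstance(actual, str) and actual.startswith(tuple(values))


# Hypothetical alerts with made-up hostnames.
dev_alert = {"resource": {"hostname": "dev-web-01"}}
prod_alert = {"resource": {"hostname": "prod-web-01"}}

dev_matches = starts_with_any_of(dev_alert, "alert.resource.hostname", ["dev", "test"])
prod_matches = starts_with_any_of(prod_alert, "alert.resource.hostname", ["dev", "test"])
```

Under these assumptions, dev_matches is True and prod_matches is False, so only the alert from the development host would be suppressed.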

Note: Suppression occurs on the alert level, not the event level. If a suppression policy for a specific condition is enabled, any matching alerts are suppressed. If the policy is removed later, any future matching alerts are not suppressed, and any existing alerts are not unsuppressed. If the suppressed alert is not closed, for example, if the problem is still active, any future events are deduplicated into the suppressed alert until the problem is resolved.
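The alert-level behavior in this note can be modeled with a small sketch. The AlertStore class, its field names, and the idea that an alert is matched against the policy only when it is first created are assumptions made for illustration and are not taken from the product.

```python
class AlertStore:
    """Hypothetical model of alert-level suppression with deduplication."""

    def __init__(self):
        self.open_alerts = {}       # deduplication key -> alert
        self.policy_enabled = True  # whether the suppression policy exists

    def ingest(self, event: dict) -> dict:
        key = event["dedup_key"]
        alert = self.open_alerts.get(key)
        if alert is None:
            # Assumption: the policy is applied when the alert is created.
            matches = event["hostname"].startswith(("dev", "test"))
            alert = {
                "dedup_key": key,
                "event_count": 0,
                "suppressed": self.policy_enabled and matches,
            }
            self.open_alerts[key] = alert
        # Later events are deduplicated into the existing alert and
        # do not change its suppressed flag.
        alert["event_count"] += 1
        return alert

    def close(self, key: str) -> None:
        self.open_alerts.pop(key, None)


store = AlertStore()
event = {"dedup_key": "disk-full@dev-db-1", "hostname": "dev-db-1"}

first = store.ingest(event)        # new alert, suppressed by the policy
store.policy_enabled = False       # the policy is removed later
second = store.ingest(event)       # deduplicated into the suppressed alert
store.close(event["dedup_key"])    # the problem is resolved
third = store.ingest(event)        # a future alert is not suppressed
```

In this model, the second event still lands in a suppressed alert because the original alert was never closed, while the alert created after the close is no longer suppressed.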