How alerts work

Alerting functions examine the attributes, capacity, and performance of resources. If the conditions that are defined for an alert are met, the actions that are specified for the alert are taken. Typically, the actions include sending a notification. For example, if the status of a SAN Volume Controller storage system changes to Error, an alert is displayed on the Alerts page in the GUI, and an email might be sent to a storage administrator. For switches, fabrics, hosts, and virtual machines, alerting examines asset, configuration, status, and performance information.

Triggering conditions for alerts

The conditions that trigger alert notifications depend on the type of resource that you are monitoring. In general, the following types of conditions can trigger alerts:
  • An attribute or configuration of a resource changed
  • The capacity of a resource fell outside a specified range
  • The performance of a resource fell outside a specified range
  • The storage infrastructure changed, for example a resource was added or removed
  • Data is not being collected for a resource

Learn more about the conditions that can trigger alerts for each type of resource.

Note:
  • The condition alert for a storage system is suppressed when a component status alert is generated to avoid duplicate alerts for the same triggering condition. However, the overall health of the storage system is still determined based on the status of its internal components.
  • In the modern UI, alerts shown on the overall Alerts page are grouped when they share the same component, storage system, alert condition, alert name, and severity. Each group appears as a single row to simplify scanning. Acknowledging or removing a group also acknowledges or removes all alerts within it. For example, performance monitor error alerts generated at different times for the same storage system are grouped into one row on the overall Alerts page. Grouping applies only to the alerts displayed on the current page of results. For example, if you set the page size to 100 items, only similar alerts within those 100 entries are grouped.
  • A blue marker appears on an alert group when it contains alerts from the last 7 days that you have not yet viewed. The marker clears after you expand and collapse the group and reappears only when a new alert is added to that group.
  • The paginated count reflects the actual number of alerts, not the number of rows shown on the Alerts page. For example, if the page size is set to 50 entries, up to 50 alerts are displayed, but not necessarily 50 rows, because similar alerts are grouped into a single row.
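
The grouping and counting behavior described in the notes above can be illustrated with a minimal Python sketch. It is not the product's implementation: the `Alert` class, the `group_page` function, and the sample values are hypothetical names chosen for illustration, and the only behavior taken from the text is that alerts on the same page are grouped when they share component, storage system, alert condition, alert name, and severity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical record; the first five fields form the grouping key.
    component: str
    storage_system: str
    condition: str
    name: str
    severity: str
    timestamp: str


def group_page(alerts, page_size, page=0):
    """Return the alerts on one page and the grouped rows built from them."""
    start = page * page_size
    page_alerts = alerts[start:start + page_size]
    groups = {}  # dicts keep insertion order, so rows follow first appearance
    for alert in page_alerts:
        key = (alert.component, alert.storage_system, alert.condition,
               alert.name, alert.severity)
        groups.setdefault(key, []).append(alert)
    return page_alerts, list(groups.values())


alerts = [
    Alert("Performance monitor", "svc-prod-1", "Performance monitor error",
          "Monitor failed", "Critical", "2024-05-01T10:00"),
    Alert("Performance monitor", "svc-prod-1", "Performance monitor error",
          "Monitor failed", "Critical", "2024-05-01T11:00"),
    Alert("Pool", "svc-prod-1", "Used capacity >= 80%",
          "Pool capacity", "Warning", "2024-05-01T11:30"),
    Alert("Performance monitor", "svc-prod-1", "Performance monitor error",
          "Monitor failed", "Critical", "2024-05-01T12:00"),
]

page_alerts, rows = group_page(alerts, page_size=3)
print(len(page_alerts), "alerts shown in", len(rows), "rows")
# prints: 3 alerts shown in 2 rows
```

With a page size of 3, the fourth alert falls on the next page, so it is not merged into the matching group on the first page even though it shares all five attributes.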

Manage maintenance

You can mark or schedule a storage system for maintenance through the modern UI. During active maintenance, alerts are suppressed, except for ransomware and workload anomaly alerts, which continue to trigger. For storage systems that are scheduled for maintenance but whose maintenance window has not yet started, all alerts remain active. When maintenance ends or is manually exited, alerts resume based on the current alert definitions.
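
As a rough illustration of the suppression rule above, the following Python sketch decides whether an alert triggers at a given time. The `MaintenanceWindow` class, the `alert_triggers` function, and the category labels are assumptions made for this example; they do not correspond to a documented API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical labels for the two alert types that are never suppressed.
EXEMPT_CATEGORIES = {"ransomware", "workload_anomaly"}


@dataclass
class MaintenanceWindow:
    start: datetime
    end: datetime

    def is_active(self, now: datetime) -> bool:
        # A scheduled window that has not started yet is not active.
        return self.start <= now < self.end


def alert_triggers(category: str, now: datetime,
                   window: Optional[MaintenanceWindow]) -> bool:
    """Return True if an alert of this category triggers at time `now`."""
    if window is None or not window.is_active(now):
        return True  # no maintenance, or scheduled but not yet started
    return category in EXEMPT_CATEGORIES


window = MaintenanceWindow(datetime(2024, 6, 1, 22, 0),
                           datetime(2024, 6, 2, 2, 0))
before = datetime(2024, 6, 1, 21, 0)
during = datetime(2024, 6, 1, 23, 0)

print(alert_triggers("capacity", before, window))    # True
print(alert_triggers("capacity", during, window))    # False
print(alert_triggers("ransomware", during, window))  # True
```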

Only users with the Admin role can mark a storage system for maintenance. Users with the Monitor role can view the maintenance status of storage systems. To mark a storage system for maintenance in the modern UI, click Main menu, expand Inventory, and select Storage systems. Then click the vertical dotted menu at the end of the row for the storage system, and select Manage maintenance.

Options available when you click Manage maintenance:
  1. Start
    • Available only if the selected storage system is not already in maintenance mode.
    • Opens a confirmation modal to immediately mark the storage system for maintenance.
  2. Schedule
    • Available only if the selected storage system is not already in maintenance mode and not scheduled for maintenance.
    • Opens a modal where you can specify the start and end time for the maintenance window.
  3. Stop
    • Available when the selected storage system is already in maintenance mode.
    • Stops maintenance mode immediately.
  4. Exit maintenance
    • Available when the selected storage system is scheduled for maintenance.
    • Exits the maintenance schedule immediately.

Multiple selection behavior

You can select multiple storage systems to start, schedule, stop, or exit maintenance mode:
  1. If none of the selected systems is in maintenance mode or scheduled for maintenance, the Start maintenance and Schedule maintenance options appear.
  2. If all the selected systems are in maintenance mode, the Stop maintenance option appears.
  3. If all selected systems are scheduled for maintenance, the Start maintenance and Exit maintenance options appear.
  4. If a mixed set (some in, some not in maintenance) is selected, no maintenance options are shown.
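
The four rules above amount to a lookup on the shared state of the selection. The following Python sketch shows one way to express that. The `State` enum and the `available_options` function are hypothetical, and the sketch assumes that any selection whose systems are not all in the same state counts as mixed, which the text states explicitly only for systems in and not in maintenance.

```python
from enum import Enum


class State(Enum):
    # Hypothetical state names for a storage system.
    NORMAL = "not in maintenance and not scheduled"
    IN_MAINTENANCE = "in maintenance mode"
    SCHEDULED = "scheduled for maintenance"


OPTIONS = {
    State.NORMAL: ["Start maintenance", "Schedule maintenance"],
    State.IN_MAINTENANCE: ["Stop maintenance"],
    State.SCHEDULED: ["Start maintenance", "Exit maintenance"],
}


def available_options(states):
    """Return the menu options for a selection of storage system states."""
    unique = set(states)
    if len(unique) != 1:
        return []  # empty or mixed selection: no maintenance options
    return list(OPTIONS[unique.pop()])


print(available_options([State.NORMAL, State.NORMAL]))
print(available_options([State.SCHEDULED, State.SCHEDULED]))
print(available_options([State.IN_MAINTENANCE, State.NORMAL]))
```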

Event processing

Conditions that generate alerts are detected when data is collected from storage systems and during event processing. By default, the metadata that is collected from storage systems is refreshed every 24 hours. For some storage systems such as IBM Storage Accelerate and the XIV, events are polled every minute from the resource. For IBM Storage Scale, status change events are polled frequently, typically within minutes. For other resources, events are subscription-based, where the resource itself or a data source such as a CIM agent sends the events to IBM Storage Insights Pro when conditions change on the resource.

Examples of storage systems that use subscription-based event processing include SAN Volume Controller, Storwize V7000, Storwize V7000 Unified, and IBM Storage FlashSystem V9000. For these storage systems, a probe is automatically run when many events are received from the storage system in a short time period. To avoid performance bottlenecks, these probes are run at most once every 20 minutes.
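
The 20-minute limit on event-driven probes can be pictured as a simple throttle. The sketch below is an illustration only: the `ProbeThrottle` class is a hypothetical name, and the number of events that counts as "many" is not specified in this topic, so the sketch models only the minimum interval between probes.

```python
from datetime import datetime, timedelta

MIN_PROBE_INTERVAL = timedelta(minutes=20)


class ProbeThrottle:
    """Allow an event-driven probe only if the last one is old enough."""

    def __init__(self, min_interval=MIN_PROBE_INTERVAL):
        self.min_interval = min_interval
        self.last_probe = None  # time of the most recent probe, if any

    def should_probe(self, now: datetime) -> bool:
        if self.last_probe is None or now - self.last_probe >= self.min_interval:
            self.last_probe = now
            return True
        return False  # too soon after the previous probe


throttle = ProbeThrottle()
t0 = datetime(2024, 6, 1, 12, 0)

print(throttle.should_probe(t0))                          # True
print(throttle.should_probe(t0 + timedelta(minutes=5)))   # False
print(throttle.should_probe(t0 + timedelta(minutes=20)))  # True
```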

Determining which type of alert to use

To determine whether to define alerts in alert policies, for individual resources, or for the set of resources that are included in an application or general group, follow these guidelines:

Alerts defined in alert policies: You want to manage alert conditions and notification settings for a group of resources of the same type. For example, if you have several SAN Volume Controller storage systems in your environment, you can create an alert policy so that the alert definitions are the same for all of the SAN Volume Controller systems.

If you have some SAN Volume Controller systems in a test environment, and some in a production environment, you can use one alert policy for the test environment, and another for the production environment.

Resource alerts: You want to receive alert notifications about changes for a specific resource, or its internal resources. For example, for a storage system, you can alert on the attributes of the system itself, and on the attributes of its volumes, pools, ports, and other internal resources.

If you define an alert for a resource, for example, a performance alert for the ports on a storage system, the alert threshold value applies to all of the ports on the storage system. You cannot apply different alert thresholds to internal resources of the same type on a resource.

Application alerts: Use application alerts in the following scenarios:
  • You want to receive alert notifications for all the resources of a certain type in an application. For example, if your application uses multiple storage systems, you can define the storage system alerts once for the application and the alerts apply to all the storage systems. If you later add more storage systems to the application, the existing application alerts apply to those storage systems also.
  • You want to apply different thresholds to internal resources of the same type on a storage system. For example, you have production applications and test applications that use volumes on a SAN Volume Controller. The production applications require response times of 6 milliseconds or less while the test applications can tolerate response times up to 30 milliseconds. You can use application alerts to set separate response time thresholds for volumes used by the different applications, depending on the needs of that application.

General group alerts: Use general group alerts in the following scenarios:
  • You want to receive alert notifications about changes for a subset of the resources of a particular type. For example, you can detect when the ports that are used for replication on your SAN Volume Controller have insufficient buffer-to-buffer credit. Alert notifications are not generated for ports that are not used for replication.
  • You want to receive alert notifications about changes for a group of resources that are logically related. You can group all the storage systems at a specific location or all the servers that use a particular operating system. For example, you can receive alert notifications when the used capacity of any of your Linux® servers exceeds 80%.

Tip: If a resource is in both an alert policy and a general group, the alert definitions for both the policy and the group are applied.
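
To make the application alerts scenario with separate response time thresholds more concrete, here is a minimal Python sketch. The volume names, the `THRESHOLDS_MS` mapping, and the `violating_volumes` function are hypothetical. Only the 6 millisecond and 30 millisecond limits come from the example in this topic.

```python
# Response time limits in milliseconds, per application type (from the example).
THRESHOLDS_MS = {"production": 6.0, "test": 30.0}


def violating_volumes(volumes, thresholds=THRESHOLDS_MS):
    """Return names of volumes whose response time exceeds their app's limit.

    `volumes` is an iterable of (volume_name, application, response_time_ms).
    """
    return [name for name, app, response_ms in volumes
            if response_ms > thresholds[app]]


# Hypothetical volumes on the same storage system, used by different applications.
volumes = [
    ("vol_db01", "production", 7.5),
    ("vol_db02", "production", 4.2),
    ("vol_qa01", "test", 12.0),
    ("vol_qa02", "test", 31.0),
]

print(violating_volumes(volumes))
# prints: ['vol_db01', 'vol_qa02']
```

A single resource alert could not express this, because one threshold would apply to every volume on the storage system.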