Planning an installation of IBM Cloud Pak for AIOps on Linux
Learn about the system requirements for an installation of IBM Cloud Pak for AIOps on Linux.
Licensing
License usage tracking is required. The `aiopsctl` tool deploys the IBM Cloud Pak foundational services License Service on your Linux cluster. This background service collects and stores license usage information for tracking license consumption and for audit purposes. For more information, see Licensing.
Hardware requirements
IBM Cloud Pak for AIOps can be installed on Red Hat® Enterprise Linux® 9 only. The hardware architecture for installing IBM Cloud Pak for AIOps must be AMD64.
For production-sized deployments, multiple Linux nodes are required, and IBM Cloud Pak for AIOps is installed on this cluster of nodes. The control plane node coordinates the running of IBM Cloud Pak for AIOps across the other nodes, and is the entry point into the product. Worker nodes provide more compute resources to run IBM Cloud Pak for AIOps services. The cluster must be reserved for the sole use of IBM Cloud Pak for AIOps.
You must ensure that the clocks on your Linux cluster are synchronized. Each node in the cluster must have access to an NTP server to synchronize its clock. Discrepancies between the clocks on the nodes can cause IBM Cloud Pak for AIOps to experience operational issues.
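As a quick spot check, the following sketch reports whether a node's clock is NTP-synchronized. It assumes a systemd-based host with `timedatectl` available and falls back gracefully if the command is missing; the variable name is illustrative.

```shell
# Report whether this node's clock is NTP-synchronized.
# Assumes a systemd host; falls back gracefully if timedatectl is unavailable.
if command -v timedatectl >/dev/null 2>&1; then
  sync_state=$(timedatectl show -p NTPSynchronized --value 2>/dev/null || echo unknown)
else
  sync_state=unknown
fi
echo "NTP synchronized: ${sync_state}"
```

Run the check on every node in the cluster. If chrony or ntpd is in use, `chronyc tracking` or `ntpq -p` provides more detail about the configured time sources.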
A production-sized base deployment of IBM Cloud Pak for AIOps on a Linux cluster requires the following hardware:
Resource | Production deployment |
---|---|
Node count | 6 |
Total vCPU | 130 |
Total memory (GB) | 310 |
Processing abilities
Expand the following sections to find out about the processing abilities of a production-sized deployment of IBM Cloud Pak for AIOps.
Supported resource number and throughput rates for starter and production deployments
The following table details the number of records, events, Key Performance Indicators (KPIs), and resources that can be processed by a production-sized deployment of IBM Cloud Pak for AIOps. This includes resource and throughput values for the AI algorithms.
Component | Resource | Production-sized deployment |
---|---|---|
Metric anomaly detection | Maximum throughput (KPIs) for all metric integrations | 120,000 |
Events (through Netcool integration) | Steady-state event throughput per second | 150 |
Events (through Netcool integration) | Burst event throughput per second | 250 |
Automation runbooks | Fully automated runbooks run per second | 2 |
Topology management | Maximum number of topology resources | 5,000,000 |
UI users | Active users supported | 20 |
Standing alert count | Number of stored alerts | 200,000 |
Notes:
- Event rates in the preceding table assume a deduplication rate of 10 to 1 (10% unique events). For example, a rate of 100 alerts per second sent to IBM Cloud Pak for AIOps can be the end result of an initial 1,000 alerts per second before deduplication and other filtering is applied.
- For metric anomaly detection, the number of key performance indicators (KPIs) that can be processed for each deployment size is shown, for an aggregation period of 5 minutes and a training period of 4 weeks.
- If you are using additional integrations for metric anomaly detection with IBM Cloud Pak for AIOps, you can use the default available policies to further refine the volume of data that is routed for issue resolution lifecycle actions by your users. You can also create custom policies tailored to your environment. For instance, custom suppression policies can help determine which anomalies are raised as alerts for user action. For more information about custom policies, see Suppress alerts.
- The events (through Netcool integration) throughput rates represent a refined volume of alerts that corresponds to a worst-case scenario where the ratio of IBM Tivoli Netcool/OMNIbus events to IBM Cloud Pak for AIOps alerts has no deduplication, and is essentially a 1:1 mapping of events to alerts. However, in most production deployments, the correlation and deduplication on the IBM Tivoli Netcool/OMNIbus server side reduces the volume of alert data that requires processing within IBM Cloud Pak for AIOps. To further optimize the workload of data presented to IBM Cloud Pak for AIOps, additional IBM Tivoli Netcool/OMNIbus probe rules can filter out events of no interest to IBM Cloud Pak for AIOps. For instance, typical IBM Tivoli Netcool/OMNIbus maintenance events are filtered out because they are not relevant on the IBM Cloud Pak for AIOps side.
Important:
- If you are using the File observer for more than 600,000 resources, then additional resources are required. For more information, see Configuring the File observer.
- For 200,000 stored alerts, set `IR_UI_MAX_ALERT_FETCH_LIMIT` to a maximum value of 10,000 to avoid performance impacts. For more information, see Restricting the number of alerts returned by the data layer to the Alert Viewer.
Event, alert, and incident rates
IBM Cloud Pak for AIOps includes robust capabilities for managing events from your various applications, services, and devices. If you integrate IBM Cloud Pak for AIOps with IBM Tivoli Netcool/OMNIbus, the event management benefits increase significantly. This integration gives you end-to-end alert processing with an on-premises IBM Tivoli Netcool/OMNIbus server, so that you can complete part of the event and incident management lifecycle on the IBM Tivoli Netcool/OMNIbus server before events are processed and delivered for action in IBM Cloud Pak for AIOps.
By default, IBM Tivoli Netcool/OMNIbus policies and triggers, such as correlation and deduplication activities, pre-process event workloads, reducing the overall volume of active events on the IBM Tivoli Netcool/OMNIbus server. This reduced volume presents a refined event workload for subsequent processing within the overall incident resolution (IR) lifecycle. On the IBM Cloud Pak for AIOps side, automation policies run on the remaining events that flow from the IBM Tivoli Netcool/OMNIbus server. IBM Cloud Pak for AIOps applies additional suppression and grouping filters to minimize effort, executes runbooks to automatically resolve events where warranted, and promotes the remaining events to alerts and carefully refined incidents so that ITOps can act on the most critical concerns.
To help you understand the end-to-end event processing benefits of this deployment pattern in your environment, and where to invest in policies to optimize throughput and response time, review the following event management and impact scenarios:
- As a basic example, a small production IBM Tivoli Netcool/OMNIbus environment with an average incoming event rate of 50 events per second, and a correlation and deduplication ratio of 10:1 raw to correlated events (incidents), can result in a refined volume of 5 alerts per second being sent to IBM Cloud Pak for AIOps for subsequent processing. With a combination of default available issue resolution (IR) policies and analytics, the alerts can be further reduced (by 90% noise reduction) to less than 1 incident per second over time on the IBM Cloud Pak for AIOps side.
- As a second, larger example, a production IBM Tivoli Netcool/OMNIbus environment with an average event rate of 500 events per second (with the same correlation and deduplication ratio of 10:1) can in turn present a refined volume of 50 alerts per second to IBM Cloud Pak for AIOps. With the same combination of default available issue resolution (IR) policies and analytics, the alerts can be further reduced by 90% noise reduction, with a resultant 5 incidents per second raised in IBM Cloud Pak for AIOps. Additional issue resolution (IR) policies can be authored to further reduce and refine incident creation. By using other advanced capabilities within IBM Cloud Pak for AIOps, such as fully automated runbooks, the volume of actionable incidents that are presented for user interaction can be further reduced.
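The arithmetic in the second scenario can be sketched as follows. The numbers are taken directly from the example above and are illustrative only; real deduplication and noise-reduction rates vary by environment.

```shell
# Illustrative pipeline arithmetic from the 500 events/second scenario.
raw_eps=500        # average raw OMNIbus event rate (events/second)
dedup_ratio=10     # 10:1 correlation and deduplication on the OMNIbus side
noise_pct=90       # noise reduction from default IR policies and analytics

alerts_eps=$(( raw_eps / dedup_ratio ))                    # alerts/second sent to AIOps
incidents_eps=$(( alerts_eps * (100 - noise_pct) / 100 ))  # incidents/second raised
echo "alerts/s=${alerts_eps} incidents/s=${incidents_eps}"
# prints: alerts/s=50 incidents/s=5
```

Plugging in your own measured event rate and deduplication ratio gives a rough sizing estimate for the alert and incident volumes that IBM Cloud Pak for AIOps must sustain.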
Storage requirements
You must have local storage configured for your cluster, and it must be distributed across the nodes in your cluster. IBM Cloud Pak for AIOps requires 3000 GB of persistent storage.
If you are installing in an air-gapped environment (offline), you must also ensure that you have adequate space to download the IBM Cloud Pak for AIOps images to the target registry in your offline environment. The IBM Cloud Pak for AIOps images total 118 GB.
IOPS requirements
The following table identifies the input/output operations per second (IOPS) per node that must be supported. Ensure that your hardware can support the expected IOPS per node. A minimum configuration of three nodes for the storage cluster is needed. Each node of the storage solution requires a minimum of one disk (SSD or high-performance storage array). The results for your storage can vary depending on your exact usage, data sets, hardware, storage solution, and more.
Deployment size | Minimum IOPS | Recommended IOPS |
---|---|---|
Production deployment | 10,000 | 20,000 |
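One way to sanity-check a node's random-I/O capability is a short `fio` run. The following sketch assumes `fio` is installed; the target path, file size, and runtime are illustrative values to adjust for a realistic test, not prescriptive settings.

```shell
# Rough random-I/O check with fio (illustrative parameters; tune size and
# runtime, and point FIO_TARGET at the storage path used by the cluster).
FIO_TARGET=${FIO_TARGET:-/tmp/fio-iops-test}
if command -v fio >/dev/null 2>&1; then
  fio --name=iops-check --filename="$FIO_TARGET" --size=256M \
      --rw=randrw --bs=4k --direct=1 --numjobs=4 \
      --runtime=10 --time_based --group_reporting
else
  echo "fio is not installed; install it to run this check"
fi
rm -f "$FIO_TARGET"   # clean up the test file
```

Compare the read and write IOPS that fio reports against the minimum and recommended values in the preceding table.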
Network requirements
The control plane node requires the following access:
Port number | Direction | Protocol | Description |
---|---|---|---|
80 | Inbound | TCP | Application HTTP port |
8443 | Inbound | TCP | Application HTTPS port |
6443 | Inbound from cluster worker nodes | TCP | Control plane server API (HTTPS) |
Worker nodes require the following access:
Port number | Direction | Protocol | Description |
---|---|---|---|
8472 | Inbound from other cluster nodes | UDP | Virtual network |
5001 | Inbound from other cluster nodes | TCP | Distributed registry |
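From a worker node, you can spot-check that the control plane API port is reachable before installing. In this sketch, `CONTROL_PLANE_HOST` is a placeholder for your own control plane hostname, and `nc` (netcat) is assumed to be available.

```shell
# Spot-check TCP reachability of the control plane API port from a worker node.
# CONTROL_PLANE_HOST is a placeholder; replace it with your control plane node.
CONTROL_PLANE_HOST=${CONTROL_PLANE_HOST:-control-plane.example.com}
if command -v nc >/dev/null 2>&1 && nc -z -w 5 "$CONTROL_PLANE_HOST" 6443 2>/dev/null; then
  api_reachable=yes
else
  api_reachable=no
fi
echo "Port 6443 on ${CONTROL_PLANE_HOST}: reachable=${api_reachable}"
```

The same pattern applies to the worker-node ports in the table above (substitute the port number and target host).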
Some Linux distributions require extra firewall rules to avoid potential conflicts or restrictions.
- Check whether the firewall is enabled:

      systemctl status firewalld

  If the firewall is enabled, the command output provides the status of the firewall, including any active rules. If the firewall is disabled, the command outputs the message `No such file or directory`.
- If the firewall is enabled, run the following commands:

      firewall-cmd --permanent --add-port=6443/tcp   # apiserver
      firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16   # pods
      firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16   # services
      firewall-cmd --reload
- If you are permitting the collection of usage data, ensure that outbound traffic to https://api.segment.io is allowed. For more information, see Updating usage data collection preferences.
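A quick way to confirm outbound access is the following sketch; it assumes `curl` is available and that the host has network access, and the check itself sends no usage data.

```shell
# Check outbound HTTPS reachability to api.segment.io (requires network access).
if command -v curl >/dev/null 2>&1 \
   && curl -sS --max-time 10 -o /dev/null https://api.segment.io/ 2>/dev/null; then
  segment_reachable=yes
else
  segment_reachable=no
fi
echo "api.segment.io reachable: ${segment_reachable}"
```

If the endpoint is not reachable, review your proxy and egress firewall rules, or disable usage data collection as described in Updating usage data collection preferences.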