Monitoring the platform
From the IBM Cloud Pak for Data web client, you can monitor the services that are running on the platform, understand how you are using cluster resources, and be aware of issues as they arise. You can also set quotas on the platform, individual services, and on projects to help mitigate unexpected spikes in resource use.
Accessing the Monitoring page
- Required permissions: To access the Monitoring page, you must have one of the following permissions:
  - Administer platform
  - Manage platform health
  - View platform health
- Log in to the Cloud Pak for Data web client.
- From the navigation menu, select Monitoring.
- See the current resource use (vCPU and memory) for the platform
If you click the View status and use data arrow on the Platform resource overview card, you can see a breakdown by services, service instances, tool runtimes, data planes, physical locations, pods, and projects.
- Review the platform resource use for the last 12 hours
If you click the View historical data arrow on the Platform resource use card, you can see a breakdown by services, service instances, tool runtimes, data planes, physical locations, pods, and projects. You can also view historical data beyond 12 hours. By default, the platform stores up to 30 days of data. However, you can adjust the length of time that data is retained. For details, see Changing the retention period for IBM Cloud Pak for Data monitoring data.
Platform resource use information includes pods from physical locations, if there are physical locations associated with the Cloud Pak for Data instance.
- Access at-a-glance platform monitoring
- View events and alerts
- Configure and enforce quotas
At-a-glance platform monitoring
Each card on the Monitoring page summarizes the status of one resource type. Click a card to open a detailed table for that resource type. In most tables, you can configure which columns are displayed, and you can select an individual item to drill down to more detail.

| Card | Description |
|---|---|
| Services | Services are software that is installed on the platform. Services consume resources as part of their regular operations. |
| Service instances | Some services can be deployed multiple times after they are installed. Each deployment is called a service instance. Service instances consume resources as part of their normal operations. |
| Tool runtimes | Runtime environments specify the hardware and software configurations for analytical assets and jobs. Environments consume resources as part of their regular operations. You can select an environment to see the pods that are associated with it, and optionally stop a runtime instance. This card is displayed only if you install a service that uses environments. |
| Pods | Services are composed of Kubernetes pods. If a pod has failed or is in an unknown state, it can impact the health of the service. If a pod is pending, the service might not be able to process specific requests until the pod is running. |
| Data planes | A data plane is a logical grouping of one or more physical locations. You can deploy workloads to a data plane; each workload is scheduled on one of the physical locations that are associated with the data plane. |
| Projects | Projects are collaborative workspaces where you work with data and other assets to accomplish a particular goal. This card is displayed only if you install a service that uses the Cloud Pak for Data common core services. |
| Physical locations | A remote physical location is processing infrastructure on a remote cluster. When you set up a remote physical location, you install Cloud Pak for Data agents on the remote cluster, register the physical location with the instance of Cloud Pak for Data that you want to expand, and then add the physical location to a data plane. You can optionally add the same remote physical location to multiple data planes. |
Events and alerts
An alert is triggered by an event or a series of events. The severity of an event indicates whether an issue occurred or whether there is a potential issue. The Alerts card shows:
- The number of critical alerts
- The number of warning alerts
If you click either of these entries, you see a filtered list of alerts or events based on the entry that you selected.
If you click the View all events and alerts arrow on the Alerts card, you can see a complete list of events.
You can optionally customize the events that trigger alerts. For details, see Monitoring and alerting in Cloud Pak for Data.
Setting and enforcing quotas
A quota specifies the maximum amount of memory and vCPU that you want the platform, a specific service, or a project to use. A quota is a target against which you can measure your actual memory and vCPU use, and a benchmark that lets you know when your use is approaching or surpassing that target.
Scaling impacts the overall capacity of a service by adjusting the number of pods in the service. (You can also scale the Cloud Pak for Data control plane.) When you scale a service up, the service becomes more resilient. Additionally, the service might have increased parallel processing capacity.
Setting a quota on a service does not change the scale. Scale and quota are independent settings.
In addition to setting a quota, you can optionally enable quota enforcement. When you enforce quotas, new pods cannot be created if the pods would push your resource use above your quota.
The behavior of the quota enforcement feature depends on whether you set your quotas on pod requests or limits. (For an in-depth explanation of requests and limits, see Managing Resources for Containers in the Kubernetes documentation.)
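To make the distinction concrete, here is an illustrative Kubernetes container spec that declares both requests and limits. This is a generic Kubernetes example, not a Cloud Pak for Data manifest; the names are invented for illustration.

```yaml
# Illustrative Kubernetes pod spec (names are examples only).
# The scheduler reserves the "requests" amounts when placing the pod.
# The "limits" amounts are hard caps: CPU use beyond the limit is
# throttled, and memory use beyond the limit terminates the container.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app
    image: example/app:latest
    resources:
      requests:
        cpu: "500m"      # 0.5 vCPU expected during normal operation
        memory: "512Mi"
      limits:
        cpu: "1"         # absolute maximum of 1 vCPU
        memory: "1Gi"    # container is terminated above 1 GiB
```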
- Enforcing quotas on pod requests
- A request is the amount of vCPU or memory that the pod expects to use as part of its normal operations. When you set quotas on pod requests, you have more flexibility in how your resources are allocated:
- If you enforce platform quotas, the control plane and any services that are running on this instance of Cloud Pak for Data are prevented from creating new pods if the requests in the new pod would push the platform over either the platform memory quota or the vCPU quota. These pods remain in the pending state until there are sufficient resources available. However, the existing pods can use more memory or vCPU than the platform quota.
- If you enforce a service quota, the service is prevented from creating new pods if the requests in the new pod would push the service over either the memory quota or the vCPU quota. These pods remain in the pending state until there are sufficient resources available. However, the existing pods can use more memory or vCPU than the service quota.
- If you enforce a project quota, the project is prevented from creating new pods if the requests in the new pods would push the project over either the memory quota or the vCPU quota. The pods remain in the pending state until there are sufficient resources available. However, the existing pods can use more memory or vCPU than the project quota.
- Enforcing quotas on pod limits
- A limit is the absolute maximum amount of vCPU or memory that the pod can use. If the pod tries to consume additional resources, the pod is terminated. In most cases, the requested resources (the requests) are less than the limits. When you set quotas on pod limits, you have more control over your resources:
- If you enforce platform quotas, the control plane and any services that are running on this instance of Cloud Pak for Data are prevented from creating new pods if the limits in the new pods would push the platform over either the platform memory quota or the vCPU quota. These pods remain in the pending state until there are sufficient resources available. When you enforce platform quotas on pod limits, the quota is a cap on the total resources that existing pods can use.
- If you enforce a service quota, the service is prevented from creating new pods if the limits in the new pod would push the service over either the memory quota or the vCPU quota. These pods remain in the pending state until there are sufficient resources available. When you enforce service quotas on pod limits, the quota is a cap on the total resources that the existing pods can use.
- If you enforce a project quota, the project is prevented from creating new pods if the limits in the new pod would push the project over either the memory quota or the vCPU quota. These pods remain in the pending state until there are sufficient resources available. When you enforce project quotas on pod limits, the quota is a cap on the total resources that the existing pods can use.
If you don't enforce quotas, the quota has no impact on the behavior of the platform or services. If you approach or surpass your quota settings, it's up to you to decide whether to let processes continue consuming resources or to stop processes to release resources.
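The enforcement behavior described above can be sketched as a simple admission check. This is an illustrative model only, not the actual Cloud Pak for Data implementation; the function and field names are invented for this example.

```python
# Illustrative sketch of quota enforcement, measured against either pod
# "requests" or pod "limits". Not the actual Cloud Pak for Data logic.

def can_create_pod(new_pod, existing_pods, quota, basis="requests"):
    """Return True if creating new_pod keeps total use within the quota.

    basis: "requests" (flexible: existing pods may still exceed the quota)
           or "limits" (strict: the quota caps total resources pods can use).
    Each pod is a dict like {"requests": {"cpu": 0.5, "memory": 512},
                             "limits": {"cpu": 1.0, "memory": 1024}}.
    quota is a dict like {"cpu": 4.0, "memory": 8192} (vCPU, MiB).
    """
    for resource in ("cpu", "memory"):
        current = sum(pod[basis][resource] for pod in existing_pods)
        if current + new_pod[basis][resource] > quota[resource]:
            return False  # pod would stay pending until resources are freed
    return True

# Example: a 5 vCPU / 10240 MiB quota, two pods already running.
quota = {"cpu": 5.0, "memory": 10240}
running = [{"requests": {"cpu": 1.5, "memory": 3072},
            "limits": {"cpu": 2.0, "memory": 4096}}] * 2
new_pod = {"requests": {"cpu": 1.5, "memory": 3072},
           "limits": {"cpu": 2.0, "memory": 4096}}

print(can_create_pod(new_pod, running, quota, basis="requests"))  # True: 4.5 <= 5.0
print(can_create_pod(new_pod, running, quota, basis="limits"))    # False: 6.0 > 5.0
```

The same pod is admitted under a requests-based quota but rejected under a limits-based quota, which mirrors the trade-off above: requests give you flexibility, limits give you a hard cap.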
Setting the platform quota
To set the platform quota:
- From the Monitoring page, click the Edit platform quotas icon on the Platform quotas card.
- Select Monitor platform resource use against your target use.
- Specify whether you want to set quotas on pod Requests or Limits.
- Specify your vCPU quota. This is the target maximum amount of vCPU you want the platform to use.
- Specify your vCPU alert threshold. When you reach the specified percent of vCPU in use, the platform alerts you based on your alert settings.
- Specify your Memory quota. This is the target maximum amount of memory you want the platform to use.
- Specify your Memory alert threshold. When you reach the specified percent of memory in use, the platform alerts you.
- If you want to automatically enforce the platform quota settings, select Enforce quotas.
- Click Save.
Setting service quotas
To set service quotas:
- On the Monitoring page, click Services on the Quotas card.
- Locate the service for which you want to edit the quota, and click the Edit icon.
- Select Monitor service resource use against your target use.
- Specify whether you want to set quotas on pod Requests or Limits.
- Specify your vCPU quota. This is the target maximum amount of vCPU you want the service to use.
- Specify your vCPU alert threshold. When you reach the specified percent of vCPU in use, the platform alerts you based on your alert settings.
- Specify your Memory quota. This is the target maximum amount of memory you want the service to use.
- Specify your Memory alert threshold. When you reach the specified percent of memory in use, the platform alerts you.
- If you want to automatically enforce the service quota settings, select Enforce quotas.
- Click Save.
Setting project quotas
To set project quotas:
- On the Monitoring page, click Projects on the Quotas card.
- Locate the project for which you want to edit the quota, and click the Edit icon.
- Select Monitor project resource use against your target use.
- Specify whether you want to set quotas on pod Requests or Limits.
- Specify your vCPU quota. This is the target maximum amount of vCPU you want the project to use.
- Specify your vCPU alert threshold. When you reach the specified percent of vCPU in use, the platform alerts you based on your alert settings.
- Specify your Memory quota. This is the target maximum amount of memory you want the project to use.
- Specify your Memory alert threshold. When you reach the specified percent of memory in use, the platform alerts you.
- If you want to automatically enforce the project quota settings, select Enforce quotas.
- Click Save.
Setting data plane quotas
To set data plane quotas:
- On the Monitoring page, click Data planes on the Quotas card.
- Locate the data plane for which you want to edit the quota, and click the Edit icon.
- Select Monitor data plane resource use against your target use.
- Specify whether you want to set quotas on pod Requests or Limits.
- Specify your vCPU quota. This is the target maximum amount of vCPU you want the data plane to use.
- Specify your vCPU alert threshold. When you reach the specified percent of vCPU in use, the platform alerts you based on your alert settings.
- Specify your Memory quota. This is the target maximum amount of memory you want the data plane to use.
- Specify your Memory alert threshold. When you reach the specified percent of memory in use, the platform alerts you.
- Click Save.
Setting physical location quotas
To set physical location quotas:
- On the Monitoring page, click Physical locations on the Quotas card.
- Locate the physical location for which you want to edit the quota, and click the Edit icon.
- Select Monitor physical location resource use against your target use.
- Specify whether you want to set quotas on pod Requests or Limits.
- Specify your vCPU quota. This is the target maximum amount of vCPU you want the physical location to use.
- Specify your vCPU alert threshold. When you reach the specified percent of vCPU in use, the platform alerts you based on your alert settings.
- Specify your Memory quota. This is the target maximum amount of memory you want the physical location to use.
- Specify your Memory alert threshold. When you reach the specified percent of memory in use, the platform alerts you.
- Click Save.