Monitoring service for foundational services versions 3.8 and later

The IBM Cloud Pak® foundational services monitoring service is built on top of the Prometheus stack. It provides a preconfigured, self-updating monitoring service for clusters and applications.

This topic covers information about how to use the monitoring service in foundational services versions 3.8 and later. For monitoring service in foundational services versions 3.7 and prior, see Monitoring service for foundational services versions 3.7 and prior.

Features

Metrics visualization

Grafana is installed to query and visualize your metrics. Some built-in dashboards for cluster metrics visualization are created by default. You can also create your own custom dashboards.

Multi-tenancy

Monitoring provides Kubernetes namespace-level isolation. A Grafana organization is created automatically for each Kubernetes namespace. Users can access only the dashboards and metrics for the namespaces to which they have access in Red Hat OpenShift Container Platform.

Alerts

Alerts can be triggered automatically and sent to third-party applications such as Slack and PagerDuty.

Customization

Adopters and users can easily integrate with the monitoring service to query and visualize their application metrics, and to create alerts.

Operators

ibm-monitoring-grafana-operator

Installs Grafana.

IBM Cloud Pak foundational services Monitoring is a single-instance service. Therefore, only one instance of the Grafana pod runs in your cluster.

Red Hat OpenShift Container Platform monitoring

Important: If you are upgrading foundational services from version 3.7 or prior to version 3.8 or later, consider these scenarios:

Foundational services monitoring installs only Grafana, and configures Prometheus as the datasource for Red Hat® OpenShift® Container Platform monitoring.

Accessing the monitoring dashboard

  1. Log in to the IBM Cloud Pak foundational services console.

    Note: When you log in to the console, you have administrative access to Grafana. Do not create more users within the Grafana dashboard or modify the existing users or org.

  2. To access the Grafana dashboard, click Menu > Monitor Health > Monitoring.

    Alternatively, you can open https://<IP_address>:<port>/grafana, where <IP_address> is the DNS name or IP address that is used to access the console, and <port> is the port that is used to access the console.

    Note: If you are logged in as a Cluster Administrator, you can also access the Monitoring dashboard from the Administration panel dashboard. This dashboard gives Cluster Administrators an overview of their clusters, including key metrics for various services and components, and links to other dashboards, pages, and consoles for administering those services and components. To open the Grafana dashboard, click the Monitoring link on the Welcome widget. To open the Administration panel, click Home in the main navigation menu. Only Cluster Administrators can access the Administration panel dashboard.

  3. The following default Grafana dashboards are created in the Grafana main-org. To view them, a user must first be granted access to the ibm-common-services namespace.

    • Namespaces Performance IBM Provided 2.5
      Provides information about namespace performance and status metrics.

    • Performance IBM Provided 2.5
      Provides TCP system performance information about Nodes, Memory, and Containers.

    • Kubernetes Cluster Monitoring
      Monitors Kubernetes clusters that use Prometheus. Provides information about cluster CPU, Memory, and Filesystem usage. The dashboard also provides statistics for individual pods, containers, and systemd services.

    • Kubernetes POD Overview
      Monitors pod metrics such as CPU, Memory, Network, pod status, and restarts.

    • NGINX Ingress controller
      Provides information about NGINX Ingress controller metrics that can be sorted by namespace, controller class, controller, and ingress.

    • Node Performance Summary
      Provides information about system performance metrics such as CPU, Memory, Disk, and Network for all nodes in the cluster.

    • Prometheus Stats
      Dashboard for monitoring Prometheus v2.x.x.

Role-based access control (RBAC)

RBAC for monitoring API

A user with the ClusterAdministrator, Administrator, or Operator role can access the monitoring service. A user with the ClusterAdministrator or Administrator role can perform write operations in the monitoring service, including deleting Prometheus metrics data and updating Grafana configurations.

RBAC for monitoring data

Starting with version 1.2.0, the ibm-icpmonitoring Helm chart introduces an important feature. It offers a new module that provides role-based access controls (RBAC) for access to the Prometheus metrics data.

The RBAC module is effectively a proxy that sits in front of the Prometheus client pod. It examines the requests for authorization headers, and at that point, enforces role-based controls. The general RBAC rules are as follows.

A user with the ClusterAdministrator role can access any resource. A user with any other role can access data in only the namespaces for which that user is authorized.

If metrics data includes the kubernetes_namespace label, the data is treated as belonging to the namespace that is named by the value of that label. If metrics data has no such label, it is treated as system-level metrics. Only users with the ClusterAdministrator role can access system-level metrics.

In an IBM Multicloud Manager hub cluster environment, users can access metrics from managed clusters. A user with the ClusterAdministrator role can access data from all managed clusters. A user with any other role can access data from only the managed clusters whose related namespaces that user is authorized to access.
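
The single-cluster rules above can be sketched as a small decision function. This is only an illustration of the stated rules, not the RBAC module's actual implementation; the function name and its arguments are hypothetical, and managed-cluster access is not modeled.

```python
from typing import Mapping, Set

CLUSTER_ADMIN = "ClusterAdministrator"
NAMESPACE_LABEL = "kubernetes_namespace"


def can_read_metric(role: str,
                    authorized_namespaces: Set[str],
                    labels: Mapping[str, str]) -> bool:
    """Return True if a user may read a metric sample with the given labels."""
    # A ClusterAdministrator can access any resource.
    if role == CLUSTER_ADMIN:
        return True
    namespace = labels.get(NAMESPACE_LABEL)
    # No namespace label means a system-level metric: ClusterAdministrator only.
    if namespace is None:
        return False
    # Any other role sees only the namespaces that the user is authorized for.
    return namespace in authorized_namespaces
```

For example, an Operator who is authorized for the test namespace can read a sample labeled kubernetes_namespace="test", but not a sample that has no kubernetes_namespace label.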

RBAC for monitoring dashboards

Starting with version 1.5.0, the ibm-icpmonitoring Helm chart offers a new module that provides role-based access controls (RBAC) for access to the monitoring dashboards in Grafana.

In Grafana, users can belong to one or more organizations. Each organization contains its own settings for resources such as data sources and dashboards. For the Grafana running in your product, each namespace in your product has a corresponding organization with the same name. For example, if you create a new namespace that is named test in your product, an organization that is named test is generated in Grafana. If you delete the test namespace, the test organization is also removed. The only exception is the ibm-common-services namespace. The corresponding organization for ibm-common-services is the Grafana default of Main Org.

When you log in to your product, you can access a Grafana organization only if you are authorized to access the corresponding namespace. If you have access to more than one Grafana organization, use the Grafana console to switch to a different organization. The message UNAUTHORIZED is displayed when you do not have access to a Grafana organization.

Different users access Grafana organizations by using different organization roles. In the corresponding namespace, if you are assigned the role of ClusterAdministrator or Administrator, you have Admin access to the Grafana organization. Otherwise, you have Viewer access to the Grafana organization.
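
The namespace-to-organization and role-to-organization-role mappings described in this section can be summarized in a short sketch. The function names are hypothetical, and the exact display string of the default organization ("Main Org.") is an assumption; the mapping rules themselves come from the text above.

```python
# Roles that receive Admin access to the Grafana organization.
ADMIN_ROLES = {"ClusterAdministrator", "Administrator"}

# Assumption: exact display name of the Grafana default organization.
MAIN_ORG = "Main Org."


def grafana_org_for(namespace: str) -> str:
    """Return the Grafana organization that corresponds to a namespace."""
    # ibm-common-services is the only namespace that does not map by name.
    if namespace == "ibm-common-services":
        return MAIN_ORG
    return namespace


def grafana_role_for(namespace_role: str) -> str:
    """Return the Grafana organization role for a role held in the namespace."""
    return "Admin" if namespace_role in ADMIN_ROLES else "Viewer"
```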

When you access Grafana as a user of your product, a user with the same name is created in Grafana. If the user in your product is deleted, the corresponding user is not deleted from Grafana. The user account becomes stale. Run the following command to request the removal of stale users:

  curl -k -s -X POST -H "Authorization:$ACCESS_TOKEN" https://<Cluster Master Host>:<Cluster Master API Port>/grafana/check_stale_users
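
If you prefer to script this call, the same request can be sketched with the Python standard library. The helper names are hypothetical, and the Authorization header value is passed through exactly as in the curl command above. The insecure_context function mirrors curl's -k flag by skipping certificate verification, which is reasonable only for clusters that use self-signed certificates.

```python
import ssl
import urllib.request


def build_stale_user_request(host: str, port: int,
                             access_token: str) -> urllib.request.Request:
    """Build the POST request that asks for stale Grafana users to be removed."""
    url = f"https://{host}:{port}/grafana/check_stale_users"
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": access_token})


def insecure_context() -> ssl.SSLContext:
    """Return a TLS context that skips verification, like curl -k."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

To send the request, pass both objects to urllib.request.urlopen(request, context=insecure_context()).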

For information about Grafana APIs, see Accessing monitoring service APIs.

Note: Monitoring service does not provide RBAC support for Prometheus and Alertmanager alerts.

Installing monitoring service

Prerequisites

Installing IBM Cloud Pak foundational services

Complete the following steps to install IBM Cloud Pak foundational services. For more information, see Installing IBM Cloud Pak foundational services online.

  1. Create or edit the OperandRequest CR.

    The following example resembles a CR.

    apiVersion: operator.ibm.com/v1alpha1
    kind: OperandRequest
    metadata:
      name: common-service
      namespace: ibm-common-services
    spec:
      requests:
        - operands:
            - name: ibm-monitoring-grafana-operator
          registry: common-service
    
  2. Run the following command to check the status of your pods.

    oc get po -n ibm-common-services | grep monitoring
    

    Your output might resemble the following example, which shows that all pods are Running and all containers are available; for example, 4/4 for Grafana.

       ibm-monitoring-grafana-5b9bbdcd-495dg              4/4     Running     15         3d21h
       ibm-monitoring-grafana-operator-76bc8bbdc8-5vsns   1/1     Running     0          3d22h
    
    Note: Four containers are running in the Grafana pod.
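
The status check in step 2 can also be automated. The following sketch parses the text output of the oc get po command and lists pods that are not Running or whose READY count is incomplete. The function name is hypothetical, and the parser assumes the default column order (NAME, READY, STATUS) shown in the example output.

```python
from typing import List, Tuple


def not_ready_pods(oc_output: str) -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for every pod that is not fully ready."""
    problems = []
    for line in oc_output.splitlines():
        fields = line.split()
        # Skip blank lines and the optional NAME/READY/STATUS header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            problems.append((name, ready, status))
    return problems
```

An empty result means that every listed pod is Running with all of its containers available.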
    

Configuring monitoring service

You can configure the monitoring service by editing the Operand Deployment Lifecycle Manager (ODLM) OperandConfig CR. Following is an example of a default CR.

apiVersion: operator.ibm.com/v1alpha1
kind: OperandConfig
metadata:
  name: common-service
  namespace: ibm-common-services
spec:
  services:
    - name: ibm-monitoring-grafana-operator
      spec:
        grafana: {}

You can update the configuration parameters. For more information, see Configuring IBM Cloud Pak foundational services by using the CommonService custom resource.

Configuring applications to use the monitoring service

You can configure your applications in any namespace to expose metrics to the monitoring service.

  1. Create a Service object and add the following annotations to it.

    • prometheus.io/scrape: 'true'

      Optional.

    • prometheus.io/scheme: 'https'

      Optional. Use this parameter when TLS is enabled for your metrics endpoint. Prometheus is configured to skip certificate verification so you can use any certificate to secure your endpoint. For example, you can use Red Hat OpenShift Container Platform annotation, service.beta.openshift.io/serving-cert-secret-name or the IBM Certificate Manager service.

    • prometheus.io/path

      Optional. Use this parameter when the path of your metrics endpoint is not /metrics.

    • prometheus.io/port

      Optional. Use this parameter to specify the port for metrics.

    The following example illustrates annotations for metrics. It also illustrates how to create certificates by using the Red Hat OpenShift Container Platform service.beta.openshift.io/serving-cert-secret-name annotation.

    apiVersion: v1
    kind: Service
    metadata:
     name: prometheus-metrics-server-demo
     namespace: default
     labels:
       name: prometheus-metrics-server-demo
     annotations:
       ## Generate certificate secret which is used by metrics pod. Only works on OpenShift
       service.beta.openshift.io/serving-cert-secret-name: prometheus-metrics-server-demo
       ## enable cs monitoring metrics scrape
       prometheus.io/scrape: "true"
       ## it uses 8443 port which is https. 
       ## comment it out to use 8080 port
       prometheus.io/scheme: "https"
    spec:
     ports:
     - name: https
       port: 8443
       protocol: TCP
       targetPort: 8443
     - name: http
       port: 8080
       protocol: TCP
       targetPort: 8080
     selector:
       name: prometheus-metrics-server-demo
     type: ClusterIP
    

    You can choose to add the annotations to Pod objects instead of Service objects. However, Service objects are recommended because they support TLS.

  2. Create ServiceMonitor or PodMonitor CRs in the same namespace as your Service object. For more information, see the Prometheus design documentation.

    Following is an example of a ServiceMonitor CR.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: prometheus-metrics-server-demo
      namespace: default
    spec:
      selector:
        matchLabels:
          name: prometheus-metrics-server-demo
      endpoints:
      - scheme: https
        port: https
        tlsConfig:
          insecureSkipVerify: true
    

    Following is an example of a PodMonitor CR. PodMonitor CRs are not recommended because they do not support TLS.

      apiVersion: monitoring.coreos.com/v1
      kind: PodMonitor
      metadata:
        name: prometheus-metrics-server-demo
        namespace: default
      spec:
        selector:
          matchLabels:
           name: prometheus-metrics-server-demo
        podMetricsEndpoints:
        - scheme: http
          targetPort: 9157
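
When many services need the same setup, the annotations and the ServiceMonitor CR from the steps above can be generated programmatically. The following sketch uses hypothetical helper names and assumes, as in the examples above, that the Service is selected by a name label equal to the Service name. It produces JSON, which oc apply -f accepts as well as YAML.

```python
import json
from typing import Dict, Optional


def scrape_annotations(scheme: Optional[str] = None,
                       path: Optional[str] = None,
                       port: Optional[int] = None) -> Dict[str, str]:
    """Build the prometheus.io/* annotations for a Service object."""
    annotations = {"prometheus.io/scrape": "true"}
    if scheme is not None:
        annotations["prometheus.io/scheme"] = scheme
    if path is not None:
        annotations["prometheus.io/path"] = path
    if port is not None:
        # Kubernetes annotation values must be strings.
        annotations["prometheus.io/port"] = str(port)
    return annotations


def service_monitor(name: str, namespace: str, port_name: str,
                    scheme: str = "https") -> dict:
    """Build a ServiceMonitor CR that mirrors the example in this topic."""
    endpoint = {"scheme": scheme, "port": port_name}
    if scheme == "https":
        # Matches the example: certificate verification is skipped.
        endpoint["tlsConfig"] = {"insecureSkipVerify": True}
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": {"name": name}},
            "endpoints": [endpoint],
        },
    }
```

For example, json.dumps(service_monitor("prometheus-metrics-server-demo", "default", "https"), indent=2) yields a document equivalent to the ServiceMonitor example above.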
    

Managing Grafana dashboards

You can create custom Grafana dashboards by creating MonitoringDashboard CRs. CRs can be created in any namespace and will appear in the corresponding Grafana organization.

Important: Access to the third-party Grafana web user interface is deprecated in Red Hat® OpenShift® Container Platform version 4.10, and removed in Red Hat® OpenShift® Container Platform version 4.11. You can use the Red Hat® OpenShift® Container Platform console to access the dashboards. For more information, see Monitoring overview.

Note: You must switch to the Grafana organization before you browse the dashboard. Dashboards that are created directly in Grafana are lost when you restart pods.

  1. Create a dashboard on Grafana, and then generate a JSON string for the dashboard. From the dashboard, click Dashboard Setting > JSON Model. For more information about dashboard files, see Dashboard JSON.

  2. Create the MonitoringDashboard CR in the following format:

    apiVersion: monitoringcontroller.cloud.ibm.com/v1
    kind: MonitoringDashboard
    metadata:
      name: sample-dashboard
    spec:
      enabled: true
      data: |-
         {
            ...
         }
    
  3. Copy the generated JSON string and use it as the value in the spec.data field of the MonitoringDashboard CR from Step 2.

Note: Remove the id and uid fields of the top-level object.

Following is an example of the MonitoringDashboard CR.

   apiVersion: monitoringcontroller.cloud.ibm.com/v1
   kind: MonitoringDashboard
   metadata:
     name: dashboard-demo
     namespace: default
   spec:
     enabled: true
     data: |-
      {
        "annotations": {
          "list": [
            {
              "builtIn": 1,
              "datasource": "-- Grafana --",
              "enable": true,
              "hide": true,
              "iconColor": "rgba(0, 211, 255, 1)",
              "name": "Annotations & Alerts",
              "type": "dashboard"
            }
          ]
        },
        "editable": true,
        "gnetId": null,
        "graphTooltip": 0,
        "links": [],
        "panels": [
          {
            "cacheTimeout": null,
            "colorBackground": false,
            "colorValue": false,
            "colors": [
              "#299c46",
              "rgba(237, 129, 40, 0.89)",
              "#d44a3a"
            ],
            "datasource": "prometheus",
            "format": "none",
            "gauge": {
              "maxValue": 100,
              "minValue": 0,
              "show": false,
              "thresholdLabels": false,
              "thresholdMarkers": true
            },
            "gridPos": {
              "h": 9,
              "w": 12,
              "x": 0,
              "y": 0
            },
            "id": 2,
            "interval": null,
            "links": [],
            "mappingType": 1,
            "mappingTypes": [
              {
                "name": "value to text",
                "value": 1
              },
              {
                "name": "range to text",
                "value": 2
              }
            ],
            "maxDataPoints": 100,
            "nullPointMode": "connected",
            "nullText": null,
            "options": {},
            "postfix": "",
            "postfixFontSize": "50%",
            "prefix": "",
            "prefixFontSize": "50%",
            "rangeMaps": [
              {
                "from": "null",
                "text": "N/A",
                "to": "null"
              }
            ],
            "sparkline": {
              "fillColor": "rgba(31, 118, 189, 0.18)",
              "full": false,
              "lineColor": "rgb(31, 120, 193)",
              "show": false,
              "ymax": null,
              "ymin": null
            },
            "tableColumn": "",
            "targets": [
              {
                "expr": "sum(kube_pod_info{namespace=~\"ibm-common-services\"})",
                "refId": "A"
              }
            ],
            "thresholds": "",
            "timeFrom": null,
            "timeShift": null,
            "title": "Demo Panel",
            "type": "singlestat",
            "valueFontSize": "80%",
            "valueMaps": [
              {
                "op": "=",
                "text": "N/A",
                "value": "null"
              }
            ],
            "valueName": "avg"
          }
        ],
        "schemaVersion": 21,
        "style": "dark",
        "tags": [],
        "templating": {
          "list": []
        },
        "time": {
          "from": "now-6h",
          "to": "now"
        },
        "timepicker": {
          "refresh_intervals": [
            "5s",
            "10s",
            "30s",
            "1m",
            "5m",
            "15m",
            "30m",
            "1h",
            "2h",
            "1d"
          ]
        },
        "timezone": "",
        "title": "Demo Dashboard",
        "version": 0
      }
  4. Save the YAML string as a file and run the oc apply -f <file location> command.

  5. Log in to Grafana and switch to the ibm-common-services organization to check the new dashboard.

  6. To delete the dashboard, run the oc delete monitoringdashboards/dashboard-demo -n default command.
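
The manual preparation of the CR (copying the exported JSON, removing the top-level id and uid fields, and wrapping the result in a MonitoringDashboard CR) can be scripted. The following sketch uses a hypothetical helper name and emits the CR as JSON, which oc apply -f accepts as well as YAML.

```python
import json


def monitoring_dashboard(name: str, namespace: str,
                         dashboard_json: str) -> dict:
    """Wrap a Grafana dashboard JSON export in a MonitoringDashboard CR."""
    dashboard = json.loads(dashboard_json)
    # Remove only the top-level id and uid; panel ids are left untouched.
    dashboard.pop("id", None)
    dashboard.pop("uid", None)
    return {
        "apiVersion": "monitoringcontroller.cloud.ibm.com/v1",
        "kind": "MonitoringDashboard",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "enabled": True,
            # spec.data holds the dashboard as a JSON string.
            "data": json.dumps(dashboard, indent=2),
        },
    }
```

Write json.dumps(monitoring_dashboard(...), indent=2) to a file and apply that file in the same way as the YAML example.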

Accessing monitoring service APIs

You can access monitoring service Grafana APIs. Before you can access the APIs, you must obtain authentication tokens to specify in your request headers. For information about obtaining authentication tokens, see Preparing to run component or management API commands.

After you obtain the authentication tokens, complete the following steps to access the Grafana APIs.

Access the Grafana APIs at the URL https://<Cluster Master Host>:<Cluster Master API Port>/grafana/*, and obtain the sample dashboard.
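
As a rough illustration, the following sketch builds an authenticated request for a path under /grafana/. The helper name is hypothetical. The example path api/search is Grafana's dashboard search endpoint; that it is reachable under this prefix in your cluster, and the exact form of the Authorization header value, are assumptions to verify against your environment.

```python
import urllib.parse
import urllib.request
from typing import Mapping, Optional


def grafana_api_request(host: str, port: int, api_path: str,
                        access_token: str,
                        query: Optional[Mapping[str, str]] = None
                        ) -> urllib.request.Request:
    """Build a GET request for a Grafana API path under /grafana/."""
    url = f"https://{host}:{port}/grafana/{api_path.lstrip('/')}"
    if query:
        # Encode optional query parameters, for example a dashboard title.
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(url, headers={"Authorization": access_token})
```

For example, grafana_api_request(host, port, "api/search", token, {"query": "Demo"}) builds a search request for dashboards whose title matches Demo; send it with urllib.request.urlopen.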