IBM Cloud Private logging

IBM Cloud Private deploys an ELK stack, referred to as the management logging service, to collect and store all Docker-captured logs. Numerous options are available to customize the stack before you install IBM Cloud Private, including end-to-end TLS encryption. You can deploy and customize additional ELK stacks from the catalog, or deploy other third-party solutions, offering maximum flexibility to manage your logs.

The management logging service offers a wide range of options for configuring the stack to suit your needs.

ELK

ELK is an abbreviation for three products, Elasticsearch, Logstash, and Kibana, all developed by Elastic. Together they comprise a stack of tools that stream, store, search, and monitor data, including logs. A fourth Elastic component, Filebeat, is deployed to stream logs into the stack.

Configuration

IBM Cloud Private's Elasticsearch deployment is configured to store documents in the /var/lib/icp/logging/elk-data directory of each management node to which it is deployed. You can change this path prior to installation by adding the following parameter to config.yaml. The new path must exist on all management nodes in the cluster.

elasticsearch_storage_dir: <your_path>

Hardware requirements

Elasticsearch is designed to handle large amounts of log data. The more data you choose to retain, the more resources it will require. It is highly recommended that you prototype the cluster and applications prior to full production deployment to measure the impact of log data on your system. For detailed capacity planning information, see the topic IBM® Cloud Private logging and metrics capacity planning.

Note: The default memory allocation for the managed ELK stack is not intended for production use. Actual production requirements are likely to be much higher. The default values simply provide a starting point for prototyping and other demonstration efforts.

Storage

The minimum required disk size generally correlates to the amount of raw log data generated over a full log retention period. It is also a good practice to account for unexpected bursts of log traffic; consider allocating an additional 25-50%. If you do not know how much log data will be generated, a good starting point is to allocate 100Gi of storage for each management node.

You can modify the default storage size by adding the following block to the config.yaml file:

elasticsearch_storage_size: <new_size>
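
For example (a hypothetical sizing): a full retention period that generates roughly 120Gi of raw log data, plus a 25% burst allowance, suggests:

elasticsearch_storage_size: 150Gi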

Memory

The amount of memory required by each pod differs depending on the volume of logs to be retained. It is impossible to predict exact needs, but the following can serve as a guide.

Insufficient memory can lead to excess garbage collection, which can add significant CPU consumption by the Elasticsearch process.

The default memory allocation settings for the managed ELK stack can be modified in config.yaml by adding the following lines and customizing them accordingly. In general, the value of heapSize should be approximately half of the pod's memoryLimit value.

Note: The heap size is specified using JDK units: g|G, m|M, k|K. The pod memory limit is specified in Kubernetes units: G|Gi, M|Mi, K|Ki.

logging:
  logstash:
    heapSize: "512m"
    memoryLimit: "1024Mi"
  elasticsearch:
    client:
      heapSize: "1024m"
      memoryLimit: "1536Mi"
    data:
      heapSize: "1536m"
      memoryLimit: "3072Mi"
    master:
      heapSize: "1024m"
      memoryLimit: "1536Mi"

CPU

CPU usage can fluctuate depending on a variety of factors. Long or complex queries tend to require the most CPU. Plan ahead to ensure you have the capacity needed to handle all of the queries that your organization will need.

Docker integration

Docker must be configured to use the JSON file driver on every node in the cluster. Docker streams the stdout and stderr pipes from each container into a file on the Docker host. For example, if a container has the Docker ID abcd, the default location on some platforms for the container's output is /var/lib/docker/containers/abcd/abcd-json.log. The IBM Cloud Private logging chart deploys a Filebeat daemonset to every node to stream the JSON log files into the ELK stack.
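
For reference, a minimal sketch of an /etc/docker/daemon.json that selects the JSON file driver; the rotation options (max-size, max-file) are illustrative, not values mandated by IBM Cloud Private:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}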

Kubernetes adds its own layer of abstraction on top of each container log. Under the default path /var/log/containers it creates a symlink pointing back to each Docker log file. The symlink file name contains additional Kubernetes metadata that can be parsed to extract four fields:

                   |   1    |   2   |    3    |                               4                                |
/var/log/containers/pod-abcd_default_container-5bc7148c976a27cd9ccf17693ca8bf760f7c454b863767a7e47589f7d546dc72.log
  1. The name of the pod to which the container belongs (stored as kubernetes.pod)
  2. The namespace into which the pod was deployed (stored as kubernetes.namespace)
  3. The name of the container (stored as kubernetes.container_name)
  4. The container's Docker ID (stored as kubernetes.container_id)
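
Applied to the sample path above, the extracted metadata is indexed roughly as follows:

kubernetes.pod:            pod-abcd
kubernetes.namespace:      default
kubernetes.container_name: container
kubernetes.container_id:   5bc7148c976a27cd9ccf17693ca8bf760f7c454b863767a7e47589f7d546dc72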

Docker images

The IBM Cloud Private ELK stack images are custom-built using a process identical to that used for the official Elastic images. This methodology offers strong assurance that the software stack is properly assembled, and that upgrades and fixes will work as intended. It also enables IBM to supply images for the amd64 and ppc64le platforms with equivalent levels of support.

Chart instances

You can deploy as many instances of the full ELK stack as hardware capacity permits. The Helm chart used to deploy the management logging service is published to the content catalog as well. Each instance of the chart is deployed as a self-contained stack. When security is enabled each stack will generate a custom set of certificates.

One common scenario is the need to isolate different sets of logs. This can be challenging because containers from multiple namespaces can be deployed to the same node, resulting in unrelated logs being stored under a common path. The ibm-icplogging Helm chart offers the option to restrict a particular ELK stack to collecting logs from specific namespaces, specific nodes, or both. Below are some examples of how to use the chart options to restrict the logs collected by an ELK stack.

Namespace

The namespaces parameter identifies one or more namespaces from which logs should be collected.

filebeat:
  scope:
    namespaces:
      - namespace1
      - namespace2

Node

This option defines one or more labels to match against the nodes to which the Filebeat daemonset will be deployed. See this article for information about attaching labels to nodes.

filebeat:
  scope:
    nodes:
      env: production
      os: linux
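
For example, assuming a node named worker-1, labels matching the selector above can be attached with kubectl:

kubectl label nodes worker-1 env=production os=linux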

A guide is also available to update Filebeat node selections after deploying the chart. See Customizing IBM® Cloud Private Filebeat nodes for the logging service.

Processing logs

Filebeat

The ibm-icplogging Helm chart uses Filebeat to stream container logs collected by Docker. As required by IBM® Cloud Private, Docker must be configured to use the JSON file driver, which stores each line of stdout and stderr output from the container as an individual JSON object. Filebeat parses each JSON object, attaches metadata such as the IP address and host name of the container's node, and then streams the record to Logstash.
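
As an illustration, a single line written to stdout appears in the Docker log file as a JSON object similar to the following (the timestamp is invented):

{"log":"server started on port 8080\n","stream":"stdout","time":"2018-06-01T12:00:00.000000000Z"}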

Logstash

Logstash performs two roles. First and foremost, it buffers the data between Filebeat and Elasticsearch. This protects against data loss and reduces the volume of traffic to Elasticsearch. Its second role is to further parse the log record to extract metadata and make the data in the record more searchable. These are the default steps taken by the ibm-icplogging Logstash pod:

  1. Parse the log record's datestamp (recorded by Docker at the time the container emitted the line).
  2. Extract the container's name, namespace, pod, and container ID into individual fields.
  3. If the container generated a JSON-formatted log entry, parse it and extract the individual fields to the root of the log record.

The record is then stored briefly before Logstash sends it to Elasticsearch.
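
A minimal sketch of a processed record, assuming a container that emitted the JSON entry {"level":"error","msg":"connection refused"}; all values shown are hypothetical:

{
  "@timestamp": "2018-06-01T12:00:00.000Z",
  "stream": "stderr",
  "kubernetes": {
    "pod": "pod-abcd",
    "namespace": "default",
    "container_name": "container",
    "container_id": "5bc7148c976a..."
  },
  "level": "error",
  "msg": "connection refused"
}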

Elasticsearch

When a log record is sent to Elasticsearch it becomes a document. Each document is stored within a named group called an index. When Logstash sends a record to Elasticsearch, it assigns the record to an index that follows the pattern logstash-<YYYY>-<MM>-<dd>; for example, a record submitted on 1 June 2018 is stored in the index logstash-2018-06-01. Naming each index after the day on which its records were submitted makes it easier to implement log retention policies.

Elasticsearch itself runs as three independent pod types: client, data, and master. (Many other configurations are possible; this is the configuration chosen in the ibm-icplogging Helm chart.)

Kibana

Kibana provides a browser-friendly query and visualization interface to Elasticsearch. It can optionally be excluded from deployment, though this is not recommended because Kibana is the default tool for searching logs. To disable deployment of Kibana prior to installation, add the following lines to config.yaml:

logging:
  kibana:
    install: false

Data retention

A container is deployed as a curator within each ELK stack. The curator removes indexes from Elasticsearch that are older than the configured maximum index age. Care should be taken when storing logs for long periods of time. Each additional day of retained logs increases the memory and storage resources that Elasticsearch will require.

The default values for the managed ELK stack curator can be modified in config.yaml by adding these lines and customizing them accordingly.

logging:
  curator:
    name: log-curator
    image:
      repository: "ibmcom/indices-cleaner"
      tag: "2.0.0"
    # Runs at 23:30 UTC daily
    schedule: "30 23 * * *"
    # Application log retention
    app:
      unit: days
      count: 1
    # Elasticsearch cluster monitoring log retention
    monitoring:
      unit: days
      count: 1
    # X-Pack watcher plugin log retention
    watcher:
      unit: days
      count: 1

Timing

The curator is set to run on UTC time. Using a single time standard makes it easier to coordinate and anticipate curation across geographical regions.

The default launch time is half an hour before midnight UTC. This avoids the risk that lag, perhaps due to congestion or system load, could push the curator run past the midnight boundary and cause more logs to be retained than expected.

PKI in Elasticsearch

Beginning with Elasticsearch version 5.0, the old TLS enablement plug-in was deprecated and replaced with a new plug-in called X-Pack. X-Pack offers a number of additional features marketed to enterprise users, but it requires a license. The features are free for a 30-day limited-use period, after which all X-Pack functions are disabled.

Search Guard is another product that offers security-related plug-ins for the ELK stack. In contrast to X-Pack, some of its features are offered under a community edition with no limitation on use. As stated by its readme file: "Search Guard offers all basic security features for free. The Community Edition of Search Guard can be used for all projects, including commercial projects, at absolutely no cost." TLS encryption with PKI is one of these community edition features.

By default, the IBM Cloud Private ELK stack uses Search Guard to provide PKI. However, if you already have a license for X-Pack, or plan to purchase one, you can specify the following during deployment to configure the ELK stack to use X-Pack's PKI implementation. The customer is responsible for installation of the license after deployment.

logging:
  security:
    provider: xpack

Securing data-in-transit

Each deployment of the Elasticsearch stack is secured by default with mutual authentication over TLS. The managed ELK stack is also configured to use the IBM Cloud Private certificate authority to sign the certificates used by the stack. All other ELK stacks default to creating their own certificate authority on deployment. To toggle security on or off, use one of the following snippets.

Installation

The following snippet can be added to config.yaml to enable or disable security.

logging:
  security:
    enabled: true|false

Helm

The following snippet can be added to a values override file for Helm deployment to enable or disable security.

security:
  enabled: true|false

Custom certificate authority

The default configuration of the managed ELK stack uses the IBM Cloud Private certificate authority. That CA can be found in the cluster-ca-cert secret in the kube-system namespace, and the secret has two fields (tls.crt and tls.key) that contain the actual certificate and its private key. All deployments of the ibm-icplogging Helm chart can utilize an existing certificate authority. Three requirements must be met:

  1. The CA must be stored in a Kubernetes secret.
  2. The secret must exist in the namespace to which the ELK stack is deployed.
  3. The contents of the certificate and its private key must be stored in separately named fields (or keys) within the Kubernetes secret.

For example, given a sample secret like the following:

apiVersion: v1
kind: Secret
metadata:
  name: my-ca-secret
type: Opaque
data:
  my_ca.crt: ...
  my_ca.key: ...
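
One way to create such a secret, assuming the certificate and key exist as local files named my_ca.crt and my_ca.key:

kubectl create secret generic my-ca-secret --from-file=my_ca.crt --from-file=my_ca.key -n <your_namespace>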

You must then configure the Helm chart with the following subset of values:

security:
  ca:
    origin: external
    external:
      secretName: my-ca-secret
      certSecretKey: my_ca.crt
      keySecretKey: my_ca.key

Certificates

All connections to Elasticsearch must be configured to exchange a properly signed certificate when security is enabled. The IBM Cloud Private ELK stack architecture generates a number of certificates to apply to discrete roles. All are stored in the same Kubernetes secret with a name following the pattern <release_name>-ibm-icplogging-certs.

ELK role        Description                          Secret key name          Keystore  Key format
Initialization  Initializes Search Guard settings    sgadmin                  JKS       PKCS12
Superuser       Elasticsearch administrator          superuser                PEM       PKCS1
Filebeat        Client to Logstash                   filebeat                 PEM       PKCS1
Logstash        Server for Filebeat                  logstash                 PEM       PKCS8
Logstash        Client for Elasticsearch log stream  logstash-monitoring      JKS       PKCS12
Logstash        Client for Elasticsearch monitoring  logstash-elasticsearch   JKS       PKCS12
Elasticsearch   REST API server                      elasticsearch            JKS       PKCS12
Elasticsearch   Intra-node transport                 elasticsearch-transport  JKS       PKCS12
Curator         Client to Elasticsearch REST API     curator                  PEM       PKCS1
Kibana          Client to Elasticsearch REST API     kibana                   PEM       PKCS8
Kibana proxy    Server for incoming connections      kibanarouter             PEM       PKCS1
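
To inspect the generated certificates (a release named logging deployed to the kube-system namespace is assumed here), read the secret with kubectl:

kubectl get secret logging-ibm-icplogging-certs -n kube-system -o yaml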

Securing data-at-rest

The Elasticsearch stack does not offer internal encryption of data at rest; Elastic recommends third-party solutions to achieve this goal. IBM Cloud Private provides instructions for supported methods of encrypting data on disk.

Role-based access

Version 2.0.0 of the ibm-icplogging Helm chart introduces two important features. First, it supports audit logs that are managed separately from application logs. Second, it offers a new module—for managed ELK stacks only—that provides role-based access controls (RBAC) for all Elasticsearch REST API invocations.

Audit logs are streamed to Elasticsearch from a daemonset pod deployed using a separate chart. They are stored using a separate index prefix. Whereas application logs are stored in indexes prefixed with the logstash- string, audit logs are stored in indexes prefixed by audit-. This enables the RBAC module to differentiate between the security models for each.

The RBAC module is effectively a proxy that sits in front of each Elasticsearch client pod. All connections are required to present certificates signed by the Elasticsearch cluster CA. (By default this is the IBM Cloud Private root CA.) As of this release, individual certificates are not themselves restricted. Instead, the RBAC module examines each request for an authorization header and enforces role-based controls at that point. In general, the RBAC rules are as follows:

  1. A user with the role ClusterAdministrator can access any resource, whether audit or application log.
  2. A user with the role Auditor is only granted access to audit logs in the namespaces for which that user is authorized.
  3. A user with any other role can only access application logs in the namespaces for which that user is authorized.
  4. Any attempt by an auditor to access application logs, or a non-auditor to access audit logs, will be rejected.

Post-deployment notes

Viewing and querying logs

Kibana is the primary tool for interfacing with logs. It offers a "Discover" view, through which you can query for logs that meet specific criteria. You can collate logs in this view using one or more of the fields that are automatically added by the ibm-icplogging ELK stack.

You might need to query logs based on other criteria that the ELK stack does not extract automatically, such as middleware product, application name, or log level. To get the most accuracy from application logs, consider emitting JSON-formatted output: JSON declares the names of the values in the log record rather than relying on Elasticsearch to parse them correctly. The Filebeat daemonset deployed by the ibm-icplogging Helm chart is preconfigured to parse JSON-formatted log entries and set the values so they are searchable as top-level elements in Elasticsearch.
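
For example, an application line like the following (all field names are hypothetical) becomes directly searchable by product, appName, and logLevel:

{"product":"middleware-x","appName":"billing","logLevel":"ERROR","message":"transaction failed"}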

Sensitive data

You might be required to mask sensitive data before it reaches Elasticsearch. Logstash includes a filter plugin named mutate that offers many functions for locating and masking sensitive data. Adding these masks requires customization of the Logstash configuration, which is typically found in a ConfigMap resource named <release_name>-ibm-icplogging-logstash-config, where release_name is the release name given to the specific Helm chart deployment.

Modifications to the Logstash configuration will automatically propagate to the deployed containers after a short delay.
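
A minimal sketch of such a mask, using the mutate filter's gsub option; the field name and pattern are assumptions that you must adapt to your data:

filter {
  mutate {
    # Replace anything resembling a 16-digit card number in the message field
    gsub => [ "message", "\d{4}-?\d{4}-?\d{4}-?\d{4}", "[REDACTED]" ]
  }
}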

Streaming IBM Cloud Private platform logs off-site

Platform components are deployed into the kube-system namespace by default, unless the namespace name is customized. Given the following prerequisites:

  1. the namespace is used exclusively by IBM Cloud Private services; and
  2. only IBM Cloud Private services deploy to nodes labeled "master", "management", or "proxy"

then the following steps can be performed:

  1. Modify the Filebeat daemonset definition for the IBM Cloud Private system namespace to specify node affinity only to nodes labeled "master", "management", or "proxy".
  2. Modify the Logstash configuration for the stack deployed to the IBM Cloud Private system namespace to stream logs to an off-platform collection service.
  3. If no longer needed, delete the Elasticsearch and Kibana Deployments and StatefulSets defined in the IBM Cloud Private system namespace.

This will stream all IBM Cloud Private platform logs to an external service. The prerequisites ensure that other ELK stacks (or log collection services) do not accidentally capture platform-level logs.
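
As an illustration of step 2, the Logstash output section could be repointed at an off-platform collector. The tcp output plugin, host, and port below are assumptions rather than a prescribed endpoint:

output {
  tcp {
    # Hypothetical off-platform log collection service
    host => "logs.example.com"
    port => 5000
    codec => json_lines
  }
}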

Elasticsearch APIs

Elasticsearch has a high degree of flexibility and a thoroughly documented API. A secure installation of the ELK stack restricts API access to internal components by using mutual authentication over TLS, as described in the preceding sections. External access to Elasticsearch data is therefore available only through Kibana, by authenticated users.

Note: These APIs only work to query or operate on data that is presently tracked in the Elasticsearch data store. They do not have any effect on backups.