IBM Cloud Private logging
IBM Cloud Private deploys an ELK stack, referred to as the management logging service, to collect and store all Docker-captured logs. Numerous options are available to customize the stack before you install IBM Cloud Private, including end-to-end TLS encryption. You can deploy and customize more ELK stacks from the catalog, or deploy other third-party solutions, offering maximum flexibility to manage your logs.
The management logging service offers a wide range of options to configure the stack to suit your needs:
- Memory allocation per pod (for more information, see IBM® Cloud Private logging and metrics capacity planning)
- Minimum disk size
- TLS encryption
- Filebeat node and namespace scoping
- Data retention policies
- Modifying the data retention policy for logging services
- Manually removing log indices
- Audit log collection
- Role-based authentication
- Access to Kibana through the ingress
- Enabling security for logging services
- Managing resource allocation for logging services
- Scaling logging services after IBM Cloud Private installation
- Updating logging service data collection filters
- Enabling Elastic monitoring
- Updating Elastic X-Pack licenses
ELK
ELK is an abbreviation for three products, Elasticsearch, Logstash, and Kibana, all developed by Elastic. Together they comprise a stack of tools that stream, store, search, and monitor data, including logs. A fourth Elastic component, Filebeat, is deployed to stream the logs into the ELK stack.
Configuration
The Elasticsearch deployment in IBM Cloud Private is configured to store documents in the /var/lib/icp/logging/elk-data directory of each management node to which it is deployed. You can change this path before installation by adding the following parameter to config.yaml. The new path must exist on all management nodes in the cluster.
elasticsearch_storage_dir: <your_path>
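For example, if each management node has a dedicated disk mounted at /data/elk (an illustrative path, not a default), the entry would be:
elasticsearch_storage_dir: /data/elk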
Hardware requirements
Elasticsearch is designed to handle large amounts of log data. The more data that you choose to retain, the more resources it requires. You could prototype the cluster and applications before full production deployment to measure the impact of log data on your system. For detailed capacity planning information, see IBM® Cloud Private logging and metrics capacity planning.
Note: The default memory allocation for the managed ELK stack is not intended for production use. Actual production usage might be much higher. The default values provide a starting point for prototyping and other demonstration efforts.
Storage
The minimum required disk size generally correlates to the amount of raw log data generated for a full log retention period. It is also a good practice to account for unexpected bursts of log traffic. As such, consider allocating an extra 25-50% of storage. If you do not know how much log data is generated, a good starting point is to allocate 100Gi of storage for each management node.
Avoid NAS storage because you might experience latency issues, and it can introduce a single point of failure. For more information, see Disks.
You can modify the default storage size by adding the following block to the config.yaml file:
elasticsearch_storage_size: <new_size>
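As a rough sketch, if you expect about 160Gi of raw log data per retention period, adding the 25-50% headroom described earlier suggests a value in the 200Gi-240Gi range. The numbers are illustrative assumptions only:
elasticsearch_storage_size: 240Gi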
Memory
The amount of memory that is required by each pod differs depending on the volume of logs to be retained. It is impossible to predict exact memory needs but you can start with the following guidelines:
- Allocate 16, 32, or even 64 GB of memory for each data pod.
- Allocate 8, 16, or 32 GB of memory for each client and master pod.
Insufficient memory can lead to excess garbage collection, which can add significant CPU consumption by the Elasticsearch process.
The default memory allocation settings for the managed ELK stack can be modified by adding and customizing the following lines in config.yaml. In general, the heapSize value equals approximately half of the overall pod memoryLimit value.
Note: The heap size is specified by using JDK units: g|G, m|M, k|K. The pod memory limit is specified in Kubernetes units: G|Gi, M|Mi, K|Ki.
logging:
  logstash:
    heapSize: "512m"
    memoryLimit: "1024Mi"
  elasticsearch:
    client:
      heapSize: "1024m"
      memoryLimit: "1536Mi"
    data:
      heapSize: "1536m"
      memoryLimit: "3072Mi"
    master:
      heapSize: "1024m"
      memoryLimit: "1536Mi"
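As an illustration of the preceding guidelines, a larger prototype might give each data pod a 16 GB heap and each client and master pod an 8 GB heap, keeping every heapSize at roughly half of the corresponding memoryLimit. These values are assumptions for planning discussion only and must be validated against your own log volume:
logging:
  logstash:
    heapSize: "2g"
    memoryLimit: "4Gi"
  elasticsearch:
    client:
      heapSize: "8g"
      memoryLimit: "16Gi"
    data:
      heapSize: "16g"
      memoryLimit: "32Gi"
    master:
      heapSize: "8g"
      memoryLimit: "16Gi"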
CPU
CPU usage can fluctuate depending on various factors. Long or complex queries tend to require the most CPU. Plan ahead to ensure that you have the capacity that is needed to handle all of the queries that your organization needs.
Docker integration
Docker on every node in the cluster must be configured to use the JSON file driver. Docker streams the stdout and stderr pipes from each container into a file on the Docker host. For example, if a container has Docker ID abcd, the default location on some platforms for storing output from the container is /var/lib/docker/containers/abcd/abcd-json.log. The IBM Cloud Private logging chart deploys a Filebeat daemon set to every node to stream the JSON log files into the ELK stack.
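A minimal sketch of a Docker daemon.json file (typically /etc/docker/daemon.json) that selects the JSON file driver follows. The rotation options shown are illustrative additions, not IBM Cloud Private defaults:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "5"
  }
}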
Kubernetes adds its own layer of abstraction on top of each container log. Under the default path /var/log/containers, it creates a symlink that points back to each Docker log file. The symlink file name contains extra Kubernetes metadata that can be parsed to extract four fields (a parsed example follows the list below). For example:
/var/log/containers/pod-abcd_default_container-5bc7148c976a27cd9ccf17693ca8bf760f7c454b863767a7e47589f7d546dc72.log
- The name of the pod to which the container belongs (stored as kubernetes.pod)
- The namespace into which the pod was deployed (stored as kubernetes.namespace)
- The name of the container (stored as kubernetes.container_name)
- The container's Docker ID (stored as kubernetes.container_id)
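Applied to the example file name shown earlier, the parsed metadata would look roughly like the following (rendered here as YAML for readability):
kubernetes.pod: pod-abcd
kubernetes.namespace: default
kubernetes.container_name: container
kubernetes.container_id: 5bc7148c976a27cd9ccf17693ca8bf760f7c454b863767a7e47589f7d546dc72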
Chart instances
You can deploy as many instances of the full ELK stack as hardware capacity permits. The Helm chart used to deploy the management logging service is published to the content catalog as well. Each instance of the chart is deployed as a self-contained stack. When security is enabled, each stack generates a custom set of certificates.
One common scenario is the need to isolate different sets of logs. This can be challenging because containers from multiple namespaces can be deployed to the same node, resulting in unrelated logs being stored at a common path. The ibm-icplogging Helm chart offers the option to restrict a particular ELK stack to collecting logs from specific namespaces, specific nodes, or both. The following examples demonstrate how to use the chart options to restrict the logs collected by an ELK stack.
Namespace
The namespaces parameter identifies one or more namespaces from which logs are collected.
filebeat:
  scope:
    namespaces:
      - namespace1
      - namespace2
Node
This option defines one or more labels to match against the nodes to which the Filebeat daemon set is deployed. For more information, see Attaching label to the node.
filebeat:
  scope:
    nodes:
      env: production
      os: linux
A guide is also available to update Filebeat node selections after you deploy a chart. See Customizing IBM® Cloud Private Filebeat nodes for the logging service.
Processing logs
Logstash
Logstash performs two roles. First, it buffers the data between Filebeat and Elasticsearch. This buffering protects against data loss and reduces the volume of traffic to Elasticsearch. Its second role is to further parse the log record to extract metadata and make the data in the record more searchable. The following default steps are taken by the ibm-icplogging Logstash pod:
- Parse the log record's datestamp (stored by Docker at the time it was expressed by the container).
- Extract the container's name, namespace, pod ID, and container ID into individual fields.
- If the container generated a JSON-formatted log entry, parse it and extract the individual fields to the root of the log record.
The record is then stored briefly before Logstash sends it to Elasticsearch.
Elasticsearch
When a log record is sent to Elasticsearch, it becomes a document. Each document is stored within a named group that is called an index. When Logstash sends a record to Elasticsearch, it assigns it to an index with the pattern logstash-<YYYY>-<MM>-<dd>. Assigning each record to an index that is named after the day on which it was submitted makes it easier to track log retention policies.
Elasticsearch itself is split across three different pod types. Many other configurations are possible; this is the configuration that is chosen in the ibm-icplogging Helm chart.
- The client pod exposes the REST API endpoints.
- The master pod tracks the state of the overall cluster, and also records metadata on where documents are stored. It plays a role in ensuring efficient storage and retrieval of data.
- The data pod is responsible for storage and retrieval of Elasticsearch documents.
Kibana
Kibana provides a browser-friendly query and visualization interface to Elasticsearch. It can be optionally excluded from deployment, although this is not recommended as Kibana is the default tool through which logs can be searched.
Data retention
A container is deployed as a curator within each ELK stack. The curator removes indexes from Elasticsearch that are older than the configured maximum index age. Take care when you store logs for long periods of time. Each additional day of retained logs increases the memory and storage resources that Elasticsearch requires.
To modify default values for the managed ELK stack curator, add and customize the following lines in your config.yaml.
logging:
  curator:
    name: log-curator
    image:
      repository: "ibmcom/indices-cleaner"
      tag: "2.0.0"
    # Runs at 23:30 UTC daily
    schedule: "30 23 * * *"
    # Application log retention
    app:
      unit: days
      count: 1
    # Elasticsearch cluster monitoring log retention
    monitoring:
      unit: days
      count: 1
    # X-Pack watcher plugin log retention
    watcher:
      unit: days
      count: 1
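For example, to keep application logs for a week instead of a single day while leaving the other settings at their defaults, you might override only the app values. This is a sketch; remember that each extra day of retention increases memory and storage requirements:
logging:
  curator:
    app:
      unit: days
      count: 7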
Timing
The curator is set to run on UTC time. Using a single time standard makes it easier to coordinate and anticipate curation across geographical regions.
The default launch time is set for half an hour before midnight UTC. The purpose is to avoid any risk that lag—perhaps due to congestion or system load—might start the curator after the midnight boundary and store more logs than expected.
PKI in Elasticsearch
Beginning with Elasticsearch version 5.0, the old TLS enablement plug-in was deprecated and replaced with a new plug-in called X-Pack. X-Pack offers a number of extra features that are marketed to enterprise users, but a license is required. The features are free for a 30-day limited-use period, after which all X-Pack functions are disabled.
Search Guard is another product that offers security-related plug-ins for the ELK stack. In contrast to X-Pack, some of its features are offered under a community edition category with no limitation on use. As stated in its readme file: "Search Guard offers all basic security features for free. The Community Edition of Search Guard can be used for all projects, including commercial projects, at absolutely no cost."
TLS encryption with PKI is one of these community edition features.
By default, the IBM Cloud Private ELK stack uses Search Guard to provide PKI. If you already have a license for X-Pack, or plan to purchase one, you can specify the following parameters during deployment to configure the ELK stack to use X-Pack's PKI implementation. The customer is responsible for installation of the license after deployment.
logging:
  security:
    provider: xpack
Securing data-in-transit
Each deployment of the Elasticsearch stack is secured by default with mutual authentication over TLS. The managed ELK stack is also configured to use the IBM Cloud Private certificate authority to sign the certificates used by the stack. All other ELK stacks default to creating their own certificate authority on deployment. To toggle security on or off for additional ELK stacks, change the security setting in the catalog UI or in the values override file for a Helm deployment.
Helm
The following snippet can be added to a values override file for Helm deployment to enable or disable security.
security:
  enabled: true|false
Custom certificate authority
The default configuration of the managed ELK stack uses the IBM Cloud Private certificate authority (CA). You can find the CA in the cluster-ca-cert secret in the kube-system namespace. The secret has two fields (tls.crt and tls.key) that contain the actual certificate and its private key. All later deployments of the ibm-icplogging Helm chart can use an existing certificate authority. Three requirements must be met:
- The CA must be stored in a Kubernetes secret.
- The secret must exist in the namespace to which the ELK stack is deployed.
- The contents of the certificate and its secret key must be stored in separately named fields (or keys) within the Kubernetes secret.
For example, given a sample secret like the following code:
apiVersion: v1
kind: Secret
metadata:
name: my-ca-secret
type: Opaque
data:
my_ca.crt: ...
my_ca.key: ...
You must then configure the Helm chart with the following subset of values:
security:
  ca:
    origin: external
    external:
      secretName: my-ca-secret
      certSecretKey: my_ca.crt
      keySecretKey: my_ca.key
Certificates
All connections to Elasticsearch must be configured to exchange a properly signed certificate when security is enabled. The IBM Cloud Private ELK stack architecture generates a number of certificates to apply to discrete roles. All are stored in
the same Kubernetes secret and use the following naming convention: <release_name>-ibm-icplogging-certs
.
| ELK role | Description | Secret key name | Keystore | Key format |
|---|---|---|---|---|
| Initialization | Initializes Search Guard settings | sgadmin | JKS | PKCS12 |
| Superuser | Elasticsearch administrator | superuser | PEM | PKCS1 |
| Filebeat | Client to Logstash | filebeat | PEM | PKCS1 |
| Logstash | Server for Filebeat | logstash | PEM | PKCS8 |
| Logstash | Client for Elasticsearch log stream | logstash-monitoring | JKS | PKCS12 |
| Logstash | Client for Elasticsearch monitoring | logstash-elasticsearch | JKS | PKCS12 |
| Elasticsearch | REST API server | elasticsearch | JKS | PKCS12 |
| Elasticsearch | Intra-node transport | elasticsearch-transport | JKS | PKCS12 |
| Curator | Client to Elasticsearch REST API | curator | PEM | PKCS1 |
| Kibana | Client to Elasticsearch REST API | kibana | PEM | PKCS8 |
| Kibana proxy | Server for incoming connections | kibanarouter | PEM | PKCS1 |
Securing data-at-rest
The Elasticsearch stack does not offer data encryption at rest internally. The Elastic company recommends third-party solutions to achieve this goal. IBM Cloud Private has instructions for supported methods of encrypting data on disk. For more information, see Encrypting volumes that are used by IBM Cloud Private.
Role-based access
Version 2.0.0 of the ibm-icplogging Helm chart (included in IBM Cloud Private 3.1.0) introduced a new module that provides role-based access control (RBAC) for all Elasticsearch REST API invocations. The new module is available only for managed ELK stacks.
The RBAC module is effectively a proxy that sits in front of each Elasticsearch client pod. All connections are required to have certificates signed by the Elasticsearch cluster CA; by default, this is the IBM Cloud Private root CA. The RBAC module examines the request for an authorization header and at that point enforces role-based controls. In general, the RBAC rules are as follows:
- A user with the role ClusterAdministrator can access any resource, whether audit or application log.
- A user with the role Auditor is granted access only to audit logs in the namespaces for which that user is authorized. If audit logs are routed to ELK rather than the suggested existing enterprise SIEM tool, see IBM Cloud Private audit logging integration with enterprise SIEM tools.
- A user with any other role can access application logs only in the namespaces for which that user is authorized.
- Any attempt by an auditor to access application logs, or by a non-auditor to access audit logs, is rejected.
The RBAC rules provide basic data retrieval control for users that access Kibana. The rules do not prevent users from seeing metadata such as log field names or saved Kibana dashboards.
Post-deployment notes
- Kibana requires several minutes to optimize its plug-ins. You cannot access Kibana during this process. For more information, see Updating & Removing Plugins in the Elastic documentation.
- Kibana might require some configuration for indexes after you start it. For more information, see Creating an Index Pattern to Connect to Elasticsearch in the Elastic documentation.
- Starting in IBM Cloud Private 3.1.2, a default index pattern is created and set in the logging stack in Kibana. The default index pattern provides an initial view into the logs. Previously, a user was greeted with a message and had to manually create an index pattern before useful visualization and searching were available. Default index pattern creation can take several minutes after initial startup. As a result, the Kibana UI might not be available immediately after installation.
Viewing and querying logs
Kibana is the primary tool for interfacing with logs. It offers a Discovery view, through which you can query for logs that meet specific criteria. It is possible to collate logs through this view by using one or more of the fields that are automatically added by the ibm-icplogging ELK stack.
- kubernetes.container_id: A unique identifier that is generated by Docker for each container.
- kubernetes.container_name: The readable name for a container assigned by Kubernetes.
- kubernetes.pod: The name of the pod in which a container is deployed.
- kubernetes.namespace: The namespace into which the container's pod is deployed.
You might need to query logs based on other criteria that are not discoverable by the ELK stack, such as middleware product, application name, or log level. To get the most accuracy from application logs, consider JSON-formatted output. JSON declares the names of the values in the log record rather than relying on Elasticsearch to parse them accurately. The Filebeat daemon set that is deployed by the ibm-icplogging Helm chart is preconfigured to parse JSON-formatted log entries and set the values so that they are searchable as top-level elements in Elasticsearch.
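For example, an application that writes each log entry to stdout as a single JSON line, such as the following sketch, would have its level, appName, and message values indexed as searchable top-level fields. The field names are illustrative, not required by the stack:
{"level": "ERROR", "appName": "orders-service", "message": "Payment gateway timeout", "transactionId": "tx-12345"}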
Sensitive data
You might be required to mask sensitive data before it reaches Elasticsearch. Logstash ships with a helpful plug-in named mutate that offers many functions for locating and masking data that is considered to be sensitive. Adding these masks requires customization of the Logstash configuration, which is typically found in a configmap resource named <release_name>-ibm-icplogging-logstash-config, where release_name refers to the release name that is given to a specific Helm chart deployment.
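A minimal sketch of such a mask, assuming the raw message is held in a field named log, uses the mutate filter's gsub option to replace card-number-like sequences before the record is forwarded to Elasticsearch. Both the field name and the pattern are assumptions for illustration:
filter {
  mutate {
    # Replace anything that looks like a 16-digit card number with a fixed mask
    gsub => [ "log", "\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}", "****-****-****-****" ]
  }
}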
Modifications to the Logstash configuration will automatically propagate to the deployed containers after a short delay.
Modifications to configuration maps are lost if you redeploy the logging chart, for example, when you upgrade to a new version.
Streaming IBM Cloud Private platform logs off-site
Platform components are deployed into the IBM Cloud Private system namespace kube-system by default. Also by default, only platform components deploy to nodes labeled master, management, or proxy.
In this scenario, it is possible to configure the managed ELK stack in the IBM Cloud Private system namespace to stream IBM Cloud Private platform logs to an off-platform collection service.
Complete the following steps to stream all IBM Cloud Private platform logs to an external service.
- Modify the Filebeat daemon set definition for the IBM Cloud Private system namespace to specify node affinity only to nodes labeled master, management, or proxy.
- Modify the Logstash configuration for the stack that is deployed to the IBM Cloud Private system namespace to stream logs to an off-platform collection service (a sketch follows this list). For more information, see the Logstash documentation.
- If no longer needed, delete the Elasticsearch and Kibana deployments and StatefulSets defined in the IBM Cloud Private system namespace.
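A sketch of step 2, assuming the off-platform collection point is another Elasticsearch cluster at a hypothetical host, replaces the output section of the Logstash pipeline configuration. The host, port, and index naming are assumptions:
output {
  elasticsearch {
    hosts => ["https://logs.example.com:9200"]
    index => "icp-platform-%{+YYYY.MM.dd}"
  }
}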
Important notes about this configuration:
- The managed ELK stack will no longer collect any application logs.
- The configuration changes will not persist through an upgrade or rollback of the Logging release.
- Not every possible Logstash configuration change has been tested in the managed ELK stack. Depending on the changes made, it may be necessary to completely delete and recreate the Logging stack to return to the default state, losing previously collected logs and configuration.
- Some platform services may run on separate nodes and would not have their logs captured. For example:
  - Vulnerability Advisor runs on nodes with a separate label and would not be captured.
  - Metering, and even Logging itself, utilize daemon sets that run on worker nodes. Logs from the components running on worker nodes would not be captured.
- Additional logging stacks or log collection services can still capture the platform logs if they are configured to collect logs from the labeled cluster nodes.
- Kibana in the managed ELK stack may fail to load or may fail to have access to collected logs depending on configuration changes that are made.
Elasticsearch APIs
Elasticsearch has a high degree of flexibility and a thoroughly documented API. A secure installation of the ELK stack restricts API access to internal components that use mutual authentication over TLS, as described in preceding sections. Therefore, external access to Elasticsearch data is available only to users that are authenticated through Kibana. You can also use the dev tools panel in the Kibana user interface to access the Elasticsearch API. If more ELK stacks are deployed in standard mode, Kibana access is not protected by IBM Cloud Private authentication or authorization controls.
Note: These APIs work only to query or operate on data that is presently tracked in the Elasticsearch data store. They do not affect backups.
- Delete by query: Delete entire documents (for example, log entries) that match a specific query (see the sketch after this list).
- Update by query: Similar to the delete-by-query API, but you can either modify the contents of fields or entirely remove specific fields from documents. For an example of removing fields, see Update API.
- Bulk operations: As log data accumulates, certain operations can take longer to complete. The bulk API is designed to improve performance by enabling multiple operations within the context of the same request.
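As a sketch, the following request, issued from the Kibana dev tools panel, deletes all log documents for a hypothetical namespace named demo-namespace across the daily logstash indices. The namespace value is an assumption for illustration:
POST /logstash-*/_delete_by_query
{
  "query": {
    "match": {
      "kubernetes.namespace": "demo-namespace"
    }
  }
}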