Monitoring Kubernetes

Instana can help you access detailed Kubernetes information, analyze Kubernetes calls, link Kubernetes services and logical services, use built-in or custom health rules for Kubernetes entity alerts, and trace workloads that are deployed on service meshes.

Supported versions
- Supported managed Kubernetes
- Supported service meshes
Installing the Instana agent in Kubernetes
Accessing Kubernetes information
Analyzing Kubernetes calls
Analyzing Kubernetes logs
Linking Kubernetes services and logical services
- Single Kubernetes service to multiple logical services
- Single logical service to multiple Kubernetes services
Viewing metrics
- Cluster
- CronJob
- DaemonSet
- Deployment
- Job
- Kubernetes service
- Namespace
- Node
- Pod
- StatefulSet
Health rules
- Built-in
- Custom
Service meshes
Troubleshooting notes
Child topic

Supported versions

Instana supports the current stable versions of Kubernetes. According to the Kubernetes version compatibility guarantee, Instana supports the latest Kubernetes version, the earlier two versions, and the later two versions. However, support for the earlier two versions is considered soft-deprecated.

Supported managed Kubernetes

IBM Cloud Kubernetes Service Monitoring and Performance Management
Amazon Elastic Container Service for Kubernetes (EKS)
Azure Kubernetes Service (AKS)
Google Kubernetes Engine (GKE)
IBM Cloud Kubernetes Service
VMware Tanzu Kubernetes Grid (TKG) and VMware Tanzu Kubernetes Grid Integration (TKGI), formerly known as Pivotal Container Service (PKS)

Only Linux workers are supported. Workers that run on Windows are not supported.

Supported service meshes

Instana supports the last three stable versions of Istio.

Installing the Instana agent in Kubernetes

To monitor Kubernetes with Instana, you need to install the Instana host agent in your Kubernetes cluster.

For more information about the host agent installation steps, see Installing the host agent on Kubernetes.

The installation of Instana agents on VMware Tanzu Kubernetes Grid is fully automated by the Instana Microservices Application Monitoring for VMware Tanzu tile.

Accessing Kubernetes information

After the agent is deployed to your cluster, the Kubernetes sensor reports detailed data about the cluster and the resources that are deployed into it.

Instana automatically detects and monitors all resources that are running in the Kubernetes cluster:

Clusters
CronJobs
Nodes
Namespaces
Deployments
DaemonSets
StatefulSets
Services
Pods

Kubernetes information is easily accessible and deeply integrated in all aspects of your application.

Kubernetes page

Click Platforms > Kubernetes in the navigation menu of the Instana UI to see information about your Kubernetes clusters and namespaces.

Kubernetes dashboards

On the Kubernetes page, click a cluster or a namespace. You can see Kubernetes dashboards that present all the information for a certain Kubernetes entity. The context is always accessible by the context path in the upper left. In the following screenshot, you can see a namespace that is named "robot-shop" in a cluster called "k8s-demo-cluster".

Dashboard overview

Kubernetes dashboards are structured as follows:

Summary shows the most relevant information for an entity. This dashboard starts with a status line that shows the status and related information, such as age. The next section shows CPU, memory, and pod information, which indicates the resources that the entity consumes, including its pods. Sections like Top Deployments and Top Pods in the following screenshot show potential hotspots that are worth investigating. The Logs section shows the distribution chart of relevant logs for that entity, which complements the entity metrics. The chart is interactive: you can select and highlight a range across all the measured values, and then focus on the selected time period or jump to the Analyze section to continue troubleshooting.
Details shows detailed information like "labels", "annotation", and the "spec".
Events shows all relevant Kubernetes events and links them to the respective dashboards.
Related Entities like "Deployments", "K8s Services", and "Pods" are shown as tabs of Kubernetes dashboards. What is shown depends on the entity that you selected.

CPU and memory usage

For Kubernetes pods, deployments, services, namespaces, and nodes, you can view the current CPU and memory usage compared with the CPU and memory limits and requests that are set for these resources.

If available, the usage information is calculated from data gathered from the container runtime that is running the containers that make up the resources.

Applications page

Click Applications in the navigation menu of the Instana UI, and then click the Applications or the Services tab. If the service or application is running on a Kubernetes cluster, you can see the respective context information in the Infrastructure tab:

AP Infra Tab

For containers, the pod and namespace are displayed and directly linked; for hosts, the cluster and node are also shown and linked.

Infrastructure page

Click Infrastructure in the navigation menu of the Instana UI. In the Infrastructure map, you can see Kubernetes information in the sidebar for either the host or the container that you select.

AP Infra Tab

You can use Dynamic Focus to filter the data. For example, search for a specific deployment in a cluster. Additionally, the keywords entity.kubernetes.cluster.distribution and entity.kubernetes.cluster.managedBy enable searching for a Kubernetes cluster by distribution and management layer. Supported values for entity.kubernetes.cluster.distribution are gke, eks, openshift, and kubernetes. Supported values for entity.kubernetes.cluster.managedBy are rancher and none.

Analyzing Kubernetes calls

Unbounded Analytics gives you powerful tools to slice and dice every call in your Kubernetes cluster. If you click Analyze Calls from a Kubernetes dashboard, the appropriate filter and grouping are already set. In this case, you can see all calls in the robot-shop namespace, grouped by pod:

Dashboard overview

Analyzing Kubernetes logs

Unbounded Analytics gives you powerful tools to slice and dice every log in your Kubernetes cluster. If you click Analyze Logs from a Kubernetes dashboard, the appropriate filtering is already set. In this case, you can see all logs in the robot-shop namespace as follows:

Dashboard overview

To provide relevant information without changing context, Instana enriches log messages with infrastructure and Kubernetes metadata, which is displayed in the tag table after the log message is expanded. See the following tag table:

Tag Table overview

Linking Kubernetes services and logical services

Single Kubernetes service to multiple logical services

Multiple logical services can be related to a single Kubernetes service when the service-mapping rules match and calls are generated on that Kubernetes service. For example, a Kubernetes service with the label selector "service=my-service" might contain pods that have the additional labels "env=dev" and "env=staging". If a custom service-mapping configuration in Instana uses the tags kubernetes.container.name and kubernetes.pod.label with the key env, the result is multiple logical services that are linked to that single Kubernetes service and displayed on the Kubernetes Service dashboard.

Single logical service to multiple Kubernetes services

Multiple Kubernetes services can be related to a single logical service when those Kubernetes services are destroyed and re-created over time. For example, if the Kubernetes service shop-service-a with generated calls is replaced over time by shop-service-b with generated calls, both services are displayed on the logical service dashboard when the period of time selected overlaps the calls that were generated.

Viewing metrics

Instana collects information about the Kubernetes cluster, CronJob, DaemonSet, Deployment, Job, Kubernetes service, namespace, node, pod, and StatefulSet.

Cluster

Metric	Description
Pods Allocation	Ratio of allocated pods to pods capacity
CPU Requests Allocation	Ratio of CPU requests to CPU capacity
CPU Limits Allocation	Ratio of CPU limits to CPU capacity
Memory Requests Allocation	Ratio of memory requests to memory capacity
Memory Limits Allocation	Ratio of memory limits to memory capacity
CPU Requests	Aggregated CPU requests of all running containers
CPU Limits	Aggregated CPU limits of all running containers
CPU Capacity	Aggregated CPU capacity of all nodes
Memory Requests	Aggregated memory requests of all running containers
Memory Limits	Aggregated memory limits of all running containers
Memory Capacity	Aggregated memory capacity of all nodes
Running Pods	Count of all running pods in this cluster
Pending Pods	Count of all pending pods in this cluster
Allocated Pods	Count of all allocated pods in this cluster
Pods Capacity	Aggregated pods capacity of all nodes
Out Of Disk Nodes	Count of out of disk nodes in this cluster
Memory Pressure Nodes	Count of memory pressure nodes in this cluster
Disk Pressure Nodes	Count of disk pressure nodes in this cluster
Kubelet Not Ready nodes	Count of kubelet not ready nodes in this cluster
Available Replicas	Available replicas from all deployments
Desired Replicas	Desired replicas from all deployments
Nodes Count	Number of nodes in this cluster

CronJob

Metric	Description
Last Job Duration	Duration of last job run
Active Jobs	Number of active jobs
Time To Last Scheduled Job	How long ago a job for this cronjob was scheduled

DaemonSet

Metric	Description
Available Replicas	Count of available replicas
Desired Replicas	Count of desired replicas
Unavailable Replicas	Count of unavailable replicas
Misscheduled Replicas	Count of misscheduled replicas
Available to Desired Replica Ratio	Ratio of available to desired replicas

Deployment

Metric	Description
Available Replicas	Count of available replicas
Desired Replicas	Count of desired replicas
Available to Desired Replica Ratio	Ratio of available to desired replicas
Pending Pods	Count of pending pods
Unscheduled Pods	Count of unscheduled pods
Unready Pods	Count of unready pods
Pending Phase Duration	Duration of pending phase
Pods Count	Number of pods for this deployment
Memory Requests	Aggregated memory requests of all running containers for this deployment
Memory Limits	Aggregated memory limits of all running containers for this deployment
CPU Requests	Aggregated CPU requests of all running containers for this deployment
CPU Limits	Aggregated CPU limits of all running containers for this deployment

Job

Metric	Description
Active Pods	Number of active pods in this job
Failed Pods	Number of failed pods in this job
Succeeded Pods	Number of succeeded pods in this job
Job Duration	Duration of job run

Kubernetes service

Metric	Description
CPU Requests	Aggregated CPU requests for this service
CPU Limits	Aggregated CPU limits for this service
Memory Requests	Aggregated memory requests for this service
Memory Limits	Aggregated memory limits for this service

Namespace

Metric	Description
Memory Requests Capacity	Maximum supported memory for memory requests on this namespace
Used Memory Requests	Amount of memory allocated to used memory requests
Memory Limits Capacity	Maximum supported memory for memory limits on this namespace
Used Memory Limits	Amount of memory allocated to used memory limits
CPU Requests Capacity	Maximum supported CPU for CPU requests on this namespace
Used CPU Requests	Amount of CPU allocated to used CPU requests
CPU Limits Capacity	Maximum supported CPU for CPU limits on this namespace
Used CPU Limits	Amount of CPU allocated to used CPU Limits
Used Pods	Number of pods used for this namespace
Pods Capacity	Number of pods the namespace can take
Used Pods Allocation	Ratio of used pods to pods capacity
CPU Requests Allocation	Ratio of CPU requests to CPU capacity
CPU Limits Allocation	Ratio of CPU limits to CPU capacity
Memory Requests Allocation	Ratio of memory requests to memory requests capacity
Memory Limits Allocation	Ratio of memory limits to memory limits capacity
Pods Allocation	Ratio of allocated pods to pod capacity

Node

Metric	Description
Allocated Pods	Count of allocated pods on this node
Pods Capacity	Number of pods the node can take
Memory Requests	Aggregated memory requests of all running containers on this node
Memory Limits	Aggregated memory limits of all running containers on this node
Memory Capacity	Maximum supported memory on this node
CPU Requests	Aggregated CPU requests of all running containers on this node
CPU Limits	Aggregated CPU limits of all running containers on this node
CPU Capacity	Maximum supported CPU on this node
Pods Allocation	Ratio of allocated pods to pod capacity
CPU Requests Allocation	Ratio of CPU requests to CPU capacity
CPU Limits Allocation	Ratio of CPU limits to CPU capacity
Memory Requests Allocation	Ratio of memory requests to memory capacity
Memory Limits Allocation	Ratio of memory limits to memory capacity

Pod

Metric	Description
Containers Count	Number of containers for this pod
CPU Requests	Aggregated CPU requests on all containers of this pod
CPU Limits	Aggregated CPU limits on all containers of this pod
Memory Requests	Aggregated memory requests on all containers of this pod
Memory Limits	Aggregated memory limits on all containers of this pod
Restarts Count	Aggregated restarts on all containers of this pod

StatefulSet

Metric	Description
Available Replicas	Count of available replicas
Desired Replicas	Count of desired replicas
Available to Desired Replica Ratio	Percentage of available to desired replicas

Health rules

Built-in

Several built-in health rules trigger an issue for Kubernetes entities.

Cluster
- Kubernetes reports that a Master-Component (apiserver, scheduler, or controller manager) is unhealthy. Due to a bug in Kubernetes, the health is not always reported reliably. Instana tries to filter the health status of the Master-Component so that it does not cause an alert and is shown only on the cluster detail page.
Node
- The requested CPU is approaching max capacity. The ratio of requested CPU to CPU capacity is greater than 80%.
- The requested memory is approaching max capacity. The requested memory to memory capacity ratio is greater than 80%.
- Allocated pods are approaching maximum capacity. The ratio of allocated pods to pods capacity is greater than 80%. For a node, pods in the phases 'Running' and 'Unknown' are counted as allocated. For more information about node capacity, see the Kubernetes docs.
- The node reports a condition that is not ready for more than one minute, and all conditions for this node are beyond the "Ready" condition. For more information about all node conditions, see Kubernetes docs.
Namespace
- The requested CPU is approaching max capacity. The ratio of requested CPU to CPU capacity is greater than 80%.
- The requested memory is approaching max capacity. The requested memory to memory capacity ratio is greater than 80%.
- Allocated pods are approaching maximum capacity. The ratio of allocated pods to pods capacity is greater than 80%. For a namespace, pods in the phases 'Pending', 'Running', and 'Unknown' are counted as allocated. The namespace capacity values are based on ResourceQuotas that can be set per namespace. For more information, see the Kubernetes docs.
Deployment
- The number of available replicas is less than the number of desired replicas.
Pod
- A pod is not ready for more than one minute after it is deployed, and the reason is not that the pod completed its task (PodCondition=Ready, Status=False, Reason != PodCompleted). For more information about all pod conditions, see the Kubernetes docs.
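
The capacity rules for nodes and namespaces in the preceding list share one comparison: a ratio that is strictly greater than 80%. As an illustration only, the following shell sketch reproduces that comparison outside of Instana. The function name exceeds_capacity_threshold is hypothetical, the inputs are assumed to be integers in the same unit (for example millicores, bytes, or pod counts), and the sketch does not claim to reflect how Instana evaluates the rules internally.

```shell
# Hypothetical helper, not part of Instana.
# Succeeds (exit status 0) when requested / capacity is strictly greater than 80%.
# $1 = requested or allocated amount, $2 = capacity; both are assumed to be integers.
exceeds_capacity_threshold() {
  # Compare requested * 100 with capacity * 80 to avoid fractional rounding.
  # A capacity of 0 never counts as exceeded.
  awk -v requested="$1" -v capacity="$2" \
    'BEGIN { exit !(capacity > 0 && requested * 100 > capacity * 80) }'
}

# Example: 3300 millicores requested on a node with 4000 millicores capacity.
if exceeds_capacity_threshold 3300 4000; then
  echo "requested CPU is above 80% of capacity"
fi
```

A ratio of exactly 80% does not satisfy the check, which matches the "greater than 80%" wording of the rules.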

Custom

In addition to the built-in rules, you can also create custom rules on metrics of a cluster, namespace, deployment, and pod. For example, if the threshold for node capacity warnings is too high, you can disable them and create a custom rule with a lower threshold. For more information, see Events and incidents configuration.

Service meshes

OpenShift ServiceMesh

See the OpenShift FAQs on the OpenShift ServiceMesh.

Istio

Using the `agent.serviceMesh.enabled` flag

You can enable the Instana agent JVM monitoring with Istio service mesh by using the agent.serviceMesh.enabled flag. This Kubernetes-native approach uses a single dedicated network port for all Java workloads that are monitored on a single host or node. The default value is set to true. For more information about the configuration parameter, see Helm Chart configuration.

If the Istio configuration is set to REGISTRY_ONLY, additional steps are required for the agent socket service to work properly.

You need to deploy the following resource definition for each individual cluster node. Make sure to define a unique metadata.name property for each host or node. Also, set the value for spec.hosts to <node-ip-address>.instana-agent-headless.instana-agent.svc, where <node-ip-address> is the node's IP address.

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: instana-agent-worker-<node-unique-counter>
spec:
  hosts:
  - <node-ip-address>.instana-agent-headless.instana-agent.svc
  ports:
  - number: 42699
    name: agent
    protocol: TCP
  - number: 42666
    name: socket
    protocol: TCP
  resolution: DNS
  location: MESH_EXTERNAL
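
Because one resource definition is needed per node, it can help to generate the manifests from a template. The following is a minimal sketch and not an official Instana tool: the function name render_service_entry and the use of a running counter for metadata.name are assumptions, and the commented-out loop assumes that the InternalIP address that Kubernetes reports for a node is the node IP address to use.

```shell
# Hypothetical helper: prints one ServiceEntry manifest for a single node.
# $1 = unique counter for metadata.name, $2 = node IP address.
render_service_entry() {
  counter="$1"
  node_ip="$2"
  cat <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: instana-agent-worker-${counter}
spec:
  hosts:
  - ${node_ip}.instana-agent-headless.instana-agent.svc
  ports:
  - number: 42699
    name: agent
    protocol: TCP
  - number: 42666
    name: socket
    protocol: TCP
  resolution: DNS
  location: MESH_EXTERNAL
---
EOF
}

# Example loop (commented out because it needs access to a cluster):
# i=0
# kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' |
#   while read -r ip; do
#     i=$((i + 1))
#     render_service_entry "$i" "$ip"
#   done | kubectl apply -f -
```

Review the generated manifests before you apply them, and apply them to the namespace that fits your Istio setup.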

Using service mesh bypass

The alternative legacy approach is to enable service mesh bypass. The default installation of Istio works out-of-the-box with Instana. If you deploy Istio with a default deny policy (mode: REGISTRY_ONLY), you can enable Instana's service mesh bypass by using the following agent configuration:

com.instana.container:
  serviceMesh:
    enableServiceMeshBypass: true

This setting bypasses blocked network connectivity in two ways:

Allow outgoing traffic from the application pod to the host agent (on all IPv4 addresses which the agent listens on, and all ports).
Allow incoming traffic to the application pod from the agent for JVM applications (from all IPv4 addresses which the host agent listens on, and all ports).

Debugging the mesh bypass

To debug the service mesh bypass, follow these steps:

Verify that the service mesh bypass is enabled.
Verify that the iptables rules are applied to the container.

Verify enabled

To verify whether the service mesh bypass is enabled, check the Instana agent logs by running the following command:

kubectl logs -l app.kubernetes.io/instance=instana-agent -n instana-agent -c instana-agent

If the service mesh bypass is enabled, you can find the following log lines, which indicate that an inbound or outbound bypass entry is written for the denoted process:

Inbound bypass:

2021-04-26T08:13:57.065+0000 | INFO  | -client-thread-2 | DefaultServiceMeshSupport        | 51 - com.instana.agent - 1.1.597 | Applying inbound service mesh bypass for process '764670'

Outbound bypass:

2021-04-26T08:13:57.140+0000 | INFO  | -client-thread-2 | DefaultServiceMeshSupport        | 51 - com.instana.agent - 1.1.597 | Applying outbound service mesh bypass for process '764670'
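
On a busy agent, these entries can be hard to spot among other log output. As a small sketch, the following filter keeps only the lines that match the message text shown in the preceding examples. The function name filter_bypass_lines is hypothetical, and the pattern assumes that the log message wording is the same in your agent version.

```shell
# Hypothetical helper: reads agent log lines on stdin and
# keeps only the service mesh bypass entries.
filter_bypass_lines() {
  grep -E 'Applying (inbound|outbound) service mesh bypass for process'
}

# Example (commented out because it needs access to a cluster):
# kubectl logs -l app.kubernetes.io/instance=instana-agent -n instana-agent -c instana-agent | filter_bypass_lines
```

If the filter prints nothing, no bypass entry was found in the logs that were read.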

Verify iptables rules

The easiest way to verify the iptables rules is to open a shell in the Instana agent container and list the iptables rules of the target container as follows. Replace ${PID} with the PID of the JVM process:

kubectl -n instana-agent exec -it ${INSTANA_AGENT_POD} -c instana-agent  -- /bin/bash
nsenter -n -t ${PID} iptables -t nat -n -L INSTANA_OUTPUT

If the chains are applied, you can see an output as follows:

Chain INSTANA_OUTPUT (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  0.0.0.0/0            10.128.15.237
ACCEPT     tcp  --  0.0.0.0/0            10.64.0.1
ACCEPT     tcp  --  0.0.0.0/0            169.254.123.1

Check whether bidirectional communication between the Instana agent and your JVM processes is supported by running the following command:

nsenter -n -t ${PID} iptables -t nat -n -L INSTANA_INBOUND

The result is similar to the following output:

Chain INSTANA_INBOUND (1 references)
target     prot opt source               destination
ACCEPT     tcp  --  10.128.15.237        10.64.0.14
ACCEPT     tcp  --  10.64.0.1            10.64.0.14
ACCEPT     tcp  --  169.254.123.1        10.64.0.14

Depending on when the iptables rules were applied, it can take a few minutes for the process to be instrumented and the data to be visible in Instana's dashboards.
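
To check the chains without reading the listings by eye, you can count the ACCEPT rules in the output. The following is a rough sketch under stated assumptions: the function name count_accept_rules is hypothetical, and the commented-out command assumes that the nsenter command from the preceding steps can also be run non-interactively through kubectl exec.

```shell
# Hypothetical helper: reads an iptables listing on stdin and
# prints the number of rules whose target is ACCEPT.
count_accept_rules() {
  # grep -c exits with status 1 when the count is 0, so keep the status at 0.
  grep -c '^ACCEPT' || true
}

# Example (commented out because it needs access to a cluster):
# kubectl -n instana-agent exec "${INSTANA_AGENT_POD}" -c instana-agent -- \
#   nsenter -n -t "${PID}" iptables -t nat -n -L INSTANA_OUTPUT | count_accept_rules
```

A count of 0 suggests that the chain exists but contains no bypass rules yet.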

Troubleshooting notes

Why am I not seeing any Kubernetes clusters or namespaces?

If no clusters or namespaces are listed on the Kubernetes page, either no cluster is being actively monitored due to an agent not being installed, or no clusters are being monitored during your selected time frame.

Click Live to check for any clusters and namespaces in live mode. If none are listed, you need to install the Instana agent in Kubernetes.

Missing clusterRole permissions

Monitoring issue type: kubernetes_missing_permissions

The Instana agent requires the appropriate ClusterRole permissions for specific resources to be able to monitor a Kubernetes cluster successfully. If these permissions are missing, corresponding resources are missing on the Instana Kubernetes dashboards. To resolve this issue, install the latest version of the Instana Agent YAML, Helm chart, or Operator. For more information about the latest version of each installation method, see Kubernetes or OpenShift.

Deprecations

Instana deprecated support for the extensions/v1beta1 and apps/v1beta2 API versions for DaemonSet, Deployment, and ReplicaSet in the Kubernetes sensor. These obsolete API versions were removed in Kubernetes v1.16. For more information, see the announcement from Kubernetes.

Child topic

The Instana AutoTrace webhook is a Kubernetes and OpenShift-compatible admission controller mutating webhook. It automatically configures the Instana tracing on Node.js, .NET Core, Ruby, and Python applications that run across the entire Kubernetes or Red Hat OpenShift cluster.

After you install the Instana agent, enable the Instana AutoTrace webhook.
