Infrastructure correlation

This page describes how application monitoring and infrastructure monitoring are integrated. While some users may rely mostly on application monitoring, it is often useful to understand how the logical layer maps to the physical layer, and it becomes essential when troubleshooting issues that are detected at the application level but whose root cause lies in the infrastructure layer.

Instana allows bi-directional navigation in the UI between application and infrastructure monitoring.

Infrastructure correlation also plays a significant role in application and service mapping and in incidents, as described later on this page.

Similarly, application monitoring and website monitoring are integrated, as explained in more detail here.

In its simplest form, a service is executed by a single process, e.g. a Spring Boot application. However, it is common for a service to be executed by multiple processes to support various use cases:

  • to increase throughput and resilience (the more processes, the better)
  • to divide up the work by different tenants or configurations
  • to run different versions of that service, usually in different environments

That means that from a service you can usually navigate to one or more underlying processes and related entities such as hosts, containers, and Kubernetes pods.

One entry point is the Stack, inside the Infrastructure tab, which tells you which pieces of infrastructure take care of running the service, including their health and important metrics such as CPU and memory usage.

Stack from Service

Another entry point is the Infrastructure tab of a service dashboard, which gives you a different breakdown from the Stack and additionally provides tracing metrics such as call count, error rate, and mean latency, from the host level down to the process level.

Infra tab

Both of the above widgets treat Kubernetes and Pivotal Cloud Foundry as first-class citizens, giving you direct access to related entities such as Kubernetes services or PCF applications.

Example of a Stack for a service running in Kubernetes:

Stack for Kubernetes

Example of an Infrastructure tab for a service running in Kubernetes:

Infra tab for Kubernetes

Generally, traces are linked to infrastructure entities. More specifically, a call can be linked to up to two processes: the source initiating the call and the destination handling it. For example, a Node.js process makes an HTTP call to a PHP process.
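To make this concrete, here is a minimal sketch of such a call record, assuming hypothetical field names (this is not Instana's actual data model):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InfraLink:
    """Reference to a monitored infrastructure entity (hypothetical shape)."""
    host: str
    process: str

@dataclass
class Call:
    """A call with its two possible infrastructure links.

    Either side may be None when the corresponding process is not
    monitored by Instana (e.g. an external client or service).
    """
    name: str
    source: Optional[InfraLink]
    destination: Optional[InfraLink]

# A Node.js process making an HTTP call to a PHP process, both monitored:
call = Call(
    name="GET /checkout",
    source=InfraLink(host="host-a", process="node @4123"),
    destination=InfraLink(host="host-b", process="php-fpm @8921"),
)
```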

In the Trace View, you can select any call to open up the Call Details. The source and destination sections show which pieces of infrastructure were involved.

In this example the source is a PHP process while the destination is a Spring Boot application (Java process):

Call Details

Sometimes the correlation to infrastructure is not possible, which usually means that the underlying process is not monitored by Instana.

This is always the case for the root call of a trace, because the initial request is made by an external, unmonitored client. Here, the shipping service is called from something that is not monitored by Instana:

Call Details root call

Another case is when outgoing calls target an external third-party service. Here, the payment service called the external service www.paypal.com.

Call Details external service

When looking at an infrastructure or platform (Kubernetes, Pivotal Cloud Foundry, or vSphere) entity, you can use the Stack to get a list of the services it executes and the applications it is part of, along with metrics such as call and error counts.

Stack from Infrastructure

Upstream/Downstream gives you access to the services and applications that, respectively, call and are called by the current infrastructure entity.

Upstream/Downstream from Infrastructure

How it works

Before anything else, it is important to understand that application and infrastructure monitoring are powered by two distinct data pipelines: tracers report calls, while sensors report infrastructure entities and their metrics.

These two worlds merge seamlessly thanks to a mechanism that we call infrastructure linking, where calls are linked to monitored infrastructure entities. Linking occurs when a common identifier is found on both sides.

Instrumented services

Tracers instrument your processes to capture incoming and outgoing calls. These calls are then reported to the Instana backend, where we attempt to link their sources and destinations to known infrastructure entities. When a process is instrumented, it is necessarily also monitored by an Instana sensor, which knows everything about it. Because the tracer and the sensor are co-located, they both know the host and the process, which makes infrastructure linking possible.

For example, a Python process is instrumented by the Python tracer, which captures all of its incoming and outgoing calls. Meanwhile, several sensors are activated on the host where this process is running: the host, process, and Python sensors. The tracer and the sensors send their data separately to the Instana backend, but both include the same identifier for the process. It is therefore possible to link the destination of incoming calls, and the source of outgoing calls, to the Python process.
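Conceptually, the linking step is then a join on that shared identifier. The following sketch is a drastic simplification with made-up identifiers and record shapes, not the actual pipeline:

```python
# Infrastructure pipeline: entities reported by the sensors,
# keyed by a (host, pid) identifier.
infra_entities = {
    ("host-a", 4242): {"type": "pythonProcess", "host": "host-a", "pid": 4242},
}

# Tracing pipeline: calls reported by the Python tracer,
# carrying the same (host, pid) identifier.
calls = [
    {"name": "GET /orders", "direction": "incoming", "process": ("host-a", 4242)},
    {"name": "SELECT ...", "direction": "outgoing", "process": ("host-a", 4242)},
]

for call in calls:
    entity = infra_entities.get(call["process"])  # join on the shared identifier
    if entity is not None:
        # An incoming call is handled by the process (destination);
        # an outgoing call is initiated by it (source).
        side = "destination" if call["direction"] == "incoming" else "source"
        call[side] = entity
```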

Databases, messaging systems, and cloud services

Instana tracers do not instrument databases, messaging systems, or cloud services. However, the processes that call these untraced systems are instrumented, so outgoing requests are properly mapped to calls. For example, the Java tracer records outgoing requests from a Java process to a MySQL database, and these are analyzed into calls with the Java process as the source and the MySQL database as the destination. These calls are visible in Instana, and their destination is usually linked to the infrastructure entity that receives the call. How is this possible?

On the one hand, Instana monitors the database or messaging system through one of its sensors and therefore knows about the process, its port, and its host. On the other hand, Instana analyzes the outgoing request, which may contain enough information to identify the destination process, usually a hostname or IP and a port carried, for example, in the connection string.

For example, an outgoing request to a MySQL instance could contain the connection string jdbc:mysql://10.128.0.6:3306.

MySQL connection string

Infrastructure monitoring detected a corresponding MySQL process exposing the port 3306 and running on a host which exposes the IP 10.128.0.6.

MySQL instance

Because both the IP and the port match, the calls and the MySQL instance are linked together:

Call linked to MySQL instance
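In code, the matching idea boils down to a lookup by IP and port. Here is a minimal sketch with hypothetical data shapes; the real matching logic is more involved:

```python
from urllib.parse import urlparse

# Processes known from infrastructure monitoring, keyed by (IP, listen port).
known_processes = {
    ("10.128.0.6", 3306): {"type": "mysqlProcess", "host": "mysql-host"},
}

def link_destination(connection_string: str):
    """Link an outgoing request to a monitored process by IP and port."""
    # "jdbc:mysql://10.128.0.6:3306" -> drop the "jdbc:" prefix so that
    # urlparse sees an ordinary URL with a host and a port.
    url = urlparse(connection_string.removeprefix("jdbc:"))
    return known_processes.get((url.hostname, url.port))

print(link_destination("jdbc:mysql://10.128.0.6:3306"))
# -> {'type': 'mysqlProcess', 'host': 'mysql-host'}
```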

Instana also supports connection strings that contain a Kubernetes service name, like jdbc:mysql://mysql-svc. Behind the scenes, Instana attempts to fully qualify the service name so that it uniquely identifies the service across all namespaces and clusters. The result is a call whose destination is linked to the Kubernetes service instead of the final process.
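A sketch of that qualification step, assuming (cluster, namespace, name) triples known from Kubernetes monitoring (illustrative values only):

```python
# Kubernetes services known from infrastructure monitoring,
# as (cluster, namespace, name) triples.
known_services = {
    ("prod-cluster", "shop", "mysql-svc"),
    ("prod-cluster", "billing", "postgres-svc"),
}

def qualify(short_name: str):
    """Return the unique match for a short service name, or None if ambiguous."""
    matches = [svc for svc in known_services if svc[2] == short_name]
    return matches[0] if len(matches) == 1 else None

print(qualify("mysql-svc"))  # -> ('prod-cluster', 'shop', 'mysql-svc')
```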

For cloud services there are no processes, but the idea is the same: find a common identifier shared by the monitored cloud service and the outgoing request to that service. This could be, for example, a resource identifier like an AWS ARN.
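The same lookup, sketched with a hypothetical ARN:

```python
# Cloud services known from infrastructure monitoring, keyed by ARN.
monitored_cloud_services = {
    "arn:aws:sqs:us-east-1:123456789012:orders": {"type": "awsSqsQueue"},
}

# An outgoing call whose captured data includes the target ARN:
outgoing_call = {"name": "SQS SendMessage",
                 "arn": "arn:aws:sqs:us-east-1:123456789012:orders"}

destination = monitored_cloud_services.get(outgoing_call["arn"])
print(destination)  # -> {'type': 'awsSqsQueue'}
```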

Linking calls to infrastructure is sometimes not possible, when the host or IP given in the connection string does not match any of the hosts or IPs known on the infrastructure monitoring side. This usually happens when there is a level of indirection, that is, when the process calling the remote database (or messaging service, or cloud service) uses a hostname that is one of the following (see the sketch after this list):

  • an entry in the /etc/hosts system file
  • a DNS CNAME entry
  • a pointer to a proxy or load balancer
  • an alias given by a service discovery service like Consul or ZooKeeper
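For example, with the data shapes from the earlier sketch, an /etc/hosts alias defeats the lookup because infrastructure monitoring never reports the alias (hypothetical values):

```python
# The application dials an alias from /etc/hosts ("mysql-primary"),
# but infrastructure monitoring only knows the host by its IP:
known_processes = {
    ("10.128.0.6", 3306): {"type": "mysqlProcess"},
}

print(known_processes.get(("mysql-primary", 3306)))  # -> None: no link
```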

External services

External services are by definition not monitored by Instana and therefore not even visible on the infrastructure monitoring side. Because we know nothing about them, calls to these services are simply not linked to any known infrastructure entities.

In the Infrastructure tab, you can identify these calls as "Unmonitored":

Infra tab

Infrastructure correlation in application and service mapping

What is the role of infrastructure correlation in application and service mapping?

When the Instana backend analyzes traces and calls, it first links them to known infrastructure entities and enriches them with infrastructure tags such as host.name, springboot.name, or docker.label. These tags are then used to automatically map the calls to services using pre-defined or user-defined rules. For example, a call linked to a Spring Boot process is mapped to a service that gets its name from the Spring Boot application name. Or you could define a Docker label service-name and use it in a custom service mapping rule to name most of your services running in Docker.

Custom service mapping
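The evaluation of such rules could be sketched like this, with illustrative tag values and rule shapes (not Instana's configuration format):

```python
# Calls are first enriched with infrastructure tags, then rules map
# tags to a service name.
call_tags = {
    "host.name": "host-a",
    "docker.label.service-name": "checkout",
    "springboot.name": "checkout-app",
}

# Ordered rules: the first tag that is present names the service.
rules = ["docker.label.service-name",  # custom rule based on a Docker label
         "springboot.name"]            # pre-defined rule

service_name = next((call_tags[tag] for tag in rules if tag in call_tags), None)
print(service_name)  # -> "checkout"
```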

The same is true for application mapping, where you can use these infrastructure tags to define applications, for example using the kubernetes.namespace tag:

Application configuration

When infrastructure linking is not possible, service mapping cannot rely on infrastructure tags and relies instead on so-called fallback rules, which are defined using call tags such as call.http.host or call.database.schema.
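Continuing the sketch above, a fallback rule only looks at tags captured on the call itself (illustrative values only):

```python
# No infrastructure tags are available for this unlinked call, so only
# tags captured on the call itself can name the service.
unlinked_call_tags = {
    "call.http.host": "www.paypal.com",
    "call.http.url": "https://www.paypal.com/v2/payments",
}

fallback_rules = ["call.http.host", "call.database.schema"]

service_name = next(
    (unlinked_call_tags[tag] for tag in fallback_rules if tag in unlinked_call_tags),
    None,
)
print(service_name)  # -> "www.paypal.com"
```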

Infrastructure correlation and incidents

An incident groups related events by leveraging the Dynamic Graph. The ability to link calls (and therefore applications and services) to infrastructure entities enriches the Dynamic Graph with additional connections bridging the two worlds, resulting in more complete incidents and faster root cause analysis.