August 12, 2020 By IBM Instana Team 7 min read

Java has been one of the most popular programming languages for well over a decade. Such prolific usage has created a need for robust Java monitoring tools that can run in production and identify issues across every layer of the stack. This article is a deep dive into the most important aspects of Java, Java Virtual Machine (JVM) and full-stack monitoring that you might not have considered before.

Important Java and JVM background

What is Java?

The answer to this question is not as simple as it sounds. The term Java is used interchangeably to refer to several things. First, there is the Java programming language itself, whose syntax borrows heavily from languages such as C and C++. Applications written in Java are compiled into bytecode, an intermediate representation, and executed in a dedicated execution environment.
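As a minimal illustration (the file and class name below are ours, not from the article), a trivial Java source file can be compiled to bytecode with javac and the resulting intermediate instructions inspected with javap:

    // Hello.java -- compile to bytecode with:  javac Hello.java
    //               inspect the bytecode with: javap -c Hello
    public class Hello {
        public static void main(String[] args) {
            System.out.println("Hello from the JVM");
        }
    }

The .class file produced by javac contains platform-neutral bytecode; it’s the execution environment on the target machine that turns those instructions into actual work.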

What’s the difference between Java and JVM?

That question brings us to the second thing commonly referred to as Java. Oftentimes, when people say Java, what they really mean is the JVM, the environment that reads, understands and executes Java bytecode. Some implementations also support just-in-time (JIT) compiling, the “just-in-time” translation of bytecode into machine code specific to the system the JVM is running on. The JVM was initially developed by Sun Microsystems, which was later acquired by Oracle, but today many different vendors and communities provide JVM implementations and development kits, including Oracle, Azul Systems (Zulu and Zing), IBM, Amazon, OpenJDK and AdoptOpenJDK, with a multitude of options in terms of support and features beyond the officially defined standard.
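One rough way to see the JIT at work (the class below is our own sketch, not part of the article) is to run a small program with the standard HotSpot diagnostic flag -XX:+PrintCompilation, which logs methods as they are compiled to native code:

    // JitDemo.java -- run with:  java -XX:+PrintCompilation JitDemo
    // Methods that become "hot" show up in the compilation log.
    public class JitDemo {
        static long sum(long n) {
            long total = 0;
            for (long i = 0; i < n; i++) {
                total += i;
            }
            return total;
        }

        public static void main(String[] args) {
            long result = 0;
            for (int i = 0; i < 10_000; i++) {
                result += sum(100_000);  // repeated invocations make sum() JIT-eligible
            }
            System.out.println(result);
        }
    }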

Can other programming languages run inside of a JVM?

Here’s where we need to discuss the Java ecosystem, which builds on top of the JVM platform and bytecode specification but provides programming language options besides the Java language itself. These additional languages include Kotlin, Clojure, JRuby, Jython, Scala and many more. They all compile source code to Java bytecode and are referred to as JVM or Java platform languages.

Full-stack Java monitoring

With multiple components being referred to as Java, the question of how to monitor Java isn’t as simple as it sounds. To make things easier, let’s break down the concept of full-stack Java monitoring into three distinct categories:

  • Java or JVM metrics monitoring
  • Distributed tracing across services
  • Java code profiling

Java JVM metrics

When people look for Java monitoring, they most commonly look for a way to monitor the Java platform, the JVM. Java has long offered Java Management Extensions (JMX), which provide information about the runtime state of the JVM itself, the garbage collector and other internal elements. Furthermore, JMX can be extended by the running application to expose additional metrics to outside collectors. With the help of JMX, clients can connect to a running JVM and collect and display these metrics. During development, common tools include VisualVM and JDK Mission Control, also simply known as Mission Control. The latter includes Java Flight Recorder (JFR), an additional way to profile applications and collect data from the JVM at extremely low overhead.
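As a sketch of how an application can expose its own metrics over JMX (the MBean, attribute and ObjectName below are hypothetical examples, not something the JDK or Instana prescribes), a standard MBean can be registered against the platform MBean server:

    // OrderStatsMBean.java -- standard JMX convention: the management interface is named <Class>MBean.
    public interface OrderStatsMBean {
        long getOrdersProcessed();
    }

    // OrderStats.java -- hypothetical application metric exposed via JMX.
    import java.lang.management.ManagementFactory;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class OrderStats implements OrderStatsMBean {
        private final AtomicLong ordersProcessed = new AtomicLong();

        @Override
        public long getOrdersProcessed() {
            return ordersProcessed.get();
        }

        public void recordOrder() {
            ordersProcessed.incrementAndGet();
        }

        public static void main(String[] args) throws Exception {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            OrderStats stats = new OrderStats();
            // Any unique domain:key=value ObjectName works; this one is just an example.
            server.registerMBean(stats, new ObjectName("com.example:type=OrderStats"));
            stats.recordOrder();
            Thread.sleep(Long.MAX_VALUE);  // keep the JVM alive so a JMX client can connect
        }
    }

With the process running, the OrdersProcessed attribute becomes visible to any JMX client, such as JConsole or VisualVM with its MBeans plugin.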

Screenshot: JBoss AS running on the JVM with VisualVM connected, presenting JVM metrics.

This setup works for any JVM-based language and application. For application-specific metrics, though, additional monitoring tools like Prometheus, StatsD or Micrometer may be required.
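For a rough idea of what that application-side instrumentation looks like, here is a minimal sketch using Micrometer with its Prometheus registry (the metric name is illustrative, and this assumes the micrometer-registry-prometheus dependency is on the classpath; package names can differ between Micrometer versions):

    import io.micrometer.core.instrument.Counter;
    import io.micrometer.prometheus.PrometheusConfig;
    import io.micrometer.prometheus.PrometheusMeterRegistry;

    public class MetricsExample {
        public static void main(String[] args) {
            // Registry that renders metrics in the Prometheus exposition format.
            PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

            // Illustrative business metric; in a real service Prometheus would scrape it over HTTP.
            Counter ordersPlaced = Counter.builder("orders.placed")
                    .description("Number of orders placed")
                    .register(registry);

            ordersPlaced.increment();

            // scrape() returns the text a /metrics endpoint would serve.
            System.out.println(registry.scrape());
        }
    }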

A commonly found alternative is to use an open-source tool like Prometheus to gather and store metrics. Prometheus provides several integrations and exporters for widely used Java-based tools and frameworks, which offer a quick way to connect it to the services in question, and it supports many other technologies beyond Java.

Every tool listed in this section, however, shares a major shortcoming: they all collect and display metrics in isolation, without the context of responsiveness or service dependencies. Metrics in isolation can be helpful at times, but metrics with proper context will always help you find the root cause of a problem faster. We’ll explore how to get this context when we look at how the Instana platform approaches Java monitoring.

Distributed tracing

When using a JVM-based language to implement application services, it’s absolutely necessary to do more than just metric monitoring. Identifying and solving problems with Java applications quickly requires an in-depth understanding of the architecture and the communication between services. This understanding is achieved by performing end-to-end tracing of requests flowing throughout the system.

Open-source APIs like OpenTracing or OpenTelemetry help collect distributed traces across the different services and technologies in use. Unfortunately, it’s up to the user to integrate tracing points into their application, a tedious process that must be maintained over time. Even worse, there’s no context carried between the traces, metrics and dependencies.
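To give a sense of what that manual instrumentation looks like, here is a minimal sketch using the OpenTelemetry Java API (the tracer and span names are our own, and it assumes an OpenTelemetry SDK has been configured elsewhere in the application):

    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.Tracer;
    import io.opentelemetry.context.Scope;

    public class CheckoutHandler {
        // The tracer name is arbitrary; by convention it identifies the instrumented component.
        private static final Tracer TRACER = GlobalOpenTelemetry.getTracer("com.example.checkout");

        public void handleCheckout() {
            Span span = TRACER.spanBuilder("handleCheckout").startSpan();
            try (Scope ignored = span.makeCurrent()) {
                // ... actual business logic goes here ...
            } catch (RuntimeException e) {
                span.recordException(e);
                throw e;
            } finally {
                span.end();  // every manually created span must be ended explicitly
            }
        }
    }

Every operation worth tracing needs a similar block, which is exactly the per-call-site effort and maintenance burden described above.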

To complicate matters further, the infrastructure required to collect, store and analyze data from open-source point solutions needs to be set up and managed by the user, and the cost of data storage and computation needs to be accounted for. Prometheus, for example, is a capable tool but was not designed to scale out, so such a setup tends to grow into multiple instances over time.

The biggest issue is the burden of manually correlating metrics, data and distributed traces across different sources. Due to the nature of open source, different tools store their pieces of information differently. As a result, the user is left with a set of disconnected elements, trying to piece together the jigsaw puzzle. Trying to connect the dots during an outage, and getting to the root cause, is an unnecessarily complicated and lengthy process when using open-source software (OSS) tools.

Java code profiling

Code profiling has been around for decades. It started out as something you would only do in a development or test environment because it used to be very heavy from a CPU and memory perspective. Over the years, production-grade code profilers have been created that are extremely lightweight and, therefore, can be run against a production environment. The purpose of a code profiler is to answer the following question: What code is causing a problem within my running application service?
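For reference, the Java Flight Recorder mentioned earlier can also be driven programmatically; the following is a minimal sketch using the jdk.jfr API available in modern OpenJDK builds (the file name and workload are arbitrary):

    import java.nio.file.Path;
    import jdk.jfr.Recording;

    public class ProfilingExample {
        public static void main(String[] args) throws Exception {
            try (Recording recording = new Recording()) {
                recording.start();

                busyWork();  // the workload to be profiled

                recording.stop();
                // Dump the recording to disk; open the file in JDK Mission Control for analysis.
                recording.dump(Path.of("profile.jfr"));
            }
        }

        private static void busyWork() {
            long total = 0;
            for (int i = 0; i < 50_000_000; i++) {
                total += i;
            }
            System.out.println(total);
        }
    }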

Full-stack Java monitoring using the Instana platform

Java monitoring from the Instana platform automatically collects and correlates metrics, distributed traces and code profiles with almost no effort.

When using the Instana platform to monitor Java services and applications, the Instana Agent automatically and continuously discovers JVMs and the technologies being used inside them:

  • JVM vendor
  • JVM version
  • Framework
  • Database connectors
  • Upstream and downstream services
  • Many more

Furthermore, the Instana platform automatically discovers Java services deployed into many different environments, such as Docker, Kubernetes, the Red Hat® OpenShift® Platform, IBM® Cloud Foundry, or running as plain processes on the host machine. After discovering the JVM, the Instana Agent connects to the running process and analyzes the architecture of the service, discovering additional aspects such as the framework used to build the service (for example, Spring Boot or Dropwizard), application servers like WildFly or IBM WebSphere®, and database connectors. In the last step, the Instana Agent automatically instruments the running process and starts capturing important metrics and distributed traces right away. No process restart is required.

At IBM, we recognize the different perspectives one may have on Java, and use appropriate entities in our dynamic graph (the logical model of the stack, services and dependencies) to represent them. So, for example, a Spring Boot application is represented by:

  • A Spring Boot entity
  • A Java entity
  • A JVM
  • A Docker container
  • An operating system process
  • A Linux® host

Think of this list as a vertical dependency stack where different metrics need to be collected at every layer. The Instana platform automatically collects these metrics and models the vertical dependencies, as shown in the following screenshot.

But there’s more to an application than individual stacks. The services sitting within those stacks talk to each other, creating dependencies of their own. You can visualize these as horizontal, or cross-service, dependencies. Again, the Instana platform automatically detects all calls between services and creates distributed traces showing the end-to-end flow of every request. You can see this process represented in the screenshot that follows.

How can I know what Java method is causing a problem?

The final piece of the Java monitoring puzzle is production code profiling. We already have full-stack metrics and distributed traces, so we can understand whether resource contention is an issue and exactly which service is having a problem. What we can’t tell yet is which method within a running application service is causing performance problems. Enter the always-on production Java profiler. Java code profiles show the exact method or methods that are consuming the most CPU or are responsible for long wait times.

Install full-stack Java monitoring in 1 step

How to install the Instana Agent depends on the system to be monitored, but it’s always easy. The installation wizard inside the Instana web interface provides the user with a choice of setup techniques by environment type.

This screenshot shows the Instana web UI installation wizard with the agent setup steps for a Linux host.

Apart from Java applications, the Instana Agent discovers other supported technologies and sets them up for automatic monitoring, too. The Instana single-agent-per-host approach keeps the monitoring overhead extremely low and greatly simplifies the overall installation and maintenance process.

Using the Instana platform to collect all important metrics, distributed traces and code profiles stitches this information together into a full, end-to-end view of the contextual dependencies and the impact between the different components, including automatic discovery of upstream and downstream services.

With the Instana platform, it’s not up to the user to manually determine which services are part of a degradation or why a specific service seems to be impacted by an issue on another component. The Instana platform automatically generates the necessary relationships between all system components. Furthermore, it understands the system’s architecture and dependencies down to the level of the container instances and container hosts a specific request was executed on at a specific point in time. All that information is used to create correlations and provide the necessary evidence during incidents to quickly find the root cause and help decrease the time to resolution.

See Instana Java and JVM monitoring in action by using our interactive sandbox observability environment today.

Ready to try out IBM Instana and see what it can do for you?

Sign up for a free, two-week trial

