September 2, 2022 By IBM Instana Team 7 min read

When applications produce anomalies such as erroneous calls, problematic traces, a high volume of events or an accumulation of problem messages in logs, observability tools determine how fast you can identify the root cause.

In a best-case scenario, your observability tool uncovers the first signs of something that could slow down or disrupt an application, and you can mitigate it before it impacts customers. Other times, more sudden events can cause an immediate disruption, and you have to triage and resolve a major incident.

In other words, the alert isn’t the end of the observability process; it’s just the beginning. When every second counts, finding the root cause and applying the fix often depends on the UI of the observability tool. Complex search queries, non-intuitive navigation and hidden menus are time killers. Data that is easier to query means faster fixes and happier customers.

The complexity and speed of modern cloud-native architectures make it impossible to manually keep track of dependencies between application components.

As is often the case, the first line of defense is automation.

Automation drives application data collection and analysis

Observability and APM tools record an unwieldy amount of information. The IBM Instana platform captures and stores every trace with no sampling and gathers metrics at one-second granularity. The solution also uses Unbounded analytics capabilities to associate data automatically, or to let you associate it yourself, in the context you need. Either path directs you to the specific graph, bar chart or text description that lets you drill down to the root cause.

All that information is organized and displayed in the Dynamic Graph to reveal the dependencies between your application components automatically. That means that the conditions that lead to and result from an issue are associated with that issue, so you can look forward or backward for cause and effect.
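To make the idea of looking forward or backward along dependencies concrete, here is a minimal sketch of a dependency graph traversal. It is not how the Dynamic Graph is implemented; the edge list, the `reachable` and `reverse` helpers, and every service name other than `discount` are assumptions made up for illustration.

```python
from collections import deque

# Hypothetical caller -> callees edges; only "discount" appears in this article.
DEPENDENCIES = {
    "web": ["cart", "catalogue"],
    "cart": ["discount", "redis"],
    "catalogue": ["mongodb"],
    "discount": ["mysql"],
}


def reachable(graph, start):
    """Return every node reachable from start using breadth-first search."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def reverse(graph):
    """Invert the edges so that each callee points back to its callers."""
    inverted = {}
    for caller, callees in graph.items():
        for callee in callees:
            inverted.setdefault(callee, []).append(caller)
    return inverted


# Components that "discount" depends on: candidates for the cause.
downstream = reachable(DEPENDENCIES, "discount")
# Components that depend on "discount": candidates for the effect.
upstream = reachable(reverse(DEPENDENCIES), "discount")

print("possible causes:", sorted(downstream))
print("affected callers:", sorted(upstream))
```

Walking the original edges answers "what could have caused this?", while walking the inverted edges answers "what else is affected?".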

In fact, the amount of information could be overwhelming without an interface that makes it easy to drill down to specific data.

What’s new: Unbounded analytics just made triage even easier

That’s why we’ve put so much emphasis on ease of use since the inception of IBM Instana: at the first sign of trouble, you can easily find and fix the issue. Now we’ve made analytics even faster by making data easier to query. Here’s a look at some enhancements.

Automated query strings and filters

All IBM Instana analytics are built using filters and groups that are combined using a simple query language that identifies the parameters to be viewed. You can then ‘AND’ or ‘OR’ those parameters into a robust logical statement that drills into the specific data combination you want to see. When analytics are presented within a sidebar menu category such as events, the analytics query string is automatically generated for the issue explored. Alternatively, when you use the Analytics sidebar menu, you can create queries from scratch.
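As a rough illustration of how filters combined with ‘AND’ and ‘OR’ narrow a data set, the sketch below models each filter as a small predicate function. It does not use the IBM Instana query language or API; the helper names (`equals`, `greater_than`, `all_of`, `any_of`), the record fields and the sample calls are all hypothetical.

```python
from typing import Callable

Predicate = Callable[[dict], bool]


def equals(key, value) -> Predicate:
    """Filter that matches when a field equals the given value."""
    return lambda call: call.get(key) == value


def greater_than(key, threshold) -> Predicate:
    """Filter that matches when a numeric field exceeds the threshold."""
    return lambda call: call.get(key, 0) > threshold


def all_of(*predicates: Predicate) -> Predicate:
    """Logical AND: every filter must match."""
    return lambda call: all(p(call) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """Logical OR: at least one filter must match."""
    return lambda call: any(p(call) for p in predicates)


# Hypothetical call records, invented for this example.
calls = [
    {"service": "discount", "erroneous": True, "latency_ms": 40},
    {"service": "discount", "erroneous": False, "latency_ms": 1500},
    {"service": "discount", "erroneous": False, "latency_ms": 30},
    {"service": "cart", "erroneous": True, "latency_ms": 20},
]

# service = discount AND (erroneous = true OR latency > 1000 ms)
query = all_of(
    equals("service", "discount"),
    any_of(equals("erroneous", True), greater_than("latency_ms", 1000)),
)

matches = [call for call in calls if query(call)]
print(matches)
```

Because each filter is just a function, adding or removing a condition means adding or removing one argument, which mirrors how a query can be refined step by step.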

Figure 1

IBM Instana analytics top-level parameters include Application Calls, Traces and Logs, seven Website parameters, four Mobile parameters, and Profiles. Applications, Website and Mobile parameters can then be organized using the filter, group and latency attributes. IBM Instana AutoProfile is an automated production profiler that continually analyzes code-level performance for Java, Go, Python, Ruby and other languages.

The main Unbounded analytics screen is displayed in Figure 1. At the top is the Analytics navigation menu, which by default displays Application Calls as the display parameter.

More intuitive menus

The menu snippets in the next screenshot show the full range of Analytics menu options, including Traces, Website parameters, Mobile application parameters and Application Profile, which invokes the IBM Instana AutoProfile code-level profiling capability:

Figure 2

The Applications menu:

Figure 3

The Websites menu:

Figure 4

The Mobile Apps menu:

Figure 5

First use case—automated analytics

Figure 6: Two events are flagged in the Events icon in the side menu.

Let’s examine analytics for event incidents from within traces, calls, events and other parameters for the IBM Instana RobotShop eCommerce demo environment.

In the previous screenshot, you’ll see that two events are flagged in the Events icon in the side menu.

Clicking on the Events icon opens the Events page.

In the Open Events list, we see the issues, including an increase in latency and the number of erroneous calls. The increase in error events appears to be more serious, so we click on the most serious listing and drill down to see more detail.

The event detail page in the next screenshot displays the Incident Timeline, Triggering Event and any contextual Related Events. To explore, we click on the Triggering Event to obtain more information about the Event cause:

Figure 7:  A spike in erroneous calls in the timeline graph.

Scrolling down on the Event screen, we see a significant spike in erroneous calls in the timeline graph. From here, we can either view the Built-in Event or invoke Unbounded analytics to obtain more information about the precipitating event by clicking on the Analyze Calls button.

Figure 8: IBM Instana automatically generates the appropriate Filter, Group and Chart information for the erroneous call event.

IBM Instana displays an Unbounded analytics screen for Calls and automatically generates the appropriate Filter, Group and Chart information for the erroneous call event we’re examining. From the Filter line, we can see that the IBM Instana solution has combined the ‘Service Name: discount’ and ‘Call Erroneous is true’ parameters using the Unbounded analytics query language to point specifically to the location of the erroneous calls. It identifies two endpoints that are experiencing a 100% erroneous call rate.
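The filter-then-group step described above can be sketched in a few lines. This is only an approximation of the idea, not the product’s implementation: the `erroneous_rate_by_endpoint` function, the endpoint names and the call counts are invented for illustration, and only the `discount` service name comes from the walkthrough.

```python
from collections import defaultdict

# Hypothetical call records; endpoint names and counts are made up.
calls = (
    [{"service": "discount", "endpoint": "endpoint_a", "erroneous": True}] * 3
    + [{"service": "discount", "endpoint": "endpoint_b", "erroneous": True}] * 2
    + [{"service": "discount", "endpoint": "endpoint_c", "erroneous": True}] * 1
    + [{"service": "discount", "endpoint": "endpoint_c", "erroneous": False}] * 3
    + [{"service": "cart", "endpoint": "endpoint_d", "erroneous": True}] * 2
)


def erroneous_rate_by_endpoint(records, service):
    """Filter by service, group by endpoint, and return the erroneous call rate."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        if record["service"] != service:
            continue  # Filter: keep only calls to the requested service.
        totals[record["endpoint"]] += 1  # Group: count calls per endpoint.
        if record["erroneous"]:
            errors[record["endpoint"]] += 1
    return {endpoint: errors[endpoint] / totals[endpoint] for endpoint in totals}


rates = erroneous_rate_by_endpoint(calls, "discount")
fully_failing = sorted(ep for ep, rate in rates.items() if rate == 1.0)

print(rates)
print("endpoints at 100% erroneous calls:", fully_failing)
```

In this made-up data set, two endpoints fail on every call while a third fails only occasionally, which is the kind of contrast that grouping by endpoint makes visible.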

Figure 9: Scrolling down the list of endpoint errors.
Figure 10: The IBM Instana platform combines the ‘Service Name: discount’ and ‘Call Erroneous is true’ parameters to point to the erroneous calls.

We scroll down the list of endpoint errors and examine one to get a better idea of what issue is causing the problem.

The detail screen shows that the erroneous call is a Connect call that is throwing a 500 Internal Server Error, a generic server-side failure that, in this case, suggests the service is not available for a connection.

Second use case—build your analytics

In this use case, we’ll show how to construct a query from the Analytics menu that arrives at the same diagnosis screen:

Figure 11: Construct the queries using the Query Builder.

The first step is to define the Filter, Group and Chart parameters that you want to examine and then construct the queries using the Query Builder.

In this screen, you can see that we built the same query as when we selected the Analyze Calls button in our Event investigation in the previous screenshot.

Figure 12: The same query as in the automated use case.

By scrolling down, we can select the same Endpoint that we selected in the Event incident investigation:

Figure 13: Select the endpoint again.

After selecting the same Endpoint parameter, we end up at the same Call 500 error analytics screen that we reached in the Event incident investigation.

Figure 14: The same Call 500 error analytics screen.

Whether you start from a flagged anomaly or from the Analytics selection in the sidebar menu, you can arrive at the same analysis. Along the way, once you’re in the Analytics dashboard, you can add contextual data to the query or remove it to provide more granular information and to determine whether other factors contribute to the issue.

Figure 15

Conclusion

Unbounded analytics is the anomaly-hunting tool for the IBM Instana platform. The first objective of enterprise observability is to obtain Metrics, Events, Traces and Log information in granular one-second detail with context and display it so that you instantly see the state of your applications, services and infrastructure in real time.

When issues arise, though, Unbounded analytics shifts into high gear and lets you instantly drill down to find an issue’s root cause. Using a combination of machine learning and issue detection, it constructs the associations to obtain answers, even in the highly complex distributed environments illustrated in the IBM Instana Dynamic Graph.

When mean time to repair is critical, users are complaining about performance, or worse, customers are abandoning transactions on your core enterprise systems—time is not your ally. The IBM Instana solution not only has the information you need to identify issues but also correlates that information so that you can determine the full nature of the issue rapidly. Unbounded analytics is one of the key IBM Instana tools that enables that correlation.

To learn more, sign up for a free, two-week trial