December 16, 2022 By Daniel Pittner 3 min read

How to query your log archives directly from object storage.

Today, we are thrilled to announce the release of a new feature that will change the way you access and analyze your log data. With IBM Cloud Data Engine, you can now query your log archives directly from object storage, without the need to download and store them locally.

How to use SQL for analyzing unstructured logs

Log data is typically generated by various systems and applications, and the schema of the data can change over time as new systems and applications are added and existing ones are updated. This makes it difficult to query log data using SQL because the schema is not fixed and can change from one log entry to the next.

To address these challenges, you can now query log archives as text. Let’s look at an example:

WITH logs AS (
  SELECT get_json_object(value, "$._source._host") AS host,
         from_unixtime(get_json_object(value, "$._source._ts") / 1000, 'yyyy-MM-dd HH:mm:ss') AS timestamp,
         get_json_object(value, "$._source._file") AS file,
         get_json_object(value, "$._source._line") AS line
  FROM cos://us-geo/logArchives STORED AS TEXT
)
SELECT * FROM logs LIMIT 200

First, the query uses the WITH clause to define a derived table named logs that contains the structured log data extracted from the unstructured log archives. The logs table is defined using the SELECT clause and a series of get_json_object calls, which extract the specific fields of interest from each raw log line (exposed by STORED AS TEXT as the value column).

This is a powerful approach because it allows the query to work with structured data, which is much easier to query and analyze than unstructured data. By extracting the specific fields of interest and creating a structured table, the query can use standard SQL syntax to filter, sort and aggregate the data as needed.
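To make that concrete, here is a minimal sketch that aggregates over the same derived table. The bucket path and JSON field paths are carried over from the example above and are placeholders for your own archive layout:

WITH logs AS (
  SELECT get_json_object(value, "$._source._host") AS host,
         from_unixtime(get_json_object(value, "$._source._ts") / 1000, 'yyyy-MM-dd') AS day
  FROM cos://us-geo/logArchives STORED AS TEXT
)
SELECT day, host, COUNT(*) AS entries
FROM logs
GROUP BY day, host
ORDER BY entries DESC
LIMIT 20

Because the WITH clause turns each raw line into typed columns, everything after it is ordinary SQL: the GROUP BY, ORDER BY and LIMIT operate on the derived table just as they would on a regular one.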

One of the key benefits of this new capability is parallel query processing for gzip-compressed data. Rather than decompressing each archive as a single serial stream, Data Engine divides the compressed data into smaller chunks and processes those chunks simultaneously, using multiple processors or cores, which can improve query execution times by orders of magnitude.

What insights can you gain from analyzing log archives with SQL?

Here are some examples of how you can use this new capability to gain insights from your historical IBM Cloud log archives:

  • Identify and troubleshoot issues with your applications: By querying your log data, you can easily identify any errors or issues that may have occurred in the past and may be impacting the performance of your applications. This can help you troubleshoot problems and improve the reliability and stability of your applications (a sketch of such a query follows this list).
  • Monitor system performance and usage: By analyzing your log data, you can gain insights into how your systems were being used and how they were performing in the past. This can help you identify trends and patterns and make data-driven decisions to optimize your system performance and improve the user experience.
  • Analyze user behavior and preferences: By querying your log data, you can gain insights into how your users were interacting with your applications and what features they were using in the past. This can help you understand their needs and preferences and make data-driven decisions to improve the user experience and drive engagement.
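As a sketch of the first use case, the following query counts error lines per file per day. It reuses the hypothetical bucket and field paths from the earlier example, and the ERROR substring match is an assumption about your log format, so adjust both to whatever your applications actually emit:

WITH logs AS (
  SELECT get_json_object(value, "$._source._file") AS file,
         get_json_object(value, "$._source._line") AS line,
         from_unixtime(get_json_object(value, "$._source._ts") / 1000, 'yyyy-MM-dd') AS day
  FROM cos://us-geo/logArchives STORED AS TEXT
)
SELECT day, file, COUNT(*) AS error_count
FROM logs
WHERE line LIKE '%ERROR%'
GROUP BY day, file
ORDER BY error_count DESC
LIMIT 50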

Check out more examples in the documentation.

Get started with Data Engine and log archive analysis

Overall, this new capability of IBM Cloud Data Engine provides a powerful and flexible way to query and analyze your historical log data, helping you gain valuable insights and make data-driven decisions. We are excited to see how you will use it to unlock the value of your log data and drive success for your business.

Learn more about IBM Cloud Data Engine.
