December 8, 2020
By Xiaotong Liu, Anbang Xu and Rama Akkiraju
5 min read

As IT complexity grows and the use of AI technologies expands, enterprises are looking to bring in the power of AI to transform how they develop, deploy and operate their IT.

Our past work on Sentiment Analysis and Entity Recognition has shown that artificial intelligence (AI) models customized with cross-lingual data on top of Language Models outperform those trained on general-purpose data alone. We were curious whether we could replicate these results on problems such as anomaly prediction in the IT Operations Management domain, so we conducted experiments to test this hypothesis. In this article, we share our experimental results: anomaly prediction models whose features come from advanced Language Models trained on IT data outperform those built with general-purpose data.

Introduction

Language Models are critical components in Natural Language Processing (NLP). They can learn to predict the probability of a sequence of words. A 1-gram language model predicts the probability of a single missing word in a sentence. For example, in the sentence “Ana _ to get a book to read,” an English-language-trained Language Model might predict the word ‘went’ to fill in the blank with a probability of 99%.

A 2-gram language model predicts the probability of a sequence of two missing words at a time. For example, in the sentence “Ana _ _ get a book to read,” a trained Language Model might predict the word sequence ‘went to’ with a probability of 95%. This can be extrapolated to n-grams.
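To make the idea of word-sequence probabilities concrete, here is a minimal sketch of a count-based bigram model. It uses the common next-word formulation, in which the probability of a word is estimated from how often it follows the preceding word, rather than the fill-in-the-blank framing above. The toy corpus and the function names are invented for illustration and are not part of Watson AIOps.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            follow_counts[prev][word] += 1
    return follow_counts


def next_word_probabilities(follow_counts, prev):
    """Estimate P(word | prev) by relative frequency."""
    counts = follow_counts.get(prev.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


# Toy corpus, invented for illustration only.
corpus = [
    "Ana went to get a book to read",
    "Ana went to the library",
    "Ana wanted to get a book",
    "Ana went home",
]

model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "Ana")
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"P({word} | ana) = {p:.2f}")
```

Real Language Models are trained on far larger corpora and typically smooth these estimates so that unseen word pairs do not receive a probability of zero.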

To perform this task, Language Models internally convert words into vectors of real numbers, because mathematical models operate more easily on numbers than on text. These vectors are called Word Embeddings or Word Vectors, and they are widely used in NLP tasks.

To create Word Embeddings, words or phrases from the vocabulary of a language are mapped to vectors of real numbers, so that each word or phrase is associated with a feature vector of a fixed dimension. Typically, Embeddings are pre-trained on large text corpora such as Wikipedia, tweets and news articles, and are tested on Language Modeling tasks, which assign a probability distribution over sequences of words.
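As a rough illustration of what a Word Embedding looks like in code, the sketch below maps a few words to fixed-dimension vectors and compares them with cosine similarity. The three-dimensional vectors are hand-written, hypothetical values chosen only to show the mechanics; learned embeddings usually have hundreds of dimensions and come from training, not from manual assignment.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-written for illustration.
# Every word maps to a vector of the same fixed dimension.
embeddings = {
    "error": [0.9, 0.1, 0.0],
    "exception": [0.8, 0.2, 0.1],
    "book": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_related = cosine_similarity(embeddings["error"], embeddings["exception"])
sim_unrelated = cosine_similarity(embeddings["error"], embeddings["book"])

print(f"error vs exception: {sim_related:.3f}")
print(f"error vs book:      {sim_unrelated:.3f}")
```

With these made-up values, the two IT-flavored words score as more similar to each other than to an unrelated word, which is the property that makes embeddings useful as features.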

An IT operations environment generates many kinds of data. These include metrics, alerts, events, logs, tickets, application and infrastructure topology, deployment configurations, and chat conversations, among others. Our goal in this experiment is to pre-train Language Models with the IT domain vocabulary that occurs in logs, tickets, metrics, alerts, events, and chats: for example, errors, exceptions, messages, service names, server names, pods, container ids, node ids, incidents, tickets, root cause, causal factor and topology. Word Embeddings derived from such IT domain-specific Language Models could serve as richer features for the machine-learning-based AI models in our system.

Applying Language Models to Log Anomaly Prediction in IBM Watson AIOps

In IBM Watson AIOps, there are many AI pipelines for processing different types of data and generating insights from them. For example, application and infrastructure logs and metrics are parsed and processed to predict anomalies early in the process. These are handled by Log Anomaly and Metric Anomaly Prediction models, respectively.

Event Grouping AI models then group the raised anomalies, along with other events and alerts that may be generated via rules, into their corresponding incident buckets, leveraging techniques such as entity linking and spatial, temporal, and topological algorithms to reduce event noise. Faults are diagnosed and localized by Fault Localization AI models. The set of impacted components is noted by Blast Radius AI models. Similar incidents from past incident records are identified, and next-best actions are derived, by Incident Similarity AI models.

Each of these is an AI model that employs different algorithms: some are deep-learning algorithms, and some are unsupervised machine-learning algorithms. The features used in all these models could benefit from a deeper understanding of the IT domain. Figure 1 shows our approach to using language models for different IT operations management prediction tasks:

Figure 1: An illustration of language models for different IT Operations management prediction tasks.

Anomaly detection from logs is a fundamental IT Operations management task that aims to detect anomalous system behaviors and find signals that provide clues to the reasons for a system’s failure. In our experiment, we tested whether anomaly detection models built with features derived from the Word Embeddings of Language Models trained on IT data outperform those built with general-purpose data.
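The article does not describe the detection algorithm itself, so the following is only one simple way embedding-derived features could feed an anomaly score, not the method used in Watson AIOps. In this sketch, each log line is represented by the average of its word vectors, and a line is scored by how far it points away from the centroid of known-normal lines. The word vectors, log lines and function names are all hypothetical.

```python
import math

# Hypothetical 2-dimensional word vectors standing in for learned embeddings.
WORD_VECTORS = {
    "request": [1.0, 0.0],
    "served": [0.9, 0.1],
    "ok": [1.0, 0.1],
    "timeout": [0.1, 1.0],
    "failed": [0.0, 0.9],
}


def line_vector(line):
    """Average the vectors of the known words in a log line."""
    vectors = [WORD_VECTORS[w] for w in line.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def anomaly_score(line, centroid):
    """Higher score means the line looks less like the normal lines."""
    vec = line_vector(line)
    if vec is None:
        return 1.0  # Assumption: a line with no known words is treated as unfamiliar.
    return 1.0 - cosine(vec, centroid)


# Centroid of lines assumed to be normal (toy data).
normal_lines = ["request served ok", "request ok"]
normal_vectors = [line_vector(line) for line in normal_lines]
centroid = [sum(v[i] for v in normal_vectors) / len(normal_vectors) for i in range(2)]

score_normal = anomaly_score("request served", centroid)
score_anomalous = anomaly_score("request timeout failed", centroid)
print(f"normal-looking line:    {score_normal:.3f}")
print(f"anomalous-looking line: {score_anomalous:.3f}")
```

The quality of such a score depends directly on the embeddings: if domain terms like pod names or exception types are missing from the vocabulary, they contribute nothing to the line vector, which is one intuition for why domain-trained embeddings might help.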

To pre-train language models in the IT Operations domain, we first process the input IT data into a normalized format using pre-defined rules that extract the most informative texts, such as log messages and ticket descriptions. We also remove duplicate texts, which the system may auto-generate multiple times for the same event. Next, we randomly sample data from each data source and use the data samples to learn the vocabulary of the IT Operations domain. After that, we pre-train the Language Model using the sampled data and tune the parameters based on model evaluation. An overview of the pre-training pipeline is shown in Figure 2:

Figure 2: The pipeline of pre-training language models using IT Operations domain data.
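The data preparation steps of this pipeline (normalize, deduplicate, sample per source, learn the vocabulary) can be sketched as follows. This is a simplified illustration under assumptions: the record layout, the rule table `TEXT_FIELD`, the whitespace tokenizer and all function names are hypothetical, and the pre-training and tuning steps are left out.

```python
import random
from collections import Counter

# Hypothetical rule table: which field holds the informative text per source.
TEXT_FIELD = {"log": "message", "ticket": "description"}


def normalize(record):
    """Extract the informative text, lowercase it and collapse whitespace."""
    field = TEXT_FIELD[record["source"]]
    return " ".join(record[field].lower().split())


def deduplicate(texts):
    """Drop repeated texts while keeping the first occurrence in order."""
    return list(dict.fromkeys(texts))


def sample_per_source(texts_by_source, k, seed=0):
    """Randomly draw up to k texts from each data source."""
    rng = random.Random(seed)
    sampled = []
    for source in sorted(texts_by_source):
        texts = texts_by_source[source]
        sampled.extend(rng.sample(texts, min(k, len(texts))))
    return sampled


def build_vocabulary(texts, min_count=1):
    """Collect whitespace-separated tokens seen at least min_count times."""
    counts = Counter(token for text in texts for token in text.split())
    return {token for token, n in counts.items() if n >= min_count}


# Toy records, invented for illustration only.
records = [
    {"source": "log", "message": "Connection refused  by pod cart-7"},
    {"source": "log", "message": "connection refused by pod cart-7"},
    {"source": "log", "message": "Disk usage above threshold"},
    {"source": "ticket", "description": "Checkout service returns timeout"},
]

texts_by_source = {}
for record in records:
    texts_by_source.setdefault(record["source"], []).append(normalize(record))
texts_by_source = {s: deduplicate(t) for s, t in texts_by_source.items()}

sample = sample_per_source(texts_by_source, k=2)
vocabulary = build_vocabulary(sample)
print(sorted(vocabulary))
```

Sampling per source, rather than from the pooled data, keeps a high-volume source such as logs from crowding lower-volume sources such as tickets out of the learned vocabulary.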

We trained a number of anomaly detection models using different pre-trained features. In Table 1, we report the accuracy of anomaly prediction on two benchmark datasets for two models. The first is a machine-learning model that uses fastText Word Embeddings trained on general-purpose data (e.g., Wikipedia and news articles) as features. The second is a machine-learning model that uses embeddings trained on diverse IT Operations domain data as features. Our experimental results indicate that the fastText model customized with IT domain logs outperforms the model built on domain-independent, general-purpose data on both datasets.

Conclusion

As IT complexity grows and the use of AI technologies expands, enterprises are looking to bring in the power of AI to transform how they develop, deploy and operate their IT. IBM Watson AIOps adopts a new approach to leverage advanced Language Models for IT Operations tasks, such as log anomaly prediction. With the power of Watson AIOps, we can accelerate the development of text-based AI models for optimizing IT Operations management tasks at a large scale.
