In this article, we give a brief view into how entity extraction works in Watson AIOps.

As IT systems become increasingly mission critical, companies need uninterrupted access to those systems to ensure business continuity. Automating problem detection and resolution, and preventing issues in the first place, are key to running IT operations smoothly. Existing solutions analyze data sources such as alerts, logs and metrics in silos, without providing the broader context of an incident. This makes the site reliability engineer’s (SRE’s) job more difficult.

In IBM Cloud Pak® for Watson AIOps, we link signals from multiple sources of data to generate a holistic problem context. This linking is done by extracting common ‘signatures’ from each data type. These signatures are also referred to as ‘entities.’

Entity extraction in Watson AIOps is the process of identifying key elements from various data sources, such as alerts, logs and incidents. For example, Figure 1 shows some important entities that appear in an alert payload. Entities could be server IDs, server names, pod IDs, error codes, exception messages and so on.

These entities play a key role in bringing together the disparate data sources in AIOps by breaking down the silos between them across the IT operations lifecycle. Common entities provide the links needed to piece the puzzle together and create a holistic problem context, so that SREs can diagnose a problem efficiently:

Figure 1: Example entities from an alert payload.

Watson AIOps leverages entities for the following use cases:

Use case 1: Event grouping and fault localization

Event grouping leverages entities as one of the clues to create ‘incident stories’ from a group of relevant events. The entities identified in a given incident story are then used to locate the faulty components and identify the blast radius. Figure 2 shows an example of entities extracted from an anomaly detected in logs and from an independent alert that arrived from another system via PagerDuty messaging. As you can see, both alerts refer to the same ts-travel2-mango service as the source of the anomalies. This information helps us group the two alerts together, because the likelihood that they refer to the same incident is high:

Figure 2: The role of entities in creating a story.
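
The grouping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Watson AIOps implements event grouping: the event ids, the extra entities ("HTTP 503", "ts-payment-service") and the rule "share at least one entity" are hypothetical, and a real system would combine entities with other clues.

```python
from collections import defaultdict

def group_events_by_entity(events):
    """Group event ids that share at least one extracted entity (union-find)."""
    parent = {e["id"]: e["id"] for e in events}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # entity -> first event id that mentioned it
    for e in events:
        for entity in e["entities"]:
            if entity in first_seen:
                union(first_seen[entity], e["id"])
            else:
                first_seen[entity] = e["id"]

    groups = defaultdict(list)
    for e in events:
        groups[find(e["id"])].append(e["id"])
    return sorted(groups.values())

# Hypothetical events; only ts-travel2-mango comes from the example above.
events = [
    {"id": "log-anomaly-1", "entities": {"ts-travel2-mango"}},
    {"id": "pagerduty-alert-7", "entities": {"ts-travel2-mango", "HTTP 503"}},
    {"id": "metric-alert-3", "entities": {"ts-payment-service"}},
]
stories = group_events_by_entity(events)
# The two events that mention ts-travel2-mango end up in the same story.
```

Union-find is used here so that grouping stays transitive: if event A shares an entity with B, and B shares a different entity with C, all three land in one story.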

Use case 2: Action recommendation

In addition to creating a holistic context about the current incident via entity linking, entity extraction also plays an important role in identifying a suitable problem resolution. Reducing the mean time to resolve an ongoing incident is one of the important goals of AIOps.

Extracting insights from the diagnosis and resolution actions of prior relevant incidents can help SREs in deriving a suitable set of next best actions for an ongoing incident. However, given the large amount of information buried in prior incidents, manually processing all of the prior incident ticket data can be a very tedious task for an SRE, even after narrowing down the set of related prior incident tickets to a smaller set.

Entity extraction from an ongoing incident helps in framing a query to find suitable matches among prior incident tickets. Once relevant prior incident tickets are identified, extracting entity-action phrases from those tickets further helps in retrieving the most relevant problem resolution phrases from what might be a long document filled with notes from SREs detailing the full process of incident resolution. This problem resolution phrase extraction saves time for SREs, as they don’t have to read the entire text of the retrieved ticket records to identify what action was taken to fix the problem in the past. Figure 3 shows sample entities extracted from the closing notes of an incident ticket recorded in the ServiceNow ticket management system:

Figure 3: Entity extraction from incident closing notes; Action (Red), Component (Blue).
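
As a rough illustration of how extracted entities could frame a query against prior tickets, the sketch below ranks tickets by entity overlap. The Jaccard similarity, the ticket ids and the ticket entities are assumptions made for this example; the article does not state which matching technique the product uses.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of entities."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_prior_tickets(incident_entities, prior_tickets, top_k=3):
    """Return up to top_k (score, ticket_id) pairs, best match first."""
    scored = [(jaccard(incident_entities, t["entities"]), t["id"]) for t in prior_tickets]
    scored = [s for s in scored if s[0] > 0]   # drop tickets with no shared entity
    scored.sort(key=lambda s: (-s[0], s[1]))   # highest score first, id breaks ties
    return scored[:top_k]

# Hypothetical ongoing incident and prior tickets.
incident_entities = {"ts-travel2-mango", "OutOfMemoryError"}
prior_tickets = [
    {"id": "INC001", "entities": {"ts-travel2-mango", "OutOfMemoryError", "restart"}},
    {"id": "INC002", "entities": {"ts-travel2-mango"}},
    {"id": "INC003", "entities": {"db2"}},
]
matches = rank_prior_tickets(incident_entities, prior_tickets)
```

Here INC001 ranks first because it shares both entities with the ongoing incident, while INC003 is dropped because it shares none.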

Entity extraction techniques

The nature of IT operations data is different from that of human-written data: it is a mix of machine-generated and human-generated data [1]. Because of this, we leverage a range of techniques to extract entities from IT operations data, from rule- and dictionary-based techniques to advanced natural language processing techniques.

Rule-based entity extraction

The rule-based approach leverages predefined patterns that can be captured easily using regular expressions and dictionaries. These patterns are defined by developers in advance and then used at runtime to extract entities. Entity types that can be handled via predefined rules include IP Address, Error Codes, Exception, Stack trace, File name, URL and Date/Time.
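
To make the idea concrete, here is a minimal sketch of rule-based extraction using Python's standard `re` module. The patterns are simplified assumptions written for this example (in particular, the error-code format `ABC-123` is hypothetical); they are not the rules shipped with Watson AIOps, which run on a different engine.

```python
import re

# One compiled pattern per entity type, defined in advance.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
RULES = {
    "ip_address": re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b"),
    "url": re.compile(r"https?://[^\s<>]+"),
    # Optional dotted package prefix, then a class name ending in Exception or Error.
    "exception": re.compile(r"\b(?:[A-Za-z_]\w*\.)*[A-Z]\w*(?:Exception|Error)\b"),
    # Hypothetical code format: upper-case prefix, dash, at least three digits.
    "error_code": re.compile(r"\b[A-Z]{2,}-\d{3,}\b"),
}

def extract_entities(text):
    """Apply every rule to the text and return (type, value) pairs in text order."""
    found = []
    for entity_type, pattern in RULES.items():
        for match in pattern.finditer(text):
            found.append((match.start(), entity_type, match.group(0)))
    found.sort()
    return [(entity_type, value) for _, entity_type, value in found]

log_line = ("Connection to 10.0.12.7 failed with ORA-12541: "
            "java.net.ConnectException at https://example.com/api/v1/orders")
entities = extract_entities(log_line)
```

Keeping the rules in a dictionary keyed by entity type means a new entity type can be supported by adding one pattern, without touching the extraction loop.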

In addition to these entity types, cloud-native entity types like Pod and Slot information can be captured by combining dictionaries of Application/Service names with predefined patterns. For example, in Figure 1, the pod name ts-travel2-service-75df4c5cd6-vxngm can be captured as the application name ts-travel2-service, followed by a specific pattern of alphanumeric characters in a single token. The name dictionaries are automatically populated with topological information extracted from the corresponding environments.
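
The combination of a name dictionary and a pattern might look like the following sketch. The suffix pattern (a lower-case alphanumeric block, a dash and five more characters) is an assumption based on the single pod name quoted above, and the helper names are hypothetical.

```python
import re

def build_pod_pattern(service_names):
    """Compile a pattern that matches '<known service name>-<hash>-<5 chars>'."""
    if not service_names:
        raise ValueError("the service name dictionary must not be empty")
    # Try longer names first so a full service name is preferred over its prefix.
    names = sorted(service_names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    return re.compile(
        rf"(?<![\w-])(?P<service>{alternation})"
        rf"-(?P<suffix>[a-z0-9]+-[a-z0-9]{{5}})(?![\w-])"
    )

def extract_pods(text, service_names):
    """Return each pod name found in text together with the service it belongs to."""
    pattern = build_pod_pattern(service_names)
    return [
        {"pod": m.group(0), "service": m.group("service")}
        for m in pattern.finditer(text)
    ]

# In practice the dictionary would come from topology data; this one is hand-written.
services = {"ts-travel2-service", "ts-travel2"}
pods = extract_pods("Pod ts-travel2-service-75df4c5cd6-vxngm restarted 3 times", services)
```

The lookarounds keep the match to a single token, so a shorter dictionary entry such as ts-travel2 cannot claim part of a longer pod name.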

While these rules can be written as plain regular expressions, researchers at IBM have developed a more efficient execution engine that can scale to big data. This work was done under the System-T (System-Text) project [2] at IBM Research. System-T specifies an Annotation Query Language (AQL), a rule language similar to SQL, and prescribes a specific runtime to run rules written in AQL efficiently. This has been codified in IBM Watson’s Natural Language Processing service as a library under the name oneNLP. Our rule-based entity extraction leverages the oneNLP platform to execute these regular expression-based rules in a scalable way.

Advanced NLP-based entity extraction

While rule-based approaches can tackle entities with predefined patterns, NLP-based techniques can help extract insights from unstructured data, such as incident/closing notes, resolutions and Slack conversations written by SREs.

As shown in Figure 3, entity extraction extracts the action and the component(s) the action is being performed on. Extracting these insights requires a deep understanding of the input text.

For this, entity extraction leverages the Watson NLP expanded shallow semantic parsing (ESSP) [3]. Shallow parsing analyzes a sentence by first identifying the part-of-speech tags (nouns, verbs) of its words and then linking them to higher-order units (noun phrases). Given a sentence, ESSP identifies the Agent, Verb and Theme of the sentence and their interactions. In our problem context, Verb represents an action, Agent represents who performed the action and Theme represents the component the action is being performed on. Figure 4 shows these components on a resolution text:

Figure 4: Watson NLP expanded shallow semantic parsing output for a resolution text.

Entity extraction uses this output from ESSP and further processes it to identify domain-specific action-components and their linkages using domain-specific component phrase extraction and action word dictionary generation.
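
The post-processing step can be pictured with the sketch below. It does not call Watson NLP: the `Frame` type is a hypothetical stand-in for the Agent/Verb/Theme output described above (with the verb assumed to be already lemmatized), and both dictionaries are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Frame:
    """Hypothetical container for one Agent/Verb/Theme triple from a parser."""
    verb: str
    agent: Optional[str]
    theme: Optional[str]

# Hypothetical domain dictionaries.
ACTION_WORDS = {"restart", "increase", "delete"}
COMPONENT_PHRASES = {"payment pod", "heap size"}

def link_actions(frames, action_words, component_phrases):
    """Keep frames whose verb is a known action and whose theme names a known component."""
    pairs = []
    for frame in frames:
        if frame.theme is None or frame.verb.lower() not in action_words:
            continue
        theme = frame.theme.lower()
        for component in sorted(component_phrases):
            if component in theme:
                pairs.append((frame.verb.lower(), component))
    return pairs

frames = [
    Frame(verb="restart", agent="SRE", theme="the payment pod"),
    Frame(verb="notice", agent="customer", theme="slow checkout"),
    Frame(verb="increase", agent=None, theme="the heap size"),
]
action_components = link_actions(frames, ACTION_WORDS, COMPONENT_PHRASES)
```

The frame with the verb "notice" is discarded because it does not describe a state-changing action, which is the kind of filtering the two domain-specific resources make possible.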

Domain-specific component phrase extraction

This step extracts key phrases from the documents by analyzing NLP features, such as part-of-speech tags. It then leverages linguistic and statistical features, such as document-level relevance metrics, to find relevant phrases, and uses general-domain phrases extracted from various knowledge sources to filter out generic phrases [4].
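
A simplified version of the ranking and filtering step is sketched below. It assumes the candidate phrases have already been extracted per document, uses a smoothed TF-IDF score as a stand-in for the relevance metrics (the article does not name the exact features), and uses a hand-written generic phrase list.

```python
import math
from collections import Counter

def rank_component_phrases(docs_candidates, generic_phrases):
    """Score candidate phrases with a smoothed TF-IDF and drop generic ones.

    docs_candidates: one list of candidate phrases per document.
    """
    n_docs = len(docs_candidates)
    term_freq = Counter(p for doc in docs_candidates for p in doc)
    doc_freq = Counter(p for doc in docs_candidates for p in set(doc))
    scores = {}
    for phrase, count in term_freq.items():
        if phrase in generic_phrases:
            continue  # filtered out as a general-domain phrase
        idf = math.log((1 + n_docs) / (1 + doc_freq[phrase])) + 1.0
        scores[phrase] = count * idf
    # Highest score first; alphabetical order breaks ties.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical candidates from three incident tickets.
candidates = [
    ["connection pool", "root cause", "connection pool"],
    ["heap size", "root cause"],
    ["connection pool", "next steps"],
]
generic = {"root cause", "next steps"}
ranked = rank_component_phrases(candidates, generic)
```

In this toy input, "root cause" and "next steps" are removed as generic, leaving "connection pool" and "heap size" as domain-specific component phrases.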

Action-word dictionary generation

Entity extraction defines an action as the process of changing something that results in a state change (e.g., restart and increase). To capture actions specific to the IT operations domain, we generate and curate the relevant dictionaries using a domain-specific corpus.

Learn more

Visit the IBM Cloud Pak for Watson AIOps website to learn more.

Read the blog: “Leveraging Log Data for Incident Management in AIOps.”

References

[1] Rama Akkiraju, Ruchir Puri. “Implications of training machine learning models from machine-generated data and human-authored data.”

[2] Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, Shivakumar Vaithyanathan, and Huaiyu Zhu. 2008. “SystemT: A System for Declarative Information Extraction.” SIGMOD Record 37, 7–13. doi:10.1145/1519103.1519105.

[3] Zhu, Huaiyu, Yunyao Li, and Laura Chiticariu. “Towards universal semantic representation.” Proceedings of the First International Workshop on Designing Meaning Representations. 2019.

[4] Prateeti Mohapatra, Yu Deng, Abhirut Gupta, Gargi Dasgupta, Amit Paradkar, Ruchi Mahindru, Daniela Rosu, Shu Tao, and Pooja Aggarwal. 2018. Domain Knowledge Driven Key Term Extraction for IT Services. In International Conference on Service-Oriented Computing. Springer, 489–504.
