When you need to streamline the ingestion of structured and unstructured data, the enrichment capabilities in IBM® Watson® Discovery Service are a major advantage. The reason: its built-in enrichment algorithms can process your data sources and help find signal at the implicit level as well as the explicit level.

This means Discovery is not your traditional keyword-based enrichment tool. Rather, the information you can extract from unstructured text might be something that wasn’t even referenced explicitly in the text.

Most important, this capability is automated—and built into Discovery. That means taking advantage of this power doesn’t require you to write code or install the plumbing. So it takes less time to find signal, saving money and other resources.

Built-in enrichments

As an example, the sentiment analysis API already exists in IBM Watson. It lets you find out whether people are talking positively or negatively about a person, a company or some other entity.

Likewise, other powerful APIs are also available in Watson. The concept tagging API, which can extract an overarching concept that connects explicitly mentioned text, already exists. Watson also includes the entity tagging API, which can tie pronouns back to something that was previously referenced in the text to create a connection that would otherwise not exist.

The data crawler functions like a standalone program that uploads all of the files in a directory, or set of directories—from a local machine or a network file system. Further, Discovery includes connectors that talk to the different data repositories.
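As a rough illustration of the crawling pattern described above, the sketch below walks one or more directories and hands each file to an upload step. It is not the Discovery data crawler itself: the `crawl` function and the `upload` callback are hypothetical names, and the callback stands in for whatever actually sends a file to the service.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def crawl(roots: Iterable[str], upload: Callable[[Path], None]) -> List[Path]:
    """Walk each root directory and pass every regular file to `upload`.

    `upload` is a hypothetical callback standing in for the step that
    sends a file to the ingestion service; it is not a Discovery API call.
    """
    sent: List[Path] = []
    for root in roots:
        # rglob("*") visits the whole tree; sorting keeps the order stable.
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                upload(path)
                sent.append(path)
    return sent
```

Calling `crawl(["./catalog", "./reviews"], print)` would simply print each file path found, which is a convenient dry run before wiring in a real upload step. The directory names here are placeholders.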

The difficulty is that constructing the plumbing with the API calls and associated coding can be time-consuming and costly. And the learning curve associated with building the code infrastructure may be daunting. Discovery comes with that plumbing already built, so these enrichments are ready to use.

Additional enrichment tools

But Discovery Service enrichment capabilities are not limited to the automatic natural language processing on your documents. As your solution grows, you can use the Natural Language Understanding service available in IBM Bluemix to enrich content that's outside of your data pipeline.

You can also improve the accuracy and specificity of your enrichments using custom models created from IBM Watson Knowledge Studio. So if you want to train your own custom model based on a specific domain such as legal or finance, now you can.

You would just use Watson Knowledge Studio to create custom models, then publish them into Discovery so they can act on the ingested data. The same custom models can be applied to Natural Language Understanding enrichments as well, so you can improve both enrichment functions.

Other data processing capabilities

Like a print preview in a Microsoft Word document, the preview API returns the results of the enrichment to the user rather than sending them to the search index. This gives you a sandbox to evaluate an enrichment and test different configurations. Once you are satisfied with the setup, you can switch to the regular ingestion API and send the results to the search index.
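The difference between previewing and ingesting can be sketched in a few lines. This is a toy model under stated assumptions, not the Discovery API: `preview`, `ingest`, `enrich` and `toy_sentiment` are hypothetical names, the index is a plain list, and the sentiment step is a trivial placeholder rather than a real enrichment.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Enricher = Callable[[Document], Document]


def enrich(doc: Document, enrichers: List[Enricher]) -> Document:
    """Apply each enrichment step in order to a copy of the document."""
    result = dict(doc)
    for step in enrichers:
        result = step(result)
    return result


def preview(doc: Document, enrichers: List[Enricher]) -> Document:
    """Return the enriched document to the caller; no index is touched."""
    return enrich(doc, enrichers)


def ingest(doc: Document, enrichers: List[Enricher], index: List[Document]) -> None:
    """Run the same enrichments, but store the result in the index."""
    index.append(enrich(doc, enrichers))


def toy_sentiment(doc: Document) -> Document:
    """Placeholder enrichment: labels text containing 'love' as positive."""
    label = "positive" if "love" in str(doc["text"]).lower() else "unknown"
    return {**doc, "sentiment": label}
```

Both paths run the same enrichment steps; only the destination differs. That is what makes a preview a safe place to compare configurations before anything reaches the search index.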

With the enrichments in place, an ingested document can be normalized. As part of normalization, the data structure in Discovery gives you flexibility in how you’d like to represent the data before it is indexed.

This flexibility exists because Discovery can represent hierarchical relationships such as parent-child or sibling-sibling in the data structure. This isn’t possible in a traditional flat file structure. And this capability allows query options that otherwise couldn’t be executed.
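To see why a parent-child structure enables queries that a flat one cannot answer, consider the small sketch below. The records and field names (`variants`, `colors`, `sizes`) are invented for illustration and are not Discovery's schema.

```python
from typing import Dict, List

# Hypothetical product record with child "variants" (illustrative only).
nested_product: Dict[str, object] = {
    "name": "Wool scarf",
    "variants": [
        {"color": "red", "size": "S"},
        {"color": "blue", "size": "M"},
    ],
}

# The same record flattened: the link between a color and its size is lost.
flat_product: Dict[str, object] = {
    "name": "Wool scarf",
    "colors": ["red", "blue"],
    "sizes": ["S", "M"],
}


def has_variant(product: Dict[str, object], color: str, size: str) -> bool:
    """True only if a single child variant matches both color and size."""
    variants: List[Dict[str, str]] = product["variants"]  # type: ignore[assignment]
    return any(v["color"] == color and v["size"] == size for v in variants)


def flat_match(product: Dict[str, object], color: str, size: str) -> bool:
    """Best a flat record can do: check each field independently."""
    return color in product["colors"] and size in product["sizes"]  # type: ignore[operator]
```

Asking for a red scarf in size M shows the difference: `has_variant(nested_product, "red", "M")` is false because no single variant has both properties, while `flat_match(flat_product, "red", "M")` reports a match that does not really exist.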

So let’s revisit our fictitious online reseller to see what this means in practice. Discovery has already ingested multiple Blue Snail Style data sources, including the highly descriptive product listings from the Blue Snail Style website and the customer reviews.

Now the IT staff begins experimenting with the previously ingested unstructured data sources using the built-in enrichment algorithms, and using the preview API to evaluate the results. And Discovery allows them to see relationships between the different product listings that pure text-based processing couldn’t. For instance:

  • The online catalog reveals 81 percent of the company’s products are machine washable.
  • The hierarchical data structure finds that the product offerings consist of 49 percent men’s clothing, 32 percent women’s clothing and 19 percent unisex products.
  • The customer reviews reveal the highest ranked products have red as a color.
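
Breakdowns like the ones above amount to counting documents by the value of an enriched field. The following is a minimal sketch of that kind of aggregation. The `share_by_field` helper and the four-item `toy_catalog` are made up for illustration; they are neither Discovery's aggregation API nor Blue Snail Style data.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def share_by_field(docs: Iterable[Mapping[str, str]], field: str) -> Dict[str, int]:
    """Return the rounded percentage of documents for each value of `field`."""
    counts = Counter(doc[field] for doc in docs)
    total = sum(counts.values())
    return {value: round(100 * n / total) for value, n in counts.items()}


# Made-up records, used only to show the shape of the input.
toy_catalog = [
    {"department": "men"},
    {"department": "men"},
    {"department": "women"},
    {"department": "unisex"},
]

print(share_by_field(toy_catalog, "department"))
```

In a real deployment the search service would compute this grouping at query time; the sketch only shows the idea.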
