Do you have vast numbers of digital documents (such as PDF files, patents and corporate documents) piled up in your organization, more than any person could read and digest? Wouldn’t it be nice if you could query these document piles with questions like “List all the materials claimed by company X in the US patent office”? Deep search is an IBM Research® service that automatically analyzes enormous digital libraries and helps uncover previously unknown facts. It uses an AI-based approach to enable intelligent querying of document repositories. This capability has been shown to aid innovation in industries such as materials science, insurance and drug discovery.

Figure 1: IBM deep search service

How does deep search work? First, as shown in figure 1, the digital documents are segmented into their components (heading, introduction, references and so on) using machine learning models, and are converted into structured data representations (such as HTML or JSON). These supervised learning models are customizable and highly accurate; they are trained on very large data sets and use modern neural network topologies.
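To make the first step more concrete, here is a minimal sketch of what a structured representation of a segmented document could look like. It is not the deep search implementation: the `Segment` class, the `to_structured` function, the file name and the sample text are all hypothetical, and the labels are hard-coded where a real system would obtain them from a trained model.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    """One labeled piece of a document (hypothetical structure)."""
    label: str  # component type, e.g. "heading" or "references"
    text: str   # text content of the component
    page: int   # page on which the component appears


def to_structured(doc_name, segments):
    """Group labeled segments by component type into a JSON-serializable dict."""
    components = {}
    for seg in segments:
        components.setdefault(seg.label, []).append(
            {"page": seg.page, "text": seg.text}
        )
    return {"document": doc_name, "components": components}


# Labels are hard-coded here; in practice a segmentation model would assign them.
segments = [
    Segment("heading", "Corrosion-resistant alloy", 1),
    Segment("introduction", "This patent describes a new alloy.", 1),
    Segment("references", "[1] Example reference entry.", 7),
]

structured = to_structured("patent_001.pdf", segments)
print(json.dumps(structured, indent=2))
```

Keeping each component under its own key means later steps can target, say, only the claims or only the references of every document.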

The second step of deep search uses existing data sources (corporate databases, publicly available data sets and the like) to identify the concepts (such as alloy or material) and the relationships that are relevant to the knowledge discovery task. Finally, a searchable and queryable knowledge base is built by linking the structured representations of the documents to the identified concepts and relationships.
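The linking idea can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the service's actual method: `CONCEPTS`, `build_knowledge_base` and `materials_for_company` are hypothetical names, the concept dictionary stands in for terms drawn from existing data sources, and simple co-occurrence in a document stands in for real relationship extraction.

```python
from collections import defaultdict

# Hypothetical concept dictionary, standing in for terms taken from
# corporate databases or public data sets.
CONCEPTS = {
    "material": ["titanium alloy", "graphene"],
    "company": ["Company X", "Company Y"],
}


def build_knowledge_base(documents):
    """Map each (concept type, concept name) to the documents that mention it."""
    edges = defaultdict(set)
    for doc_id, text in documents.items():
        lowered = text.lower()
        for ctype, names in CONCEPTS.items():
            for name in names:
                if name.lower() in lowered:
                    edges[(ctype, name)].add(doc_id)
    return edges


def materials_for_company(edges, company):
    """List materials that share at least one document with the company."""
    company_docs = edges.get(("company", company), set())
    return sorted(
        name
        for (ctype, name), doc_ids in edges.items()
        if ctype == "material" and doc_ids & company_docs
    )


documents = {
    "p1": "Company X claims a titanium alloy for turbine blades.",
    "p2": "Company Y claims graphene coatings.",
    "p3": "Company X also claims graphene electrodes.",
}

kb = build_knowledge_base(documents)
print(materials_for_company(kb, "Company X"))
```

A query such as "list all the materials claimed by company X" then becomes a lookup over the links rather than a full-text scan of every document.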

Deep search in action

The document processing techniques coupled with the graph analytics provided by deep search can accelerate novel discoveries from document repositories across industries. The chemical company Nagase & Co has put deep search to extensive use in developing new compounds. ENI, an oil and gas company, is using the service for upstream exploration. Currently, deep search is also aiding drug discovery in COVID-19 research.

Knowledge discovery at scale

In addition to the knowledge engineering techniques described above, automatic analysis of a huge number of documents demands powerful storage, compute and network infrastructure. The deep search platform is currently available as a service through Red Hat® OpenShift® on IBM Cloud®. It can also be set up on your premises in an OpenShift environment on IBM Power Systems or on Intel x86 servers. The software is designed as a group of cloud-based microservices that scale with the number of documents and the available hardware resources, which suits large search applications. This hardware-software codesigned platform has demonstrated the capability to ingest as many as 100,000 pages per day per core.
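For rough capacity planning, the quoted ingestion rate can be turned into a back-of-the-envelope estimate. The sketch below assumes that throughput scales linearly with the number of cores and that the peak rate is sustained, neither of which the article guarantees; the function name and the example workload are hypothetical.

```python
import math

# Upper figure quoted for the platform: pages ingested per day per core.
PAGES_PER_DAY_PER_CORE = 100_000


def days_to_ingest(total_pages, cores):
    """Estimate whole days needed, assuming linear scaling across cores."""
    if total_pages < 0 or cores <= 0:
        raise ValueError("total_pages must be >= 0 and cores must be > 0")
    return math.ceil(total_pages / (PAGES_PER_DAY_PER_CORE * cores))


# Example workload (hypothetical): 50 million pages on 16 cores.
estimate = days_to_ingest(50_000_000, 16)
print(f"Estimated ingestion time: {estimate} days")
```

Treat the result as a lower bound on elapsed time, since real workloads rarely hit the peak rate on every core.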

IBM Systems Lab Services can help your organization make better use of document repositories using the deep search platform. Our experienced consultants help you set up the OpenShift platform, work with your subject matter experts to build the knowledge bases and design queries to help you develop novel insights into your digital libraries.

>> Contact Lab Services today.
