18 December 2024
In this tutorial, you will use IBM® Docling and open source Granite™ 3.1 to perform document visual question answering for various file types.
Docling is an IBM open source toolkit for parsing documents and exporting them to preferred formats. Supported input formats include PDF, DOCX, PPTX, XLSX, images, HTML, AsciiDoc and Markdown. Documents can be exported to Markdown or JSON. Docling also provides optical character recognition (OCR) support for scanned documents. Use cases include scanning medical records, banking documents and even travel documents for quicker processing.
Retrieval augmented generation (RAG) is an architecture for connecting large language models (LLMs) with external knowledge bases without fine-tuning or retraining. Text is split into chunks, embedded and stored in a vector database. At query time, the chunks most relevant to the question are retrieved and passed to the pretrained model as context, so that it can return relevant information for natural language processing (NLP) and machine learning tasks.
When an LLM has a larger context window, the generative AI model can process more information at once. Combining RAG with a large context window therefore lets us pass more of the retrieved, relevant text to the model in a single prompt. The LLM we use in this tutorial is the IBM Granite-3.1-8B-Instruct model, which supports a context window of up to 128K tokens. We will run the model locally by using Ollama, without calling a hosted API. This model is also available on Hugging Face.
This tutorial can be found on our GitHub in the form of a Jupyter Notebook. Jupyter Notebooks are widely used within data science to combine code, text, images and data visualizations in a single, reproducible analysis.
We first need to set up our environment by fulfilling some prerequisites.
We will work with various document formats in this tutorial. Let's create a helper function to detect document formats by using the file extension.
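A minimal sketch of such a helper, using only the standard library. The function name get_document_format, the SUPPORTED_FORMATS mapping and the format labels it returns are assumptions made for illustration; they are not Docling identifiers, and the list of image extensions is not exhaustive.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension-to-format map covering the input types listed above.
# The labels on the right are illustrative names, not Docling constants.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".xlsx": "xlsx",
    ".html": "html",
    ".htm": "html",
    ".adoc": "asciidoc",
    ".asciidoc": "asciidoc",
    ".md": "markdown",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".tif": "image",
    ".tiff": "image",
    ".bmp": "image",
}


def get_document_format(file_path) -> Optional[str]:
    """Return a format label based on the file extension, or None if unsupported."""
    suffix = Path(file_path).suffix.lower()  # case-insensitive match on the extension
    return SUPPORTED_FORMATS.get(suffix)
```

Returning None for unknown extensions lets the caller decide whether to skip the file or raise an error before any conversion is attempted.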
Next, we can use Docling's DocumentConverter class to create a function that converts any supported document to Markdown. Docling identifies the text, data tables, document images and captions in the file. The function takes a file as input, converts it with Docling, exports the result to Markdown and saves it in a Markdown file. Both scanned and text-based documents are supported, and the document structure is preserved.
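The conversion step described above could be sketched as follows. This is an illustration under stated assumptions, not necessarily the notebook's exact code: the function names, the "_converted.md" naming scheme (inferred from the sample output) and the choice to write the file next to the input are assumptions. DocumentConverter, convert and export_to_markdown are the Docling calls used.

```python
from pathlib import Path


def markdown_output_path(input_path: Path) -> Path:
    """Derive the output file name, e.g. ibmredbook.pdf -> ibmredbook_converted.md."""
    return input_path.with_name(f"{input_path.stem}_converted.md")


def convert_document_to_markdown(doc_path) -> Path:
    """Convert a supported document to Markdown with Docling and save it to disk."""
    # Imported here so the path helper above works even without Docling installed.
    from docling.document_converter import DocumentConverter

    input_path = Path(doc_path)
    output_path = markdown_output_path(input_path)  # assumption: saved beside the input

    print(f"Converting document: {input_path.name}")
    converter = DocumentConverter()

    print("Starting conversion...")
    result = converter.convert(str(input_path))

    print("Exporting to markdown...")
    markdown = result.document.export_to_markdown()

    print(f"Writing markdown to: {output_path.name}")
    output_path.write_text(markdown, encoding="utf-8")
    return output_path
```

Returning the output path makes it easy to hand the Markdown file straight to the next stage of the pipeline.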
The QA chain is the heart of our system. It combines several components: document loading, text splitting, a vector store and a language model.
The following setup_qa_chain function sets up this entire pipeline.
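One way such a function might look is sketched below. The text above does not name an orchestration library, so the use of LangChain here is an assumption, as are the FAISS vector store, the Ollama model tags ("nomic-embed-text" for embeddings, "granite3.1-dense:8b" for the LLM) and the chunk size, overlap and top_k defaults. The sketch also assumes a LangChain release in which ConversationalRetrievalChain is importable from langchain.chains, plus the langchain-community, langchain-ollama, langchain-text-splitters and faiss-cpu packages.

```python
from pathlib import Path


def setup_qa_chain(
    markdown_path,
    embeddings_model: str = "nomic-embed-text",  # assumed Ollama embedding model
    llm_model: str = "granite3.1-dense:8b",      # assumed Ollama tag for Granite 3.1 8B
    chunk_size: int = 500,                       # illustrative default
    chunk_overlap: int = 50,                     # illustrative default
    top_k: int = 10,                             # chunks retrieved per question
):
    """Build a conversational retrieval chain over one Markdown file."""
    # Imported lazily so this module can be loaded without the optional packages.
    from langchain.chains import ConversationalRetrievalChain
    from langchain.memory import ConversationBufferMemory
    from langchain_community.vectorstores import FAISS
    from langchain_ollama import OllamaEmbeddings, OllamaLLM
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Document loading: read the Markdown produced by the conversion step.
    text = Path(markdown_path).read_text(encoding="utf-8")

    # Text splitting: overlapping chunks keep context across chunk boundaries.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    chunks = splitter.create_documents([text])

    # Vector store: embed every chunk and index it for similarity search.
    embeddings = OllamaEmbeddings(model=embeddings_model)
    vectorstore = FAISS.from_documents(chunks, embeddings)

    # Language model: served locally by Ollama.
    llm = OllamaLLM(model=llm_model, temperature=0)

    # Memory stores earlier turns so follow-up questions can refer back to them.
    memory = ConversationBufferMemory(
        memory_key="chat_history", output_key="answer", return_messages=True
    )

    return ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(search_kwargs={"k": top_k}),
        memory=memory,
        return_source_documents=True,
    )
```

Both Ollama models need to be pulled locally before the chain is built. The four commented stages map one-to-one onto the components listed above.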
Finally, let's create a simple interface for asking questions. This function takes in the chain and user query as parameters.
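A minimal sketch of that interface follows. It assumes the chain object exposes an invoke method that accepts a dictionary with a "question" key and returns a mapping with an "answer" key, which is how LangChain's ConversationalRetrievalChain behaves; the function name ask_question is illustrative.

```python
def ask_question(qa_chain, question: str) -> str:
    """Send one question to the chain, print the exchange and return the answer."""
    # Assumption: the chain takes {"question": ...} and returns {"answer": ...}.
    result = qa_chain.invoke({"question": question})
    answer = result["answer"]

    print(f"Question: {question}")
    print(f"Answer: {answer}")
    return answer
```

Because the function only relies on invoke, any object with the same input and output keys can stand in for the real chain during testing.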
Let's put it all together and iterate over our questions for a specific document. The path to this document is stored in doc_path and can point to any document you want to test. For our sample document, check out our GitHub. The system maintains conversation history, so it can handle follow-up questions.
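The driver loop might be sketched as below. To keep the example self-contained, the conversion, chain setup and question steps are passed in as callables; run_document_qa and its parameter names are hypothetical. The three questions are the ones that appear in the sample run.

```python
from pathlib import Path

# The questions asked in the sample run.
questions = [
    "What is the main topic of this document?",
    "What are the key points discussed?",
    "Can you summarize the conclusions?",
]


def run_document_qa(doc_path, questions, convert, build_chain, ask):
    """Convert one document, build a single QA chain for it and ask each question.

    convert:     callable taking a Path and returning the Markdown file path
    build_chain: callable taking the Markdown path and returning a QA chain
    ask:         callable taking (chain, question) and returning the answer
    """
    markdown_path = convert(Path(doc_path))

    # Build the chain once so conversation history carries across questions.
    qa_chain = build_chain(markdown_path)

    answers = []
    for question in questions:
        answers.append(ask(qa_chain, question))
    return answers
```

Reusing one chain for all questions is what allows a follow-up such as "Can you summarize the conclusions?" to build on the earlier answers.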
Output:
Converting document: ibmredbook.pdf
Starting conversion...
Exporting to markdown...
Writing markdown to: ibmredbook_converted.md
Question: What is the main topic of this document?
Answer: The main topics covered in this document are building and managing containers using Red Hat OpenShift, deploying applications, and security aspects related to containerization on IBM Power systems. The document also includes an introduction to Red Hat OpenShift, its benefits, core concepts, and implementation on IBM Power. Additionally, it discusses multi-architecture containerization, monitoring tools and techniques, log management, performance tuning, and optimization.
Question: What are the key points discussed?
Answer: This document primarily focuses on implementing Red Hat OpenShift Container Platform on IBM Power systems for managing containers in a hybrid cloud environment. Key points covered include:
Introduction to Red Hat OpenShift: It is an enterprise Kubernetes platform that extends Kubernetes with additional features and tools, enhancing productivity and security for businesses using container technology at scale.
Benefits of Using Red Hat OpenShift for Container Orchestration: The document highlights the advantages of employing Red Hat OpenShift for managing containers, such as its comprehensive solution for hybrid cloud environments, including a container runtime, networking, monitoring, a container registry, authentication, and authorization.
Minimum IBM Power Requirements: Red Hat OpenShift Container Platform 4.15 can be installed on IBM Power 9 or IBM Power 10 processor-based systems.
Deploying Red Hat OpenShift on IBM Power Systems: This involves tailoring the networking infrastructure to leverage the robust capabilities and unique architecture of Power Systems, optimizing network performance for high throughput and low latency, ensuring network security and compliance, and managing network configurations for enterprise-level deployments.
Optimizing Network Performance: The document emphasizes the importance of faster storage, particularly for etcd on control plane nodes, as Red Hat OpenShift Container Platform is sensitive to disk performance.
Multi-Architecture Containerization: The text discusses key concepts in multi-architecture containerization and provides guidelines for implementing it using IBM Power control planes. It also addresses challenges and solutions related to multi-architecture containerization.
Security Aspects: Enterprise-grade security is mentioned as a crucial aspect of Red Hat OpenShift on IBM Power systems, although specific details are not provided in the given context.
Monitoring Tools and Log Management: The document does not explicitly mention monitoring tools or log management; however, it can be inferred that these aspects are covered within the broader context of managing containers using Red Hat OpenShift on IBM Power systems.
Performance Tuning and Optimization: While specific tuning and optimization techniques are not detailed in the provided context, the document implies that performance considerations should be taken into account during deployment and configuration.
Question: Can you summarize the conclusions?
Answer: The document discusses the implementation of Red Hat OpenShift Container Platform on IBM Power systems for managing containers in a hybrid cloud environment. Here are the main conclusions drawn from this document:
Benefits: Red Hat OpenShift provides a comprehensive solution for hybrid cloud environments, encompassing essential components such as a container runtime, networking, monitoring, a container registry, authentication, and authorization. It extends Kubernetes with additional features and tools to enhance productivity and security, making it an ideal choice for businesses looking to leverage container technology at scale.
Minimum Requirements: Red Hat OpenShift Container Platform 4.15 can be installed on IBM Power 9 or IBM Power 10 processor-based systems. For comprehensive guidance and further information on installation and configuration, refer to the IBM Redbooks publication Implementing, Tuning, and Optimizing Workloads with Red Hat OpenShift on IBM Power (SG24-8537) and Red Hat OpenShift Documentation.
Deployment Process: Deploying Red Hat OpenShift on IBM Power Systems involves tailoring the networking infrastructure to fully leverage the robust capabilities and unique architecture of Power Systems. This includes optimizing network performance for high throughput and low latency, ensuring network security and compliance, and managing network configurations to meet enterprise-level demands.
Network Performance Optimization: Faster storage is recommended, particularly for etcd on control plane nodes. On many cloud platforms, storage size and IOPS scale together, so you might need to over-allocate storage volume to obtain sufficient performance.
Multi-Architecture Containerization: Red Hat OpenShift supports multiple architectures (x86 and IBM Power) with RHOS 4.14 or later, simplifying the management of your Red Hat OpenShift environment on both x86 and IBM Power servers.
Security Aspects: The integration of Red Hat OpenShift running on IBM Power servers with existing infrastructure involves strategic networking solutions that bridge on-premises systems with your new cloud environment. This enables organizations to leverage the strengths of both infrastructures for enhanced flexibility, scalability, and resilience while ensuring network security and compliance.
Performance Tuning: The document does not provide specific details about performance tuning; however, it is mentioned that optimizing network performance for high throughput and low latency is essential. For comprehensive guidance on performance tuning, refer to the IBM Redbooks publication Implementing, Tuning, and Optimizing Workloads with Red Hat OpenShift on IBM Power (SG24-8537) and Red Hat OpenShift Documentation.
In summary, this document highlights that implementing Red Hat OpenShift Container Platform on IBM Power systems offers a robust foundation for developing, deploying, and scaling cloud-native applications in a hybrid cloud environment. It emphasizes the importance of optimizing network performance, ensuring security, and leveraging multi-architecture containerization capabilities to create an efficient and flexible solution for managing containers.
Great! The system was able to retrieve relevant information from the document to answer questions. Feel free to test this system with any of your own files and questions!
Using Docling and Granite 3.1, you built a document question answering system compatible with various file types. As a next step, this methodology can be applied to a chatbot with an interactive UI. There are many opportunities to adapt this tutorial to specific use cases.