Summarization is the ability to condense long documents into a concise summary that captures the key points of the larger work. From a technology perspective, summarization is challenging because it requires a broad range of capabilities: comprehending long passages of text, identifying key points and topics, and generating new text that captures the intent of the larger work. Fortunately, Large Language Models (LLMs) are well-suited to these tasks. Using LLMs, architects can create solutions that spare users from having to read long documents in detail, resulting in productivity gains and more positive user experiences.

Architecture

Generative AI architecture patterns

The diagram above shows the two forms of the summarization pattern. The simplest form of the pattern is the Stuff variant. In this pattern:

The contents of a document are read and 'stuffed', i.e. copied in their entirety, into an LLM prompt.
A prompt template is commonly used to 'wrap' the content with directions and keywords to direct the target model to generate a summary.
The resulting prompt is submitted to a trained LLM which generates a summary in response.
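
The three Stuff steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the template wording, the function names, and the `llm` callable (any function that takes a prompt string and returns generated text) are all assumptions made for the example and do not correspond to a specific product API.

```python
from typing import Callable

# Hypothetical prompt template; the wording is an assumption for illustration.
SUMMARY_TEMPLATE = (
    "Summarize the following document in a few sentences, "
    "keeping the key points.\n\n"
    "Document:\n{document}\n\n"
    "Summary:"
)


def build_stuff_prompt(document: str, template: str = SUMMARY_TEMPLATE) -> str:
    """Copy the entire document into the prompt template ('stuffing')."""
    return template.format(document=document)


def summarize_stuff(document: str, llm: Callable[[str], str]) -> str:
    """Wrap the document in the template and submit the prompt to the model.

    `llm` is a placeholder for whatever client call the solution uses.
    """
    prompt = build_stuff_prompt(document)
    return llm(prompt).strip()


if __name__ == "__main__":
    # Stand-in model so the sketch runs without any external service.
    def echo_llm(prompt: str) -> str:
        return f"(summary of a {len(prompt)}-character prompt)"

    print(summarize_stuff("The claim covers water damage to the ground floor.", echo_llm))
```

Passing the model in as a callable keeps the prompt-building logic independent of any particular LLM client, which also makes the template easy to test on its own.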

The Stuff approach works well for small documents, but it doesn't work for documents too large for the LLM's context window, or for collections of documents. The Map-Reduce variant addresses these situations. In the Map phase of the variant, individual documents and/or subsections of documents are stuffed into LLM prompts using the Stuff approach. In the Reduce phase, the summaries returned for the documents and/or chunks are aggregated by the application and then submitted to an LLM (4 in the diagram) to generate an overall summary of the larger work and/or document set. The same LLM can be used for the Map and Reduce phases, but more often the Reduce model will need to be fine-tuned to generate aggregate summaries without losing key details.
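
The Map-Reduce flow described above can be sketched as follows. This is a simplified illustration under stated assumptions: chunking is done by paragraph with a character limit (real solutions usually budget by tokens), the two templates are invented for the example, and `map_llm` and `reduce_llm` are hypothetical callables that take a prompt and return text.

```python
from typing import Callable, List, Optional

# Hypothetical templates; the wording is an assumption for illustration.
MAP_TEMPLATE = "Summarize the following passage:\n\n{text}\n\nSummary:"
REDUCE_TEMPLATE = (
    "Combine the following partial summaries into one overall summary "
    "without losing key details:\n\n{summaries}\n\nOverall summary:"
)


def split_into_chunks(text: str, max_chars: int = 2000) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept as its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def summarize_map_reduce(
    text: str,
    map_llm: Callable[[str], str],
    reduce_llm: Optional[Callable[[str], str]] = None,
    max_chars: int = 2000,
) -> str:
    """Summarize each chunk (Map), then summarize the summaries (Reduce)."""
    reduce_llm = reduce_llm or map_llm  # same model for both phases by default
    chunks = split_into_chunks(text, max_chars)
    if not chunks:
        return ""
    # Map phase: each chunk is stuffed into its own prompt.
    partial = [map_llm(MAP_TEMPLATE.format(text=chunk)).strip() for chunk in chunks]
    # Reduce phase: the application aggregates the partial summaries.
    combined = "\n".join(f"- {summary}" for summary in partial)
    return reduce_llm(REDUCE_TEMPLATE.format(summaries=combined)).strip()
```

Keeping `reduce_llm` as a separate parameter reflects the point made above: the same model can serve both phases, but a solution can swap in a fine-tuned Reduce model without changing the rest of the flow.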

Conceptually, summarization is similar to a machine translation task: we want the LLM to 'translate' a long document into a shorter summary. Thus, encoder-decoder models such as BART and T5 are well-suited to summarization solutions. The majority of LLMs suitable for summarization are trained using one or more publicly available training sets drawn from sources such as news stories, Wikipedia, legislation, and scientific publications, but they will generally require fine-tuning before they can generate acceptable summaries for targeted business processes and input data.

A complex business process will typically require multiple fine-tuned models to generate summaries for different user groups. For example, an insurance claims process would potentially require LLMs fine-tuned for claims summarization and routing, for fraud detection and investigation, and for summarization of reports from service providers such as medical or engineering consultants.

Use cases

Summarization is a candidate solution pattern for any business scenario where users must routinely read and understand large documents but don't necessarily require deep knowledge of the document contents until later in the business process.

Candidate uses include:

Insurance claims adjudication. Insurance claims, particularly complex commercial and group health claims, are often read multiple times in the submission and adjudication process. Often, claims are initially read to determine the appropriate department and/or adjuster to handle the claim. Further reading is required to understand and act on independent assessment reports, to determine coverage, and to assess for potential fraud. A summarization solution that extracts the relevant points from a text has the potential to substantially improve these processes.
Contracts. Commercial contracts are often complex and difficult to understand, even for a relatively straightforward transaction. A summarization solution that can summarize the key terms and conditions of a contract in plain language could be a significant boon to business people, lawyers, and paralegals across multiple industries.
Medical abstracts. The compilation of medical abstracts from patient records is an arduous task that requires substantial expertise to perform correctly. A summarization solution that can extract the key elements of a large patient record and assist with coding records (using ICD-10 or another diagnostic coding scheme) would improve both the speed and consistency of the abstracting process.
Product and service support. Customer support staff are often called upon to pick up or jump into problem resolution efforts that can span many interactions between customers and the support team. A summarization solution that accurately summarizes a support case can reduce the time needed for support staff to come up to speed on a case and, ideally, reduce the time required to resolve cases.

Architecture Decisions and Considerations

Summarization solutions require architects to make a number of significant decisions to achieve the solution's functional and non-functional requirements.

Choice of Generation Model

As documented above, many LLMs are capable of performing text summarization 'out of the box'. If the capabilities inherent in the model meet the solution requirements, then architects must consider factors such as the model's size (which drives infrastructure requirements), quality of responses, and inferencing speed. If fine-tuning is required, then architects must also consider the amount of tuning data and the complexity of the tuning process required to tune a selected base model to their specific needs.

Evaluation Metrics

Evaluating the performance of generative AI solutions can be challenging due to the qualitative nature of their task, i.e. deciding how one generated summary is 'better' than another. Common metrics include perplexity, fluency, relevancy, and coherence, as well as BLEU and ROUGE metrics. An architect must select metrics that align with the solution's functional requirements and overall business goals.
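
To make one of these metrics concrete, the sketch below computes a simplified ROUGE-1 score, i.e. the unigram overlap between a generated summary and a human-written reference. It is illustrative only: tokenization is plain lowercase whitespace splitting (an assumption made for brevity), whereas production ROUGE implementations typically add stemming and also report variants such as ROUGE-2 and ROUGE-L.

```python
from collections import Counter
from typing import Dict


def rouge_1(candidate: str, reference: str) -> Dict[str, float]:
    """Simplified ROUGE-1: clipped unigram overlap between two texts.

    Returns precision, recall, and F1. Tokenization is lowercase
    whitespace splitting, which is a simplifying assumption.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge_1("the cat sat", "the cat sat on the mat")
    print(scores)
```

Recall rewards summaries that cover the reference content, while precision penalizes padding, so which of the two to weight more heavily depends on the solution's functional requirements.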

Resources

watsonx.ai summarize

Watch the demo to see how watsonx.ai can help transform dense text into your personalized executive overview, capturing key points from financial reports, meeting transcriptions and more.

IBM's Generative AI Architecture

The complete IBM Generative AI Architecture, available in IBM IT Architect Assistant (IIAA), an architecture development and management tool.

Contributors

Mihai Criveti, Chris Kirby

Updated: December 15, 2023