Published: 18 September 2024
Contributors: Ivan Belcic, Cole Stryker
Generative pretrained transformers (GPTs) are a family of advanced neural networks designed for natural language processing (NLP) tasks. These large language models (LLMs) are based on the transformer architecture and subjected to unsupervised pretraining on massive unlabeled datasets.
GPT models form the foundation of many generative AI applications such as ChatGPT. Like many forms of AI, GPT is designed to automate tasks with the goal of simulating human-created output.
AI research firm OpenAI introduced the first GPT model, dubbed GPT-1, in 2018. Since then, it has released several advances in the GPT line of AI models. The most recent GPT model is GPT-4, which was released in early 2023. In May 2024, OpenAI announced the multilingual and multimodal GPT-4o [1], capable of processing audio, visual and text inputs in real time.
As a foundation model, GPT has undergone subsequent fine-tuning and been adapted to a wide range of specific downstream tasks. Beyond text-based applications, GPT powers artificial intelligence (AI) apps that generate and analyze images through computer vision, write code, process data and more. These apps connect to GPT through application programming interfaces (APIs), which allow them to pass data back and forth.
GPT models have accelerated generative AI development thanks to their transformer architecture, a type of neural network introduced in 2017 in the Google Brain paper Attention Is All You Need [2]. Transformer models including GPT and BERT have powered many notable developments in generative AI since then, with OpenAI’s ChatGPT chatbot taking center stage.
In addition to OpenAI, other firms have released their own generative AI models, including Anthropic’s Claude, Inflection's Pi and Google’s Gemini, previously known as Bard. Meanwhile, OpenAI powers Microsoft’s Copilot AI service.
The flexibility of transformer models such as GPT lends them to a wide range of use cases. GPT’s ability to generate humanlike text makes it a popular choice for:
GPT-powered chatbots can feel more humanlike than standard automated customer service options. Through APIs, organizations can link GPT with voice apps to create voice assistants able to respond to more complex statements and provide conversational question-answering services.
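As a minimal sketch, passing a customer query to a GPT model through OpenAI’s API might look like the following. It assumes the official openai Python SDK and an API key set in the environment; the model name, system message and query are placeholders, not a prescribed configuration.

```python
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set.
client = OpenAI()

# A hypothetical customer-service exchange; the system message constrains tone.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name; substitute whichever model you use
    messages=[
        {"role": "system", "content": "You are a concise, friendly support assistant."},
        {"role": "user", "content": "My order #1234 hasn't arrived. What are my options?"},
    ],
)

print(response.choices[0].message.content)
```

A voice assistant would wrap the same kind of call with speech-to-text on the way in and text-to-speech on the way out.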
With effective prompts, GPT models can generate text content ranging from short-form social media copy to complete blog posts and emails. Also, writers can use GPTs to outline or ideate content they then write themselves, streamlining content creation workflows.
Using GPT to generate content directly for publishing might lead to intellectual property concerns—one of the chief risks of using GPT.
GPT-powered apps can translate language in real time from both written and audio sources. In a live demo [3], GPT-4o demonstrated an ability to translate in real time on its own.
GPT can process and summarize lengthy documents, such as legal statements or business reports. It can also rewrite content in the style specified by the user. For example, a user could provide a quarterly report as input data, then request that it be summarized in witty bullet points.
GPT can process large volumes of data into digestible insights. Through APIs, other apps can use GPT to create charts, graphs and other types of data visualizations. Organizations feeding internal data into GPT might expose themselves to cybersecurity breaches or violate data protection regulations.
GPT models can learn programming languages and generate code snippets. Users typically enjoy better results when treating GPT as a coding assistant rather than asking it to build complete apps from scratch. All GPT-generated content, including code, should be reviewed before use to help ensure accuracy and fair use.
In February 2024, the US National Library of Medicine (link resides outside ibm.com) released a paper outlining potential GPT applications in the healthcare space. These include consistent access for patients in remote areas as well as personalized care options. However, the paper also covers a range of downsides, such as privacy concerns and knowledge limitations.
GPT models work by analyzing an input sequence and applying complex mathematics to predict the most likely output. They use probability to identify the best possible next word in a sentence, based on all of the previous words. As a type of deep-learning AI technology, GPTs can process natural language prompts to generate relevant, humanlike text responses.
When a user inputs a text-based prompt, GPT creates the most likely response based on its training data, which comprises vast quantities of publicly available text ranging from famous literary works to open source code.
The vastness of these training datasets is what enables GPT to mimic humanlike language understanding. Large-scale GPT models apply deep learning to process context and draw on relevant text in their training data to predict the optimal response.
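The prediction step can be illustrated with a toy sketch: the model assigns a score to every candidate token, the scores are turned into probabilities, and the most likely token is appended to the sequence before the process repeats. The vocabulary and scores below are invented purely for illustration.

```python
import numpy as np

# Toy vocabulary and made-up logits (raw scores) for the next token,
# as if the model had just read "The chicken came before the".
vocab = ["egg", "road", "chicken", "."]
logits = np.array([4.2, 1.3, 0.5, -1.0])

# Softmax turns raw scores into a probability distribution.
probs = np.exp(logits) / np.exp(logits).sum()

# Greedy decoding picks the single most likely next token.
next_token = vocab[int(np.argmax(probs))]
print(dict(zip(vocab, probs.round(3))), "->", next_token)  # "egg" wins
```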
The power of GPT models comes from two key aspects:
Generative pretraining that teaches the model to detect patterns in unlabeled data, then apply those patterns to new inputs.
A transformer architecture that enables the model to process all portions of an input sequence in parallel.
Generative pretraining is the process of training a large language model on unlabeled data, teaching the model to recognize patterns in that data and honing its ability to make accurate predictions. GPTs generate new data by applying the patterns and structure of their pretraining data to user inputs.
Generative pretraining is a form of unsupervised learning, where the model is fed unlabeled data and forced to make sense of it on its own. By learning to detect patterns in unlabeled datasets, machine learning models gain the ability to draw similar conclusions when exposed to new inputs, such as a user prompt in ChatGPT.
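A minimal sketch of this pretraining objective, assuming PyTorch: the "label" for each position is simply the next token in the same unlabeled text, so no human annotation is required. The random data and the tiny stand-in model below are placeholders, not a real GPT.

```python
import torch
import torch.nn.functional as F

# Suppose `token_ids` is a batch of unlabeled text, already tokenized:
# shape (batch, sequence_length). The model predicts token t+1 from tokens 1..t.
token_ids = torch.randint(0, 50_000, (8, 128))     # random stand-in data
inputs, targets = token_ids[:, :-1], token_ids[:, 1:]

# In practice `model` would be a decoder-only transformer returning logits of
# shape (batch, sequence_length - 1, vocab_size); here it is just a placeholder.
model = torch.nn.Sequential(
    torch.nn.Embedding(50_000, 256),
    torch.nn.Linear(256, 50_000),
)

logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()  # gradients from this loss drive the update of the model's parameters
```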
GPT models are trained with billions or even trillions of parameters: internal variables that a model refines over the course of training and that determine how it behaves. While OpenAI has yet to reveal precise details about GPT-4, the model is estimated to contain roughly 1.8 trillion parameters [4], an increase of more than tenfold over GPT-3.5.
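A parameter count is simply the total number of trainable weights in a model. For instance, with the openly released GPT-2 weights and the Hugging Face transformers library (an assumption for this sketch; OpenAI does not publish GPT-4 in this form), the count can be computed directly:

```python
from transformers import GPT2LMHeadModel

# Load the openly available GPT-2 weights, a much smaller ancestor of GPT-4.
model = GPT2LMHeadModel.from_pretrained("gpt2")

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} trainable parameters")  # roughly 124 million for base GPT-2
```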
Transformer models are a type of neural network specialized in natural language processing: identifying the intent and meaning in a text-based input. They can dynamically process inputs and home in on the most important words, no matter where in the sentence they appear.
GPT models don’t understand language in the same way humans do. Instead, they process words into discrete units called tokens, with some words being broken up into multiple tokens. By evaluating all tokens at once, transformers excel at establishing long-range dependencies: relationships between distant tokens. GPT relies on its understanding of long-range dependencies to process inputs contextually.
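Tokenization can be inspected directly with OpenAI’s open source tiktoken library. The sketch below uses the cl100k_base encoding; treat the exact token splits as illustrative, since they vary by encoding.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "The chicken came before the egg."
token_ids = enc.encode(text)

# Each ID maps back to a chunk of text; longer or rarer words may span several tokens.
pieces = [enc.decode([tid]) for tid in token_ids]
print(token_ids)
print(pieces)
```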
Transformer models process data with two modules known as encoders and decoders, while using self-attention mechanisms to establish dependencies and relationships.
Self-attention mechanisms are the signature feature of transformers, empowering them to process an entire input sequence at once. Transformers can self-direct their “attention” to the most important tokens in the input sequence, no matter where they are.
By contrast, older recurrent neural networks (RNNs) and convolutional neural networks (CNNs) assess input data sequentially or hierarchically. Self-attention allows GPTs to process context and reply at length with language that feels natural, rather than merely guessing the next word in a sentence.
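The core computation behind self-attention fits in a few lines. The following NumPy sketch, with made-up dimensions and random weights, computes scaled dot-product attention with a causal mask so that each token can only attend to the tokens before it, as in GPT-style decoders.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]

    # Similarity of every token to every other token, scaled for numerical stability.
    scores = q @ k.T / np.sqrt(d_k)

    # Causal mask: a token may not attend to tokens that come after it.
    seq_len = x.shape[0]
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf

    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Made-up dimensions: 5 tokens, 16-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # (5, 16)
```

Real GPT models run many such attention operations in parallel (multi-head attention) and stack them in dozens of layers, but the underlying idea is the same.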
Encoding is the process of mapping tokens onto points in a high-dimensional vector space. Tokens encoded near one another in this space are assumed to be closer in meaning. This mathematical vectorization of an input sequence is known as an embedding.
The encoder blocks in the transformer network assign each embedding a weight, which determines its relative importance. Meanwhile, positional encoders capture word order, enabling GPT models to differentiate between the same words arranged in different orders: for example, “The egg came before the chicken” as compared to “The chicken came before the egg.”
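One classic way to inject word order is the sinusoidal positional encoding from the original transformer paper. The sketch below illustrates the idea; GPT models such as GPT-2 actually use learned position embeddings rather than this fixed formula.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding from "Attention Is All You Need"."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1 / np.power(10_000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])        # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])        # odd dimensions use cosine
    return encoding

# The same words in a different order receive different position vectors, so
# "The egg came before the chicken" and "The chicken came before the egg"
# produce different input representations.
print(sinusoidal_positional_encoding(seq_len=6, d_model=8).round(2))
```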
Decoders predict the most statistically probable response to the embeddings prepared by the encoders. Self-attention mechanisms permit the decoder to identify the most important portions of the input sequence, while advanced algorithms determine the output most likely to be correct.
Since the release of GPT-1 in 2018, OpenAI has remained at the forefront of the ongoing generative AI conversation. In addition to its flagship product ChatGPT, the company has also pursued image generation with DALL-E and generative video through Sora.
OpenAI released its debut GPT model in 2018. Its performance was impressive for the time, serving as a proof of concept for what later developments would accomplish. GPT-1 was able to answer questions in a humanlike way and respond to text generation prompts, highlighting its future use cases in chatbots and content creation.
GPT-1 was comparatively prone to hallucinations or confabulations, in which it would present incorrect information as though it were factual. Its answers indicated that OpenAI had not yet honed GPT’s ability to identify long-range dependencies and string together accurate long-form responses.
OpenAI’s next model, GPT-2, was released in 2019 and boasted 1.5 billion parameters, enhancing its performance. GPT-2 was more successful than its predecessor at maintaining coherence over longer responses, suggesting that its handling of long-range dependencies was much improved.
GPT-2 was released in stages, with several limited-capacity models made available ahead of the full version. In a statement [5], OpenAI explained the staggered release as necessary to mitigate potential misuse and other ethical concerns. OpenAI cited how the model might be used to impersonate others online, generate misleading news items and automate both cyberbullying and phishing content.
Though OpenAI CEO Sam Altman has repeatedly made public calls for governmental regulation of AI, the company also privately lobbied to make the EU’s AI Act less restrictive [6]. The final wording of the legislation, approved by the European Parliament in June 2024, appeared to align with the company’s recommendations.
With 175 billion parameters—over one hundred times more than its predecessor—GPT-3 emerged as one of the largest LLMs at the time. Its capabilities vastly outstripped those of earlier models in its lineage. The free version of ChatGPT is still powered by GPT-3.5, the most current version of GPT-3.
While GPT-3’s performance reflected its additional power and size, its training demands also skyrocketed. The compute and energy resources required to train such large models drew concern regarding their carbon and water footprints [7]. In response, OpenAI developed novel training methods that increased the efficiency of the training process.
The current version of GPT is OpenAI’s most powerful yet, outperforming its predecessors in both content quality and bias avoidance. It is behind the premium version of ChatGPT, giving subscribers greater functionality and performance over the GPT-3.5-powered free version of the service.
However, it is also the most resource-intensive model in the GPT family, with one estimate pricing daily operational costs at USD 700,000 [8]. As LLMs continue to grow, debates persist about the costs versus potential benefits. A report issued by Goldman Sachs in June 2024 [9] focused on generative AI’s potentially limited use cases as compared to the rising costs to train and maintain models.
GPT-4 Turbo, the current iteration of the model, has a knowledge cutoff of April 2023. This means that its training data or knowledge base does not cover any online content released after that point.
Revealed in May 2024, GPT-4o is multilingual, supporting content in numerous non-English languages. It is also multimodal, able to process image, audio and video prompts while generating text, images and audio content in response. According to OpenAI, GPT-4o is 50% cheaper and twice as fast [10] as GPT-4 Turbo for text generation.
While GPTs and other generative AI models have been widely celebrated in the media, their use is not without risk. Organizations and individuals seeking to incorporate GPTs into their workflows should be aware of the potential risks, including:
Data privacy and confidentiality
Intellectual property violations and ownership conflicts
Inaccurate output
Model bias
Any data entered into GPT is available for it to use when processing other queries and can be used by OpenAI to train other models. Not only does this pose a security risk for confidential data, but it also puts organizations at risk of breaching contractual and legal obligations for data protection.
OpenAI trains its models on copyrighted materials. While the company defends this choice as fair use, it has been subjected to legal action, including a lawsuit by The New York Times [11] filed in December 2023. AI-generated output can contain copyrighted content, and its use can violate copyright restrictions if not vetted and edited by human beings beforehand.
OpenAI also came under fire when one of its ChatGPT voices was alleged to be based on that of actor Scarlett Johansson [12], who starred as the voice of a futuristic AI in the 2013 film Her. OpenAI has since ceased using that particular voice in its products.
GPT-generated output is not guaranteed to be factually correct. Generative AI models are subject to AI hallucinations or confabulations, in which their algorithms detect patterns in the data that don’t exist. Confabulations cause the models to produce inaccurate content that is presented to the user as though it were reliable fact. This tendency as it relates to ChatGPT has been explored at length in a 2024 paper by Hicks and others [13].
Model bias is a divergence between a model’s predictions based on its training data and what happens in the real world. GPT is trained on reams of internet data, and because this content is created by people, it can contain discriminatory views—sometimes intentional, often not. As AI becomes integrated into policing, healthcare and other areas of daily life, AI biases can result in real-world consequences.
1 Hello GPT-4o (link resides outside ibm.com), OpenAI, 13 May 2024
2 Attention Is All You Need (link resides outside ibm.com), Vaswani et al., 12 June 2017
3 Live demo of GPT-4o realtime translation (link resides outside ibm.com), OpenAI, 13 May 2024
4 GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE (link resides outside ibm.com), Patel & Wong, 10 July 2023
5 Better language models and their implications (link resides outside ibm.com), OpenAI, 14 February 2019
6 Exclusive: OpenAI Lobbied the E.U. to Water Down AI Regulation (link resides outside ibm.com), Perrigo, 20 June 2023
7 A Computer Scientist Breaks Down Generative AI's Hefty Carbon Footprint (link resides outside ibm.com), Saenko and others, 25 May 2023
8 Microsoft Readies AI Chip as Machine Learning Costs Surge (link resides outside ibm.com), Gardizy & Ma, 18 April 2023
9 GenAI: Too Much Spend, Too Little Benefit? (link resides outside ibm.com), Nathan, Grimberg & Rhodes, 25 June 2024
10 OpenAI Platform (link resides outside ibm.com), OpenAI
11 Case 1:23-cv-11195 (link resides outside ibm.com), Barron et al, 27 December 2023
12 Scarlett Johansson says a ChatGPT voice is ‘eerily similar’ to hers and OpenAI is halting its use (link resides outside ibm.com), Grantham-Philips, 21 May 2024
13 ChatGPT is bullshit (link resides outside ibm.com), Hicks and others, 8 June 2024