Published: 18 September 2024
Contributors: Ivan Belcic, Cole Stryker
Generative pretrained transformers (GPTs) are a family of advanced neural networks designed for natural language processing (NLP) tasks. These large language models (LLMs) are based on the transformer architecture and subjected to unsupervised pretraining on massive unlabeled datasets.
GPT models form the foundation of many generative AI applications such as ChatGPT. Like many forms of AI, GPT is designed to automate tasks with the goal of simulating human-created output.
AI research firm OpenAI introduced the first GPT model, dubbed GPT-1, in 2018. Since then, it has released several advances in the GPT line of AI models. The most recent GPT model is GPT-4, which was released in early 2023. In May 2024, OpenAI announced the multilingual and multimodal GPT-4o [1], capable of processing audio, visual and text inputs in real time.
As a foundation model, GPT has undergone subsequent fine-tuning and been adapted to a wide range of specific downstream tasks. Beyond text-based applications, GPT powers artificial intelligence (AI) apps that generate and analyze images through computer vision, write code, process data and more. These apps connect to GPT through application programming interfaces (APIs), which allow them to pass data back and forth.
GPT models have accelerated generative AI development thanks to their transformer architecture, a type of neural network introduced in 2017 in the Google Brain paper Attention Is All You Need [2]. Transformer models including GPT and BERT have powered many notable developments in generative AI since then, with OpenAI’s ChatGPT chatbot taking center stage.
In addition to OpenAI, other firms have released their own generative AI models, including Anthropic’s Claude, Inflection's Pi and Google’s Gemini, previously known as Bard. Meanwhile, OpenAI powers Microsoft’s Copilot AI service.
The flexibility of transformer models such as GPT lends them to a wide range of use cases. GPT’s ability to generate humanlike text makes it a popular choice for:
GPT-powered chatbots can feel more humanlike than standard automated customer service options. Through APIs, organizations can link GPT with voice apps to create voice assistants able to respond to more complex statements and provide conversational question-answering services.
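As a minimal sketch, passing a customer query to a GPT model through OpenAI’s API might look like the following. It assumes the official openai Python SDK and an API key set in the environment; the model name, system message and query are placeholders, not a prescribed configuration.

```python
from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set.
client = OpenAI()

# A hypothetical customer-service exchange; the system message constrains tone.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name; substitute whichever model you use
    messages=[
        {"role": "system", "content": "You are a concise, friendly support assistant."},
        {"role": "user", "content": "My order #1234 hasn't arrived. What are my options?"},
    ],
)

print(response.choices[0].message.content)
```

A voice assistant would wrap the same kind of call with speech-to-text on the way in and text-to-speech on the way out.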
With effective prompts, GPT models can generate text content ranging from short-form social media copy to complete blog posts and emails. Also, writers can use GPTs to outline or ideate content they then write themselves, streamlining content creation workflows.
Using GPT to generate content directly for publishing might lead to intellectual property concerns—one of the chief risks of using GPT.
GPT-powered apps can translate language in real time from both written and audio sources. In a live demo [3], GPT-4o demonstrated an ability to translate in real time on its own.
GPT can process and summarize lengthy documents, such as legal statements or business reports. It can also rewrite content in the style specified by the user. For example, a user could provide a quarterly report as input data, then request that it be summarized in witty bullet points.
GPT can process large volumes of data into digestible insights. Through APIs, other apps can use GPT to create charts, graphs and other types of data visualizations. Organizations feeding internal data into GPT might expose themselves to cybersecurity breaches or violate data protection regulations.
GPT models can learn programming languages and generate code snippets. Users typically enjoy better results when treating GPT as a coding assistant rather than asking it to build complete apps from scratch. All GPT-generated content, including code, should be reviewed before use to help ensure accuracy and fair use.
In February 2024, the US National Library of Medicine (link resides outside ibm.com) released a paper outlining potential GPT applications in the healthcare space. These include consistent access for patients in remote areas as well as personalized care options. However, the paper also covers a range of downsides, such as privacy concerns and knowledge limitations.
GPT models work by analyzing an input sequence and applying complex mathematics to predict the most likely output. They use probability to identify the best possible next word in a sentence, based on all of the previous words. As a type of deep-learning AI technology, GPTs can process natural language prompts to generate relevant, humanlike text responses.
When a user inputs a text-based prompt, GPT creates the most likely response based on its training data, which comprises vast quantities of publicly available text ranging from famous literary works to open source code.
The vastness of these training datasets is what enables GPT to mimic humanlike language understanding. Large-scale GPT models apply deep learning to process context and draw on relevant text in their training data to predict the optimal response.
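The prediction step can be illustrated with a toy sketch: the model assigns a score to every candidate token, the scores are turned into probabilities, and the most likely token is appended to the sequence before the process repeats. The vocabulary and scores below are invented purely for illustration.

```python
import numpy as np

# Toy vocabulary and made-up logits (raw scores) for the next token,
# as if the model had just read "The chicken came before the".
vocab = ["egg", "road", "chicken", "."]
logits = np.array([4.2, 1.3, 0.5, -1.0])

# Softmax turns raw scores into a probability distribution.
probs = np.exp(logits) / np.exp(logits).sum()

# Greedy decoding picks the single most likely next token.
next_token = vocab[int(np.argmax(probs))]
print(dict(zip(vocab, probs.round(3))), "->", next_token)  # "egg" wins
```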
The power of GPT models comes from two key aspects:
Generative pretraining that teaches the model to detect patterns in unlabeled data, then apply those patterns to new inputs.
A transformer architecture that enables the model to process all portions of an input sequence in parallel.
Generative pretraining is the process of training a large language model on unlabeled data, teaching the model to recognize patterns in that data and honing its ability to make accurate predictions. GPTs generate new data by applying the patterns and structure of their pretraining data to user inputs.
Generative pretraining is a form of unsupervised learning, where the model is fed unlabeled data and forced to make sense of it on its own. By learning to detect patterns in unlabeled datasets, machine learning models gain the ability to draw similar conclusions when exposed to new inputs, such as a user prompt in ChatGPT.
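A minimal sketch of this pretraining objective, assuming PyTorch: the "label" for each position is simply the next token in the same unlabeled text, so no human annotation is required. The random data and the tiny stand-in model below are placeholders, not a real GPT.

```python
import torch
import torch.nn.functional as F

# Suppose `token_ids` is a batch of unlabeled text, already tokenized:
# shape (batch, sequence_length). The model predicts token t+1 from tokens 1..t.
token_ids = torch.randint(0, 50_000, (8, 128))     # random stand-in data
inputs, targets = token_ids[:, :-1], token_ids[:, 1:]

# In practice `model` would be a decoder-only transformer returning logits of
# shape (batch, sequence_length - 1, vocab_size); here it is just a placeholder.
model = torch.nn.Sequential(
    torch.nn.Embedding(50_000, 256),
    torch.nn.Linear(256, 50_000),
)

logits = model(inputs)
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
loss.backward()  # gradients from this loss drive the update of the model's parameters
```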
GPT models are trained with billions or even trillions of parameters: internal variables that a model refines over the course of training and that determine how it behaves. While OpenAI has yet to reveal precise details about GPT-4, the model is estimated to contain roughly 1.8 trillion parameters [4], an increase of more than tenfold over GPT-3.5.
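A parameter count is simply the total number of trainable weights in a model. For instance, with the openly released GPT-2 weights and the Hugging Face transformers library (an assumption for this sketch; OpenAI does not publish GPT-4 in this form), the count can be computed directly:

```python
from transformers import GPT2LMHeadModel

# Load the openly available GPT-2 weights, a much smaller ancestor of GPT-4.
model = GPT2LMHeadModel.from_pretrained("gpt2")

num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params:,} trainable parameters")  # roughly 124 million for base GPT-2
```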
Transformer models are a type of neural network specialized in natural language processing: identifying the intent and meaning in a text-based input. They can dynamically process inputs and home in on the most important words, no matter where in the sentence they appear.
GPT models don’t understand language in the same way humans do. Instead, they process words into discrete units called tokens, with some words being broken up into multiple tokens. By evaluating all tokens at once, transformers excel at establishing long-range dependencies: relationships between distant tokens. GPT relies on its understanding of long-range dependencies to process inputs contextually.
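Tokenization can be inspected directly with OpenAI’s open source tiktoken library. The sketch below uses the cl100k_base encoding; treat the exact token splits as illustrative, since they vary by encoding.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "The chicken came before the egg."
token_ids = enc.encode(text)

# Each ID maps back to a chunk of text; longer or rarer words may span several tokens.
pieces = [enc.decode([tid]) for tid in token_ids]
print(token_ids)
print(pieces)
```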
Transformer models process data with two modules known as encoders and decoders, while using self-attention mechanisms to establish dependencies and relationships.
Self-attention mechanisms are the signature feature of transformers, empowering them to process an entire input sequence at once. Transformers can self-direct their “attention” to the most important tokens in the input sequence, no matter where they are.
By contrast, older recurrent neural networks (RNNs) and convolutional neural networks (CNNs) assess input data sequentially or hierarchically. Self-attention allows GPTs to process context and reply at length with language that feels natural, rather than merely guessing the next word in a sentence.
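The core computation behind self-attention fits in a few lines. The following NumPy sketch, with made-up dimensions and random weights, computes scaled dot-product attention with a causal mask so that each token can only attend to the tokens before it, as in GPT-style decoders.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]

    # Similarity of every token to every other token, scaled for numerical stability.
    scores = q @ k.T / np.sqrt(d_k)

    # Causal mask: a token may not attend to tokens that come after it.
    seq_len = x.shape[0]
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf

    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Made-up dimensions: 5 tokens, 16-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # (5, 16)
```

Real GPT models run many such attention operations in parallel (multi-head attention) and stack them in dozens of layers, but the underlying idea is the same.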
Encoding is the process of mapping tokens onto points in a high-dimensional vector space. Tokens encoded near one another in this space are assumed to be closer in meaning. This mathematical vectorization of an input sequence is known as an embedding.
The encoder blocks in the transformer network assign each embedding a weight, which determines its relative importance. Meanwhile, positional encoders capture word order, enabling GPT models to differentiate between the same words arranged in different orders: for example, “The egg came before the chicken” as compared to “The chicken came before the egg.”
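One classic way to inject word order is the sinusoidal positional encoding from the original transformer paper. The sketch below illustrates the idea; GPT models such as GPT-2 actually use learned position embeddings rather than this fixed formula.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding from "Attention Is All You Need"."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1 / np.power(10_000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])        # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])        # odd dimensions use cosine
    return encoding

# The same words in a different order receive different position vectors, so
# "The egg came before the chicken" and "The chicken came before the egg"
# produce different input representations.
print(sinusoidal_positional_encoding(seq_len=6, d_model=8).round(2))
```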
Decoders predict the most statistically probable response to the embeddings prepared by the encoders. Self-attention mechanisms permit the decoder to identify the most important portions of the input sequence, while advanced algorithms determine the output most likely to be correct.
Since the release of GPT-1 in 2018, OpenAI has remained at the forefront of the ongoing generative AI conversation. In addition to its flagship product ChatGPT, the company has also pursued image generation with DALL-E and generative video through Sora.
OpenAI released its debut GPT model in 2018. Its performance was impressive for the time, serving as a proof of concept for what later developments would accomplish. GPT-1 was able to answer questions in a humanlike way and respond to text generation prompts, highlighting its future use cases in chatbots and content creation.
GPT-1 was comparatively prone to hallucinations or confabulations, in which it would present incorrect information as though it were factual. Its answers indicated that OpenAI had not yet honed GPT’s ability to identify long-range dependencies and string together accurate long-form responses.
OpenAI’s next model, GPT-2, was released in 2019 and boasted 1.5 billion parameters, enhancing its performance. GPT-2 was more successful than its predecessor at maintaining coherence over longer responses, suggesting that its handling of long-range dependencies was much improved.
GPT-2 was released in stages, with several limited-capacity models made available ahead of the full version. In a statement [5], OpenAI explained the staggered release as necessary to mitigate potential misuse and other ethical concerns. OpenAI cited how the model might be used to impersonate others online, generate misleading news items and automate both cyberbullying and phishing content.
Though OpenAI CEO Sam Altman has repeatedly made public calls for governmental regulation of AI, the company also privately lobbied to make the EU’s AI Act less restrictive [6]. The final wording of the legislation, approved by the European Parliament in June 2024, appeared to align with the company’s recommendations.
With 175 billion parameters—over one hundred times more than its predecessor—GPT-3 emerged as one of the largest LLMs at the time. Its capabilities vastly outstripped those of earlier models in its lineage. The free version of ChatGPT is still powered by GPT-3.5, the most current version of GPT-3.
While GPT-3’s performance reflected its additional power and size, its training demands also skyrocketed. The compute and energy resources required to train such large models drew concern regarding their carbon and water footprints [7]. In response, OpenAI developed novel training methods that increased the efficiency of the training process.
The current version of GPT is OpenAI’s most powerful yet, outperforming its predecessors in both content quality and bias avoidance. It is behind the premium version of ChatGPT, giving subscribers greater functionality and performance over the GPT-3.5-powered free version of the service.
However, it is also the most resource-intensive model in the GPT family, with one estimate pricing daily operational costs at USD 700,000 [8]. As LLMs continue to grow, debates persist about the costs versus potential benefits. A report issued by Goldman Sachs in June 2024 [9] focused on generative AI’s potentially limited use cases as compared to the rising costs to train and maintain models.
GPT-4 Turbo, the current iteration of the model, has a knowledge cutoff of April 2023. This means that its training data or knowledge base does not cover any online content released after that point.
Revealed in May 2024, GPT-4o is multilingual, supporting content in numerous non-English languages. It is also multimodal, able to process image, audio and video prompts while generating text, images and audio content in response. According to OpenAI, GPT-4o is 50% cheaper and twice as fast [10] as GPT-4 Turbo for text generation.
While GPTs and other generative AI models have been widely celebrated in the media, their use is not without risk. Organizations and individuals seeking to incorporate GPTs into their workflows should be aware of the potential risks, including:
Data privacy and confidentiality
Intellectual property violations and ownership conflicts
Inaccurate output
Model bias
Any data entered into GPT is available for it to use when processing other queries and can be used by OpenAI to train other models. Not only does this pose a security risk for confidential data, but it also puts organizations at risk of breaching contractual and legal obligations for data protection.
OpenAI trains its models on copyrighted materials. While the company defends this choice as fair use, it has been subjected to legal action, including a lawsuit by The New York Times [11] filed in December 2023. AI-generated output can contain copyrighted content, and its use can violate copyright restrictions if not vetted and edited by human beings beforehand.
OpenAI also came under fire when one of its ChatGPT voices was alleged to be based on that of actor Scarlett Johansson [12], who starred as the voice of a futuristic AI in the 2013 film Her. OpenAI has since ceased using that particular voice in its products.
GPT-generated output is not guaranteed to be factually correct. Generative AI models are subject to AI hallucinations or confabulations, in which their algorithms detect patterns in the data that don’t exist. Confabulations cause the models to produce inaccurate content that is presented to the user as though it were reliable fact. This tendency as it relates to ChatGPT has been explored at length in a 2024 paper by Hicks and others [13].
Model bias is a divergence between a model’s predictions based on its training data and what happens in the real world. GPT is trained on reams of internet data, and because this content is created by people, it can contain discriminatory views—sometimes intentional, often not. As AI becomes integrated into policing, healthcare and other areas of daily life, AI biases can result in real-world consequences.
1 Hello GPT-4o (link resides outside ibm.com), OpenAI, 13 May 2024
2 Attention Is All You Need (link resides outside ibm.com), Vaswani et al., 12 June 2017
3 Live demo of GPT-4o realtime translation (link resides outside ibm.com), OpenAI, 13 May 2024
4 GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE (link resides outside ibm.com), Patel & Wong, 10 July 2023
5 Better language models and their implications (link resides outside ibm.com), OpenAI, 14 February 2019
6 Exclusive: OpenAI Lobbied the E.U. to Water Down AI Regulation (link resides outside ibm.com), Perrigo, 20 June 2023
7 A Computer Scientist Breaks Down Generative AI's Hefty Carbon Footprint (link resides outside ibm.com), Saenko and others, 25 May 2023
8 Microsoft Readies AI Chip as Machine Learning Costs Surge (link resides outside ibm.com), Gardizy & Ma, 18 April 2023
9 GenAI: Too Much Spend, Too Little Benefit? (link resides outside ibm.com), Nathan, Grimberg & Rhodes, 25 June 2024
10 OpenAI Platform (link resides outside ibm.com), OpenAI
11 Case 1:23-cv-11195 (link resides outside ibm.com), Barron et al, 27 December 2023
12 Scarlett Johansson says a ChatGPT voice is ‘eerily similar’ to hers and OpenAI is halting its use (link resides outside ibm.com), Grantham-Philips, 21 May 2024
13 ChatGPT is bullshit (link resides outside ibm.com), Hicks and others, 8 June 2024