July 15, 2024 By Aili McConnon 4 min read

In the race to dominate AI, bigger is usually better. More data and more parameters create larger AI systems that are not only more powerful and more capable but also generally make fewer errors than smaller systems.

The tech companies seizing the news headlines reinforce this trend. “The system that we have just deployed is, scale-wise, about as big as a whale,” said Microsoft CTO Kevin Scott about the supercomputer that powers ChatGPT-5. Scott was discussing the latest version of OpenAI’s generative AI chatbot at the company’s recent Build event in late May. “And it turns out you can build a whole hell of a lot of AI with a whale-sized supercomputer.”

Meanwhile, Nvidia’s market capitalization hit the $3 trillion mark in June. The chipmaker has been growing at a dizzying pace as its chips power increasingly large language models, supercomputers and the data centers mushrooming across the world.

But is bigger always better? It depends on your perspective. For companies developing large language models, scale is an advantage in most cases. But as enterprises look to separate the hype from where AI can add true value, it’s not clear that increasingly larger language models will always lead to better solutions for businesses.

Going forward, “we won’t need models that are 100x what we have today to extract most of the value,” said Kate Soule, IBM’s program director for Generative AI Research, in a recent episode of IBM’s Mixture of Experts podcast. Many companies already getting a return on their AI investments are using it for tasks such as classification and summarization, which don’t even use the full capacity of current language models.

A brief history of scaling

“Bigger is better” stems from the data scaling laws that entered the conversation with a 2012 paper by Prasanth Kolachina applying scaling laws to machine learning. Kolachina and his colleagues showed that as models got larger, they generally became more accurate and performed better. In 2017, Hestness et al. demonstrated that deep learning scaling is empirically predictable as well. Then in 2020, Kaplan et al. showed that these scaling laws also hold for language models.
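In rough form, the language-model result can be written as a single power law. In Kaplan et al.'s formulation (constants approximate, as reported in their 2020 paper), test loss L falls smoothly as the parameter count N grows:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076,\quad N_c \approx 8.8 \times 10^{13}
```

Because the exponent is small, each tenfold increase in parameters buys only a modest, predictable drop in loss: a diminishing return that sets up the cost-benefit question below.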

While these laws are helpful for language model providers striving to create artificial general intelligence, it’s far from clear that businesses need this scale of investment or AI to get most of the value.

“Just because you know the most cost-effective way to train a model of the nth degree in size, will the actual benefits you derive from that model justify the costs?” said IBM’s Soule. “That’s an entirely different question that the scaling laws don’t answer.”


The tradeoff between cost and size

The cost of data is rising as the high-quality data used to train AI models becomes increasingly scarce. A paper by Epoch AI, an AI research organization, found that AI models could exhaust all the high-quality language data currently available on the internet as soon as 2026.

And so companies are getting creative about accessing new data to train models while managing costs. OpenAI’s newest version of ChatGPT, for instance, is offered free to users in exchange for some user and third-party data. Major players are also looking into synthetic data: artificially generated 2D images, 3D data, text and more, used alongside real-world data to train AI.

While the companies developing LLMs shoulder data costs, the climate costs of increasingly large language models have been largely overlooked. As these models grow in complexity and usage, they consume vast computational resources. The data centers housing the supercomputers that power these models draw significant amounts of energy, creating corresponding carbon emissions.

“It’s not just that there’s big energy impacts here, but also that the carbon impacts of that will bring costs first to people who are not benefiting from this technology,” said Emily Bender, a professor of linguistics at the University of Washington and co-author of the paper On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?.

“When we do the cost-benefit analysis, it’s important to think of who’s getting the benefit and who’s paying the cost because they’re not the same people,” said Bender in a University of Washington news release.

Mini powerhouses

One way that companies are balancing costs and benefits is by using bigger models first to address the most challenging business problems. Then, once they have the answer, they switch to smaller models trained to replicate the findings of the large models at lower cost and with lower latency.
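This big-model-first, small-model-later pattern is often implemented through knowledge distillation: a small "student" model is trained to match the softened output distribution of a large "teacher" model. A minimal sketch of the core objective in plain Python (the logit values and function names are illustrative, not drawn from any company mentioned in this article):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's: the core training signal in knowledge distillation."""
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# The closer the student's logits track the teacher's, the smaller the loss.
teacher = [3.0, 1.0, 0.2]
close_student = [2.8, 1.1, 0.3]
far_student = [0.1, 2.5, 1.0]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

In practice the student minimizes this distillation term alongside the ordinary task loss on labeled data; the temperature softens the teacher's distribution so the student also learns which wrong answers the teacher considers plausible.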

The use of smaller language models is also growing as an alternative to large language models.

“Smaller LLMs offer users more control compared to larger language models like ChatGPT or Anthropic’s Claude, making them more desirable in many instances,” said Brian Peterson, co-founder and chief technology officer of Dialpad, a cloud-based, AI-powered communications platform, in an interview with PYMNTS.

“They’re able to filter through a smaller subset of data, making them faster, more affordable, and, if you have your own data, far more customizable and even more accurate.”

The race to build larger and more powerful LLMs is unlikely to slow down any time soon. But going forward, most experts agree, we will also see a surge of compact but powerful AI models that excel in specific fields and offer an alternative for companies looking to better balance the value and costs of AI.
