September 30, 2024 By Aili McConnon 4 min read

With the launch of its new lightweight Llama 3.2 models last week, Meta became the latest company to bet big on going small, following Apple, IBM, Google, Microsoft and other tech giants that have introduced small language models (SLMs) in the last 18 months.

Yes, SLMs cost less, use less energy and often perform better than their larger counterparts on specialized tasks. But perhaps their biggest draw is that they can be implemented on smartphones and other mobile devices that operate at the edge, like car computers or smart sensors on a factory floor. 

“Smaller models will hugely impact productivity,” says Maryam Ashoori, Director of Product Management at IBM watsonx.ai. “Finally, many of the generative AI use cases will actually be accessible to a larger group of people and enterprises.”

Beyond being able to run on even very modest hardware, SLMs eliminate the need to transmit sensitive proprietary or personal data to off-network servers, which can help improve security and protect privacy.

One size doesn’t fit all

Large language models (LLMs) have started to dramatically transform the consumer and enterprise markets. Generative AI can automate information extraction, classification, content generation, question answering and summarization, to name only a few applications.

The reality is, however, that traditional LLMs can cost millions of dollars to train and deploy, and a larger LLM also demands more GPU capacity and greater energy consumption. Furthermore, individuals and enterprises may not be comfortable sharing their data with large public LLMs that are hosted in the cloud and trained on unstructured internet data. Yet building a large language model in-house can be prohibitively expensive.

Enter SLMs. With approximately 1-3 billion parameters, SLMs can be developed and deployed at a fraction of the cost, making them more accessible to enterprises of all sizes, as well as regular smartphone-toting citizens.
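
To put those numbers in perspective, a quick back-of-the-envelope calculation (an illustrative sketch, not figures from the article) shows why models of this size can fit on a phone while their larger counterparts cannot:

```python
# Rough memory footprint of model weights at different precisions.
# Illustrative arithmetic only: real deployments also need memory
# for activations and the KV cache.
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

for params in (1, 3, 70):
    print(
        f"{params}B params: "
        f"{weights_gb(params, 2):.1f} GB at 16-bit, "
        f"{weights_gb(params, 0.5):.1f} GB at 4-bit"
    )

# 1B params: 1.9 GB at 16-bit, 0.5 GB at 4-bit
# 3B params: 5.6 GB at 16-bit, 1.4 GB at 4-bit
# 70B params: 130.4 GB at 16-bit, 32.6 GB at 4-bit
```

A 1-3 billion parameter model quantized to 4 bits fits comfortably in a modern smartphone's memory; a 70 billion parameter model does not.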

In addition to being lower in cost, SLMs can “deliver much higher accuracy with a much, much smaller footprint,” says Shobhit Varshney, a VP and Senior Partner at IBM Consulting focusing on AI, in a recent Mixture of Experts podcast.

In the past few months, Varshney has seen many IBM clients in manufacturing and government deploy SLMs on local devices in contexts where reliable internet access may be lacking, such as on the factory floor or out in the field.

“When you can fine-tune these [models] and then run them on devices, that opens up a whole lot of use cases for our clients,” says Varshney of the new mini Llama 3.2 models, the smallest Llama models to date.
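
To make that concrete, here is a minimal sketch of running a small model locally with the Hugging Face Transformers library (the model ID and prompt are illustrative of the workflow; Llama weights are gated, so assume they have already been downloaded):

```python
# Minimal on-device inference sketch with Hugging Face Transformers.
# Assumes the (gated) Llama 3.2 1B Instruct weights are available and
# that the accelerate package is installed for device_map support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the footprint small
    device_map="auto",           # CPU, GPU or Apple silicon, whatever is local
)

prompt = "Summarize the safety checklist for press line 4 in three bullets."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same script runs whether the device has a GPU or only a CPU, which is what makes factory-floor and in-field deployments practical.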

For regulated industries and sectors such as healthcare or finance, where data security is paramount, SLMs that run locally can keep sensitive data in-house and help preserve privacy.

Individuals stand to benefit, too. By November of this year, iPhone users will be able to use AI-powered Apple Intelligence Writing Tools to rewrite, proofread and summarize text as they write on their devices.

As Apple explained in a press release, “Apple Intelligence allows users to choose from different versions of what they have written, adjusting the tone to suit the audience and task at hand. From finessing a cover letter, to adding humor and creativity to a party invitation, Rewrite helps deliver the right words to meet the occasion.”

Since SLMs can work offline, more people around the globe can access them.

“SLMs could be used in rural areas that lack cell service,” says Luis Vargas from Microsoft, which introduced its SLM, Phi-3-mini, in April. “Consider a farmer inspecting crops who finds signs of disease on a leaf or branch. The farmer could take a picture of the crop at issue and get immediate recommendations on how to treat pests or disease.”  

Unlocking value at “the edge”

While the tech sector has snapped up language models large and small, some experts expect more traditional industries, such as manufacturing, to see the greatest benefit from SLMs and smaller AI models, particularly at the edge: systems and devices that perform line-of-business operations close to where data is generated, such as on the factory floor.

At the edge, “You don’t have as much compute power or storage, but you do have massive amounts of data,” says Francis Chow, VP and GM for In-Vehicle Operating Systems and Edge Computing at Red Hat. “Currently, only 1-5% of the real-time data available is being used. There is tremendous business potential if you can get value from more of that data.” 

While industries like manufacturing tend to move more slowly than IT, many manufacturers are already testing the waters with language models that digest instruction manuals, letting technicians ask questions and receive relevant summaries.

Using SLMs and other smaller AI models in edge computing with computer vision is another promising area, says Chow. Currently, computer vision algorithms in automobiles can stop a car if they detect a ball or other object within a certain proximity of the vehicle. As SLMs become more sophisticated, they will be able to learn from past experience, detect patterns and make predictions. For example, if a car detects and recognizes a soccer ball, it might predict that a child will run out to pick up the ball a few seconds later, and react accordingly.

Balancing accuracy and latency

“No size fits all” applies to language models too, says Dr. Christine Ouyang, a Distinguished Engineer and Master Inventor at IBM Consulting’s Center of Excellence (CoE) for generative AI. “Large language models are very powerful, but they can be overkill for some tasks.”

The AI CoE is collaborating with IBM Research to create SLMs for so-called “client zero” use cases, in which IBM acts as the first client for its own technology. IBM Research builds these small models using various techniques, including fine-tuning large models before distilling them, or distilling larger models before fine-tuning them.
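
The article does not detail IBM Research’s recipes, but the core idea of distillation can be sketched in a few lines: a small “student” model is trained to match the output distribution of a large “teacher.” The PyTorch snippet below is a generic sketch of that technique, not IBM’s actual pipeline:

```python
# Generic knowledge-distillation loss in PyTorch; a sketch of the
# technique, not IBM Research's training pipeline.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions so the student also learns from the
    # teacher's "dark knowledge" about near-miss tokens.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 to keep gradient magnitudes comparable.
    return F.kl_div(log_student, soft_targets,
                    reduction="batchmean") * temperature**2

# In a full training loop this term is typically blended with the
# ordinary next-token cross-entropy on ground-truth labels:
#   loss = alpha * distillation_loss(s, t) + (1 - alpha) * ce_loss
```

Whether fine-tuning happens before or after distillation changes what the student ends up specializing in, which is why both orderings are in play.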

“When it comes to model size, it’s a tradeoff,” says Dr. Ouyang. “For non-mission critical applications, you can sacrifice 2% of accuracy to save significant cost and decrease latency.” Latency refers to the lag between the moment a user’s prompt is sent to a cloud-hosted LLM and the moment the generated answer arrives back on the device.
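
That tradeoff is straightforward to measure. The sketch below is a hypothetical timing harness: both generate functions are placeholders standing in for an on-device SLM and a cloud round trip, with made-up delays rather than real benchmarks:

```python
# Hypothetical latency comparison; the sleep() calls stand in for
# on-device inference and a cloud round trip, respectively.
import time

def local_generate(prompt: str) -> str:
    time.sleep(0.05)  # placeholder for on-device SLM inference
    return "local answer"

def cloud_generate(prompt: str) -> str:
    time.sleep(0.80)  # placeholder for network round trip plus queueing
    return "cloud answer"

def timed(fn, prompt):
    start = time.perf_counter()
    fn(prompt)
    return time.perf_counter() - start

prompt = "Classify this support ticket."
print(f"local: {timed(local_generate, prompt):.2f}s, "
      f"cloud: {timed(cloud_generate, prompt):.2f}s")
```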

In the past, Dr. Ouyang’s team worked with IBM Supply Chain Engineering to develop AI-powered edge applications for quality inspection in IBM manufacturing. Use cases included defect detection, such as spotting missing screws on the back of servers, or bent or missing connector pins.

“This type of task would have previously taken a quality control engineer ten minutes,” says Dr. Ouyang. “The AI-powered edge device solution completed this task in less than one minute.”

While SLMs are still a work in progress, promising results such as these suggest that these tiny but mighty models are here to stay.
