August 27, 2024 By Sascha Brodsky 2 min read

The landscape of generative AI is shifting, with tech giants betting on advanced voice assistants as the next frontier.

Google’s recent launch of Gemini Live for Android users marks a significant milestone in this AI marathon, closely following OpenAI’s development of ChatGPT’s Advanced Voice Mode. These next-generation voice assistants represent a leap forward from their predecessors like Apple’s Siri and Amazon’s Alexa.

“Google’s Gemini Live focuses on seamless integration with existing ecosystems and devices, while OpenAI’s GPT-4 emphasizes human-like conversation with a low millisecond response delay,” says Stephen Kowski, Field CTO at SlashNext Email Security+. “Both push boundaries in emotional recognition, contextual understanding and handling interruptions.”

Google’s Gemini Live, available to Gemini Advanced subscribers for $20 per month, aims to become a digital sidekick rather than a simple voice app. It promises deep integration with Google’s ecosystem, allowing users to interact with apps like Gmail, Calendar and Maps through natural conversation. Similarly, OpenAI’s Advanced Voice Mode, currently in alpha testing, boasts human-like interactions, and earlier versions demonstrated musical abilities.

Meanwhile, Apple is gearing up to release a generative AI-powered upgrade to Siri with iOS 18 this fall, promising more natural and contextually relevant interactions. Amazon, too, is reportedly developing a subscription-based, AI-enhanced version of Alexa to compete in this evolving market. And IBM recently introduced new features for its watsonx Assistant that leverage large speech models (LSMs) to enhance speech recognition in phone channels. These advancements, which IBM claims outperform OpenAI’s Whisper model in specific customer service scenarios, aim to transform call center operations by offering more natural and accurate voice interactions.

This push towards more sophisticated voice AI reflects a broader industry trend. Tech companies are betting that voice will become a primary interface for AI interactions, offering a more natural and intuitive way for users to access the power of large language models in their daily lives.

As these assistants become more capable and integrated into our routines, they promise to revolutionize our interactions with technology. From managing schedules and summarizing emails to providing on-the-fly information about locations or videos, these AI companions aim to blend seamlessly into our digital experiences.

However, this rapid advancement raises important questions about privacy, data collection and the ethical implications of increasingly human-like AI interactions. Kowski notes, “As AI voice assistants become more integrated, concerns arise around data collection, storage and potential misuse of personal information. There are also ethical considerations regarding consent, transparency about AI interactions and the potential for manipulation or misinformation.”

