July 15, 2024 By Sascha Brodsky 4 min read

Companies are racing to integrate generative AI into their search engines, hoping to revolutionize the way users access information. However, this uncharted territory comes with a significant challenge: ensuring the accuracy and reliability of AI-generated search results. As AI models grapple with “hallucinations”—producing content that fills in gaps with inaccurate information—the industry faces a critical question: How can we harness the potential of AI while minimizing the spread of misinformation?

Google’s new generative AI search tool recently surprised users by suggesting they eat rocks. Despite this, the company told the BBC the feature performs relatively well overall.

“The examples we’ve seen are generally very uncommon queries, and aren’t representative of most people’s experiences,” Google said in a statement to the news service. “The vast majority of AI overviews provide high quality information, with links to dig deeper on the web.”

The incident highlights the search technology’s current limitations. The AI Overviews feature, powered by Gemini, a large language model similar to the one behind OpenAI’s ChatGPT, generates written responses to specific search queries by summarizing online information. While the current AI boom capitalizes on LLMs’ impressive proficiency in text generation, the software can also employ this capability to present inaccuracies or errors convincingly.

Generative AI can simplify online search results by summarizing information. However, it can become risky when sources disagree or when people use AI summaries to make important decisions.

As AI-powered search engines rapidly evolve, researchers are trying to boost their reliability by squashing inaccuracies, fighting bias and increasing transparency. From teaching AI when to admit uncertainty to prioritizing diversity in training data, these experts are exploring innovative solutions to build trust and pave the way for a search experience users can confidently rely on, even as they acknowledge the significant progress still to be made.

Academics and scientists in the field weigh in on how we got to this point—and what that path to progress looks like.

The inherent dilemma: Novelty vs. accuracy

At the heart of the issue lies the fundamental nature of generative AI and its preeminent use case. Researchers have explored the tradeoffs that surface when generative AI is designed to create novel content rather than provide factual answers. “Using ChatGPT to do internet search is like using a drill to hammer a nail: it might work, but why do that?” said Robert Ghrist, an engineering professor at the University of Pennsylvania. When users pose questions to generative AI in the style of a Google search, the AI may generate responses that are not grounded in reality, leading to the phenomenon of hallucinations.

Deming Chen, Professor of Electrical and Computer Engineering at The University of Illinois’ Grainger College of Engineering, warned that an AI model may overgeneralize from its training data and produce answers that are not factual. He also noted that the facts the AI was trained on may have already changed or evolved quickly after the model’s training was completed.

“Another main reason is that due to limited data or no data available for certain niche topics, the AI model may infer information that seems correct but is not based on facts,” he said, adding that such hallucinations can lead to the spread of misinformation, erosion of trust and decision-making errors.

The problem is further compounded by the “suboptimal selection of the architecture and size of the neural network (e.g., LLM), which plays a key role in the model’s ability to process and interpret data,” according to Dmytro Shevchenko, a data scientist from software development firm Aimprosoft. He explained that hallucinations can occur when a model trained on a vast amount of data generates information that seems plausible but is actually incorrect, leading to further issues if users blindly trust the AI’s responses.


Enhancing AI search accuracy

Tech firms are exploring many avenues to refine AI search, from comparing different query results to fine-tuning the data and formulas that power these systems. Shevchenko suggests that “companies can improve model architectures, use more sophisticated and appropriate model architectures, and methods that allow models to better adapt to a variety of data and usage scenarios.”
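One illustrative way to "compare different query results" is self-consistency: sample several candidate answers and keep the one that recurs most often, on the premise that a hallucination is less likely to repeat across samples. The sketch below is a hypothetical illustration, not any company's actual pipeline; `generate_answer` is a stand-in for a real LLM call sampled at different seeds or temperatures.

```python
from collections import Counter

def generate_answer(query: str, seed: int) -> str:
    # Hypothetical stand-in for an LLM call; a real system would sample
    # the model itself with varied seeds or temperature settings.
    canned = ["Paris", "Paris", "Lyon", "Paris", "Paris"]
    return canned[seed % len(canned)]

def self_consistent_answer(query: str, samples: int = 5) -> str:
    # Draw several candidate answers and return the most common one.
    answers = [generate_answer(query, seed) for seed in range(samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistent_answer("What is the capital of France?"))  # Paris
```

The majority vote filters out the one-off "Lyon" sample; in practice the tradeoff is extra inference cost per query.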

One promising approach involves integrating human expertise and context-aware, fact-based knowledge databases into the AI search process. Techniques such as reinforcement learning from human feedback (RLHF) and retrieval-augmented generation (RAG) offer potential solutions for more accurate and trustworthy AI-generated search results. Shevchenko emphasizes the importance of expert involvement, stating that “including experts in the process of evaluating and refining model results is essential.”
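The core idea behind RAG is to fetch relevant source passages first and then constrain the model's answer to them, so the output can be traced back to cited material. Below is a minimal sketch under simplifying assumptions: the three-document corpus and word-overlap scoring are illustrative only, where a production system would use vector embeddings for retrieval and an LLM to draft the final answer.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve the
# passages most relevant to a query, then ground the prompt in them.
CORPUS = [
    "Geologists classify rocks as igneous, sedimentary or metamorphic.",
    "Rock salt (halite) is a mineral used to season food.",
    "AI Overviews summarize web results for a search query.",
]

def score(query: str, passage: str) -> int:
    # Naive relevance score: number of shared lowercase words.
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank the corpus by relevance and keep the top-k passages.
    return sorted(CORPUS, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Grounding the prompt in retrieved sources is what makes the
    # eventual answer checkable against its provenance.
    context = "\n".join(f"- {p}" for p in retrieve(query))
    return f"Answer using only these sources:\n{context}\nQuestion: {query}"

prompt = build_prompt("how do geologists classify rocks")
```

Because the prompt names its sources, a wrong answer can at least be audited against the retrieved passages rather than trusted blindly.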

Researchers are tackling inaccuracies and unclear results to boost AI search reliability. A recent study found 32 different ways to reduce hallucinations in AI language models, all created in just the last few years. This includes teaching the AI when it should avoid giving an answer if it’s unsure.
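The "avoid giving an answer if it's unsure" idea can be reduced to a confidence threshold: if the model's best candidate falls below it, abstain rather than risk a hallucination. This is a simplified sketch, with the candidate probabilities and the 0.7 threshold chosen purely for illustration.

```python
def answer_or_abstain(candidates: dict[str, float],
                      threshold: float = 0.7) -> str:
    # candidates maps each candidate answer to the model's estimated
    # probability. Below the threshold, abstain instead of guessing.
    best, confidence = max(candidates.items(), key=lambda kv: kv[1])
    return best if confidence >= threshold else "I'm not sure."

print(answer_or_abstain({"granite": 0.9, "basalt": 0.1}))   # granite
print(answer_or_abstain({"granite": 0.4, "basalt": 0.35,
                         "gneiss": 0.25}))                  # I'm not sure.
```

Tuning the threshold trades coverage for reliability: a higher value abstains more often but passes through fewer confident-sounding errors.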

Bias is another issue. Shahan Ali Memon and Jevin D. West from the University of Washington recently wrote about the significance of projects to decrease bias in AI systems. One noteworthy example is Latimer, an initiative also known as the Black GPT, which aims to emphasize diversity and inclusion in the AI’s training data.

“Generative AI has the potential to transform our current information ecosystem into a contemporary sibling of the Library of Babel,” Memon and West wrote. “In this new version, there would be an infinite array of texts, blending truth and fabrication, but worse, they would be stripped of their covers, thereby obscuring the provenance and sources of information. Our door to the internet, the search engine—akin to a digital librarian—tends to hallucinate and generate fabricated tales. Finding reliable information in this world is like a wild goose chase, except that the geese are on roller skates.”

As technology companies continue to navigate the challenges of AI search accuracy, the journey toward fully realizing the potential of generative AI is far from over. However, each failure serves as a learning opportunity. As Ghrist noted, “The AI you use today is the worst AI you will have available to you starting soon.”

