UPDATED 12:00 EST / DECEMBER 11 2024

AI

AIMon raises $2.3M to combat AI hallucinations

Aimon Labs Inc., the creator of an autonomous “hallucination” detection model that improves the reliability of generative artificial intelligence applications, said today it has closed a $2.3 million pre-seed funding round.

The money comes from Bessemer Venture Partners and Tidal Ventures, which co-led the round, along with a number of angel investors, including Thumbtack Inc. Chief Executive Marco Zappacosta and Sumo Logic Inc. co-founder Kumar Saurabh.

AIMon, as the startup likes to be known, is trying to tackle the notoriously difficult problem of AI hallucinations, and to do so it’s relying on generative AI itself to monitor and safeguard other generative AI applications. It has created a specialized proprietary model called HDM-1, or Hallucination Detection Model-1, whose hallucination-detection capabilities it says far exceed those of most large language models, or LLMs.

AI hallucinations are one of the biggest challenges faced by LLM developers. IBM Corp. defines them as a phenomenon in which LLMs “perceive patterns or objects that are nonexistent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate.” They have drawn a lot of attention in the last couple of years, ever since ChatGPT exploded into the public consciousness, with numerous documented instances of AI models going astray.

The hallucinatory behavior of AI models can take many forms, such as leaking sensitive training data, exhibiting bias or falling victim to prompt injection attacks, in which malicious actors attempt to manipulate AI models into performing unintended actions. In some extreme cases, AI models can completely lose the plot, such as when a beta version of Microsoft Corp.’s Bing chatbot, known as Sydney, professed its love for The Verge journalist Nathan Edwards and later falsely confessed to murdering one of its developers.

What’s alarming is how common AI hallucinations can be, with some studies estimating that they occur in anywhere from 3% to 10% of LLM responses to user prompts.

For businesses wanting to adopt AI, these hallucinations present a serious problem, because companies cannot afford to unleash generative AI applications that are so unreliable. As AIMon co-founder and CEO Puneet Anand explains, a model that’s meant to diagnose medical issues automatically from computed tomography scans simply cannot afford to make mistakes.

“It is critical to ensure that the AI is actually doing what it is supposed to do,” Anand said. “That is exactly what AIMon brings to the table. We are laser-focused on improving reliability for LLMs.”

AIMon’s HDM-1 is a commercially available AI monitoring tool that’s designed to automate the improvement of LLMs, helping developers to discover when their models are hallucinating, troubleshoot the cause of those hallucinations, and find ways to fix them.

With this approach, a specialized model is deployed to evaluate the outputs of other LLMs, such as GPT-4. These evaluations check different aspects of a model’s outputs, such as tone, coherence and factual accuracy. In the case of AIMon’s HDM-1, it’s tasked with assessing whether the content created by the LLMs it monitors meets specific criteria. Arguably, the most important reason this technique has become so popular is that it automates work that would otherwise fall to humans, usually engineers.
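In practice, that pattern amounts to a thin guardrail layer sitting between the generating model and the user. The Python sketch below illustrates the general idea under stated assumptions: the EvaluatorClient class, its score method and the token-overlap placeholder are hypothetical stand-ins for a call to a real detection model such as HDM-1, not AIMon’s actual API.

```python
# Illustrative sketch of the "model evaluating a model" pattern described above.
# EvaluatorClient, score and answer_with_guardrail are hypothetical names, not
# AIMon's SDK; the token-overlap heuristic merely stands in for a real call to
# a hallucination-detection model such as HDM-1.

from dataclasses import dataclass


@dataclass
class Evaluation:
    faithfulness: float  # 1.0 means fully grounded in the source context
    flagged: bool        # True when the score falls below the threshold


class EvaluatorClient:
    """Hypothetical wrapper around a hallucination-detection model."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold

    def score(self, context: str, response: str) -> Evaluation:
        faithfulness = self._call_detection_model(context, response)
        return Evaluation(faithfulness, flagged=faithfulness < self.threshold)

    def _call_detection_model(self, context: str, response: str) -> float:
        # Placeholder heuristic (token overlap) standing in for the real model,
        # which would compare the generated response against the retrieved
        # context and return a grounding score.
        ctx = set(context.lower().split())
        resp = set(response.lower().split())
        return len(ctx & resp) / max(len(resp), 1)


def answer_with_guardrail(evaluator: EvaluatorClient, context: str, llm_answer: str) -> str:
    """Gate an LLM answer before it reaches the user."""
    result = evaluator.score(context, llm_answer)
    if result.flagged:
        # Route low-confidence answers to a fallback instead of the user.
        return "I couldn't verify that answer against the source documents."
    return llm_answer
```

The same check can run offline against preproduction test sets or inline against live traffic, which is why a detector of this kind can serve developers and compliance teams alike.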

The startup, which participated in the Microsoft for Startups and Nvidia Inception programs, says HDM-1 has shown it can dramatically outperform OpenAI’s GPT-4o mini, GPT-4 Turbo and other notable models in terms of hallucination detection. With HDM-1, companies can gain more insight into their LLM-based applications, both during preproduction and after they have been deployed. It’s useful not only for developers, but also for governance, risk and compliance teams, the company said.

AIMon isn’t alone in pursuing the idea of using an AI model to solve AI’s hallucination problem. Another startup, Patronus AI Inc., debuted a similar model called Lynx during the summer, publishing benchmarks that showed it was 8.3% more accurate than GPT-4o at detecting medical inaccuracies. In its own benchmarks, AIMon claims HDM-1 is 7.8% more accurate than Lynx.

Tidal Ventures’ Nicholas Muy said AIMon’s approach to increasing trust in AI is one of the most effective he has come across. “It’s absolutely critical for builders of innovative generative AI applications,” he said.

Image: SiliconANGLE/Freepik
