UPDATED 22:03 EDT / JUNE 06 2024


AI accuracy startup Galileo’s new Evaluation Foundation Model suite is designed to evaluate LLMs

Generative artificial intelligence evaluation startup Galileo Technologies Inc. said today it’s launching what it calls the industry’s first family of “evaluation foundation models,” which are customized to evaluate the performance of large language models such as OpenAI’s GPT-4o and Google LLC’s Gemini Pro.

Galileo developed the Luna EFMs in response to the AI industry’s experiments with using AI to evaluate AI. Over the past couple of years, a lot of research has been published on the practicality of using models such as GPT-4 to assess the responses of other LLMs, and the progress has been encouraging, the company said.

Given those developments, Galileo concluded that it might make more sense to build a set of dedicated LLMs trained specifically to evaluate the outputs of other generative AI models, rather than relying on general-purpose models for the job. The Luna EFM family is the result of that work.

The startup explains in a paper published on arXiv that each of its Luna EFMs has been fine-tuned to perform a very specific evaluation task, such as detecting “hallucinations,” in which an AI system fabricates information in its responses. Others are designed to spot data leakage, context quality errors and malicious prompts.

Galileo, which develops tools for enhancing AI model accuracy, claims that its Luna EFMs are much faster, more cost-effective and more accurate than either GPT-4 or standard “vibe checks” by humans, and can give businesses the confidence they need to deploy generative AI chatbots at scale.

In a blog post, Galileo Chief Executive Vikram Chatterji explained that enterprises need to be able to evaluate hundreds, if not thousands, of AI responses in close to real time for problems such as hallucinations, toxicity and security risks. Having worked with many enterprises to try to solve this challenge, the company concluded that human evaluations and traditional LLM-based evaluations were too expensive and slow, he said.

“We set out to solve that, and with Galileo Luna we’re setting new benchmarks for speed, accuracy and cost efficiency,” Chatterji said. “Luna can evaluate millions of responses per month 97% cheaper, 11x faster, and 18% more accurately than evaluating using OpenAI GPT-3.5.”

The startup put the Luna EFMs through their paces in a series of benchmark tests comparing their performance with that of other AI evaluation tools, and it says the results were extremely promising.

According to Chatterji, the Luna family outperformed all existing evaluation models in overall accuracy by up to 20%, including the company’s own ChainPoll LLM, which is designed to detect hallucinations.

What’s more, the benchmarks showed that the Luna EFMs are much cheaper, with evaluation compute costs said to be as much as 30 times lower than those of GPT-3.5. The evaluations are much faster too, with results delivered in milliseconds. The Luna EFMs are also more customizable than other solutions, as they can quickly be fine-tuned to spot very specific problems with generative AI outputs, the CEO said.

The Luna EFMs also come out on top in terms of explainability, providing users with explanations of their evaluations. This can help streamline root-cause analysis and debugging, the startup claims.

Galileo’s benchmark claims are also backed by a stamp of approval from early adopters of the Luna EFMs.

Alex Klug, head of product, data science and AI at the personal computer giant HP Inc., said accurate model evaluation tools are essential for delivering safe, reliable and production-grade AI applications. “Until now, existing evaluation methods such as human evaluations or using LLMs as a judge have been very costly and slow,” Klug said. “With Luna, Galileo is overcoming enterprise teams’ biggest evaluation hurdles.”

The startup said the Luna EFMs are available now in its Galileo Protect and Galileo Evaluate platforms, and are already being used extensively by a number of Fortune 10 banks and Fortune 50 companies.

Images: Galileo Technologies
