UPDATED 09:00 EST / OCTOBER 17 2023


Refuel.ai debuts a large language model to label the data needed to train other LLMs

Data labeling startup Refuel.ai Inc. today announced the launch of its Refuel Cloud platform, which uses a large language model that’s purpose-built to label datasets and get them ready for artificial intelligence training.

Refuel.ai, which raised $5.2 million in seed funding in June, uses LLMs to generate high-quality training data for AI models. It might seem odd to use AI to help train newer AI models, but in an age where every enterprise seems to be racing to develop AI, it’s an approach that makes a lot of sense.

The number of enterprises looking to leverage AI has swelled this year. Many of them are looking to develop highly customized models that can perform very specific tasks relating to their own business processes. But the task of creating these models is very challenging, and every new project begins with the same process – cleaning and labeling data.

Most companies have tons of data at their disposal, but most of that information is not training-ready. Before it can be used to train AI models, it has to be cleaned and annotated. Traditionally, this task was performed by human data scientists, but it’s a manual process that can take weeks or even months. Of course, this simply doesn’t scale, hence the need for an accelerant.

With Refuel Cloud, Refuel.ai says it’s able to automate the cleaning, labeling and enrichment of data at scale by using state-of-the-art LLMs that have been customized for the purpose. It claims its LLM can create enormous clean and accurate datasets that are ready for AI training in a matter of minutes.

Data teams can spell out in natural language exactly how they want their dataset labeled, and Refuel Cloud will complete the job rapidly. Then, domain experts can check the freshly cleaned and labeled dataset and provide feedback to ensure everything is exactly how they want it.

While Refuel.ai’s platform seems superior to manual processes, the company isn’t the only startup that has hit upon the idea of using AI to label training data. The field of competitors includes Labelbox Inc., which has attracted much more funding. In 2021, Labelbox closed on a $40 million Series C round, bringing its total amount raised to $79 million. Other rivals include Datasaur Inc., which launched with $4 million in seed funding in August, plus DataLoop Ltd., Appen Ltd. and SuperAnnotate AI Inc.

Asked how Refuel Cloud differs from those rival platforms, Refuel.ai co-founder and Chief Executive Rishabh Bhargava said the platform is the first in the industry that’s purpose-built to use LLMs as the actual annotator. “This means that all of the workflows and interfaces are designed for humans to provide instructions and feedback, and for LLMs to do the actual work of labeling,” he explained.

As such, Refuel Cloud’s performance is extremely impressive, Bhargava said. Benchmarks have shown Refuel Cloud to be eight times faster than existing LLMs such as ChatGPT. What’s more, it costs 10% less than other LLMs, he added. He also pointed to its accuracy and its ability to scale.

“The data quality and accuracy of labels produced by Refuel is also better than human quality,” Bhargava added. “And since it’s LLMs that are doing the labeling and not humans, Refuel can scale to any data volume, intelligently switching to smaller models over time, so costs go down as customers label more data.”

Another advantage of Refuel Cloud is security, since customers can deploy the platform within their own cloud or on-premises environment, the CEO said. “This means that their data labeling needs can be serviced without data leaving their premises, or being handled by non-employees,” he explained. “This is a critical consideration for larger enterprises.”

Andy Thurai, vice president and principal analyst at Constellation Research Inc., said Refuel’s model is a refined version of the Llama-v2-13b base model developed by Meta Platforms Inc. He said the company has fine-tuned this model to handle tasks such as classification, entity resolution, matching, reading comprehension, and information extraction in specific domains such as finance, healthcare, and e-commerce. “The benchmark results seem to perform better both in terms of accuracy and cost,” Thurai said. “But this field is getting crowded, with many established players and new entrants all trying to capture the market.”

Refuel, whose founding team has worked at companies such as Google LLC’s DeepMind, Meta Platforms Inc., Apple Inc. and Amazon.com Inc., said its platform is already being used by dozens of companies in industries such as financial services, insurance, e-commerce and education.

One of its biggest fans is Enigma Technologies Inc., a business intelligence provider that offers insights into the financial health of small and medium-sized private businesses. Charles Zhu, Enigma’s vice president of product management, said data labeling is a critical task for Enigma, which uses AI to generate its business insights.

“We compared Refuel against our existing manual data labeling solution, and not only did Refuel’s LLM produce more accurate labels as validated by human teams, but their speed for large datasets was astounding – one day compared with five weeks,” Zhu said.

Image: Refuel.ai
