UPDATED 09:00 EST / OCTOBER 17 2023

Refuel.ai debuts a large language model to label the data needed to train other LLMs

Data labeling startup Refuel.ai Inc. today announced the launch of its Refuel Cloud platform, which uses a large language model that’s purpose-built to label datasets and get them ready for artificial intelligence training.

Refuel.ai, which raised $5.2 million in seed funding in June, uses LLMs to generate high-quality training data for AI models. It might seem odd to use AI to help train newer AI models, but in an age where every enterprise seems to be racing to develop AI, it’s an approach that makes a lot of sense.

The number of enterprises looking to leverage AI has swelled this year. Many of them are looking to develop highly customized models that can perform very specific tasks relating to their own business processes. But the task of creating these models is very challenging, and every new project begins with the same process – cleaning and labeling data.

Most companies have tons of data at their disposal, but most of that information is not training-ready. Before it can be used to train AI models, it has to be cleaned and annotated. Traditionally, this task was performed by human data scientists, but it’s a manual process that can take weeks or even months. Of course, this simply doesn’t scale, hence the need for an accelerant.

With Refuel Cloud, Refuel.ai says it’s able to automate the cleaning, labeling and enrichment of data at scale by using state-of-the-art LLMs that have been customized for the purpose. It claims its LLM can create enormous clean and accurate datasets that are ready for AI training in a matter of minutes.

Data teams can spell out in natural language exactly how they want their dataset labeled, and Refuel Cloud will complete the job rapidly. Then, domain experts can check the freshly cleaned and labeled dataset and provide feedback to ensure everything is exactly how they want it.
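To make that workflow concrete, here is a minimal Python sketch of an instruct, label and review loop. It is an illustration under stated assumptions, not Refuel Cloud's actual interface: the names `keyword_labeler`, `label_dataset`, `apply_feedback` and the `review_threshold` parameter are all hypothetical, and a simple keyword rule stands in for the LLM call so the example runs without any external service.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class LabeledRow:
    text: str
    label: str
    confidence: float
    needs_review: bool


def keyword_labeler(instructions: str, text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for an LLM call.

    A real labeler would send `instructions` and `text` to a model.
    This stub ignores the instructions and applies fixed keyword rules.
    """
    lowered = text.lower()
    if "refund" in lowered:
        return "billing", 0.9
    if "password" in lowered:
        return "account", 0.9
    return "other", 0.4


def label_dataset(
    instructions: str,
    texts: List[str],
    labeler: Callable[[str, str], Tuple[str, float]],
    review_threshold: float = 0.7,  # assumed cutoff, chosen for illustration
) -> List[LabeledRow]:
    """Label every row and flag low-confidence rows for expert review."""
    rows = []
    for text in texts:
        label, confidence = labeler(instructions, text)
        rows.append(LabeledRow(text, label, confidence, confidence < review_threshold))
    return rows


def apply_feedback(rows: List[LabeledRow], corrections: Dict[str, str]) -> List[LabeledRow]:
    """Return new rows with expert corrections (keyed by row text) applied."""
    updated = []
    for row in rows:
        if row.text in corrections:
            row = replace(row, label=corrections[row.text], confidence=1.0, needs_review=False)
        updated.append(row)
    return updated


if __name__ == "__main__":
    tickets = ["I want a refund", "Reset my password", "Hello there"]
    labeled = label_dataset("Label each support ticket by topic.", tickets, keyword_labeler)
    for row in apply_feedback(labeled, {"Hello there": "greeting"}):
        print(row)
```

In this sketch the model does the bulk labeling and people only see the rows it is unsure about, which mirrors the division of labor the company describes.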

While Refuel.ai’s platform obviously seems superior to manual processes, the company isn’t the only startup that has hit upon the idea of using AI to label training data. The field of competitors includes Labelbox Inc., which has attracted much more funding. In 2021, Labelbox closed on a $40 million Series C round, bringing its total amount raised to $79 million. Other rivals include Datasaur Inc., which launched with $4 million in seed funding in August, plus DataLoop Ltd., Appen Ltd. and SuperAnnotate AI Inc.

Asked how Refuel Cloud differs from those rival platforms, Refuel.ai co-founder and Chief Executive Rishabh Bhargava said the platform is the first in the industry that’s purpose-built to use LLMs as the actual annotator. “This means that all of the workflows and interfaces are designed for humans to provide instructions and feedback, and for LLMs to do the actual work of labeling,” he explained.

As such, Refuel Cloud’s performance is extremely impressive, Bhargava said. Benchmarks have shown Refuel Cloud to be eight times faster than existing LLMs such as ChatGPT. What’s more, it costs 10% less than other LLMs, he added. He also pointed to its accuracy and its ability to scale.

“The data quality and accuracy of labels produced by Refuel is also better than human quality,” Bhargava added. “And since it’s LLMs that are doing the labeling and not humans, Refuel can scale to any data volume, intelligently switching to smaller models over time, so costs go down as customers label more data.”

Another advantage of Refuel Cloud is security, since customers can deploy the platform within their own cloud or on-premises environment, the CEO said. “This means that their data labeling needs can be serviced without data leaving their premises, or being handled by non-employees,” he explained. “This is a critical consideration for larger enterprises.”

Andy Thurai, vice president and principal analyst at Constellation Research Inc., said Refuel’s model is a refined version of the Llama-v2-13b base model developed by Meta Platforms Inc. He said the company has fine-tuned this model to handle tasks such as classification, entity resolution, matching, reading comprehension, and information extraction in specific domains such as finance, healthcare, and e-commerce. “The benchmark results seem to perform better both in terms of accuracy and cost,” Thurai said. “But this field is getting crowded, with many established players and new entrants all trying to capture the market.”

Refuel, whose founding team has worked at companies such as Google LLC’s DeepMind, Meta Platforms Inc., Apple Inc. and Amazon.com Inc., said its platform is already being used by dozens of companies in industries such as financial services, insurance, e-commerce and education.

One of its biggest fans is Enigma Technologies Inc., a business intelligence company that provides insights into the financial health of small and medium-sized private businesses. Charles Zhu, Enigma’s vice president of product management, said data labeling is a critical task for Enigma, which uses AI to generate its business insights.

“We compared Refuel against our existing manual data labeling solution, and not only did Refuel’s LLM produce more accurate labels as validated by human teams, but their speed for large datasets was astounding – one day compared with five weeks,” Zhu said.

Image: Refuel.ai

A message from John Furrier, co-founder of SiliconANGLE:

Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.

15M+ viewers of theCUBE videos, powering conversations across AI, cloud, cybersecurity and more
11.4k+ theCUBE alumni: Connect with more than 11,400 tech and business leaders shaping the future through a unique trust-based network.

About SiliconANGLE Media

SiliconANGLE Media is a recognized leader in digital media innovation, uniting breakthrough technology, strategic insights and real-time audience engagement. As the parent company of SiliconANGLE, theCUBE Network, theCUBE Research, CUBE365, theCUBE AI and theCUBE SuperStudios — with flagship locations in Silicon Valley and the New York Stock Exchange — SiliconANGLE Media operates at the intersection of media, technology and AI.

Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.
