UPDATED 21:10 EST / JANUARY 28 2025

AI

Hugging Face wants to reverse-engineer DeepSeek’s R1 reasoning model

Researchers from Hugging Face Inc. say they’re attempting to recreate Chinese startup DeepSeek’s R1 “reasoning model.”

The initiative comes after R1 stunned the artificial intelligence community by matching the performance of the most capable models built by U.S. firms, despite being built at a fraction of the cost. Hugging Face researchers say the Open-R1 project aims to create a fully open-source duplicate of the R1 model and make all of its components available to the AI community.

Elie Bakouch, one of the Hugging Face engineers leading the project, told TechCrunch that although DeepSeek describes R1 as open-source because it can be used without restrictions, it doesn’t meet the standard definition of open-source software. That’s because many of the components used to build it, as well as the data it was trained on, have not been made publicly available.

The lack of information about what went into R1 means that it’s really just another “black box,” similar to proprietary models such as OpenAI’s GPT series, making it impossible for the AI community to build on or improve, he said.

DeepSeek, which is operated by Hangzhou DeepSeek Artificial Intelligence Co. Ltd. and Beijing DeepSeek Artificial Intelligence Co. Ltd., hit the headlines last week when it made its two primary reasoning models – DeepSeek-R1-Zero and DeepSeek-R1 – available on Hugging Face. At the same time, it published a paper on arxiv.org outlining the development process behind the models.

The R1 model has caused intense excitement with its apparent ability to match the performance of advanced LLMs like OpenAI’s GPT-4o and Anthropic PBC’s Claude, even though it was built at a total cost of just $5.6 million, according to its developer. In contrast, OpenAI and other American firms like Google LLC and Meta Platforms Inc. have spent billions of dollars on developing their own models.

DeepSeek’s model demonstrates that it’s possible to make the same kind of progress without breaking the bank, and the revelation caused chaos in the financial markets earlier this week, with the stocks of U.S. companies involved in AI development tanking on Monday. The AI chipmaker Nvidia Corp. saw its stock fall 15%, while Broadcom Inc.’s shares were down 16% and Taiwan Semiconductor Manufacturing Co. dropped 14%.

At the same time, DeepSeek’s iOS chatbot application, which provides free access to the R1 model, emerged from nowhere to become the No. 1 productivity app on the Apple App Store this week.

The Chinese company claims that it developed R1 with fewer and much less advanced graphics processing units than those used to develop models like GPT-4o and Llama 3, raising questions about whether the multibillion-dollar investments being made in AI are really necessary. On a number of benchmarks, R1 has shown it’s able to match or even surpass the performance of OpenAI’s o1 reasoning model.

Reasoning models are notable for their ability to “fact-check” their responses before they output them, helping to avoid the “hallucinations” that plague more standard large language models. They generally take a little longer to generate their responses because of these accuracy checks, but the extra step makes them much more reliable in areas such as physics, science and math.

Hugging Face says it’s attempting to replicate R1 to benefit the AI research community, and it intends to do so in just a few weeks. To do this, it will leverage the company’s dedicated research server, the “Science Cluster,” which is powered by 768 Nvidia H100 GPUs. The plan is to reverse-engineer the R1 model in order to understand what data was used to train it and which components were used in its creation.

The Open-R1 project is seeking assistance from the broader AI research community to recreate the training datasets used by DeepSeek, and it has garnered a lot of interest so far, with its associated GitHub page getting more than 10,000 stars just three days after its launch.

Despite the initial enthusiasm from the AI community, it may be difficult for Hugging Face’s researchers to pull this off and make a version of R1 that’s close to the real thing, analyst Holger Mueller of Constellation Research Inc. told SiliconANGLE.

“Hugging Face wants to reverse engineer DeepSeek’s model because it has all of the attention right now, and if it can do this, it will increase transparency and improve confidence for users,” Mueller said. “But without the underlying datasets used by DeepSeek, it will be challenging for them to do this. Still, Hugging Face’s researchers are good at what they do, so let’s wait and see what they come up with.”

Bakouch said the project is not a zero-sum game, but rather the start of something that will hopefully be much more beneficial for the wider AI industry. He said he hopes that whatever they manage to build will eventually become the foundation of a new generation of even more advanced open-source reasoning models. If they can recreate R1, the entire AI community will be able to look at how it works and try to improve on it, he explained.

“Open-source development immediately benefits everyone, including the frontier labs and the model providers, as they can all use the same innovations,” he said.

Image: SiliconANGLE/Dreamina
