UPDATED 18:36 EST / DECEMBER 26 2024

DeepSeek open-sources new AI model with 671B parameters

Chinese artificial intelligence developer DeepSeek today open-sourced DeepSeek-V3, a new large language model with 671 billion parameters.

The LLM can generate text, craft software code and perform related tasks. DeepSeek says it outperforms two of the most advanced open-source LLMs on the market across more than a half-dozen benchmark tests.

DeepSeek-V3 is based on a so-called mixture of experts, or MoE, architecture. It comprises multiple neural networks that are each optimized for a different set of tasks. When DeepSeek-V3 receives a prompt, a component known as a router sends the request to the neural network best-equipped to answer it.
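
How such routing works can be sketched in a few lines of code. The toy example below is purely illustrative and bears no relation to DeepSeek's actual implementation: a small router scores a bank of expert networks for each token and dispatches the token only to the top-scoring experts.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Minimal mixture-of-experts layer: a router picks a few experts per token."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(hidden_size, num_experts)   # scores every expert for each token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden_size)
        scores = self.router(x).softmax(dim=-1)              # routing probabilities per token
        weights, chosen = scores.topk(self.top_k, dim=-1)    # keep only the best-matching experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx                 # tokens routed to this expert
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# 16 tokens pass through a layer with 8 experts; only 2 experts run per token.
layer = ToyMoELayer(hidden_size=64, num_experts=8, top_k=2)
print(layer(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```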

The MoE architecture’s main benefit is that it reduces hardware costs. Sending a prompt to DeepSeek-V3 doesn’t activate the entire LLM, but only the neural networks to which the request is routed. As a result, only about 37 billion of the model’s 671 billion parameters are active at any one time, which means it requires a relatively limited amount of infrastructure to run.
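
The arithmetic behind that claim is straightforward. The activated-parameter figure comes from DeepSeek's release documentation; the comparison below is only a rough illustration of per-token compute.

```python
# Why sparse activation keeps per-token compute manageable: only the routed
# experts' weights participate in each forward pass.
total_params = 671e9    # DeepSeek-V3's total parameter count
active_params = 37e9    # parameters activated per token, per DeepSeek's documentation
print(f"Share of weights used per token: {active_params / total_params:.1%}")
# -> roughly 5.5%, so per-token compute resembles a far smaller dense model
```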

Alongside its benefits, the MoE architecture also introduces certain challenges. During the training process, some of an MoE model’s neural networks receive more training data than others, which can create inconsistencies in the LLM’s output quality. DeepSeek says it has developed a new method of mitigating this challenge and implemented it in DeepSeek-V3.
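
DeepSeek doesn't spell out the method in this announcement, but the general idea behind such load-balancing schemes can be illustrated with a toy routine: each expert carries a small bias that is raised when the expert is underused and lowered when it is overloaded, so tokens spread out more evenly over time. Everything below is an illustrative sketch with invented names, not DeepSeek's implementation.

```python
import torch

num_experts, top_k, step_size = 8, 2, 0.01
bias = torch.zeros(num_experts)          # per-expert routing bias (illustrative)

def route(scores: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    """Pick the top-k experts per token using bias-adjusted router scores."""
    return (scores + bias).topk(top_k, dim=-1).indices

def rebalance(bias: torch.Tensor, chosen: torch.Tensor) -> torch.Tensor:
    """Nudge biases toward a uniform load: busy experts down, idle experts up."""
    load = torch.bincount(chosen.flatten(), minlength=num_experts).float()
    return bias - step_size * torch.sign(load - load.mean())

scores = torch.randn(32, num_experts)    # stand-in router scores for 32 tokens
bias = rebalance(bias, route(scores, bias))
print(bias)                              # biases drift to counteract uneven routing
```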

The LLM was trained on 14.8 trillion tokens’ worth of information. One token corresponds to a few letters or numbers. The training process took 2.788 million graphics processing unit hours, a relatively modest amount by frontier-model standards. The industry’s most advanced AI clusters comprise tens of thousands of GPUs or more and could complete a training project of this size in a matter of days.
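
To put the figure in perspective, here is the back-of-the-envelope conversion from GPU hours to wall-clock time. The cluster sizes are illustrative assumptions, not numbers disclosed in the announcement.

```python
# Converting the reported GPU-hour budget into rough wall-clock estimates.
gpu_hours = 2.788e6
for gpus in (2_048, 16_384):             # hypothetical cluster sizes
    days = gpu_hours / gpus / 24
    print(f"{gpus:>6} GPUs -> about {days:.0f} days of training")
# A few thousand GPUs imply roughly two months of training, while a cluster of
# tens of thousands could finish a comparable run in about a week.
```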

Alongside its MoE architecture, DeepSeek-V3 is equipped with several optimizations designed to boost its output quality. 

LLMs use a technique called attention to identify the most important details in a sentence. DeepSeek-V3 implements multihead latent attention, an improved version of the technique. Multihead attention extracts key details from a text snippet several times in parallel rather than only once, which makes the LLM less likely to overlook important information; the latent variant compresses the data those attention heads must keep in memory, which lowers the cost of running the model.
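
The compression idea can be sketched roughly as follows. In this much-simplified example, keys and values are reconstructed from a small shared latent vector instead of being stored in full, which shrinks what must be cached during generation. DeepSeek's actual formulation is considerably more involved, and all dimensions and names below are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLatentAttention(nn.Module):
    """Simplified sketch of latent-compressed attention (illustrative only)."""

    def __init__(self, hidden: int = 256, heads: int = 8, latent: int = 64):
        super().__init__()
        self.heads, self.head_dim = heads, hidden // heads
        self.q_proj = nn.Linear(hidden, hidden)
        self.kv_down = nn.Linear(hidden, latent)   # compress to a small latent (this is what gets cached)
        self.k_up = nn.Linear(latent, hidden)      # reconstruct keys from the latent
        self.v_up = nn.Linear(latent, hidden)      # reconstruct values from the latent
        self.out = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, h = x.shape
        q = self.q_proj(x).view(b, t, self.heads, self.head_dim).transpose(1, 2)
        latent = self.kv_down(x)                   # (b, t, latent) -- far smaller than full keys/values
        k = self.k_up(latent).view(b, t, self.heads, self.head_dim).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.heads, self.head_dim).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(attn.transpose(1, 2).reshape(b, t, h))

print(ToyLatentAttention()(torch.randn(2, 10, 256)).shape)  # torch.Size([2, 10, 256])
```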

DeepSeek-V3 also features so-called multitoken prediction. Language models usually generate text one token at a time. DeepSeek-V3, in contrast, predicts several tokens at once, which speeds up inference.
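
A toy version of the idea looks like the code below: the model keeps its usual next-token head and adds a second head trained to predict the token after next, so each position yields two predictions per forward pass. This is only a conceptual sketch; DeepSeek's module is structured differently, and every name here is made up, with a simple recurrent backbone standing in for a transformer.

```python
import torch
import torch.nn as nn

class ToyMultiTokenModel(nn.Module):
    """Illustrative model that predicts two future tokens per position."""

    def __init__(self, hidden: int = 128, vocab: int = 1000):
        super().__init__()
        self.backbone = nn.GRU(hidden, hidden, batch_first=True)  # stand-in for a transformer
        self.head_next = nn.Linear(hidden, vocab)    # predicts token t+1
        self.head_after = nn.Linear(hidden, vocab)   # predicts token t+2

    def forward(self, embeddings: torch.Tensor):
        states, _ = self.backbone(embeddings)
        return self.head_next(states), self.head_after(states)

model = ToyMultiTokenModel()
logits_next, logits_after = model(torch.randn(4, 16, 128))
print(logits_next.shape, logits_after.shape)  # two sets of predictions per position
```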

DeepSeek put its algorithm to the test by comparing it with three other open-source LLMs: the previous-generation DeepSeek-V2, Llama 3.1 405B and Qwen2.5 72B. DeepSeek-V3 achieved higher scores across all nine of the coding and math benchmarks that were used in the evaluation. It also proved better at a range of text processing tasks. 

The code for DeepSeek-V3 is available on Hugging Face.
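
For readers who want to poke at the release, the repository can be inspected with the huggingface_hub client. The repository identifier below matches the public listing, but loading instructions and hardware requirements should be checked on the model card, since the full checkpoint is far too large for consumer hardware.

```python
# List the first few files in the DeepSeek-V3 repository on Hugging Face.
from huggingface_hub import list_repo_files

files = list_repo_files("deepseek-ai/DeepSeek-V3")
for name in files[:10]:
    print(name)
```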
