Chinese artificial intelligence developer DeepSeek today released a new series of open-source large language models.
V4, as the algorithm family is called, comprises two LLMs at launch: the flagship V4-Pro and a smaller model called V4-Flash, which trades some output quality for lower hardware usage.
Both algorithms are based on a mixture of experts, or MoE, architecture. That means they comprise multiple neural networks rather than a single set of artificial neurons. V4-Pro has 1.6 trillion parameters and activates a subset of its neural networks with 49 billion parameters when answering user prompts. V4-Flash, in turn, contains 284 billion parameters and activates 13 billion at any given time.
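The general MoE idea can be sketched in a few lines of Python. The dimensions, expert count and top-k routing below are illustrative placeholders, not DeepSeek's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Minimal mixture-of-experts layer: a router scores the experts for each
    token and only the top-k experts run, so just a fraction of the layer's
    total parameters is active per token (illustrative sizes only)."""
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        weights, picked = F.softmax(self.router(x), dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```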
One of the new architectural features in the LLM series is a so-called hybrid attention mechanism. An LLM’s attention mechanism ranks the data points in a user prompt based on their importance. The model takes the most relevant data points into consideration when generating responses and discards irrelevant details, which boosts output quality.
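In its standard form, the mechanism is scaled dot-product attention: each token's query is compared against every token's key, and the softmax-normalized scores determine how much of each token's value flows into the output. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    """Scaled dot-product attention: scores measure how relevant each token is
    to each query; softmax turns them into weights that mix the value vectors.
    Tokens with near-zero weight barely influence the output."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 16, 64)  # (batch, tokens, head_dim)
print(attention(q, k, v).shape)     # torch.Size([1, 16, 64])
```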
Attention mechanisms don’t work with prompts in their original form, but rather with a mathematical representation that is held in memory as a so-called KV cache. V4’s hybrid attention architecture uses two different compression methods to shrink the KV cache, which lowers memory requirements. As a result, the model family’s KV cache uses 90% less memory during inference than the one in DeepSeek’s previous-generation LLMs.
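Back-of-the-envelope arithmetic shows why that matters. The model shape below is hypothetical, but the formula is the standard one for an uncompressed cache of keys and values:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """Size of a plain KV cache: two tensors (keys and values) per layer,
    each holding kv_heads * head_dim entries per cached token."""
    return 2 * layers * tokens * kv_heads * head_dim * bytes_per_value

# Hypothetical model shape, 128K-token context, 16-bit values:
baseline = kv_cache_bytes(layers=61, kv_heads=128, head_dim=128, tokens=128_000)
print(f"uncompressed: {baseline / 1e9:.0f} GB")
print(f"after a 90% reduction: {baseline * 0.10 / 1e9:.0f} GB")
```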
Many of the other new features in the V4 lineup were added to optimize its training workflow.
A neural network comprises artificial neuron collections called layers that process data in a specific order. Prompts enter the first layer, which carries out a series of calculations and transmits the results to the second layer. The second layer then performs calculations of its own, sends the results to the third layer and so forth.
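In code, that stacking is literal. A toy model with three layers chained in order:

```python
import torch
import torch.nn as nn

# Each layer's output becomes the next layer's input, in a fixed order.
model = nn.Sequential(
    nn.Linear(32, 32), nn.ReLU(),  # first layer processes the input
    nn.Linear(32, 32), nn.ReLU(),  # second layer processes the first's results
    nn.Linear(32, 32),             # and so forth down the stack
)
print(model(torch.randn(1, 32)).shape)  # torch.Size([1, 32])
```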
Data regularly moves between an LLM’s layers during training. V4 includes a feature called mHC that enables data to travel directly between distant layers, bypassing the intermediate neuron clusters. That approach reduces training errors, which in turn boosts AI output quality.
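mHC’s internals aren’t detailed here, but the underlying intuition resembles a shortcut, or skip, connection, which gives data a direct path past intermediate computation. A generic sketch of that idea, not of mHC itself:

```python
import torch
import torch.nn as nn

class BlockWithShortcut(nn.Module):
    """Generic shortcut connection: the block's input is added straight to its
    output, so information (and gradients during training) can flow past the
    intermediate layers. mHC's actual mechanism is more elaborate."""
    def __init__(self, dim=32):
        super().__init__()
        self.inner = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.inner(x)  # direct path plus transformed path
```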
The neuron clusters between the first and last layers of an LLM are known as its hidden layers. V4 uses a software module called Muon to optimize the hidden layers. It helps speed up training runs and reduce the associated infrastructure requirements.
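Muon is a publicly documented optimizer whose core trick is to orthogonalize each hidden layer’s momentum matrix with a Newton-Schulz iteration before taking a step. Below is a simplified sketch following the public reference code; DeepSeek’s exact settings aren’t specified:

```python
import torch

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2-D matrix; the quintic coefficients
    come from the public Muon reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    if G.size(0) > G.size(1):
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if G.size(0) > G.size(1) else X

def muon_step(weight, grad, buf, lr=0.02, momentum=0.95):
    """One simplified Muon update for a hidden-layer weight matrix:
    accumulate momentum, orthogonalize it, then take the step."""
    buf.mul_(momentum).add_(grad)
    weight.add_(newton_schulz(buf), alpha=-lr)
```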
DeepSeek carried out V4’s initial training using a dataset that comprised about 27 trillion tokens. It then applied a two-step post-training workflow. The first step separately optimized the neural networks that make up each V4 model, while the second improved their ability to coordinate their work.
DeepSeek evaluated V4-Pro, the most capable LLM in the series, using about two dozen benchmarks. It then compared the model’s results against the scores of several other frontier models, including Claude Opus 4.6. V4-Pro bested all the competing LLMs on three of the benchmarks. On several others, it outperformed some of the rival models but not all of them.
V4-Pro and V4-Flash are available in preview on Hugging Face.