Nvidia Corp. today launched a reference architecture that hardware makers can use to build storage equipment for artificial intelligence clusters.
The BlueField-4 STX made its debut at the company’s GTC developer event.
“AI systems that reason across massive context and continuously learn require a new class of storage,” said Nvidia Chief Executive Officer Jensen Huang. “NVIDIA STX reinvents the storage stack, providing a modular foundation for AI-native infrastructure that keeps AI factories operating at peak performance.”
The architecture’s first building block is the BlueField-4 data processing unit, or DPU, that Nvidia unveiled in January. A DPU offloads infrastructure management tasks from a server’s main processor to leave more computing capacity for applications. The BlueField-4 handles tasks such as processing data traffic between GPUs and flash storage.
According to Nvidia, the BlueField-4 STX also includes its Spectrum-X Ethernet switches and ConnectX-9 SuperNICs. Usually, the data that a server fetches from storage has to pass through its central processing unit and operating system. Spectrum-X and ConnectX-9 support a technology called remote direct memory access, or RDMA, that skips those pit stops, which speeds up the flow of traffic.
Nvidia says that BlueField-4 STX can process tokens, units of data used by AI models, up to five times faster than earlier storage architectures. The company also expects a fourfold improvement in energy efficiency.
The first rack-scale implementation of the BlueField-4 STX architecture is a storage system design called CMX. It’s optimized to hold key-value caches, data structures that large language models use to store information.
LLMs include an attention mechanism that analyzes each prompt, determines which of its elements are most important and prioritizes them. Along the way, the attention mechanism turns the contents of the prompt into mathematical objects called vectors. It uses two main types of vectors: keys that help the LLM find information and values that hold the information.
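The role of a KV cache in this process can be illustrated with a toy sketch. The code below is a deliberately simplified, single-head version of scaled dot-product attention in plain Python (real models use learned weight matrices and run on GPUs); the fixed scaling factors in `project` are placeholders standing in for those learned projections. The point it shows is that each token's key and value vectors are computed once and appended to a cache, so later decoding steps reuse them instead of recomputing them.

```python
import math

def attend(query, keys, values):
    # Scaled dot-product attention for a single query vector:
    # score each cached key, softmax the scores, and return the
    # weighted average of the cached values.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def project(token_vec, scale):
    # Placeholder for a learned projection matrix; here just a scalar multiply.
    return [x * scale for x in token_vec]

kv_cache = {"keys": [], "values": []}

def decode_step(token_vec):
    # The new token's key and value are appended to the cache once;
    # earlier tokens are never re-projected on later steps.
    kv_cache["keys"].append(project(token_vec, 0.5))
    kv_cache["values"].append(project(token_vec, 2.0))
    query = project(token_vec, 1.0)
    return attend(query, kv_cache["keys"], kv_cache["values"])

out1 = decode_step([1.0, 0.0])
out2 = decode_step([0.0, 1.0])
```

Because the cache grows with every generated token, long-context models accumulate large KV caches, which is the data that a system like CMX is designed to hold in fast flash storage rather than scarce GPU memory.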
CMX stores an AI cluster’s key-value cache in high-speed flash storage. BlueField-4 chips offload key data management tasks from the host cluster’s CPUs to boost performance.
According to Nvidia, CMX also speeds up AI workloads in other ways.
Storage systems use multiple hardware-intensive algorithms to reduce the risk of data loss. CMX doesn’t run those algorithms on the KV cache that it holds, which avoids the associated hardware overhead. The system can skip that step because a KV cache often doesn’t require the same data loss protection as standard business records. The information in a KV cache is relatively easy to recover and is usually retained for only a short amount of time before it’s deleted.
Nvidia expects partners to start shipping BlueField-4 STX systems in the second half of 2026. The company says that more than a half-dozen customers are already planning to use the technology. Oracle Corp., Mistral AI SAS and CoreWeave Inc. are among the early adopters.