

Retrieval-Augmented Generation helps artificial intelligence stay accurate and relevant, which is why storage companies are making enterprise RAG a reality.
Facilitating the flow of data into models is only one component of supporting an enterprise-grade workflow. The demand for generative AI has put a lot of pressure on existing infrastructures, creating the need for faster, more powerful storage solutions.
Industry experts from Supermicro, Nvidia, Vast, Solidigm, Graid and Voltage Park discuss AI factories.
“You need to have more memory bandwidth and capacities, you need to have a fast neural speed for interconnect, and you need to decide whether it is in a cloud or on premises,” said Ben Lee (pictured, front row, right), director of solution architecture and business development at Super Micro Computer Inc. “We’re seeing a paradigm shift for enterprise infrastructures … now you need to build enterprise AI factories.”
Lee spoke with theCUBE’s Rob Strechay at the Supermicro Open Storage Summit interview series, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. Also joining the interview were Phil Manez (front row, left), GTM execution lead at Vast Data Inc.; Saurabh Giri (front row, center), chief product and technology officer at Voltage Park Inc.; Nave Algarici (back row, left), generative AI product manager at Nvidia Corp.; Kelley Osburn (back row, center), senior director of business development at Graid Technology Inc.; and Tahmid Rahman (back row, right), director of product and partner marketing at Solidigm, a trademark of SK Hynix NAND Products Solutions Corp. They discussed their collaboration and developing the infrastructure for enterprise RAG pipelines. (* Disclosure below.)
Supermicro has built what it calls a “blueprint” for creating AI factories, which comprises five layers for integrating enterprise RAG pipelines: accelerated compute, high-performance storage, low-latency networking, AI software and the models themselves. Managing all of that is a tough task, and Voltage Park, a cloud service provider for AI workloads, works with other storage experts, including Supermicro and Vast, to make it possible. The margin for error is slim, according to Giri.
“Accurate retrieval and augmentation without hallucinating is where it gets tricky, especially as the size of these systems, the data corpus, scales out,” he said. “It’s extremely important to put evaluations in place across the entire workflow in the agentic system to make sure that we can measure accuracy and know when the agent got it wrong.”
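The kind of workflow-wide evaluation Giri describes can be sketched in a few lines. The sketch below is illustrative only, with hypothetical names and a crude token-overlap scoring heuristic standing in for whatever metric a production agentic system would use; it is not any vendor's actual API.

```python
# Minimal sketch of an evaluation hook for a RAG pipeline: run each question
# through the pipeline, score the answer against a reference, and flag likely
# misses so you "know when the agent got it wrong." All names are illustrative.

def token_overlap(answer: str, reference: str) -> float:
    """Crude accuracy proxy: fraction of reference tokens present in the answer."""
    ref_tokens = set(reference.lower().split())
    ans_tokens = set(answer.lower().split())
    return len(ref_tokens & ans_tokens) / len(ref_tokens) if ref_tokens else 0.0

def evaluate(cases, generate, threshold=0.5):
    """Collect (question, answer, score) triples that fall below the threshold."""
    failures = []
    for question, reference in cases:
        answer = generate(question)
        score = token_overlap(answer, reference)
        if score < threshold:
            failures.append((question, answer, score))
    return failures

# Example with a stubbed-out "pipeline" in place of a real RAG system:
cases = [("capital of France?", "Paris is the capital of France")]
print(evaluate(cases, lambda q: "The capital of France is Paris"))  # prints []
```

In a real deployment the lambda would be the full retrieve-then-generate pipeline, and the scorer would likely be an LLM-based judge or a retrieval-grounding check rather than token overlap, but the hook sits at the same place in the workflow.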
Part of ensuring accuracy via RAG pipelines is capturing data in real time. The Vast AI Operating System does just that, combining all of the company’s services in a way that allows users to capture both structured and unstructured events and then contextualize them for AI, according to Manez.
“We founded our whole company on this disaggregated, shared-everything architecture,” he said. “We have to be able to support hundreds of thousands of GPUs in a single cluster, and we are literally seeing exabyte-scale deployments in single clusters now as we’re bringing together all of this data.”
Nvidia, already a powerhouse in the AI game, is a part of both the networking and AI software components in Supermicro’s five-layer blueprint. This year, the company released the NeMo Retriever, a set of microservices for embedding large-scale enterprise data into users’ RAG systems.
“Data gravity is becoming more and more apparent in the enterprise,” Algarici said. “The data needs to be secure; it needs to be maintained continuously. With that in mind, Nvidia is thinking, ‘How can we help accelerate that adoption?’”
One of Nvidia’s partners, Graid Technology, specializes in accelerating AI adoption by reducing latency throughout the network. Its newest product is SupremeRAID, a tool for removing bottlenecks in data flows for enterprise pipelines.
“With software RAID, your CPU becomes extremely busy dealing with mathematical calculations for parity … and then it becomes a bottleneck,” Osburn said. “What we’ve done is we’ve moved that stack onto a GPU, which is much better at mathematical calculations.”
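The parity math Osburn refers to is, at its core, a byte-wise XOR across data stripes, which is why it parallelizes so well on a GPU. A toy sketch of the arithmetic (not Graid's implementation) shows both the write-path cost and why parity allows rebuilding a lost stripe:

```python
# Illustrative RAID 5-style parity: XOR every data stripe byte-by-byte.
# On software RAID the CPU does this for every write; SupremeRAID's approach,
# per the interview, moves the same class of work onto a GPU.

def xor_parity(stripes: list[bytes]) -> bytes:
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return bytes(parity)

stripes = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(stripes)

# Simulate losing stripe 1 and rebuilding it from the parity plus survivors:
rebuilt = xor_parity([stripes[0], stripes[2], parity])
assert rebuilt == stripes[1]
```

Because XOR is associative and self-inverse, XOR-ing the parity with the surviving stripes recovers the missing one, and each byte position is independent, which is exactly the shape of workload GPUs handle well.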
For the future of AI — and the planet — power-efficient RAG pipelines are essential, according to the experts. Solidigm contributes to that effort through its high-density solid-state drives, which go a long way toward reducing costs and resources for enterprise customers. The numbers speak for themselves, according to Rahman.
“We were able … to offload RAG databases and model weights onto the SSD and we got almost 70% increased query per second compared to a traditional memory-based solution on a 1 million dataset,” he said. “We did it by reducing the memory footprint by 50%, so there’s cost saving there.”
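The general idea behind that result, keeping the vector store on an SSD and paging in only the vectors a query touches instead of holding everything in RAM, can be sketched with the standard library. This is a hypothetical illustration of the technique, not Solidigm's benchmark setup; the file layout, dimensions and function names are all made up.

```python
# Hedged sketch: persist embeddings as packed float32 on disk, then memory-map
# the file so a nearest-neighbor scan reads vectors from the SSD on demand
# rather than keeping the whole database resident in memory.
import mmap
import os
import struct
import tempfile

DIM = 4  # toy embedding width

vectors = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]

# Write the embeddings to disk once; after this, RAM holds no copy of them.
path = os.path.join(tempfile.mkdtemp(), "embeddings.f32")
with open(path, "wb") as f:
    for v in vectors:
        f.write(struct.pack(f"{DIM}f", *v))

def nearest(query, path, dim=DIM):
    """Scan the memory-mapped store; the OS pages vectors in from the SSD."""
    best_i, best_score = -1, float("-inf")
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count = len(mm) // (4 * dim)
        for i in range(count):
            vec = struct.unpack_from(f"{dim}f", mm, i * 4 * dim)
            score = sum(q * x for q, x in zip(query, vec))
            if score > best_score:
                best_i, best_score = i, score
    return best_i

print(nearest([1.0, 0.1, 0.0, 0.0], path))  # prints 0: vector 0 scores highest
```

A production system would use an approximate-nearest-neighbor index rather than a linear scan, but the memory-footprint trade-off is the same: the drive holds the corpus, and DRAM holds only the working set.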
Here’s a short clip from our interview, part of SiliconANGLE’s and theCUBE’s coverage of the Supermicro Open Storage Summit:
(* Disclosure: TheCUBE is a paid media partner for the Supermicro Open Storage Summit. Neither Super Micro Computer Inc., the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)