The conversation around high-performance computing has changed dramatically in less than four years.
What used to be a focus on the latest processors — “speeds and feeds” — has evolved into whole new software stacks powered by advanced chips from vendors such as Nvidia Corp. AI is rewriting the rules of infrastructure, and more data center capacity is being built to meet inferencing demand.
“It’s a data center love fest here,” said John Furrier, executive analyst at theCUBE Research, during the keynote analysis at SC25. “Four years ago, it was just the basic chips were coming out. That was the beginning of the large-scale clusters. Now, all the top hyperscalers, neoclouds and anyone who’s got serious data center chops [are] loving life. It’s definitely an AI tsunami on the infrastructure.”
Furrier spoke with fellow analysts Dave Vellante, Jackie McGuire and Savannah Peterson at SC25, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They talked about AI infrastructure pressures and the growing challenges around data center capacity. (* Disclosure below.)
Here’s theCUBE’s complete video analysis with Furrier, Vellante, McGuire and Peterson:
Check out three insights you may have missed during theCUBE’s coverage of SC25:
One of the key drivers behind the transformation of compute infrastructure is AI inferencing, the process of using a trained model to make predictions or draw conclusions from data. Organizations such as the United States Forest Service (USFS) are using AI inferencing to monitor endangered species in the Pacific Northwest. Working with Super Micro Computer Inc., USFS and researchers at Oregon State University are applying AI, using 5,000 autonomous recording units to sample sounds and gather data in the habitats of spotted owls.
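The distinction between training and inferencing can be illustrated with a minimal sketch: the weights below stand in for an already-trained classifier (the values and the owl-call scenario are purely illustrative), and inference is simply a forward pass over new data with no gradient updates.

```python
import math

# Hypothetical weights from an already-trained call classifier.
# In practice these would come from a model trained on labeled recordings.
TRAINED_WEIGHTS = [0.8, -0.3, 1.2]
TRAINED_BIAS = -0.5

def infer(features):
    """Inference: apply the frozen, trained model to new data --
    a forward pass only, with no weight updates."""
    z = sum(w * x for w, x in zip(TRAINED_WEIGHTS, features)) + TRAINED_BIAS
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes the score to (0, 1)

# A feature vector extracted from one audio sample (illustrative values).
sample = [0.6, 0.1, 0.9]
print(f"probability of target call: {infer(sample):.3f}")
```

Training determines the weights once; deployment then runs this cheap forward pass millions of times, which is why inferencing, not training, is driving so much of the new capacity demand.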
“As we move past the training of models, now we’re getting to the deployment of inference models,” said Josh Grossman, principal product manager at Super Micro. “There’s still going to be a lot of training happening, but inference is going to really move into the fore and that’s a lot of what [USFS] is doing.”
Here’s theCUBE’s complete video interview with Josh Grossman, who was joined by Chris Sullivan, director of research and academic computing at Oregon State University:
This focus on inferencing has led hardware companies such as Nvidia to develop new tools for deploying models in multi-node environments at data center scale. One of these is Dynamo, an inference framework that can scale generative AI models efficiently in large, distributed networks. Nvidia has integrated the framework with storage partners, such as WekaIO Inc., to minimize disruption to AI inferencing workloads.
“Dynamo is really about delivering AI inferencing at scale, but across the entire tiered memory,” according to Dion Harris (pictured, right), senior director of HPC, cloud and AI infrastructure solutions GTM at Nvidia, in an interview with theCUBE. “Through Dynamo, we’ve exposed a new protocol, it’s called NIXL, which is Nvidia Inference Transfer Library. That allows us to expose this sort of hierarchy to our storage partners like Weka. They’re able to then immediately have this integration across the full orchestration.”
Here’s theCUBE’s complete video interview with Dion Harris, who was joined by Shimon Ben-David (pictured, left), chief technology officer of WekaIO:
As AI agents are increasingly being deployed in enterprise environments to perform specific tasks, businesses are looking for ways that high-performance computing and cloud infrastructure can help drive results. This trend was reinforced during SC25 this month when Vast Data Inc. announced a collaboration with Microsoft Corp. on its Azure platform to provide customers with high-performance AI infrastructure in the cloud.
Azure customers will have the opportunity to leverage Vast’s capabilities such as InsightEngine and AgentEngine for running intelligent, data-driven workflows and accelerating vector search, retrieval-augmented generation pipelines and data prep. Vast’s AgentEngine fuels agentic AI with real-time data streams to enable continuous AI reasoning across hybrid cloud environments.
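A retrieval-augmented generation pipeline of the kind described above can be sketched at its simplest: rank stored document chunks by vector similarity to a query, then prepend the top hits to the model's prompt. The store, embeddings and function names below are illustrative stand-ins, not Vast's actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy in-memory store of (embedding, text) pairs. Real systems use a
# vector database and a learned embedding model.
STORE = [
    ([1.0, 0.0, 0.0], "GPU clusters need high-bandwidth storage."),
    ([0.0, 1.0, 0.0], "Spotted owls live in the Pacific Northwest."),
    ([0.7, 0.7, 0.0], "Inference workloads are latency-sensitive."),
]

def retrieve(query_embedding, k=2):
    """Vector search: return the k chunks most similar to the query."""
    ranked = sorted(STORE, key=lambda item: cosine(query_embedding, item[0]),
                    reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question, query_embedding):
    """The 'augmented' step: ground the model's prompt in retrieved context."""
    context = "\n".join(retrieve(query_embedding))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The point of accelerating vector search is that this retrieval step sits on the critical path of every query; the generation model cannot start until the relevant context has been fetched.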
“We want to build an agent builder that allows you to put agents directly on top of your data,” said Jeff Denworth, co-founder of Vast Data, in conversation with theCUBE. “The whole market has changed over the last 12 months, now that reasoning models are here. With the generative AI era, you basically pump a lot of data into some sort of training run. Now, data systems become an integral part of my compute framework. Our belief is that the world is about to embark on one of the largest technology refresh events in history, now that people realize that they need to uplevel their data infrastructure to feed these new agentic systems.”
Here’s theCUBE’s complete video interview with Jeff Denworth:
AI is also spawning a new breed of cloud providers. Neoclouds, specialized providers that focus on high-performance GPUs and infrastructure to support AI initiatives, are an emerging business, offering less-costly service alternatives to the larger hyperscalers. Vast Data has noted that neoclouds are looking more closely at service delivery as part of providing a fault-tolerant, disaggregated architecture.
“In terms of the neoclouds, we’re increasingly seeing them want to offer more services to their end users,” according to Andy Pernsteiner, field chief technology officer of Vast Data, during an interview at SC25. “Renting GPUs by the hour is one thing, but providing service layers on top of it … I mean if you think about let’s say, CoreWeave, do they want to have to build their own block store, their own database table format? Do they want to have to create their own mechanisms for doing all these things? We can increase the ability for them to, or reduce the time to market.”
Here’s theCUBE’s complete video interview with Andy Pernsteiner:
The evolution of enterprise workloads into coding copilots and reasoning agents has placed significant demands on high-performance computing and hardware infrastructure. This is testing the limits of GPU and DRAM memory, resulting in potential bottlenecks and slower speeds. Weka’s announcement of commercial availability for its Augmented Memory Grid on NeuralMesh during SC25 highlights a push to streamline long-context reasoning and agentic AI workflows.
“What we’ve done with the augmented memory grid is we’ve taken the durable advantages of Weka’s product called NeuralMesh, and we’ve plugged that into inference systems in a supported way,” said Callan Fox, principal product manager, AI inference and data management, at Weka, during an appearance on theCUBE. “What that allows us to do is take the memory tier of DRAM that exists today, as a common one and augment that, extend it into our system. It allows much larger capacities in the memory tier, but at the same speed as DRAM.”
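The idea of extending the DRAM tier can be sketched with a toy two-tier cache: when the fast tier fills up, evicted entries spill to a larger, slower tier and are promoted back on access rather than discarded and recomputed. This is a generic illustration of tiered caching, not Weka's implementation.

```python
from collections import OrderedDict

class TieredKVCache:
    """Two-tier cache: a small fast tier (standing in for DRAM) backed
    by a larger slow tier (standing in for an external memory grid).
    Evictions spill down instead of being dropped, so long-context
    state can be re-fetched rather than recomputed from scratch."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # LRU order: oldest entries first
        self.slow = {}              # larger, slower backing tier
        self.fast_capacity = fast_capacity

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)              # mark most recently used
        if len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value      # spill, don't drop

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:                    # slower path, no recompute
            value = self.slow.pop(key)
            self.put(key, value)                # promote back to fast tier
            return value
        return None
```

For long-context inference, the expensive artifact being cached is the model's key-value state; re-fetching it from a lower tier is far cheaper than regenerating it on the GPU.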
Here’s theCUBE’s complete video interview with Callan Fox, who was joined by Betsy Chernoff, principal AI product marketing manager at Weka:
AI demand is also placing a premium on storage efficiency. Hardware vendors such as Solidigm Inc. are retooling their offerings to meet organizations’ need for predictable responsiveness as GPU cores multiply and networking systems increase bandwidth.
“I think last year when I spoke to you guys, GPU and [High Bandwidth Memory] were kind of the favorite child,” explained Avi Shetty, senior director of AI enablement and partnerships at Solidigm, a trademark of SK Hynix NAND Products Solutions Corp., during a discussion with theCUBE. “Over the last year, we’ve seen storage kind of put itself in its place where you’ve seen usages and certain solutions which have exposed the need for having high performing, reliant, scalable, high-dense storage solutions.”
Here’s theCUBE’s complete video interview with Avi Shetty, who was joined by Isaiah Weiner, head of product management, core software, at WekaIO:
For more from theCUBE’s coverage of SC25, check out these segments:
To watch more of theCUBE’s coverage of SC25, here’s our complete video playlist:
(* Disclosure: TheCUBE is a paid media partner for the SC25 event. Sponsors of theCUBE’s event coverage do not have editorial control over content on theCUBE or SiliconANGLE.)
Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.
Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.