Three insights you might have missed from SC24
High-performance computing innovations are redefining the future of enterprise computing, pushing the boundaries of scalability, sustainability and performance.
At the heart of this transformation is the emergence of scalable AI infrastructure, which is democratizing supercomputing and making advanced technologies accessible to enterprises of all sizes, according to John Furrier, executive analyst at theCUBE Research.
“I think this year you’re starting to see real build-out around the infrastructure hardware and where hardware is turning into systems,” Furrier said during the recent SC24 event. “You’re going to start to see the game change, and then the era’s here, the chapter’s closed, the old IT is over and the new systems are coming in.”
Furrier and fellow theCUBE Research analysts Dave Vellante and Savannah Peterson spoke with tech leaders in AI and high-performance computing at SC24, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. Discussions centered on how AI-driven innovation shapes scalable infrastructure, sustainability practices and quantum computing’s future role in data center architectures. (* Disclosure below.)
Here are three key insights you may have missed from theCUBE’s coverage:
1. Collaborative innovations drive sustainable AI scalability for modern workloads.
As enterprises adopt flexible, open systems, collaborations across the tech industry address the challenges of power consumption and cost. Partnerships such as those between Super Micro Computer Inc. and WekaIO Inc. exemplify high-performance computing innovations, pioneering energy-efficient AI data centers. These collaborations ensure sustainability remains a core principle of scalability, according to Nilesh Patel, chief product officer at Weka; Patrick Chiu, senior director of storage product management at Supermicro; and Ian Finder, group product manager of accelerated computing at Nvidia Corp.
“As we continue to see the build-out [of AI data centers], two challenges are happening,” Patel told theCUBE during the event. “One is the power consumption; the power requirement in data centers is growing like crazy. The second thing is now we are getting into [the] inferencing space where it’s becoming a token economy. The cost [per] token in dollars, tokens per wattage use and so on … have become our important key performance indicators. We got together with Nvidia and Supermicro and tried to attack one of the core problems that is becoming the Achilles heel for data center growth, particularly for AI infrastructure.”
AI’s exponential growth has pushed traditional computing frameworks to their limits, making clustered systems essential for scaling modern workloads, according to Hasan Siraj, head of software products, ecosystem, at Broadcom Inc. Networking advances reflect high-performance computing innovations, serving as the glue connecting these clusters, enabling efficient training of large language models while addressing latency and bandwidth challenges.
“If you are training a large model, and these models are growing at an exponential [rate], they don’t fit in a central processing unit or a core of a CPU; virtualization is no play,” Siraj said during the event. “This is why you cannot fit a model within a server or two servers or four servers. That is why you need a cluster. When you have a cluster and everything is spread out, you need glue to put this all together. That is networking.”
Building on scalable clusters, open hardware solutions give enterprises the flexibility to tailor infrastructure to diverse workloads. These systems break free from proprietary lock-in, delivering cost-effective options for scaling AI operations while optimizing resource usage, according to Steen Graham, chief executive officer of Metrum AI Inc., and Manya Rastogi, technical marketing engineer at Dell Technologies Inc.
“I think right now with AI, we’ve really kind of optimized software in a great way,” Graham said during an interview at the event. “We’re building this really systematic software with AI workers that will save people material [and] time and ultimately drive topline revenue and getting enterprises to really high-fidelity solutions.”
Here’s the complete video interview with Patrick Chiu, Nilesh Patel and Ian Finder:
2. High-performance computing innovations drive flexibility and intelligent solutions.
The evolution of artificial intelligence demands modular systems that prioritize efficiency, flexibility and scalability. Broadcom, Dell and Denvr Dataworks Inc. exemplify this approach with AI factories designed for compact, energy-efficient operations. These modular superclusters integrate over 1,000 GPUs in under 900 square feet, leveraging advanced liquid immersion cooling to optimize power usage and space, according to Broadcom’s Hasan Siraj; Vaishali Ghiya, executive officer of global ecosystems and partnerships at Denvr Dataworks; and Arun Narayanan, senior vice president of compute and networking product management at Dell.
“AI workloads are very power-hungry,” Ghiya told theCUBE during the event. “That is exactly why we designed our Denvr Dataworks private zone, in partnership with Broadcom and Dell, so that we can give customers different choices and options as well as open architecture. Liquid immersion cooling, as well as liquid to the chip cooling, really results in the efficient power usage as well as a compact footprint.”
Decentralization further reshapes enterprise AI infrastructure, providing sustainable alternatives that challenge traditional hardware dependency. Organizations can optimize their hardware setups by embracing multi-vendor ecosystems with diverse solutions, such as Advanced Micro Devices Inc. GPUs. These integrations enable high-performance computing innovations for customized AI workloads while fostering innovation, according to Saurabh Kapoor, director of product management and strategy at Dell Technologies, and Jon Stevens, chief executive officer of Hot Aisle Inc.
“The thing that I think that we’re going to focus on is just continuously releasing whatever’s [the] latest and greatest, working with Dell, working with AMD [and] working with Broadcom to continuously make this latest and greatest hardware available to developers, to anyone, and support them with that,” Stevens told theCUBE during the event.
Data intelligence underpins the success of these modular systems, transforming raw data into actionable insights that drive scalability. By ingesting, analyzing and delivering insights across diverse data types, platforms such as those from DataDirect Networks Inc. enhance AI performance and adapt to evolving business needs, according to Alex Bouzari, co-founder and chief executive officer of DDN.
“The industry is completely transforming — it’s all about AI,” Bouzari told theCUBE during the event. “You have to be able to ingest the data, images, audio, text [and] video from lots of different sources. You have to be able to analyze it, process it, gain insight from it and then deliver that insight to organizations who will then benefit from it. And we are at the core of it. We are the data intelligence platform that propels the growth of AI across industries and marketing.”
Here’s the complete video interview with Alex Bouzari:
3. Networking and thermal management are the foundation of scalable AI systems.
The rise of exascale computing redefines the boundaries of high-performance computing, enabling massive data processing with unparalleled efficiency.

However, this progress introduces significant challenges in thermal management, necessitating advanced cooling technologies, according to Armando Acosta (pictured), director of HPC product management at Dell. Direct liquid cooling has emerged as a critical solution for managing the intense heat generated by powerful CPUs and GPUs, maintaining performance at scale while supporting high-performance computing innovations.
“If you look at the rise of exascale, what you’re starting to see now is with the rise of exascale and these large machines and HPC supercomputers, guess what? New challenges arise when you try to go to that scale,” Acosta said during the event. “When you look at exascale, what it’s driving is more direct liquid cooling technologies. If you want the highest performance, you want the best CPU or the highest performing GPU … you have to do direct liquid cooling.”
As artificial intelligence workloads expand, networking infrastructure must evolve to support high throughput and low-latency demands. Unlike traditional data centers, AI architectures require clusters of GPUs functioning cohesively as a single computational unit. This integration unlocks the potential of AI operations while driving high-performance computing innovations that deliver both efficiency and business value, according to Scott Bils, vice president of product management, professional services, at Dell.
“The key to driving outcomes and business value from gen AI is data,” he said during the event. “That’s where the role of AI networking becomes so critical. When you think about AI networking and the role it plays in data, when you think about clusters and AI architectures, they’re fundamentally different than traditional data center networking. When you think about clusters of GPUs, you essentially want the clusters at a rack level, or even a data center level, to function as a single computer … a single brain.”
To sustain long-term AI scalability, organizations must address growing demands on energy and infrastructure through tailored solutions, Bils noted. Automating data pipelines and employing AI-specific data catalogs improve performance and sustainability by streamlining access and ensuring compliance.
“As enterprise deployments begin to scale out, they’re going to face and are facing similar issues,” Bils said. “Helping them think through the overall design architecture, not just for today, but going forward as they scale out the environment, is a big part of the capability we bring — then, the expertise from Nvidia and our other partners in the space as well.”
Here’s the complete interview with Armando Acosta:
To watch more of theCUBE’s coverage of SC24, here’s our complete event video playlist:
(* Disclosure: TheCUBE is a paid media partner for SC24. Neither Dell Technologies Inc., the headline sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
Photo: SiliconANGLE