UPDATED 11:05 EST / OCTOBER 31 2024

Dell and Nvidia partner on Ethernet AI architecture and discuss collaboration in Making AI Real with Data.

PowerScale and SuperPOD: Dell and Nvidia certification spotlights growing role for storage and Ethernet in AI deployment

Artificial intelligence is driving a new era of innovation, and the role of Ethernet AI architecture in supporting scalable, high-performance systems is becoming increasingly vital.

Dell Technologies Inc.’s recent certification of its PowerScale portfolio for Nvidia’s DGX SuperPOD is more than just an industry move — it highlights key advancements in AI infrastructure that are transforming the way organizations deploy enterprise AI. This collaboration underscores the growing importance of strong storage solutions and efficient networking fabrics, such as Ethernet, to meet the evolving demands of AI workloads and applications across industries.

“There’s no one who can do it all by themselves — it’s going to be about how organizations, how these vendors can work together to provide an end-to-end solution,” said Bob Laliberte, principal analyst at theCUBE Research, in a recent analysis. “[Dell] is talking about extending those AI capabilities across the entire enterprise, through workstations and laptops, out to the edge and at retail.”

This feature is part of SiliconANGLE Media’s exploration of Dell’s market impact in enterprise AI. Be sure to watch theCUBE’s analyst-led on-demand coverage of “Making AI Real With Data,” a joint event with Dell and Nvidia, along with theCUBE’s discussion of SuperPOD with Dell executives. (* Disclosure below.)

PowerScale drives AI workloads

Why is storage becoming an important element in the implementation of AI?

In the development cycle for generative AI, data must be staged and prepared for the graphics processing units, or GPUs, so that it can be consumed at the processor level for model training and fine-tuning. In larger infrastructures, this process runs concurrently with connections to hundreds or even thousands of GPUs, and the storage system must be able to keep pace with this level of concurrency while handling GPU service requests as data is needed.
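The staging pattern described above can be sketched in miniature: overlap many concurrent storage reads with compute so the accelerators are never left waiting on I/O. This is a simplified illustration, not Dell's or Nvidia's implementation — `load_batch` and `train_step` are hypothetical stand-ins for a storage read and a GPU training step.

```python
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):
    # Stand-in for reading a training shard from shared storage.
    return list(range(i * 4, i * 4 + 4))

def train_step(batch):
    # Stand-in for GPU compute on the staged batch.
    return sum(batch)

def run_pipeline(num_batches, workers=8):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Issue reads concurrently, mirroring how a scale-out storage
        # system must serve many GPU clients at once; compute consumes
        # batches as they arrive.
        for batch in pool.map(load_batch, range(num_batches)):
            results.append(train_step(batch))
    return results

print(run_pipeline(3))  # → [6, 22, 38]
```

At real scale the "workers" are hundreds of storage nodes and the "batches" are terabyte-class shards, but the bottleneck logic is the same: if `load_batch` cannot keep up, `train_step` — the expensive GPU — sits idle.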

Dell enhanced its PowerScale portfolio to meet this demand for AI workloads. Its collaboration with Nvidia facilitates connectivity of network file system, or NFS, protocol transfers over remote direct memory access, or RDMA, to Nvidia’s high-powered DGX platforms.
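On Linux clients, an NFS-over-RDMA mount generally takes the following shape — a sketch only, with the server name and export path as placeholders (20049 is the conventional NFS/RDMA port; consult Dell's PowerScale documentation for the options it actually recommends):

```shell
# Mount an NFS export over the RDMA transport (illustrative; names are placeholders).
# proto=rdma selects RDMA instead of TCP; 20049 is the standard NFS/RDMA port.
sudo mount -t nfs -o vers=3,proto=rdma,port=20049 \
    powerscale.example.com:/ifs/data /mnt/ai-data
```

The point of the RDMA transport is that data moves between the storage node and the client's memory with minimal CPU involvement, which is what keeps the connectivity "highly efficient" at GPU-feeding rates.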

“PowerScale designed the architecture to be able to handle these types of workloads,” said Darren Miller, director of vertical industry solutions, unstructured data storage, at Dell, in an interview with theCUBE. “This, along with capabilities like NFS over RDMA, allows for highly efficient, low congestion connectivity from your PowerScale node to those DGX nodes or DGX servers and GPUs.”

Dell also designed a new multipath driver for PowerScale that allows I/O from all the cluster nodes through a single mount point, a directory that allows users to access data from different physical storage drives. The enhancement was geared toward improving performance as users attempted to feed GPUs and scale up AI workloads.

“That’s important for the SuperPOD architecture because as a distributed compute architecture with multiple GPUs per node, each DGX server can draw and write to the storage system from a single mount point,” Miller explained. “So, we can scale the PowerScale cluster, we can provide that aggregate performance for reads and writes to the DGX systems.”

Ethernet AI architecture for networking fabric

This level of high-performance connectivity between Dell’s PowerScale storage and Nvidia’s DGX SuperPODs required a robust networking standard. Both firms agreed that Ethernet was the way to go.

It was not a trivial decision, given Nvidia's demonstrated market affinity for its InfiniBand networking technology in the past. However, in November, Nvidia signaled a shift toward Ethernet AI architecture with the announcement that Dell, Hewlett Packard Enterprise Co. and Lenovo Group Ltd. would be the first to integrate the chipmaker's Spectrum-X Ethernet networking technologies for AI into their server portfolios.

In May, when Dell unveiled a new rack server, the PowerEdge XE9680L, it included support for eight Nvidia Blackwell GPUs and full 400G Ethernet. The release was part of Dell's AI Factory, a co-offering that brings together Dell's AI infrastructure, services and software with Nvidia's advanced AI capabilities and software suite, all supported by high-speed Nvidia networking fabric. That networking fabric is a key element of Dell's SuperPOD partnership with Nvidia: Ethernet works because of how DGX SuperPOD's architecture meshes with PowerScale's storage platform.

“SuperPOD is deployed in incremental deployments, starting with what Nvidia calls scalable units, which make up 32 DGX servers in a single scalable unit,” explained Dell’s Miller, during his recent conversation with theCUBE. “The DGX SuperPOD design with PowerScale was designed so that we would offer and bring to our customers the first Ethernet based storage fabric for DGX SuperPOD. We believe that it’s going to have tremendous impact in the businesses, in the industry and offer our customers a solution … they can integrate into their data centers almost immediately with their existing infrastructures or network upgrades that they’re planning for high-performance Ethernet.”

Use cases for advanced AI

The PowerScale/SuperPOD certification opens a range of prospective use cases as organizations look for ways to unlock the impact of their AI initiatives. Dell has published a set of examples that demonstrate how PowerScale has been deployed to provide real-time analytics and insights for produce growers and to help manufacturers scale out network-attached storage for high-performance computing, safety and security.

Nvidia’s SuperPOD offering has already gained traction for conducting research at the University of Florida and training call center personnel at South Korea’s leading mobile operator. The combined PowerScale/SuperPOD functionality could be especially helpful in the healthcare industry, according to Rob Strechay, managing director and principal analyst at theCUBE Research.

“The combination of Nvidia DGX SuperPOD and Dell PowerScale is ideal for a range of advanced AI applications, particularly those that involve fine-tuning and training large language models (LLMs), vision models and healthcare-related AI workloads,” said Strechay, in his recent analysis of the certification. “The high-performance and secure multi-tenancy features make this integration particularly attractive for service providers offering GPU-as-a-service, where the flexibility to handle diverse AI workloads is paramount.”

What could be coming next as both Dell and Nvidia seek to maximize the benefits of their current Ethernet AI collaboration? A hint of what the future holds might emerge from an announcement Dell made in May during its major annual conference.

At Dell Tech World 2024, Dell unveiled Project Lightning, a new parallel file system for the PowerScale F910 all-flash offering that can achieve 97% network saturation while serving thousands of GPUs. Parallel file systems can store significant amounts of data across servers while providing rapid access, a benefit for those seeking to maximize use of GPUs for AI workloads. In addition, Dell could be positioning itself to help enterprise IT shops seeking to run high-performance computing on-premises.

If this is the case, it will follow a trend identified by SiliconANGLE in the progression of hybrid AI. A poll conducted on X by theCUBE Research’s Dave Vellante compared the rate of hybrid AI adoption to that of hybrid cloud. Cloud-based AI solutions were still preferred, but the signs are there for a move toward on-prem, a shift that Project Lightning could accelerate.

“Hybrid AI is going to be like hybrid cloud, but it’s going to be different in that the on-prem vendors took over a decade to really get their act together to create the cloud operating model,” Vellante said. “[Hybrid AI is] not going to take that long.”

(* Disclosure: TheCUBE is a paid media partner for the Dell Making AI Real With Data event. Neither Dell Technologies Inc., the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

