Building out robust AI infrastructure: Networking, data pipeline automation and sustainability at the fore
A lot is riding on the success of today’s artificial intelligence efforts, placing the underlying infrastructure under immense scrutiny. From AI networking to storage and compute, the enterprise resource draw is higher now than it has ever been.
Given these facts, how can organizations streamline their infrastructure to maintain sustainable, robust long-term AI operations?
“The key to driving outcomes and business value from gen AI is data,” said Scott Bils (pictured), vice president of product management, professional services, at Dell Technologies Inc. “That’s where the role of AI networking becomes so critical. When you think about AI networking and the role it plays in data, when you think about clusters and AI architectures, they’re fundamentally different than traditional data center networking. When you think about clusters of GPUs, you essentially want the clusters at a rack level, or even a data center level, to function as a single computer … a single brain.”
In three separate interviews, Bils spoke with theCUBE Research’s Rob Strechay at SC24, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed Dell providing vital support as organizations streamline their AI data management processes and build scalable, performant infrastructures. (* Disclosure below.)
The unique demands of AI networking
AI networking significantly differs from traditional data center networks. AI workloads demand low latency, high throughput and seamless GPU-to-GPU communication. Unlike conventional setups where data is stored and retrieved in silos, AI systems require integrated architectures that enable data centers to function as cohesive units, according to Bils.
To achieve this, technologies such as InfiniBand and remote direct memory access, or RDMA, are becoming essential for connecting GPUs at scale. However, their complexity poses challenges for many organizations, particularly in terms of in-house expertise and architectural readiness.
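To make the GPU-to-GPU communication pattern concrete, here is a minimal sketch, assuming a host with multiple NVIDIA GPUs, PyTorch with CUDA support and a launch via torchrun; the filename and tensor size are illustrative. It uses PyTorch’s NCCL backend, which runs over RDMA transports such as InfiniBand or RoCE when they are available; this is a generic illustration of the collective pattern, not Dell’s or Nvidia’s specific deployment tooling.

```python
# Minimal sketch: collective GPU-to-GPU communication with PyTorch's NCCL backend.
# Assumes multiple NVIDIA GPUs and a launch such as:
#   torchrun --nproc_per_node=<num_gpus> allreduce_sketch.py   (hypothetical filename)
# NCCL uses RDMA transports such as InfiniBand or RoCE automatically when present.
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, WORLD_SIZE and MASTER_ADDR/MASTER_PORT in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each GPU holds a gradient-sized tensor; all-reduce sums it across every GPU,
    # the collective pattern that lets a rack of GPUs behave like a single computer.
    grad = torch.ones(1024 * 1024, device=f"cuda:{local_rank}")
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    if dist.get_rank() == 0:
        print(f"All-reduce complete across {dist.get_world_size()} GPUs")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The all-reduce above is the building block that makes network latency and throughput so visible: every training step waits on the slowest link, which is why Bils frames the fabric as part of the computer rather than plumbing around it.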
“As enterprise deployments begin to scale out, they’re going to face and are facing similar issues,” Bils said. “Helping them think through the overall design architecture, not just for today, but going forward as they scale out the environment, is a big part of the capability we bring — then, the expertise from Nvidia and our other partners in the space as well.”
Organizations face hurdles such as network bottlenecks and underutilized GPUs, which directly affect AI performance. Dell addresses these challenges by combining its expertise, partnerships with industry leaders such as Nvidia Corp. and tailored solutions. The company’s approach involves deploying hardware and integrating networking and computing resources to ensure optimal performance, according to Bils.
“It’s helping them then integrate the data into the appropriate use cases and then automate and orchestrate that to ensure that you have the right velocity, including the right access to the right data sets to support the use cases,” he added. “It’s also that life cycle view from identifying the data sources, classifying, curating, cleansing and then automation, ingestion and scaling. It’s what organizations are going to have to do comprehensively to enable the AI opportunity.”
Data pipeline automation and sustainability key to long-term success
AI-driven applications require vast amounts of data processed with speed and efficiency. Dell’s approach to addressing these needs involves automating and orchestrating data pipelines tailored to specific use cases. This means understanding the performance metrics for each AI application — whether for large language models or other AI systems — and designing pipelines that meet those specific demands. With the right automation tools, businesses can scale and ensure the responsiveness of their AI models, according to Bils.
“You have to ensure that the data throughput, the way you’ve automated and orchestrated that model, is going to drive the scale, performance and responsiveness you need to match the outcome and deliver the value,” he said.
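As a rough illustration of the lifecycle Bils describes, identifying sources, classifying, cleansing and then automating ingestion at the required throughput, here is a minimal sketch in Python. The stage names, record shape and throughput threshold are hypothetical illustrations, not Dell tooling or any specific product API.

```python
# Minimal sketch of an automated data-pipeline lifecycle: classify, cleanse, ingest,
# with a simple throughput check against the velocity the use case requires.
import time
from typing import Callable, Iterable

Record = dict
Stage = Callable[[Iterable[Record]], Iterable[Record]]

def classify(records: Iterable[Record]) -> Iterable[Record]:
    # Tag each record with a coarse sensitivity class for downstream governance.
    for r in records:
        r["class"] = "pii" if "email" in r else "general"
        yield r

def cleanse(records: Iterable[Record]) -> Iterable[Record]:
    # Drop records missing the fields the target use case needs.
    for r in records:
        if r.get("text"):
            yield r

def run_pipeline(source: Iterable[Record], stages: list[Stage], min_rps: float = 1000.0) -> int:
    start, count = time.perf_counter(), 0
    data = source
    for stage in stages:
        data = stage(data)
    for _ in data:      # Ingest step: here we just count; a real pipeline would
        count += 1      # write to a vector store, feature store or object store.
    rps = count / max(time.perf_counter() - start, 1e-9)
    if rps < min_rps:
        print(f"Warning: throughput {rps:.0f} records/s is below target {min_rps:.0f}")
    return count

if __name__ == "__main__":
    sample = ({"text": f"doc {i}", "email": "x@y.z"} for i in range(10_000))
    print(run_pipeline(sample, [classify, cleanse]), "records ingested")
```

The point of the throughput check is the one Bils makes: orchestration is only successful if the pipeline’s measured velocity matches what the downstream model or use case actually needs.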
Another crucial aspect of managing AI data is the implementation of AI-specific data catalogs. These catalogs enhance data discoverability, classification and compliance, making it easier for organizations to access the most relevant data sets for their AI applications. Additionally, catalogs track data lineage, providing traceability of the data and its transformations, which is vital for maintaining data integrity and meeting governance requirements, Bils explained.
“It gets back to the data quality issues, being able to track that lineage and who’s touched that data,” he said. “Then the metadata as well. We think about data catalogs, an incredibly important part is the metadata about the content or the file itself, but also the metadata about the content that sits in the file or the object.”
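To illustrate the two layers of metadata Bils distinguishes, metadata about the file or object itself and metadata about the content inside it, plus a lineage trail of who touched the data, here is a minimal sketch. The field names and example values are illustrative, not the schema of any particular catalog product.

```python
# Minimal sketch of an AI data-catalog entry that tracks lineage and two layers of
# metadata: properties of the file/object and properties of the content within it.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageEvent:
    actor: str          # who touched the data
    action: str         # e.g. "cleansed", "pii-masked", "re-ingested"
    timestamp: datetime

@dataclass
class CatalogEntry:
    dataset_id: str
    object_metadata: dict    # about the file/object: format, size, location
    content_metadata: dict   # about the content: language, PII flags, domain
    lineage: list[LineageEvent] = field(default_factory=list)

    def record(self, actor: str, action: str) -> None:
        self.lineage.append(LineageEvent(actor, action, datetime.now(timezone.utc)))

entry = CatalogEntry(
    dataset_id="support-tickets-2024",
    object_metadata={"format": "parquet", "size_gb": 12.4, "location": "s3://bucket/tickets/"},
    content_metadata={"language": "en", "contains_pii": True, "domain": "customer support"},
)
entry.record(actor="data-eng-pipeline", action="cleansed")
entry.record(actor="governance-bot", action="pii-masked")
print([(e.actor, e.action) for e in entry.lineage])
```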
The integration of AI into data centers has escalated energy demands, with GPUs driving significantly higher power consumption than traditional CPUs. In addition to energy costs, organizations also face geopolitical instability and infrastructure limitations, all while managing increasing regulatory pressure for sustainability. To combat these challenges, companies must prioritize improving power usage effectiveness, according to Bils.
“When you take a look at your typical data center, 40% to 60% of the operating costs are driven by energy costs,” he said. “A lot of the factors that drive prices there are beyond our customers’ control: geopolitical factors, factors around infrastructure brittleness and stability. They have to control what they can control from an energy and sustainability standpoint.”
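Power usage effectiveness, or PUE, is the standard metric here: total facility energy divided by the energy delivered to IT equipment, with 1.0 as the ideal. The short sketch below computes it alongside energy’s share of operating cost; the sample figures are illustrative only, chosen to echo the 40% to 60% range Bils cites.

```python
# Minimal sketch: PUE = total facility energy / energy delivered to IT equipment.
# The sample numbers below are illustrative, not measurements from any data center.
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh

def energy_share_of_opex(energy_cost: float, total_opex: float) -> float:
    return energy_cost / total_opex

if __name__ == "__main__":
    # A facility drawing 1.5 MWh for every 1.0 MWh reaching the IT load has a PUE of 1.5;
    # cutting cooling and power-distribution overhead pushes PUE toward the ideal of 1.0.
    print(f"PUE: {pue(total_facility_kwh=1500, it_equipment_kwh=1000):.2f}")
    print(f"Energy share of opex: {energy_share_of_opex(5.0, 10.0):.0%}")
```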
Here are the complete video interviews, part of SiliconANGLE’s and theCUBE Research’s coverage of SC24:
(* Disclosure: Dell Technologies Inc. sponsored this segment of theCUBE. Neither Dell nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
Photo: SiliconANGLE