The future of networking: How HPC is being reshaped to accommodate AI workloads
Supercomputing 2023, a show geared primarily toward hardware enthusiasts, has kicked off with a bang, particularly on the networking side.
The networking landscape is evolving, and Broadcom Inc., a major supplier of networking silicon, is working with hyperscale customers on the challenges posed by ever-expanding artificial intelligence workloads.
“What has changed the whole environment are all these LLMs. [GPT-3] was 175 [billion] parameters, GPT-4 is a trillion plus parameters,” said Hasan Siraj (pictured), head of software products and ecosystem at Broadcom. “When you’re trying to train them, you can’t do this in traditional compute; you can’t do this on a single GPU. [The network] connects all of these GPUs, because they have to communicate in order for the job to be complete.”
Siraj spoke with theCUBE industry analysts John Furrier and Savannah Peterson at SC23, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed why networking, particularly Ethernet, has become the backbone of AI infrastructure. (* Disclosure below.)
Present-day challenges
The rapid growth in model size, illustrated by the parameter counts of each new generation of LLMs, demands a rethink of traditional computing approaches. Because training is split across many GPUs that must constantly exchange data to work in parallel, those GPUs need a robust, high-bandwidth network connecting them, according to Siraj.
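To make the "not on a single GPU" point concrete, here is a back-of-the-envelope sketch. It assumes 2 bytes per parameter (FP16/BF16 weights) and 80 GB of memory per GPU; both figures are illustrative assumptions rather than numbers from the interview, and the function names are hypothetical.

```python
# Back-of-the-envelope: can one GPU hold a model's weights?
# Assumptions (illustrative, not from the interview): 2 bytes per parameter
# (FP16/BF16 weights only) and 80 GB of memory per GPU.
import math

BYTES_PER_PARAM = 2   # assumed FP16/BF16 storage for each weight
GPU_MEMORY_GB = 80    # assumed memory per GPU

def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory needed just to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

def min_gpus_for_weights(num_params: float, gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Lower bound on GPUs needed to hold the weights alone.

    Ignores gradients, optimizer state and activations, which training also needs.
    """
    return math.ceil(weight_memory_gb(num_params) / gpu_memory_gb)

if __name__ == "__main__":
    gpt3_params = 175e9  # GPT-3's published parameter count
    print(f"{weight_memory_gb(gpt3_params):.0f} GB of weights")
    print(f"at least {min_gpus_for_weights(gpt3_params)} GPUs just to hold them")
```

The result is only a floor: training also has to keep gradients, optimizer state and activations in memory, which pushes the GPU count, and with it the traffic between GPUs, much higher.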
“When you are building these out, the communication network is taking up to 57% of the time in training,” he said. “The GPUs are sitting idling at that point in time doing nothing. It’s not the most valuable asset, but [the network] essentially pays for itself if you can improve the performance by 10%.”
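Siraj's 57% figure lends itself to a simple Amdahl's-law-style estimate. The sketch below assumes communication and computation do not overlap, so the entire communication share shows up as idle GPU time; real training stacks overlap some of it, so treat the results as an upper bound on the network's impact. The 1.25x network speedup is a hypothetical input, and the function names are illustrative.

```python
# Amdahl-style model of a training step in which communication is NOT overlapped
# with computation. Assumption: the 57% figure is treated as the exposed
# (non-overlapped) share of step time; real workloads overlap some communication.

def step_time(comm_fraction: float, network_speedup: float) -> float:
    """Relative step time after making communication `network_speedup` times faster.

    The baseline step time is normalized to 1.0: compute takes (1 - comm_fraction)
    and communication takes comm_fraction.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be in [0, 1]")
    if network_speedup <= 0:
        raise ValueError("network_speedup must be positive")
    return (1.0 - comm_fraction) + comm_fraction / network_speedup

def speedup_needed(comm_fraction: float, target_step_time: float) -> float:
    """Network speedup required to bring the relative step time down to target_step_time."""
    compute = 1.0 - comm_fraction
    if target_step_time <= compute:
        raise ValueError("target is unreachable: compute time alone exceeds it")
    return comm_fraction / (target_step_time - compute)

if __name__ == "__main__":
    f = 0.57  # share of training time spent communicating, per the interview
    print(f"GPUs idle for {f:.0%} of each step")
    # Hypothetical: a network that moves the same data 1.25x faster.
    print(f"relative step time with a 1.25x faster network: {step_time(f, 1.25):.3f}")
    # Network speedup needed to finish each step 10% sooner.
    print(f"network speedup needed for a 10% shorter step: {speedup_needed(f, 0.9):.3f}x")
```

Under this model, shortening each training step by 10% requires communication to become roughly 21% faster (0.57 / 0.47 ≈ 1.21), because only the communication share of the step shrinks.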
Broadcom aims to attack that bottleneck directly. Its Jericho3-AI fabric, announced earlier this year, is pitched as a way to keep GPUs from waiting on the network; Siraj said it delivers 12% better performance than InfiniBand.
“Jericho solves some of these problems in the sense that it does what we call perfect load balancing,” he explained. “Every packet gets sprayed across every link that’s out there. So you are a hundred percent utilizing the network.”
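The usual alternative to spraying is per-flow load balancing, such as ECMP hashing, where every packet of a flow follows the single link its hash selects. The toy simulation below compares that with dealing packets round-robin across all links. It is a hypothetical sketch: the flow names, link count and traffic mix are invented, and it ignores packet reordering, congestion and how Broadcom's hardware actually implements spraying.

```python
# Toy comparison of per-flow hashing (ECMP-style) vs. per-packet spraying.
# Hypothetical traffic; does not model reordering, congestion, or real switch silicon.
import zlib

Flow = tuple[str, int]  # (flow identifier, number of packets)

def ecmp_link_loads(flows: list[Flow], num_links: int) -> list[int]:
    """Per-flow hashing: every packet of a flow follows the one link its hash picks."""
    loads = [0] * num_links
    for flow_id, packets in flows:
        link = zlib.crc32(flow_id.encode()) % num_links
        loads[link] += packets
    return loads

def sprayed_link_loads(flows: list[Flow], num_links: int) -> list[int]:
    """Packet spraying: packets are dealt round-robin across all links, regardless of flow."""
    loads = [0] * num_links
    next_link = 0
    for _flow_id, packets in flows:
        for _ in range(packets):
            loads[next_link] += 1
            next_link = (next_link + 1) % num_links
    return loads

def utilization(loads: list[int]) -> float:
    """Mean link load divided by the busiest link's load (1.0 means perfectly balanced).

    The busiest link finishes last, so it sets the pace for the whole transfer.
    """
    return (sum(loads) / len(loads)) / max(loads)

if __name__ == "__main__":
    NUM_LINKS = 8
    # Hypothetical traffic: 8 large flows of 1,000 packets each, resembling the few,
    # long-lived flows that collective operations in AI training tend to produce.
    flows = [(f"gpu{i}->gpu{i + 8}", 1000) for i in range(8)]
    for name, balance in (("per-flow hashing", ecmp_link_loads),
                          ("packet spraying", sprayed_link_loads)):
        loads = balance(flows, NUM_LINKS)
        print(f"{name:17s} loads={loads} utilization={utilization(loads):.2f}")
```

Spraying splits the 8,000 packets evenly, 1,000 per link. Per-flow hashing is balanced only if no two flows hash to the same link; any collision overloads one link and leaves another idle, and the busiest link then sets the pace for the whole transfer.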
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of SC23:
(* Disclosure: TheCUBE is a paid media partner for SC23. Neither Dell Technologies Inc., the main sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
Photo: SiliconANGLE