

Companies are facing an architectural problem as they integrate power-hungry artificial intelligence models.
In the chaos of AI adoption, Penguin Solutions Inc. has emerged as a strong player when it comes to managing high-performance computing for AI. Accelerated computing comprised 10% of all data center spending in 2022 and will account for almost 90% by 2030, according to theCUBE Research.
“The market’s … growing exponentially,” said Pete Manca (pictured), president of Penguin. “The enterprise customers we speak to know they have to get some AI strategy in place, whether it’s for simple things like increased service offerings or more complex things like fraud detection and other use cases we hear out there. But they don’t know how to get there or they’re not set up today in order to get there. Traditional infrastructures are very different than AI infrastructures, and so they have to rethink how they do IT.”
Manca spoke with theCUBE’s Dave Vellante at the “Mastering AI: The New Infrastructure Rules” event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed Penguin’s history with accelerated computing and structuring hybrid AI. (* Disclosure below.)
Penguin’s background in managing HPC and other large-scale clusters has proved a boon in the current era of AI. The company guides customers through every step of upgrading their computing infrastructure.
“You create a software abstraction layer that hides the complexity of the underlying hardware, and you make it simple for the end user to manage the environment while you abstract away the complexities of the underlying hardware,” Manca explained. “That’s something that ClusterWare does for our customers … we try to abstract away all those complexities.”
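ClusterWare’s internals aren’t shown here, but as a rough illustration of the abstraction-layer pattern Manca describes, the following Python sketch hides vendor-specific provisioning behind a single uniform interface. All class and function names are hypothetical, not Penguin’s actual API:

```python
# Hypothetical sketch of a hardware-abstraction layer; not ClusterWare's actual API.
from abc import ABC, abstractmethod

class ComputeNode(ABC):
    """Uniform interface the operator sees, regardless of the underlying hardware."""
    def __init__(self, host: str):
        self.host = host

    @abstractmethod
    def provision(self) -> None:
        """Hide vendor-specific driver and firmware setup behind one call."""

class GpuNode(ComputeNode):
    def provision(self) -> None:
        print(f"installing GPU drivers and fabric config on {self.host}")

class CpuNode(ComputeNode):
    def provision(self) -> None:
        print(f"applying standard OS image to {self.host}")

def bring_up(cluster: list[ComputeNode]) -> None:
    # The end user manages the environment through one uniform loop;
    # the complexity of each node type stays below this line.
    for node in cluster:
        node.provision()

bring_up([GpuNode("gpu-01"), GpuNode("gpu-02"), CpuNode("head-01")])
```

The design choice is the one Manca names: push hardware differences behind a stable interface so the operator’s workflow doesn’t change when the hardware underneath does.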
Many businesses struggle with architecting hybrid AI, so Penguin builds a solution with them from the ground up. It starts at the data center, deciding whether to use liquid cooling, such as direct-to-chip, and goes up through the software layer, where the choice is between a custom solution and an off-the-shelf one.
For companies needing to restructure their IT setup, Manca highlights two options: going to a tier-two service provider or leveraging their own capabilities and building in-house.
“Building in-house is probably the preferred way to go, but it means literally a soup-to-nuts transformation, from data center build-out, power and cooling all the way through the architecture of their system,” he said. “It’s a very complex environment, and they look to partners like Penguin Solutions to help guide them through that as a trusted advisor.”
In tackling the architectural problem many companies face, Penguin has helped customers grow uptime on their GPU clusters from 50% to 90%. Predictive failure analysis, or detecting that a GPU is likely to fail before it does, is one of the keys to keeping a network fast and reliable, according to Manca.
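Manca doesn’t spell out how Penguin implements predictive failure analysis; the toy Python sketch below only illustrates the general idea of watching a telemetry trend, here a made-up correctable-ECC error rate, and flagging a GPU before it fails. The threshold, window size and names are invented for illustration:

```python
# Toy illustration of predictive failure analysis; not Penguin's actual method.
# Flag a GPU for preemptive maintenance when its error trend crosses a threshold.
from collections import deque

WINDOW = 12           # telemetry samples kept per GPU
ECC_RATE_LIMIT = 3.0  # avg correctable-ECC errors per sample; arbitrary cutoff

history: dict[str, deque] = {}

def record(gpu_id: str, ecc_errors: int) -> bool:
    """Record one telemetry sample; return True if the GPU looks likely to fail."""
    samples = history.setdefault(gpu_id, deque(maxlen=WINDOW))
    samples.append(ecc_errors)
    avg = sum(samples) / len(samples)
    return avg > ECC_RATE_LIMIT  # drain jobs and swap the card before it dies

for reading in [0, 1, 2, 5, 7, 9]:  # a worsening error trend
    if record("gpu-03", reading):
        print("gpu-03: schedule preemptive replacement")
```

The payoff is the uptime math above: catching a failing card during a planned window, rather than mid-training-run, is what moves a cluster from 50% toward 90% availability.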
“You’ve got to get the data from the storage through the network into memory to feed these GPUs and keep them busy,” he said. “Right there, you’ve got an architectural problem that you’re trying to solve around how do I get very sophisticated, high-speed parallel file systems to feed these GPUs as much data as possible? In some cases, in real time; it could be a batch or it could be a real-time processing engine. You’ve got to figure that out. Once you do that, then you’ve got to make sure that you keep the GPUs up and running. They’re a little bit finicky. It’s new technology.”
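To make the feeding problem concrete, here is a minimal, hypothetical Python sketch of the overlap Manca describes: a reader thread stands in for a parallel file system streaming batches into a bounded buffer, while the consumer loop stands in for the GPU draining it, so I/O and compute proceed concurrently. The timings, batch names and buffer size are invented:

```python
# Minimal sketch of the feeding problem: overlap storage reads with compute
# so the accelerator never starves. All names and timings are made up.
import queue
import threading
import time

prefetch: queue.Queue = queue.Queue(maxsize=4)  # bounded buffer between I/O and compute

def reader() -> None:
    """Stand-in for a parallel file system client streaming training batches."""
    for batch_id in range(8):
        time.sleep(0.05)               # simulated storage latency
        prefetch.put(f"batch-{batch_id}")
    prefetch.put(None)                 # sentinel: no more data

threading.Thread(target=reader, daemon=True).start()

while (batch := prefetch.get()) is not None:
    time.sleep(0.02)                   # simulated GPU step, shorter than the read,
    print(f"processed {batch}")        # so prefetching keeps the device busy
```

If the simulated GPU step were slower than the read, the buffer would drain and the device would sit idle, which is exactly the bottleneck the high-speed parallel file systems Manca mentions are meant to remove.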
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the “Mastering AI: The New Infrastructure Rules” event:
Watch the complete event episode here:
(* Disclosure: TheCUBE is a paid media partner for the “Mastering AI: The New Infrastructure Rules” event. Neither Penguin Solutions Inc., the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)