UPDATED 09:00 EST / NOVEMBER 13 2024

INFRA

HPE debuts powerful new supercomputer platforms for AI and high-performance computing workloads

Hewlett Packard Enterprise Co. said today it’s updating its portfolio of high-performance computing platforms.

The expanded lineup includes two new HPE Cray Supercomputing EX systems, plus a pair of new HPE ProLiant servers that have been optimized for artificial intelligence workloads, including large language model training and fine-tuning.

HPE says the systems are designed for research institutions tasked with solving some of the world’s toughest problems. They’re aimed at more traditional HPC workloads, such as sequencing DNA and automating stock trading, rather than being focused exclusively on AI workloads.

Leveraging expertise from Cray, the supercomputer manufacturing giant HPE acquired in 2019, they’re also the first machines in their class to be built using a 100% fanless, direct liquid cooling system architecture that spans every layer of the machine, including the compute nodes, networking and storage.

They include the HPE Cray Supercomputing EX154n Accelerator Blade, which will launch toward the end of next year and has been built to drastically reduce the time it takes to complete supercomputing jobs. It’s also designed to handle AI workloads, and to do this it accommodates up to 224 of Nvidia Corp.’s new Blackwell graphics processing units in a single cabinet. Each accelerator blade comes with an Nvidia Grace Blackwell NVL4 Superchip, holding a total of four NVLink-connected Blackwell GPUs, paired with two Nvidia Grace central processing units over NVLink-C2C.
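The cabinet and blade figures above imply a blade count that the article does not state directly. The following is a minimal sketch of that arithmetic, assuming a fully populated cabinet in which every blade carries exactly one NVL4 Superchip (four GPUs and two Grace CPUs); the function and constant names are illustrative only.

```python
# Figures quoted in the article.
GPUS_PER_CABINET = 224       # maximum Blackwell GPUs in one cabinet
GPUS_PER_BLADE = 4           # one NVL4 Superchip per blade, four GPUs each
GRACE_CPUS_PER_BLADE = 2     # Grace CPUs paired with those GPUs


def blades_per_cabinet(gpus_per_cabinet: int, gpus_per_blade: int) -> int:
    """Return how many blades are needed to reach the cabinet GPU count.

    Assumption: the cabinet is filled only with identical blades.
    """
    blades, remainder = divmod(gpus_per_cabinet, gpus_per_blade)
    if remainder:
        raise ValueError("GPU count is not a whole number of blades")
    return blades


blades = blades_per_cabinet(GPUS_PER_CABINET, GPUS_PER_BLADE)
grace_cpus = blades * GRACE_CPUS_PER_BLADE

print(f"Implied blades per cabinet: {blades}")
print(f"Implied Grace CPUs per cabinet: {grace_cpus}")
```

Under those assumptions, the 224-GPU figure corresponds to 56 accelerator blades and 112 Grace CPUs per cabinet.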

Coming sooner is the new HPE Cray Supercomputing EX4252 Gen 2 Compute Blade, which will be launched next spring. It’s more of a traditional supercomputing platform in the sense that it’s optimized to power a broader range of computing applications.

It lacks GPU hardware, which may make it less useful for AI, but in terms of traditional workloads, it’s a beast, packing up to 98,304 CPU cores in a single cabinet, making it the most powerful one-rack-unit system of its kind. With eight 5th Gen EPYC CPUs made by Advanced Micro Devices Inc. per blade, it offers an extremely high density of CPUs, enabling customers to achieve higher-performance compute in a much smaller space than before.
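To put the 98,304-core figure in perspective, here is a rough sketch of how it might break down per CPU and per blade. Two inputs are assumptions rather than figures from HPE: that each blade holds eight CPUs, and that each CPU is a 192-core part (the highest core count in AMD's 5th Gen EPYC family). Different CPU models would give a different breakdown.

```python
CORES_PER_CABINET = 98_304   # figure quoted in the article
CPUS_PER_BLADE = 8           # assumption: eight EPYC CPUs per blade
CORES_PER_CPU = 192          # assumption: top-bin 5th Gen EPYC core count


def cabinet_breakdown(total_cores: int, cores_per_cpu: int, cpus_per_blade: int):
    """Return (cpus, blades) implied by a cabinet-level core count."""
    cpus, rem = divmod(total_cores, cores_per_cpu)
    if rem:
        raise ValueError("core count is not a whole number of CPUs")
    blade_count, rem = divmod(cpus, cpus_per_blade)
    if rem:
        raise ValueError("CPU count is not a whole number of blades")
    return cpus, blade_count


cpus, cpu_blades = cabinet_breakdown(CORES_PER_CABINET, CORES_PER_CPU, CPUS_PER_BLADE)
print(f"Implied CPUs per cabinet: {cpus}")
print(f"Implied compute blades per cabinet: {cpu_blades}")
```

With those assumed inputs, the cabinet would hold 512 CPUs across 64 blades, or 1,536 cores per blade.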

To go with the new Cray Supercomputing EX models, HPE also unveiled the next generation of its exascale-capable interconnect portfolio, bundling network interface controllers, cables and switches that support speeds of up to 400 gigabits per second. In addition, there’s a new storage system and services software to look forward to.

The new network infrastructure is called the HPE Slingshot Interconnect 400, and it delivers twice the line speed of the previous-generation interconnect. It also supports advanced features such as automated congestion management and adaptive routing, meaning it can reroute and optimize connectivity on the fly to keep latency as low as possible for any given workload it supports. It’ll be launched for clusters based on the latest HPE Cray systems in the fall of next year.
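As a back-of-the-envelope illustration of what doubling the line speed means, the sketch below converts a link rate in gigabits per second into an ideal transfer time. It assumes a single link running at full line rate with no protocol overhead or congestion, which real workloads will not achieve; the 200 Gb/s figure for the previous generation is inferred from the "twice the line speed" statement, and the 1 TB payload is an arbitrary example.

```python
def ideal_transfer_seconds(payload_bytes: int, link_gbps: int) -> float:
    """Best-case time to move payload_bytes over one link.

    Uses decimal units (1 Gb = 10**9 bits) and ignores all overhead.
    """
    bits = payload_bytes * 8
    return bits / (link_gbps * 10**9)


ONE_TERABYTE = 10**12          # hypothetical payload size, in bytes
SLINGSHOT_400_GBPS = 400       # from the article
PREVIOUS_GEN_GBPS = 200        # inferred: half of the new line speed

new_time = ideal_transfer_seconds(ONE_TERABYTE, SLINGSHOT_400_GBPS)
old_time = ideal_transfer_seconds(ONE_TERABYTE, PREVIOUS_GEN_GBPS)

print(f"400 Gb/s link: {new_time:.0f} s per TB")
print(f"200 Gb/s link: {old_time:.0f} s per TB")
```

In this idealized case, a 400 Gb/s link moves 50 gigabytes per second, so a terabyte takes about 20 seconds instead of 40.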

As for the HPE Cray Supercomputing Storage System E2000, it provides more than double the input/output performance of HPE’s previous supercomputer storage systems. Under the hood, it uses the open-source Lustre file system, which helps to reduce the idle time associated with I/O operations. It all adds up to much faster storage reads and writes than before, and should significantly boost the performance of supercomputing operations when it launches early next year.

Finally, there’s the new HPE Cray Supercomputing User Services Software, which is intended to improve the user experience of HPE’s supercomputing platforms with new features for optimizing system efficiency, managing power consumption and more.

New HPE ProLiant Compute XD servers for AI workloads

While the Cray supercomputers are optimized for a broader range of HPC workloads, the new HPE ProLiant Compute XD servers are built specifically for those all-important AI workloads that almost every enterprise is eager to embrace these days.

Trish Damkroger, senior vice president and general manager of HPC & AI Infrastructure Solutions at HPE, said enterprises and governments are becoming more interested in “sovereign AI initiatives,” as these enable them to retain full control over their AI models and training data. But for sovereign AI, those organizations need access to some extremely powerful hardware, which is exactly what the ProLiant Compute XD servers deliver.

HPE debuted its first batch of ProLiant Compute servers for AI in March, but the XD models are an entirely new category of machines that are optimized to support the deployment of large, high-performance AI clusters. The company has been working very closely with Nvidia on these machines, fine-tuning them to support the most advanced LLMs.

The new models include the HPE ProLiant Compute XD685, the more powerful of the two, which is aimed at customers who prioritize performance over cost. It’s built for AI training and inference, and buyers can choose either eight Nvidia H200 SXM Tensor Core GPUs or the same number of Nvidia Blackwell GPUs in a five-rack-unit chassis, the company said. It’s a liquid-cooled system and it will go on sale early next year, at about the same time as the Blackwell GPUs are launched by Nvidia.

Customers have more options than just Nvidia’s hardware. HPE recently announced a separate edition of the HPE ProLiant Compute XD685 that features eight AMD Instinct MI325X accelerators and two AMD EPYC CPUs instead. That edition is also set to go on sale early next year.

As for the air-cooled HPE ProLiant Compute XD680 server, it’s an alternative aimed at customers that would prefer to optimize for price performance, while still being able to handle the most demanding AI training, tuning and inference jobs. Instead of Nvidia’s GPUs, it’s powered by eight of Intel Corp.’s Gaudi 3 AI accelerators, which are squeezed into a single compact node. It’ll go on sale sooner, with a launch date slated for next month.

Both of the new servers feature HPE’s Integrated Lights-Out technology for remote management, enabling select, authorized personnel to access them from any location, providing increased security compared to traditional in-band network access.

HPE said the new ProLiant XD server class comes with optional services, such as installation, customization, integration and validation, along with full testing within the company’s own manufacturing facilities, for customers that want to expedite on-site deployment.

“Our customers turn to us to fast-track their AI system deployment to realize value faster and more efficiently, leveraging our decades of experience in delivering, deploying and servicing fully-integrated systems,” Damkroger said.

Image: SiliconANGLE/Freepik AI
