UPDATED 21:43 EST / MARCH 18 2024

HPE debuts its Nvidia GPU-powered on-premises supercomputer for generative AI

Hewlett Packard Enterprise Co. has taken the lid off its highly anticipated generative artificial intelligence supercomputer platforms, designed to help companies create, fine-tune and run powerful large language models in their own data centers.

The news came as HPE and its rival Supermicro Inc. both rolled out significant updates to their portfolios for generative AI workloads. The new systems include powerful servers featuring Nvidia Corp.’s most advanced graphics processing units, the new Blackwell GPUs, which were announced at GTC 2024 today.

HPE has been working closely with Nvidia to leverage its expertise in high-performance computing and build a generative AI supercomputer in a box that provides all of the software and services developers need to build advanced models, along with powerful compute capabilities.

The company said its supercomputing platform for generative AI, unveiled in November, is now ready to order, providing an ideal solution for companies that need to run AI initiatives on their own on-premises servers. Billed as a full-stack solution for developing and training large language models, the system is powered by Nvidia’s GH200 Grace Hopper Superchips and features everything needed to get started in generative AI, including a liquid-cooling system, accelerated compute, networking, storage and AI services.

HPE also announced the availability of its enterprise computing solution for generative AI. First announced in November, that offering enables users to quickly customize foundation models using private data and deploy production applications anywhere, from edge to cloud.

HPE said its supercomputing platform is targeted at large enterprises, research institutions and government agencies, and is available to buy directly, while the enterprise solution is also available through the HPE GreenLake pay-per-use model. The supercomputing platform is preconfigured for model training and tuning, while the enterprise solution is preconfigured for fine-tuning and inference workloads. Both provide powerful compute, storage, software and networking capabilities, along with consulting, to help companies get started with generative AI.

Under the hood, the enterprise platform offers a high-performance AI compute cluster powered by a combination of HPE ProLiant DL380a Gen11 servers and Nvidia’s H100 graphics processing units. It also integrates Nvidia’s Spectrum-X Ethernet networking technology and its BlueField-3 data processing units for optimizing AI workloads. HPE has added its own machine learning and analytics software into the mix, while the Nvidia AI Enterprise 5.0 platform, which comes with Nvidia’s newly announced NIM microservices, helps to streamline AI development.

The company said the enterprise solution will support various LLMs, including both proprietary ones and open-source variants. According to HPE, it’s ideal for the lightweight fine-tuning of AI models, retrieval-augmented generation and scale-out inference, and the company claims a 16-node system can fine-tune a 70 billion-parameter Llama 2-based model in just six minutes.

Both offerings are designed to address the AI skills gap, with HPE Services providing the expertise enterprises need to design, deploy and manage the platform on-premises, and also implement their AI projects.

HPE President and Chief Executive Antonio Neri said that many enterprises need a “hybrid by design” solution that can address the entire AI lifecycle. “From training and tuning models on-premises, in a colocation facility or in the public cloud, to inference at the edge, AI is a hybrid cloud workload,” Neri explained.

Justin Hotard, executive vice president and general manager of the HPC and AI Business Group at Hewlett Packard Labs, appeared on theCUBE, SiliconANGLE Media’s mobile livestreaming show, during the company’s HPE Discover 2023 event in Barcelona in November, where he spoke about the capabilities of the new supercomputing platform.

AI software stack

While putting the finishing touches on its generative AI supercomputing platform, HPE was also working with Nvidia on the various software systems needed to take advantage of it. These include the HPE Machine Learning Inference Software, available as a technology preview from today, which will help customers to rapidly and securely deploy AI models on its infrastructure. It integrates with Nvidia’s new NIM microservices, providing access to optimized foundation models hosted in prebuilt software containers.

In addition, HPE said it has developed a reference architecture for RAG, which is a technique that enables LLMs to augment their knowledge with proprietary datasets. It also released its HPE Machine Learning Data Management Software, Machine Learning Development Environment Software and Machine Learning Inference Software to support generative AI development.

Finally, HPE teased upcoming servers that will be based on Nvidia’s newly announced Blackwell GPU architecture, including the Nvidia GB200 Grace Blackwell Superchip and the HGX B200 and HGX B100 GPUs.

Supermicro reveals first Blackwell GPU-based servers

Though HPE will unveil more details of its Blackwell-based servers in the coming weeks, Supermicro appears to have an early lead. The company unveiled a range of new servers at GTC 2024 today, with new systems featuring the GB200 Grace Blackwell Superchip, plus the B200 and B100 Tensor Core GPUs based on Blackwell. In addition, the company said its existing Nvidia HGX H100 and H200 systems are being made “drop-in ready” for the new GPUs, meaning customers can simply acquire the silicon in order to turbocharge their existing data center investments.

According to Supermicro, it will be the first server company to launch Nvidia HGX B200 8-GPU and HGX B100 8-GPU systems later this year. The new systems will feature eight of Nvidia’s new Blackwell GPUs connected by the fifth-generation NVLink interconnect technology, which delivers 1.8 terabytes per second of bandwidth. Supermicro promised that they’ll deliver a three-times boost in LLM training performance compared with systems using Nvidia’s older Hopper architecture.

“Supermicro continues to bring to market an amazing range of accelerated computing platform servers that are tuned for AI training and inference that can address any need in the market today,” said Kaustubh Sanghani, Nvidia’s vice president of GPU product management.

Catering to the demand for on-premises LLM workloads, Supermicro has built a range of new MGX servers that will be powered by the GB200 Grace Blackwell Superchip, which is more powerful than the standalone GPUs. The new Superchip pairs two Blackwell GPUs with an Nvidia Grace central processing unit, and will provide a significant boost for AI inference, with Supermicro claiming a 30-times performance increase compared to the previous-generation Superchip.

For the most advanced LLM workloads, Supermicro detailed an upcoming rack-scale server based on the Nvidia GB200 NVL72, which will connect 36 Nvidia Grace CPUs with 72 Blackwell GPUs in a single rack. Each of the GPUs in this configuration will be linked with the latest Nvidia NVLink technology for GPU-to-GPU communication at 1.8 terabytes per second.

Images: Nvidia and Supermicro
