UPDATED 21:15 EST / OCTOBER 31 2023


AWS offers more flexible access to Nvidia GPUs for short-duration AI workloads

Amazon Web Services Inc. said today it’s launching a new consumption model for enterprises looking to reserve access to cloud-hosted graphics processing units for short-duration artificial intelligence workloads.

Amazon Elastic Compute Cloud (EC2) Capacity Blocks for ML, generally available now, allows customers to reserve access to “hundreds” of Nvidia Corp.’s most advanced H100 Tensor Core GPUs, colocated in Amazon EC2 UltraClusters that are geared toward high-performance machine learning workloads.

To access the EC2 Capacity Blocks, customers simply specify their desired cluster size, future start date and duration required, and they’ll be able to ensure they have reliable, predictable and uninterrupted access to GPU resources for critical AI projects.

AWS said the EC2 Capacity Blocks solve a lot of problems for customers. These days, the most powerful AI workloads, such as training large language models, require significant compute capacity, and Nvidia’s GPUs are considered to be among the best hardware money can buy. However, with all of the buzz around generative AI this year, Nvidia’s chips are suddenly in very short supply, with too few available for all of the companies that need them.

The company said the GPU shortages are especially acute for those customers whose capacity needs fluctuate. Because they don’t require GPUs on an ongoing basis, they can struggle to access such resources when they do need them. To overcome this, many customers commit to purchasing GPU capacity for longer durations, only to leave it sitting idle when they’re not using it. EC2 Capacity Blocks helps such customers by giving them a more flexible and predictable way to procure GPU capacity for shorter periods.

AWS Principal Developer Advocate Channy Yun likened EC2 Capacity Block reservations to the process of booking a hotel room. “With a hotel reservation, you specify the date and duration you want your room for and the size of beds you’d like, a queen bed or king bed, for example,” he explained in a blog post. “Likewise, with EC2 Capacity Block reservations, you select the date and duration you require GPU instances and the size of the reservation (the number of instances). On your reservation start date, you’ll be able to access your reserved EC2 Capacity Block and launch your P5 instances.”

AWS explained that the EC2 Capacity Blocks are deployed in EC2 UltraClusters and interconnected with an Elastic Fabric Adapter petabit-scale network to ensure low-latency, high-throughput connectivity. Because of this, it’s possible to scale to hundreds of GPUs, it said. Customers can reserve clusters of GPUs ranging from one to 64 instances, for between one and 14 days, up to eight weeks in advance. That makes them ideal for AI model training and fine-tuning, short experiment runs and handling an expected surge in demand, for instance when a new product is launched, the company said.

Holger Mueller, an analyst with Constellation Research Inc., said AWS has come up with a creative solution to maximize the efficiency of its available GPU resources, which are now in peak demand and command a premium to access. He said EC2 Capacity Blocks borrows from a mainframe-era approach first used in the 1970s, when mainframes were operated as timeshare computers, supporting hundreds of users simultaneously across various workloads.

“It’s an old approach to maximizing the use of scarce compute resources and it aims to solve a key problem for enterprises with AI workloads, which need a reliable way to ensure they have GPU capacity when they need it,” Mueller said. “With AWS, enterprises no longer have to worry, the only downside is they might have to wait for that access. Realistically though, with AI workload demand so high, it will be some time until cloud providers can offer infinite compute capacity to their customers.”

“With Amazon EC2 Capacity Blocks, we are adding a new way for enterprises and startups to predictably acquire Nvidia GPU capacity to build, train and deploy their generative AI applications,” said AWS Vice President of Compute and Networking David Brown.

AWS customers can use the AWS Management Console, Command Line Interface or Software Development Kit to find and reserve GPU capacity via EC2 Capacity Blocks, starting now in the AWS US East (Ohio) region, with more regions and local zones to be added later. Pricing information is available on the AWS website.
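As an illustration, the reservation flow described above might look like the following from the AWS CLI. This is a sketch, not official guidance: the flag names, placeholder offering ID and example dates are assumptions based on the AWS CLI’s EC2 commands, so check the current AWS documentation before relying on them.

```shell
# Search for available Capacity Block offerings: 4 p5.48xlarge instances
# for 2 days (durations are expressed in hours), starting sometime within
# a one-week window. Region us-east-2 is US East (Ohio), where the feature
# launched first.
aws ec2 describe-capacity-block-offerings \
    --instance-type p5.48xlarge \
    --instance-count 4 \
    --capacity-duration-hours 48 \
    --start-date-range 2023-11-14T00:00:00Z \
    --end-date-range 2023-11-21T00:00:00Z \
    --region us-east-2

# Purchase one of the returned offerings by its ID (placeholder shown).
# The reserved capacity becomes usable on the reservation start date,
# at which point the P5 instances can be launched into it.
aws ec2 purchase-capacity-block \
    --capacity-block-offering-id cbr-0123456789abcdef0 \
    --instance-platform Linux/UNIX \
    --region us-east-2
```

The two-step shape mirrors Yun’s hotel analogy: the first command is the availability search for a given size and date window, and the second confirms the booking.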

