UPDATED 21:15 EST / OCTOBER 31 2023

AWS offers more flexible access to Nvidia GPUs for short-duration AI workloads

Amazon Web Services Inc. said today it’s launching a new consumption model for enterprises looking to reserve access to cloud-hosted graphics processing units for short-duration artificial intelligence workloads.

Amazon Elastic Compute Cloud (EC2) Capacity Blocks for ML is generally available now and allows customers to reserve access to “hundreds” of Nvidia Corp.’s most advanced H100 Tensor Core GPUs colocated in Amazon EC2 UltraClusters that are geared toward high-performance machine learning workloads.

To access the EC2 Capacity Blocks, customers simply specify their desired cluster size, future start date and duration required, and they’ll be able to ensure they have reliable, predictable and uninterrupted access to GPU resources for critical AI projects.

AWS said the EC2 Capacity Blocks solve a lot of problems for customers. These days, the most powerful AI workloads, such as training large language models, require significant compute capacity, and Nvidia’s GPUs are considered to be among the best hardware money can buy. However, with all of the buzz around generative AI this year, Nvidia’s chips are suddenly in very short supply, with not enough available for all of the companies that need them.

The company said the GPU shortages are especially acute for those customers whose capacity needs fluctuate. Because they don’t require GPUs on an ongoing basis, they can struggle to access such resources when they do need them. To overcome this, many customers commit to purchasing GPU capacity for longer durations, only to leave it sitting idle when they’re not using it. EC2 Capacity Blocks helps such customers by giving them a more flexible and predictable way to procure GPU capacity for shorter periods.

AWS Principal Developer Advocate Channy Yun likened EC2 Capacity Block reservations to the process of booking a hotel room. “With a hotel reservation, you specify the date and duration you want your room for and the size of beds you’d like – a queen bed or king bed, for example,” he explained in a blog post. “Likewise, with EC2 Capacity Block reservations, you select the date and duration you require GPU instances and the size of the reservation (the number of instances). On your reservation start date, you’ll be able to access your reserved EC2 Capacity Block and launch your P5 instances.”

AWS explained that the EC2 Capacity Blocks are deployed in EC2 UltraClusters and interconnected with Elastic Fabric Adapter petabit-scale networking to ensure low-latency, high-throughput connectivity. Because of this, it’s possible to scale to hundreds of GPUs, it said. Customers can reserve clusters of GPUs ranging from one to 64 instances, for between one and 14 days, up to eight weeks in advance. That makes them ideal for AI model training and fine-tuning, short experiment runs and handling an expected surge in demand, for instance when a new product is launched, the company said.
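To make those reservation limits concrete, here is a minimal Python sketch that checks a request against the figures AWS gave: one to 64 instances, one to 14 days, and a start date no more than eight weeks out. The `BlockRequest` class and `problems` function are hypothetical names invented for illustration, not part of any AWS tool, and the figure of eight H100 GPUs per P5 instance is an assumption that does not appear in the announcement as reported here.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Limits as reported in the article.
MIN_INSTANCES, MAX_INSTANCES = 1, 64
MIN_DAYS, MAX_DAYS = 1, 14
MAX_LEAD = timedelta(weeks=8)

# Assumption (not stated in the article): each P5 instance has 8 H100 GPUs.
GPUS_PER_P5_INSTANCE = 8


@dataclass(frozen=True)
class BlockRequest:
    """Hypothetical description of a Capacity Block reservation request."""
    instances: int
    start: date
    days: int

    @property
    def end(self) -> date:
        # Date on which the reserved window closes.
        return self.start + timedelta(days=self.days)

    @property
    def gpus(self) -> int:
        # Total GPUs in the block under the 8-per-instance assumption.
        return self.instances * GPUS_PER_P5_INSTANCE


def problems(req: BlockRequest, today: date) -> list[str]:
    """Return the reported limits that the request violates, if any."""
    issues = []
    if not MIN_INSTANCES <= req.instances <= MAX_INSTANCES:
        issues.append(f"instances must be {MIN_INSTANCES}-{MAX_INSTANCES}")
    if not MIN_DAYS <= req.days <= MAX_DAYS:
        issues.append(f"duration must be {MIN_DAYS}-{MAX_DAYS} days")
    if req.start < today:
        issues.append("start date is in the past")
    elif req.start - today > MAX_LEAD:
        issues.append("start date is more than eight weeks away")
    return issues


if __name__ == "__main__":
    req = BlockRequest(instances=64, start=date(2023, 11, 15), days=14)
    print(req.gpus, "GPUs until", req.end, problems(req, date(2023, 10, 31)))
```

Under that assumption, the largest reservation of 64 instances works out to 512 GPUs, which is consistent with the “hundreds” of GPUs AWS described.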

Holger Mueller, an analyst with Constellation Research Inc., said AWS has come up with a creative solution to maximize the efficiency of its available GPU resources, which are now in peak demand and cost a premium to access. He said EC2 Capacity Blocks borrows from a mainframe-era approach that was first utilized back in the 1970s, when mainframes were operated as timeshare computers, supporting hundreds of users simultaneously for various workloads.

“It’s an old approach to maximizing the use of scarce compute resources and it aims to solve a key problem for enterprises with AI workloads, which need a reliable way to ensure they have GPU capacity when they need it,” Mueller said. “With AWS, enterprises no longer have to worry, the only downside is they might have to wait for that access. Realistically though, with AI workload demand so high, it will be some time until cloud providers can offer infinite compute capacity to their customers.”

“With Amazon EC2 Capacity Blocks, we are adding a new way for enterprises and startups to predictably acquire Nvidia GPU capacity to build, train and deploy their generative AI applications,” said AWS Vice President of Compute and Networking David Brown.

AWS customers can use the AWS Management Console, Command Line Interface or Software Development Kit to find and reserve GPU capacity via EC2 Capacity Blocks, starting now in the AWS US East (Ohio) region, with more regions and local zones to be added later. Pricing information is available on the AWS website.
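For readers who want a sense of what the SDK route might look like, the following Python sketch builds a search for available blocks and, optionally, sends it with boto3, the AWS SDK for Python. It is an illustration under stated assumptions rather than AWS's documented workflow: `offering_query` and `find_offerings` are hypothetical helper names, the `p5.48xlarge` instance type, the `us-east-2` region code for US East (Ohio) and the seven-day search window are not given in the article, and the network call requires AWS credentials and is not exercised here.

```python
from datetime import datetime, timedelta, timezone


def offering_query(instances, earliest_start, days, search_window_days=7):
    """Build search parameters for Capacity Block offerings (hypothetical helper).

    The keys follow the boto3 EC2 `describe_capacity_block_offerings` request.
    """
    return {
        "InstanceType": "p5.48xlarge",  # assumed P5 instance size
        "InstanceCount": instances,
        "StartDateRange": earliest_start,
        # Latest acceptable end: start may slip by the search window.
        "EndDateRange": earliest_start + timedelta(days=search_window_days + days),
        "CapacityDurationHours": days * 24,
    }


def find_offerings(params, region="us-east-2"):
    """Query AWS for matching offerings. Needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_capacity_block_offerings(**params)
    return response["CapacityBlockOfferings"]


if __name__ == "__main__":
    start = datetime(2023, 11, 15, tzinfo=timezone.utc)
    print(offering_query(instances=4, earliest_start=start, days=2))
```

An offering returned by the search would then be reserved in a separate purchase step, which is omitted from this sketch.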

Image: AWS

