UPDATED 14:10 EDT / MAY 10 2023

Google boosts AI training with A3 virtual machines powered by Nvidia’s H100 GPUs

Google Cloud is expanding its portfolio of virtual machines for training and running artificial intelligence and machine learning models with the launch of its A3 supercomputers.

Announced today at Google I/O, the Google Compute Engine A3 supercomputers are purpose-built VMs designed to train and serve the most advanced AI models, including those driving progress in generative AI, the company said.

State-of-the-art AI and machine learning require massive amounts of computational power, delivered by infrastructure that's tailor-made for the purpose, Google Director of Product Management Roy Kim and Group Product Manager Chris Kleban explained in a co-authored blog post. With its A3 supercomputers, Google Cloud is pairing Nvidia Corp.'s new H100 graphics processing units with its own networking advancements so that customers can access the most powerful GPUs for AI workloads, Kim and Kleban said.

Each A3 supercomputer VM is powered by eight H100 GPUs built on Nvidia's Hopper architecture, delivering three times the compute performance of the predecessor chip, the A100. The VM also offers 3.6 terabytes per second of bisectional bandwidth across those eight GPUs via NVSwitch and NVLink 4.0, plus integration with Intel Corp.'s 4th Gen Xeon Scalable processors to offload administrative tasks.

The A3 supercomputer is the first GPU instance to use Google's custom-designed Intel Infrastructure Processing Units, which accelerate GPU-to-GPU data transfers by letting them bypass the CPU host. According to Google, this delivers up to 10 times the network bandwidth of its previous-generation A2 VMs.

The instances also use Google's intelligent Jupiter data center networking fabric, which lets A3 deployments scale to 26,000 interconnected GPUs and deliver up to 26 exaFlops of AI performance. As a result, Google said, the A3 VMs will considerably reduce the time and cost of training large machine learning models. Moreover, when companies move from training to serving their models, the A3 VMs can deliver up to 30 times the inference performance of the A2 VMs.
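
To put the headline figures in perspective, the short Python sketch below does the back-of-envelope arithmetic. It assumes the 26 exaFlops is spread evenly across all 26,000 GPUs, which Google did not state, and Google also did not specify the numeric precision behind the figure, so the per-GPU number is only a rough average, not a published spec.

```python
# Back-of-envelope arithmetic on Google's published A3 figures.
# Assumption (not stated by Google): the 26 exaFLOPs total is spread
# evenly across every GPU in a maximal 26,000-GPU deployment.
EXA = 10**18
PETA = 10**15

total_flops = 26 * EXA   # "up to 26 exaFlops of AI performance"
num_gpus = 26_000        # "scale across 26,000 interconnected GPUs"
gpus_per_vm = 8          # eight H100 GPUs per A3 VM

# Average throughput per GPU implied by the aggregate figure.
per_gpu_flops = total_flops / num_gpus

# Number of eight-GPU A3 VMs needed to reach 26,000 GPUs.
num_vms = num_gpus // gpus_per_vm

print(f"{per_gpu_flops / PETA:.1f} petaFLOPs per GPU")
print(f"{num_vms} A3 VMs")
```

Under that even-split assumption, the aggregate works out to roughly 1 petaFLOP per H100, across about 3,250 eight-GPU A3 VMs.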

Google Cloud also offers flexible deployment options for the A3 VMs. For instance, customers can deploy them on Google Cloud's Vertex AI platform, which lets them build machine learning models on fully managed infrastructure that's purpose-built for high-performance training. Vertex AI was recently updated with new generative AI capabilities, expanding its support for large language model development.

Alternatively, customers that want to build their own customized software stack can deploy the A3 supercomputers on Google Compute Engine or Google Kubernetes Engine. That will allow teams to train and serve advanced foundation models while benefiting from autoscaling, workload orchestration and automatic updates, Google said.

Image: Google Cloud