UPDATED 13:09 EDT / AUGUST 19 2020

Microsoft targets its fastest Azure AI instance to date at large neural networks

Microsoft Corp. today previewed a new Azure instance for training artificial intelligence models that targets the emerging class of advanced, ultra-large neural networks being pioneered by the likes of OpenAI.

The instance, called the ND A100 v4, is being touted by Microsoft as its most powerful AI-optimized virtual machine to date.

The ND A100 v4 aims to address an important new trend in AI development. Engineers typically develop a separate machine learning model for each use case they want to automate, but a shift has recently started toward building one big, multipurpose model and customizing it for multiple use cases. One notable example of such an AI is the OpenAI research group’s GPT-3 model, whose 175 billion parameters allow it to perform tasks as varied as searching the web and writing code.

Microsoft is one of OpenAI’s top corporate backers. The company has also adopted the multipurpose AI approach internally, disclosing in the instance announcement today that such large AI models are used to power features across Bing and Outlook.

The ND A100 v4 is aimed at helping other companies train their own supersized neural networks by providing eight of Nvidia Corp.’s latest A100 graphics processing units per instance. Customers can link multiple ND A100 v4 instances together to create an AI training cluster with up to “thousands” of GPUs.

Microsoft didn’t specify exactly how many GPUs are supported. But even at the low end of the possible range, assuming a cluster with a graphics card count in the low four figures, the performance is likely not far behind that of a small supercomputer. Earlier this year, Microsoft built an Azure cluster for OpenAI that qualified as one of the world’s top five supercomputers, and that cluster had 10,000 GPUs. 
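To put those cluster sizes in terms of instances, here is a rough back-of-the-envelope sketch in Python. It relies only on the eight-GPUs-per-instance figure cited above. The function name and the example cluster sizes are illustrative assumptions, not numbers from Microsoft.

```python
import math

GPUS_PER_INSTANCE = 8  # from the article: eight A100 GPUs per ND A100 v4 instance


def instances_needed(target_gpus: int) -> int:
    """Return the smallest number of instances that supplies target_gpus GPUs."""
    if target_gpus < 0:
        raise ValueError("target_gpus must be non-negative")
    return math.ceil(target_gpus / GPUS_PER_INSTANCE)


# Hypothetical cluster sizes, chosen for illustration only.
for gpus in (1_000, 4_000, 10_000):
    print(f"{gpus:>6} GPUs -> {instances_needed(gpus)} instances")
```

Under these assumptions, a cluster in the low four figures of GPUs would span a little over a hundred instances, while a 10,000-GPU cluster would need 1,250.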

In the new ND A100 v4 instance, what makes it possible to cluster GPUs together is a dedicated 200-gigabit-per-second InfiniBand network link provisioned to each chip. These connections allow the graphics cards to communicate with each other across instances. The speed at which GPUs can share data is a big factor in how fast they can process that data, and Microsoft says the ND A100 v4 VM offers 16 times more GPU-to-GPU bandwidth than any other major public cloud.
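The per-GPU link speed translates into an aggregate figure per instance with simple arithmetic. The sketch below multiplies the two numbers given in the article, eight GPUs and 200 gigabits per second each. It describes nominal link rates only, not measured throughput, and the function name is an illustrative assumption.

```python
GPUS_PER_INSTANCE = 8    # from the article
LINK_GBPS_PER_GPU = 200  # from the article: one 200 Gb/s InfiniBand link per GPU


def aggregate_bandwidth_gbps(instances: int = 1) -> int:
    """Nominal InfiniBand bandwidth, in Gb/s, summed over every GPU link."""
    return instances * GPUS_PER_INSTANCE * LINK_GBPS_PER_GPU


per_instance = aggregate_bandwidth_gbps()
print(f"Per instance: {per_instance} Gb/s ({per_instance / 1000} Tb/s)")
print(f"Per GPU: {LINK_GBPS_PER_GPU / 8} GB/s")  # 8 bits per byte
```

That works out to a nominal 1,600 gigabits per second, or 1.6 terabits per second, of InfiniBand bandwidth per instance.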

The InfiniBand connections are powered by networking gear supplied by Nvidia’s Mellanox unit. To support the eight onboard GPUs, the new instance also packs a central processing unit from Advanced Micro Devices Inc.’s second-generation Epyc series of server processors. 

The end result is what the company describes as a big jump in AI training performance. “Most customers will see an immediate boost of 2x to 3x compute performance over the previous generation of systems based on Nvidia V100 GPUs with no engineering work,” Ian Finder, a senior program manager at Azure, wrote in a blog post. He added that some customers may see performance improve by up to 20 times.

Microsoft’s decision to use Nvidia chips and Mellanox gear to power the instance shows how the chipmaker is already reaping dividends from its $6.9 billion acquisition of Mellanox, which closed this year. Microsoft’s own investments in AI and related technologies have likewise helped it win customers. Today’s debut of the new AI instance was preceded by the Tuesday announcement that the U.S. Energy Department has partnered with the tech giant to develop AI disaster response tools on Azure.

The ND A100 v4 is currently in preview. 

Photo: Microsoft
