UPDATED 13:26 EDT / SEPTEMBER 11 2020

Microsoft AI tool enables ‘extremely large’ models with a trillion parameters

Microsoft Corp. has released a new version of its open-source DeepSpeed tool that it says will enable the creation of deep learning models with a trillion parameters, more than five times as many as in the world’s current largest model.

The company also expects the tool, released Thursday, to benefit developers working on smaller projects. DeepSpeed is a software library for training artificial intelligence models. Announced in February, it has already gone through multiple iterations that increased the maximum size of the models it can train from more than 100 billion parameters to more than a trillion.

At a high level, parameters can be thought of as the insights that an AI learns from processing data. These insights are what enable AI models to improve their accuracy and speed with time. The more parameters a neural network has, the more proficiently it can process the data it ingests and thereby produce higher-quality results.

The challenge that DeepSpeed was created to address is that developers can only equip their neural networks with as many parameters as their AI training infrastructure can handle. In other words, hardware limitations are an obstacle to building bigger and better models. DeepSpeed makes the AI training process more hardware-efficient so developers may increase the sophistication of the AI software they build without having to buy more infrastructure.

Microsoft says the tool can train a trillion-parameter language model using 800 of Nvidia Corp.’s previous-generation V100 graphics cards. Normally, the company claims, that task would take 4,000 of Nvidia’s current-generation A100 graphics cards 100 days to complete, even though the A100 is rated as up to 20 times faster than the V100.

Even if the available hardware is reduced to just a single V100 chip, Microsoft says, DeepSpeed could still train a language model with up to 13 billion parameters. For comparison, the largest language model in the world has about 17 billion parameters and the largest neural network overall packs about 175 billion.
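A rough memory estimate shows why a single graphics card normally tops out far below 13 billion parameters. The sketch below is not taken from Microsoft's post. It assumes mixed-precision training with the Adam optimizer, commonly estimated at about 16 bytes of model state per parameter, and a V100 with 32 gigabytes of memory. It also assumes, as a simplification, that offloading leaves only the half-precision weights on the GPU. Activations and temporary buffers are ignored, so the real limits are lower.

```python
# Rough model-state memory estimate for mixed-precision Adam training.
# All constants are assumptions for illustration, not figures from the article.

BYTES_FP16_WEIGHTS = 2     # half-precision copy of the weights
BYTES_FP16_GRADS = 2       # half-precision gradients
BYTES_FP32_OPTIMIZER = 12  # fp32 master weights + Adam momentum + Adam variance
BYTES_ALL_ON_GPU = BYTES_FP16_WEIGHTS + BYTES_FP16_GRADS + BYTES_FP32_OPTIMIZER

GPU_MEMORY_GB = 32         # assumed V100 variant


def model_state_gb(num_params: float, bytes_per_param: int) -> float:
    """Gigabytes needed to hold the given per-parameter state."""
    return num_params * bytes_per_param / 1e9


def max_params_billions(memory_gb: float, bytes_per_param: int) -> float:
    """Upper bound on model size (in billions of parameters) for a memory budget."""
    return memory_gb / bytes_per_param


if __name__ == "__main__":
    params = 13e9
    print("All state on GPU:", model_state_gb(params, BYTES_ALL_ON_GPU), "GB")
    print("Weights only on GPU:", model_state_gb(params, BYTES_FP16_WEIGHTS), "GB")
    print("Ceiling, all state on GPU:",
          max_params_billions(GPU_MEMORY_GB, BYTES_ALL_ON_GPU), "billion params")
    print("Ceiling, weights only on GPU:",
          max_params_billions(GPU_MEMORY_GB, BYTES_FP16_WEIGHTS), "billion params")
```

Under these assumptions, a 13 billion-parameter model carries about 208 gigabytes of model state, far more than one card holds, while its half-precision weights alone take about 26 gigabytes.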

Bar graph showing the largest models that can be trained using default PyTorch and ZeRO-Offload on a single GPU.

If these results hold up in real-world projects, DeepSpeed could be a major boon for AI development. Research groups such as OpenAI that are working to push the envelope on the size of neural networks could use it to reduce the hardware costs associated with their work. Startups and others pursuing practical day-to-day applications of AI, in turn, could harness Microsoft’s tool to build more sophisticated models than they could otherwise afford on their limited infrastructure budgets.

DeepSpeed “democratizes multi-billion-parameter model training and opens the window for many deep learning practitioners to explore bigger and better models,” Microsoft executives Rangan Majumder and Junhua Wang wrote in a blog post.

These scalability improvements are made possible by several new technologies in the latest version of DeepSpeed. One is ZeRO-Offload, which increases the number of parameters AI training servers can handle by making creative use of the memory attached to those servers’ central processing units. Another innovation, dubbed 3D parallelism, distributes work among the training servers in a way that increases hardware efficiency.
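For readers who want to see what enabling ZeRO-Offload might look like, here is a minimal sketch of a DeepSpeed configuration built in Python. The key names follow the DeepSpeed documentation from around the time of this release, where CPU offload was a boolean switch under the ZeRO settings. Later versions changed how offloading is configured, so treat the exact keys and the batch size as assumptions and check the current documentation. The snippet only builds and prints the configuration and does not import DeepSpeed.

```python
import json

# Illustrative DeepSpeed-style configuration (assumed key names, see lead-in).
ds_config = {
    "train_batch_size": 8,          # hypothetical value for a single-GPU run
    "fp16": {"enabled": True},      # mixed-precision training
    "zero_optimization": {
        "stage": 2,                 # partition optimizer state and gradients
        "cpu_offload": True,        # keep optimizer state in CPU memory
    },
}

# Serialize to the JSON text that would normally be saved as a config file.
config_text = json.dumps(ds_config, indent=2)

if __name__ == "__main__":
    print(config_text)
```

The resulting JSON would typically be saved to a file and passed to the DeepSpeed launcher alongside the training script.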

“3D parallelism adapts to the varying needs of workload requirements to power extremely large models with over a trillion parameters while achieving near-perfect memory-scaling and throughput-scaling efficiency,” Microsoft’s Majumder and Wang wrote. “In addition, its improved communication efficiency allows users to train multi-billion-parameter models 2–7x faster on regular clusters with limited network bandwidth.”
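The three dimensions in 3D parallelism are generally described as data parallelism, pipeline parallelism and model (tensor-slicing) parallelism. The sketch below illustrates the idea by placing each GPU at a coordinate in a three-axis grid. The helper names, the axis ordering and the example sizes are hypothetical and chosen for illustration. This is not DeepSpeed's own topology code.

```python
from typing import NamedTuple


class Coord(NamedTuple):
    data: int   # which replica of the model this GPU belongs to
    pipe: int   # which pipeline stage (group of layers) it runs
    model: int  # which slice of each layer's tensors it holds


def rank_to_coord(rank: int, data_size: int, pipe_size: int, model_size: int) -> Coord:
    """Map a flat GPU rank to a (data, pipe, model) grid coordinate.

    Assumed ordering: the model axis varies fastest, then pipe, then data.
    """
    world_size = data_size * pipe_size * model_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside world of {world_size} GPUs")
    model = rank % model_size
    pipe = (rank // model_size) % pipe_size
    data = rank // (model_size * pipe_size)
    return Coord(data, pipe, model)


def weight_fraction_per_gpu(pipe_size: int, model_size: int) -> float:
    """Rough share of the model's weights held by one GPU.

    Pipeline and model parallelism split the weights; data parallelism
    replicates them, so it does not appear here.
    """
    return 1.0 / (pipe_size * model_size)


if __name__ == "__main__":
    # Hypothetical 32-GPU cluster: 2 replicas x 4 pipeline stages x 4 tensor slices.
    for rank in (0, 5, 31):
        print(rank, rank_to_coord(rank, data_size=2, pipe_size=4, model_size=4))
    print("weight share per GPU:", weight_fraction_per_gpu(4, 4))
```

In this layout, adding pipeline stages or tensor slices shrinks the share of the model each GPU must hold, while adding replicas increases throughput.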

Image: Microsoft
