UPDATED 13:26 EDT / SEPTEMBER 11 2020

Microsoft AI tool enables ‘extremely large’ models with a trillion parameters

Microsoft Corp. has released a new version of its open-source DeepSpeed tool that it says will enable the creation of deep learning models with a trillion parameters, more than five times as many as in the world’s current largest model.

The company says the tool, released Thursday, will also boost the work of developers on smaller projects. DeepSpeed is a software library for training artificial intelligence models. Announced in February, it has already gone through multiple iterations that raised the maximum size of the models it can train from more than 100 billion parameters to more than a trillion.

At a high level, parameters can be thought of as the insights an AI learns from processing data; they are what enable a model to improve its accuracy and speed over time. The more parameters a neural network has, the more proficiently it can process the data it ingests and the higher the quality of the results it produces.
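To make the idea concrete, here is a minimal sketch in PyTorch, the framework DeepSpeed extends (the toy model is purely illustrative): a model’s parameter count is simply the number of trainable values across all of its layers.

    import torch

    # A toy two-layer network; large language models stack thousands of
    # much wider layers of the same basic kind.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),  # weight: 1024 x 4096, bias: 4096
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024),  # weight: 4096 x 1024, bias: 1024
    )

    # Every trainable tensor contributes numel() values to the total.
    total = sum(p.numel() for p in model.parameters())
    print(f"{total:,} trainable parameters")  # prints: 8,393,728 trainable parameters

A trillion-parameter model holds roughly 120,000 times as many values as this toy, which is why memory, rather than raw compute, quickly becomes the binding constraint.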

The challenge that DeepSpeed was created to address is that developers can equip their neural networks with only as many parameters as their AI training infrastructure can handle. In other words, hardware limitations are an obstacle to building bigger and better models. DeepSpeed makes the AI training process more hardware-efficient, so developers can increase the sophistication of the AI software they build without having to buy more infrastructure.
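In practice, DeepSpeed plugs into an ordinary PyTorch training script. The following is a hedged sketch based on the library’s documented usage pattern, not code from Microsoft’s announcement; the model and data are synthetic stand-ins.

    import torch
    import deepspeed

    model = torch.nn.Linear(1024, 1024)  # stand-in for a real network
    data_loader = [(torch.randn(8, 1024), torch.randn(8, 1024)) for _ in range(10)]

    # deepspeed.initialize wraps the model in an engine that applies the
    # memory and communication optimizations specified in the config.
    engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config={
            "train_batch_size": 8,
            "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        },
    )

    for features, labels in data_loader:
        features = features.to(engine.device)
        labels = labels.to(engine.device)
        loss = torch.nn.functional.mse_loss(engine(features), labels)
        engine.backward(loss)  # replaces loss.backward()
        engine.step()          # replaces optimizer.step()

The design point is that the training loop barely changes: the engine decides behind the scenes how to partition and schedule the work across whatever hardware is available.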

Microsoft says the tool can train a trillion-parameter language model using 800 of Nvidia Corp.’s previous-generation V100 graphics cards. Normally, the company claims, that task would take 4,000 of Nvidia’s current-generation A100 graphics cards 100 days to complete, and that’s despite the A100 being as much as 20 times faster than the V100.

Even if the available hardware is reduced to just a single V100 chip, Microsoft says, DeepSpeed can still train a language model with up to 13 billion parameters. For comparison, Microsoft’s own Turing-NLG, one of the largest language models in the world, has about 17 billion parameters, while the largest neural network overall, OpenAI’s GPT-3, packs about 175 billion.

[Chart: the largest model sizes trainable on a single GPU with default PyTorch versus DeepSpeed’s ZeRO-Offload]

If these results hold up in real-world projects, DeepSpeed could be a major boon for AI development. Researchers at groups such as OpenAI that are working to push the envelope on the size of neural networks could use it to reduce the hardware costs associated with their work. Startups and others pursuing practical, day-to-day applications of AI, in turn, could harness Microsoft’s tool to build more sophisticated models than their infrastructure budgets would otherwise allow.

DeepSpeed “democratizes multi-billion-parameter model training and opens the window for many deep learning practitioners to explore bigger and better models,” Microsoft executives Rangan Majumder and Junhua Wang wrote in a blog post.

These scalability improvements are made possible by several new technologies in the latest version of DeepSpeed. One is ZeRO-Offload, which increases the number of parameters AI training servers can handle by creatively using the memory attached to those servers’ central processing units to hold data that would otherwise consume scarce graphics card memory. Another innovation, dubbed 3D parallelism, distributes work among the training servers in a way that increases hardware efficiency.

“3D parallelism adapts to the varying needs of workload requirements to power extremely large models with over a trillion parameters while achieving near-perfect memory-scaling and throughput-scaling efficiency,” Microsoft’s Majumder and Wang wrote. “In addition, its improved communication efficiency allows users to train multi-billion-parameter models 2–7x faster on regular clusters with limited network bandwidth.”
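To give a sense of how these features surface to developers, below is a hedged sketch of a DeepSpeed-style configuration that turns on ZeRO-Offload. The flag names follow the project’s documentation around this release and may differ in later versions.

    # Illustrative DeepSpeed config enabling ZeRO-Offload: optimizer state and
    # gradients are partitioned across workers (ZeRO stage 2), and the
    # optimizer's memory is pushed into CPU RAM, freeing graphics card memory.
    ds_config = {
        "train_batch_size": 8,
        "fp16": {"enabled": True},  # half-precision training to cut memory use
        "zero_optimization": {
            "stage": 2,             # partition optimizer state and gradients
            "cpu_offload": True,    # ZeRO-Offload: keep optimizer state in CPU memory
        },
    }

    # The dictionary is passed to deepspeed.initialize(config=ds_config, ...)
    # exactly as in the earlier training-loop sketch.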

Image: Microsoft
