Nvidia doubles the memory of its A100 GPU for AI workloads
Nvidia Corp. is widely held to be the leader in powering artificial intelligence workloads, but it’s refusing to rest on its laurels. Today it said that it has supercharged what is already the world’s fastest graphics processing unit, adding twice as much memory as before.
The new Nvidia A100 80-gigabyte GPU (pictured) comes with double the memory of its predecessor, the standard A100 40GB GPU that was launched earlier this year, enabling it to deliver more than 2 terabytes per second of memory bandwidth. The result is that the A100 GPU can now be fed twice as much data in the same amount of time, speeding up AI workloads and taking on much larger datasets for analysis, Nvidia said.
Nvidia’s A100 GPU had already lit up the AI world with its breathtaking performance, setting new records for every test across all six application areas for data center and edge computing systems in the second version of MLPerf Inference benchmarks last month. The tests are meant to establish how fast different AI systems can perform inference, or come to a conclusion or result, based on the information they digest.
Now the A100 is even faster, Nvidia said, as it reeled off a long list of gains it has seen in internal testing. In AI system training, for example, the updated chip delivered a threefold speed boost when retraining the DLRM recommender system model that’s commonly used to create online product recommendation systems.
The A100 80GB can now be partitioned into as many as seven different GPU instances with 10GB of memory each, Nvidia said. Doing so provides more secure hardware isolation and maximizes efficiency when running various smaller workloads side by side. As a result, a single A100 80GB instance was able to deliver 1.25 times faster inference throughput with the RNN-T automatic speech recognition model, Nvidia said.
The chip also speeds up big-data analytics workloads in the terabyte-size range with a two-times performance boost, the company said. And in scientific applications, the chip achieved throughput gains of almost 100% on a single node while running the Quantum Espresso materials simulation model.
“Achieving state-of-the-art results in HPC and AI research requires building the biggest models, but these demand more memory capacity and bandwidth than ever before,” Bryan Catanzaro, Nvidia’s vice president of applied deep learning research, said in a statement.
Nvidia said the more powerful A100 GPU will be available before the end of the year in its existing Nvidia DGX A100 system, as well as its new Nvidia DGX Station A100 workgroup server (pictured here) announced today, which is being billed as the first “petascale system” of its kind.
The DGX Station packs four A100 GPUs, making it capable of delivering a massive 2.5 petaflops of AI performance, Nvidia said. Within that system, the four GPUs are fully interconnected with Nvidia’s NVLink technology, which means it packs 320GB of memory to tackle the most data-intensive workloads that can be thrown at it. It also supports Nvidia’s Multi-Instance GPU technology, which means it can be split into 28 separate GPU instances to run even more parallel workloads at once.
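The DGX Station totals follow directly from the per-GPU figures quoted above. As a rough sketch of that arithmetic (the constant and function names below are illustrative only and are not part of any Nvidia API):

```python
# Figures quoted in the article; the names are hypothetical, chosen for illustration.
GPUS_PER_STATION = 4            # A100 GPUs in one DGX Station A100
MEMORY_PER_GPU_GB = 80          # memory on each A100 80GB GPU
MAX_INSTANCES_PER_GPU = 7       # maximum GPU instances per A100
MEMORY_PER_INSTANCE_GB = 10     # memory assigned to each instance


def station_totals(gpus: int = GPUS_PER_STATION) -> dict:
    """Return aggregate memory and instance counts for a system with `gpus` A100 80GB GPUs."""
    return {
        "total_memory_gb": gpus * MEMORY_PER_GPU_GB,
        "gpu_instances": gpus * MAX_INSTANCES_PER_GPU,
        "instance_memory_per_gpu_gb": MAX_INSTANCES_PER_GPU * MEMORY_PER_INSTANCE_GB,
    }


if __name__ == "__main__":
    for name, value in station_totals().items():
        print(f"{name}: {value}")
```

Four GPUs at 80GB each give the 320GB cited for the system, and four GPUs at seven instances each give the 28 instances. By the same count, seven 10GB instances account for 70GB of each GPU's 80GB.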
Alongside the new chip and system, Nvidia is adding more bandwidth to its Mellanox InfiniBand supercomputer networking technology that powers not just its own systems, but also those of other companies.
The Nvidia Mellanox 400G InfiniBand delivers much lower latency and doubles data throughput to 400 gigabits per second. It also comes with new Nvidia In-Network Computing engines that provide additional workload acceleration, the company said. These allow deep learning training operations to be offloaded and accelerated by the InfiniBand network, resulting in a 32-times boost in AI acceleration, Nvidia said.
Meanwhile, aggregate bidirectional throughput at the switch-system level has increased fivefold, to 1.64 petabytes per second. That enables users to run much larger workloads than before with fewer constraints, Nvidia said.
“The most important work of our customers is based on AI and increasingly complex applications that demand faster, smarter, more scalable networks,” said Gilad Shainer, senior vice president of networking at Nvidia. “The Nvidia Mellanox 400G InfiniBand’s massive throughput and smart acceleration engines let HPC, AI and hyperscale cloud infrastructures achieve unmatched performance with less cost and complexity.”
The company said the Nvidia Mellanox 400G InfiniBand technology will be available soon and will be integrated in new systems from companies such as Dell Technologies Inc., Lenovo Group Ltd. and Atos Inc.
“As HPC markets move to more AI-assisted workloads, Nvidia is boosting its ML capabilities and adding to its traditional FLOPS performance plus improving its internetworking for a systems approach,” Patrick Moorhead of Moor Insights & Strategy said about the updates.
These incremental improvements to Nvidia’s key AI platforms will be welcome news for enterprises, since they also mean faster and better hardware becoming available at lower costs, said Constellation Research Inc. analyst Holger Mueller.
“It’s good news as it means Nvidia’s platforms will become mainstream and cheaper sooner, and not just for the most advanced applications but also for bread-and-butter ML and AI workloads,” Mueller said.
Images: Nvidia