Nvidia unleashes an Azure cloud supercomputer, Magnum IO and Arm server architecture
Graphics processor chipmaker Nvidia Corp. today announced three supercomputer innovations aimed at expanding its designs to power artificial intelligence workloads in many more data centers.
In particular, Nvidia introduced an Arm-based server architecture reference, a scalable Microsoft Azure cloud-accessible supercomputer and the Magnum IO software suite for data scientists and AI researchers. Nvidia’s graphics processing units have become the go-to processors for AI because the highly parallel processing they use to power graphics and gaming has proved ideal for machine learning as well.
Arm-based server architecture reference platform for AI
Nvidia Chief Executive Officer Jensen Huang announced the release of its Arm-based server architecture reference at the SC19 supercomputing conference today. This reference design platform – consisting of hardware and software building blocks – will enable the high-performance computing development industry to harness a broader range of central processing unit architectures.
It will allow supercomputing centers, hyperscale-cloud operators and enterprises to combine Nvidia’s CUDA software-based graphics computing chips with the latest Arm-based server platforms.
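In practice, bringing CUDA to Arm means the programming model stays the same and existing kernels are simply rebuilt for an aarch64 host. The snippet below is a minimal sketch, not code from the reference platform, assuming the CUDA-on-Arm toolkit and nvcc are installed on the Arm server; it compiles and runs exactly as it would on an x86 system.

```cuda
// vector_add.cu -- minimal CUDA kernel; a sketch assuming the CUDA-on-Arm
// toolkit is installed on the aarch64 server (nvcc is used the same as on x86):
//   nvcc -o vector_add vector_add.cu && ./vector_add
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory keeps the example short; allocation is identical
    // regardless of whether the host CPU is x86 or Arm.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vector_add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```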
“There is a renaissance in high-performance computing,” Huang said. “Breakthroughs in machine learning and AI are redefining scientific methods and enabling opportunities for new architectures. Bringing Nvidia GPUs to Arm opens the floodgates for innovators to create systems for growing new applications from hyperscale cloud to exascale supercomputing and beyond.”
To build this reference platform, Nvidia teamed up with Arm and its ecosystem partners, including Ampere Computing, Fujitsu Ltd. and Marvell Technology Group. The platform also benefitted from deep collaboration with Cray Inc., a subsidiary of Hewlett Packard Enterprise Co., and HPE itself.
Microsoft Azure cloud-based Nvidia supercomputer
Nvidia also announced the general availability of new Microsoft Corp. Azure cloud NDv2 instances that can scale to as many as 800 Nvidia Tensor Core GPUs interconnected on a single Mellanox InfiniBand backend network.
The company says this enables customers, for the first time, to rent an entire AI supercomputer on demand from their desks.
“Until now, access to supercomputers for AI and high-performance computing has been reserved for the world’s largest businesses and organizations,” said Ian Buck, vice president and general manager of accelerated computing at Nvidia. “Microsoft Azure’s new offering democratizes AI, giving wide access to an essential tool needed to solve some of the world’s biggest challenges.”
This new offering is ideal for AI and machine learning workloads and will provide dramatic performance benefits over traditional CPU-based computing.
Microsoft and Nvidia engineers used 64 NDv2 instances on a pre-release version of the cluster to train BERT, a popular natural language conversational AI model, in a mere three hours. That was achieved in part with Nvidia’s CUDA-core technology and the Mellanox interconnects.
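Training of that scale depends on GPUs synchronizing gradients across nodes with collective operations carried over the InfiniBand fabric. The sketch below is not Nvidia’s or Microsoft’s benchmark code; it is a hypothetical, minimal illustration of an NCCL all-reduce launched across MPI ranks, the basic communication pattern such multi-node training relies on.

```cuda
// nccl_allreduce.cu -- hedged sketch of the all-reduce pattern behind
// multi-GPU training (one MPI rank per GPU). Build/run example (paths assumed):
//   nvcc nccl_allreduce.cu -lnccl -lmpi -o allreduce && mpirun -np 8 ./allreduce
#include <cstdio>
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // Assume one GPU per rank; real launchers map ranks to local GPUs.
    cudaSetDevice(rank % 8);

    // Rank 0 creates the NCCL unique ID and shares it over MPI.
    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

    ncclComm_t comm;
    ncclCommInitRank(&comm, nranks, id, rank);

    // A stand-in for a gradient buffer.
    const size_t count = 1 << 20;
    float* grads;
    cudaMalloc(&grads, count * sizeof(float));
    cudaMemset(grads, 0, count * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Sum the "gradients" across every GPU in the job; on a cluster like
    // this, the traffic travels the InfiniBand backend network.
    ncclAllReduce(grads, grads, count, ncclFloat, ncclSum, comm, stream);
    cudaStreamSynchronize(stream);

    if (rank == 0) printf("all-reduce across %d ranks complete\n", nranks);

    cudaFree(grads);
    ncclCommDestroy(comm);
    MPI_Finalize();
    return 0;
}
```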
Magnum IO for data scientists and AI researchers
Magnum IO is a suite of software designed to help data scientists and AI and high-performance computing researchers process massive amounts of data in minutes rather than hours.
Nvidia says the suite delivers up to 20 times faster data processing for multi-server, multi-GPU computing nodes working with massive datasets than traditional approaches. That makes it well-suited to complex financial analysis, climate modeling and other data-heavy HPC workloads.
“Processing large amounts of collected or simulated data is at the heart of data-driven sciences like AI,” said Huang.
Nvidia developed Magnum IO in close collaboration with industry leaders in computing, networking and storage, including DataDirect Networks Inc., Excelero Inc., IBM Corp., Mellanox Technologies Ltd. and WekaIO Ltd.
At the heart of Magnum IO is GPUDirect, an architecture that allows data to bypass CPUs and travel the “open highways” offered by GPUs, storage and networking devices. At launch, it’s compatible with a wide variety of communications interconnects and uses peer-to-peer and remote direct memory access, or RDMA, transfers.
“Extreme compute needs extreme I/O,” Huang said. “Magnum IO delivers this by bringing Nvidia GPU acceleration, which has revolutionized computing, to I/O and storage. Now, AI researchers and data scientists can stop waiting on data and focus on doing their life’s work.”
Its newest element is GPUDirect Storage, which allows researchers to bypass CPUs when accessing storage and quickly process data files for purposes of simulation, analysis or visualization.
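To give a sense of what “bypassing the CPU” looks like to a developer, the sketch below uses the cuFile interface (cufile.h) that Nvidia exposes for GPUDirect Storage to read a file straight into GPU memory. It is a hedged illustration, not code from the early-access release described here; the file path is made up and details of the interface may differ from what early-access customers received.

```cuda
// gds_read.cu -- hedged sketch of a GPUDirect Storage read via the cuFile API.
// The dataset path is hypothetical; link against libcufile when building.
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    const size_t size = 64UL << 20;              // 64 MB read
    const char* path = "/mnt/nvme/sample.dat";   // hypothetical dataset file

    cuFileDriverOpen();                          // initialize the GDS driver

    int fd = open(path, O_RDONLY | O_DIRECT);    // GDS expects O_DIRECT I/O
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    void* dev_buf = nullptr;
    cudaMalloc(&dev_buf, size);
    cuFileBufRegister(dev_buf, size, 0);         // pin the GPU buffer for DMA

    // The read lands directly in GPU memory; no bounce buffer in host RAM.
    ssize_t got = cuFileRead(handle, dev_buf, size, /*file_offset=*/0,
                             /*devPtr_offset=*/0);
    printf("read %zd bytes straight into GPU memory\n", got);

    cuFileBufDeregister(dev_buf);
    cuFileHandleDeregister(handle);
    cudaFree(dev_buf);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```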
Nvidia Magnum IO is generally available now, with the exception of GPUDirect Storage, which is available only to select early-access customers. The full release of GPUDirect Storage is planned for the first half of 2020.
Photo: Nvidia