Nvidia unleashes its next-generation GPUs, DPUs and AI accelerators

Nvidia Corp. today announced a new generation of hardware at its GTC 2023 conference, designed to power inference for the most advanced generative artificial intelligence models now being built.

AI inference, the process of running trained AI models on new data to produce predictions or outputs, has emerged as one of the hottest areas in enterprise technology, with models such as OpenAI LP’s GPT-4 powering ChatGPT and other AI engines being rapidly incorporated into a range of software-based tools and services. Inferencing is key to the development of everything from customer service chatbots to AI-powered avatars and video analysis in manufacturing quality control environments.
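
To make that concrete, here's a minimal sketch of inference in practice: loading an already-trained language model and running it on a prompt, with no weight updates involved. The small GPT-2 checkpoint and the Hugging Face transformers library are illustrative stand-ins of our choosing, not anything Nvidia announced:

```python
# Minimal inference sketch: run an already-trained model on new input.
# GPT-2 is a small stand-in for the much larger LLMs discussed here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The customer asked about an order, and the support chatbot replied:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():  # inference only: no gradients, weights stay frozen
    output_ids = model.generate(**inputs, max_new_tokens=40)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```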

Companies require massive computing resources to power AI inferencing, and that’s where Nvidia comes in with its extremely powerful graphics processing units. Today it announced its latest generation of Nvidia DGX platforms incorporating its new Nvidia H100 GPUs. The H100 GPUs are based on the Nvidia Hopper architecture and feature a built-in Transformer Engine that’s optimized for developing, training and deploying generative AI, large language models and AI recommender systems. According to the company, the H100 enables AI training up to nine times faster than its previous-generation GPUs, with up to 30 times faster AI inference on LLMs.
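
The Transformer Engine's core trick is running matrix math at reduced precision, down to FP8 on Hopper, wherever accuracy allows. As a rough illustration of that pattern, and without relying on Hopper-specific APIs, the following sketch uses PyTorch's autocast to run a layer in FP16 on any recent CUDA GPU; the dimensions are arbitrary:

```python
# Reduced-precision sketch: the pattern Hopper's Transformer Engine automates
# at FP8. Here PyTorch autocast runs the matmul in FP16 instead, which works
# on any recent CUDA GPU.
import torch

layer = torch.nn.Linear(4096, 4096).cuda()
x = torch.randn(8, 4096, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = layer(x)  # matmul executes in half precision on tensor cores

print(y.dtype)  # torch.float16
```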

Alongside the H100 GPUs, Nvidia also revealed a new generation of specialized AI accelerators and inferencing chips, a new data processing unit, and advances in the field of computational lithography.

The Nvidia DGX H100 platform

Customers can access the new H100 GPUs through the latest Nvidia DGX H100 platform. It’s powered by eight H100 GPUs linked by Nvidia’s high-speed NVLink interconnects, with Quantum InfiniBand and Spectrum Ethernet handling networking. Altogether, the platform provides 32 petaflops of compute performance at FP8 precision, Nvidia said, with networking speeds twice as fast as the previous DGX A100 platform.
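
For a feel of what eight NVLink-connected GPUs look like from software, here's an illustrative snippet that enumerates the GPUs in a node and checks which pairs support direct peer-to-peer access, the capability NVLink provides; it runs on any multi-GPU CUDA machine:

```python
# Enumerate a node's GPUs and check direct peer-to-peer reachability,
# the capability NVLink provides on systems like the DGX H100.
import torch

n = torch.cuda.device_count()
print(f"{n} GPUs visible")
for i in range(n):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    peers = [j for j in range(n)
             if j != i and torch.cuda.can_device_access_peer(i, j)]
    print(f"  direct peer access to: {peers}")
```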

The DGX H100 is integrated with the complete Nvidia AI software stack, including the latest edition of Nvidia AI Enterprise and Nvidia Base Command, which serves as its operating system.

Enterprises won’t necessarily need to acquire their own DGX H100 systems, though: the H100 GPUs can also be accessed in the cloud through a number of partners, including Oracle Corp., Microsoft Corp. and Amazon Web Services Inc. All three cloud providers have announced services and instances powered by the H100 GPU. Meanwhile, Meta Platforms Inc. said it has deployed hundreds of H100 GPUs in its new Grand Teton AI supercomputer for internal use by its AI production and research teams.

“Generative AI’s incredible potential is inspiring virtually every industry to reimagine its business strategies and the technology required to achieve them,” Nvidia Chief Executive Jensen Huang said in a keynote address. “Nvidia and our partners are moving fast to provide the world’s most powerful AI computing platform to those building applications that will fundamentally transform how we live, work and play.”

Nvidia said the DGX H100 platform is available now worldwide, while Microsoft Azure and Oracle Cloud Infrastructure are offering access in private preview. AWS and Google Cloud are expected to provide access soon.

More advanced AI accelerators and DPUs

Besides the H100 GPU, Nvidia also unveiled additional hardware geared for specialized AI tasks. The Nvidia L4 is a single-slot, low-profile accelerator for AI, video and graphics that fits into any standard server, where it can decode video, run models on it and re-encode the results up to 120 times faster than the best central processing unit-based platforms. It can also simultaneously run up to 1,040 HD streams, or process graphics four times faster than the previous Nvidia T4 accelerator. Google Cloud has announced early access to the Nvidia L4 via its cloud platform.
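
To make the decode-and-infer loop concrete, here's a hedged sketch of the kind of per-frame video workload such an accelerator targets, using OpenCV for decoding and a small torchvision classifier as a stand-in model; the input filename is hypothetical, and re-encoding is omitted for brevity:

```python
# Per-frame video inference of the kind the L4 targets: decode frames and
# run a model on each. OpenCV handles decoding; a small torchvision
# classifier stands in for the real workload.
import cv2
import torch
from torchvision.models import resnet18, ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval().cuda()
preprocess = weights.transforms()

cap = cv2.VideoCapture("input.mp4")  # hypothetical input file
with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV gives BGR
        batch = preprocess(torch.from_numpy(rgb).permute(2, 0, 1)).unsqueeze(0)
        scores = model(batch.cuda())
        print(weights.meta["categories"][scores.argmax().item()])
cap.release()
```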

There’s also a variation called the Nvidia L40, which is an accelerator designed specifically for AI-powered image generation. According to Nvidia, the L40 serves as the engine of the Nvidia Omniverse platform for building and running 3D metaverse applications.

Meanwhile, the Nvidia H100 NVL is a specialized offering for real-time LLM inferencing. It’s a PCIe GPU that fits into any server able to accommodate a PCIe card, pairing two H100 GPUs over an NVLink bridge in a double-width form factor to deliver a combined 188 gigabytes of memory. Nvidia said it’s designed for deploying massive LLMs like ChatGPT at scale, with up to 12 times the inference throughput of the previous-generation HGX A100.
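
What pooled memory across bridged GPUs buys is the ability to serve a model too large for either card alone. Here's a hedged sketch of that pattern using Hugging Face's device_map="auto" option, which requires the accelerate package, to shard one model's weights across whatever GPUs are visible; gpt2-xl is a small stand-in for a production LLM:

```python
# Sharded inference sketch: split one model's weights across all visible
# GPUs, the pattern that pooled memory on dual-GPU parts like the H100 NVL
# serves. Requires the accelerate package; gpt2-xl is a stand-in model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "gpt2-xl",
    device_map="auto",  # layers are placed across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")

inputs = tokenizer("Real-time LLM serving means", return_tensors="pt")
inputs = inputs.to(model.device)  # device holding the first shard
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```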

Finally, Nvidia revealed its latest data processing unit, the BlueField-3, which boasts double the number of Arm processor cores and more accelerators than the prior BlueField-2 DPU, allowing it to run workloads up to eight times faster. DPUs are designed to offload, accelerate and isolate infrastructure administration tasks such as network traffic inspection. By handing this work to the DPU, a server’s central processing units can focus solely on the compute tasks they have been given, improving overall performance.

Nvidia said Oracle Cloud Infrastructure has decided to standardize on the BlueField-3 DPU to run Nvidia’s new DGX Cloud platform, which makes AI training available in the cloud. By using BlueField-3, Oracle said it can support large-scale GPU clusters with dynamic, always-on capacity, delivering major efficiency gains by offloading data center infrastructure tasks.

The next generation of AI chips

Perhaps the most intriguing news is that the new Nvidia platforms are just a taste of things to come: the company is also working with semiconductor industry players such as Taiwan Semiconductor Manufacturing Co., ASML Holding NV and Synopsys Inc. to bring accelerated computing to chip design and manufacturing.

Nvidia announced the availability of its new cuLitho software library for computational lithography, which applies computational models to simulate and optimize how chip designs are transferred from photomasks onto silicon wafers. TSMC and Synopsys both said they’re integrating the Nvidia cuLitho software with their manufacturing processes and systems to design and build Nvidia’s most advanced Grace Hopper architecture GPUs, while ASML is planning to do the same.

According to Nvidia, these advances will enable the manufacture of even tinier transistors and wires than are currently possible, while accelerating time to market and boosting energy efficiency. For instance, cuLitho has already demonstrated a performance leap of 40 times over existing computational lithography pipelines, enabling 500 Nvidia DGX H100 systems to do the work of 40,000 CPU systems.
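
The arithmetic behind that comparison is worth spelling out: if 500 DGX systems stand in for 40,000 CPU systems, each DGX is doing the lithography work of roughly 80 CPU nodes:

```python
# Back-of-envelope check on Nvidia's cuLitho equivalence claim.
cpu_systems = 40_000
dgx_systems = 500
print(cpu_systems / dgx_systems)  # 80.0 CPU systems replaced per DGX H100
```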

In the near term, Nvidia said, chip fabs that run cuLitho will be able to produce three to five times more photomasks, the templates for a chip’s design, while using nine times less power than existing systems. Longer term, Nvidia said, cuLitho promises better design rules, higher density, greater energy efficiency and higher production yields, along with AI-powered lithography for circuit widths of two nanometers or less. In other words, it’s playing a key role in the development of even more powerful AI hardware to come.

“The chip industry is the foundation of nearly every other industry in the world,” Huang said. “With lithography at the limits of physics, Nvidia’s introduction of cuLitho and collaboration with our partners TSMC, ASML and Synopsys allows fabs to increase throughput, reduce their carbon footprint and set the foundation for 2nm and beyond.”

