Nvidia Corp. today stoked the fires of the emerging artificial intelligence factory trend with the announcement of Dynamo 1.0, an open-source platform the company is positioning as an essential software layer for large-scale AI deployments.
The announcement at the company’s GPU Technology Conference in San Jose is aimed at one of the most daunting problems in enterprise AI: how to run increasingly complex generative and agentic workloads efficiently at large scale.
Nvidia said that the economics of inference are becoming as important as raw model performance. The company sees a rapidly expanding market for software that can manage growing AI complexity, said Ian Buck, vice president of hyperscale and high-performance computing.
“As we move up the complexity scale, so does the value and the capability of the AI and the dollar per million tokens,” he said. “Software stacks like Dynamo provide an uplift for models on Vera Rubin NVL72 and achieve 10 times the throughput per watt, or one-10th the token cost.”
Vera Rubin NVL72 is a new rack-scale AI supercomputer platform that Nvidia announced in January. It’s designed to handle massive-scale AI training and inference.
Platforms like Dynamo are critical to Nvidia’s efforts to expand beyond chips, servers and networking into becoming a supplier of the operating software needed to orchestrate AI infrastructure across entire data centers. Dynamo can be used for generative and agentic inference at scale and integrates with a wide range of popular inference and orchestration frameworks.
Open-sourcing Dynamo is an example of Nvidia “extending its moat upward,” said Chirag Dekate, an analyst specializing in agentic and AI infrastructures, AI cloud and quantum computing at Gartner Inc.
“Inference is becoming a software orchestration problem, so whoever manages routing, caching and scheduling at scale will shape the economics of AI,” he said. “By open-sourcing Dynamo, Nvidia is making a classic standards play: lower adoption friction, attract ecosystem partners and turn its preferred runtime model into the market’s default operating model.”
The rise of agentic AI has created new complexity and demands on infrastructure and software because new models interact not just with people but with each other at speeds that are far beyond those needed for human interactions.
Nvidia calls these agentic demands the “fourth scaling law” beyond pretraining, post-training and test-time scaling. “A place where agentic is talking not just to humans, but to other AIs, increases the demand for low latency and large context inference at scale,” Buck said.
That shift is pushing infrastructure requirements beyond simple chatbot workloads. Buck said agentic models “need to deliver tokens at 15 times faster and with 10 times larger models.” He said the current 100 billion-parameter models will soon expand to 10 trillion-parameter systems processing 1,500 tokens per second.
Gartner’s Dekate noted that Dynamo focuses on maximizing the utilization of GPU fleets. “Emerging reasoning models, multimodal workloads and agentic systems are making inference much more distributed, latency-sensitive and cost-sensitive,” he said. Dynamo’s planner monitors prefill and decode activity and reallocates GPU resources accordingly, while its smart router is KV-cache-aware, routing requests so that costly recomputation is minimized.
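To illustrate the idea behind KV-cache-aware routing, here is a minimal sketch (not Dynamo's actual API; the function names and data layout are assumptions for illustration): each request is sent to the worker whose KV cache already covers the longest prefix of the prompt, so the fewest tokens need to be recomputed during prefill.

```python
# Hypothetical sketch of KV-cache-aware routing, not Dynamo's real interface.
# Each worker's KV cache is modeled as the list of token ids it has cached.

def longest_cached_prefix(prompt_tokens, cached_tokens):
    """Length of the shared prefix between a prompt and a worker's cache."""
    n = 0
    for a, b in zip(prompt_tokens, cached_tokens):
        if a != b:
            break
        n += 1
    return n

def route(prompt_tokens, workers):
    """Pick the worker whose KV cache covers the most of this prompt.

    `workers` maps worker id -> token ids currently in its KV cache.
    Returns (worker_id, tokens_to_recompute).
    """
    best_id, best_hit = None, -1
    for worker_id, cached in workers.items():
        hit = longest_cached_prefix(prompt_tokens, cached)
        if hit > best_hit:
            best_id, best_hit = worker_id, hit
    return best_id, len(prompt_tokens) - best_hit

# Example: worker "a" has already cached the shared system-prompt tokens.
workers = {"a": [1, 2, 3], "b": []}
print(route([1, 2, 3, 4, 5], workers))  # ("a", 2): only 2 new tokens to prefill
```

A production router also weighs current load and memory pressure on each worker, which is the kind of trade-off Dynamo's planner is described as managing.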
Dynamo also fits into Nvidia’s broader software stack for AI agents, which was announced today. The new Agent Toolkit is a package of “open models, runtimes, and blueprints for building, evaluating, and optimizing safer, long-running autonomous agents,” said Kari Briski, senior vice president of generative AI software. The toolkit includes Nvidia Inference Microservices, or NIM, for model inference, and Dynamo for serving at production scale.