As organizations move from pilot projects to production systems, the AI stack continues to evolve.
Companies are seeing AI transition from experimentation to operational scale, growing beyond the simple GPU clusters of its infancy. These changes are forcing enterprises to monitor network performance more closely than ever. At the center of many of these developments is Nvidia Corp., which continues to advance the technologies organizations need to run AI infrastructure efficiently.
“Nvidia’s advantage is widening as the company turns silicon, networking and software into an integrated production system for intelligence,” said Dave Vellante, chief analyst at theCUBE Research.
Through its advancements in CPUs, GPUs, networking and software integration, Nvidia is leading the charge in helping businesses keep up with AI and the operational complexity that accompanies large-scale deployments. At its annual GTC event in San Jose, California, beginning March 16, the company will share its vision for the AI stack of the future, a vision that may extend well beyond chips and other hardware.
This feature is part of SiliconANGLE Media’s exploration of the evolving AI infrastructure stack. (* Disclosure below.)
As the AI stack matures, infrastructure is being redesigned around throughput, efficiency and coordination across multiple layers of the system. Nvidia is increasingly positioning its platform not simply as a collection of chips, but as an integrated architecture that connects compute, memory, networking and software into a unified environment.
This architectural shift reflects a broader change in how organizations approach AI deployment. Rather than focusing solely on model development, enterprises are now grappling with the operational challenge of delivering AI services reliably and economically at scale.
“Nvidia is no longer shipping chips,” Vellante said. “It is delivering tightly integrated systems engineered to maximize throughput, utilization and economic efficiency at the scale required for AI factories.”
As AI systems move into production environments, several components of the stack are becoming increasingly critical. Networking fabrics, orchestration frameworks and automated infrastructure management are emerging as key enablers of large-scale AI deployments. Power consumption and energy efficiency are also rising to the forefront of infrastructure planning. As GPU clusters expand into large-scale AI factories, organizations must manage increasingly complex power distribution and cooling requirements.
“Traditional Ethernet was never built for the ultra-low latency and predictable performance that AI workloads demand,” said Paul Nashawaty, principal analyst at theCUBE Research. “Standard switching fabrics introduce jitter and congestion that can cripple multi-node training jobs or distributed inference pipelines.”
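To see why jitter matters, consider that synchronous data-parallel training blocks every step on a collective operation: the whole cluster waits for the slowest link, so tail latency, not average latency, sets the step time. The minimal sketch below times repeated all-reduces with PyTorch and reports the median-versus-tail spread; it is illustrative only, assuming a PyTorch/NCCL environment launched with torchrun, and the tensor size and iteration count are arbitrary placeholder choices.

```python
"""Minimal sketch: measure all-reduce latency spread across a cluster.

Illustrative only. Assumes a PyTorch environment launched with torchrun
(e.g. `torchrun --nproc_per_node=8 allreduce_jitter.py`); buffer size
and iteration count are placeholders, not recommendations.
"""
import os
import time

import torch
import torch.distributed as dist


def main() -> None:
    # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # A gradient-sized buffer: 64 MiB of fp32, purely for illustration.
    buf = torch.randn(16 * 1024 * 1024, device="cuda")

    latencies = []
    for _ in range(200):
        torch.cuda.synchronize()
        start = time.perf_counter()
        # Synchronous collective: every rank blocks until the slowest
        # path finishes, so one congested link stalls the whole step.
        dist.all_reduce(buf)
        torch.cuda.synchronize()
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    p99 = latencies[int(len(latencies) * 0.99)]
    if dist.get_rank() == 0:
        # A large p99/p50 gap is the "jitter" analysts describe: tail
        # latency, not average latency, determines training throughput.
        print(f"all-reduce p50={p50 * 1e3:.2f} ms  p99={p99 * 1e3:.2f} ms")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

On a well-behaved fabric the two percentiles sit close together; a wide gap is exactly the congestion symptom that purpose-built AI networking aims to eliminate.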
A growing ecosystem of technology partners is helping enterprises address these emerging infrastructure constraints. Companies across the AI stack — from storage platforms to networking and power management providers — are aligning their technologies with Nvidia’s architecture to improve performance and operational efficiency.
Texas Instruments Inc., for example, has collaborated with Nvidia on technologies that support power management and sensing capabilities in next-generation data center infrastructure. As AI systems scale, innovations such as high-voltage direct current power distribution are becoming increasingly important for improving efficiency and reliability in large GPU environments.
Storage architecture is also evolving as organizations seek to feed increasingly large AI models with massive volumes of data. WekaIO Inc. has integrated Nvidia technologies — including high-performance networking components such as the Nvidia ConnectX-8 SuperNIC — into its WEKApod Nitro platform to accelerate data movement and simplify AI infrastructure deployment.
Advances in flash storage are similarly playing a role in improving AI system performance. Solidigm Inc. has been working with Nvidia’s Magnum IO architecture to optimize data movement between GPUs and storage systems, enabling faster access to the datasets required for large-scale training and inference workloads.
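The core idea behind this style of data path is moving bytes from NVMe devices into GPU memory without staging through host RAM. As a rough illustration, here is a minimal sketch using kvikio, NVIDIA's Python bindings for the cuFile (GPUDirect Storage) API; the file path and buffer shape are placeholders, and the snippet assumes a CUDA machine with kvikio and CuPy installed. This is a sketch of the general technique, not a depiction of any specific vendor integration.

```python
"""Minimal sketch: GPUDirect-style read from NVMe into GPU memory.

Illustrative only. Uses kvikio, NVIDIA's Python bindings for cuFile;
the file path and array shape are placeholders.
"""
import cupy
import kvikio

# Destination buffer lives in GPU memory from the start.
batch = cupy.empty((1024, 1024), dtype=cupy.float32)

# kvikio falls back to a POSIX read path if GPUDirect Storage is not
# configured, so the same code runs (more slowly) without it.
f = kvikio.CuFile("/data/train/shard-000.bin", "r")
try:
    f.read(batch)  # DMA directly into `batch` when GDS is active
finally:
    f.close()

print("first element on GPU:", float(batch.ravel()[0]))
```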
Meanwhile, the growing importance of vector search and retrieval pipelines is driving collaboration between Nvidia and search platform provider Elastic N.V. Elastic has developed integrations designed to accelerate vector search indexing and query performance within Elasticsearch, helping organizations extract insights from increasingly large datasets used in AI applications.
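In practice, that retrieval workload is expressed in Elasticsearch as an approximate kNN query against a dense_vector field. The minimal sketch below uses the official Python client; the index name, vector dimension and embedding values are placeholders, and it assumes an Elasticsearch 8.x cluster running locally.

```python
"""Minimal sketch: kNN vector search in Elasticsearch 8.x.

Illustrative only. Index name, dimension and vectors are placeholders;
assumes the official `elasticsearch` Python client and a local cluster.
"""
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# A dense_vector field with an HNSW index enables approximate kNN search.
es.indices.create(
    index="docs",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 4,
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

es.index(
    index="docs",
    document={"title": "GPU networking primer", "embedding": [0.1, 0.9, 0.2, 0.4]},
    refresh=True,
)

# Approximate kNN: `num_candidates` trades recall for query latency.
resp = es.search(
    index="docs",
    knn={
        "field": "embedding",
        "query_vector": [0.1, 0.8, 0.3, 0.4],  # embedding of the user query
        "k": 3,
        "num_candidates": 50,
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```

The `num_candidates` knob is where indexing and query acceleration pay off: larger candidate pools improve recall but cost latency, which is why offloading that work matters at scale.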
As organizations see the AI stack expand in both prominence and use, concerns surrounding risk, transparency and governance are on the rise. Assurance practices are now central to the discussion, and enterprises are paying particular attention to service providers that can help, especially with deployments outside the public cloud.
“Rather than abandoning on-premises or colocation strategies in favor of hyperscale public cloud, distributed AI infrastructure could enable hybrid architectures that span owned facilities and partner data centers,” Nashawaty said.
As AI infrastructure expands across cloud, data center and edge environments, governance and risk management are becoming central considerations for enterprise deployments. Organizations must ensure that AI systems operate within regulatory, security and ethical boundaries while still delivering operational efficiency. That challenge is prompting many companies to explore new approaches to assurance, compliance and AI governance frameworks.
“Cyber resiliency has become the prerequisite to build any meaningful AI infrastructure and sits squarely at the confluence of data governance, data protection and AI,” said Christophe Bertrand, principal analyst at theCUBE Research. “A cyber resilient infrastructure is one of the foundations to AI you can trust.”
Professional services organizations are increasingly developing platforms designed to address those governance challenges. Ernst & Young Global Ltd., for example, has introduced the EY.ai Agentic Platform, which integrates domain expertise with Nvidia’s AI stack and reasoning models to help enterprises manage compliance and oversight requirements. The company has also introduced a portfolio of governance-focused tools under the EY.ai for Risk initiative, designed to help organizations strengthen internal controls and risk management processes as AI adoption accelerates.
As AI deployments expand beyond centralized data centers, many enterprises are also exploring edge-based architectures that bring inference capabilities closer to where data is generated. Edge infrastructure platform provider Zededa Inc. is working with Nvidia technologies, such as the TAO Toolkit and the Nvidia NGC catalog, to help organizations deploy and manage distributed AI workloads across large fleets of edge devices. These platforms enable enterprises to remotely deploy, update and orchestrate applications across multiple nodes — reducing operational overhead while supporting scalable AI deployments.
This distributed approach is contributing to the emergence of what some analysts describe as “mini AI factories” — interconnected clusters of computing resources operating closer to the edge of the network.
“AI infrastructure economics are now defined at the rack and factory level, not at the chip level,” Vellante said. “Nvidia’s advantage lies in designing systems where compute, memory, networking and software operate as a single, tightly coordinated machine. That is where throughput is maximized, token economics are transformed and the next phase of AI factory value is being created.”
These evolving architectures are likely to shape many of the discussions at Nvidia’s upcoming GTC event. As enterprises continue to expand their AI capabilities, the conference has become a key venue for examining how infrastructure, software and operational models are converging to support large-scale AI deployment.
“As amazing as Nvidia’s progress has been, I think observers continue to underestimate the potential of the company and its ecosystem,” Vellante added. “We’re seeing a massive shift in computing architectures take place in real time, powered by AI factories. GTC has become the most important conference in the tech industry and is a must-attend event to learn about what’s next.”
As the AI industry moves deeper into the production phase, the systems required to support that transformation are becoming more complex — and more integrated. GTC 2026 is expected to provide a window into how the next generation of AI infrastructure will be designed, deployed and scaled across enterprises worldwide.
(* Disclosure: TheCUBE is a media partner for Nvidia GTC. Sponsors of theCUBE’s coverage do not have editorial control over content on theCUBE or SiliconANGLE.)