In just a few years, computing has undergone a massive shift. What was once a marketplace dominated by general‑purpose servers and monolithic data centers has fractured into a complex ecosystem of specialized accelerators, hyper‑scaled clusters, edge‑enabled devices, large‑scale cloud providers and sovereign‑cloud platforms.
At the center of this transformation stand two sides of the market: 1) Nvidia Corp., the incumbent kingpin of merchant graphics processing units, and 2) everyone else, the established semiconductor and infrastructure players.
I’m often asked who will win: Nvidia or everyone else? The answer is both. The demand for AI applications is fueling an infrastructure renaissance not seen since the 1990s, a 100x value enablement that weaves together competitive narratives and sets the agenda for Dell Technologies World this week in Las Vegas.
Let’s explore the trends powering the “AI factory era,” referring to the large‑scale systems that support training, reasoning and inference at unprecedented scale. I’ll break down several areas, each critical in this supercycle of transformation.
AI workloads — whether foundation‑scale training or real‑time reasoning — are hungry beasts, devouring FLOPS, memory bandwidth, network interconnects and rack‑scale power. Nvidia’s GPUs have fed that hunger for years, carving out over 90% share of AI system shipments. Their innovative GPUs, the CUDA ecosystem and smart software libraries have made them the default choice for Amazon Web Services Inc., Google LLC, Microsoft Corp. and Meta Platforms Inc. as they built ever‑more ambitious AI factories.
But this dominance carries risk. Hyperscalers routinely face “GPU rationing” — quarterly quotas, long lead times and price volatility. They’re at the mercy of shifting U.S. export controls and potential Chinese counter‑measures. For companies spending tens of billions on AI capex, the ability to “own their destiny” is a strategic imperative. Many, like AWS, are building — or contemplating — custom chips and system software.
Others are rallying around XPUs: custom AI accelerators co‑designed with hyperscaler partners to speed critical kernels. These ASICs sit alongside Nvidia GPUs in standard PCIe slots, enabling mixed clusters that deliver comparable throughput at 20% to 30% lower capex. More importantly, they provide a credible second source of compute, giving buyers the bargaining power to secure GPU allocations on better terms and timelines. As one industry executive said, “It’s not just about saving money — it’s about controlling the future of your AI infrastructure.”
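A rough sketch of that capex arithmetic, with hypothetical per-card prices and throughput figures (none of these numbers come from the vendors), shows how even a partial XPU offload moves the bill of materials:

```python
# Back-of-envelope capex comparison for an all-GPU vs. a mixed GPU/XPU
# cluster. Every price and throughput figure is a hypothetical placeholder
# chosen only to illustrate the claimed 20%-30% savings, not vendor data.

TARGET_THROUGHPUT = 1_000_000  # abstract inference units per second

GPU_PRICE, GPU_THROUGHPUT = 30_000, 1_000  # $/card, units/sec (assumed)
XPU_PRICE, XPU_THROUGHPUT = 15_000, 900    # $/card, units/sec (assumed)

def capex(gpu_share: float) -> float:
    """Capex to hit TARGET_THROUGHPUT with gpu_share of the work on GPUs."""
    gpus = (TARGET_THROUGHPUT * gpu_share) / GPU_THROUGHPUT
    xpus = (TARGET_THROUGHPUT * (1 - gpu_share)) / XPU_THROUGHPUT
    return gpus * GPU_PRICE + xpus * XPU_PRICE

all_gpu = capex(1.0)   # $30.0M at these assumed prices
mixed = capex(0.5)     # half the kernels offloaded to XPUs: ~$23.3M
print(f"savings: {(1 - mixed / all_gpu):.0%}")  # ~22%, inside the quoted range
```

Even a 50/50 offload lands inside the quoted savings band under these assumptions, and the second source shows up in negotiating leverage as much as in the invoice.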
The rise of XPUs amplifies a deeper debate. Should AI factories be built on open, composable architectures or closed, tightly integrated stacks?
Today’s AI landscape swings between these extremes. Hyperscalers value open composability despite the integration burden, while enterprises crave turnkey AI solutions but worry about vendor lock‑in. Proofs of concept are bleeding into production, yet startups, starved for clarity on where to add value, still struggle to articulate services in a half‑open, half‑closed world.
The vision of an AI factory extends beyond GPUs or XPUs. It spans training, reasoning, and inference — each with unique workload profiles. It demands networking and storage architectures tailored to massive clusters and hardware‑aware orchestration software that places workloads optimally.
The rise of multistep reasoning — on‑the‑fly decision branches and chain‑of‑thought processing — changes everything. A 10 million‑token Q&A can balloon to 100 million tokens once reasoning is enabled, multiplying network traffic and GPU cycles tenfold. Clusters must rethink interconnect design, buffer sizing and bursting performance to support these peaks.
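A quick back-of-envelope using that 10-million-to-100-million-token example shows the scale of the spike; the per-token byte and FLOP figures below are illustrative assumptions, not measurements:

```python
# Token amplification from multistep reasoning. The 10x multiplier comes
# from the 10M -> 100M token example above; per-token interconnect bytes
# and FLOPs are assumptions for illustration only.

BASE_TOKENS = 10_000_000       # plain Q&A workload
REASONING_MULTIPLIER = 10      # chain-of-thought expansion

BYTES_PER_TOKEN = 1_000_000    # assumed ~1 MB inter-GPU traffic per token
FLOPS_PER_TOKEN = 2e9          # assumed ~2 GFLOPs per token for a large model

reasoning_tokens = BASE_TOKENS * REASONING_MULTIPLIER
extra_tokens = reasoning_tokens - BASE_TOKENS

print(f"tokens: {BASE_TOKENS:,} -> {reasoning_tokens:,}")
print(f"added interconnect traffic: ~{extra_tokens * BYTES_PER_TOKEN / 1e12:.0f} TB")
print(f"added compute: ~{extra_tokens * FLOPS_PER_TOKEN / 1e15:.0f} PFLOPs")
```

Those bursts arrive unevenly, which is why buffer sizing and interconnect headroom matter as much as raw FLOPS.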
Inference workloads range from high‑throughput, low‑latency cloud requests to resource‑constrained real‑time edge deployments. Use cases — agentic chatbots, generative recommendation engines, digital twins — often demand sub‑10-millisecond response times. Network operators envision AI‑powered network slices in cell towers, while embedded devices — from autonomous drones to industrial robots — require specialized ASICs and chip‑plus‑FPGA hybrids.
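To see why sub-10-millisecond targets force inference out of distant cloud regions, consider an illustrative latency budget; every component figure here is an assumption for the sake of the example:

```python
# Illustrative latency budget for a sub-10 ms edge inference request.
# All component figures are assumptions, not measurements.

SLO_MS = 10.0

budget_ms = {
    "radio/access network": 2.0,  # device to cell site (assumed)
    "queueing and batching": 1.5,
    "model execution": 4.0,       # small distilled model on an edge ASIC
    "post-processing": 1.0,
    "return path": 1.5,
}

total = sum(budget_ms.values())
status = "meets" if total <= SLO_MS else "misses"
print(f"total: {total:.1f} ms vs. SLO {SLO_MS:.0f} ms -> {status} the target")
# A round trip to a distant cloud region alone often costs 30-60 ms,
# which would blow this budget before the model even runs.
```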
The optimal AI factory spans central cloud clusters for training, regional on‑premises clusters for inference, and edge micro‑clusters for low‑latency tasks. Hardware configurations — GPU/XPU ratios, NIC speeds, NVMe topologies — must adapt fluidly to workload demands.
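One way to picture that fluidity is a set of tier profiles an orchestrator could match workloads against. The sketch below is hypothetical; its field names and values are mine, not any vendor’s schema:

```python
# Hypothetical tier profiles for a hybrid AI factory. Names and values
# are illustrative, not a real orchestration schema.

from dataclasses import dataclass

@dataclass(frozen=True)
class TierProfile:
    name: str
    gpu_xpu_ratio: str   # accelerator mix (GPUs : XPUs)
    nic_gbps: int        # per-node network speed
    nvme_topology: str   # local vs. disaggregated flash
    workloads: tuple     # workload classes the tier serves

TIERS = (
    TierProfile("central cloud", "1:0", 400, "disaggregated", ("training",)),
    TierProfile("regional on-prem", "1:1", 200, "local RAID",
                ("batch inference", "fine-tuning")),
    TierProfile("edge micro-cluster", "0:1", 25, "local NVMe",
                ("real-time inference",)),
)

def place(workload: str) -> str:
    """Return the first tier whose profile serves this workload class."""
    for tier in TIERS:
        if workload in tier.workloads:
            return tier.name
    return "central cloud"  # conservative fallback

print(place("real-time inference"))  # -> edge micro-cluster
```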
A broad mix of systems and technologies is essential, covering general‑purpose AI deployments and highly specialized vertical workloads. Although enterprise AI budgets have stalled, they’re beginning to rebound. A breakout is expected in 2026 as organizations seek efficiency and ROI. Meanwhile, industrial and automotive sectors highlight supply‑chain risks and uncertainties that diversified portfolios mitigate.
Vendors with broad portfolios of chips, software and services are better insulated against downturns. No vendor will win on a single SKU; diversification is key.
Enterprises, which have been slower and sometimes hesitant to invest in AI at scale, are quietly ramping up budgets. By 2026, corporate AI capex is expected to accelerate sharply.
Early examples include financial services piloting real‑time credit scoring clusters, manufacturers embedding AI‑driven defect detection, and retailers personalizing shopping experiences with on‑prem inference.
At Mobile World Congress 2025 in Barcelona, conversations on theCUBE with more than 30 enterprises and operators revealed a consensus: Sovereign‑cloud AI is real and urgent. National regulations mandate that sensitive data and model execution stay within domestic borders. Operators want turnkey, on‑country AI clouds that rival AWS, Azure and Google Cloud but remain under local control. They can deploy inference nodes within regulated borders, integrate them into national 5G cores, and offer AI‑as‑a‑service without offloading data to foreign clouds.
Edge AI embeds intelligence in the physical world — smart cameras that detect safety hazards, autonomous logistics robots and digital twins monitoring factory floors. New AI‑enabled chipsets will power these devices, blurring operational technology and information technology boundaries.
By 2026, tens of millions of edge nodes will form a distributed AI environment that complements centralized cloud systems. Physical AI and robotics will become major vectors in AI infrastructure.
Every month brings major leaps in algorithmic efficiency, yet those gains fuel demand through new reasoning workloads. A traditional Q&A might consume 10 million tokens; multistep chain‑of‑thought reasoning blasts that to 100 million. Real‑time agents consulting knowledge graphs add more compute layers, driving ever‑larger clusters.
This self‑reinforcing cycle cements AI factories’ centrality in computing and rewards vendors with unified, end‑to‑end platforms.
Vendors that master AI factory solutions stand to reap windfalls. Hyperscalers committed over $300 billion in capex for 2025, two‑thirds of which (roughly $200 billion) will support AI compute and networking. Enterprises, now guided by chief AI officers, are poised to unlock new capex waves for AI‑native infrastructure.
No discussion of global AI infrastructure is complete without U.S.‑China tensions. In April 2025, proposed semiconductor tariffs threatened 10% to 25% duties on key components. Manufacturers are shifting export/import duties onto customers and relocating footprints from China and Vietnam to Mexico and Canada. Agile logistics — turning geopolitical risk into a competitive moat — will cement customer trust in an uncertain world.
As we enter the heart of the AI factory era, several innovation curves demand attention.
We stand at the dawn of a new computing era. AI factories will drive the next wave of productivity, automation, and innovation across every industry. Established and emerging players are racing to fill niches in the AI factory supply chain.
For enterprises and hyperscalers alike, the message is clear: The accelerator, network, cloud and supply‑chain choices you make today will determine your competitive posture for the decade to come. The AI factory era starts now.