UPDATED 11:00 EST / DECEMBER 02 2025

AI

Exclusive: AWS CEO Matt Garman declares a new era: Agents are the new cloud

After 13 years covering Amazon Web Services Inc. up close — including watching more than a decade of reinvention from the front row — I’ve learned to feel when the ground is shifting. And this year, that feeling is unmistakable.

AWS re:Invent 2025 opens with a charge in the air as more than 60,000 attendees descend on Las Vegas — developers, founders, Fortune 100 chief information officers, national security leaders and hyperscale cloud architects. But as the headlines break, most people will still be left wondering what’s really moving under the surface.

That deeper story came through in my exclusive recent conversations in Seattle with AWS Chief Executive Matt Garman, Chief Marketing Officer Julie White and other senior AWS leaders. Combined with SiliconANGLE and theCUBE reporting and data from our wired-in community insiders, a clearer picture emerges: AWS is declaring the true arrival of agentic artificial intelligence — and rebuilding the cloud from the silicon up.

The agent era begins

While most of the world remains fixated on generative AI, AWS says it is already moving past it toward practical, enterprise-oriented deployment — not buzzwords. Garman was blunt: “The next 80% to 90% of enterprise AI value will come from agents,” he told me.

These aren’t chatbots or copilots. They are autonomous, long-running, massively scalable digital workers — agents that operate for hours or days, learn organizational preferences and collaborate in swarms.

This vision is anchored in a full-stack architecture: custom silicon, new model families, sovereign AI factories and an agent runtime built to eliminate the heavy lifting that has slowed enterprise adoption. AWS believes the agent era will be every bit as transformative as the cloud era was in 2006 — and it’s engineering for that scale today.

AI Factories: The new sovereign footprint

One of the clearest signals of this shift is AWS’ formal embrace of AI Factories, its new sovereign-scale infrastructure footprint — a concept I’ve been talking about on theCUBE Pod for over a year. In those conversations, I argued that AI would evolve beyond traditional cloud regions and edge stacks into campus-scale, high-density AI systems purpose-built to turn enterprise data estates into continuous intelligence engines. Garman and AWS essentially validated that view, describing AI Factories as “highly opinionated AWS-managed AI systems” deployed directly inside customer data centers.

These are not edge appliances or Outposts-style racks. They are full-blown AI campuses — the same architectural pattern behind Project Rainier, the 500,000-Trainium2 build with Anthropic. As Garman put it: “The campus is the new computer.”

For a select group of customers — sovereign nations, defense agencies and hyperscale enterprises — AI Factories deliver cloud-consistent services entirely on their own turf. Everyone else consumes the same architecture from AWS regions. It’s exactly the industrialization trend I’ve been forecasting: AI isn’t just a service anymore — it’s an infrastructure category, and AWS is now manufacturing it at global scale.

Who actually buys factories

Garman is also candid about the addressable market for these massive builds. As he told me, “99.999% of customers will never purchase an AI factory.” In my conversations with Fortune 100 CIOs and chief technology officers over the past year, this has been the recurring theme: Enterprises aren’t struggling because they lack giant infrastructure — they’re struggling because they lack practical, production-ready paths to adopt AI at scale. The bottlenecks have been governance, identity, security, data quality, model drift and the sheer operational burden of stitching together immature tools. I’ve said repeatedly on theCUBE that the enterprise AI stall wasn’t about GPU scarcity — it was about plumbing scarcity.

Garman’s point reinforces that: The vast majority of enterprises don’t need sovereign-scale clusters; what they need is cloud-delivered factory patterns that abstract away complexity and let them plug into hardened infrastructure without reinventing it. Only a very small cohort — sovereign nations, U.S. government and sensitive agencies, and a handful of the largest global enterprises — requires fully dedicated on-premises AI Factories. For everyone else, the “factory” shows up through regional AWS services, Trainium-powered clusters, Bedrock, Nova Forge and AgentCore. And that’s exactly what enterprises have been asking for: the ability to access industrial-grade AI without industrial-grade buildouts.

Silicon as strategy: Trainium’s expansion

At the silicon layer, AWS has tightened its vertical integration. Trainium3 is now generally available, packaged as a two-rack “Ultra Server” that Garman called “the most powerful AI system available today.” It’s optimized both for training and for the increasingly heavy inference loads enterprises are putting into production.

Garman was matter-of-fact about the impact: More than 50% of Bedrock tokens already run on Trainium — a meaningful performance and cost moat for AWS.

Then came the next reveal: Trainium4, preannounced with an expected eightfold compute increase over Trainium3 and significantly higher memory bandwidth. Combined with 3.8 gigawatts of new data-center power added over the past year, it signals that AWS intends to dominate the cost-performance race for frontier inference.

Why centralization wins — and what the edge still does

On the edge-versus-cloud debate, Garman’s view is decisive: Heavy intelligence centralizes to the cloud.

The edge is compute- and power-constrained and best suited for lightweight inference (think wake-word detection on Alexa). The data breadth, model variety and power required for real capability live in the cloud. The emerging pattern, he says, looks like an application programming interface call to smarter, larger systems, not full frontier models at the edge.

The Nova model family and the push to frontier performance

Silicon is only half the story. AWS also unveiled the Nova 2 model family — Lite, Pro and higher-performing variants — covering high-volume reasoning, real-time speech-to-speech and other demanding workloads. Early benchmarks, according to Garman, place them head-to-head with Claude 3.5, GPT-4.5 and Gemini Flash.

Nova Forge: The new training substrate

The deeper breakthrough is Nova Forge, a system Garman described as the first true open-training pipeline for enterprise frontier models.

Fine-tuning has been the ceiling for most enterprises. Forge blows past that limit by letting companies inject proprietary data early in the training process — at checkpoints inside Nova — performing co-training alongside Amazon’s curated datasets.
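To make the co-training idea concrete, here is a toy sketch of one piece of it — interleaving proprietary records with curated ones at a fixed ratio when training resumes from a mid-training checkpoint. The function and dataset names are invented for illustration; this is not the Nova Forge API.

```python
# Hypothetical data-mixing step for continued pretraining from a
# checkpoint: each batch blends proprietary and curated examples ~1:3.
from itertools import cycle, islice

def mixed_batches(proprietary, curated, prop_per_batch=1, cur_per_batch=3,
                  num_batches=4):
    """Yield batches mixing proprietary and curated examples."""
    p, c = cycle(proprietary), cycle(curated)
    for _ in range(num_batches):
        yield list(islice(p, prop_per_batch)) + list(islice(c, cur_per_batch))

proprietary = ["acme:ticket-1", "acme:ticket-2"]   # customer's own data
curated = ["web:doc-a", "web:doc-b", "web:doc-c"]  # provider's curated corpus

for batch in mixed_batches(proprietary, curated):
    # In a real pipeline, each batch would feed a training step applied to
    # a model restored from a partially trained checkpoint.
    pass
```

The point of the sketch is the ratio: proprietary data participates in training itself, rather than being bolted on afterward through fine-tuning.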

Private domain-specific AI for business

Customers insert their proprietary data, train within their own VPC, and can run the resulting model serverlessly on Bedrock — without the model or customer data flowing back to the Nova team. The output is a private, frontier-grade model that deeply understands the company’s data and never leaves their boundary.

Garman’s blunt view: “Generic tokens are useless unless they know your business.”

Domain intelligence at scale: Reddit’s example

Reddit is already demonstrating the impact. By bringing its own domain data into Nova Forge’s pretraining process — without additional fine-tuning — Reddit achieved a kind of “social intuition” generic systems miss: It reads context, reduces false positives, flags real threats and scales to millions of communities without scaling engineering complexity.

Banks, pharma giants and large manufacturers are lining up for the same capability. The economics are compelling: Instead of spending $50 million to $100 million to train a frontier model from scratch, enterprises can create a domain-specific frontier model for a small fraction of that — and even distill smaller models from it. In short, Forge delivers frontier capability without frontier cost.

AgentCore: The runtime for the agent era

If Nova Forge is the new model substrate, AgentCore is the new runtime. White explained that it solves the biggest enterprise blocker of the past year: Teams were spending months reinventing repetitive foundational systems — identity, policy, security, memory, observability, drift detection — just to make early agents safe and deployable.

AgentCore is composable. Teams can mix and match secure compute, memory and Agent Observability, and pair them with models from Nova, Anthropic, OpenAI, Meta Platforms’ Llama, Qwen or Google Gemini — or open source — without re-platforming. It’s the identity-and-access-management moment for agents, moving from prototypes to production in regulated workflows and mission-critical operations.
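What “composable” means in practice can be sketched with plain Python — identity, memory and observability as independent components that pair with any model backend. The class and method names below are illustrative inventions, not the AgentCore SDK.

```python
# Hypothetical sketch of a composable agent runtime: memory and
# observability are swappable parts, and any model callable plugs in
# without re-platforming the rest. Not a real AWS API.
from dataclasses import dataclass, field

@dataclass
class Memory:
    notes: list[str] = field(default_factory=list)
    def remember(self, fact: str): self.notes.append(fact)

@dataclass
class Observability:
    events: list[str] = field(default_factory=list)
    def log(self, event: str): self.events.append(event)

@dataclass
class Agent:
    model: callable        # any backend: Nova, Claude, Llama, Qwen, ...
    memory: Memory
    obs: Observability
    def run(self, task: str) -> str:
        self.obs.log(f"task: {task}")            # every step is observable
        answer = self.model(task, self.memory.notes)
        self.memory.remember(f"done: {task}")    # state persists across runs
        return answer

def toy_model(task, context):
    return f"handled {task} with {len(context)} prior notes"

agent = Agent(model=toy_model, memory=Memory(), obs=Observability())
print(agent.run("triage incident"))  # handled triage incident with 0 prior notes
```

Swapping `toy_model` for a different backend changes nothing else in the agent — which is the mix-and-match property the platform pitch describes.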

Kiro: Where agentic development takes shape

AWS’ Kiro — its integrated development environment and command-line interface — helped pioneer agentic development: not just code completion or “vibe coding,” but directing agents to perform long-running tasks and collaborate as a swarm. At re:Invent, AWS introduced the Kiro Frontier Agent — a long-running agent that works like a team of junior engineers operating overnight, with instructions and autonomy, and the ability to scale out.

Frontier agents: Work that learns you — across teams

With substrate and runtime in place, AWS introduced Frontier Agents — long-running, autonomous agents that can work for hours or weeks, scale both up and out, and learn team preferences over time.

Beyond engineering, the first wave includes cloud operations, security and penetration-testing agents — systems that triage incidents, probe defenses and enforce policy in production.

“Three to six months in,” Garman told me, “these agents behave like part of your team. They know your naming conventions, your repos, your patterns.” The constraint is no longer developer hours — it’s imagination and prioritization.

The AWS full AI stack

Stepping back, the pieces form AWS’ most coherent AI platform strategy yet. The flow works like this:

Infrastructure → silicon → models → custom frontier training → agent runtime → frontier agents → enterprise workflows

This is not a bundle of features. It is a systematic rearchitecture of the cloud for a world where billions of agents operate across industries.

AWS is leaning heavily into capital spending: silicon, power, networking, sovereign footprints and global supply chain scale. This is a multiyear expansion cycle, not a bubble.

ROI, not a bubble

If you cover this industry long enough, you learn that bubbles form when capital chases stories, not systems. What’s happening now is the opposite. The spend is going into power, silicon, networking, intelligent interconnects, sovereign large-scale computing footprints, fast-emerging smart AI edge factories and full-stack platforms that unlock measurable productivity. This massive innovation wave is rooted in infrastructure, not narratives, and the companies leaning in understand that AI is no longer optional — it’s a societal revolution that can yield competitive advantage with compounding returns.

On talk of an “AI bubble,” Garman shrugs. Customers buy where ROI shows up in the P&L. With 3.8 gigawatts of new data-center power landed in the past year and demand “across the stack,” he’s betting enterprise value — not hype — sustains the buildout.

From 2006 to 2025: The next paradigm

In 2006, AWS changed the world by abstracting servers and letting developers build without friction. Today’s shift is analogous — only larger. 

The cloud era abstracted infrastructure. 

The agent era abstracts work.

With AI Factories, Trainium, Nova Forge, AgentCore, Kiro and Frontier Agents, AWS is rebuilding the cloud around agentic systems that can learn, reason and act. While other outlets will chase this week’s headlines, our reporting makes one thing clear: AWS is not chasing the AI trend. It is industrializing it.

For enterprises, startups and global infrastructure builders, this marks the beginning of a new paradigm — where every company can have its own frontier model, its own fleet of agents and its own AI-powered future.

Photo: Robert Hof/SiliconANGLE

A message from John Furrier, co-founder of SiliconANGLE:

Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.

  • 15M+ viewers of theCUBE videos, powering conversations across AI, cloud, cybersecurity and more
  • 11.4k+ theCUBE alumni — connect with more than 11,400 tech and business leaders shaping the future through a unique trust-based network.
About SiliconANGLE Media
SiliconANGLE Media is a recognized leader in digital media innovation, uniting breakthrough technology, strategic insights and real-time audience engagement. As the parent company of SiliconANGLE, theCUBE Network, theCUBE Research, CUBE365, theCUBE AI and theCUBE SuperStudios — with flagship locations in Silicon Valley and the New York Stock Exchange — SiliconANGLE Media operates at the intersection of media, technology and AI.

Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.