UPDATED 16:12 EDT / MAY 12 2026

Stephen Watt, vice president and distinguished engineer at Red Hat Inc., talks to theCUBE about the horizontal cloud at Red Hat Summit 2026.

AI’s easy on-ramp has become a costly exit problem for enterprises, says Red Hat

As enterprises push AI beyond the pilot stage, the cost and complexity of running inference at scale are forcing a fundamental rethink of how infrastructure is designed, governed and sourced, putting the horizontal cloud — one shared foundation for running workloads across the enterprise — at the center of AI strategy.

The open hybrid cloud model is emerging as a practical answer to a market that has become dangerously dependent on a small number of frontier model providers, including Anthropic PBC and OpenAI Group PBC. The journey from frontier model convenience to self-managed, cost-efficient inference now sits at the center of enterprise AI strategy, according to Stephen Watt (pictured), vice president and distinguished engineer, Office of the CTO, at Red Hat Inc.

“You’d be crazy today not to start on a frontier model provider, like OpenAI or Anthropic, but then after a while, when you hit a certain scale — like in token economics — you’d be crazy to stay on that,” Watt said. “That’s the dilemma: When you want to leave, what are your options, and how do you navigate [that]?”

Watt spoke with theCUBE’s Rob Strechay and Rebecca Knight at Red Hat Summit 2026, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how inference routing, agentic AI governance and horizontal cloud architecture are reshaping enterprise AI deployments. (* Disclosure below.)

Horizontal cloud as the AI inference escape hatch

The pressure to move off expensive frontier models is fueling demand for a new class of shared, governed inference infrastructure. As Red Hat AI 3.4 extends model-as-a-service and distributed inferencing capabilities, the industry is converging on the idea that a horizontal cloud platform — one shared layer spanning storage, compute and management — can unlock both efficiency and control. One New Zealand Group Ltd.'s deployment of a horizontal telco cloud platform built on Red Hat OpenShift illustrates the stakes, Watt explained. The operator cut delivery time by 40% and reduced operational costs by 30% to 45%, collapsing processes that once took weeks or months into days. The key enabler was treating the platform as a shared foundation rather than a collection of isolated pilots.

“Every department’s doing their own experimentation, their own pilots, but everybody’s going about it a different way,” Watt said. “Everybody will emerge from the pilot phase, and there’ll be some shared observations. Once that’s done, central IT can … use that to figure out what platform we buy and drive total cost of ownership and increase efficiency.”

Red Hat’s vLLM Semantic Router project gives organizations a practical mechanism for navigating that transition. Rather than relying on a single monolithic model, the router directs inference requests to purpose-trained open-weight models — one tuned for physics, another for history, for example — based on the nature of each query, improving accuracy while lowering cost, Watt explained.

“You can basically ensure that the inference requests are always going to the highest-performing models,” he said. “That’s one of the core premises of open source — the ability to build a custom solution. We give you all the ingredients to make the best recipe.”
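The routing idea Watt describes can be sketched in a few lines. The toy example below is not the vLLM Semantic Router's actual mechanism — the real project classifies queries semantically, and the domain and model names here are invented for illustration — but it shows the basic shape: score an incoming query against per-domain profiles and dispatch it to the best-matching specialist model.

```python
# Toy sketch of query-based routing (hypothetical names throughout; the real
# vLLM Semantic Router uses semantic classification, not keyword matching).
DOMAIN_PROFILES = {
    "physics-model": {"energy", "quantum", "velocity", "particle", "force"},
    "history-model": {"empire", "revolution", "century", "treaty", "dynasty"},
    "general-model": set(),  # fallback when no specialist matches
}

def route(query: str) -> str:
    """Return the model whose domain profile best matches the query."""
    words = {w.strip(".,?!") for w in query.lower().split()}
    best_model, best_score = "general-model", 0
    for model, profile in DOMAIN_PROFILES.items():
        score = len(words & profile)
        if score > best_score:
            best_model, best_score = model, score
    return best_model
```

A query about particle velocity would land on the physics specialist, one about treaties on the history specialist, and anything unmatched on the general fallback — the same "highest-performing model per request" premise Watt describes, in miniature.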

Stay tuned for the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of Red Hat Summit 2026.

(* Disclosure: Red Hat sponsored this segment of theCUBE. Neither Red Hat nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE

A message from John Furrier, co-founder of SiliconANGLE:

Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.

  • 15M+ viewers of theCUBE videos, powering conversations across AI, cloud, cybersecurity and more
  • 11.4k+ theCUBE alumni — Connect with more than 11,400 tech and business leaders shaping the future through a unique trust-based network.

About SiliconANGLE Media
SiliconANGLE Media is a recognized leader in digital media innovation, uniting breakthrough technology, strategic insights and real-time audience engagement. As the parent company of SiliconANGLE, theCUBE Network, theCUBE Research, CUBE365, theCUBE AI and theCUBE SuperStudios — with flagship locations in Silicon Valley and the New York Stock Exchange — SiliconANGLE Media operates at the intersection of media, technology and AI.

Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.