The recent KubeCon + CloudNativeCon Europe event did not read like a celebration so much as an admission. AI is everywhere, but the systems underneath it are strained, not by the models themselves but by how everything is laid out. The data and machines that make AI are spread across clouds, edge sites and on-prem environments that never agreed on how to behave as one system. That sprawl is the core challenge driving the push for a Kubernetes control plane for AI.
New research shows that the majority of AI initiatives fail to reach production, with most breakdowns caused by integration and operational execution challenges rather than model performance.
Paul Nashawaty, principal analyst at theCUBE Research, describes the structural realities at the heart of the issue: “AI is exposing a fundamental flaw in enterprise infrastructure; it was never designed to operate as a unified system. What KubeCon EU makes clear is that fragmentation across cloud, edge and on-prem is now the primary barrier to production AI.”
That fragmentation now gets a name: sovereignty. Systems inherit borders drawn by policy, by enterprise structure and by geography. Those boundaries constrain where data and workloads can run, forcing AI systems to operate across distributed environments rather than a single unified stack.
“Maybe the finance BU has their Llama model; maybe the accounting BU has an OpenAI model,” said Mike Barrett, vice president and general manager of Red Hat Hybrid Platforms at Red Hat Inc., who spoke with theCUBE, SiliconANGLE Media’s livestreaming studio, during KubeCon EU. “What is the most cost-effective way to surface the intelligence that you want back? Because of that, [Red Hat’s enterprise customers] are looking for a horizontal platform.”
Red Hat, the poster child for Kubernetes in the enterprise, aims to rein in fragmentation with a Kubernetes control plane for AI workloads across all environments.
This feature is part of SiliconANGLE Media’s exploration of how enterprises are building the control plane for AI, with Red Hat playing a central role in shaping that approach. (* Disclosure below.)
Kubernetes was never designed for AI inference. It schedules containers. It does not guarantee consistency across regions. That gap becomes visible when inference workloads move into production.
“These models are doing an amount of compute that’s hard to fathom, but when I talk to users of llm-d [an open-source, Red Hat-led Kubernetes-native inference framework hosted by CNCF, designed to scale distributed LLM workloads across clusters], they’re not only trying to build a state-of-the-art performance system, they’re also trying to do these day-two operations,” said Robert Shaw (pictured, left), director of engineering at Red Hat.
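To see where that line falls, here is a minimal, illustrative sketch of how an inference workload reaches a cluster today, written with the official Kubernetes Python client rather than llm-d itself; the image name, model path and server flags are placeholders, not any real project's API.

```python
# A hedged, generic sketch (not llm-d's actual API): scheduling an LLM
# inference server on Kubernetes with the official Python client.
# The image, model path and flags below are placeholders.
from kubernetes import client, config

def build_inference_deployment() -> client.V1Deployment:
    container = client.V1Container(
        name="llm-server",
        image="example.registry/llm-server:latest",   # hypothetical image
        args=["--model", "/models/example-llm"],       # hypothetical flags
        resources=client.V1ResourceRequirements(
            # Kubernetes finds a node with a free GPU and keeps replicas
            # running, but nothing here manages latency targets, routing
            # or cross-region consistency.
            requests={"cpu": "4", "memory": "32Gi"},
            limits={"nvidia.com/gpu": "1", "memory": "32Gi"},
        ),
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "llm-server"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    return client.V1Deployment(
        metadata=client.V1ObjectMeta(name="llm-server"),
        spec=client.V1DeploymentSpec(
            replicas=2,
            selector=client.V1LabelSelector(match_labels={"app": "llm-server"}),
            template=template,
        ),
    )

if __name__ == "__main__":
    config.load_kube_config()  # assumes a local kubeconfig pointing at a cluster
    client.AppsV1Api().create_namespaced_deployment(
        namespace="default", body=build_inference_deployment()
    )
```

Everything in that spec is about placement and replica count. The behaviors that matter in production inference, such as request routing, resource sharing and latency objectives, live above it, which is the layer frameworks such as llm-d aim to address.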
That “day-two” problem is where AI systems often break — not in training, but in runtime behavior, latency swings, resource contention and policy drift. Red Hat AI Enterprise seeks to operationalize and accelerate agentic AI and production inference with a unified, “metal-to-agent” solution, according to Jan Melen, governing board vice chair at the Cloud Native Computing Foundation, which hosts Kubernetes and runs the KubeCon + CloudNativeCon conferences.
“Cloud-native exists because of [a] global open-source collaboration model,” Melen said during the KubeCon EU keynote. “Thousands of contributors from every region building shared infrastructure together.”
The implication is not subtle. AI is pushing systems built on global consistency into environments defined by fragmentation.
“Agentic AI isn’t a model problem — it’s a platform architecture problem,” said Rob Strechay, principal analyst at theCUBE Research. “The enterprises that win won’t pick better models; they’ll build better infrastructure to run them.”
Kubernetes becomes less about orchestration and more about enforcing behavioral consistency across fractured environments, Strechay pointed out.
While Kubernetes can unify control, it can’t assume every team can operate that control directly. Enterprise adoption collapses when complexity is exposed raw.
“What we realized is that AI is being developed by data scientists, and as part of that, they’re building their own infrastructure to run it on,” said Brian Stevens (pictured, right), senior VP and chief technology officer for AI at Red Hat.
That gap between builders and operators is where platform engineering enters, according to Strechay.
“Fragmented tooling, skill gaps and operational complexity are becoming the real bottlenecks, driving a shift toward platform engineering and Kubernetes as a unifying control plane,” he explained.
The system stabilizes only when Kubernetes stops being exposed directly and becomes mediated through platforms that reduce friction. Red Hat OpenShift AI sits in that role, abstracting operational complexity into repeatable patterns, with model training, deployment, serving and inference for hybrid environments.
Enterprises do not modernize everything at once. Billing systems and databases tend to stay where they are. Basically, risk keeps legacy systems alive.
Research shows that 84% of IT decision-makers report difficulty managing separate VM and container environments, with siloed tools and fragmented operations driving inefficiency across hybrid infrastructure. If those VMs stay outside Kubernetes, the system stays split. But what if virtualization is brought into Kubernetes?
“We think virtualization and containers should not live in silos; they should be on one platform — and KubeVirt makes that happen,” said Daniel Messer, senior manager of product management at Red Hat.
KubeVirt, a project that recently graduated within the CNCF, extends Kubernetes into virtualization, allowing VMs and containers to share the same control plane.
“Graduating for us makes it more obvious for people that [KubeVirt] is deeply embedded in the Kubernetes ecosystem [and] the CNCF ecosystem,” added Andrew Burden, KubeVirt maintainer.
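As a rough illustration of what a single control plane means in practice, the sketch below, which assumes the official Kubernetes Python client, a default namespace and a cluster with KubeVirt installed, lists pods and KubeVirt VirtualMachine custom resources through the same API server.

```python
# Illustrative sketch: containers (pods) and KubeVirt virtual machines
# queried through the same Kubernetes API server, i.e. one control plane.
# Assumes KubeVirt is installed and a kubeconfig is available locally.
from kubernetes import client, config

config.load_kube_config()

core = client.CoreV1Api()
custom = client.CustomObjectsApi()

# Pods: the workload type Kubernetes has always scheduled.
for pod in core.list_namespaced_pod(namespace="default").items:
    print("pod:", pod.metadata.name, pod.status.phase)

# VirtualMachines: the custom resource KubeVirt adds under the
# kubevirt.io API group, served by the same API server.
vms = custom.list_namespaced_custom_object(
    group="kubevirt.io", version="v1",
    namespace="default", plural="virtualmachines",
)
for vm in vms.get("items", []):
    print("vm:", vm["metadata"]["name"],
          vm.get("status", {}).get("printableStatus", "unknown"))
```

The operational consequence is that the same RBAC rules, quotas and observability tooling that govern containers can be pointed at VMs without standing up a second stack.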
The direction is consolidation of operational surfaces, not elimination of legacy systems.
Sovereign AI often looks like a solution, but it also imposes constraints. Laws block data from moving across borders. Policy blocks centralization. Enterprises split workloads across clouds, on-prem and edge environments whether their architecture is ready or not, according to Gabriele Bartolini of EnterpriseDB, who reframed the underlying principle in a recent interview with The New Stack.
“True sovereignty starts with the database,” Bartolini said. “If your PostgreSQL isn’t portable across environments, you don’t really control your stack.”
And he warns against assuming managed convenience equals control: “Convenience is the cloud’s biggest shortcut, but convenience isn’t sovereignty. Real control means you can move your database anywhere and it behaves the same.”
Melen’s keynote draws a hard line inside the sovereignty debate: “We should separate code sovereignty from deployment sovereignty. The code itself remains global commons, shared, open, collaboratively developed.”
Deployment is where sovereignty bites. That is where law and policy decide where workloads can actually run, and under what conditions. The split is what Kubernetes tries to operationalize: global code, local execution.
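A small, hedged sketch of that split, using only the standard Kubernetes node label for region: the container image, the global code, stays identical everywhere, while a node selector confines where the scheduler may place it. The image and region names below are placeholders.

```python
# Illustrative sketch of "global code, local execution": one pod spec,
# with placement restricted by region through a node selector.
# topology.kubernetes.io/region is the well-known Kubernetes node label;
# the image and region values are placeholders.
from kubernetes import client

def region_pinned_pod(region: str) -> client.V1Pod:
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name=f"agent-{region}"),
        spec=client.V1PodSpec(
            # The image is the same in every environment.
            containers=[client.V1Container(
                name="agent",
                image="example.registry/agent:latest",  # hypothetical image
            )],
            # Deployment sovereignty: only nodes labelled with the permitted
            # region are eligible to run this pod.
            node_selector={"topology.kubernetes.io/region": region},
        ),
    )

eu_pod = region_pinned_pod("eu-west-1")  # hypothetical region name
```

The code stays portable; the placement rule is where law and policy get encoded.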
No vendor can cover AI infrastructure alone. A Kubernetes control plane for AI only works if it spans systems instead of replacing them. That puts the burden on the ecosystem — the shared standards, APIs and upstream projects that let different tools operate as one system.
Nashawaty points to Red Hat’s role inside that upstream layer: “Red Hat’s influence extends well beyond its commercial platform. The company has long been one of the most active contributors to the Cloud Native Computing Foundation ecosystem.”
That upstream work is not cosmetic. It is what keeps Kubernetes consistent across vendors. Without it, every distribution drifts and the control plane fractures into competing implementations.
Beyond its upstream contributions, Red Hat is also partnering with other companies to deliver scalable enterprise AI. Notably, Red Hat AI Factory with Nvidia focuses on building, deploying and scaling AI infrastructure, pairing Red Hat OpenShift with Nvidia accelerated computing for high-performance AI workloads.
“When as many as 75% of enterprises report double-digit AI failure rates tied to fragmented systems, it’s clear the bottleneck has shifted to infrastructure,” said Nashawaty, underscoring the cost when upstream infrastructure coordination breaks down, especially when it comes to AI.
That failure rate is not about missing features. It reflects systems that cannot operate together. Ecosystems prevent Kubernetes from collapsing into another silo — the exact outcome it is supposed to avoid.
AI does not collapse infrastructure in one place. It stresses every seam at once. Kubernetes becomes the layer that attempts to hold those seams together.
Stevens described the shift toward consolidating fragmented systems onto a single platform: “It’s a very powerful concept to reapply that — and also consolidate it on the same platform with fewer vendors and less attack surface for changes in different learning curves, which I think has been the power of Kubernetes all along.”
That consolidation only works if the ecosystem holds. Melen underscores what happens if it doesn’t: “If sovereignty leads to fragmentation, we risk undermining the trillions of dollars of value that open source has already brought globally.”
The system does not become simpler, but it becomes governable. Kubernetes isn’t the tool of choice because it is perfect. The industry chooses it because fragmentation leaves no alternative coordination layer that scales.
Red Hat’s bet is that abstraction through Kubernetes is the only viable way to keep AI operational across disparate worlds.
(* Disclosure: TheCUBE is a paid media partner for the KubeCon + CloudNativeCon NA event. Neither Red Hat Inc., the primary sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)