UPDATED 16:17 EDT / OCTOBER 29 2025

Kubernetes optimization drives Red Hat’s next leap in AI performance

Kubernetes optimization is shaping the next wave of artificial intelligence infrastructure, bridging the gap between large-scale intelligence and production-grade reliability.

Enterprises are moving past AI experimentation toward scalable deployment, and Kubernetes has emerged as the foundation enabling that shift. As new workloads such as reasoning and agentic applications push performance limits, Red Hat Inc.’s open-source initiatives show how Kubernetes optimization is evolving to meet the demands of compute-intensive inference. The growing synergy between AI frameworks and container orchestration sets the stage for a new phase of innovation across the cloud-native ecosystem, according to Robert Shaw (pictured, right), director of engineering at Red Hat.

Red Hat’s Stu Miniman and Robert Shaw discuss Kubernetes optimization across open-source and hybrid environments.

“Almost all of the deployments of LLMs are coming on top of Kubernetes,” Shaw said. “It really sets the picture for what Kubernetes is best at — these long-lived services, production-quality applications and all the reliability and scalability built for running other workloads.”

Shaw and Stu Miniman (left), senior director of market insights, hybrid platforms, at Red Hat, spoke with theCUBE’s Rob Strechay in a preview for the KubeCon + CloudNativeCon NA event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how Red Hat is advancing Kubernetes optimization to enhance AI inference performance, scalability and efficiency across open-source and hybrid environments. (* Disclosure below.)

Building the future of AI through Kubernetes optimization

As open-weight models proliferate, the need to map them efficiently onto diverse hardware accelerators has become crucial. Red Hat’s collaboration with Nvidia Corp., Advanced Micro Devices Inc., Google LLC and Intel Corp. aims to streamline this process through projects such as vLLM, which bridges open-source models and high-performance accelerators. These integrations form the backbone of next-generation inference environments designed for flexibility and speed, Shaw explained.

“That’s what vLLM is all about. It’s about mapping that whole ecosystem of open-source models onto that whole set of hardware accelerators,” he said. “[It] provides that integration point for all of these key ecosystem participants.”
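
For context, the sketch below shows roughly what that integration point looks like from a developer’s seat: a few lines of Python that load an open-weight model through vLLM and generate text, with the accelerator-specific optimizations handled underneath. The model name comes from vLLM’s own quickstart and is purely illustrative; any supported open-weight checkpoint can be swapped in.

```python
# Minimal vLLM sketch: load an open-weight model onto whatever
# accelerator backend is installed and generate a few tokens.
# The model name is illustrative; substitute any supported checkpoint.
from vllm import LLM, SamplingParams

prompts = ["What is Kubernetes best at?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # loads weights and builds the engine

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, output.outputs[0].text)
```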

The next step is scaling beyond individual nodes to entire clusters, Shaw added. The llm-d project, born from this need, applies distributed optimization to Kubernetes environments. By rethinking how workloads are balanced and specialized, llm-d maximizes throughput across multi-node clusters and enables enterprises to harness AI inference at scale without losing efficiency.
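
In practice, a cluster-scale deployment like this is consumed by applications as an ordinary network service. As a hedged illustration, the Python sketch below queries a hypothetical in-cluster endpoint over the OpenAI-compatible API that vLLM-based servers expose; the service URL and model name are placeholders, and the snippet shows a client’s view rather than anything specific to llm-d’s internals.

```python
# Hedged sketch: a client calling a cluster-hosted inference service
# through an OpenAI-compatible completions endpoint. The URL and
# model name below are hypothetical placeholders.
import requests

BASE_URL = "http://llm-inference.example.svc.cluster.local:8000"

resp = requests.post(
    f"{BASE_URL}/v1/completions",
    json={
        "model": "example-open-weight-model",
        "prompt": "Explain what Kubernetes optimization means for inference.",
        "max_tokens": 64,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```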

“There’s a lot of work, not just at llm-d, but all the Kubernetes pieces underneath. How do pipelines and GitOps play into this new world? There’s dozens of projects there,” Miniman said. “We know the CNCF has been great at pulling in a lot of these projects to make sure that they work there, because the infrastructure and the applications have to play well together in this fast-moving space.”

Hybrid and edge computing further expand the potential for Kubernetes optimization. With smaller language models running at the edge and reasoning workloads centralized in data centers, performance tuning becomes essential to maintaining responsiveness and cost efficiency. Red Hat’s approach emphasizes the open-source community’s ability to innovate around these distributed AI use cases, according to Shaw.

“Even small language models are … very expensive to generate tokens,” he said. “Our goal with llm-d is to take all of these different performance optimizations, compose them into Kubernetes to make it easy for enterprises and startups to be able to take advantage of those same optimizations when they’re going to deploy in the familiar environment of Kubernetes that they’re used to.”

Beyond performance, Red Hat’s engineering leadership sees Kubernetes optimization as key to achieving long-term sustainability and accessibility for enterprise AI. By integrating projects such as SPIFFE, SPIRE and KServe with OpenShift, the company is aligning security, observability and hybrid orchestration into a unified ecosystem that supports real-world deployments, Miniman noted.

“AI’s changing a lot of these things,” he added. “Not only are there all the AI pieces, but underneath, there’s a lot of changes that need to happen … the workloads are very different, how long they stay, how stateful they are and everything there.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the KubeCon + CloudNativeCon NA event:

(* Disclosure: Red Hat sponsored this segment of theCUBE. Neither Red Hat nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
