

There was a time when Kubernetes was mostly a curiosity — a developer tool built in the wilds of open source to solve problems not everyone knew they had yet. But a little more than a decade later, the open-source project has become the backbone of cloud-native computing and a cornerstone of how artificial intelligence workloads are deployed and scaled across enterprise environments.
As Kubernetes evolves into its second decade, Google LLC remains a central player in the container ecosystem it helped create. The company continues to expand the capabilities of Google Kubernetes Engine and Google Cloud Run to meet the demand for scalable, AI-powered infrastructure. Meanwhile, the broader Kubernetes community is refining the project’s flexibility and maturity, paving the way for enterprise innovation across industries.
TheCUBE Research’s Savannah Peterson shares her take on the “new stage of maturity” in IT.
“I think we’re reaching a new stage of maturity within the ecosystem as well,” theCUBE Research’s Savannah Peterson said. “It’s a lot less hype. Kubernetes is actually being deployed. I think the AI stack is actually driving a bit of that as well. I think we’re at a place where this isn’t just a project. People aren’t thinking about it; we’re actually implementing and seeing what that looks like.”
The experimental era of Kubernetes has given way to enterprise-scale deployment, and the implications are broader than infrastructure. As organizations operationalize AI, the Kubernetes ecosystem is being recast in real time, with Google Cloud helping lead a shift in what containerization can enable across industries.
This feature is part of the “Google Cloud: Passport to Containers” interview series, which explores how businesses use AI and containers to scale efficiently in the cloud. (* Disclosure below.)
As AI development accelerates, GKE has become essential scaffolding for training, serving and scaling machine learning models. Developers need infrastructure that can handle massive data loads, model versioning and compute-intensive tasks, all while staying flexible across dev, test and production. GKE combines the portability of containers with the orchestration muscle of Kubernetes, allowing teams to iterate quickly and serve models at scale, according to Brandon Royal (pictured, right), product manager of AI infrastructure at Google Cloud, and Bobby Allen (left), cloud therapist at Google.
“It could be training a very small model with a very specific set of information, or it could be all the way up to very large language models that are doing incredible text encoding, text generation or even image models,” Royal told theCUBE during an exclusive interview.
Google Cloud’s Bobby Allen and Brandon Royal talk with theCUBE about cloud-native AI technologies as a paradigm shift for data-driven businesses.
Inference is now just as critical as training, Royal added. With open-source models and pre-trained intelligence readily available, developers can deploy capabilities via application programming interface endpoints instead of retraining from scratch. This shift makes integration smoother, enabling faster application development without reinventing the wheel.
“A model is only [so] valuable until we can put it behind an API and make it available to do something interesting,” Royal said. “That’s really where the fun and interesting stuff happens. Inference is becoming more and more critical to businesses that are looking at deploying AI models in their platforms.”
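For illustration, here is a minimal sketch (not drawn from the interview) of what putting a model behind an API endpoint can look like, using Python and FastAPI; the inference function is a stand-in for a real model call, and the route and field names are placeholders.

```python
# Minimal sketch: expose a model behind an HTTP endpoint with FastAPI.
# run_inference() is a stand-in for a real model call (text generation, encoding, etc.).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Prompt(BaseModel):
    text: str
    max_tokens: int = 128

def run_inference(text: str, max_tokens: int) -> str:
    # Placeholder inference; a real service would call a loaded checkpoint here.
    return text[:max_tokens]

@app.post("/generate")
def generate(prompt: Prompt) -> dict:
    # Callers integrate with this endpoint instead of retraining or embedding the model.
    return {"completion": run_inference(prompt.text, prompt.max_tokens)}
```

Served with a standard ASGI server such as uvicorn, the same containerized service can run locally, on GKE or on Cloud Run.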
GKE also reduces the complexity of building and integrating AI systems by providing containerized environments that slot into existing app stacks and scale on demand. That flexibility is widening access to advanced capabilities, even for teams without deep machine learning experience, according to Allen.
“It’s futuristic, but it’s also bleeding into everything,” Allen told theCUBE. “I think people can feel the pace speeding up every day.”
The shift from monolithic servers to virtualized infrastructure marked the beginning of modern cloud architecture, but the real leap came with containers. As Docker pushed containerization into the mainstream, developers gained a way to package code and dependencies into portable units that sidestepped the platform conflicts of traditional environments. That shift redefined how teams build, test and ship software, according to Spencer Bischof, product manager of GKE at Google, and Gari Singh, product manager of Google Cloud at Google.
“If you start thinking about source containers from that development perspective, you can package up your entire app and all its dependencies independent of the host operating system,” Singh told theCUBE. “Containers have been around for a long time, but Docker popularized them by making them a lot easier to use.”
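As a rough illustration of that packaging model (a generic sketch, not something discussed in the interview), the Docker SDK for Python can build an image from a local Dockerfile and run it anywhere a container runtime is available; the image tag and port mapping below are placeholders.

```python
# Rough sketch: package an app and its dependencies into a portable image, then run it.
# Assumes the Docker SDK for Python (docker-py), a local Docker daemon and a Dockerfile
# in the current directory; "my-app:latest" and the port mapping are placeholders.
import docker

client = docker.from_env()

# Build the image: code plus dependencies, independent of the host operating system.
image, build_logs = client.images.build(path=".", tag="my-app:latest")

# Run the same image locally; an identical image can be pushed to a registry and run on GKE.
container = client.containers.run("my-app:latest", ports={"8080/tcp": 8080}, detach=True)
print("started container", container.short_id)
```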
Google Cloud’s Spencer Bischof and Gari Singh talk with theCUBE about GKE and the evolution of containers.
That simplification paved the way for Kubernetes to emerge as the orchestration standard for modern infrastructure. But it was Google’s work on GKE that helped scale the system for enterprise use. The idea of mini servers running in virtualized environments unlocked a new level of efficiency, according to Bischof.
“Traditionally in the past, we’d have things like large servers; then it matured, and we started virtualizing those machines because no one has all the space to have one single server,” he told theCUBE. “A couple of folks at Google, Red Hat and others said, ‘What happens if we made something smaller, compact and we could stuff thousands of these containers, mini servers, into a virtual environment?’ That’s what a container is.”
As Kubernetes adoption grew, so did the need for smarter defaults and easier onramps. Instead of forcing developers to configure every detail manually, Google introduced tools such as GKE Autopilot and compute classes to abstract away the infrastructure heavy lifting, according to Bischof.
“If you just want to get started with Kubernetes, something that’s based on Kubernetes, go start there,” he said. “Now, you’re not necessarily sure how you want to spin up a GKE cluster — we have complete walkthroughs and guides. Just follow the best practice built in using something like Autopilot. You don’t need to worry about understanding how the networking works, because of the complexity of the systems and the complexity of the storage.”
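As a hedged sketch of what that onramp can look like once an Autopilot cluster exists, a workload can be deployed with the official Kubernetes Python client, declaring only the replicas and resource requests it needs while GKE provisions and sizes the underlying nodes; the image name and resource figures below are placeholders.

```python
# Hedged sketch: deploy a containerized app to an existing GKE Autopilot cluster using
# the official Kubernetes Python client. The image and resource requests are placeholders;
# Autopilot handles node provisioning based on what the workload declares.
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl credentials for the cluster are already set up

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="demo-app"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "demo-app"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "demo-app"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="demo-app",
                        image="gcr.io/my-project/demo-app:latest",  # placeholder image
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "500m", "memory": "512Mi"}
                        ),
                    )
                ]
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```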
As AI adoption accelerates, the cost of inference — not training — has become the biggest challenge for organizations looking to scale. Google Cloud Run offers a flexible solution, combining container portability with serverless pricing and on-demand GPU access, according to Yunong Xiao, director of engineering at Google Cloud, and Steren Giannini, head of product for Google Cloud Run. That model is reshaping how businesses deploy AI in real time, without being locked into proprietary hardware or stuck waiting for scarce infrastructure.
“The container you deploy to Cloud Run has nothing proprietary about Cloud Run,” Giannini told theCUBE. “That’s a very unique value proposition. You can literally take it, you run it on your local machine, you run it on Kubernetes, you run it on another cloud, but hopefully you prefer to run it on Google Cloud Run because it’s more efficient and highly scalable.”
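Much of that portability comes down to a simple contract: the container listens on the port named in the PORT environment variable (Cloud Run defaults it to 8080) and carries no platform-specific dependencies. A minimal sketch using only the Python standard library:

```python
# Minimal sketch of a portable web server: no Cloud Run-specific code, it simply listens
# on the port given by the PORT environment variable (8080 by default), so the same
# container runs locally, on Kubernetes or on Cloud Run.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"hello from a portable container\n")

if __name__ == "__main__":
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```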
Google Cloud’s Jago Macleod and John Belamaric talk with theCUBE about the future of Kubernetes for AI.
Cloud Run is already powering a range of high-demand applications, from L’Oréal’s online assistant to Shopify’s flash-sale infrastructure, according to Xiao and Giannini. L’Oréal uses AI to support high-volume customer interactions. Shopify relies on Cloud Run to handle unpredictable traffic spikes and latency surges during major retail events. In both cases, serverless inference on Kubernetes delivers the scale and agility required for enterprise-grade performance.
“The big problem that people are struggling with [is] inference … it’s the cost,” Xiao said. “There’s very expensive hardware that you have to buy, and there’s a capacity crunch. What we’re seeing with our customers … is all of them are struggling to even just get supply of the cards or the [tensor processing units] or [graphics processing units] to be able to run their inference applications. We are actually … providing on-demand access to GPUs today.”
As AI use cases stretch infrastructure in new ways, Kubernetes is evolving to meet them. Originally designed for microservices and stateless applications, it now supports inference-aware scheduling, high-performance hardware optimization and intelligent automation. These enhancements allow organizations to fine-tune deployments without micromanaging resources, according to Jago Macleod, director of engineering, Kubernetes, at Google Cloud, and John Belamaric, senior staff software engineer at Google.
“One of the things we’re doing within Kubernetes is what we call dynamic resource allocation, which is about helping Kubernetes understand the hardware better than it used to,” Belamaric told theCUBE. “It’s like trying to shift the work of making all of these decisions from the user, the human to the machine … so when you’re asleep at night, if some other job finishes, you can get the thing you wanted, and your workload will run by the time you wake up in the morning.”
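For context, the common way a workload asks for accelerator hardware today is a blunt resource limit on the container, as in the hedged sketch below using the Kubernetes Python client; dynamic resource allocation is the newer mechanism intended to give the scheduler a richer view of the devices behind that request. The image, GPU type and node label are placeholders drawn from typical GKE GPU setups.

```python
# Hedged sketch: the conventional way to request a GPU for an inference pod is a resource
# limit plus (on GKE) a node selector for the accelerator type. Dynamic resource allocation
# is the newer Kubernetes mechanism for describing such hardware in more detail.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="inference-worker"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        node_selector={"cloud.google.com/gke-accelerator": "nvidia-l4"},  # GKE-specific label
        containers=[
            client.V1Container(
                name="inference",
                image="gcr.io/my-project/inference:latest",  # placeholder image
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```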
The Kubernetes community started with open-source ideals, but its impact now extends far beyond code. As AI becomes a shared force across industries, the ecosystem continues to grow, not just through technical breakthroughs but also through a shared sense of purpose, according to Allen.
“I want everyone to feel like they can play a part,” Allen told theCUBE. “This is going to be something that touches all of mankind. Let me just find my part and play that role.”
(* Disclosure: TheCUBE is a paid media partner for the “Google Cloud: Passport to Containers” series. Neither Google Cloud, the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)