

Kubernetes has evolved far beyond its roots as an open-source container orchestration platform — it’s now a cornerstone of modern AI and machine learning infrastructure.
As the chaotic sprint of early innovation gives way to a more focused and deliberate momentum, Kubernetes is powering a new era of intelligent workloads. For developers and enterprises alike, it has become an essential engine for deploying, managing and scaling AI with precision and efficiency.
Google Cloud’s Jago Macleod and John Belamaric talk with theCUBE about Kubernetes for AIOps.
“Early on, the end users were the community, so we didn’t have a lot of [project management] work going on,” said Jago Macleod (pictured, right), director of engineering, Kubernetes, at Google Cloud. “We were bringing what we had at Borg into the open-source world, so we moved fast. We ran production workloads we probably shouldn’t have at that point. Then it exploded into the ecosystem with a huge explosion of projects in the CNCF. Look at the landscape page in the CNCF, and it’s pretty dizzying at this point to figure out what you need, what’s useful and what’s necessary.”
Macleod and John Belamaric (left), senior staff software engineer at Google, spoke with theCUBE’s Savannah Peterson for the “Google Cloud: Passport to Containers” interview series, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the future of Kubernetes for AI, underscoring the need to balance stability with innovation, optimize for inference workloads and simplify migration processes. (* Disclosure below.)
As AI and machine learning workloads become more complex, Kubernetes has had to evolve to support these new demands. Traditionally designed for microservices and stateless applications, it’s now being adapted to support high-performance AI workloads, particularly in training and inference, according to Belamaric.
“My focus for the last year or so has been tightly focused on trying to enable our work in upstream Kubernetes to enable Kubernetes to work better for AI and ML workloads,” he said. “Some of the speed bumps we see in those areas are just that Kubernetes was originally designed for a different set of use cases … things like hardware. In a microservices type of HTTP world, we’re trying to make hardware more and more fungible.”
In AI and machine learning, hardware resources such as GPUs and TPUs are highly specialized, and not all are created equal. Kubernetes is now being enhanced to allow for dynamic resource allocation, enabling AI workloads to request and optimize specific hardware configurations. This ensures that AI applications run efficiently, reducing computational bottlenecks and optimizing inference speed.
“One of the things we’re doing within Kubernetes is what we call dynamic resource allocation, which is about helping Kubernetes understand the hardware better than it used to,” Belamaric said. “It’s like trying to shift the work of making all of these decisions from the user, the human to the machine, and that way when you’re asleep at night, if some other job finishes, you can get the thing you wanted and your workload will run by the time you wake up in the morning. It’s shifting more work to the machine. That’s one of the speed bumps.”
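For readers who want a concrete picture, the sketch below shows roughly what a DRA-style request looks like through the Kubernetes Python client: a ResourceClaimTemplate describing the device, and a pod that references that claim instead of asking for a raw GPU count. The resource.k8s.io/v1beta1 schema, the "gpu.example.com" device class, the image and the object names are illustrative assumptions; the DRA API has changed across releases, and a cluster needs the feature enabled plus a vendor DRA driver before any of this will schedule.

```python
# Illustrative sketch of a dynamic resource allocation (DRA) request.
# Assumes: DRA enabled on the cluster, a vendor DRA driver installed,
# and the resource.k8s.io/v1beta1 schema; all names are placeholders.
from kubernetes import client, config

config.load_kube_config()

# A claim template: "one device from this class," rather than a node-level count.
claim_template = {
    "apiVersion": "resource.k8s.io/v1beta1",
    "kind": "ResourceClaimTemplate",
    "metadata": {"name": "single-gpu"},
    "spec": {
        "spec": {
            "devices": {
                "requests": [
                    {"name": "gpu", "deviceClassName": "gpu.example.com"}
                ]
            }
        }
    },
}

# DRA objects live in the resource.k8s.io group, so the generic
# custom-objects client is used here instead of a typed model.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="resource.k8s.io", version="v1beta1", namespace="default",
    plural="resourceclaimtemplates", body=claim_template,
)

# The pod references the claim; the scheduler picks hardware that satisfies it.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "inference-worker"},
    "spec": {
        "containers": [{
            "name": "model-server",
            "image": "registry.example.com/model-server:latest",
            "resources": {"claims": [{"name": "gpu"}]},
        }],
        "resourceClaims": [
            {"name": "gpu", "resourceClaimTemplateName": "single-gpu"}
        ],
    },
}
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The design point is the one Belamaric describes: the pod declares what it needs at the device level, and the scheduler, rather than the human, decides which piece of hardware satisfies it.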
Kubernetes is also being optimized for inference workloads so that AI applications run efficiently at scale. With advancements such as in-place pod updates and improved scheduling mechanisms, Kubernetes can allocate resources dynamically based on workload demand. The goal is to make Kubernetes inference-aware, so organizations can deploy AI models seamlessly without being hindered by infrastructure limitations, according to Macleod.
“A lot of the work we’re doing adjacent to the DRA work is in this area — in-place pod updates so you can scale pods up and down at runtime without [recreating] them,” he said. “The scheduling aspect and auto-scaling become a lot more interesting when you can scale a pod or add a new one. You can do this at different layers in the cake again. That’s the big push; the idea that inference is the next web app is a term that we can talk about a lot.”
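From the API side, an in-place update amounts to patching a running pod's container resources rather than deleting and recreating the pod. The sketch below assumes a cluster where in-place pod resize is enabled (it has been maturing behind the InPlacePodVerticalScaling feature gate, and newer releases route the change through a dedicated resize subresource), so the pod name, container name and values are purely illustrative.

```python
# Illustrative sketch: grow a running container's CPU without restarting the pod.
# Assumes in-place pod resize is enabled on the cluster and supported by the
# node runtime; on newer releases the dedicated "resize" subresource is used.
from kubernetes import client, config

config.load_kube_config()

patch = {
    "spec": {
        "containers": [{
            "name": "model-server",
            "resources": {
                "requests": {"cpu": "4"},
                "limits": {"cpu": "4"},
            },
        }]
    }
}

# Patch the live pod object; with in-place resize the container keeps running
# while its resource shape changes, which makes scaling decisions for bursty
# inference traffic cheaper than recreating pods.
client.CoreV1Api().patch_namespaced_pod(
    name="inference-worker", namespace="default", body=patch
)
```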
Looking ahead, Kubernetes for AI will continue to cut across a wide range of workloads. The next wave of innovation will focus on enhancing its ability to support inference workloads at scale, enabling AI applications to be deployed seamlessly across cloud, edge and on-premises environments.
“[The harvested efficiency] goes straight to that accelerating human innovation, certainly in our little patch,” Belamaric said. “We spend a lot of time doing things that, maybe, if 80% of it could be done by machine, that would be fivefold the number of people in the project.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the “Google Cloud: Passport to Containers” interview series:
(* Disclosure: TheCUBE is a paid media partner for the “Google Cloud: Passport to Containers” series. Neither Google Cloud, the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)