

Kubernetes has evolved far beyond its roots as an open-source container orchestration platform — it’s now a cornerstone of modern AI and machine learning infrastructure.
As the chaotic sprint of early innovation gives way to a more focused and deliberate momentum, Kubernetes is powering a new era of intelligent workloads. For developers and enterprises alike, it has become an essential engine for deploying, managing and scaling AI with precision and efficiency.
Google Cloud’s Jago Macleod and John Belamaric talk with theCUBE about Kubernetes for AIOps.
“Early on, the end users were the community, so we didn’t have a lot of [project management] work going on,” said Jago Macleod (pictured, right), director of engineering, Kubernetes, at Google Cloud. “We were bringing what we had at Borg into the open-source world, so we moved fast. We ran production workloads we probably shouldn’t have at that point. Then it exploded into the ecosystem with a huge explosion of projects in the CNCF. Look at the landscape page in the CNCF, and it’s pretty dizzying at this point to figure out what you need, what’s useful and what’s necessary.”
Macleod and John Belamaric (left), senior staff software engineer at Google, spoke with theCUBE’s Savannah Peterson for the “Google Cloud: Passport to Containers” interview series, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the future of Kubernetes for AI, underscoring the need to balance stability with innovation, optimize for inference workloads and simplify migration processes. (* Disclosure below.)
As AI and machine learning workloads become more complex, Kubernetes has had to evolve to support these new demands. Traditionally designed for microservices and stateless applications, it’s now being adapted to support high-performance AI workloads, particularly in training and inference, according to Belamaric.
“My focus for the last year or so has been tightly focused on trying to enable our work in upstream Kubernetes to enable Kubernetes to work better for AI and ML workloads,” he said. “Some of the speed bumps we see in those areas are just that Kubernetes was originally designed for a different set of use cases … things like hardware. In a microservices type of HTTP world, we’re trying to make hardware more and more fungible.”
In AI and machine learning, hardware resources such as GPUs and TPUs are highly specialized, and not all are created equal. Kubernetes is now being enhanced to allow for dynamic resource allocation, enabling AI workloads to request and optimize specific hardware configurations. This ensures that AI applications run efficiently, reducing computational bottlenecks and optimizing inference speed.
“One of the things we’re doing within Kubernetes is what we call dynamic resource allocation, which is about helping Kubernetes understand the hardware better than it used to,” Belamaric said. “It’s like trying to shift the work of making all of these decisions from the user, the human to the machine, and that way when you’re asleep at night, if some other job finishes, you can get the thing you wanted and your workload will run by the time you wake up in the morning. It’s shifting more work to the machine. That’s one of the speed bumps.”
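For readers who want a concrete picture, the sketch below shows roughly what a DRA-style request looks like through the Kubernetes Python client: a ResourceClaimTemplate describing the device, and a pod that references that claim instead of asking for a raw GPU count. The resource.k8s.io/v1beta1 schema, the "gpu.example.com" device class, the image and the object names are illustrative assumptions; the DRA API has changed across releases, and a cluster needs the feature enabled plus a vendor DRA driver before any of this will schedule.

```python
# Illustrative sketch of a dynamic resource allocation (DRA) request.
# Assumes: DRA enabled on the cluster, a vendor DRA driver installed,
# and the resource.k8s.io/v1beta1 schema; all names are placeholders.
from kubernetes import client, config

config.load_kube_config()

# A claim template: "one device from this class," rather than a node-level count.
claim_template = {
    "apiVersion": "resource.k8s.io/v1beta1",
    "kind": "ResourceClaimTemplate",
    "metadata": {"name": "single-gpu"},
    "spec": {
        "spec": {
            "devices": {
                "requests": [
                    {"name": "gpu", "deviceClassName": "gpu.example.com"}
                ]
            }
        }
    },
}

# DRA objects live in the resource.k8s.io group, so the generic
# custom-objects client is used here instead of a typed model.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="resource.k8s.io", version="v1beta1", namespace="default",
    plural="resourceclaimtemplates", body=claim_template,
)

# The pod references the claim; the scheduler picks hardware that satisfies it.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "inference-worker"},
    "spec": {
        "containers": [{
            "name": "model-server",
            "image": "registry.example.com/model-server:latest",
            "resources": {"claims": [{"name": "gpu"}]},
        }],
        "resourceClaims": [
            {"name": "gpu", "resourceClaimTemplateName": "single-gpu"}
        ],
    },
}
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The design point is the one Belamaric describes: the pod declares what it needs at the device level, and the scheduler, rather than the human, decides which piece of hardware satisfies it.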
Kubernetes is also being optimized for inference workloads so that AI applications run efficiently at scale. With advancements such as in-place pod updates and improved scheduling mechanisms, Kubernetes can allocate resources dynamically based on workload demand. The goal is to make Kubernetes inference-aware, so organizations can deploy AI models seamlessly without being hindered by infrastructure limitations, according to Macleod.
“A lot of the work we’re doing adjacent to the DRA work is in this area — in-place pod updates so you can scale pods up and down at runtime without [recreating] them,” he said. “The scheduling aspect and auto-scaling become a lot more interesting when you can scale a pod or add a new one. You can do this at different layers in the cake again. That’s the big push; the idea that inference is the next web app is a term that we can talk about a lot.”
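From the API side, an in-place update amounts to patching a running pod's container resources rather than deleting and recreating the pod. The sketch below assumes a cluster where in-place pod resize is enabled (it has been maturing behind the InPlacePodVerticalScaling feature gate, and newer releases route the change through a dedicated resize subresource), so the pod name, container name and values are purely illustrative.

```python
# Illustrative sketch: grow a running container's CPU without restarting the pod.
# Assumes in-place pod resize is enabled on the cluster and supported by the
# node runtime; on newer releases the dedicated "resize" subresource is used.
from kubernetes import client, config

config.load_kube_config()

patch = {
    "spec": {
        "containers": [{
            "name": "model-server",
            "resources": {
                "requests": {"cpu": "4"},
                "limits": {"cpu": "4"},
            },
        }]
    }
}

# Patch the live pod object; with in-place resize the container keeps running
# while its resource shape changes, which makes scaling decisions for bursty
# inference traffic cheaper than recreating pods.
client.CoreV1Api().patch_namespaced_pod(
    name="inference-worker", namespace="default", body=patch
)
```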
Looking ahead, Kubernetes for AI will continue to cut across a wide range of workloads. The next wave of innovation will focus on enhancing its ability to support inference workloads at scale, enabling AI applications to be deployed seamlessly across cloud, edge and on-premises environments.
“[The harvested efficiency] goes straight to that accelerating human innovation, certainly in our little patch,” Belamaric said. “We spend a lot of time doing things that, maybe, if 80% of it could be done by machine, that would be fivefold the number of people in the project.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the “Google Cloud: Passport to Containers” interview series:
(* Disclosure: TheCUBE is a paid media partner for the “Google Cloud: Passport to Containers” series. Neither Google Cloud, the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)