UPDATED 13:00 EDT / APRIL 29 2025

Inside Google Cloud’s new blueprint for scaling AI infrastructure

Artificial intelligence’s potential may be explosive, but its infrastructure demands are anything but chaotic. As enterprises accelerate toward production-scale AI, success increasingly depends on clear standards, simpler AI infrastructure and the ability to run workloads wherever business needs arise.

While compute capacity and Kubernetes scalability dominate most cloud conversations, the deeper thread is infrastructure maturity — how far it has come and how smart it must now become, according to Roman Arcea (pictured, left), group product manager at Google LLC.

Google’s Roman Arcea and Jeremy Olmsted-Thompson talk with theCUBE about how the company aims to simplify AI infrastructure while preserving enterprise flexibility.

“I think it’s time for us to start to converge on a standard of provisioning compute capacity in a distributed fashion, and I think Kubernetes is that API that seems to be the most promising right now in the market to give us this unified standard for infrastructure consumption,” he told theCUBE. “It’s the first time now where we see that it’s both the application ecosystem and the developer ecosystem that wants to integrate with Kubernetes from the upper layer.”

Arcea and Jeremy Olmsted-Thompson (right), principal engineer at Google, spoke with theCUBE’s Savannah Peterson for the “Google Cloud: Passport to Containers” interview series, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed simplifying AI infrastructure while preserving enterprise flexibility. (* Disclosure below.)

Google’s strategy: Simplify AI infrastructure while preserving enterprise flexibility

Kubernetes has become the platform of choice for AI workloads, but the real challenge is simplifying AI infrastructure without giving up deep customization. The pressure to simplify has never been higher, yet many enterprise workloads still demand access to advanced configurations, according to Olmsted-Thompson.

“You come to run AI in Kubernetes because Kubernetes can do basically anything,” he said. “But maybe you don’t have that experience actually in Kubernetes, and it is a very large surface. Simplification … it’s really about making you think about less.”

Google’s approach to optional complexity reflects years of listening to customers’ real-world needs. Most workloads don’t require the full range of configuration settings, but the total set of knobs required across all workloads can be vast, according to Olmsted-Thompson.

“You don’t want to have to learn about all of those knobs, but what we’ve found is you can’t really take them away, because whoever needed them is going to need them again,” he said. “We’ve chosen good defaults, you’re in a secure configuration, and as you grow, you only need to worry about that little area that you’re running up against.”

That balancing act — broad support without cognitive overload — also drives scalability efforts across Google Kubernetes Engine, according to Arcea. Building out the platform’s 65,000-node capacity helps high-end users and contributes to a more resilient open-source ecosystem.

“We are improving GKE’s offering … of this endless, almost open infrastructure canvas on which you can build your business,” Arcea added. “We contribute all of those investments back to the open source to make the entire open-source product stronger, better [and] more resilient. Upgrade’s big on our radar.”

Automation and abstraction turn scaling into a fine art

Enterprises want AI workloads to scale. But they also want that scale to come with predictability, performance and a lot less babysitting. The key to reconciling these demands lies in abstracting the infrastructure details away from developers to simplify AI infrastructure and streamline the operational lift, according to Olmsted-Thompson.

“A few years ago, with GKE Autopilot, we came out with this concept of compute classes,” he said. “It’s really about creating an abstraction between the platform and the application and abstracting away the categorization of compute that you might need for a given workload.”
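
In practice, that abstraction surfaces to a developer as little more than a label on the workload. The sketch below is illustrative rather than a production recipe: the cloud.google.com/compute-class node selector and the “Balanced” class follow GKE Autopilot’s documented conventions, while the pod name, image and resource figures are placeholders.

```yaml
# Illustrative Autopilot workload: requests the "Balanced" compute class
# rather than naming machine types. Pod name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker
spec:
  nodeSelector:
    cloud.google.com/compute-class: "Balanced"  # a category of compute, not hardware
  containers:
  - name: app
    image: us-docker.pkg.dev/example-project/repo/app:latest
    resources:
      requests:
        cpu: "2"      # Autopilot provisions nodes that satisfy these requests
        memory: 8Gi
```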

Compute classes represent a shift in mindset: Instead of targeting infrastructure directly, developers aim for business outcomes. Platform administrators can now design custom compute environments — setting fallback logic, hardware preferences and spot instance strategies — while developers stay focused on building, unaware of the quota battles unfolding behind the curtain, Olmsted-Thompson explained.

“Let’s say I want to define my own high-performance compute class,” he said. “I could say, ‘I prefer spot, give me all the spot you can, but if you can’t get enough, I’ll fall back to on-demand.’ The application developers don’t really need to think about this anymore … they get to target [what] the platform admin builds and shapes with whatever controls they need, building their own abstraction.”
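
What that looks like in configuration terms: the sketch below assumes GKE’s custom compute class API (the cloud.google.com/v1 ComputeClass resource) and uses an illustrative class name and machine family; exact fields may vary by GKE version.

```yaml
# Illustrative custom compute class: prefer Spot capacity, fall back to
# on-demand when Spot runs short. Name and machine family are placeholders.
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: high-performance
spec:
  priorities:
  - machineFamily: n2
    spot: true         # first choice: all the Spot capacity available
  - machineFamily: n2
    spot: false        # fallback: on-demand capacity
  nodePoolAutoCreation:
    enabled: true      # let GKE create node pools that match these rules
  whenUnsatisfiable: DoNotScaleUp  # guardrail: never provision outside the rules
```

Application teams then target the class by name, for example with a nodeSelector on cloud.google.com/compute-class: high-performance, and the Spot-versus-on-demand strategy stays entirely in the platform admin’s hands.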

But developer simplicity doesn’t mean less control. Google’s evolving automation strategy builds in robust guardrails — via policy engines, observability and compute class constraints — that allow enterprises to scale safely without sacrificing oversight, Arcea pointed out.

“Compute classes [are] nothing else [than] a policy engine that allows our users to consume infrastructure on their terms,” he said. “It’s fully automated, but if you want to set boundaries that are right for your business while still keeping that powerful automation, this is what we’re going towards.”

That automation shift has happened fast. Not long ago, customers bristled at the idea of Kubernetes making resource decisions on their behalf. Now, it’s standard practice, fueled by necessity and a 40-times increase in GKE users adopting automated resizing, according to Arcea.

“Every single major customer now does that,” he said. “The conversation right now is literally, ‘I don’t want to babysit my [central processing units], my memory, my pods … make sure you deliver my objectives. I couldn’t care less if you run it on this shape or that shape, as long as the price, performance, time to market and ease of operations are there.’”
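
On GKE, that style of automated resizing maps to vertical pod autoscaling, which sets CPU and memory requests from observed usage. A minimal sketch, with a hypothetical deployment as the target:

```yaml
# Minimal Vertical Pod Autoscaler example: the platform right-sizes CPU and
# memory requests so developers don't hand-tune them. Names are placeholders.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: inference-worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-worker  # hypothetical deployment
  updatePolicy:
    updateMode: "Auto"      # apply recommendations automatically
```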

Quick startup times can dramatically reduce infrastructure costs by shrinking the buffer capacity needed to handle latency. As generative AI workloads become normalized enterprise services, developers are shifting away from hardware concerns and focusing more on performance goals — another sign of the broader push to simplify AI infrastructure across operations. At the same time, leaders are reevaluating infrastructure investments based on return on investment, not just capability, according to Olmsted-Thompson and Arcea.
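
To put rough, hypothetical numbers on that buffer: if nodes take 10 minutes to become ready, a service must hold enough idle headroom to absorb 10 minutes of traffic growth; cutting startup to one minute shrinks that standing buffer, and its cost, by roughly an order of magnitude.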

“It’s fine to over-provision if that gives you returns,” Arcea said. “But it’s not OK to run and pay for overhead if that doesn’t return anything back to you. I think if you frame that problem this way … it’s very clear where we should be going with those investments.”

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the “Google Cloud: Passport to Containers” interview series:

(* Disclosure: TheCUBE is a paid media partner for the “Google Cloud: Passport to Containers” interview series. Neither Google Cloud, the sponsor of theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
