UPDATED 20:20 EDT / MAY 22 2025

Google Cloud retools AI infrastructure for the inference era

As artificial intelligence models grow more complex and enterprise demands scale, Google Cloud is working to reshape the future of AI infrastructure.

Google and Red Hat team up to revolutionize AI infrastructure with open-source tools, inference efficiency, and scalable systems.

Google Cloud’s Mark Lohmeyer talks with theCUBE about the company’s open-source AI efforts.

From open-source projects such as JAX to versatile large language models, the company is redefining performance and cost efficiency as hardware costs continue to rise, according to Mark Lohmeyer (pictured), vice president and general manager of AI computing infrastructure at Google Cloud.

“We’ve said 2025 is the year of inference at Google, and we’re seeing this in terms of our internal use as well as how our external cloud customers are using AI,” he said. “You’re seeing the emergence of reasoning models, which require multiple steps to reason through what is the best possible answer. Then, those reasoning models get built into agents and agentic workflows. If you think about the pressure that this new class of inference workload, these types of models, puts on the infrastructure, it’s like nothing we’ve ever seen.”

Lohmeyer spoke with theCUBE’s Rob Strechay and Rebecca Knight at Red Hat Summit, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how Google Cloud is rewriting the playbook and engineering the infrastructure to lead AI’s rapid evolution. (* Disclosure below.)

The inference era redefines AI infrastructure needs

With the rise of reasoning models capable of multi-step decision-making, organizations now face intense computational loads. These models often function as agents within broader agentic workflows, solving complex tasks collaboratively. Meeting their demands requires powerful, dynamically adaptable and cost-efficient infrastructure, according to Lohmeyer.

vLLM has gained traction for its performance and cost efficiency, particularly on GPUs. Google is extending vLLM support to TPUs, unlocking even greater price-performance potential. This flexibility allows customers to switch between accelerators based on workload needs, reducing cost per inference — a critical metric for AI-driven business models.
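To make that concrete, here is a minimal sketch of offline inference with vLLM’s Python API. The model name, prompt and sampling settings are illustrative placeholders rather than anything specified in the interview; the same serving code can run against a GPU or TPU backend depending on how vLLM is installed.

```python
# Minimal vLLM offline-inference sketch (model and settings are placeholders).
from vllm import LLM, SamplingParams

# Load any supported Hugging Face model; vLLM picks the available accelerator backend.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Sampling settings are illustrative, not tuned.
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain why cost per inference matters."], params)
for out in outputs:
    print(out.outputs[0].text)
```

Because the serving code itself is accelerator-agnostic, teams can benchmark the same workload on different hardware and compare the resulting cost per inference directly.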

“Google has a vibrant history in open source,” Lohmeyer said. “Kubernetes, of course, changed the world in many ways. But also, more recently, technologies like JAX are amazing frameworks for AI model training and serving that we developed within Google to support our creation and training of Gemini. But then we thought, ‘Hey, this is such a powerful technology, let’s open-source it.'”
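For readers unfamiliar with JAX, its appeal for both training and serving comes from composable function transformations such as jit and grad, which compile the same numerical Python code for CPUs, GPUs or TPUs. A toy example, with a hypothetical linear model and data invented purely for illustration:

```python
# Toy JAX sketch: jit-compiled gradient of a loss (model and data are illustrative).
import jax
import jax.numpy as jnp

def predict(w, x):
    # Hypothetical linear model.
    return jnp.dot(x, w)

def loss(w, x, y):
    return jnp.mean((predict(w, x) - y) ** 2)

# Compose transformations: differentiate, then compile for the available accelerator.
grad_fn = jax.jit(jax.grad(loss))

w = jnp.zeros(3)
x = jnp.array([[1.0, 2.0, 3.0]])
y = jnp.array([6.0])
print(grad_fn(w, x, y))  # gradient of the loss with respect to w
```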

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of Red Hat Summit:

(* Disclosure: Red Hat Inc. sponsored this segment of theCUBE. Neither Red Hat nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
