

As artificial intelligence models grow more complex and enterprise demands scale, Google Cloud is working to reshape the future of AI infrastructure.
Google Cloud’s Mark Lohmeyer talks with theCUBE about the company’s open-source AI efforts.
From open-source projects such as JAX to versatile large language models, the company is redefining performance and cost efficiency as hardware costs continue to rise, according to Mark Lohmeyer (pictured), vice president and general manager of AI computing infrastructure at Google Cloud.
“We’ve said 2025 is the year of inference at Google, and we’re seeing this in terms of our internal use as well as how our external cloud customers are using AI,” he said. “You’re seeing the emergence of reasoning models, which require multiple steps to reason through what is the best possible answer. Then, those reasoning models get built into agents and agentic workflows. If you think about the pressure that this new class of inference workload, these types of models, put on the infrastructure, it’s like nothing we’ve ever seen.”
Lohmeyer spoke with theCUBE’s Rob Strechay and Rebecca Knight at Red Hat Summit, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how Google Cloud is rewriting the playbook and engineering the infrastructure to lead AI’s rapid evolution. (* Disclosure below.)
With the rise of reasoning models capable of multi-step decision-making, organizations now face intense computational loads. These models often function as agents within broader agentic workflows, solving complex tasks collaboratively. Meeting their demands requires powerful, dynamically adaptable and cost-efficient infrastructure, according to Lohmeyer.
vLLM has gained traction for its performance and cost efficiency, particularly on GPUs. Google is extending vLLM support to TPUs, unlocking even greater price-performance potential. This flexibility allows customers to switch between accelerators based on workload needs, reducing cost per inference — a critical metric for AI-driven business models.
“Google has a vibrant history in open source,” Lohmeyer said. “Kubernetes, of course, changed the world in many ways. But also, more recently, technologies like JAX are amazing frameworks for AI model training and serving that we developed within Google to support our creation and training of Gemini. But then we thought, ‘Hey, this is such a powerful technology, let’s open-source it.'”
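To give a flavor of the accelerator portability Lohmeyer describes, here is a minimal JAX sketch (not from the interview — the toy linear model, data and learning rate are illustrative assumptions). The same code runs unchanged on CPU, GPU or TPU, because `jax.jit` compiles through XLA to whatever hardware backend is available:

```python
import jax
import jax.numpy as jnp

# Mean-squared-error loss for a toy linear model pred = w * x.
def loss(w, x, y):
    pred = w * x
    return jnp.mean((pred - y) ** 2)

# jax.grad derives the gradient function; jax.jit compiles it via XLA
# for the available accelerator (CPU, GPU or TPU) with no code changes.
grad_fn = jax.jit(jax.grad(loss))

# Illustrative data: y = 2 * x, so gradient descent should learn w ≈ 2.
x = jnp.array([1.0, 2.0, 3.0])
y = jnp.array([2.0, 4.0, 6.0])

w = 0.0
for _ in range(100):
    w -= 0.1 * grad_fn(w, x, y)

print(float(w))  # converges toward 2.0
```

The point of the sketch is the division of labor: the model code stays backend-agnostic, and the XLA compiler handles hardware-specific optimization, which is what lets the same framework serve both GPU and TPU fleets.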
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of Red Hat Summit:
(* Disclosure: Red Hat Inc. sponsored this segment of theCUBE. Neither Red Hat nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)