

As organizations turn to artificial intelligence to boost productivity, unifying AI infrastructure has become increasingly important. A unified stack streamlines and strengthens the entire AI development lifecycle, allowing organizations to build, deploy and scale AI solutions more efficiently, securely and cost-effectively.
Red Hat AI Inference Server addresses that need as a scalable, secure and consistent platform designed to deploy, manage and serve machine learning models across hybrid cloud environments. This solution aligns with the growing demand for robust AI infrastructure, according to Brian Stevens (pictured, left), senior vice president and AI chief technology officer at Red Hat Inc.
Red Hat’s Brian Stevens and Joe Fernandes talk with theCUBE about the company’s unified AI infrastructure commitment.
“The Inference Server is kind of the core; it’s the equivalent of Linux if you will, and Red Hat AI Inference Server being our chosen name, with [virtual large language model] being the open-source project equivalent to the Linux kernel,” he said. “It really is that glue layer. It’s the thing that can stay the same so that all the innovation, accelerators and models can reach users without change. I think what we’ve done with vLLM [is that] Red Hat Inference Server can be that core platform and then all the stuff that we’ve talked about, the right agents, [model context protocol].”
Stevens and Joe Fernandes (right), vice president and general manager of the AI Business Unit at Red Hat, spoke with theCUBE’s Rebecca Knight and Rob Strechay at Red Hat Summit, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the importance of a unified AI infrastructure and how Red Hat is leading the charge through the AI Inference Server and vLLM. (* Disclosure below.)
Red Hat’s vLLM project plays a key role in unifying AI infrastructure by bringing scalability and enterprise readiness to large language model deployment. Through its integration with Kubernetes, support for hybrid clouds and focus on open-source innovation, vLLM enables organizations to adopt new models and accelerators as they emerge, according to Stevens.
“The way AI is heading is a very fragmented kind of world,” he said. “Our vision’s really been like, ‘How do we unify that into a common platform like we did with Linux, where there could be one core … vLLM that can run all models and run all accelerators?’ In doing so, think about what that means to end users. They can just have one vLLM platform and get to use all the best in future models and all the accelerators seamlessly.”
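To make the “one platform, any model” idea concrete, here is a minimal sketch using the open-source vLLM library’s offline inference API in Python. The model identifier is only a placeholder, and accelerator selection is handled by vLLM’s backends rather than by the application code; this is an illustration, not Red Hat’s reference implementation.

```python
# Minimal sketch of vLLM's offline inference API.
# The model name below is a placeholder; any model vLLM supports can be
# swapped in while the surrounding application code stays the same.
from vllm import LLM, SamplingParams

# vLLM selects the appropriate accelerator backend; the calling code
# does not change across hardware targets.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

prompts = ["Summarize why a unified inference layer matters."]
outputs = llm.generate(prompts, sampling)

for output in outputs:
    print(output.outputs[0].text)
```

The same pattern scales up to vLLM’s OpenAI-compatible server mode for production serving, which is where the Kubernetes and hybrid cloud integration Stevens describes comes into play.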
Red Hat leverages the Llama Stack to develop enterprise-ready agentic AI systems by integrating it into its OpenShift AI platform. This integration provides a unified framework for building, deploying and managing intelligent agents capable of complex reasoning, tool integration and retrieval-augmented generation workflows, according to Fernandes.
“Meta had just released Llama Stack as part of, I think, the Llama 3 launch,” he said. “It was an open-source license, but that gave us the opportunity to work with Meta and other partners who had similar interests. It’s going to become the core [application programming interface] for end users who want to build agents and applications on the platform. As they build new agents, it’ll bring new capabilities and also integrate with other capabilities like we talked about: The Model Context Protocol, MCP from Anthropic, that’s already integrated into the Llama Stack agents API for tool calling.”
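Fernandes’ point about MCP-based tool calling comes down to a simple loop: the agent exposes tools to the model, the model emits a structured tool call, the agent executes it and feeds the result back. The sketch below is a hypothetical, self-contained illustration of that loop in plain Python; it does not use the real Llama Stack or MCP client libraries, and names such as `call_model` and `TOOLS` are invented for illustration only.

```python
# Hypothetical sketch of the agent tool-calling loop that protocols like
# MCP standardize. No real Llama Stack or MCP client library is used;
# call_model() stands in for a model/inference endpoint.
from typing import Callable, Dict

# A tool registry: name -> callable. In a real MCP setup these would be
# tools advertised by an MCP server, not local functions.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_docs": lambda query: f"(stub) top doc snippet for '{query}'",
}

def call_model(messages: list) -> dict:
    """Stand-in for an inference call. A real model decides when to emit
    a tool call; here one is hard-coded for illustration."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "lookup_docs",
                              "arguments": {"query": "vLLM deployment"}}}
    return {"content": "Answer composed from the tool result."}

def run_agent(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = call_model(messages)
        if "tool_call" in reply:
            call = reply["tool_call"]
            result = TOOLS[call["name"]](call["arguments"]["query"])
            messages.append({"role": "tool", "content": result})
            continue
        return reply["content"]

print(run_agent("How do I deploy a model with vLLM?"))
```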
Enabling AI models to run across various environments, such as the cloud, on-premises data centers and the edge, is crucial for ease of deployment, adaptability and performance optimization. Red Hat supports this objective to maximize utility, according to Fernandes.
“Red Hat’s always been a platform company,” he said. “I think AI is just that next evolution, and so as a platform provider, we need to enable customers to run their AI models across any environment, any accelerator and … any model they choose to power their business. If you’re building a new application and you’re not building it on a cloud-native and containerized architecture, you’re sort of out of the mainstream.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of Red Hat Summit:
(* Disclosure: Red Hat Inc. sponsored this segment of theCUBE. Neither Red Hat nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)