

As organizations turn to artificial intelligence to boost productivity, unifying AI infrastructure has become increasingly important. A unified stack streamlines and strengthens the entire AI development lifecycle, allowing organizations to build, deploy and scale AI solutions more efficiently, securely and cost-effectively.
Red Hat AI Inference Server addresses that need as a scalable, secure and consistent platform designed to deploy, manage and serve machine learning models across hybrid cloud environments. This solution aligns with the growing demand for robust AI infrastructure, according to Brian Stevens (pictured, left), senior vice president and AI chief technology officer at Red Hat Inc.
Red Hat’s Brian Stevens and Joe Fernandes talk with theCUBE about the company’s unified AI infrastructure commitment.
“The Inference Server is kind of the core; it’s the equivalent of Linux if you will, and Red Hat AI Inference Server being our chosen name, with [virtual large language model] being the open-source project equivalent to the Linux kernel,” he said. “It really is that glue layer. It’s the thing that can stay the same so that all the innovation, accelerators and models can reach users without change. I think what we’ve done with vLLM [is that] Red Hat Inference Server can be that core platform and then all the stuff that we’ve talked about, the right agents, [model context protocol].”
Stevens and Joe Fernandes (right), vice president and general manager of the AI Business Unit at Red Hat, spoke with theCUBE’s Rebecca Knight and Rob Strechay at Red Hat Summit, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the importance of a unified AI infrastructure and how Red Hat is leading the charge through the AI Inference Server and vLLM. (* Disclosure below.)
Red Hat’s vLLM project plays a key role in unifying AI infrastructure by bringing scalability and enterprise readiness to large language model deployment. Through its integration with Kubernetes, support for hybrid clouds and focus on open-source innovation, vLLM enables organizations to adopt new models and accelerators as they emerge, according to Stevens.
“The way AI is heading is a very fragmented kind of world,” he said. “Our vision’s really been like, ‘How do we unify that into a common platform like we did with Linux, where there could be one core … vLLM that can run all models and run all accelerators?’ In doing so, think about what that means to end users. They can just have one vLLM platform and get to use all the best in future models and all the accelerators seamlessly.”
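To make the “one platform, any model” idea concrete, here is a minimal sketch using the open-source vLLM library’s offline inference API in Python. The model identifier is only a placeholder, and accelerator selection is handled by vLLM’s backends rather than by the application code; this is an illustration, not Red Hat’s reference implementation.

```python
# Minimal sketch of vLLM's offline inference API.
# The model name below is a placeholder; any model vLLM supports can be
# swapped in while the surrounding application code stays the same.
from vllm import LLM, SamplingParams

# vLLM selects the appropriate accelerator backend; the calling code
# does not change across hardware targets.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

prompts = ["Summarize why a unified inference layer matters."]
outputs = llm.generate(prompts, sampling)

for output in outputs:
    print(output.outputs[0].text)
```

The same pattern scales up to vLLM’s OpenAI-compatible server mode for production serving, which is where the Kubernetes and hybrid cloud integration Stevens describes comes into play.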
Red Hat leverages the Llama Stack to develop enterprise-ready agentic AI systems by integrating it into its OpenShift AI platform. This integration provides a unified framework for building, deploying and managing intelligent agents capable of complex reasoning, tool integration and retrieval-augmented generation workflows, according to Fernandes.
“Meta had just released Llama Stack as part of, I think, the Llama 3 launch,” he said. “It was an open-source license, but that gave us the opportunity to work with Meta and other partners who had similar interests. It’s going to become the core [application programming interface] for end users who want to build agents and applications on the platform. As they build new agents, it’ll bring new capabilities and also integrate with other capabilities like we talked about: The Model Context Protocol, MCP from Anthropic, that’s already integrated into the Llama Stack agents API for tool calling.”
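Fernandes’ point about MCP-based tool calling comes down to a simple loop: the agent exposes tools to the model, the model emits a structured tool call, the agent executes it and feeds the result back. The sketch below is a hypothetical, self-contained illustration of that loop in plain Python; it does not use the real Llama Stack or MCP client libraries, and names such as `call_model` and `TOOLS` are invented for illustration only.

```python
# Hypothetical sketch of the agent tool-calling loop that protocols like
# MCP standardize. No real Llama Stack or MCP client library is used;
# call_model() stands in for a model/inference endpoint.
from typing import Callable, Dict

# A tool registry: name -> callable. In a real MCP setup these would be
# tools advertised by an MCP server, not local functions.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_docs": lambda query: f"(stub) top doc snippet for '{query}'",
}

def call_model(messages: list) -> dict:
    """Stand-in for an inference call. A real model decides when to emit
    a tool call; here one is hard-coded for illustration."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "lookup_docs",
                              "arguments": {"query": "vLLM deployment"}}}
    return {"content": "Answer composed from the tool result."}

def run_agent(user_prompt: str) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = call_model(messages)
        if "tool_call" in reply:
            call = reply["tool_call"]
            result = TOOLS[call["name"]](call["arguments"]["query"])
            messages.append({"role": "tool", "content": result})
            continue
        return reply["content"]

print(run_agent("How do I deploy a model with vLLM?"))
```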
Enabling AI models to run across various environments, such as the cloud, on-premises data centers and the edge, is crucial for ease of deployment, adaptability and performance optimization. Red Hat supports this objective to maximize utility, according to Fernandes.
“Red Hat’s always been a platform company,” he said. “I think AI is just that next evolution, and so as a platform provider, we need to enable customers to run their AI models across any environment, any accelerator and … any model they choose to power their business. If you’re building a new application and you’re not building it on a cloud-native and containerized architecture, you’re sort of out of the mainstream.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of Red Hat Summit:
(* Disclosure: Red Hat Inc. sponsored this segment of theCUBE. Neither Red Hat nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)