UPDATED 09:00 EDT / OCTOBER 14 2025


Red Hat AI 3 targets production inference and agents

IBM Corp. subsidiary Red Hat today announced Red Hat AI 3, calling it a major evolution of its hybrid, cloud-native artificial intelligence platform that can power enterprise AI projects in production at scale.

Red Hat AI 3 is designed to manage AI workloads that span datacenters, clouds and edge environments while maintaining flexibility and control. A hybrid architecture is key to the strategy.

“AI platforms aren’t going to run a single model on a single inference server on a single machine,” said Joe Fernandes, vice president and general manager of Red Hat’s AI business unit. “You’re going to have multiple models across multiple inference servers across a distributed environment.”

The release bundles enhancements across the open-source giant’s OpenShift AI, Red Hat Enterprise Linux AI and Red Hat AI Inference Server into one platform aimed at taming the messiness of running enterprise AI inference across multiple tools and frameworks.

The platform is designed to help organizations transition from experimentation to operational AI across platforms. “Generative AI adoption is exploding across enterprises in every vertical,” Fernandes said. “We see challenges arising around the rising costs of generative AI, particularly as the number of use cases grows and those use cases begin to scale out to production deployments.”

Focus on inference

At the heart of Red Hat AI 3 is inference, the compute-hungry execution phase of AI in which trained models serve live applications, as opposed to model training. With AI 3, Red Hat is investing heavily in this operational layer, building on the open-source vLLM inference library and introducing llm-d, a new distributed inference engine designed to bring intelligence to how large language models are scheduled and served on Kubernetes, the orchestrator for portable software containers.
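
For readers unfamiliar with the libraries mentioned, the following is a minimal sketch of single-node text generation with vLLM’s Python API; the model identifier is a placeholder, and llm-d’s distributed scheduling layer is not shown.

    # Minimal vLLM sketch: load a model and generate text on one machine.
    # The model ID is a placeholder; llm-d's cross-node scheduling is out of scope here.
    from vllm import LLM, SamplingParams

    llm = LLM(model="ibm-granite/granite-3.1-8b-instruct")   # placeholder model ID
    params = SamplingParams(temperature=0.2, max_tokens=128)

    outputs = llm.generate(["Explain distributed inference in one sentence."], params)
    for out in outputs:
        print(out.outputs[0].text)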

“With llm-d, customers can adopt an intelligent AI platform that integrates seamlessly with Kubernetes,” said Steven Huels, Red Hat’s vice president of AI engineering. “Kubernetes scheduling helps maximize model performance and utilization of the underlying [graphics processing unit] hardware so they’re not sitting there idle.”

Huels emphasized that LLM inference diverges from traditional application behavior. “Traditional apps are uniform and predictable,” he said. “LLMs handle variable requests. Traditional app services are often stateless, but with LLMs, we’re trying to maintain state and rely heavily on KV cache,” a technique used in transformer-based language models to speed up the process of generating text.
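
As a rough illustration of the KV cache idea Huels refers to (plain NumPy, not Red Hat or vLLM code), the sketch below stores each token’s attention keys and values so that every decoding step only computes attention for the newest token instead of reprocessing the whole sequence.

    # Toy KV-cache sketch: the cache grows by one key/value pair per generated token.
    import numpy as np

    d = 64                                    # attention head dimension
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    k_cache, v_cache = [], []                 # the "state" kept between decode steps

    def decode_step(x):
        """Attend the newest token's query over all cached keys and values."""
        q, k, v = x @ Wq, x @ Wk, x @ Wv      # project only the new token
        k_cache.append(k)
        v_cache.append(v)                     # reuse history instead of recomputing it
        K, V = np.stack(k_cache), np.stack(v_cache)
        weights = np.exp(q @ K.T / np.sqrt(d))
        weights /= weights.sum()
        return weights @ V

    for token_embedding in rng.standard_normal((5, d)):   # five decode steps
        decode_step(token_embedding)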

Model-as-a-service

On top of the inference layer, Red Hat AI 3 includes a new model-as-a-service function that uses an integrated AI gateway powered by Red Hat Connectivity Link. The idea is to make models accessible as simple, scalable endpoints with usage tracking and plug-and-play support for third-party software. Enterprises that want to serve models internally for cost, privacy or compliance reasons can run the equivalent of a commercial model-serving service within their own environments.
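
The announcement does not spell out client-side details, but vLLM-based servers typically expose an OpenAI-compatible API, so consuming such an internal endpoint might look roughly like this; the gateway URL, token and model name below are placeholders, not documented Red Hat values.

    # Hypothetical client call against an internally hosted, OpenAI-compatible endpoint.
    # URL, credential and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://ai-gateway.example.internal/v1",   # placeholder gateway URL
        api_key="YOUR_INTERNAL_TOKEN",                        # placeholder credential
    )

    resp = client.chat.completions.create(
        model="granite-8b-instruct",                          # placeholder model name
        messages=[{"role": "user", "content": "Classify this support ticket: login page times out."}],
    )
    print(resp.choices[0].message.content)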

The platform also anticipates the rise of agent-based systems. To support agents, Red Hat AI 3 adds the Llama Stack application programming interface layer and integrates the emerging Model Context Protocol. “We give you the flexibility to choose whatever framework you prefer,” Fernandes said. “We want to be the platform that can support all agents regardless of how they’re built.”
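
For a concrete sense of what Model Context Protocol integration means, here is a minimal tool server written with the open-source MCP Python SDK (the mcp package); it is a generic sketch rather than Red Hat-specific code, and any MCP-capable agent framework could discover and call the tool it exposes.

    # Minimal MCP tool server using the open-source `mcp` Python SDK.
    # The tool and its data are stubs; nothing here is specific to Red Hat AI 3.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("inventory-tools")          # arbitrary server name

    @mcp.tool()
    def check_stock(sku: str) -> int:
        """Return the on-hand quantity for a SKU (stubbed data)."""
        fake_inventory = {"A-100": 42, "B-200": 0}
        return fake_inventory.get(sku, 0)

    if __name__ == "__main__":
        mcp.run()                             # serves over stdio by default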

Red Hat AI 3 also introduces a model customization toolkit based on the InstructLab open-source project that supports community contributions to large language models. It provides specialized Python libraries that give developers greater flexibility and control and uses the open-source Docling project to streamline the ingestion of unstructured documents into an AI-readable format.

“This allows AI engineers to develop and tune models the way that they’re most accustomed,” Huels said.
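
Docling itself has a compact Python interface; a minimal sketch of converting an unstructured document into model-ready text looks like this, with the file path as a placeholder.

    # Minimal Docling sketch: convert an unstructured document to AI-readable text.
    # The input path is a placeholder.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert("reports/q3-earnings.pdf")     # placeholder input file
    print(result.document.export_to_markdown())               # text a model can ingest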

Red Hat is also enhancing retrieval-augmented generation and integration with Kubeflow workflow automation, and launching Feature Store, a component within the OpenShift AI platform that provides a centralized repository for storing, managing and serving machine learning features.
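
Assuming Feature Store follows the conventions of the open-source Feast project it resembles (an assumption, not something stated in the announcement), retrieving features at inference time would look roughly like this; the entity and feature names are placeholders.

    # Hedged sketch of online feature retrieval in the style of the open-source Feast
    # project; feature names, entity keys and repo path are placeholders, and Red Hat's
    # Feature Store is assumed, not confirmed, to work this way.
    from feast import FeatureStore

    store = FeatureStore(repo_path=".")       # points at a feature repository definition
    features = store.get_online_features(
        features=[
            "customer_stats:avg_order_value",
            "customer_stats:days_since_last_order",
        ],
        entity_rows=[{"customer_id": 1001}],
    ).to_dict()
    print(features)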

Photo: Red Hat
