

Red Hat Inc. today announced a series of updates aimed at making generative artificial intelligence more accessible and manageable in enterprises.
They include the debut of the Red Hat AI Inference Server, a collection of third-party validated AI models and enhancements across Red Hat OpenShift AI and Red Hat Enterprise Linux AI. The updates are collectively intended to reduce deployment complexity, standardize model inference, and broaden multicloud and multilanguage support.
Inference, which is the process of running trained models to generate predictions or responses, is increasingly becoming a bottleneck in enterprise AI workflows because of growing model sizes and hardware demands. The new AI Inference Server is meant to address this roadblock by providing high-throughput, cost-efficient inference that works across hybrid and multicloud environments.
It’s built on the open-source vLLM project, a high-performance inference engine that supports continuous batching, multiple graphics processing units and large context inputs. The engine has been adopted as a de facto standard by several model developers, including those behind Llama, Gemma and Phi.
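For context, a minimal sketch of what serving a model with the upstream vLLM library looks like in Python is below; the model name and parallelism setting are illustrative assumptions, not details Red Hat has published:

```python
# Minimal offline-inference sketch using the open-source vLLM library.
# The model ID and tensor_parallel_size are illustrative assumptions.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the benefits of hybrid cloud in one sentence.",
    "What is continuous batching in LLM inference?",
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches incoming requests internally (continuous batching) and can
# shard the model across GPUs via tensor parallelism.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2)

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```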
“Red Hat AI Inference Server becomes a unified product for deploying vLLM as a container, delivering two to four times more token production with pre-optimized models,” said Brian Stevens, senior vice president and AI chief technology officer at Red Hat.
Red Hat described its version as a hardened, enterprise-ready package that includes compression technology from Neural Magic Inc. and a curated, optimized model catalog hosted by Hugging Face Inc. It said its goal is to provide consistent, resource-efficient inference across different cloud providers, hardware accelerators, and operating systems, including non-Red Hat platforms.
The embrace of vLLM is notable. By betting on an open-source inference engine and contributing enterprise-grade tooling and support, Red Hat is signaling its intent to shape this emerging layer of the AI stack much like it did with containers and Kubernetes through branded products such as OpenShift.
In conjunction, Red Hat introduced “AI Validated Models,” a set of third-party models available via Hugging Face that are tested for compatibility and performance within the Red Hat AI stack. They’re intended to simplify the model selection process and give information technology teams confidence in performance. Red Hat says ongoing validation will help keep the collection up to date.
The model catalog integrates directly with Red Hat OpenShift AI and RHEL AI and is also accessible to users of the standalone AI Inference Server.
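As a rough illustration of how such a catalog would be consumed, the following sketch uses the standard huggingface_hub client; the repository name is a hypothetical placeholder, not a confirmed entry in Red Hat's catalog:

```python
# Sketch: pulling a pre-optimized model from a Hugging Face repository
# for local serving. The repo_id below is a hypothetical placeholder,
# not a confirmed Red Hat catalog entry.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="RedHatAI/example-quantized-model",  # hypothetical
    local_dir="./models/example-quantized-model",
)
print(f"Model files downloaded to {local_path}")
```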
“We’ll be publishing models that we’ve actually tested from different providers… available online through our Hugging Face repository,” said Joe Fernandes, vice president and general manager of the AI business unit at Red Hat.
Other Red Hat AI platforms are getting updates as well. OpenShift AI version 2.20 introduces a preview of a new model catalog interface, distributed training capabilities for PyTorch via Kubeflow, and a feature store based on Feast to centralize and manage training and inference data.
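To illustrate the feature-store idea, here is a minimal sketch of reading online features with the open-source Feast library at inference time; the repository path, feature view and entity names are hypothetical:

```python
# Sketch: fetching features from a Feast feature store at inference time.
# The repo path, feature view name, and entity key are hypothetical.
from feast import FeatureStore

store = FeatureStore(repo_path="./feature_repo")

features = store.get_online_features(
    features=[
        "customer_stats:avg_order_value",   # hypothetical feature view
        "customer_stats:orders_last_30d",
    ],
    entity_rows=[{"customer_id": 1001}],
).to_dict()

print(features)
```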
RHEL AI version 1.5 now supports Google Cloud Marketplace deployments, joining Amazon Web Services Inc. and Microsoft Corp. Azure in Red Hat’s multicloud lineup. Also included is enhanced multilingual support for Spanish, German, French, and Italian through InstructLab, an open-source project for enhancing large language models. Support for Japanese, Hindi, and Korean is on deck. Customers can bring their own “teacher” and “student” models for fine-tuning and evaluation.
By integrating AI lifecycle management into its existing OpenShift and RHEL platforms, Red Hat targets organizations that want to scale AI workloads without reinventing their infrastructure or retraining their operations teams.
“AI is the next set of workloads that customers need to deploy,” Fernandes said. “We plan to extend that with our AI platforms and work with services partners to bring those solutions to customers.”