

Red Hat Inc. today announced a series of updates aimed at making generative artificial intelligence more accessible and manageable in enterprises.
They include the debut of the Red Hat AI Inference Server, a collection of third-party validated AI models and enhancements across Red Hat OpenShift AI and Red Hat Enterprise Linux AI. The updates are collectively intended to reduce deployment complexity, standardize model inference, and broaden multicloud and multilanguage support.
Inference, which is the process of running trained models to generate predictions or responses, is increasingly becoming a bottleneck in enterprise AI workflows because of growing model sizes and hardware demands. The new AI Inference Server is meant to address this roadblock by providing high-throughput, cost-efficient inference that works across hybrid and multicloud environments.
It’s built on the open-source vLLM project, a high-performance inference engine that supports continuous batching, multiple graphics processing units and large context inputs. The engine has been adopted as a de facto standard by several model developers, including those behind Llama, Gemma and Phi.
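For context, a minimal sketch of what serving a model with the upstream vLLM library looks like in Python is below; the model name and parallelism setting are illustrative assumptions, not details Red Hat has published:

```python
# Minimal offline-inference sketch using the open-source vLLM library.
# The model ID and tensor_parallel_size are illustrative assumptions.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the benefits of hybrid cloud in one sentence.",
    "What is continuous batching in LLM inference?",
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches incoming requests internally (continuous batching) and can
# shard the model across GPUs via tensor parallelism.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2)

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```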
“Red Hat AI Inference Server becomes a unified product for deploying vLLM as a container, delivering two to four times more token production with pre-optimized models,” said Brian Stevens, senior vice president and AI chief technology officer at Red Hat.
Red Hat described its version as a hardened, enterprise-ready package that includes compression technology from Neural Magic Inc. and a curated, optimized model catalog hosted by Hugging Face Inc. It said its goal is to provide consistent, resource-efficient inference across different cloud providers, hardware accelerators, and operating systems, including non-Red Hat platforms.
The embrace of vLLM is notable. By betting on an open-source inference engine and contributing enterprise-grade tooling and support, Red Hat is signaling its intent to shape this emerging layer of the AI stack much like it did with containers and Kubernetes through branded products such as OpenShift.
In conjunction, Red Hat introduced “AI Validated Models,” a set of third-party models available via Hugging Face that are tested for compatibility and performance within the Red Hat AI stack. They’re intended to simplify the model selection process and give information technology teams confidence in performance. Red Hat says ongoing validation will help keep the collection up to date.
The model catalog integrates directly with Red Hat OpenShift AI and RHEL AI and is also accessible to users of the standalone AI Inference Server.
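As a rough illustration of how such a catalog would be consumed, the following sketch uses the standard huggingface_hub client; the repository name is a hypothetical placeholder, not a confirmed entry in Red Hat's catalog:

```python
# Sketch: pulling a pre-optimized model from a Hugging Face repository
# for local serving. The repo_id below is a hypothetical placeholder,
# not a confirmed Red Hat catalog entry.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="RedHatAI/example-quantized-model",  # hypothetical
    local_dir="./models/example-quantized-model",
)
print(f"Model files downloaded to {local_path}")
```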
“We’ll be publishing models that we’ve actually tested from different providers… available online through our Hugging Face repository,” said Joe Fernandes, vice president and general manager of the AI business unit at Red Hat.
Other Red Hat AI platforms are getting updates as well. OpenShift AI version 2.20 introduces a preview of a new model catalog interface, distributed training capabilities for PyTorch via Kubeflow, and a feature store based on Feast to centralize and manage training and inference data.
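To illustrate the feature-store idea, here is a minimal sketch of reading online features with the open-source Feast library at inference time; the repository path, feature view and entity names are hypothetical:

```python
# Sketch: fetching features from a Feast feature store at inference time.
# The repo path, feature view name, and entity key are hypothetical.
from feast import FeatureStore

store = FeatureStore(repo_path="./feature_repo")

features = store.get_online_features(
    features=[
        "customer_stats:avg_order_value",   # hypothetical feature view
        "customer_stats:orders_last_30d",
    ],
    entity_rows=[{"customer_id": 1001}],
).to_dict()

print(features)
```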
RHEL AI version 1.5 now supports Google Cloud Marketplace deployments, joining Amazon Web Services Inc. and Microsoft Corp. Azure in Red Hat’s multicloud lineup. Also included is enhanced multilingual support for Spanish, German, French, and Italian through InstructLab, an open-source project for enhancing large language models. Support for Japanese, Hindi, and Korean is on deck. Customers can bring their own “teacher” and “student” models for fine-tuning and evaluation.
By integrating AI lifecycle management into its existing OpenShift and RHEL platforms, Red Hat targets organizations that want to scale AI workloads without reinventing their infrastructure or retraining their operations teams.
“AI is the next set of workloads that customers need to deploy,” Fernandes said. “We plan to extend that with our AI platforms and work with services partners to bring those solutions to customers.”