UPDATED 09:52 EDT / MAY 13 2026

Taneem Ibrahim, director of engineering for AI inference at Red Hat, and Bill Pearson, vice president of data center and AI at Intel Corp., discussed scalable AI inference during Red Hat Summit 2026.

Red Hat and Intel spotlight scalable AI inference as enterprises move beyond the GPU gold rush

As companies move from testing AI to broader adoption, the biggest challenge is building scalable AI inference systems that perform without breaking the budget. The next wave of AI won’t be won on raw power alone — it will be decided by who can do more with less.

When AI inference first took off following the rise of ChatGPT and open-weight models, the focus was on deploying the largest possible models across massive GPU clusters. That’s when customers turned to Red Hat Inc., looking for ways to scale those models across platforms like Red Hat Enterprise Linux and OpenShift without sacrificing control or cost efficiency, according to Taneem Ibrahim (pictured, right), director of engineering for AI inference at Red Hat.

“That’s when the friction moment came in for us, like, ‘How do I take this project — called vLLM, [which] we’re the largest commercial contributor to — and work it at scale with a project like llm-d?’” Ibrahim said. “How you drive the cost per token down so that you can operationalize your AI, you can govern your AI [and] you can deploy it at scale?”

Ibrahim and Bill Pearson (left), vice president of data center and AI at Intel Corp., spoke with theCUBE’s Rob Strechay and Rebecca Knight at Red Hat Summit 2026, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed the development of scalable AI inference systems and the growing role of open-source, CPU-driven AI deployments. (* Disclosure below.)

Scalable AI inference shifts infrastructure priorities

As agentic AI reshapes infrastructure demands, CPUs are playing a bigger role than they did during the earlier GPU-heavy phase of adoption, Pearson explained. Companies are now focused on finding the right balance of CPUs and GPUs to meet performance needs efficiently, a shift that underpins Red Hat and Intel’s latest collaboration: full vLLM support for Intel Xeon in Red Hat AI 3.4.

“It isn’t a one-size-fits-all approach, but rather, ‘What’s my workload? What’s the outcome I’m looking for?’” he said. “‘How do I put together the right combination of hardware and software to go and deliver that outcome?’”
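For readers curious what CPU-backed serving looks like in practice, here is a rough illustration using upstream vLLM, which Red Hat AI packages. This is a sketch only: it assumes a CPU build of vLLM, the model name is a placeholder, and flags and environment variables vary by version.

```shell
# Illustrative only: assumes a CPU build of upstream vLLM
# (e.g., installed with VLLM_TARGET_DEVICE=cpu on an x86 host).
# The model name below is a placeholder, not a Red Hat AI default.
export VLLM_CPU_KVCACHE_SPACE=8        # GiB of host RAM reserved for the KV cache
vllm serve Qwen/Qwen2.5-7B-Instruct    # exposes an OpenAI-compatible API, port 8000 by default
```

Because the server speaks the OpenAI-compatible API either way, applications do not need to know whether tokens are coming from a Xeon pool or a GPU pool.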

Part of that calculus is recognizing the hardware companies already have. CPUs are already deployed across most data centers, and a growing share of inference workloads — particularly agentic tasks like tool calling and data orchestration — don’t require GPUs at all. That frees up GPU capacity for the heavy lifting, according to Pearson.

“As we’ve gone through this with our customers in the industry, we’ve seen that people often have just assumed, ‘I’ve got a hammer. I need the nail to hit it with,’” he said. “Once they take a step back to say, ‘Wait a minute. I have these CPUs in my data center’ — or, ‘I need to figure out how to balance the right number of CPUs with the right number of GPUs to achieve that outcome I’m looking for’ — they’re actually going to get better results at a better price point for delivering lower-cost tokens.”
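Pearson’s balancing point can be sketched as a simple dispatch policy: lightweight agentic steps such as tool calling and data orchestration go to a CPU pool, while heavy generation stays on GPUs. The endpoint names and the classification heuristic below are invented for illustration; this is not Red Hat’s or Intel’s implementation.

```python
# Hypothetical sketch of routing inference requests by workload type.
# Endpoint URLs and the CPU_FRIENDLY task set are assumptions for
# illustration, not real services.

CPU_ENDPOINT = "http://cpu-pool.example:8000"   # Xeon-backed vLLM servers
GPU_ENDPOINT = "http://gpu-pool.example:8000"   # accelerator-backed servers

# Lightweight agentic steps a CPU pool could absorb.
CPU_FRIENDLY = {"tool_call", "orchestration", "embedding_lookup"}

def route(request: dict) -> str:
    """Pick a serving pool for one request.

    Agentic steps and tiny completions go to CPUs, freeing
    GPU capacity for long-context, high-token generation.
    """
    if request.get("task") in CPU_FRIENDLY:
        return CPU_ENDPOINT
    if request.get("max_tokens", 0) <= 64:      # small completions: CPU is fine
        return CPU_ENDPOINT
    return GPU_ENDPOINT

requests = [
    {"task": "tool_call", "max_tokens": 32},
    {"task": "chat", "max_tokens": 2048},
]
print([route(r) for r in requests])
```

In a real deployment this decision would live in a gateway or scheduler (llm-d addresses distributed serving of this kind), but the economics are the same: every request the CPU pool absorbs is GPU time returned to the heavy lifting.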

Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of Red Hat Summit 2026:

(* Disclosure: Red Hat sponsored this segment of theCUBE. Neither Red Hat nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
