UPDATED 19:03 EST / MARCH 01 2026

INFRA

Report: Nvidia is working on a top-secret AI inference chip that could debut this month

Nvidia Corp. is working on a dedicated inference processor that OpenAI Group PBC and other artificial intelligence companies will use to develop faster and more efficient models, according to a report late Friday in the Wall Street Journal.

The new inference platform is expected to be launched at Nvidia’s annual GTC developer conference in San Jose later this month, and will integrate technology the company licensed from the chip startup Groq Inc. in December.

Inference, the process of running trained AI models in production, has emerged as a key area of focus in the AI industry. Nvidia rivals such as Google LLC and Amazon Web Services Inc. have both developed specialized inference chips that compete with its graphics processing units, and Nvidia also faces competition from dedicated inference chip startups such as Cerebras Systems Inc. and SambaNova Systems Inc.
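
Concretely, inference is just the forward pass of an already-trained model, with none of the gradient bookkeeping that training requires. The minimal PyTorch sketch below (a toy model, purely for illustration; it is not any vendor’s production system or chip software stack) shows the distinction:

    import torch
    import torch.nn as nn

    # A toy network standing in for a trained AI model (hypothetical, for illustration).
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    model.eval()  # switch layers such as dropout to their inference-time behavior

    x = torch.randn(1, 16)  # one incoming request, as served in production

    # Inference runs only the forward pass. No gradients are computed or stored,
    # which is the property dedicated inference hardware exploits: it can drop
    # the training machinery in exchange for throughput and energy efficiency.
    with torch.no_grad():
        logits = model(x)

    print(logits.argmax(dim=-1))  # the model's prediction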

The Journal said OpenAI has had early access to Nvidia’s new inference chip and will be among its earliest adopters, a significant win for the chipmaker. Though OpenAI has been shopping for more efficient alternatives to Nvidia’s GPUs to diversify its computing stack, it received $30 billion in funding from Nvidia last week in a deal that reaffirms its commitment to the company.

Nvidia is the world’s dominant maker of GPUs, specialized processors that can perform billions of calculations in parallel. But although the company insists they’re useful for both training and inference, its GPUs are no longer considered the most efficient option for powering AI applications. Many companies have found that Nvidia’s chips consume too much energy, making them extremely costly for applications such as AI agents, which carry out tasks autonomously on behalf of human users and require immense computing power.

That’s why OpenAI signed a multibillion-dollar contract with Cerebras last month to access its dinner-plate-sized, inference-focused chips. Cerebras claims its silicon is much faster than Nvidia’s GPUs at inference tasks.

Nvidia’s inference chip will reportedly integrate technology developed by Groq. Nvidia paid $20 billion in December to license Groq’s technology on a nonexclusive basis, and as part of the deal it also hired the startup’s founding Chief Executive Officer Jonathan Ross and its President Sunny Madra. The arrangement was billed at the time as one of the largest “acquihires” in Silicon Valley history.

Groq’s inference chips, known as “language processing units,” are built on a novel architecture that enables them to perform inference with far lower energy usage than GPUs. However, Nvidia hasn’t said how it plans to use the startup’s technology.

If the report is true and Nvidia does announce a dedicated chip for inference, it would make for a notable U-turn considering its stance just one year earlier, said Holger Mueller of Constellation Research. “At last year’s GTC, Nvidia CEO Jensen Huang was talking about the exploding demand for AI inference, and he positioned this as a win for the company, saying its current chip offerings would address these workloads,” the analyst pointed out. “But it’s coming under pressure, with many reports saying Google wants to do 10% of Nvidia’s business. So given Nvidia’s engineering roots, it wouldn’t be a surprise if it suddenly unveiled a more efficient chip architecture for inference soon.”

OpenAI reportedly wants to use Nvidia’s new inference chip to power its Codex programming tool, a rival to Anthropic PBC’s Claude Code. Coding applications have emerged as one of the most powerful and profitable use cases for generative AI, and it’s an area where OpenAI ranks second: Claude Code is widely considered the market leader.

Nvidia is also pushing its central processing units as another option for running inference workloads. Traditionally, most companies pair its GPUs with CPUs, using the two chips in tandem so that each compensates for the other’s inefficiencies.

But Nvidia says some agentic AI workloads can actually run more efficiently on its most advanced Grace CPUs alone. Last month, Meta Platforms Inc. became the first company to commit to a sizable CPU-only deployment, which will support its ad-targeting agents in production.

Image: SiliconANGLE/Microsoft Designer
