The next great AI race may be won not on raw compute, but on memory. With memory shortages exposing hard limits across data centers, AI memory optimization is taking center stage as a way to unlock more tokens, more efficiency and more value from the same infrastructure.
What was once a backstage infrastructure concern is now a primary competitive advantage, according to Val Bercovici (pictured, left), chief AI officer of WekaIO Inc. In a proof of concept conducted with Firmus Technologies Pty Ltd., Weka used storage economics to extend memory — preserving context instead of repeatedly reprocessing it on GPUs — to show how organizations could dramatically increase token output without increasing energy consumption.
“The results were what we expected, which was [that] you’re able to get — out of the same CapEx and OpEx, the same GPUs and energy cost — 6.5 times more, so 550% more, tokens,” Bercovici said. “It’s as if in a macro scenario, you just created five and a half new data centers out of thin air.”
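The arithmetic in that quote checks out: a 6.5x multiplier on token output is a 550% increase over the baseline, or the equivalent of 5.5 extra data centers' worth of output on top of the original one. A quick sketch (the baseline token count is a hypothetical figure, chosen only for illustration):

```python
# Checking the quoted figures: 6.5x output from the same hardware
# is a 550% increase, i.e. 5.5 "new data centers out of thin air."
baseline_tokens = 1_000_000   # hypothetical daily token output
multiplier = 6.5              # figure reported from the POC

optimized_tokens = baseline_tokens * multiplier
increase_pct = (optimized_tokens - baseline_tokens) / baseline_tokens * 100
extra_capacity = multiplier - 1   # data-center equivalents gained

print(increase_pct)    # 550.0
print(extra_capacity)  # 5.5
```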
Bercovici and Daniel Kearney (right), chief technology officer of Firmus Technologies Pty Ltd., spoke with theCUBE’s Gemma Allen at the Nvidia GTC AI Conference & Expo, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed AI memory optimization, the rise of agents and how a joint POC between Weka and Firmus showed more tokens using the same power budget. (* Disclosure below.)
A significant challenge in current AI infrastructure is the redundancy of “prefilling” data, according to Bercovici. When memory windows are limited, GPUs often evict old prompts to make room for new ones, forcing the system to reprocess information over and over. This recompute tax is particularly problematic for long-running agents that require deep context to remain effective over hours or days.
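The dynamic Bercovici describes can be sketched with a toy cost model. Everything below is illustrative — the `session_prefill` helper, the token counts and the eviction behavior are our own simplifying assumptions, not Weka's implementation — but it captures the shape of the recompute tax: once a session's context outgrows the cache, every turn pays a growing re-prefill bill, while an extended cache pays for each token only once.

```python
# Toy model of the "recompute tax": each agent turn needs the full
# conversation in the KV cache. If capacity is too small, evicted
# tokens must be reprocessed ("prefilled") again on later turns.
# All numbers are hypothetical and for illustration only.

def session_prefill(turns, turn_len, capacity):
    """Total tokens prefilled across an agent session."""
    context = 0    # full conversation length so far
    resident = 0   # tokens still held in the KV cache
    processed = 0
    for _ in range(turns):
        context += turn_len                # new turn extends the context
        processed += context - resident    # re-prefill evicted + new tokens
        resident = min(context, capacity)  # cache evicts down to capacity
    return processed

# Small cache: work grows quadratically as context outgrows capacity.
limited = session_prefill(turns=100, turn_len=500, capacity=2_000)
# "Extended memory": each token is prefilled once, cost stays linear.
extended = session_prefill(turns=100, turn_len=500, capacity=10**9)

print(limited, extended, round(limited / extended, 1))  # 2330000 50000 46.6
```

In this made-up session, the memory-starved configuration reprocesses roughly 46 times more tokens than the extended-memory one — which is why the penalty compounds for agents that run for hours or days.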
“This ability to bring in more specific silicon or more extended context to fit workloads, even retrospectively, continues to extend the usefulness of a CapEx investment that happened prior — that’s huge,” Kearney said. “We can engineer out obsolescence. We can now bring in an existing GPU-based system to market that’s ready for the next generation of workloads without having to redeploy and throw out the old to bring in a whole new system.”
The partnership pairs memory-extension technology with AI factory infrastructure: Weka supplied the software approach with Augmented Memory Grid on NeuralMesh, while Firmus provided the environment and GPU resources to prove it out under real conditions. In testing, that combination let agents preserve context instead of constantly re-prefilling, unlocking more token output from the same GPUs and power envelope. Those gains matter because they compound over the long-running, memory-intensive workloads that increasingly define agentic AI, Bercovici explained.
“That compound benefit of having faster and faster agent turns — spread out over tens of thousands of agent turns — means drugs are being discovered faster. Cures are being discovered faster. Trades are being optimized better. There’s so many use cases right now where there’s massive business value,” he said. “The winners and losers will be determined by who seizes the moment right now.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the Nvidia GTC AI Conference & Expo:
(* Disclosure: Weka sponsored this segment of theCUBE. Neither Weka nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)