The next great AI race may be won not on raw compute, but on memory. With memory shortages exposing hard limits across data centers, AI memory optimization is taking center stage as a way to unlock more tokens, more efficiency and more value from the same infrastructure.
What was once a backstage infrastructure concern is now a primary competitive advantage, according to Val Bercovici (pictured, left), chief AI officer of WekaIO Inc. In a proof of concept conducted with Firmus Technologies Pty Ltd., Weka used storage economics to extend memory — preserving context instead of repeatedly reprocessing it on GPUs — to show how organizations could dramatically increase token output without increasing energy consumption.
“The results were what we expected, which was [that] you’re able to get — out of the same CapEx and OpEx, the same GPUs and energy cost — 6.5 times more, so 550% more, tokens,” Bercovici said. “It’s as if in a macro scenario, you just created five and a half new data centers out of thin air.”
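The arithmetic in that quote checks out: a 6.5x multiplier on token output is a 550% increase over the baseline, or the equivalent of 5.5 extra data centers' worth of output on top of the original one. A quick sketch (the baseline token count is a hypothetical figure, chosen only for illustration):

```python
# Checking the quoted figures: 6.5x output from the same hardware
# is a 550% increase, i.e. 5.5 "new data centers out of thin air."
baseline_tokens = 1_000_000   # hypothetical daily token output
multiplier = 6.5              # figure reported from the POC

optimized_tokens = baseline_tokens * multiplier
increase_pct = (optimized_tokens - baseline_tokens) / baseline_tokens * 100
extra_capacity = multiplier - 1   # data-center equivalents gained

print(increase_pct)    # 550.0
print(extra_capacity)  # 5.5
```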
Bercovici and Daniel Kearney (right), chief technology officer of Firmus Technologies Pty Ltd., spoke with theCUBE’s Gemma Allen at the Nvidia GTC AI Conference & Expo, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed AI memory optimization, the rise of agents and how a joint POC between Weka and Firmus showed more tokens using the same power budget. (* Disclosure below.)
A significant challenge in current AI infrastructure is the redundancy of “prefilling” data, according to Bercovici. When memory windows are limited, GPUs often evict old prompts to make room for new ones, forcing the system to reprocess information over and over. This recompute tax is particularly problematic for long-running agents that require deep context to remain effective over hours or days.
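The dynamic Bercovici describes can be sketched with a toy cost model. Everything below is illustrative — the `session_prefill` helper, the token counts and the eviction behavior are our own simplifying assumptions, not Weka's implementation — but it captures the shape of the recompute tax: once a session's context outgrows the cache, every turn pays a growing re-prefill bill, while an extended cache pays for each token only once.

```python
# Toy model of the "recompute tax": each agent turn needs the full
# conversation in the KV cache. If capacity is too small, evicted
# tokens must be reprocessed ("prefilled") again on later turns.
# All numbers are hypothetical and for illustration only.

def session_prefill(turns, turn_len, capacity):
    """Total tokens prefilled across an agent session."""
    context = 0    # full conversation length so far
    resident = 0   # tokens still held in the KV cache
    processed = 0
    for _ in range(turns):
        context += turn_len                # new turn extends the context
        processed += context - resident    # re-prefill evicted + new tokens
        resident = min(context, capacity)  # cache evicts down to capacity
    return processed

# Small cache: work grows quadratically as context outgrows capacity.
limited = session_prefill(turns=100, turn_len=500, capacity=2_000)
# "Extended memory": each token is prefilled once, cost stays linear.
extended = session_prefill(turns=100, turn_len=500, capacity=10**9)

print(limited, extended, round(limited / extended, 1))  # 2330000 50000 46.6
```

In this made-up session, the memory-starved configuration reprocesses roughly 46 times more tokens than the extended-memory one — which is why the penalty compounds for agents that run for hours or days.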
“This ability to bring in more specific silicon or more extended context to fit workloads, even retrospectively, continues to extend the usefulness of a CapEx investment that happened prior — that’s huge,” Kearney said. “We can engineer out obsolescence. We can now bring in an existing GPU-based system to market that’s ready for the next generation of workloads without having to redeploy and throw out the old to bring in a whole new system.”
The partnership pairs memory-extension technology with AI factory infrastructure: Weka supplied the software approach with Augmented Memory Grid on NeuralMesh, while Firmus provided the environment and GPU resources to prove it out under real conditions. In testing, that combination let agents preserve context instead of constantly re-prefilling, unlocking more token output from the same GPUs and power envelope. Those gains matter because they compound over the long-running, memory-intensive workloads that increasingly define agentic AI, Bercovici explained.
“That compound benefit of having faster and faster agent turns — spread out over tens of thousands of agent turns — means drugs are being discovered faster. Cures are being discovered faster. Trades are being optimized better. There’s so many use cases right now where there’s massive business value,” he said. “The winners and losers will be determined by who seizes the moment right now.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the Nvidia GTC AI Conference & Expo:
(* Disclosure: Weka sponsored this segment of theCUBE. Neither Weka nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)