Artificial intelligence inference is entering a new era defined not by compute alone, but by an escalating demand for context memory that traditional storage architectures were never designed to handle.
Inference didn’t hit a compute wall; it hit a context memory wall. As AI workloads evolve from single-shot prompts to multi-turn, agentic sessions with million-token context windows, the volume of key-value (KV) cache data is swelling into the petabytes, outpacing what GPU and DRAM memory tiers can absorb. Meanwhile, the global NAND shortage has moved from a supply-chain talking point to a material operational risk for organizations running AI workloads at scale. Together, these pressures are reshaping how storage companies approach AI factory design, according to Betsy Chernoff (pictured, left), principal AI and product marketing manager at WekaIO Inc.
“If you think about it from a level of where we started from even a year ago, people were just doing single shot prompts,” Chernoff said. “But as we’ve grown, you’ve seen things like multi-turn, concurrency, many users, many different rounds of conversations. Then, in addition to that, the context lengths themselves have grown. All of these have exponentially increased the amount of memory required for these systems.”
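For a rough sense of the scale Chernoff describes, here is a back-of-envelope estimate of KV cache footprint. The model dimensions are illustrative, loosely modeled on a 70-billion-parameter transformer with grouped-query attention, and are not figures cited by Weka or Solidigm:

```python
# Back-of-envelope KV cache sizing. All model parameters are
# illustrative assumptions, not vendor-supplied figures.

N_LAYERS = 80      # transformer layers
N_KV_HEADS = 8     # key/value heads under grouped-query attention
HEAD_DIM = 128     # dimension per attention head
BYTES = 2          # fp16/bf16 bytes per element

# Each token stores one key and one value vector per layer.
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES

context_len = 1_000_000   # a million-token context window
sessions = 10_000         # concurrent multi-turn sessions

per_session = kv_bytes_per_token * context_len
total = per_session * sessions

print(f"KV bytes per token:   {kv_bytes_per_token / 1024:.0f} KiB")
print(f"Per 1M-token session: {per_session / 1e9:.0f} GB")
print(f"Across all sessions:  {total / 1e15:.1f} PB")
```

Under these assumptions, each token carries about 320 KiB of KV state, a single million-token session holds roughly 328 gigabytes, and 10,000 concurrent sessions exceed three petabytes, far beyond what GPU and DRAM tiers can hold.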
Chernoff and Ace Stryker (right), director of AI marketing and ecosystem at Solidigm, a trademark of SK hynix NAND Product Solutions Corp., spoke with theCUBE’s Gemma Allen at the Nvidia GTC AI Conference & Expo, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how context memory is creating an entirely new storage tier in AI clusters and why the current NAND shortage makes efficiency more critical than ever. (* Disclosure below.)
At GTC 2026, Nvidia announced BlueField-4 STX, a modular reference architecture that inserts a dedicated context memory layer between GPUs and traditional storage. The first rack-scale implementation includes the new Nvidia CMX context memory storage platform, which expands GPU memory with a high-performance context layer for scalable inference and agentic systems. The announcement validates a direction both Weka and Solidigm have been building toward, according to Stryker.
“It feels like storage kind of got a promotion this year,” he said, describing context memory as a new, third responsibility layered on top of storage’s established roles. “That third job is new dedicated nodes specifically for storing context memory or KV cache. That’s a completely new tier of storage in an AI cluster. And, frankly, the market was already under siege and feeling intense demand before that announcement.”
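Conceptually, the new tier sits between GPU memory and conventional storage and is checked before any prefill recompute. The sketch below is purely schematic; the class, method and tier names are hypothetical and do not describe Weka’s, Solidigm’s or Nvidia’s actual interfaces:

```python
# Schematic of a tiered KV cache lookup: GPU memory first, then a
# flash-backed context tier, with a full prefill recompute only on a
# miss. Every name here is a hypothetical illustration, not a real
# product API.
import hashlib

def prefix_key(token_ids: list[int]) -> str:
    """Content-address a prompt prefix so identical prefixes hit."""
    return hashlib.sha256(str(token_ids).encode()).hexdigest()

class TieredKVCache:
    def __init__(self):
        self.gpu_tier = {}      # stand-in for hot KV blocks in HBM
        self.context_tier = {}  # stand-in for the persistent flash tier

    def fetch(self, token_ids, recompute):
        key = prefix_key(token_ids)
        if key in self.gpu_tier:          # fastest path: already in HBM
            return self.gpu_tier[key]
        if key in self.context_tier:      # hit: stream from flash and
            kv = self.context_tier[key]   # skip the prefill recompute
            self.gpu_tier[key] = kv       # promote to the hot tier
            return kv
        kv = recompute(token_ids)         # miss: pay the full prefill
        self.context_tier[key] = kv       # persist so no future session
        self.gpu_tier[key] = kv           # recomputes this prefix
        return kv
```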
Weka has been preparing for this shift since it unveiled Augmented Memory Grid at GTC 2025. At this year’s show, Chernoff pointed to a production-grade proof of concept with Firmus that delivered up to 6x improvement in tokens per second, underscoring the real-world impact of persistent KV cache storage.
“When we talk about numbers for token throughput, and we talk about things like customers never having to recompute another token unnecessarily, all of this impacts your ROI,” Chernoff said. “And that includes our partnership with Solidigm as well, because we can’t do this without you guys.”
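The arithmetic behind a result like that is straightforward: restoring a saved KV cache replaces the prefill recompute that otherwise dominates each conversational turn. The numbers below are illustrative assumptions, not the Firmus benchmark’s methodology:

```python
# Toy model of why persistent KV cache lifts effective throughput.
# All speeds and lengths are assumed for illustration only.

PREFILL_TPS = 10_000   # assumed prefill rate, tokens/s
DECODE_TPS = 100       # assumed decode rate, tokens/s per session
CONTEXT = 200_000      # tokens of prior conversation per turn
ANSWER = 500           # new tokens generated per turn

# Cold turn: re-prefill the entire context, then decode the answer.
cold = CONTEXT / PREFILL_TPS + ANSWER / DECODE_TPS
# Warm turn: KV cache restored from the context tier, decode only
# (assumes the restore is fast relative to recompute).
warm = ANSWER / DECODE_TPS

print(f"cold turn: {cold:.0f} s, warm turn: {warm:.0f} s")
print(f"speedup:   {cold / warm:.1f}x")
```

With these toy numbers, a warm turn completes about 5x faster than a cold one, in the same ballpark as the 6x figure cited, though the real gain depends on prefill speed, context length and how quickly the cache can be restored.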
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the Nvidia GTC AI Conference & Expo:
(* Disclosure: Solidigm sponsored this segment of theCUBE. Neither Solidigm nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)