UPDATED 15:28 EDT / SEPTEMBER 10 2025

‘Kicking our butts’: Rapid pace of AI development sparks an urgent push to build better infrastructure

Artificial intelligence innovation is moving at warp speed, but major tech industry players are sounding alarm bells that infrastructure is failing to keep pace with advancements in the field.

“AI is kicking our butts and teaching us that we know nothing about infrastructure,” Yee Jiun Song, vice president of engineering at Meta Platforms Inc., said Tuesday at the AI Infra Summit in Santa Clara, California.

Dion Harris, senior director of AI and HPC infrastructure solutions at Nvidia Corp., zeroed in on the fundamental disconnect at the conference: New AI models are being introduced every week, but the time frame for building out the infrastructure to support them is currently measured in years.

“We have to get everyone else to be prepared for where we’re going,” Harris told the gathering. “The biggest challenge is making sure that everyone is ready to come with us. There is this misalignment of time scales. That in and of itself is a challenge.”

Nvidia previews faster inferencing processor

For its part, Nvidia on Tuesday previewed an upcoming chip, the Rubin CPX, that is designed to provide 8 exaflops of computing capacity for AI inferencing. According to the chipmaker, the Rubin CPX will be able to handle certain large language model mechanisms three times faster than its current-generation silicon. The chip reflects Nvidia’s philosophy that an investment of several million dollars in infrastructure can generate tens of millions in token revenue.

“The performance of the platform is the revenue of an AI factory,” Ian Buck, vice president of hyperscale and high-performance computing at Nvidia, said during a keynote appearance. “This is how we feel about inference.”

Yee Jiun Song, vice president of engineering at Meta Platforms, spoke at the AI Infra Summit.

Though Nvidia’s latest chip will help boost computing capacity for AI inferencing and specific LLM tasks, the scale of AI adoption is forcing model providers to invest hundreds of billions of dollars to build out new data center clusters. One of the more notable examples of this is the Prometheus supercluster under development by Meta. Scheduled to come online in 2026, the Ohio-based facility will be one of the first gigawatt data center clusters in the AI era.

“Meta is now only one of a few companies that are racing to build data centers at this scale,” Song said. “There never has been a more exciting time to be working in infrastructure.”

Prometheus is just a warm-up for future data center clusters in the planning stage. Meta has also announced Hyperion, a second data center cluster that is expected to require up to 5 gigawatts of power. Although Meta has not announced a date for Hyperion’s completion, one industry leader is already questioning whether clusters of this size will meet the global demand for AI processing.

“I don’t think that’s enough,” said Richard Ho, head of hardware at OpenAI. “It doesn’t appear clear to us that there is an end to the scaling model. It just appears to keep going. We’re trying to ring the bell and say, ‘It’s time to build.’”

AI agents drive need for the right stack

Increasing adoption of agents for enterprise tasks is one factor behind the urgency in building the infrastructure to support AI deployment. Large tech players such as Amazon Web Services Inc. are making major investments in agentic AI, fueling rapid advancement of what the technology can ultimately do.

Though one of the key use cases is currently “agent-assisted” application development, the technology is expected to progress rapidly toward “agent-driven” solutions, which will place further demands on infrastructure, according to Barry Cooks, vice president of compute abstractions at AWS.

“The expectation here is this will just continue to expand,” Cooks said during an appearance at the conference. “We’re in the midst of a huge change in the technical landscape in how we do our day-to-day work. It’s super-important that you have the right stack.”

Having the right stack will require new approaches to how systems are architected, a challenge that is being addressed in areas such as memory. To function effectively, AI processors need rapid access to data held in temporary storage such as dynamic random-access memory, or DRAM. If that memory cannot deliver data as fast as the processor can consume it, memory becomes the bottleneck.

Software-defined memory provider Kove Inc. has been working on this issue by essentially virtualizing server memory into a large pool to reduce data latency. On Tuesday, Kove announced benchmark results for Redis and Valkey, in-memory data stores used in AI inference workloads, that it said demonstrated the ability to run workloads five times larger, and faster, than local DRAM allows.

“The big challenge that we have is traditional DRAM,” Kove CEO John Overton said during his keynote presentation. “GPUs are scaling, CPUs are scaling… memory has not. As long as we think about memory as stuck in the box, we’ll remain stuck in the box.”
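The memory bottleneck described above can be illustrated with a back-of-envelope sketch. It assumes a simplified model in which one processing step takes as long as the slower of two things: doing the arithmetic or moving the data. The function name and every number below are hypothetical and chosen only for illustration; none of them come from Kove, Nvidia or any other vendor mentioned here.

```python
def step_time_seconds(flops, bytes_moved, peak_flops_per_s, mem_bandwidth_bytes_per_s):
    """Estimate the time for one step and report which resource limits it.

    Simplifying assumption: compute and data movement overlap perfectly,
    so the step takes as long as the slower of the two.
    """
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / mem_bandwidth_bytes_per_s
    if memory_time > compute_time:
        return memory_time, "memory"
    return compute_time, "compute"


# Hypothetical workload: 1e12 operations that touch 1e11 bytes of data,
# on a hypothetical processor capable of 1e15 operations per second.
for bandwidth in (1e12, 1e13, 1e15):  # bytes per second, illustrative only
    seconds, limit = step_time_seconds(1e12, 1e11, 1e15, bandwidth)
    print(f"bandwidth={bandwidth:.0e} B/s -> {seconds:.4f} s, limited by {limit}")
```

In this toy model, the processor finishes its arithmetic in a thousandth of a second, yet at the two lower bandwidths the step takes far longer because the processor sits waiting on data. That is the sense in which faster GPUs and CPUs alone do not help once memory falls behind.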

More than 3,000 attendees participated in the AI Infra Summit in Silicon Valley this week.

Another big challenge lies in the processors themselves, which keep getting bigger as chipmakers gang up hundreds or thousands of compute cores on a single piece of silicon. That’s creating another bottleneck: communication among all those cores.

“The next 1,000x leap in computing will be completely about interconnect,” Nick Harris, founder and CEO of Lightmatter Inc., which has raised $850 million for its silicon photonics technology, said Wednesday at the conference. “Chips are getting bigger. I/O at the ‘shoreline’ is not enough. It’s time for more horsepower. Not faster horses.”

Meanwhile, AI itself is becoming critical all the way down to the design of chips. “About half the chips built today are using AI; in three years, it will be 90%,” noted Charles Alpert, an AI fellow at chip design software firm Cadence Design Systems Inc., which for years has steadily been incorporating more AI into its tools. “The need to make designers more productive has never been higher.”

Leveraging open-source solutions

Companies are also increasingly turning to the open-source community for help in building out the infrastructure to support AI. Initiatives such as the Open Compute Project have fostered an ecosystem focused on redesigning hardware technology to support demands on compute infrastructure. Last year, Nvidia contributed portions of its Blackwell computing platform design to OCP.

Meta joined a number of high-profile firms in 2023 to found the Ultra Ethernet Consortium, a group dedicated to building an Ethernet-based communication stack architecture for high-performance networking. The group has characterized its mission as promoting open, interoperable standards to prevent vendor lock-in and released its first specification in June.

“What we need here are open standards, open weight models and open-source software,” said Meta’s Song. “I believe open standards are going to be critical in allowing us to manage complexity.”

Whether the buildout of gigawatt data centers, streamlined memory performance and open-source collaboration will enable the tech industry to close the gap between AI innovation and the infrastructure to support it remains to be seen. What is undeniable is that hardware engineering is drawing renewed attention, another element in the wave of transformation brought on by the rise of AI.

“It feels like the early days of the internet, but it’s happening much faster,” Mark Lohmeyer, vice president and general manager of compute and AI infrastructure at Google Cloud, said during his Wednesday keynote.

“I’ve never seen hardware and infrastructure move more quickly,” said Song. “AI has made hardware engineering sexy again. Now hardware engineers get to have fun too.”

With reporting from Robert Hof

Image: Brian Penny/Pixabay; photos: Mark Albertson/SiliconANGLE
