UPDATED 08:00 EST / NOVEMBER 06 2025

INFRA

Google unleashes Ironwood TPUs, new Axion instances as AI inference demand surges

Google LLC today announced it’s bringing its custom Ironwood chips online for cloud customers: tensor processing units that scale up to 9,216 chips in a single pod, making Ironwood the company’s most powerful AI accelerator architecture to date.

The new chips will be available to customers in the coming weeks, alongside new Arm-based Axion instances that promise up to twice the price-performance of current x86-based alternatives.

Google’s own frontier models, including Gemini, Veo and Imagen, are trained and deployed on TPUs, alongside equally sizable third-party models such as Anthropic PBC’s Claude. The company said the advent of AI agents, which require deep reasoning and advanced task management, is defining a new era in which inference, the runtime intelligence of active models, accounts for a fast-growing share of the demand for AI compute.

Ironwood: Google’s AI powerhouse chip

The tech giant debuted Ironwood at Google Cloud Next 2025 in April and touted it as the most powerful TPU accelerator the company has ever built.

The next-generation architecture lets the company scale up to 9,216 chips in a single server pod, linked together by an inter-chip interconnect that provides up to 9.6 terabits per second of bandwidth. Together, the chips in a pod share access to a colossal 1.77 petabytes of high-bandwidth memory, or HBM.

Inter-chip interconnect, or ICI, acts as a “data highway” that allows the chips to think and act as a single AI accelerator brain. This matters because modern AI models demand enormous processing power but cannot fit on a single chip: they must be split across hundreds or thousands of processors that work in parallel. Like a city with thousands of buildings crammed together, the biggest problem such a system faces is traffic congestion. With more bandwidth, the chips can exchange data faster and with less delay.
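
To make that concrete, below is a minimal sketch in JAX, the framework commonly used to program TPUs, of splitting one array and one computation across every accelerator chip the runtime can see. The array shape, the single “model” mesh axis and the toy tanh layer are illustrative assumptions, not Ironwood specifics; the point is that the compiler inserts the cross-chip transfers, and the interconnect determines how fast they travel.

```python
# Minimal sketch: shard one computation across all visible accelerator
# chips. Sizes and mesh layout are illustrative, not Ironwood specs.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("model",))   # one logical axis over all chips

x = jnp.ones((8192, 8192), dtype=jnp.bfloat16)
x = jax.device_put(x, NamedSharding(mesh, P("model", None)))  # rows split across chips

@jax.jit
def layer(a):
    # The matmul needs rows held by other chips, so the XLA compiler
    # emits collective communication over the interconnect to fetch them.
    return jnp.tanh(a @ a.T)

y = layer(x)
print(y.sharding)  # the output stays sharded across the same devices
```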

HBM holds the vast amounts of data AI models need to “remember” in real time while training or answering user queries. According to Google, 1.77 petabytes of data accessible in a single, unified system is industry-leading. A single petabyte, or 1,000 terabytes, can hold roughly 40,000 high-definition Blu-ray movies or the text of millions of books. Keeping all of that within reach at once lets AI models draw on enormous amounts of knowledge and respond without delay.
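
The arithmetic behind that comparison is easy to check; the figure of roughly 25 gigabytes per HD Blu-ray disc below is our assumption, not Google’s:

```python
# Back-of-the-envelope check of the petabyte comparison, assuming
# about 25 GB per HD Blu-ray disc (an assumption, not a Google figure).
GB_PER_PB = 1_000_000        # 1 PB = 1,000 TB = 1,000,000 GB (decimal units)
GB_PER_MOVIE = 25            # assumed size of one HD Blu-ray

print(GB_PER_PB / GB_PER_MOVIE)          # 40000.0 movies per petabyte
print(1.77 * GB_PER_PB / GB_PER_MOVIE)   # 70800.0 movies in a full pod's HBM
```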

The company said the new Ironwood-based pod architecture can deliver more than 118 times the FP8 exaFLOPS of the nearest competitor, and four times better performance for training and inference than Trillium, the previous generation of TPU.

Google co-designed a new software layer with this advanced hardware to get the most from Ironwood’s compute and memory. It includes a new Cluster Director capability in Google Kubernetes Engine, which adds advanced maintenance and topology awareness for smarter workload scheduling.

For pretraining and post-training, the company announced enhancements to MaxText, its high-performance, open-source large language model training framework, to make it easier to implement reinforcement learning techniques. Google also recently announced upgrades to vLLM that let customers run inference on GPUs, TPUs, or a hybrid of the two.
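
As a rough illustration, vLLM’s core serving interface looks like the sketch below. The model name is a placeholder, and which hardware the script runs on depends on the vLLM backend installed in the environment; the GPU/TPU switching described above belongs to the newer upgrades rather than anything shown here.

```python
# Minimal vLLM serving sketch. The model name is a placeholder; the
# backend (GPU or TPU) is chosen by the vLLM build that is installed.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # placeholder model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what a TPU pod is."], params)
print(outputs[0].outputs[0].text)
```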

Anthropic, an early user of Ironwood, said the chips provide impressive price-performance gains that allow it to serve its massive Claude models at scale. The leading AI model developer announced late last month that it plans to access up to 1 million TPUs.

“Our customers, from Fortune 500 companies to startups, depend on Claude for their most critical work,” Anthropic’s Head of Compute James Bradbury said. “As demand continues to grow exponentially, we’re increasing our compute resources as we push the boundaries of AI research and product development.”

Axion expands with N4A and C4A metal instances

Google also announced the expansion of its Axion offerings with two new services in preview: N4A, its second-generation Axion virtual machines, and C4A metal, the company’s first Arm Ltd.-based bare-metal instances.

Axion is the company’s custom Arm-based central processing unit, designed to deliver energy-efficient performance for general-purpose workloads. Google executives noted that the key to Axion’s design philosophy is its fit with the company’s workload-optimized infrastructure strategy: it draws on Arm’s expertise in efficient CPU design to deliver significant gains in performance and power efficiency over traditional x86 processors.

“The Axion processors will have 30% higher performance than the fastest Arm processors available in the cloud today,” Mark Lohmeyer, vice president and general manager of AI and computing infrastructure at Google Cloud, said in an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio, during Google Cloud Next 2024. “They’ll have 50% higher performance than comparable x86 generation processors and 60% better energy efficiency than comparable x86-based instances.”

Axion brings greatly increased efficiency to modern general-purpose AI workflows, and it can be paired with the new specialized Ironwood accelerators for complex model serving. The new Axion instances are designed to provide the operational backbone for intelligent applications: high-volume data preparation, ingestion and analytics, and the virtual services that host the applications themselves.

N4A instances scale up to 64 virtual CPUs and 512 gigabytes of DDR5 memory, and support custom machine types. The new C4A metal delivers dedicated physical servers with up to 96 vCPUs and 768 gigabytes of memory. Both join the company’s previously announced C4A instances, which are designed for consistent high performance.
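
For readers who want to try the preview, a request for one of the new VMs could look roughly like the sketch below using the google-cloud-compute Python client. The machine-type name “n4a-standard-64” is our guess at the preview naming scheme, and the project, zone and boot image are placeholders.

```python
# Hedged sketch of creating an N4A VM. "n4a-standard-64" is a guessed
# machine-type name; project, zone and image are placeholders.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"   # placeholders

instance = compute_v1.Instance(
    name="n4a-demo",
    machine_type=f"zones/{zone}/machineTypes/n4a-standard-64",  # assumed name
    disks=[compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-12"),
    )],
    network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
)

operation = compute_v1.InstancesClient().insert(
    project=project, zone=zone, instance_resource=instance)
```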

Photo: Google
