UPDATED 08:00 EDT / APRIL 22 2026

INFRA

Two new TPUs to power the next wave of AI training and inference at Google

Google LLC today introduced two new custom silicon chips for artificial intelligence at Google Cloud Next 2026, unveiling two distinct Tensor Processing Unit architectures, one built for training and one for inference: the eighth-generation TPU 8t and TPU 8i.

The company said it designed the pair of chips to tackle the next generation of AI workloads by splitting its silicon across the market’s two differing demands. AI depends on two primary tasks: building models and running them. The rise of AI agents has driven demand for powerful AI models to act as the “brains” of reasoning machines, and for equally powerful hardware to run them in the cloud.

Where the previous generation, the Ironwood TPU, was pitched as a single, massive flagship platform for the inference era, Google is now splitting its latest generation into separate architectures, one for large-scale training and one for high-concurrency reasoning, to support the agentic era.

Transforming training with TPU 8t

Google said it optimized TPU 8t as a workhorse for massive pretraining and embedding-heavy workloads by using a 3D torus network topology, an approach the company said has proven to scale well as the number of networked chips grows. Compared with the last generation, TPU 8t can network 9,600 chips in a single pod, versus 9,216 chips for Ironwood.
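
Google hasn’t disclosed the pod’s exact dimensions, but a quick sketch shows why a torus keeps worst-case hop counts low at this scale. Assuming, purely for illustration, a 20-by-20-by-24 arrangement of the 9,600 chips:

```python
# Worst-case hop count (network diameter) of a 3D torus: wraparound links
# mean the farthest chip along each axis is floor(dim / 2) hops away.
def torus_diameter(dims):
    return sum(d // 2 for d in dims)

# Hypothetical 9,600-chip pod laid out as 20 x 20 x 24; the real pod
# dimensions have not been disclosed by Google.
print(torus_diameter((20, 20, 24)))  # 10 + 10 + 12 = 32 hops, worst case
```

Under that assumption, no packet ever crosses more than a few dozen links even in a nearly 10,000-chip pod, which is the scaling property Google is pointing to.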

TPU 8t uses SparseCore, a specialized accelerator that handles the irregular memory access common to embedding lookups in large language models, along with native four-bit floating-point support to ease memory-bandwidth bottlenecks. Together, the company said, they allow faster training with better model compression, doubling throughput while maintaining accuracy at a smaller memory footprint.
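
Google didn’t spell out SparseCore’s programming model, but the irregular access it targets is essentially a sparse gather: each token pulls a handful of rows scattered unpredictably across an enormous embedding table. A minimal JAX sketch of that access pattern, with illustrative sizes:

```python
import jax
import jax.numpy as jnp

# A large embedding table; real LLM vocabularies run to hundreds of
# thousands of rows (these sizes are illustrative only).
table = jax.random.normal(jax.random.PRNGKey(0), (50_000, 128))

# Token IDs can land anywhere in the table, so the reads are scattered,
# which is the irregular memory pattern SparseCore is built to accelerate.
token_ids = jnp.array([[3, 49_421, 7], [32_108, 14, 8_000]])
embeddings = jnp.take(table, token_ids, axis=0)  # shape (2, 3, 128)
```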

Reducing the number of bits per parameter, a process called quantization, makes it possible to run larger models on less powerful systems. That cuts energy use, lets bigger models fit on local hardware in less space, and keeps accelerators closer to peak utilization.
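
As a back-of-the-envelope illustration (generic numbers, not TPU 8t specifics), a 70-billion-parameter model needs roughly 140 gigabytes at 16-bit precision but only about 35 gigabytes at four bits. Here is a minimal symmetric round-to-nearest quantizer, one common scheme rather than Google’s disclosed method:

```python
import jax.numpy as jnp

def quantize(weights, bits=4):
    # Map floats onto signed integer levels (generic sketch, not
    # Google's scheme).
    levels = 2 ** (bits - 1) - 1              # 7 levels each side for 4-bit
    scale = jnp.max(jnp.abs(weights)) / levels
    codes = jnp.round(weights / scale)        # integer codes in [-7, 7]
    return codes.astype(jnp.int8), scale

def dequantize(codes, scale):
    return codes.astype(jnp.float32) * scale

w = jnp.array([0.31, -1.20, 0.05, 0.77])
codes, scale = quantize(w)
print(dequantize(codes, scale))  # close to w, at a quarter of the 16-bit size
```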

The company said it’s aiming to capture the training market at a much lower cost: Google claimed TPU 8t delivers up to 2.7 times the performance per dollar of the Ironwood TPU for large-scale training.

Deploying models faster with TPU 8i

After models have been trained and prepared, they need to be put to work. That’s where inference comes into play, and where Google said the new TPU 8i chip shines, helping serve large models by optimizing post-training and high-concurrency reasoning using high-bandwidth memory and a specialized network topology. 

TPU 8i employs three times more static random-access memory than Ironwood, allowing it to host a larger key-value cache at inference time for LLMs, which significantly speeds up text generation. In addition, the company said, it built a reasoning system called the Collectives Acceleration Engine that handles the reduction and synchronization steps required during autoregressive decoding and chain-of-thought reasoning.
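
The caching technique itself is standard in LLM serving: keys and values computed for earlier tokens are stored and reused at each decoding step rather than recomputed, trading memory, which TPU 8i’s extra SRAM supplies, for compute. A stripped-down sketch of the pattern (not Google’s serving stack):

```python
import jax.numpy as jnp

def decode_step(k_new, v_new, cache):
    # Append this step's key/value instead of recomputing attention
    # inputs for the whole sequence (generic sketch).
    cache["k"] = jnp.concatenate([cache["k"], k_new[None]], axis=0)
    cache["v"] = jnp.concatenate([cache["v"], v_new[None]], axis=0)
    # Attention for the new token reads the cached history directly, so
    # each step costs O(sequence length), not O(sequence length squared).
    return cache

cache = {"k": jnp.zeros((0, 64)), "v": jnp.zeros((0, 64))}
for _ in range(3):
    cache = decode_step(jnp.ones(64), jnp.ones(64), cache)
print(cache["k"].shape)  # (3, 64): one cached entry per generated token
```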

To connect more chips and weave them into a system where every chip can “see” every other, Google developed a custom network topology called Boardfly ICI. It can interconnect up to 1,152 chips, reducing network latency by shrinking the network diameter, the number of hops a data packet must take to cross the system. Google said it cuts the hops required for all-to-all communication, a necessity for mixture-of-experts LLM and reasoning-model inference, by up to 50% overall.
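
Boardfly ICI’s programming interface isn’t public, but the pattern it accelerates is the standard all-to-all exchange that routes tokens to experts in an MoE layer. A minimal JAX sketch of that collective, with made-up device counts and shapes:

```python
import jax
import jax.numpy as jnp
from functools import partial

# One expert per device; tokens bucketed by destination expert must be
# exchanged all-to-all (a generic MoE dispatch sketch, not Boardfly code).
n_dev = jax.device_count()

@partial(jax.pmap, axis_name="experts")
def dispatch(tokens):
    # tokens: (n_dev buckets, tokens_per_bucket, d_model) on each device.
    # all_to_all sends bucket i on device j to device i; this is the
    # hop-heavy step a smaller network diameter speeds up.
    return jax.lax.all_to_all(tokens, "experts", split_axis=0, concat_axis=0)

tokens = jnp.zeros((n_dev, n_dev, 4, 128))  # (devices, buckets, tokens, dim)
received = dispatch(tokens)  # each device now holds its experts' tokens
```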

As for cost savings, the company said TPU 8i delivers about an 80% performance-per-dollar improvement over Ironwood at low-latency operating points, especially when serving extremely large MoE frontier models.

Google added that both chips deliver twice the performance per watt of the previous generation.

Image: SiliconANGLE/Microsoft Designer
