INFRA
Meta Platforms Inc. has reportedly agreed to a multibillion-dollar deal to rent Google Cloud’s custom artificial intelligence chips, according to a report in The Information today.
The report cited anonymous sources as saying that the chips, known as tensor processing units, will be used by Meta to train and run its next-generation large language models.
Large enterprises like Meta have been lavishing billions of dollars on advanced processors as they race to build out the infrastructure they need to power AI workloads, and Google has identified that demand as a significant growth opportunity. The AI chip industry is currently dominated by Nvidia Corp.’s graphics processing units, which power the vast majority of AI applications in the world today, but Google’s TPUs are a compelling, lower-cost alternative.
In recent years, TPUs have become one of the most important growth engines for Google’s cloud infrastructure platform, and the company believes there’s a real opportunity to increase its market share. Google launched its most advanced TPUs, called Ironwood, in November. Customers can combine up to 9,216 Ironwood TPUs into a single server pod linked by high-speed interconnects that provide up to 9.6 terabits per second of bandwidth. The chips can be connected to a colossal 1.77 petabytes of shared high-bandwidth memory, or HBM.
According to Google, the Ironwood chips can deliver more than 118 times the FP8 ExaFLOPS of its nearest competitor, and four times better performance for training and inference than Trillium, its previous-generation TPU.
One of the first companies to adopt the new TPUs was Anthropic PBC, which touted the substantial price-performance gains they provide, enabling it to serve its massive Claude models at scale. Earlier, Google announced it had struck a deal with Anthropic worth “tens of billions of dollars” to give it access to 1 million TPUs through its cloud infrastructure platform.
Today’s deal is important for Meta as it looks to diversify its AI hardware away from Nvidia. The social media giant is one of Nvidia’s biggest customers, and earlier this month announced a multibillion-dollar deal to buy millions of the company’s next-generation Vera Rubin GPUs when they become available later this year.
But Meta doesn’t want to be tied to a single chip provider, and has also established a relationship with one of Nvidia’s biggest rivals, Advanced Micro Devices Inc. Earlier this week, the Facebook parent said it’s going to buy billions of dollars’ worth of AMD’s AI chips, including its newest Instinct MI400 series GPUs. Meta also received an option to buy a 10% stake in AMD, but that will only happen if its collaboration with the chipmaker meets agreed-upon performance milestones.
Meta’s deal with Google allows it to diversify its AI hardware supplier base even further, and there are good reasons for it to want to do so. Each type of AI processor has its advantages and disadvantages, so the social media giant will be able to pair the optimal silicon with each kind of AI workload. In addition, by playing different chipmakers off against one another, it can potentially secure favorable prices as it looks to build out its AI infrastructure.
Meanwhile, Google has ambitions to break Nvidia’s stranglehold on the AI chip market. It knows that enterprises want alternatives to Nvidia’s GPUs, and it’s determined to take advantage of that. Historically, its TPUs have only been available through the Google Cloud platform, meaning customers could only rent access to them.
But now Google wants to sell the chips directly to customers so they can run them in their own private data centers. By doing so, it believes it can capture up to 10% of Nvidia’s data center revenue in the next few years.
The Information’s report said Meta is currently talking to Google about buying millions of TPUs for its own data centers. That would be a separate deal from today’s agreement, which relates only to cloud access. However, no agreement has been reached so far, the report said.
Less clear is what today’s deal means for Meta’s own in-house chips. The company last updated its custom Meta Training and Inference Accelerator, or MTIA, chips in 2024, and has reportedly been working with Taiwan Semiconductor Manufacturing Co. on a new design that was expected to launch sometime this year.
The next-generation MTIA chips are reportedly optimized for AI model training, which is an area where its first-generation processors struggled. However, it’s believed that the company has been experiencing “technical challenges” with the new chips that appear to have delayed their rollout.