UPDATED 11:35 EST / DECEMBER 03 2024

INFRA

AWS unveils next-gen Trainium3 custom AI chips and Trainium2 cloud instances

Amazon Web Services Inc. today unveiled Trainium3, its next-generation custom chip for high-efficiency artificial intelligence training and inference, and announced the general availability of AWS Trainium2-powered cloud instances that put high-performance AI capabilities in the hands of customers.

Amazon revealed Trainium3 today during AWS re:Invent, the company’s annual cloud computing conference, saying it will be the first AWS chip made with a three-nanometer process, setting a new standard for power efficiency and density. The chips will provide twice the performance and 40% better energy efficiency than the current Trainium2 chips.

The Trainium family of custom silicon from AWS allows enterprise customers to keep up with the rapidly increasing size of the AI foundation models and large language models behind today’s generative AI applications. As models grow, they require more processing power to handle the massive datasets used for training and deployment. The largest and most advanced models can scale from hundreds of billions to trillions of parameters.

To assist with training and deploying these growing models, Amazon announced the general availability of Elastic Compute Cloud Trn2 instances, each featuring 16 Trainium2 chips that provide 20.8 petaflops of peak compute performance. The company said these Trn2 instances offer 30% more compute and 25% more high-bandwidth memory than the next most powerful EC2 instances for the same cost.
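As a rough illustration of what those figures imply per chip, a back-of-the-envelope calculation (an assumption based only on the numbers above, not an AWS-published per-chip spec, and assuming the peak compute is spread evenly across the chips):

# Back-of-the-envelope arithmetic from the announced Trn2 figures.
# Assumption: the 20.8 petaflops of peak compute is split evenly across 16 chips.
chips_per_trn2_instance = 16
trn2_peak_petaflops = 20.8

per_chip_petaflops = trn2_peak_petaflops / chips_per_trn2_instance
print(f"~{per_chip_petaflops:.1f} petaflops per Trainium2 chip")  # prints ~1.3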

In testing, Meta Platforms Inc.’s Llama 405B, a model with 405 billion parameters, the values it can be tuned with, delivered more than three times higher token-generation throughput using Trn2 EC2 instances on Amazon Bedrock compared with similar offerings from rival major cloud providers. Token generation happens when a large language model is deployed and producing text answers to questions; the higher the throughput, the faster the model can answer questions, summarize documents and generate responses.
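Throughput here is simply tokens produced per unit of wall-clock time. A minimal sketch of how one might measure it, where generate_fn is a hypothetical stand-in for whatever endpoint serves the deployed model (not an AWS or Bedrock API):

import time

# Minimal throughput sketch. generate_fn is a hypothetical callable that is
# assumed to return the list of tokens generated for the given prompt.
def measure_throughput(generate_fn, prompt, n_runs=5):
    """Return average tokens generated per second over n_runs."""
    total_tokens, total_seconds = 0, 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        tokens = generate_fn(prompt)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    return total_tokens / total_seconds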

For LLMs that scale even bigger, Amazon is releasing a second tier of Trainium2 instance called Trn2 UltraServers that will allow customers to go beyond the limits of a single Trn2 server. In an interview with SiliconANGLE, Gadi Hutt, senior director of business development at Annapurna Labs, the AWS subsidiary that designs and builds the company’s custom chips, said this will allow customers to reduce training time, get to market faster and improve model accuracy.

“Next, we break that [16-chip] boundary and provide 64 chips in the UltraServer and that is for extremely large models,” said Hutt. “So if you have a 7 billion-parameter model, that used to be large, but not anymore — or an extremely large model let’s call it 200 billion or 400 billion. You want to serve at the fastest latency possible. So, you use the UltraServer.”

Hutt explained that the new Trn2 UltraServers use the NeuronLink interconnect to join four Trn2 servers into one giant server, allowing customers to scale workloads across all four and use 64 Trainium2 chips at once for AI model training or inference. An UltraServer can deliver up to 83.2 petaflops of peak compute, enough power to serve trillion-parameter models in production.
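The UltraServer figure lines up with simply quadrupling a single Trn2 instance. A quick sketch of that scaling arithmetic, assuming chip count and peak petaflops add linearly across the four NeuronLink-connected servers:

# UltraServer = four Trn2 servers joined over NeuronLink.
# Assumption: chips and peak petaflops scale linearly with the number of servers.
servers_per_ultraserver = 4
chips_per_server = 16
petaflops_per_server = 20.8

total_chips = servers_per_ultraserver * chips_per_server          # 64 chips
total_petaflops = servers_per_ultraserver * petaflops_per_server  # 83.2 peak petaflops
print(f"{total_chips} Trainium2 chips, {total_petaflops} peak petaflops")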

Amazon said UltraServers built on the upcoming Trainium3 are expected to deliver four times the performance of Trn2 UltraServers, allowing superior real-time performance for training and deploying extremely large AI models. The first Trainium3-based instances are expected to be available in 2025.

In the same vein of ultra-large models, AWS is working with its partner Anthropic PBC to build an EC2 UltraCluster of Trn2 UltraServers named Project Rainier. Hutt said this cluster of UltraServers will comprise hundreds of thousands of Trainium2 chips interconnected with third-generation, low-latency networking. The intent is to provide enough scaled-out, distributed compute power to train Anthropic’s next-generation large language model.

“This is by far the largest cluster we’ve built,” Hutt said. “Compared to what Anthropic had until today, it is five times larger.”

Anthropic introduced its most advanced LLM, Claude 3.5 Sonnet, in October. The company has released upgraded models with enhanced capabilities on a regular drumbeat.

Engineering, training and deploying these models takes a significant amount of computing power for the company to stay competitive with rivals such as OpenAI and Google LLC. Amazon recently announced plans to double its investment in Anthropic to $8 billion, deepening its partnership with the AI model provider.

