UPDATED 15:04 EDT / SEPTEMBER 03 2024

Elon Musk’s xAI launches ‘Colossus’ AI training system with 100,000 Nvidia chips

Elon Musk’s xAI Corp. has completed the assembly of an artificial intelligence training system that features 100,000 graphics cards.

Musk announced the milestone in a Monday post on X. The system, which xAI calls Colossus, came online over the weekend. 

The CEO launched xAI last year to compete with OpenAI, which he is currently suing for alleged breach of contract. The startup develops a line of large language models called Grok. In May, xAI raised $6 billion at a $24 billion valuation to finance its AI development efforts.

In this week’s X post, Musk described the newly launched Colossus as the “most powerful AI training system in the world.” That suggests the cluster is faster than the U.S. Energy Department’s Aurora system, which ranks as the world’s fastest AI supercomputer. In a May benchmark test, Aurora reached a top speed of 10.6 exaflops with 87% of its hardware active. 

Musk detailed that Colossus is equipped with 100,000 of Nvidia’s H100 graphics cards. The H100 debuted in 2022 and ranked as the chipmaker’s most powerful AI processor for more than a year. It can run language models up to 30 times faster than Nvidia’s previous-generation GPUs.
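
For a rough sense of scale, the back-of-envelope calculation below multiplies Nvidia's published per-chip peak, roughly 989 teraflops of dense BF16 tensor-core compute for the H100 SXM, by Colossus' chip count. The result is a theoretical ceiling, not a measured figure, and is not directly comparable to Aurora's 10.6 exaflops, which came from the mixed-precision HPL-MxP benchmark.

# Hedged back-of-envelope: aggregate peak compute of 100,000 H100s.
# The per-GPU number is Nvidia's published dense BF16 tensor-core peak for
# the H100 SXM; sustained training throughput is always well below peak.
H100_BF16_TFLOPS = 989          # per-GPU peak, in teraflops
GPU_COUNT = 100_000

peak_exaflops = H100_BF16_TFLOPS * GPU_COUNT / 1_000_000  # 1 exaflop = 1e6 teraflops
print(f"Theoretical aggregate peak: ~{peak_exaflops:.0f} BF16 exaflops")  # ~99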

One contributor to the H100’s performance is its so-called Transformer Engine module, a set of circuits optimized to run AI models based on the Transformer neural network architecture, which underpins OpenAI’s GPT-4o, Meta Platforms Inc.’s Llama 3.1 405B and many other frontier LLMs. The module speeds up such models partly by carrying out calculations with the compact FP8 data format where precision allows.
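
As an illustration of how software reaches those circuits, Nvidia offers an open-source Transformer Engine library for PyTorch that routes supported layers through the H100’s FP8 tensor cores. The sketch below follows the library’s documented quick-start pattern; the layer dimensions and batch size are arbitrary.

import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# FP8 scaling recipe: the HYBRID format uses E4M3 for the forward pass and
# E5M2 for gradients, the library's recommended default for training.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# te.Linear is a drop-in replacement for torch.nn.Linear that can execute on
# the H100's FP8 tensor cores. The 4096-wide layer is purely illustrative.
layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(16, 4096, device="cuda")

# Inside fp8_autocast, supported operations run through the FP8 path.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)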

Musk detailed that xAI plans to double Colossus’ chip count to 200,000 within a few months. He said 50,000 of the new processors will be H200s. The H200 is an upgraded, significantly faster version of the H100 that Nvidia debuted last November.

AI models shuffle data between a chip’s logic circuits and its onboard memory more often than many other workloads do. As a result, accelerating the movement of data between the memory and logic modules can deliver a notable boost to AI models’ performance. Nvidia’s H200 carries out such data transfers significantly faster than the H100.

The H200’s speed advantage is the result of two architectural upgrades. First, Nvidia swapped the H100’s HBM3 memory for a newer type of RAM called HBM3e that facilitates faster data transfers to and from the chip’s logic circuits. Second, the company nearly doubled the onboard memory capacity, from 80 gigabytes to 141 gigabytes, which allows the H200 to keep more of an AI model’s data near its logic circuits.
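
A rough, assumption-laden calculation shows why that matters: in a memory-bound decoding step, a language model must stream its weights through the memory bus once per generated token, so bandwidth sets a floor on per-token latency. The bandwidth figures below are Nvidia’s published specs, and the 70-gigabyte model footprint is hypothetical.

# Rough lower bound on per-token latency for a memory-bound LLM decode step:
# each token requires streaming the weights once, so time >= bytes / bandwidth.
H100_GBPS = 3350   # H100 SXM HBM3 bandwidth in GB/s, per Nvidia's spec sheet
H200_GBPS = 4800   # H200 HBM3e bandwidth in GB/s, per Nvidia's spec sheet

WEIGHTS_GB = 70    # hypothetical footprint, e.g. a 70B-parameter model at 8 bits

for name, gbps in (("H100", H100_GBPS), ("H200", H200_GBPS)):
    ms_per_token = WEIGHTS_GB / gbps * 1000  # seconds to milliseconds
    print(f"{name}: >= {ms_per_token:.1f} ms/token")  # H100 ~20.9, H200 ~14.6

The capacity increase compounds the bandwidth gain: a model that would have to be split across two 80-gigabyte H100s can reside entirely in a single H200’s 141 gigabytes, keeping chip-to-chip transfers off the critical path.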

Grok-2, xAI’s flagship LLM, was trained on 15,000 GPUs. Colossus’ 100,000 chips could facilitate the development of language models with significantly better capabilities. The company reportedly hopes to release Grok-2’s successor by year’s end.

Some of Colossus’ servers may be powered by chips that were originally earmarked for Tesla Inc. In June, CNBC reported that Musk had asked Nvidia to reroute 12,000 H100s worth more than $500 million from the carmaker to X Corp. The same month, Musk estimated that Tesla would spend between $3 billion and $4 billion on Nvidia hardware by year’s end.

Image: Nvidia
