UPDATED 09:00 EDT / JUNE 22 2022

Cerebras Systems sets AI training record with its ‘wafer-scale’ chip

Cerebras Systems Inc. today announced that its CS-2 hardware system is now capable of training neural networks with up to 20 billion parameters, which the startup says represents an industry record.

The CS-2 is a data center appliance powered by Cerebras’ WSE-2 chip. The WSE-2 (pictured) features a so-called wafer-scale architecture optimized for AI workloads and includes 2.6 trillion transistors, or 2.55 trillion more than the most advanced graphics processing unit on the market. Cerebras Systems says that chip is the world’s fastest AI processor.

The startup’s CS-2 system combines a single WSE-2 chip with cooling equipment, networking hardware and other supporting components. The system makes it easier to deploy the WSE-2 by removing the need for customers to install the necessary supporting components on their own.

One of the most advanced neural networks in the open-source ecosystem is a natural language processing model called GPT-NeoX. It features 20 billion parameters, the settings that determine how an AI model processes data. Training such a complex neural network network involves so many computations that the task can usually only be performed using a large number of GPUs.

Cerebras said today that its customers can train GPT-NeoX using just a single CS-2 appliance. According to the startup, it’s now the only market player that offers the ability to train a neural network with up to 20 billion parameters using one device. The CS-2 can also be used to train other advanced natural language processing models such as GPT-3XL and GPT-J, which feature 1.3 billion and 6 billion parameters, respectively.

The CS-2’s ability to perform a task that usually requires multiple GPUs is facilitated by its onboard WSE-2 chip. According to Cerebras, the WSE-2’s 2.6 trillion transistors are organized into 850,000 cores, which is 100 times higher than the number of cores found in the fastest GPU on the market. There’s also 40 gigabytes of onboard memory for storing the neural network that the chip is running and the data being processed.

The chip’s memory pool can store many types of neural networks. However, some AI models with upwards of a billion parameters require more than the 40 gigabytes of onboard capacity that the WSE-2 includes.

To accommodate such advanced neural networks, Cerebras has developed a technology called Weight Streaming. The technology makes it possible to attach up to 2.4 petabytes of external memory to the WSE-2, which enables the chip to run more complex AI models than could be supported otherwise. According to the startup, Weight Streaming in theory makes it possible to run AI models with upwards of one trillion parameters.

“In NLP, bigger models are shown to be more accurate,” said Cerebras co-founder and Chief Executive Officer Andrew Feldman. “But traditionally, only a very select few companies had the resources and expertise necessary to do the painstaking work of breaking up these large models and spreading them across hundreds or thousands of graphics processing units.”

Cerebras says its CS-2 system’s newly detailed ability to train AI models with up to 20 billion parameters will ease customers’ machine learning projects in several ways.

Training a neural network with 20 billion parameters typically requires using a large number of servers equipped with GPUs. If the servers are being deployed on-premises, setting them up can take a significant amount of work. Provisioning GPU-equipped compute instances in the public cloud is faster, but the training process can still take months, according to Cerebras.

Training an AI model with multiple GPUs is time-consuming because the task often requires a significant amount of custom code. Developers must implement software that distributes the computations involved in the AI training process across the GPUs. Moreover, the software must be extensively optimized to ensure that computations are carried out efficiently.

According to Cerebras, its CS-2 system eases developers’ work because it contains a single WSE-2 chip instead of multiple processors. Developers don’t have to write complex software to distribute computing tasks across multiple chips. The result, according to the startup, is that AI models can be trained more efficiently.

The startup has managed to incorporate 2.6 trillion transistors into a single chip by developing an interconnect technology that makes it possible to combine 84 smaller chips into a single system. It says its chips have been adopted by more than a dozen customers including enterprises and research institutions.

Cerebras has raised $720 million from investors and received a $4 billion valuation after its most recent funding round last November.

Photo: Cerebras Systems

A message from John Furrier, co-founder of SiliconANGLE:

Support our mission to keep content open and free by engaging with theCUBE community. Join theCUBE’s Alumni Trust Network, where technology leaders connect, share intelligence and create opportunities.

15M+ viewers of theCUBE videos, powering conversations across AI, cloud, cybersecurity and more
11.4k+ theCUBE alumni — Connect with more than 11,400 tech and business leaders shaping the future through a unique trusted-based network.

About SiliconANGLE Media

SiliconANGLE Media is a recognized leader in digital media innovation, uniting breakthrough technology, strategic insights and real-time audience engagement. As the parent company of SiliconANGLE, theCUBE Network, theCUBE Research, CUBE365, theCUBE AI and theCUBE SuperStudios — with flagship locations in Silicon Valley and the New York Stock Exchange — SiliconANGLE Media operates at the intersection of media, technology and AI.

Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.

Cerebras Systems sets AI training record with its ‘wafer-scale’ chip

Photo: Cerebras Systems

A message from John Furrier, co-founder of SiliconANGLE:

LATEST FROM THECUBE

UPCOMING CUBE EVENTS

RECENT CUBE EVENTS

Appian World 2026

Google Cloud Next 2026

Phi Moments @ Next 2026

SUSECON 2026

Oracle Data Deep Dive NYC 2026

Cerebras Systems sets AI training record with its ‘wafer-scale’ chip

Photo: Cerebras Systems

A message from John Furrier, co-founder of SiliconANGLE:

LATEST STORIES

LATEST STORIES

Appian World 2026

Google Cloud Next 2026

Phi Moments @ Next 2026

SUSECON 2026

Oracle Data Deep Dive NYC 2026

Cookies