Cerebras releases seven large language models for generative AI, trained on its specialized hardware
Artificial intelligence chipmaker Cerebras Systems Inc. today announced that it has trained and released seven GPT-based large language models for generative AI, making them available to the wider research community.
The new LLMs are notable because they are the first to be trained using CS-2 systems in the Cerebras Andromeda AI supercluster, which are powered by the Cerebras WSE-2 chip that is specifically designed to run AI software. In other words, they’re among the first LLMs to be trained without relying on graphics processing unit-based systems. Cerebras said it’s sharing not only the models, but also the weights and the training recipe that were used, under a standard Apache 2.0 license.
The Sunnyvale, California-based startup is backed by more than $720 million in venture funding. Its flagship chip, the WSE-2, sits at the heart of Andromeda, a Cerebras supercomputer that is optimized to run AI applications and boasts more than 13.5 million processor cores.
Cerebras said that the rapid growth of generative AI, led by OpenAI LP’s ChatGPT, has sparked a race among AI hardware makers to create more powerful and specialized chips for the task. But although many companies have promised alternatives to Nvidia Corp.’s GPUs, until now none has been able to demonstrate the ability to train large-scale models and open-source those efforts under permissive licenses. Instead, Cerebras says, competitive pressures have made companies less willing to release LLMs publicly, meaning the models remain largely inaccessible.
That’s what Cerebras is hoping to address with today’s release. It’s open-sourcing a series of seven GPT models with 111 million, 256 million, 590 million, 1.3 billion, 2.7 billion, 6.7 billion and 13 billion parameters, making them available on GitHub and Hugging Face. Training these models would normally take many months, but Cerebras said the speed of the Cerebras CS-2 systems in Andromeda, combined with a unique weight streaming architecture, helped reduce this time to just a few weeks.
Cerebras co-founder and Chief Software Architect Sean Lie said today’s release is important because very few organizations have the ability to train truly large-scale models by themselves. “Releasing seven fully trained GPT models into the open-source community shows just how efficient clusters of Cerebras CS-2 systems can be and how they can rapidly solve the largest scale AI problems – problems that typically require hundreds or thousands of GPUs,” he said.
The company said the release marks the first time that an entire suite of GPT models trained using state-of-the-art efficiency techniques has been made publicly available. It said the models have lower training time, lower training cost and lower energy use than any other publicly available LLM.
Because the Cerebras LLMs are open source, they can be used for both research and commercial purposes, the company explained. It added that the released weights provide a highly accurate pretrained starting point that can be fine-tuned for different tasks with modest amounts of custom data, enabling anyone to build a powerful generative AI application with minimal effort.
The release also demonstrates the effectiveness of what Cerebras calls a “simple, data-parallel only approach to training.” Traditional LLM training on GPUs requires a complex amalgam of pipeline, model and data parallelism techniques. Cerebras’ weight streaming architecture, by contrast, shows how training can be done with a simpler, data-parallel only scheme that requires no code or model modification to scale to very large models.
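For readers unfamiliar with the term, the idea behind data parallelism can be illustrated with a toy sketch. This is not Cerebras’ weight streaming implementation, and every name in it (gradient, split, all_reduce_mean, data_parallel_step) is hypothetical. It assumes a tiny linear model and equal-sized shards: each simulated worker holds the same weights, computes a gradient on its own slice of the batch, and the gradients are averaged before a single shared update.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) training pair


def gradient(w: float, b: float, shard: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient for the model y = w*x + b on one shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def split(batch: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Cut the batch into equal-sized shards, one per simulated worker."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]


def all_reduce_mean(grads: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average the per-worker gradients (stand-in for a real all-reduce)."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def data_parallel_step(w: float, b: float, batch: List[Sample],
                       num_workers: int, lr: float) -> Tuple[float, float]:
    """One training step: same weights everywhere, different data per worker."""
    local_grads = [gradient(w, b, shard) for shard in split(batch, num_workers)]
    gw, gb = all_reduce_mean(local_grads)
    return w - lr * gw, b - lr * gb


# Toy data generated from y = 3x + 1; four simulated workers share each batch.
batch = [(float(x), 3.0 * x + 1.0) for x in range(8)]
w, b = 0.0, 0.0
for _ in range(5000):
    w, b = data_parallel_step(w, b, batch, num_workers=4, lr=0.01)
print(round(w, 3), round(b, 3))
```

With equal-sized shards, the averaged gradient matches what a single worker would compute on the whole batch, so adding workers changes how the work is divided rather than the result of each step. Pipeline and model parallelism, by comparison, split the model itself across devices, which is the extra complexity the article refers to.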
Analyst Karl Freund of Cambrian AI said today’s release not only demonstrates the capability of Cerebras’ CS-2 systems as a premier AI training platform, but also elevates the company into the upper echelon of AI practitioners.
“There are a handful of companies in the world capable of deploying end-to-end AI training infrastructure and training the largest of LLMs to state-of-the-art accuracy,” Freund said. “Cerebras must now be counted among them. Moreover, by releasing these models into the open-source community with the permissive Apache 2.0 license, Cerebras shows commitment to ensuring that AI remains an open technology that broadly benefits humanity.”