UPDATED 10:00 EST / JUNE 22 2023


MosaicML releases open-source 30B parameter AI model for enterprise applications

MosaicML Inc., a generative artificial intelligence startup that provides infrastructure for companies to run machine learning services, announced the open-source availability of MPT-30B, the most advanced model in the company's MosaicML Pretrained Transformer foundation series, licensed for commercial AI applications.

The company said that the MPT-30B model surpasses the quality of the original GPT-3 released by OpenAI LP in 2020. And because it's built with roughly a sixth as many parameters, 30 billion compared with GPT-3's 175 billion, it can be trained more quickly and deployed on local hardware more easily.

This means that starting today, developers and enterprises can fine-tune and deploy their own GPT-3-grade generative AI models in-house at orders of magnitude less compute than the original required. That puts generative AI applications within reach of more businesses without giving up data privacy or security.

MPT-30B was also trained on longer sequences than GPT-3, up to 8,000 tokens, and it can handle even longer context windows in practice, which makes it better suited to data-heavy enterprise applications. That puts it ahead of many current models in its weight class, such as the popular LLaMA family from Meta Platforms Inc. and the recent Falcon model from the Technology Innovation Institute, which was trained with a 2,000-token context window.
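
For developers who want to take advantage of the longer context, here is a minimal sketch of raising the sequence-length setting before loading the weights. It assumes the Hugging Face transformers library, the mosaicml/mpt-30b model ID and a max_seq_len config field carried over from the earlier MPT-7B releases; treat those specifics as assumptions rather than confirmed details of this release.

```python
import transformers

# Load the config first so the context length can be raised before the weights load.
# NOTE: the "mosaicml/mpt-30b" model ID and the max_seq_len field are assumptions
# based on the published MPT-7B loading pattern.
config = transformers.AutoConfig.from_pretrained("mosaicml/mpt-30b", trust_remote_code=True)
config.max_seq_len = 16384  # extrapolate past the 8,000-token training length

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-30b",
    config=config,
    trust_remote_code=True,
)
```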

The news follows the launch of MosaicML’s MPT-7B foundation models in early May, including Base, Instruct, Chat and StoryWriter. Since then, these models have been downloaded more than 3 million times.

MosaicML Chief Scientist Jonathan Frankle told SiliconANGLE that building the new model was a learning experience for the company about scaling AI models. “Scaling up is hard,” Frankle said. “I think it’s underestimated how difficult it is to scale. We’ve seen some of the other folks in the open-source space run into challenges. We’ve seen that across the board. Certainly, we’ve had our challenges.”

As for the model, he said, it’s especially strong at coding, although it wasn’t specifically designed for that. Developers will also find that it works well as a chatbot and as an instruction-following model for inference tasks such as summarization and question answering.

Why 30 billion parameters? Frankle explained that it’s all about making sure it can run easily on local hardware while still maintaining or surpassing GPT-3 quality. “So, 30 billion tends to be the magic number for GPT-3 quality,” he said. “Obviously GPT-3 was trained on a bigger model, this is fewer parameters, but we’ve learned a lot since then about the right balance. Also, if you play your cards right, it fits on a single A100 for inference.”

In this case, Frankle is referring to the Nvidia A100, a high-performance graphics processing unit used to perform the calculations behind generative AI workloads. Anything above the 30-billion-parameter mark requires breaking the model up into parallel segments, or other tricks of the trade, to get it to fit. Other models, such as the Falcon 40B model, will not fit onto a single A100, showing that there’s a threshold beyond which an expensive multi-GPU setup is required.
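
As a rough illustration of why 30 billion parameters sits near that threshold, here is a back-of-the-envelope memory estimate, assuming 16-bit weights and the 80-gigabyte A100 variant; real deployments also need headroom for activations and the attention cache.

```python
# Back-of-the-envelope GPU memory estimate for 16-bit (2-byte) weights.
def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    return params * bytes_per_param / 1e9

print(f"MPT-30B:    ~{weight_memory_gb(30e9):.0f} GB")  # ~60 GB: fits on one 80 GB A100
print(f"Falcon-40B: ~{weight_memory_gb(40e9):.0f} GB")  # ~80 GB: no headroom on a single card
```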

Developers can use the open-source MPT-30B foundation model today and it is available for download from the HuggingFace Hub. Using it, developers can fine-tune the model with their own data on their own hardware or deploy it for inference on their own infrastructure.
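
A minimal sketch of pulling the model down and running a single generation is below. It assumes the Hugging Face transformers and accelerate libraries and the mosaicml/mpt-30b model ID, following the loading pattern MosaicML published for the MPT-7B series rather than any instructions specific to this release.

```python
import torch
import transformers

name = "mosaicml/mpt-30b"  # assumed model ID on the HuggingFace Hub

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # 16-bit weights so the model fits on a single 80 GB A100
    trust_remote_code=True,      # MPT ships custom modeling code with the checkpoint
    device_map="auto",           # let accelerate place the weights
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)

prompt = "MosaicML's MPT-30B is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```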

MosaicML also offers its own AI infrastructure-as-a-service for inference through an MPT-30B-Instruct managed endpoint, which means that developers don’t need to manage their own GPUs. At a price of $0.002 per 1,000 tokens, it’s 10 times cheaper than OpenAI’s DaVinci.
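
The arithmetic behind that comparison, assuming OpenAI’s published text-davinci-003 rate of $0.02 per 1,000 tokens at the time, works out as follows:

```python
tokens = 1_000_000  # one million tokens processed

mpt_cost = tokens / 1000 * 0.002     # MosaicML MPT-30B-Instruct endpoint rate
davinci_cost = tokens / 1000 * 0.02  # assumed text-davinci-003 rate

print(f"MPT-30B-Instruct: ${mpt_cost:.2f}")      # $2.00
print(f"DaVinci:          ${davinci_cost:.2f}")  # $20.00, 10 times the cost
```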

Frankle added that open-source model releases will remain part of MosaicML’s future. “At the end of the day, it all comes back to what we do as a business, as a researcher, open source is very important to me, values perspective,” he said. “In some sense, this is a demo track. We’re training models of this scale for customers.”

Scaling up from 7 billion to 30 billion parameters is only the first step, Frankle said, calling it the “MPT 1 process.” It’s proof that the research team can build a big model and do so repeatedly, and what comes next is even bigger, higher-quality models.

“What comes after that is building bigger models and our MPT 2 process, which we’re now working on,” Frankle said. “It will reduce costs and build better models from a bunch of different perspectives, better optimization, better architecture and the plan is to keep marching forward.”

