MosaicML launches Inference to reduce enterprise generative AI deployment costs
Generative artificial intelligence provider MosaicML Inc. today announced the launch of MosaicML Inference for enterprises, which greatly reduces the costs for developers to scale and deploy AI models for their own applications.
MosaicML provides infrastructure for enterprise customers to easily and affordably deploy large language models, such as those that underpin OpenAI LP’s ChatGPT AI chatbot, and diffusion models, which power the AI image generator Stable Diffusion. With these new inference capabilities, the company now offers a complete solution for generative AI training and deployment.
Chief Executive Naveen Rao told SiliconANGLE that the value proposition of the company for enterprise customers is twofold: keeping customer data private and reducing costs.
Much of the current interest in AI leans toward custom models trained on domain-specific data for fields such as coding, legal work and healthcare. Many of these industries require tight compliance and control over their data. Enterprise customers also want models that are highly capable at a particular task, something generic models can’t match.
“We provide tools that work in any cloud that enable customers to pretrain, fine-tune and serve models within their own private tenancy to enable and empower model ownership,” Rao said. “If a customer trains a model, they can rest assured that they own all the iterations of it; that model is theirs. We claim no ownership of that.”
Using MosaicML’s new Inference offering, customers can deploy AI models for text completion and text embedding at four times lower cost than using OpenAI’s LLMs, and at 15 times lower cost for image generation than using OpenAI’s DALL-E 2.
With the launch of Inference, MosaicML is providing access to a number of curated open-source LLMs, including Instructor-XL, Dolly and GPT-NeoX, giving developers best-of-breed models they can fine-tune to their needs. All of the models gain the same optimizations and affordability, allowing them to run cheaply when deployed with Inference.
“These are open-source models, so by definition customers can customize them and fine-tune them with our tools and serve them with our tools,” Rao said. “We can serve them either publicly with a public facing URL or completely within the closed tenancy of a customer. So they can host private models for their own internal use if they want that. I think this is very important from a privacy standpoint.”
The company is also launching the MosaicML Foundational Model, drawing on its own engineering and efficiency expertise, for developers to build on. One of the benefits of the new model is its extremely large “context window,” meaning it can accept a great deal of input text in its prompt.
That’s much larger than many other models on the market, at more than 64,000 tokens, or around 50,000 words. For comparison, GPT-4 maxes out at 32,768 tokens, or about 25,000 words. To show it off, Rao prompted the model with the contents of “The Great Gatsby” by F. Scott Fitzgerald and asked it to write an epilogue.
Organizations such as Replit, the browser-based integrated development environment, the AI-based video search company Twelve Labs and Stanford University have built atop MosaicML’s platform to gain better control over their models and to customize domain-specific AIs for their needs.