UPDATED 09:00 EST / JUNE 14 2023


OctoML debuts self-optimizing compute service for generative AI applications

Artificial intelligence optimization startup OctoML Inc. is shifting gears today with the launch of what it says is the industry’s first self-optimizing compute service for AI models.

The new service, called OctoAI, is what the company described as foundational infrastructure for developers looking to build and scale AI applications with the model of their choice, including open-source or custom-built models. It’s fully managed and provides developers with easy access to the cost-effective, scalable accelerated computing infrastructure required to create, customize and run AI models for very specific applications, the company said.

The launch of OctoAI marks a bit of a departure for OctoML, which first launched in 2019 with an AI optimization platform based on the open-source Apache TVM framework. OctoML was all about helping developers boost the performance of their models, but now it’s expanding its focus to running AI applications at a time when enterprises are racing to take advantage of the latest developments in generative AI.

Alongside OctoAI, the company is offering a library of what it claims are the world’s fastest and most affordable generative AI models, accelerated by its optimization platform. It includes templates for models such as Stable Diffusion 2.1, Dolly v2, Llama 65B, Whisper, FlanUL and Vicuna.

OctoML Chief Executive Luis Ceze explained the company’s shift, saying efficient compute is critical to making generative AI applications viable.

“Every company is scrambling to build AI-powered solutions, yet the process of taking a model from development to production is incredibly complex and often requires costly, specialized talent and infrastructure,” he said. “OctoAI makes models work for business, not the other way around. We abstract away all the complexity so developers can focus on building great applications, instead of worrying about managing infrastructure.”

With OctoAI, companies can simply choose the model template they want, or design their own, fine-tune it to meet very specific requirements, and integrate the finished model with their application development workflows. Customers can then balance costs by choosing from a variety of hardware options to run their models, with a clear view of the price/performance tradeoff.

“It offers freedom because it allows users to go choose their model or bring their own custom models,” Ceze told SiliconANGLE during an interview on theCUBE (below). “Second, it offers efficiency because we optimize the models and we choose the right hardware and make sure it gets the right performance-efficiency tradeoffs. And we make it very easy for folks to get started. We offer a collection of super-optimized models.”

Cost-effective inference

OctoML exists in a very competitive machine learning deployment platform space, and today’s announcement helps it differentiate itself by providing users with a way to optimize AI models for cost-effective inference, said Andy Thurai, vice president and principal analyst of Constellation Research Inc.

Thurai explained that while there’s a lot of focus on the excessive cost of AI model training, very few people speak of the inference costs, or essentially the cost of keeping AI models up and running in production. According to Thurai, the inference costs can often become orders of magnitude higher than AI training costs, especially when applications gain millions of users.

“It becomes very inefficient to scale up AI operations at such cost,” Thurai said. “OctoML’s compute service provides an optimization structure on the cloud for companies to run their AI models efficiently. Because there is no need to change the code or retrain the models, this option is more appealing for running production version machine learning models.”

Thurai said one of the biggest advantages is that some of OctoML’s optimized models can run almost as efficiently on older Nvidia Corp. A10G graphics processing units as they can on the newer A100 GPUs. This is something that should work in the company’s favor, he said, since there is currently a shortage of A100 GPUs available because of such high demand.

“Given the surplus availability of the A10G, companies can use these to run their AI applications at a similar performance to what the A100 GPU provides, rather than waiting for access to these resources,” Thurai continued. “Customers also have the option to fine-tune commonly available models on their own datasets. OctoML’s main competitor here is Hugging Face, and its claim to be five times cheaper and 33% faster is very compelling.”

OctoML said early adopters of OctoAI have already used generative AI models such as Stable Diffusion and FlanUL to build a huge variety of applications.

“They share two things in common,” Ceze said. “First, model customization is fundamental to delivering unique experiences for their customers, which is how they differentiate. Second, they require the ability to scale their services quickly, leveraging flexible hardware options from Nvidia GPUs to specialized silicon like AWS Inferentia 2.”

Where OctoML helps developers using these “cocktails” of AI models, including open-source ones, is in making them easier to manage. “It’s been really hard to get started with that until now, and to manage it and to run it all,” Jon Turow, a partner with Madrona Venture Group, an investor in OctoML, told SiliconANGLE. “What’s exciting about what Octo and Luis and the team are doing is they’re going to be able to give, for the first time, the kind of ease of use with open-source AI that you’re getting with the closed models.”

Ceze and Turow spoke recently with John Furrier, host of the SiliconANGLE Media video studio theCUBE. Here’s the full interview:

Image: OctoML
