UPDATED 18:16 EDT / APRIL 02 2024

OctoAI debuts OctoStack platform for powering AI inference environments

OctoAI Inc. today introduced OctoStack, a software platform that enables companies to host artificial intelligence models on their in-house infrastructure.

Many large language models are delivered through a cloud-based application programming interface. Such models are hosted on their respective developers’ infrastructure, which requires customers to send their data to that infrastructure for processing. Hosting a neural network on in-house hardware removes the need to share data with an external provider, which can simplify cybersecurity and regulatory compliance for enterprises.

OctoAI says its new OctoStack platform makes it easier to host AI models on a company’s internal infrastructure. The platform can run on on-premises hardware, in the major public clouds and on AI-optimized infrastructure-as-a-service platforms such as CoreWeave. OctoStack also supports multiple AI accelerators from Nvidia Corp. and Advanced Micro Devices Inc., as well as the Inferentia chips available in Amazon Web Services.

The platform is partly based on an open-source technology called Apache TVM that was developed by OctoAI’s founders. It’s a compiler framework that eases the task of optimizing AI models to run on multiple chips.
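For illustration, here’s a minimal sketch of what compiling a model through TVM’s open-source Python API looks like. It assumes TVM and its ONNX frontend are installed, and the model file name is a placeholder; this reflects the standard TVM workflow rather than anything OctoStack-specific.

```python
import onnx
import tvm
from tvm import relay

# Load a trained model; "model.onnx" is a placeholder path.
onnx_model = onnx.load("model.onnx")

# Convert the ONNX graph into Relay, TVM's high-level intermediate
# representation, along with the model's weights.
mod, params = relay.frontend.from_onnx(onnx_model)

# opt_level=3 enables aggressive graph optimizations, including
# operator fusion.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)  # "llvm" = CPU
```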

After creating the initial version of a neural network, developers can optimize it in various ways to boost performance. One technique, operator fusion, combines several of the calculations an AI performs into fewer, more hardware-efficient computations. Another technique, quantization, lowers the precision of the numbers a model stores and processes, which reduces the amount of data it must crunch to produce results.
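To make both techniques concrete, here is a toy NumPy sketch, not production code: the fused function shows how a compiler can collapse a matrix multiply, bias add and activation into one composite operation, and the quantization helpers show how float32 weights can be stored as int8 using a single scale factor.

```python
import numpy as np

# --- Operator fusion (conceptual illustration) ---
# Unfused: three separate passes, each materializing an intermediate array.
def linear_relu_unfused(x, w, b):
    y = x @ w                   # pass 1: matrix multiply
    y = y + b                   # pass 2: bias add
    return np.maximum(y, 0.0)   # pass 3: ReLU activation

# Fused: one composite expression; an AI compiler would emit a single
# loop nest with no intermediate buffers.
def linear_relu_fused(x, w, b):
    return np.maximum(x @ w + b, 0.0)

# --- Quantization (symmetric, per-tensor int8) ---
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0  # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
print("int8 storage:", q.nbytes, "bytes vs float32:", w.nbytes)  # 4x smaller
print("max abs error:", np.abs(dequantize(q, scale) - w).max())
```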

Such optimizations are not always portable across different hardware types. As a result, an AI model optimized for one graphics card doesn’t necessarily run as efficiently on a processor from a different chipmaker. TVM, the open-source technology OctoStack incorporates, can automate the process of optimizing a neural network for different chips.
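Continuing the earlier TVM sketch, retargeting amounts to recompiling the same module with a different target string. The target names below are standard TVM backends; which ones actually work depends on how TVM was built on the local machine.

```python
import onnx
import tvm
from tvm import relay

# Same conversion step as in the earlier sketch.
mod, params = relay.frontend.from_onnx(onnx.load("model.onnx"))

# Recompile for different chips by swapping the target string:
# "llvm" targets CPUs, "cuda" Nvidia GPUs, "rocm" AMD GPUs.
for target in ["llvm", "cuda", "rocm"]:
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    lib.export_library(f"model_{target}.so")  # one deployable artifact per chip
```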

OctoAI says its platform can help customers run their AI infrastructure more efficiently. According to the company, an OctoStack-powered inference environment provides four times higher graphics card utilization than an AI cluster built from scratch. The company is also promising a 50% reduction in operating costs.

“Enabling customers to build viable and future-proof Generative AI applications requires more than just affordable cloud inference,” said OctoAI co-founder and Chief Executive Officer Luis Ceze. “Hardware portability, model onboarding, fine-tuning, optimization, load balancing — these are full-stack problems that require full-stack solutions.”

OctoStack supports popular open-source LLMs such as Meta Platforms Inc.’s Llama and the Mixtral mixture-of-experts model developed by startup Mistral AI. Companies can also run internally developed neural networks. According to OctoAI, OctoStack makes it possible to update the AI models in an inference environment over time without making major changes to the applications they support.
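OctoAI hasn’t detailed the OctoStack API here, but one common way to achieve that decoupling is to make the model name a request parameter on a stable inference endpoint. The URL, path and model identifiers in the sketch below are hypothetical, used only to illustrate the pattern: when the model is a parameter, swapping Llama for Mixtral, or for an internally developed model, requires no application code changes.

```python
import requests

def generate(prompt: str, model: str) -> str:
    # Hypothetical internal endpoint; not a documented OctoStack API.
    resp = requests.post(
        "https://inference.internal.example.com/v1/chat/completions",
        json={
            "model": model,  # the only thing that changes per model
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Same application code, different models behind the endpoint
# (model names here are illustrative):
print(generate("Summarize our Q1 report.", model="llama-3-70b"))
print(generate("Summarize our Q1 report.", model="mixtral-8x7b"))
```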
