UPDATED 09:00 EST / MARCH 14 2024

Databricks is the latest investor in red-hot generative AI startup Mistral AI 

Citing a commitment to open-source software for generative artificial intelligence development, Databricks Inc. said today it has made a strategic investment of undisclosed size in large language model developer Mistral AI SAS and formed a partnership with the company.

Databricks said it will natively integrate Mistral’s open models with the Databricks Data Intelligence Platform, provide access in the Databricks Marketplace, include them in its Mosaic AI Playground and allow customers to fine-tune them with their own data.

Databricks said the deal makes it possible for enterprises to quickly apply Mistral AI’s models for generative AI applications while taking advantage of the security, data privacy and governance features of the Databricks platform.

Paris-based Mistral launched last May and raised $113 million in seed funding just four weeks later. Last December it raised an additional $415 million from a consortium led by elite venture capitalists Andreessen Horowitz LLC and Lightspeed Venture Partners. The deal valued the company at $2 billion at the time. Just over two weeks ago, it partnered with Microsoft Corp., which also invested $16.3 million.

Small and large models

The company’s open-source lineup comprises two language models — Mistral 7B and Mixtral 8x7B — with 7 billion and 46.7 billion parameters, respectively. Databricks called Mistral 7B “a small yet powerful dense transformer model [that’s] very efficient to serve, due to its relatively small size and model architecture that leverages grouped query attention and sliding window attention.”

Grouped query attention, or GQA, reduces memory use and speeds up inference by having groups of query heads share a single set of key and value heads instead of giving every query head its own. Attention mechanisms are a basic part of many modern neural network architectures, allowing models to focus on the parts of the input data that are most relevant to a given task.
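For readers who want a concrete picture, the snippet below is a minimal PyTorch sketch of the grouped query attention idea rather than Mistral’s actual implementation: a small number of key/value heads is shared by larger groups of query heads, which shrinks the key/value cache the model must keep around during generation. The tensor sizes are toy values chosen only for illustration.

```python
# Minimal sketch of grouped query attention (GQA), illustrative only:
# several query heads share each key/value head, shrinking the KV cache.
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """x: (batch, seq, dim). Query heads outnumber key/value heads;
    each KV head is repeated to serve a whole group of query heads."""
    b, t, d = x.shape
    head_dim = d // n_q_heads
    q = (x @ wq).view(b, t, n_q_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Repeat each KV head so every group of query heads can attend to it.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
    out = F.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2).reshape(b, t, d)

# Toy usage: 8 query heads sharing 2 KV heads.
d, n_q, n_kv = 64, 8, 2
x = torch.randn(1, 10, d)
wq = torch.randn(d, d)
wk = torch.randn(d, (d // n_q) * n_kv)
wv = torch.randn(d, (d // n_q) * n_kv)
print(grouped_query_attention(x, wq, wk, wv, n_q, n_kv).shape)  # (1, 10, 64)
```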

Sliding window attention, or SWA, is a variant of the attention mechanism in which each position attends only to a fixed-size window of nearby positions rather than to the entire sequence. That keeps the cost of processing long data sequences, such as text or time series, roughly linear in their length.
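Again purely as an illustration and not Mistral’s code, the following sketch builds the kind of mask sliding window attention relies on: each position may attend only to itself and a fixed number of preceding positions, so the work grows with the window size rather than with the square of the sequence length.

```python
# Illustrative sliding-window attention mask: position i may attend only to
# itself and the previous `window - 1` positions.
import torch

def sliding_window_mask(seq_len, window):
    idx = torch.arange(seq_len)
    offsets = idx.unsqueeze(0) - idx.unsqueeze(1)   # key index minus query index
    # Allowed: keys at or before the query, and within the window.
    return (offsets <= 0) & (offsets > -window)

mask = sliding_window_mask(seq_len=6, window=3)
print(mask.int())
# Row i shows which key positions query position i may attend to:
# only the 3 most recent positions (including i) are visible.
```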

Mixtral 8x7B is a “sparse mixture of experts” model that can understand English, French, Italian, German and Spanish. A sparse mixture of experts, or SMoE, is a machine learning architecture that scales model capacity and optimizes performance by dividing the learning task among multiple specialized sub-models, known as “experts,” and routing each input to only a few of them. Databricks said Mixtral 8x7B outperforms Meta Platforms Inc.’s Llama 2 70B model on most benchmarks and is six times faster at inference.
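A heavily simplified sketch of the sparse mixture-of-experts pattern appears below. It is not Mixtral’s implementation, and the dimensions, expert count and top-two routing are illustrative assumptions; the point it shows is that a router activates only a couple of experts per token, so most of the layer’s parameters sit idle on any given input.

```python
# Simplified sparse mixture-of-experts layer: a router picks the top-2
# experts per token, so only a fraction of the parameters is active.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim, hidden, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, dim)
        weights, picks = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # mixing weights for chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                hit = picks[:, slot] == e      # tokens routed to expert e in this slot
                if hit.any():
                    out[hit] += weights[hit, slot:slot + 1] * expert(x[hit])
        return out

moe = SparseMoE(dim=32, hidden=64)
print(moe(torch.randn(4, 32)).shape)  # torch.Size([4, 32])
```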

Inclusion in the Databricks Marketplace lets customers explore Mistral AI’s models and review ways to use them across the Databricks platform. When used with model serving, the Databricks Mosaic AI Foundation Model application programming interfaces let customers query Mixtral 8x7B without having to create and maintain their own deployments and endpoints.
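As a rough sketch of what querying such a pay-per-token endpoint can look like, the example below uses the OpenAI-compatible client interface that Databricks model serving supports. The workspace URL, access token and the endpoint name databricks-mixtral-8x7b-instruct are placeholders and assumptions here, so the actual values should be checked against the workspace’s serving catalog.

```python
# Hedged sketch: querying a Mixtral endpoint through Databricks model serving's
# OpenAI-compatible interface. The workspace URL, token and the endpoint name
# "databricks-mixtral-8x7b-instruct" are assumptions; confirm them in your workspace.
from openai import OpenAI

client = OpenAI(
    api_key="<DATABRICKS_PERSONAL_ACCESS_TOKEN>",  # placeholder credential
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-mixtral-8x7b-instruct",      # assumed endpoint name
    messages=[{"role": "user",
               "content": "Summarize grouped query attention in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```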

Customers can use Mosaic AI to adapt Mistral AI’s and other foundation models with their own datasets. Once a model is tuned or adapted, it can be quickly deployed to custom endpoints with Mosaic AI Model Serving, giving the customer a model distinctive to its business.

Photo: Databricks
