UPDATED 15:04 EST / JANUARY 30 2025


Mistral, Ai2 release new open-source LLMs

Mistral AI and the Allen Institute for AI today released new large language models that they claim are among the most advanced in their respective categories.

Mistral’s model is called Mistral Small 3. The new LLM from the Allen Institute for AI, or Ai2 as it’s commonly referred to, is called Tülu 3 405B. Both are available under an open-source license.

Mistral Small 3 includes 24 billion parameters, significantly fewer than the most advanced LLMs on the market. That makes it small enough to run on certain MacBooks when quantization is enabled. Quantization stores a model's weights at lower numerical precision, which trades off some output quality for lower memory and hardware usage.
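To make the tradeoff concrete, the following is a minimal sketch of how precision affects the memory needed for 24 billion weights, along with a simple symmetric 8-bit quantization round trip. It is an illustration only: the function names are hypothetical, the memory figure covers weights alone, and the article does not say which quantization scheme Mistral or its users actually apply.

```python
def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


PARAMS = 24_000_000_000  # parameter count cited for Mistral Small 3

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: about {weight_memory_gb(PARAMS, bits):.0f} GB")

# Round trip on a few made-up weights to show the precision that is lost.
original = [0.5, -1.0, 0.25, 0.013]
quantized, scale = quantize_int8(original)
restored = dequantize(quantized, scale)
print(quantized, restored)
```

Under these assumptions, dropping from 16-bit to 4-bit weights cuts the footprint by a factor of four, which is the kind of reduction that brings a model of this size within reach of a high-memory laptop. The restored values differ slightly from the originals, which is where the loss in output quality comes from.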

In an internal evaluation, Mistral compared Mistral Small 3 against Llama 3.3 70B Instruct, an open-source LLM from Meta Platforms Inc. that has nearly three times as many parameters. Mistral Small 3 delivered comparable output quality with significantly faster response times. In another test, the new LLM delivered higher output quality and lower latency than OpenAI’s GPT-4o mini.

Developers usually build LLMs by creating a base model, then refining its output quality using several different training methods. While building Mistral Small 3, the company developed the base model but skipped the subsequent refinement process. This allows users to carry out their own fine-tuning to align Mistral Small 3 with their project requirements.

The company sees developers applying the LLM to a range of tasks. According to Mistral, the model is useful for powering AI automation tools that require the ability to carry out tasks in external applications with low latency. The company says that several of its customers are also harnessing Mistral Small 3 for industry-specific use cases in segments such as robotics, financial services and manufacturing. 

“Mistral Small 3 is a pre-trained and instructed model catered to the ‘80%’ of generative AI tasks — those that require robust language and instruction following performance, with very low latency,” Mistral researchers wrote in a blog post.

The debut of Mistral Small 3 today coincided with a new LLM release from Ai2, a nonprofit AI institute. Tülu 3 405B is a customized version of the open-source Llama 3.1 405B model that Meta rolled out last July. In testing carried out by Ai2, Tülu 3 405B achieved better performance than the original Llama model across more than a half-dozen benchmarks.

The research group created the LLM using a development process that it first detailed in November. The workflow incorporates multiple LLM training methods, including one that Ai2 invented in-house.

The first step of the workflow is dedicated to supervised fine-tuning. This is a training method that involves providing an LLM with sample prompts and the corresponding answers, which helps it learn how it should respond to user queries. Next, Ai2 used another training technique called direct preference optimization, or DPO, to align Tülu 3 405B’s output with a set of user preferences.
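For readers curious about what preference alignment optimizes, here is a minimal sketch of the standard direct preference optimization (DPO) loss for a single pair of responses, one preferred and one rejected. It follows the published DPO objective rather than Ai2's exact training code, and the function name, the log-probability inputs and the `beta` value are all illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the total log-probability a model assigns to a response:
    "policy" is the model being trained, "ref" is a frozen reference copy.
    beta (a hypothetical value here) controls how strongly the policy is
    pushed away from the reference model.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) written in a numerically friendlier form
    return math.log1p(math.exp(-margin))


# Made-up log-probabilities: the policy already favors the preferred answer
# more than the reference does, so the loss falls below log(2).
print(dpo_loss(-12.0, -20.0, -14.0, -18.0))
print(dpo_loss(-14.0, -18.0, -14.0, -18.0))  # identical to reference: log(2)
```

The loss shrinks as the trained model raises the likelihood of the preferred response relative to the rejected one, measured against the reference model. No separate reward model is needed, which is one reason the technique is popular.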

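Ai2 further honed the model’s capabilities using an internally developed training method called reinforcement learning with verifiable rewards, or RLVR. It’s a variation of reinforcement learning, a widely used AI training technique, in which the model is rewarded only when its answer can be automatically checked as correct. Ai2 says that RLVR makes AI models better at tasks such as solving math problems.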
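The idea of a verifiable reward can be sketched in a few lines. The example below scores a model's response to a math problem by pulling out the last number it mentions and comparing that to a known correct answer. This is a simplified illustration, not Ai2's implementation: the function names and the answer-extraction heuristic are hypothetical.

```python
import re


def extract_final_number(text):
    """Return the last number mentioned in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None


def verifiable_reward(completion, ground_truth):
    """Return 1.0 if the completion's final number equals the known answer, else 0.0."""
    answer = extract_final_number(completion)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(ground_truth) else 0.0


print(verifiable_reward("6 times 7 is 42, so the answer is 42.", "42"))
print(verifiable_reward("I think the answer is 41.", "42"))
print(verifiable_reward("I am not sure.", "42"))
```

Because the reward comes from a deterministic check rather than a learned judgment of quality, it suits tasks such as math, where a response is either right or wrong.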

Tülu 3 405B represents “the first application of fully open post-training recipes to the largest open-weight models,” Ai2 researchers wrote in a blog post. “With this release, we demonstrate the scalability and effectiveness of our post-training recipe applied at 405B parameter scale.”

