UPDATED 12:20 EDT / MARCH 26 2026

Mistral releases an open-weights ‘speaking’ AI model with Voxtral TTS

The Paris-based Mistral AI SAS today announced the release of Voxtral TTS, its first text-to-speech artificial intelligence model aimed at unseating the best-known and most powerful voice models on the market.

The new model is lightweight at 4 billion parameters, small enough to run on most consumer hardware, including modern laptops, mid-range desktop graphics processing units and, when heavily compressed, even some high-end mobile devices. The company is releasing it with open weights, meaning the trained model parameters can be downloaded and run locally.
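To put the 4-billion-parameter figure in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy at different numeric precisions. The precisions listed are illustrative assumptions, not formats Mistral has confirmed, and the estimate ignores activations, caches and any audio codec components.

```python
def weight_footprint_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


# Parameter count reported for Voxtral TTS.
PARAMS = 4_000_000_000

# Assumed precisions for illustration only; the shipped formats may differ.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_footprint_gib(PARAMS, bits):.2f} GiB")
```

Under these assumptions the weights come to roughly 7.45 GiB at 16-bit, 3.73 GiB at 8-bit and 1.86 GiB at 4-bit, which is consistent with the claim that heavy compression is what brings the model within reach of high-end mobile devices.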

Mistral said the model's main strengths are that it adapts readily to new voices and has very low latency, so it starts producing audio quickly.

Although the model is small, it still creates powerful voices. The company said it not only recites but interprets text accurately, a must for any text-to-speech system. It can produce emotion and tone suited to the text being spoken, such as neutral, happy or sarcastic. The objective is to capture how a person would naturally speak.

Within English alone, its voices cover American, British and French accents.

Competition against proprietary speech models is intense, so Mistral compared Voxtral TTS with the offerings of ElevenLabs Inc., the incumbent to beat. For voice agents, the company said, human evaluations show that Voxtral TTS achieves superior naturalness compared with ElevenLabs Flash v2.5 and performs at parity with the larger v3 model in more lifelike interactions.

Although the English-language market is large, Mistral is a French company, and Voxtral TTS is accordingly a multilingual model. The company said it was trained on a large speech dataset and built for global applications. It supports nine languages, with what the company describes as state-of-the-art performance: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi and Arabic.

The model can adapt to a new voice and clone it from a reference sample as short as three seconds. It can capture not just the voice but also nuances such as a subtle accent, inflections, intonations and even casual vocal fillers such as “ums” and “ahs,” along with the interruptions, pauses and repetitions natural to the speaker’s rhythm and cadence.

With this level of fidelity, along with the small size and open weights, Mistral is betting that enterprises will want to own their voice models and run them locally on their own systems. The release also lays the foundation for more powerful text-to-speech models, with more texture and customization, that Mistral can offer to enterprise customers in the future.

Users can get started with the model today in Mistral Studio or Le Chat. The open model is available for developers with several reference voices and can be downloaded from Hugging Face under a Creative Commons license.

Images: SiliconANGLE/Mistral AI Le Chat; Mistral AI
