UPDATED 11:00 EDT / MARCH 29 2022

Exclusive: Deepgram adds 23 new languages and dialects to voice recognition engine

Deepgram Inc., developer of a voice-recognition engine that it delivers as a service via application programming interfaces, announced today that it has added 23 new language and dialect models to its original U.S. English model.

The company promotes its service as being the fastest and most accurate on the market, capable of recognizing and transcribing speech in less than one-third of a second with accuracy rates better than 90%. Founded in 2015, the company has raised more than $56 million. It says it has transcribed over 100 billion words from audio into text.

Deepgram wrote in a post on its blog today that the new suite is a “significant step toward delivering a global language experience that is on par with the success we’ve seen from our U.S. English model.” Beyond its cloud service, Deepgram is available for on-premises deployment in software containers, with pre-built virtual machine images that can be deployed on most clouds.

Application programming interface integration enables developers to add voice recognition to their applications without requiring significant revisions. “Developers can embed the API into their software so that the integration is seamless,” said Chief Operating Officer Shadi Baqleh. “For example, you can talk into a microphone on a software app, and text appears on the user’s screen in less than one second.”

The 23 new language and dialect models are Dutch, French, French Canadian, German, Hindi, Indonesian, Italian, Japanese, Korean, traditional and simplified Mandarin, Portuguese, Brazilian Portuguese, Russian, Spanish, Latin American Spanish, Swedish, Turkish and Ukrainian, plus versions of English from Australia, Great Britain, New Zealand and India. The company is making several of those language models free to use for a limited time.

Real-world training

Deepgram uses transfer learning, backed by a proprietary architecture and training on real-world audio datasets, an approach it says yields accuracy rates of up to 98% in optimal conditions. It also has linguists on staff to perform quality control checks. “We have tested our new languages against the big tech providers and beaten their models every time,” Baqleh said.

In addition to real-time voice recognition, the service provides batch transcription at the rate of one hour every three seconds. “A large call center can transcribe 10,000 hours of daily calls in less than 10 hours to find customers who may churn, ones they can upsell, or products with issues,” Baqleh said. Deepgram said its transcription speed is the same across all languages.
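The batch figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes a single sequential stream at the quoted rate of three seconds per hour of audio, with no parallelism, queuing or upload overhead; the function names are illustrative and not part of any Deepgram API.

```python
# Back-of-the-envelope check of the quoted batch throughput.
# Assumption: one sequential stream, no parallelism, no network overhead.

SECONDS_PER_AUDIO_HOUR = 3  # "one hour every three seconds", as quoted


def batch_processing_hours(audio_hours: float,
                           seconds_per_audio_hour: float = SECONDS_PER_AUDIO_HOUR) -> float:
    """Wall-clock hours needed to transcribe `audio_hours` of audio."""
    return audio_hours * seconds_per_audio_hour / 3600


def speedup_over_real_time(seconds_per_audio_hour: float = SECONDS_PER_AUDIO_HOUR) -> float:
    """How many times faster than playback speed the quoted rate is."""
    return 3600 / seconds_per_audio_hour


if __name__ == "__main__":
    hours = batch_processing_hours(10_000)
    print(f"10,000 audio hours -> {hours:.2f} processing hours")
    print(f"Speed-up over real time: {speedup_over_real_time():.0f}x")
```

Under those assumptions, 10,000 hours of calls works out to roughly 8.3 hours of processing, which is consistent with the "less than 10 hours" figure in the quote.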

Other features include automatic punctuation and capitalization, the ability to identify up to 10 different speakers at one time, acoustic pattern-matching for search, profanity filtering and automatic redaction of sensitive data. Support for representational state transfer APIs enables the engine to be connected to any audio data source and to deliver results in a wide variety of output options.
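To give a sense of what a REST-style integration might look like, here is a minimal sketch that builds (but does not send) an HTTP request carrying audio and feature flags. The endpoint URL, the query parameter names and the authorization scheme are all hypothetical placeholders chosen for illustration; they are not taken from Deepgram's documentation, which should be consulted for the real interface.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: a placeholder, not Deepgram's actual URL.
API_URL = "https://api.example.com/v1/transcribe"


def build_transcription_request(audio: bytes,
                                api_key: str,
                                language: str = "en-US",
                                *,
                                punctuate: bool = True,
                                diarize: bool = True,
                                profanity_filter: bool = False) -> Request:
    """Build a POST request for a speech-to-text REST endpoint.

    All parameter names below are assumptions made for this sketch.
    """
    params = {
        "language": language,                      # which language model to use
        "punctuate": str(punctuate).lower(),       # automatic punctuation
        "diarize": str(diarize).lower(),           # speaker identification
        "profanity_filter": str(profanity_filter).lower(),
    }
    url = f"{API_URL}?{urlencode(params)}"
    headers = {
        "Authorization": f"Token {api_key}",  # auth scheme is assumed
        "Content-Type": "audio/wav",
    }
    return Request(url, data=audio, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_transcription_request(b"RIFF....", "YOUR_API_KEY", language="fr")
    print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) and parsing the response are left out, since the response format depends on the provider.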

Deepgram offers a free tier, along with standard and premium editions that start at 1.25 cents per minute for batch transcription.
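For a rough sense of scale, the starting price can be applied to a given volume of audio. This sketch assumes a flat rate of 1.25 cents per minute with no volume discounts, free-tier credits or edition differences, none of which the article details; the function name is illustrative.

```python
from decimal import Decimal

# Starting batch price quoted in the article; actual pricing may vary by edition.
CENTS_PER_MINUTE = Decimal("1.25")


def batch_cost_dollars(audio_hours: int,
                       cents_per_minute: Decimal = CENTS_PER_MINUTE) -> Decimal:
    """Estimated cost in dollars at a flat per-minute rate (an assumption)."""
    minutes = Decimal(audio_hours) * 60
    return minutes * cents_per_minute / 100


if __name__ == "__main__":
    print(f"1 hour:       ${batch_cost_dollars(1):,.2f}")
    print(f"10,000 hours: ${batch_cost_dollars(10_000):,.2f}")
```

At that flat rate, one hour of audio costs 75 cents and the 10,000-hour call center example would come to $7,500.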
