UPDATED 11:30 EST / SEPTEMBER 19 2024

AI

Exclusive: Deepgram launches voice agent API that brings AI conversations to life

Deepgram Inc., the developer of a speech recognition engine that provides its service via application programming interfaces, today announced a powerful addition to its platform that enables natural-sounding conversations between humans and artificial intelligence agents at large scale in real time.

Using speech recognition and voice synthesis AI models, Deepgram’s voice agent systems deliver human-like responsiveness. In this release, the company is offering a system that packages all the pieces together under a single API.

All users have to do is set up a prompt telling the agent what they want it to do, and the system manages the rest. In the past, developers using Deepgram would have had to connect multiple parts of the system themselves, such as hooking in a large language model provider, the company’s voice-to-text speech recognition model and its speech synthesis model.
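The article doesn’t show Deepgram’s actual request format, so the sketch below is only an illustration of the idea: one configuration message bundling the developer’s prompt, a speech-to-text model, an LLM choice and a voice. All field names, model identifiers and the `listen`/`think`/`speak` structure are assumptions, not the documented API.

```python
import json

def build_agent_settings(prompt: str, llm_provider: str, llm_model: str) -> str:
    """Sketch of a single configuration payload for a voice agent API.

    Field names here are hypothetical; the article only says that a prompt
    plus model choices are packaged together under one API.
    """
    settings = {
        "type": "Settings",                          # hypothetical message type
        "agent": {
            # Speech-to-text: the article mentions Deepgram's own model.
            "listen": {"model": "nova-2"},           # hypothetical field/model
            # The LLM "brain": OpenAI, Anthropic or Meta models, per the article.
            "think": {
                "provider": llm_provider,            # e.g. "open_ai"
                "model": llm_model,                  # e.g. "gpt-4o-mini"
                "instructions": prompt,              # the developer's prompt
            },
            # Text-to-speech: one of the voices Deepgram offers.
            "speak": {"model": "aura-asteria-en"},   # hypothetical voice name
        },
    }
    return json.dumps(settings)

payload = build_agent_settings(
    "You answer billing questions for Acme Corp.", "open_ai", "gpt-4o-mini"
)
```

Swapping the LLM here is just a matter of changing the `provider` and `model` strings, which matches the article’s point that the underlying model is the developer’s choice.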

“We have a big shift that’s happening in the world right now,” Scott Stephenson, co-founder and chief executive of Deepgram, said in an interview with SiliconANGLE. “AI went mainstream over the last two years and voice AI has gone mainstream over the last two to six months. There’s a fundamental shift around the nature of how work is going to be done.”

Deepgram’s system allows users to listen to AI-synthesized speech and reply just as if they were talking to another human being. It’s also highly responsive on a conversational level: it waits for appropriate moments to break in and doesn’t interrupt the speaker’s train of thought. It’s interruptible just like another person and doesn’t lose track of the conversation, allowing for smooth interactions.

Stephenson said that voice interactivity fits anywhere a device has a microphone and a speaker, such as websites, phones, mobile apps, AI pendants and even the drive-through. One example of where AI agents are already in use across the industry is in call centers, where agents can pick up the phone quickly so that there’s little to no wait time for customers, who can have their questions answered or simple issues resolved.

“If you can service a customer’s need without having them talk to a live agent, that can save costs and that leads to a very satisfied customer,” said Stephenson. “If they can call in and they’re instantly connected with an AI agent, and that agent can immediately ask questions, get information and get the conversation going, essentially filling out CRM information, then when a live agent is available, they’re contextualized. Now they can complete their job in one minute.”

Developers can choose any LLM they want to connect the API with, including models from OpenAI, Anthropic PBC and Meta Platforms Inc. That makes it easy for them to choose what model will run the underlying AI experience. Deepgram’s voice synthesis options include 12 different voices for customers to choose from.

“As we watch our children use their smartphones, it’s obvious that voice-to-voice will become a standard method of human and machine interactions,” said Kevin Petrie, vice president of research at BARC US. “Deepgram’s Voice Agent API addresses this market opportunity and makes customer service — already a top use case for gen AI — easier by converting text conversations to speech. Deepgram also broadens the market opportunity by integrating with a wide array of large language models.”

This year saw the launch of several LLMs that can deliver natural voice conversation capabilities. The biggest examples include OpenAI’s GPT-4o, Gemini Live from Google LLC and Tenyx Voice from Tenyx Inc.

Stephenson said that Deepgram’s system doesn’t necessarily need to be voice-to-voice; it can also integrate easily with text-to-voice, allowing people to maintain privacy. For example, someone wearing a headset on a crowded train might prefer to type on their phone and listen to the reply through the headset. Not everyone will want to have one-sided conversations with their phones, he said; on the other hand, some people might dive into long-winded talks with AI models.

“The initial phase will be adding the voice option to text boxes,” Stephenson said. “Once people realize you can have a human-like, interruptible talking experience with a voice agent, we think that people will use it a lot.”

Image: Microsoft Designer/SiliconANGLE
