UPDATED 19:01 EST / MARCH 29 2024

OpenAI details Voice Engine speech generation AI

OpenAI today detailed Voice Engine, an artificial intelligence model that can generate synthetic speech based on user-provided audio samples.

The company developed the model in late 2022. OpenAI uses it to power ChatGPT features that let customers interact with the chatbot through voice commands and have it read text aloud. Additionally, the company made the model available to a limited number of partners last year through a pilot program.

Voice Engine can analyze a sample of a user’s voice and then generate synthetic speech that closely resembles it. According to OpenAI, the AI requires only 15 seconds of audio to imitate the speaker. The company described Voice Engine as a “small model” in a blog post, which suggests it requires limited computing infrastructure to run.

OpenAI has not yet made Voice Engine publicly available. However, it opened access to the model for a limited number of partners in late 2023. OpenAI says those partners have successfully applied Voice Engine to tasks such as generating voiceovers for educational content and translating videos.

The company says the pilot program participants agreed to replicate individuals’ voices only with their permission. Additionally, customers are required to add disclosures to AI-generated speech specifying that it’s synthetic.

OpenAI said it’s taking multiple steps to ensure the pilot participants comply with the usage terms. Its engineers configured Voice Engine to incorporate a watermark into synthetic speech files. Additionally, OpenAI has launched a “proactive monitoring” initiative to ensure the AI model is used responsibly.

“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” OpenAI staffers wrote in the blog post detailing Voice Engine. “Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

If OpenAI decides to make Voice Engine commercially available, the AI model could create more competition for the existing synthetic speech services on the market. Eleven Labs Inc., one of the startups competing in this segment, recently raised $80 million from Andreessen Horowitz and other investors. The company says its user base includes over 40% of the Fortune 500.

TechCrunch reported today that Voice Engine is priced significantly lower than Eleven Labs’ service but offers fewer customization options. OpenAI could develop a new, more advanced version of Voice Engine to address those limitations before making it commercially available. The tradeoff is that a more capable version might take more hardware resources to run, which would likely increase its price.

OpenAI could potentially also open-source Voice Engine. In 2022, the year it developed the model, the company released the code for a second AI system called Whisper that can transcribe and translate speech. OpenAI said at the time that Whisper produces 50% fewer errors than earlier neural networks in the category.

Photo: Unsplash
