UPDATED 19:01 EDT / MARCH 29 2024

OpenAI details Voice Engine speech generation AI

OpenAI today detailed Voice Engine, an artificial intelligence model that can generate synthetic speech based on user-provided audio samples.

The company developed the model in late 2022. OpenAI uses it to power ChatGPT features that let customers interact with the chatbot using voice commands and have it read text aloud. Additionally, the company made the model available to a limited number of partners last year through a pilot program.

Voice Engine can analyze a sample of a user’s voice and then generate synthetic speech that closely resembles it. According to OpenAI, the AI requires only 15 seconds of audio to imitate the speaker. The company described Voice Engine as a “small model” in a blog post, which suggests it requires limited computing infrastructure to run.

OpenAI has not yet made Voice Engine publicly available. However, it opened access to the model for a limited number of partners in late 2023. OpenAI says those partners have successfully applied Voice Engine to tasks such as generating voiceovers for educational content and translating videos.

The company says the pilot program participants agreed to replicate individuals’ voices only with their permission. Additionally, customers are required to add disclosures to AI-generated speech specifying that it’s synthetic.

OpenAI said it is taking multiple steps to ensure the pilot participants comply with the usage terms. Its engineers configured Voice Engine to incorporate a watermark into synthetic speech files. Additionally, OpenAI has launched a “proactive monitoring” initiative to ensure the AI model is used responsibly.

“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” OpenAI staffers wrote in the blog post detailing Voice Engine. “Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

If OpenAI decides to make Voice Engine commercially available, the AI model could create more competition for the existing synthetic speech services on the market. Eleven Labs Inc., one of the startups competing in this segment, recently raised $80 million from Andreessen Horowitz and other investors. The company says its user base includes over 40% of the Fortune 500.

TechCrunch reported today that Voice Engine is priced significantly lower than Eleven Labs’ service but offers fewer customization options. OpenAI could develop a new, more advanced version of Voice Engine to address those limitations before making it commercially available. The tradeoff is that a more capable version might take more hardware resources to run, which would likely increase its price.

OpenAI could potentially also open-source Voice Engine. In 2022, the year it developed the model, the company released the code for a second AI system called Whisper that can transcribe and translate speech. OpenAI said at the time that Whisper produces 50% fewer errors than earlier neural networks in the category.

Photo: Unsplash
