

Amazon.com Inc. today debuted a new foundation model, Amazon Nova Sonic, that is optimized for voice interactions such as customer support calls.
The company says it’s using components of the model to power Alexa+. Introduced in February, the latest iteration of Amazon’s voice assistant can automatically perform actions such as ordering takeout and booking flights. When necessary, it’s capable of interacting with third-party applications to carry out those tasks.
Processing speech usually involves three steps. First, an application uses a speech recognition model to transcribe the audio. It then feeds the transcript into a large language model, which generates a text-based response, and a third model finally turns that text into speech.
Using three different neural networks complicates software development. It can also slow artificial intelligence applications’ performance. Data takes time to move from one neural network to another, which adds latency to prompt responses.
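For illustration, here is a minimal sketch of that three-stage cascade. The function names and placeholder outputs are hypothetical stand-ins rather than Amazon's actual components; each stub represents one of the separate neural networks described above.

```python
# A minimal sketch of the traditional three-model speech pipeline.
# All functions are hypothetical placeholders, not Amazon's components.

def transcribe(audio: bytes) -> str:
    """Stage 1: a speech recognition model turns audio into text."""
    return "what time does the store close"  # placeholder transcript

def generate_reply(transcript: str) -> str:
    """Stage 2: a large language model produces a text response."""
    return "The store closes at 9 p.m."  # placeholder response

def synthesize(text: str) -> bytes:
    """Stage 3: a text-to-speech model turns the response into audio."""
    return text.encode("utf-8")  # placeholder audio bytes

def handle_turn(audio: bytes) -> bytes:
    # Each hop between models adds latency; a unified speech-to-speech
    # model such as Nova Sonic collapses these three steps into one call.
    transcript = transcribe(audio)
    reply_text = generate_reply(transcript)
    return synthesize(reply_text)

if __name__ == "__main__":
    print(handle_turn(b"\x00\x01"))
```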
Amazon says that its new Nova Sonic model simplifies the workflow. It lets companies replace the three separate neural networks usually needed to process speech with a single model, which eases development. Amazon is also promising performance benefits: Nova Sonic starts responding to user input in 1.09 seconds on average. According to Amazon, that makes it faster than competing products from OpenAI and Google LLC.
Nova Sonic adapts the synthetic speech it generates based on user behavior. According to Amazon, the model can switch tones in the middle of a conversation and ask follow-up questions if more information is needed to fulfill a request.
When the information that Nova Sonic requires isn’t provided during a conversation, it can retrieve data from external systems. The model could, for example, check an inventory management application to determine if a product requested by a customer is in stock. Nova Sonic can also perform tasks such as placing orders in the applications with which it’s integrated.
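Amazon hasn't detailed Nova Sonic's tool-calling schema, but the pattern it describes resembles standard function calling. The hypothetical sketch below shows how an application might route such a request to an inventory system or an ordering system; the tool names, request format, and inventory data are all illustrative assumptions.

```python
# A hypothetical sketch of routing a model's tool request to external
# systems. Tool names, request shape, and data are illustrative only.

INVENTORY = {"wireless-earbuds": 12, "smart-speaker": 0}  # toy inventory

def check_stock(sku: str) -> dict:
    """Look up a requested product in an inventory management system."""
    return {"sku": sku, "in_stock": INVENTORY.get(sku, 0) > 0}

def place_order(sku: str, quantity: int) -> dict:
    """Place an order in an integrated ordering application."""
    return {"sku": sku, "quantity": quantity, "status": "ordered"}

TOOLS = {"check_stock": check_stock, "place_order": place_order}

def dispatch(tool_call: dict) -> dict:
    # When the model needs outside data, the application routes the
    # request to the matching backend system and returns the result.
    handler = TOOLS[tool_call["name"]]
    return handler(**tool_call["arguments"])

if __name__ == "__main__":
    print(dispatch({"name": "check_stock",
                    "arguments": {"sku": "wireless-earbuds"}}))
```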
In the background, the model generates a transcript of the speech it processes. That transcript can be streamed to other artificial intelligence models through an application programming interface. An electronics maker, for example, could send contact center transcripts to an AI application that measures customer sentiment.
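As a rough sketch of that downstream hand-off, the snippet below forwards transcript chunks to a toy sentiment scorer. Both the transcript stream and the scoring function are hypothetical; the article only states that transcripts can be streamed to other models over an API.

```python
# A hypothetical sketch of feeding a call transcript to a downstream
# sentiment model. The stream and the scorer are illustrative stand-ins.

from typing import Iterable

def transcript_chunks() -> Iterable[str]:
    """Stand-in for the transcript stream produced during a call."""
    yield "I've been waiting two weeks for my replacement charger."
    yield "Thanks, that resolves it."

def score_sentiment(text: str) -> float:
    """Toy downstream model: returns a rough score between -1 and 1."""
    negative = {"waiting", "broken", "refund"}
    positive = {"thanks", "great", "resolves"}
    words = set(text.lower().replace(".", "").split())
    return (len(words & positive) - len(words & negative)) / max(len(words), 1)

for chunk in transcript_chunks():
    print(f"{score_sentiment(chunk):+.2f}  {chunk}")
```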
At launch, Nova Sonic supports English in multiple accents, with additional languages and accents to be added in the future. Developers can access Nova Sonic through Amazon Web Services Inc.'s Amazon Bedrock service, which provides access to hosted foundation models from the company and third-party providers.
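For developers exploring access, a rough boto3 sketch follows. The region and the assumption that Nova model identifiers contain "nova" are illustrative; Nova Sonic itself is a speech-to-speech model served through Bedrock's streaming interfaces, so this only demonstrates the general Bedrock access path.

```python
# A rough sketch of discovering Nova models in Amazon Bedrock with boto3.
# Requires AWS credentials with Bedrock permissions; the region and the
# "nova" filter are assumptions for illustration.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the foundation models visible to the account and keep the Nova family.
response = bedrock.list_foundation_models()
for summary in response["modelSummaries"]:
    if "nova" in summary["modelId"].lower():
        print(summary["modelId"], "-", summary.get("modelName", ""))
```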
“We are releasing a new foundation model in Amazon Bedrock that makes it simpler for developers to build voice-powered applications that can complete tasks for customers with higher accuracy, while being more natural, and engaging,” said Rohit Prasad, Amazon’s senior vice president of artificial general intelligence.
Nova Sonic is rolling out a day after AWS released an update to its Amazon Nova Reel video generator. The latter AI now enables users to generate clips up to two minutes in length. A week earlier, AWS introduced a research preview of Amazon Nova Act, a model that can automatically perform actions in a browser on the user’s behalf.