UPDATED 11:55 EST / JUNE 23 2022

AI

Amazon demos Alexa reading ‘The Wizard of Oz’ in a dead relative’s voice

Amazon.com Inc.’s Alexa speakers offer a wide range of voice capabilities backed by artificial intelligence, and during the company’s re:MARS event in Las Vegas Wednesday, it revealed an upcoming feature: the ability to mimic voices.

To demonstrate the feature, Rohit Prasad, Alexa’s senior vice president and head scientist, played a clip of Alexa mimicking the voice of a child’s recently deceased grandmother reading “The Wizard of Oz.”

Prasad explained that the company has been seeking ways to make AI more empathetic and compassionate to human needs, given the “companionship relationship” people have with Alexa. He pointed specifically to the loss of loved ones.

“While AI can’t eliminate that pain of loss, it can definitely make the memories last,” he said.

Prasad explained that in the demonstration, Alexa had learned to speak in the grandmother’s voice from less than a minute of high-quality recorded audio, versus the hours of studio recordings required for other AI models.

During the presentation, he noted that the feature posed some technical challenges, because the team had to approach it as a voice conversion task rather than a speech generation task. Voice conversion is a method for producing synthetic speech by filtering recorded speech, as described in a white paper published by Amazon’s research team.
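Amazon hasn’t published the details of the demo’s pipeline beyond that description, but the distinction can be illustrated with a toy example. The sketch below, in Python with NumPy, assumes a deliberately simplified form of voice conversion: re-filtering an existing piece of synthetic speech so its coarse spectral shape matches a short reference clip of the target voice. The function names and the sine-wave stand-in signals are hypothetical and are not drawn from Amazon’s paper.

```python
# Toy sketch of the "voice conversion" framing described above, NOT Amazon's method:
# take already-synthesized speech and filter it toward a target speaker's timbre.
import numpy as np

def spectral_envelope(signal: np.ndarray, smoothing: int = 32) -> np.ndarray:
    """Rough spectral envelope: the magnitude spectrum smoothed with a moving average."""
    mag = np.abs(np.fft.rfft(signal))
    kernel = np.ones(smoothing) / smoothing
    return np.convolve(mag, kernel, mode="same") + 1e-8  # epsilon avoids divide-by-zero

def convert_voice(source: np.ndarray, target_reference: np.ndarray) -> np.ndarray:
    """Filter `source` so its coarse spectral shape matches `target_reference`.
    Assumes both signals have the same length (true for this toy example)."""
    spectrum = np.fft.rfft(source)
    src_env = spectral_envelope(source)
    tgt_env = spectral_envelope(target_reference)
    # Keep the source's fine spectral detail (the speech content),
    # but swap its envelope (the speaker's timbre) for the target's.
    converted = spectrum / src_env * tgt_env
    return np.fft.irfft(converted, n=len(source))

if __name__ == "__main__":
    sr = 16000
    t = np.linspace(0, 1, sr, endpoint=False)
    generic_tts_audio = np.sin(2 * np.pi * 220 * t)        # stand-in for existing synthetic speech
    short_target_clip = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for a short target-voice clip
    out = convert_voice(generic_tts_audio, short_target_clip)
    print(out.shape)  # (16000,): same content, re-filtered toward the target's envelope
```

The appeal of this framing is that an existing text-to-speech voice already does the heavy lifting of producing intelligible speech; the conversion step only has to capture the target speaker’s timbre, which helps explain why a much shorter recording can suffice.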

“We are unquestionably living in the golden era of AI, where our dreams and science fiction are becoming a reality,” Prasad added.

Amazon has given no details on how long it will take to develop and deploy this particular feature.

Security experts and AI ethics experts have long questioned the emergence and role of AI audio “deepfakes.” Although their use in scam calls is still relatively rare, the technology could raise wide-ranging concerns if it were released as part of consumer-grade products.

“The phone attacking implications of this tool are not good at all — this will likely be used for impersonation,” tweeted Rachel Tobac, chief executive of SocialProof Security. “I know this technology already exists, I’ve talked about this risk with other orgs’ tools. But the easier it is to use, the more it will be abused. And this sounds like it may be pretty user friendly.”

For example, most hacking doesn’t operate the way it’s portrayed in cinema, with fingers tapping rapidly on keyboards and scripts flying across screens. Often, the information needed to break into networks is extracted through social engineering: pretending to be someone whom insiders with special access should trust. If an attacker can sound like someone’s boss asking for a password or access, the job becomes that much easier.

In one recent example, fraudsters used an audio deepfake to steal $35 million from a United Arab Emirates company, in what was only the second known use of the technology to pull off this sort of heist.

Image: Amazon
