UPDATED 22:54 EDT / JUNE 10 2019

AI

Facebook AI researchers have cloned Bill Gates’ voice with uncanny accuracy

Researchers at Facebook Inc. have managed to clone Microsoft Corp. co-founder Bill Gates’ voice so well that you won’t be able to tell it’s machine-generated speech.

Sean Vasquez and Mike Lewis at Facebook AI Research said Monday that they’ve been working on mimicking human speech for some time, a problem whose difficulty is clear given that even the most famous speaking machine of all, the synthesizer used by Stephen Hawking, still sounded very much like a machine.

It seems real progress has now been made, and if you listen to the clone of Gates (pictured), you’ll agree. It sounds like him, and you’d be hard-pressed to tell the difference between the machine and his real voice.

In one clip, the machine says, as Gates, “The glow deepened in the eyes of the sweet girl.” In another, it clones the words, “Write a fond note to the friend you cherish.” What’s perhaps most uncanny about the last sentence is how the machine captures Gates’ unmistakable rising inflection on “cherish.”

The model behind the feat, called MelNet, can copy human intonation, and the researchers have used it to reproduce Gates’ voice and many others with similar fidelity. The training audio was taken from various TED Talks, Vasquez and Lewis said.

The researchers said the reason text-to-speech software hasn’t worked very well until recently is that it was trained on waveform recordings, which track how a sound’s amplitude changes from moment to moment, with a single second of audio spanning tens of thousands of timesteps. Listen to that word “cherish” uttered by Gates and the tone shifts quite a lot. A deep-learning machine trying to mimic a person directly from the waveform must guess all of these small shifts, no easy task.

Vasquez and Lewis said they managed to clone voices much more accurately by instead training the machine on spectrograms, which represent a sound as the spectrum of its frequencies changing over time.

“The temporal axis of a spectrogram is orders of magnitude more compact than that of a waveform, meaning dependencies that span tens of thousands of timesteps in waveforms only span hundreds of timesteps in spectrograms,” said the researchers. “This enables our spectrogram models to generate unconditional speech and music samples with consistency over multiple seconds.”
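To make that compression concrete, here’s a minimal Python sketch, not Facebook’s code, that counts the timesteps in a waveform versus its mel spectrogram. It uses the open-source librosa audio library with common default settings (22,050 Hz sample rate, a 256-sample hop); MelNet’s actual parameters aren’t assumed here, and the sine wave simply stands in for a real speech clip.

    import numpy as np
    import librosa  # pip install librosa

    sr = 22050                    # samples per second
    duration = 5.0                # seconds of audio
    t = np.linspace(0.0, duration, int(sr * duration), endpoint=False)
    y = 0.5 * np.sin(2 * np.pi * 220.0 * t)  # synthetic stand-in for a voice clip

    # Mel spectrogram: frequency content over time, on a mel-scaled axis.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=80)

    print(f"waveform timesteps:    {y.shape[0]:,}")    # 110,250
    print(f"spectrogram timesteps: {mel.shape[1]:,}")  # 431

Five seconds of audio spans 110,250 waveform samples but only 431 spectrogram frames, roughly the “tens of thousands” versus “hundreds” gap the researchers describe, which is what lets their models stay coherent over multiple seconds.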

There are some limitations, though. The team said that although the model can reproduce a sentence almost perfectly, it isn’t yet able to replicate “intonation to indicate changes in topic or mood as stories evolve over tens of seconds or minutes.” Still, when it comes to human and computer interaction, the team said, this technology could be transformative for conversations that involve only short phrases.

Photo: Gisela Giardino/Flickr
