UPDATED 20:05 EST / NOVEMBER 25 2024

Nvidia’s new music generation model Fugatto creates ‘never before heard sounds’

Nvidia Corp. today joined the likes of Meta Platforms Inc., OpenAI and Runway AI Inc. in releasing a generative artificial intelligence model that’s designed to create “new” music and audio from human language prompts.

According to the chipmaker, the new model, called Fugatto (for Foundational Generative Audio Transformer Opus 1), is uniquely able to modify human voices and create “novel sounds” that no other model can produce.

Nvidia, which is better known for making the powerful graphics processing units that run AI models, has not publicly released the model yet, on account of concerns around safety.

The company said Fugatto is different from other music and audio generation models because it has the ability to absorb and modify existing sounds. For instance, it can listen to a musical segment played on a piano and transform that sound into notes sung by a human voice or played on another instrument, such as a violin. It can also take a human voice recording and alter the accent and mood expressed in the singing.

It’s perhaps misleading to say that Fugatto’s sounds are entirely novel because, as with all AI models, the outputs come from an algorithm that draws on existing data sources to try to create something that satisfies the user’s prompt. Even so, Nvidia says Fugatto is able to “create soundscapes it’s never seen before” by overlaying two distinct audio effects to create something original.

In a video posted on YouTube, the company demonstrates how Fugatto can generate the sound of a train that slowly morphs into an orchestral performance, change happy voices into angry ones, and so on.

Such capabilities haven’t been seen before in an audio generation model, Nvidia claims. Furthermore, beyond basic prompt engineering, Fugatto comes with more fine-grained controls for users to edit the soundscapes they create.

Bryan Catanzaro, Nvidia’s vice president of applied deep learning research, told Reuters that generative AI has the potential to affect music production in the same way that electronic synthesizers did.

“If we think about synthetic audio over the past 50 years, music sounds different now because of computers,” he said. “Generative AI is going to bring new capabilities to music, to video games and to ordinary folks that want to create things.”

Nvidia isn’t the first company to try its hand at generative AI music creation. Last month, Meta debuted a new model called Movie Gen, which can create both video and soundscapes for the short movies it generates.

Nvidia didn’t say much about the data used to train Fugatto, other than that it’s made up of “millions of audio samples” that come from open-source data. The company also confirmed that it doesn’t have any plans to make Fugatto available to AI developers just yet, similar to Meta, which also declined to do so. According to Catanzaro, his team is still debating how it can release the model to the public safely.

“Any generative technology always carries some risks, because people might use that to generate things that we would prefer they don’t,” Catanzaro said. “We need to be careful about that, which is why we don’t have immediate plans to release this.”

In addition to the safety concerns, Nvidia is no doubt mindful of potential copyright issues. In June, record labels including Sony Music Entertainment, Warner Music Group Inc. and Universal Music Group N.V. filed lawsuits against the generative AI music startups Suno Inc. and Uncharted Labs Inc., accusing them of “widespread infringement” of copyrighted sound recordings at an “almost unimaginable scale.”

The relationship between AI and Hollywood is just as tense. While some AI firms, like OpenAI, are trying to negotiate with Hollywood studios over the use of their data, the actress Scarlett Johansson has openly accused OpenAI of cloning her voice and has threatened to take legal action against the company.

Image: SiliconANGLE/Luma AI
