Nvidia claims a new AI audio generator can make sounds never heard before

The Verge - Artificial Intelligences, November 26, 2024

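Nvidia has introduced an AI music editor called Fugatto that can generate never-before-heard music, sounds, and speech from text and audio inputs. Fugatto can compose songs from unusual prompts, such as “Create a saxophone howling, barking then electronic music with dogs barking.” It can also produce unique sound effects from a description, such as “Deep, rumbling bass pulses paired with intermittent, high-pitched digital chirps, like the sound of a massive sentient machine waking up.” In addition, Fugatto can change the pitch and tone of a voice, for example by changing its accent or making it sound angry or calm, and it can edit music, for example by isolating the vocals in a song, adding instruments, or swapping one instrument for another. Other AI audio tools are already on the market, but Nvidia claims Fugatto can create entirely new, never-before-heard sounds.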

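🤔 Fugatto can generate music, sounds, and speech from text and audio inputs, even inputs that never appeared in its training data, such as the prompt “Create a saxophone howling, barking then electronic music with dogs barking.”

🎤 Fugatto can produce unique sound effects from a description, such as “Deep, rumbling bass pulses paired with intermittent, high-pitched digital chirps, like the sound of a massive sentient machine waking up.”

🗣️ Fugatto can change the pitch and tone of a voice, for example by changing its accent or making it sound angry or calm, and it can edit music, for example by isolating the vocals in a song, adding instruments, or swapping one instrument for another.

📚 Fugatto's training data comprises millions of audio samples, including the BBC's sound effects library.

⏳ It is not yet clear when, or whether, Fugatto will be widely available.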

Cath Virginia / The Verge | Photo from Getty Images

Nvidia says its new AI music editor can create “sounds never heard before” — like a trumpet that meows. The tool, called Fugatto, is capable of generating music, sounds, and speech using text and audio inputs it’s never been trained on.

As shown in a video shared with the announcement, this allows Fugatto to put together songs based on wild prompts, like “Create a saxophone howling, barking then electronic music with dogs barking.”

Some other examples shared by the company include the ability to produce unique sound effects based on a description, like “Deep, rumbling bass pulses paired with intermittent, high-pitched digital chirps, like the sound of a massive sentient machine waking up.”

It can even transform the sound of someone’s voice, changing their accent or giving them a different tone, like angry or calm. There are ways to edit music, too, as Fugatto can isolate the vocals in a song, add instruments, and even change up a melody by swapping out a piano for an opera singer.

A paper released with the announcement shows the long list of all the datasets Nvidia says Fugatto was trained on, one of which includes a library of sound effects from the BBC.

There are already several other AI audio tools out there, including those from Stability AI, OpenAI, Google DeepMind, ElevenLabs, and Adobe, but not ones claiming to create completely new and unheard-of sounds. Some AI startups are even facing copyright lawsuits over their music creation tools, while a recent report found that Nvidia and other companies trained AI models on subtitles from thousands of YouTube videos.

To build Fugatto, Nvidia says researchers had to put together a dataset with millions of audio samples. They then created instructions “that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.” Nvidia doesn’t say when — or if — the tool will be widely available.
