MIT Technology Review » Artificial Intelligence, January 16
Meta’s new AI model can translate speech from more than 100 languages

 

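Meta has released a new AI model called SeamlessM4T that can translate speech from 101 different languages, a step toward real-time simultaneous interpretation. The model takes a more direct approach to translation, avoiding the traditional multistep process and reducing errors and mistranslations. SeamlessM4T translates text 23% more accurately than existing models and can translate into 36 other languages. The key is parallel data mining, which learns correspondences between languages by matching sounds in video or audio with subtitles. Human translators are still needed to check for cultural context, but the model points to the possibility of instant cross-language communication in the future.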

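🗣️ SeamlessM4T translates directly from speech to speech, avoiding the traditional multistep process, which improves efficiency and reduces errors.

🌐 The model supports speech translation from 101 languages and can output translations in 36 other languages, going beyond the capabilities of existing models.

⚙️ Parallel data mining is the key to the model: by matching audio and subtitles across languages, it learns translation patterns and thereby expands the training data.

🧐 Although SeamlessM4T is a technical advance, human translators remain essential for handling cultural differences and ensuring accuracy.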

Meta has released a new AI model that can translate speech from 101 different languages. It represents a step toward real-time, simultaneous interpretation, where words are translated as soon as they come out of someone’s mouth. 

Typically, translation models for speech use a multistep approach. First they translate speech into text. Then they translate that text into text in another language. Finally, that translated text is turned into speech in the new language. This method can be inefficient, and at each step, errors and mistranslations can creep in. But Meta’s new model, called SeamlessM4T, enables more direct translation from speech in one language to speech in another. The model is described in a paper published today in Nature.
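To see why errors "creep in" at each step of a multistep pipeline, a rough back-of-the-envelope sketch may help. It assumes, as a simplification, that the three stages fail independently. The per-stage accuracy figures are invented for illustration and do not come from Meta or the Nature paper, and the function name `cascade_accuracy` is hypothetical.

```python
from math import prod

def cascade_accuracy(stage_accuracies):
    """Chance that every stage is correct, assuming stage errors are independent."""
    return prod(stage_accuracies)

# Hypothetical accuracies for speech-to-text, text-to-text, and text-to-speech.
stages = [0.95, 0.95, 0.95]
print(round(cascade_accuracy(stages), 3))  # prints 0.857
```

Under these made-up numbers, three stages that are each right 95% of the time yield a fully correct result only about 86% of the time, which is the intuition behind translating more directly.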

Seamless can translate text with 23% more accuracy than the top existing models. And although another model, Google’s AudioPaLM, can technically translate more languages—113 of them, versus 101 for Seamless—it can translate them only into English. SeamlessM4T can translate into 36 other languages.

The key is a process called parallel data mining, which searches crawled web data for instances where the sound in a video or audio clip matches a subtitle in another language. The model learned to associate those sounds in one language with the matching pieces of text in another. This opened up a whole new trove of translation examples for the model.
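The article does not spell out how the matching is done. One common way to approach this kind of mining, assumed here purely for illustration, is to represent audio clips and subtitles as vectors in a shared space and keep only the pairs that are highly similar. The toy vectors, the identifiers, the 0.9 threshold, and the function names below are all hypothetical, not Meta's actual method or data.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mine_pairs(audio_vecs, text_vecs, threshold=0.9):
    """Pair each audio clip with its most similar subtitle, keeping confident matches only."""
    pairs = []
    for audio_id, a in audio_vecs.items():
        best_id, best_score = None, threshold
        for text_id, t in text_vecs.items():
            score = cosine(a, t)
            if score >= best_score:
                best_id, best_score = text_id, score
        if best_id is not None:
            pairs.append((audio_id, best_id))
    return pairs

# Hand-made 3-d vectors standing in for learned representations (assumption).
audio_vecs = {
    "clip_sw_01": [0.9, 0.1, 0.0],
    "clip_sw_02": [0.0, 0.2, 0.9],
    "clip_sw_03": [0.5, 0.5, 0.5],  # resembles neither subtitle closely
}
text_vecs = {"sub_el_a": [1.0, 0.0, 0.0], "sub_el_b": [0.0, 0.0, 1.0]}

pairs = mine_pairs(audio_vecs, text_vecs)
print(pairs)
```

In this toy run the first two clips are paired with a subtitle and the third is dropped because no subtitle clears the threshold. Discarding weak matches is what keeps mined training pairs reasonably clean.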

“Meta has done a great job having a breadth of different things they support, like text-to-speech, speech-to-text, even automatic speech recognition,” says Chetan Jaiswal, a professor of computer science at Quinnipiac University, who was not involved in the research. “The mere number of languages they are supporting is a tremendous achievement.”

Human translators are still a vital part of the translation process, the researchers say in the paper, because they can grapple with diverse cultural contexts and make sure the same meaning is conveyed from one language into another. This step is important, says Lynne Bowker of the University of Ottawa’s School of Translation & Interpretation, who didn’t work on Seamless. “Languages are a reflection of cultures, and cultures have their own ways of knowing things,” she says. 

When it comes to applications like medicine or law, machine translations need to be thoroughly checked by a human, she says. If not, misunderstandings can result. For example, when Google Translate was used to translate public health information about the covid-19 vaccine from the Virginia Department of Health in January 2021, it translated “not mandatory” in English into “not necessary” in Spanish, changing the whole meaning of the message.

AI models have many more examples to train on in some languages than in others. This means current speech-to-speech models may be able to translate a language like Greek into English, where there may be many examples, but cannot translate from Swahili to Greek. The team behind Seamless aimed to solve this problem by pre-training the model on millions of hours of spoken audio in different languages. This pre-training allowed it to recognize general patterns in language, making it easier to process less widely spoken languages because it already had some baseline for what spoken language is supposed to sound like.

The system is open-source, which the researchers hope will encourage others to build upon its current capabilities. But some are skeptical of how useful it may be compared with available alternatives. “Google’s translation model is not as open-source as Seamless, but it’s way more responsive and fast, and it doesn’t cost anything as an academic,” says Jaiswal.

The most exciting thing about Meta’s system is that it points to the possibility of instant interpretation across languages in the not-too-distant future—like the Babel fish in Douglas Adams’ cult novel The Hitchhiker’s Guide to the Galaxy. SeamlessM4T is faster than existing models but still not instant. That said, Meta claims to have a newer version of Seamless that’s as fast as human interpreters. 

“While having this kind of delayed translation is okay and useful, I think simultaneous translation will be even more useful,” says Kenny Zhu, director of the Arlington Computational Linguistics Lab at the University of Texas at Arlington, who is not affiliated with the new research.
