MarkTechPost@AI July 19, 2024
Researchers from Google DeepMind Introduce YouTube-SL-25: A Multilingual Corpus with Over 3,000 Hours of Sign Language Videos Covering 25+ Languages

Researchers at Google DeepMind have released YouTube-SL-25, a multilingual corpus with over 3,000 hours of sign language video spanning more than 25 sign languages. The dataset is the largest and most comprehensive of its kind, featuring content from more than 3,000 unique signers. Its creation followed a rigorous two-step process: automatic classifiers first identified candidate sign language videos on YouTube, and researchers then audited and prioritized the videos to ensure quality and alignment. The dataset includes 2.16 million well-aligned captions totaling 104 million characters, setting a new standard for sign language datasets.

🤔 **Dataset scale and diversity:** YouTube-SL-25 is the largest and most comprehensive dataset of its kind, with over 3,000 hours of video spanning more than 25 sign languages and featuring over 3,000 unique signers, providing a rich resource for sign language identification and translation tasks.

🚀 **Data collection and processing:** The dataset was built through a rigorous two-step process: automatic classifiers first identified candidate sign language videos on YouTube, and researchers then audited and prioritized them to ensure quality and alignment. This approach keeps quality high while remaining scalable.

📊 **Evaluation and applications:** The researchers benchmarked the dataset with a unified multilingual multitask model based on T5, showing that multilingual transfer benefits both high-resource and low-resource sign languages. The dataset supports applications including sign language identification, translation, and caption alignment.

🤝 **Impact on Deaf and hard-of-hearing communities:** The release of YouTube-SL-25 provides a foundational resource for building more effective sign language technologies, helping improve communication and quality of life for Deaf and hard-of-hearing people.

🌐 **Future directions:** YouTube-SL-25 marks a major advance in sign language research and a strong driver for future sign language technologies. Researchers will continue developing more capable identification and translation systems to better serve Deaf and hard-of-hearing communities.

Sign language research aims to advance technology that improves the understanding, translation, and interpretation of sign languages used by Deaf and hard-of-hearing communities globally. This field involves creating extensive datasets, developing sophisticated machine-learning models, and enhancing tools for translation and identification in various applications. By bridging communication gaps, this research supports better inclusion and accessibility for individuals who rely on sign language for daily communication.

A significant challenge in this field is the scarcity of data for many sign languages. Unlike spoken languages, sign languages lack a standardized written form, complicating data collection and processing. This data bottleneck restricts the development of effective translation and interpretation tools, particularly for lesser-studied sign languages. The lack of substantial datasets hinders the progress of machine learning models tailored to these unique visuospatial languages.

Existing methods for processing sign languages include specialized datasets like YouTube-ASL for American Sign Language (ASL) and BOBSL for British Sign Language (BSL). While these datasets represent significant strides, they are often limited to individual languages and involve labor-intensive manual annotation processes. Automatic content-based annotation and skilled human filtering are common practices, yet these methods do not scale easily to accommodate the vast diversity of sign languages worldwide.

Google and Google DeepMind researchers introduced YouTube-SL-25, a comprehensive, open-domain multilingual corpus of sign language videos. This dataset is the largest and most diverse of its kind, comprising over 3,000 hours of video content and featuring over 3,000 unique signers across 25 sign languages. By providing well-aligned captions, YouTube-SL-25 significantly expands the resources for sign language translation and identification tasks.

The creation of YouTube-SL-25 involved a meticulous two-step process. First, automatic classifiers identified potential sign language videos on YouTube. Then, rather than the extensive manual review earlier datasets required, a triage process followed in which researchers audited and prioritized videos based on content quality and caption alignment. This approach enabled the efficient collection of 81,623 candidate videos, which were then refined to 39,197 high-quality videos totaling 3,207 hours of content. The dataset includes 2.16 million well-aligned captions comprising 104 million characters, setting a new standard for sign language datasets.
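
To make the collect-then-triage idea concrete, here is a minimal Python sketch of such a pipeline. It is not the released code: the `Candidate` fields, the score threshold, and the toy inputs are hypothetical stand-ins for the actual classifier outputs.

```python
# Sketch of a two-step collection pipeline: an automatic classifier
# proposes candidates, and the highest-confidence ones are audited first.
from dataclasses import dataclass

@dataclass
class Candidate:
    video_id: str       # hypothetical YouTube video identifier
    sign_prob: float    # classifier's probability that the video contains signing
    has_captions: bool  # only captioned videos can yield aligned text

def triage(candidates: list[Candidate], threshold: float = 0.5) -> list[Candidate]:
    """Keep likely sign-language videos that have captions, sorted so human
    auditors review the most promising content first."""
    kept = [c for c in candidates if c.sign_prob >= threshold and c.has_captions]
    return sorted(kept, key=lambda c: c.sign_prob, reverse=True)

# Hypothetical usage: a large candidate pool in, a smaller audited queue out.
pool = [Candidate("a1", 0.93, True),
        Candidate("b2", 0.41, True),
        Candidate("c3", 0.88, False)]
print([c.video_id for c in triage(pool)])  # -> ['a1']
```

Sorting by classifier confidence before human review is the key design choice: auditors spend their limited time on the videos most likely to be usable, which is what makes the process scale beyond fully manual annotation.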

The dataset’s utility was demonstrated through benchmarks using a unified multilingual multitask model based on T5. The researchers extended this model to support multiple source and target languages, enhancing its sign language identification and translation capabilities. The results showed substantial benefits from multilingual transfer, with notable improvements for both high-resource and low-resource sign languages. For instance, the model demonstrated significant advances on benchmarks for ASL, Swiss German Sign Language, Swiss French Sign Language, and Swiss Italian Sign Language, with BLEURT scores of 40.1 for ASL and 37.7 for Swiss German Sign Language.
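
As a rough illustration of how a video clip can be fed into a text-to-text model like T5, the sketch below projects precomputed per-frame features into the encoder through Hugging Face's `inputs_embeds` interface. This is a minimal sketch under stated assumptions, not the authors' implementation: the 512-dimensional features, the single linear projection, and the `<asl>` tag in the target text are illustrative choices only.

```python
# Minimal multitask seq2seq sketch (not the paper's code): video features
# are projected into T5's embedding space and trained against a
# language-tagged caption. Requires: torch, transformers, sentencepiece.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical input: 128 frames of 512-d video/pose features for one clip.
frames = torch.randn(1, 128, 512)

# Project features into T5's token-embedding space (d_model = 768 for t5-base).
proj = torch.nn.Linear(512, model.config.d_model)
inputs_embeds = proj(frames)

# A language tag in the target text is one way to pose identification and
# translation as a single text-to-text task (the tag itself is hypothetical).
labels = tokenizer("<asl> hello, how are you?", return_tensors="pt").input_ids

loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
loss.backward()  # one multitask training step
```

Framing identification and translation as text generation lets a single checkpoint serve all 25+ languages, which is how transfer between high-resource and low-resource sign languages can occur within one model.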

The researchers provided detailed statistics for YouTube-SL-25. The dataset consists of 3,207 hours of video content across more than 25 sign languages, more than three times the size of YouTube-ASL's 984 hours. This scale allows for a more comprehensive representation of sign languages, each included with at least 15 hours of content, so even low-resource languages are better supported. The inclusion of 3,072 unique channels highlights the dataset's diversity of signers and contexts.

YouTube-SL-25 has a significant impact, offering a foundational resource for developing sign language technologies. By enabling better pretraining for sign-to-text translation models and enhancing sign language identification tasks, the dataset addresses critical gaps in multilingual sign language data availability. Its open-domain nature allows for broad applications, from general sign language pretraining to medium-quality finetuning for specific tasks such as translation and caption alignment.

In conclusion, YouTube-SL-25 is a pivotal advancement in sign language research, addressing the longstanding data scarcity issue. With its extensive and diverse collection of sign language videos, the dataset facilitates the development of more effective translation and interpretation tools. This resource supports higher-quality machine learning models and fosters greater inclusivity for Deaf and hard-of-hearing communities worldwide, ensuring that technology continues to advance toward broader accessibility and understanding.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

