MarkTechPost@AI July 19, 2024
Researchers from Google DeepMind Introduce YouTube-SL-25: A Multilingual Corpus with Over 3,000 Hours of Sign Language Videos Covering 25+ Languages

Researchers at Google DeepMind have released YouTube-SL-25, a multilingual corpus with over 3,000 hours of sign language video spanning more than 25 sign languages. The dataset is the largest and most comprehensive of its kind, featuring content from more than 3,000 unique signers. Its creation followed a rigorous two-step process: automatic classifiers first identified candidate sign language videos on YouTube, and researchers then audited and prioritized the videos to ensure quality and alignment. The dataset includes 2.16 million well-aligned captions totaling 104 million characters, setting a new standard for sign language datasets.

🤔 **Dataset scale and diversity:** YouTube-SL-25 is the largest and most comprehensive dataset of its kind, with over 3,000 hours of video spanning more than 25 sign languages and featuring over 3,000 unique signers, providing a rich resource for sign language identification and translation tasks.

🚀 **Data collection and processing:** The dataset was built through a rigorous two-step process: automatic classifiers first identified candidate sign language videos on YouTube, and researchers then audited and prioritized them to ensure quality and alignment. This approach keeps quality high while remaining scalable.

📊 **Evaluation and applications:** The researchers benchmarked the dataset with a unified multilingual multitask model based on T5, showing that multilingual transfer benefits both high-resource and low-resource sign languages. The dataset supports applications including sign language identification, translation, and caption alignment.

🤝 **Impact on Deaf and hard-of-hearing communities:** The release of YouTube-SL-25 provides a foundational resource for building more effective sign language technologies, helping improve communication and quality of life for Deaf and hard-of-hearing people.

🌐 **Future directions:** YouTube-SL-25 marks a major advance in sign language research and a strong driver for future sign language technologies. Researchers will continue developing more capable identification and translation systems to better serve Deaf and hard-of-hearing communities.

Sign language research aims to advance technology that improves the understanding, translation, and interpretation of sign languages used by Deaf and hard-of-hearing communities globally. This field involves creating extensive datasets, developing sophisticated machine-learning models, and enhancing tools for translation and identification in various applications. By bridging communication gaps, this research supports better inclusion and accessibility for individuals who rely on sign language for daily communication.

A significant challenge in this field is the scarcity of data for many sign languages. Unlike spoken languages, sign languages lack a standardized written form, complicating data collection and processing. This data bottleneck restricts the development of effective translation and interpretation tools, particularly for lesser-studied sign languages. The lack of substantial datasets hinders the progress of machine learning models tailored to these unique visuospatial languages.

Existing methods for processing sign languages include specialized datasets like YouTube-ASL for American Sign Language (ASL) and BOBSL for British Sign Language (BSL). While these datasets represent significant strides, they are often limited to individual languages and involve labor-intensive manual annotation processes. Automatic content-based annotation and skilled human filtering are common practices, yet these methods do not scale easily to accommodate the vast diversity of sign languages worldwide.

Google and Google DeepMind researchers introduced YouTube-SL-25, a comprehensive, open-domain multilingual corpus of sign language videos. This dataset is the largest and most diverse of its kind, comprising over 3,000 hours of video content and featuring over 3,000 unique signers across 25 sign languages. By providing well-aligned captions, YouTube-SL-25 significantly expands the resources for sign language translation and identification tasks.

The creation of YouTube-SL-25 involved a meticulous two-step process. First, automatic classifiers identified potential sign language videos on YouTube. Then, rather than the extensive manual review earlier datasets required, a triage process followed in which researchers audited and prioritized videos based on content quality and caption alignment. This approach enabled the efficient collection of 81,623 candidate videos, which were then refined to 39,197 high-quality videos totaling 3,207 hours of content. The dataset includes 2.16 million well-aligned captions comprising 104 million characters, setting a new standard for sign language datasets.
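
To make the collect-then-triage idea concrete, here is a minimal Python sketch of such a pipeline. It is not the released code: the `Candidate` fields, the score threshold, and the toy inputs are hypothetical stand-ins for the actual classifier outputs.

```python
# Sketch of a two-step collection pipeline: an automatic classifier
# proposes candidates, and the highest-confidence ones are audited first.
from dataclasses import dataclass

@dataclass
class Candidate:
    video_id: str       # hypothetical YouTube video identifier
    sign_prob: float    # classifier's probability that the video contains signing
    has_captions: bool  # only captioned videos can yield aligned text

def triage(candidates: list[Candidate], threshold: float = 0.5) -> list[Candidate]:
    """Keep likely sign-language videos that have captions, sorted so human
    auditors review the most promising content first."""
    kept = [c for c in candidates if c.sign_prob >= threshold and c.has_captions]
    return sorted(kept, key=lambda c: c.sign_prob, reverse=True)

# Hypothetical usage: a large candidate pool in, a smaller audited queue out.
pool = [Candidate("a1", 0.93, True),
        Candidate("b2", 0.41, True),
        Candidate("c3", 0.88, False)]
print([c.video_id for c in triage(pool)])  # -> ['a1']
```

Sorting by classifier confidence before human review is the key design choice: auditors spend their limited time on the videos most likely to be usable, which is what makes the process scale beyond fully manual annotation.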

The dataset’s utility was demonstrated through benchmarks using a unified multilingual multitask model based on T5. The researchers extended this model to support multiple source and target languages, enhancing its sign language identification and translation capabilities. The results showed substantial benefits from multilingual transfer, with notable improvements for both high-resource and low-resource sign languages. For instance, the model demonstrated significant advances on benchmarks for ASL, Swiss German Sign Language, Swiss French Sign Language, and Swiss Italian Sign Language, with BLEURT scores of 40.1 for ASL and 37.7 for Swiss German Sign Language.
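
As a rough illustration of how a video clip can be fed into a text-to-text model like T5, the sketch below projects precomputed per-frame features into the encoder through Hugging Face's `inputs_embeds` interface. This is a minimal sketch under stated assumptions, not the authors' implementation: the 512-dimensional features, the single linear projection, and the `<asl>` tag in the target text are illustrative choices only.

```python
# Minimal multitask seq2seq sketch (not the paper's code): video features
# are projected into T5's embedding space and trained against a
# language-tagged caption. Requires: torch, transformers, sentencepiece.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical input: 128 frames of 512-d video/pose features for one clip.
frames = torch.randn(1, 128, 512)

# Project features into T5's token-embedding space (d_model = 768 for t5-base).
proj = torch.nn.Linear(512, model.config.d_model)
inputs_embeds = proj(frames)

# A language tag in the target text is one way to pose identification and
# translation as a single text-to-text task (the tag itself is hypothetical).
labels = tokenizer("<asl> hello, how are you?", return_tensors="pt").input_ids

loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
loss.backward()  # one multitask training step
```

Framing identification and translation as text generation lets a single checkpoint serve all 25+ languages, which is how transfer between high-resource and low-resource sign languages can occur within one model.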

The researchers provided detailed statistics for YouTube-SL-25. The dataset consists of 3,207 hours of video content across more than 25 sign languages, more than three times the size of YouTube-ASL's 984 hours. This scale allows for a more comprehensive representation of sign languages, each included with at least 15 hours of content, so even low-resource languages are better supported. The inclusion of 3,072 unique channels highlights the dataset's diversity of signers and contexts.

YouTube-SL-25 has a significant impact, offering a foundational resource for developing sign language technologies. By enabling better pretraining for sign-to-text translation models and enhancing sign language identification tasks, the dataset addresses critical gaps in multilingual sign language data availability. Its open-domain nature allows for broad applications, from general sign language pretraining to medium-quality finetuning for specific tasks such as translation and caption alignment.

In conclusion, YouTube-SL-25 is a pivotal advancement in sign language research, addressing the longstanding data scarcity issue. With its extensive and diverse collection of sign language videos, the dataset facilitates the development of more effective translation and interpretation tools. This resource supports higher-quality machine learning models and fosters greater inclusivity for Deaf and hard-of-hearing communities worldwide, ensuring that technology continues to advance toward broader accessibility and understanding.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

