On Barriers to Archival Audio Processing

cs.AI updates on arXiv.org, July 14, 12:08

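This study uses a UNESCO collection of mid-20th century radio recordings to assess the robustness of modern language identification (LID) and speaker recognition (SR) methods. It finds that LID systems are increasingly adept at handling second-language and accented speech, but that speaker embeddings remain susceptible to channel, age, and language biases.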

arXiv:2507.08768v1 Announce Type: cross Abstract: In this study, we leverage a unique UNESCO collection of mid-20th century radio recordings to probe the robustness of modern off-the-shelf language identification (LID) and speaker recognition (SR) methods, especially with respect to the impact of multilingual speakers and cross-age recordings. Our findings suggest that LID systems, such as Whisper, are increasingly adept at handling second-language and accented speech. However, speaker embeddings remain a fragile component of speech processing pipelines that is prone to biases related to the channel, age, and language. These issues will need to be overcome if archives aim to employ SR methods for speaker indexing.
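To make the speaker-indexing step concrete, here is a minimal sketch of one common way embeddings are compared: cosine similarity against enrolled speakers, with a decision threshold. The abstract does not say which scoring method or threshold the authors use, so the cosine scoring, the 0.7 threshold, the function names, and the three-dimensional toy vectors are all assumptions made for illustration. Real speaker embeddings have hundreds of dimensions and come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def index_speaker(query, enrolled, threshold=0.7):
    """Return (name, score) for the closest enrolled speaker.

    The name is None when the best score falls below the threshold.
    The threshold value is an arbitrary placeholder, not a figure
    taken from the paper.
    """
    best_name, best_score = None, -1.0
    for name, embedding in enrolled.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical toy embeddings, not real data.
enrolled = {
    "speaker_a": [1.0, 0.0, 0.0],
    "speaker_b": [0.0, 1.0, 0.0],
}

match, score = index_speaker([0.9, 0.1, 0.0], enrolled)
unknown, low_score = index_speaker([0.5, 0.5, 0.7], enrolled)
```

Under this kind of scoring, any channel, age, or language effect that moves an embedding away from its enrolled reference lowers the score and can push a true match below the threshold, which is one way the fragility described in the abstract could surface in an archive index.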
