Speech LLMs in Low-Resource Scenarios: Data Volume Requirements and the Impact of Pretraining on High-Resource Languages

cs.AI updates on arXiv.org, 4 hours ago

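This article explores automatic speech recognition with Speech LLMs in low-resource settings. Using the SLAM-ASR framework, in which a lightweight projector connects a speech encoder to an LLM, it assesses how much training data is required and shows how projectors pretrained on high-resource languages can improve Speech LLM performance for low-resource languages and multilingual settings.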

arXiv:2508.05149v1 Announce Type: cross

Abstract: Large language models (LLMs) have demonstrated potential in handling spoken inputs for high-resource languages, reaching state-of-the-art performance in various tasks. However, their applicability is still less explored in low-resource settings. This work investigates the use of Speech LLMs for low-resource Automatic Speech Recognition using the SLAM-ASR framework, where a trainable lightweight projector connects a speech encoder and an LLM. Firstly, we assess training data volume requirements to match Whisper-only performance, re-emphasizing the challenges of limited data. Secondly, we show that leveraging mono- or multilingual projectors pretrained on high-resource languages reduces the impact of data scarcity, especially with small training sets. Using multilingual LLMs (EuroLLM, Salamandra) with whisper-large-v3-turbo, we evaluate performance on several public benchmarks, providing insights for future research on optimizing Speech LLMs for low-resource languages and multilinguality.
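To make the projector idea in the abstract concrete, here is a minimal, dependency-free sketch of how a lightweight projector might map speech-encoder frames into an LLM's embedding space. The abstract does not specify the projector's internals, so everything below is an assumption for illustration: the frame-stacking step, the two-layer MLP, the toy dimensions, and the names `stack_frames`, `linear`, `relu`, `init_linear`, and `project` are all hypothetical. A real system would use a tensor library and train these weights.

```python
import random


def stack_frames(frames, k):
    """Concatenate every k consecutive encoder frames into one longer vector.

    Assumed downsampling step: it shortens the sequence by a factor of k.
    Trailing frames that do not fill a complete group are dropped.
    """
    usable = len(frames) - len(frames) % k
    return [
        [x for frame in frames[i:i + k] for x in frame]
        for i in range(0, usable, k)
    ]


def linear(vectors, weight, bias):
    """Apply y = W x + b to each vector; weight has shape (out_dim, in_dim)."""
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in vectors
    ]


def relu(vectors):
    """Element-wise max(0, x)."""
    return [[max(0.0, x) for x in vec] for vec in vectors]


def init_linear(in_dim, out_dim, rng):
    """Random weights and zero biases; stands in for trained parameters."""
    scale = in_dim ** -0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def project(frames, params, k):
    """Hypothetical projector: stack frames, then Linear -> ReLU -> Linear."""
    (w1, b1), (w2, b2) = params
    hidden = relu(linear(stack_frames(frames, k), w1, b1))
    return linear(hidden, w2, b2)


# Toy sizes chosen only to keep the example fast (all assumed).
rng = random.Random(0)
ENC_DIM, HIDDEN_DIM, LLM_DIM, K = 8, 16, 12, 5
params = (
    init_linear(ENC_DIM * K, HIDDEN_DIM, rng),
    init_linear(HIDDEN_DIM, LLM_DIM, rng),
)

# 23 fake encoder frames stand in for the speech encoder's output.
frames = [[rng.gauss(0.0, 1.0) for _ in range(ENC_DIM)] for _ in range(23)]
speech_embeddings = project(frames, params, K)

# 23 frames with K = 5 give 4 complete groups, each projected to LLM_DIM values.
print(len(speech_embeddings), len(speech_embeddings[0]))  # 4 12
```

In this sketch the projected vectors would be placed alongside the LLM's text-token embeddings so the LLM can decode a transcript. Pretraining the projector on a high-resource language, as the abstract describes, would correspond to starting from trained `params` instead of random ones.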

Related tags

Speech LLM, low-resource languages, automatic speech recognition, SLAM-ASR, multilingual
