cs.AI updates on arXiv.org (two days ago, 12:07)
Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla

The study compares OpenAI's Whisper and Facebook's Wav2Vec-BERT on Bangla. Wav2Vec-BERT outperforms Whisper on multiple evaluation metrics, offering guidance for building speech recognition systems for low-resource languages.

arXiv:2507.01931v1 Announce Type: cross Abstract: In recent years, neural models trained on large multilingual text and speech datasets have shown great potential for supporting low-resource languages. This study investigates the performance of two state-of-the-art Automatic Speech Recognition (ASR) models, OpenAI's Whisper (Small & Large-V2) and Facebook's Wav2Vec-BERT, on Bangla, a low-resource language. We conducted experiments using two publicly available datasets, Mozilla Common Voice-17 and OpenSLR, to evaluate model performance. Through systematic fine-tuning and hyperparameter optimization, including learning rate, epochs, and model checkpoint selection, we compared the models on Word Error Rate (WER), Character Error Rate (CER), training time, and computational efficiency. Wav2Vec-BERT outperformed Whisper across all key evaluation metrics while requiring fewer computational resources, offering valuable insights for developing robust speech recognition systems in low-resource linguistic settings.
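
Both headline metrics reduce to the same computation: edit (Levenshtein) distance between the hypothesis and the reference transcript, normalized by reference length, over word tokens for WER and over characters for CER. A minimal, self-contained Python sketch of that computation follows; the Bangla transcripts below are illustrative placeholders, not examples from the paper.

def edit_distance(ref, hyp):
    # Levenshtein distance between two token sequences, computed with a
    # single-row dynamic-programming table.
    dp = list(range(len(hyp) + 1))        # row for the empty ref prefix
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i            # prev holds the diagonal cell
        for j, h in enumerate(hyp, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion from ref
                        dp[j - 1] + 1,    # insertion into ref
                        prev + (r != h))  # substitution, free on a match
            prev = cur
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    # Word Error Rate: edit distance over whitespace-split word tokens.
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    # Character Error Rate: the same distance over individual characters.
    return edit_distance(reference, hypothesis) / len(reference)

ref = "আমি ভাত খাই"         # illustrative reference transcript
hyp = "আমি ভাত খাইনি"       # hypothetical ASR output with one word error
print(f"WER: {wer(ref, hyp):.3f}  CER: {cer(ref, hyp):.3f}")

In practice these scores are averaged over an entire test split; ready-made implementations such as the jiwer package compute the same quantities.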

Related tags

ASR models, low-resource languages, Whisper, Wav2Vec-BERT, performance comparison