Beyond Modality Limitations: A Unified MLLM Approach to Automated Speaking Assessment with Effective Curriculum Learning

cs.AI updates on arXiv.org, 13 hours ago

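This paper presents the first systematic study of multimodal large language models for comprehensive speaking assessment and proposes a new training method that significantly improves assessment performance.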

arXiv:2508.12591v1 Announce Type: cross

Abstract: Traditional Automated Speaking Assessment (ASA) systems exhibit inherent modality limitations: text-based approaches lack acoustic information, while audio-based methods miss semantic context. Multimodal Large Language Models (MLLMs) offer unprecedented opportunities for comprehensive ASA by processing audio and text simultaneously within a unified framework. This paper presents the first systematic study of MLLMs for comprehensive ASA, demonstrating their superior performance on the content and language use aspects. Assessment of the delivery aspect, however, reveals unique challenges that appear to require specialized training strategies. We therefore propose Speech-First Multimodal Training (SFMT), which applies a curriculum learning principle to establish a more robust modeling foundation for speech before cross-modal synergetic fusion. Experiments on a benchmark dataset show that MLLM-based systems raise holistic assessment performance from a Pearson correlation coefficient (PCC) of 0.783 to 0.846. In particular, SFMT excels at evaluating the delivery aspect, achieving an absolute accuracy improvement of 4% over conventional training approaches and opening a new avenue for ASA.
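The holistic results above are reported as a PCC, which measures how closely predicted scores track reference scores on a linear scale. As a rough illustration only, the metric can be computed as in the sketch below. The function name `pearson_correlation` and the sample scores are assumptions made for this example; they are not taken from the paper, and the paper's own evaluation code may differ.

```python
import math


def pearson_correlation(predicted, reference):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    # Sum of products of deviations from each mean (unnormalized covariance).
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_r)


# Hypothetical scores for illustration only, not data from the paper.
human_scores = [2.0, 3.0, 4.0, 5.0]
model_scores = [2.5, 3.0, 4.5, 5.0]
pcc = pearson_correlation(model_scores, human_scores)
print(f"PCC = {pcc:.3f}")
```

A value near 1 means the system ranks and spaces speakers much as the human raters do. PCC is insensitive to a constant offset or rescaling of the predicted scores, so it is usually read alongside an accuracy-style measure such as the one quoted for the delivery aspect.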


Related tags

MLLM, speaking assessment, multimodal
