cs.AI updates on arXiv.org, July 18, 12:13
Voxtral

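This paper introduces Voxtral Mini and Voxtral Small, two multimodal audio chat models that achieve state-of-the-art performance on audio benchmarks while retaining strong text capabilities. Voxtral Small outperforms several closed-source models and is small enough to run locally. The models handle audio files up to 40 minutes long as well as long multi-turn conversations, and the authors contribute three benchmarks for evaluating speech understanding models.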

arXiv:2507.13264v1 Announce Type: cross Abstract: We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks, while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models, while being small enough to run locally. A 32K context window enables the model to handle audio files up to 40 minutes in duration and long multi-turn conversations. We also contribute three benchmarks for evaluating speech understanding models on knowledge and trivia. Both Voxtral models are released under Apache 2.0 license.

Related tags

Multimodal models, audio chat, Voxtral, performance improvement, local execution