MarkTechPost@AI, November 11, 2024
MOS-Bench: A Comprehensive Collection of Datasets for Training and Evaluating Subjective Speech Quality Assessment (SSQA) Models

This article examines the challenge of cross-domain generalization in subjective speech quality assessment (SSQA) models, surveys current SSQA methods and their limitations, and introduces MOS-Bench and SHEET, which pair diverse datasets with a standardized workflow to make SSQA models more effective across domains.

🎤 SSQA models struggle to generalize across domains, and existing methods have limitations.

📚 MOS-Bench bundles diverse training and test datasets to address the generalization problem.

🛠 SHEET provides a standardized workflow that improves SSQA model performance.

🌟 Together, MOS-Bench and SHEET improve the generalization ability of SSQA models.

A critical challenge in Subjective Speech Quality Assessment (SSQA) is enabling models to generalize to diverse, unseen speech domains. Many SSQA models perform poorly outside their training domain because data characteristics and scoring conventions differ substantially across tasks such as text-to-speech (TTS), voice conversion (VC), and speech enhancement. Effective generalization is essential for automatic scores to remain aligned with human perception, yet most models stay tied to the data on which they were trained, limiting their real-world utility in applications such as automated speech evaluation for TTS and VC systems.

Current SSQA approaches include both reference-based and model-based methods. Reference-based models evaluate quality by comparing a speech sample against a reference signal. Model-based methods, typically deep neural networks (DNNs), instead learn directly from human-annotated datasets. Model-based SSQA has strong potential to capture human perception more precisely but, at the same time, shows some significant limitations (a minimal sketch of a model-based predictor follows the list below):

Generalization Constraints: SSQA models often break down when tested on new, out-of-domain data, leading to inconsistent performance.

Dataset Bias and Corpus Effect: Models can overfit to the peculiarities of a single dataset, such as scoring biases or data types, which makes them less effective across different datasets.

Computational Complexity: Ensemble models improve the robustness of SSQA but raise the computational cost relative to a single baseline model, making real-time assessment impractical in low-resource settings.

Together, these limitations hinder the development of SSQA models that generalize well across datasets and application contexts.
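To make the model-based approach concrete, here is a minimal, hypothetical sketch of a DNN MOS predictor: it encodes frame-level acoustic features and regresses a scalar score trained against human MOS labels. This is an illustrative toy model, not SHEET's actual architecture; the feature dimension, layer sizes, and the squashing of outputs into the [1, 5] MOS range are all assumptions.

```python
# Toy model-based SSQA predictor: frame features -> scalar MOS estimate.
# Illustrative only; NOT the architecture used by SHEET or MOS-Bench.
import torch
import torch.nn as nn

class ToyMOSPredictor(nn.Module):
    def __init__(self, feat_dim: int = 80, hidden: int = 128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim), e.g. log-mel spectrograms
        hidden_states, _ = self.encoder(feats)
        pooled = hidden_states.mean(dim=1)  # average over time
        # Squash to the 1-5 MOS scale (an assumption of this sketch)
        return 1.0 + 4.0 * torch.sigmoid(self.head(pooled)).squeeze(-1)

model = ToyMOSPredictor()
batch = torch.randn(4, 200, 80)  # 4 utterances, 200 frames each
loss = nn.MSELoss()(model(batch), torch.tensor([3.2, 4.1, 2.5, 3.8]))
loss.backward()  # trained against human-annotated MOS labels
```

Real systems differ in pooling strategy, loss function, and extras such as listener-dependent modeling, but the core pattern of regressing a learned encoding onto human MOS labels is the same.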

To address these limitations, the researchers introduce MOS-Bench, a benchmark collection of seven training datasets and twelve test datasets spanning varied speech types, languages, and sampling frequencies. Alongside MOS-Bench, they propose SHEET, a toolkit that provides a standardized workflow for training, validating, and testing SSQA models. Together, MOS-Bench and SHEET allow SSQA models, and in particular their generalization ability, to be evaluated systematically. MOS-Bench adopts a multi-dataset approach, combining data from different sources to expose models to a wider range of conditions. The authors also introduce a new performance metric, the best score difference/ratio, to give a holistic view of an SSQA model's performance across these datasets. This not only provides a framework for consistent evaluation but also improves generalization by confronting models with real-world variability, a notable contribution to SSQA.
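As a rough illustration of how a "best score difference/ratio" style metric could aggregate results over many test sets, the sketch below compares each model's per-test-set score (e.g., SRCC) against the best score any model achieved on that set. The exact definitions are given in the paper; this aggregation, the function name, and the toy numbers are our assumptions.

```python
# Hedged sketch of a best-score-difference / best-score-ratio aggregation.
# The precise definitions are in the MOS-Bench paper; this is one plausible
# reading, for illustration only.

def best_score_metrics(scores_by_model: dict[str, dict[str, float]]):
    """scores_by_model[model][test_set] -> a higher-is-better score."""
    test_sets = next(iter(scores_by_model.values())).keys()
    best = {t: max(m[t] for m in scores_by_model.values()) for t in test_sets}
    results = {}
    for name, scores in scores_by_model.items():
        diffs = [best[t] - scores[t] for t in test_sets]
        # Fraction of test sets where this model matches the best score
        ratio = sum(scores[t] == best[t] for t in test_sets) / len(diffs)
        results[name] = {
            "best_score_diff": sum(diffs) / len(diffs),  # 0.0 is ideal
            "best_score_ratio": ratio,                   # share of "wins"
        }
    return results

print(best_score_metrics({
    "model_a": {"bvcc": 0.90, "nisqa": 0.70},
    "model_b": {"bvcc": 0.85, "nisqa": 0.80},
}))
```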

The MOS-Bench collection consists of a wide range of datasets with diverse sampling frequencies and listener labels, chosen to capture cross-domain variability in SSQA; among its datasets are PSTN and NISQA.

Using MOS-Bench and SHEET yields substantial improvements in SSQA generalization across both synthetic and non-synthetic test sets: models achieve high ranking accuracy and faithful quality predictions even on out-of-domain data. Models trained on MOS-Bench datasets such as PSTN and NISQA are highly robust on synthetic test sets, making the synthetic-focused training data previously required for generalization unnecessary. Visualizations further confirm that models trained on MOS-Bench capture a wide variety of data distributions, reflecting better adaptability and consistency. These results establish MOS-Bench as a reliable benchmark that lets SSQA models deliver accurate performance across domains, improving the effectiveness and applicability of automated speech quality assessment.
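Results of this kind are commonly summarized with rank correlations between predicted and human MOS at both the utterance and system level. The snippet below shows one standard recipe using Spearman correlation; it is a generic illustration rather than a SHEET API, and the system names and scores are invented.

```python
# Utterance-level and system-level Spearman rank correlation (SRCC),
# a common way to report SSQA accuracy. Toy data, for illustration only.
import numpy as np
from scipy.stats import spearmanr

human = {"sysA": [3.1, 3.4, 2.9], "sysB": [4.2, 4.0, 4.4], "sysC": [2.0, 2.3, 1.8]}
pred  = {"sysA": [3.0, 3.2, 3.1], "sysB": [4.1, 4.3, 4.0], "sysC": [2.2, 2.1, 2.0]}

# Utterance level: correlate every (human, predicted) pair directly.
utt_srcc, _ = spearmanr(np.concatenate(list(human.values())),
                        np.concatenate(list(pred.values())))

# System level: average per system first, then correlate the means.
sys_srcc, _ = spearmanr([np.mean(v) for v in human.values()],
                        [np.mean(v) for v in pred.values()])
print(f"utterance SRCC={utt_srcc:.3f}, system SRCC={sys_srcc:.3f}")
```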

Through MOS-Bench and SHEET, this methodology tackles the generalization problem in SSQA with multiple datasets and a new evaluation metric. By reducing dataset-specific biases and improving cross-domain applicability, it pushes the frontier of SSQA research and enables models to generalize effectively across applications. The cross-domain datasets gathered in MOS-Bench, together with its standardized toolkit, are an important advancement: researchers now have the resources to develop SSQA models that remain robust across a variety of speech types and real-world applications.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our newsletter. Don't forget to join our 55k+ ML SubReddit.

[AI Magazine/Report] Read Our Latest Report on ‘SMALL LANGUAGE MODELS’

The post MOS-Bench: A Comprehensive Collection of Datasets for Training and Evaluating Subjective Speech Quality Assessment (SSQA) Models appeared first on MarkTechPost.

