A Framework for Synthetic Audio Conversations Generation using Large Language Models

cs.AI updates on arXiv.org, July 8, 13:53

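This article introduces ConversaSynth, a framework that uses large language models to generate synthetic conversation audio. The framework creates diverse, coherent text dialogues across multiple topics and then converts them into audio with speech synthesis, which can improve the training and evaluation of models for audio tagging, audio classification, and multi-speaker speech recognition.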

arXiv:2409.00946v3 Announce Type: replace-cross Abstract: In this paper, we introduce ConversaSynth, a framework designed to generate synthetic conversation audio using large language models (LLMs) with multiple persona settings. The framework first creates diverse and coherent text-based dialogues across various topics, which are then converted into audio using text-to-speech (TTS) systems. Our experiments demonstrate that ConversaSynth effectively generates high-quality synthetic audio datasets, which can significantly enhance the training and evaluation of models for audio tagging, audio classification, and multi-speaker speech recognition. The results indicate that the synthetic datasets generated by ConversaSynth exhibit substantial diversity and realism, making them suitable for developing robust, adaptable audio-based AI systems.
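
The two-stage pipeline described in the abstract (persona-conditioned dialogue text from an LLM, then per-speaker speech synthesis) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `Persona` fields, the prompt wording, the `Name: utterance` line format, and the sample rate are all assumptions, the `llm` argument is any caller-supplied function from prompt to text, and `placeholder_tts` emits a plain tone where a real TTS system would produce speech.

```python
import math
import struct
import wave
from dataclasses import dataclass
from typing import Callable, List, Tuple

SAMPLE_RATE = 16000                  # assumed value, not taken from the paper
SAMPLES_PER_WORD = SAMPLE_RATE // 20 # placeholder: 50 ms of audio per word


@dataclass
class Persona:
    """Hypothetical persona record; the paper's actual fields may differ."""
    name: str
    style: str
    pitch_hz: float  # stands in for a per-speaker TTS voice


def build_prompt(personas: List[Persona], topic: str, turns: int) -> str:
    """Compose an LLM prompt asking for a multi-speaker dialogue."""
    cast = "\n".join(f"- {p.name}: {p.style}" for p in personas)
    return (
        f"Write a {turns}-turn conversation about {topic}.\n"
        f"Speakers:\n{cast}\n"
        "Format each line as 'Name: utterance'."
    )


def parse_dialogue(text: str, personas: List[Persona]) -> List[Tuple[str, str]]:
    """Keep only well-formed 'Name: utterance' lines from known speakers."""
    known = {p.name for p in personas}
    turns = []
    for line in text.splitlines():
        name, sep, utterance = line.partition(":")
        if sep and name.strip() in known and utterance.strip():
            turns.append((name.strip(), utterance.strip()))
    return turns


def placeholder_tts(utterance: str, persona: Persona) -> List[int]:
    """Stand-in for a real TTS call: a tone whose length tracks word count."""
    n = SAMPLES_PER_WORD * len(utterance.split())
    step = 2 * math.pi * persona.pitch_hz / SAMPLE_RATE
    return [int(8000 * math.sin(step * i)) for i in range(n)]


def synthesize_conversation(
    personas: List[Persona],
    topic: str,
    llm: Callable[[str], str],
    out_path: str,
    turns: int = 4,
) -> List[Tuple[str, float, float]]:
    """Generate dialogue text, render each turn, and write one mono WAV.

    Returns (speaker, start_sec, end_sec) labels for each rendered turn.
    """
    dialogue = parse_dialogue(llm(build_prompt(personas, topic, turns)), personas)
    by_name = {p.name: p for p in personas}
    samples: List[int] = []
    labels = []
    for name, utterance in dialogue:
        start = len(samples) / SAMPLE_RATE
        samples.extend(placeholder_tts(utterance, by_name[name]))
        labels.append((name, start, len(samples) / SAMPLE_RATE))
    with wave.open(out_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit PCM
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return labels
```

Returning speaker labels with start and end times alongside the audio is a design choice made here because the abstract mentions multi-speaker speech recognition as a downstream use; whether the framework itself emits such labels is not stated in the abstract.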
