RepeaTTS: Towards Feature Discovery through Repeated Fine-Tuning

cs.AI updates on arXiv.org, July 14, 12:08

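This paper proposes a fine-tuning regime for prompt-based speech synthesis models that exploits the model's uncontrollable variance, addressing both limited control and excessive flexibility at once. Principal component analysis identifies the key features, which are then used to improve the model's controllability.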

arXiv:2507.08012v1 Announce Type: cross Abstract: A prompt-based text-to-speech (TTS) model allows a user to control different aspects of speech, such as speaking rate and perceived gender, through natural language instructions. Although user-friendly, such approaches are on the one hand constrained, because control is limited to the acoustic features exposed to the model during training, and on the other hand too flexible, because the same inputs yield uncontrollable variation that is reflected in the corpus statistics. We investigate a novel fine-tuning regime that addresses both issues at the same time by exploiting the uncontrollable variance of the model. Through principal component analysis of thousands of synthesised samples, we determine the latent features that account for the highest proportion of the output variance and incorporate them as new labels for secondary fine-tuning. We evaluate the proposed method on two models trained on an expressive Icelandic speech corpus, one with emotional disclosure and one without. For the model without emotional disclosure, the method yields both continuous and discrete features that improve the overall controllability of the model.
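
The feature-discovery step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which representation of the synthesised samples PCA is applied to or how components become labels, so the random feature matrix, the helper names `top_components` and `scores_to_labels`, and the tercile binning are all assumptions made for illustration.

```python
import numpy as np


def top_components(features, n_components=2):
    """Project samples onto their leading principal components.

    PCA is computed via SVD of the mean-centred data matrix.
    Returns the per-sample scores and the explained-variance ratios.
    """
    centred = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = centred @ vt[:n_components].T
    return scores, explained[:n_components]


def scores_to_labels(scores, names=("low", "medium", "high")):
    """Bin one component's scores into terciles (a hypothetical labelling scheme)."""
    edges = np.quantile(scores, [1 / 3, 2 / 3])
    return [names[i] for i in np.digitize(scores, edges)]


# Assumption: random data stands in for feature vectors extracted from
# thousands of synthesised utterances; one hidden factor drives most variance.
rng = np.random.default_rng(0)
n_samples, n_features = 3000, 8
latent = rng.normal(size=(n_samples, 1))
loading = rng.normal(size=(1, n_features))
noise = rng.normal(scale=0.5, size=(n_samples, n_features))
features = 3.0 * latent @ loading + noise

scores, explained = top_components(features)
labels = scores_to_labels(scores[:, 0])  # candidate labels for a second fine-tuning pass
print(explained, labels[:5])
```

In this sketch the continuous scores and the binned labels correspond loosely to the continuous and discrete features the abstract mentions. How such labels would be phrased in a prompt for secondary fine-tuning is left open here.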

Related tags

speech synthesis, model optimization, fine-tuning, uncontrollable variance
