MarkTechPost@AI, February 11
Zyphra Introduces the Beta Release of Zonos: A Highly Expressive TTS Model with High Fidelity Voice Cloning

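Zyphra has released the beta of Zonos-v0.1, a real-time TTS model with high-fidelity voice cloning. The release includes a 1.6 billion-parameter transformer model and a similarly sized hybrid model, both available under the Apache 2.0 license. The Zonos-v0.1 models were trained on approximately 200,000 hours of speech data covering both neutral and expressive speech patterns. They support multiple languages, including English, Chinese, Japanese, French, Spanish, and German, and let users generate speech from a short speaker sample plus text input, which enables voice cloning. The models also offer control over speaking rate, pitch variation, audio quality, and emotion, making them a flexible tool for content creation and assistive technology.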

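🗣️ Zonos-v0.1 offers zero-shot TTS voice cloning: given only a short speaker sample and text input, it synthesizes speech with that speaker's characteristics, which greatly reduces the data needed for speech synthesis.

🌐 The system supports multiple languages, including English, Japanese, Chinese, French, and German. By integrating multilingual datasets, Zonos-v0.1 broadens its range of applications and can serve a wider international user base.

🎶 Zonos-v0.1 lets users fine-tune parameters such as pitch, frequency range, and emotional tone, producing more expressive and natural-sounding speech tailored to different use cases.

🚀 The model runs at close to twice real-time speed on an RTX 4090 and is optimized for real-time applications, so it can respond quickly to synthesis requests and deliver a smooth user experience.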

Text-to-speech (TTS) technology has made significant strides in recent years, but challenges remain in creating natural, expressive, and high-fidelity speech synthesis. Many TTS systems struggle to replicate the nuances of human speech, such as intonation, emotion, and accent, often resulting in artificial-sounding voices. Additionally, precise voice cloning remains difficult, limiting the ability to generate personalized or diverse speech outputs. These challenges have driven continued research into more sophisticated TTS models capable of producing real-time, expressive, and realistic speech.

Zyphra has introduced the beta release of Zonos-v0.1, featuring two real-time TTS models with high-fidelity voice cloning. The release includes a 1.6 billion-parameter transformer model and a similarly sized hybrid model, both available under the Apache 2.0 license. This open-source initiative seeks to advance TTS research by making high-quality speech synthesis technology more accessible to developers and researchers.

The Zonos-v0.1 models are trained on approximately 200,000 hours of speech data, encompassing both neutral and expressive speech patterns. While the primary dataset consists of English-language content, significant portions of Chinese, Japanese, French, Spanish, and German speech have been incorporated, allowing for multilingual support. The models generate lifelike speech from text prompts using either speaker embeddings or audio prefixes. They can perform voice cloning with as little as 5 to 30 seconds of sample speech and offer controls over parameters such as speaking rate, pitch variation, audio quality, and emotions like sadness, fear, anger, happiness, and surprise. The synthesized speech is produced at a 44 kHz sample rate, ensuring high audio fidelity.
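To make the inputs described above concrete, here is a minimal Python sketch of a voice-cloning request and a sanity check on it. This is not the Zonos API: `CloneRequest`, its fields, `problems`, and `output_samples` are hypothetical names introduced only for illustration. The 5 to 30 second range is treated as a hard bound, and "44 kHz" is read as exactly 44,000 samples per second; both are assumptions about the article's wording.

```python
from dataclasses import dataclass, field

# Values taken from the article; every identifier here is illustrative only.
EMOTIONS = ("sadness", "fear", "anger", "happiness", "surprise")
LANGUAGES = ("English", "Chinese", "Japanese", "French", "Spanish", "German")
OUTPUT_SAMPLE_RATE_HZ = 44_000  # assumption: "44 kHz" read literally


@dataclass
class CloneRequest:
    """Hypothetical request object; not part of the Zonos codebase."""
    text: str
    language: str
    reference_seconds: float  # length of the speaker sample
    emotion_weights: dict = field(default_factory=dict)

    def problems(self) -> list:
        """Return a list of problems; an empty list means the request looks sane."""
        found = []
        if not self.text.strip():
            found.append("text is empty")
        if self.language not in LANGUAGES:
            found.append(f"unsupported language: {self.language}")
        # Assumption: the article's 5 to 30 second range is used as a hard bound.
        if not 5.0 <= self.reference_seconds <= 30.0:
            found.append("reference clip should be 5 to 30 seconds long")
        for name in self.emotion_weights:
            if name not in EMOTIONS:
                found.append(f"unknown emotion: {name}")
        return found


def output_samples(duration_seconds: float) -> int:
    """Number of audio samples in a clip of the given length at the output rate."""
    return round(duration_seconds * OUTPUT_SAMPLE_RATE_HZ)
```

A request such as `CloneRequest("Hello", "English", 10.0, {"happiness": 0.8})` passes the check, while a 2-second reference clip or an emotion outside the listed five is flagged. The real model's conditioning interface may differ; consult the project's repository for the actual calls.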

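Zonos-v0.1 includes several key features: zero-shot TTS with voice cloning from a short speaker sample, multilingual support, fine-grained control over speaking rate, pitch, audio quality, and emotion, and inference fast enough for real-time use.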

These features make Zonos-v0.1 a flexible tool for various TTS applications, from content creation to accessibility tools.

Early evaluations suggest that Zonos-v0.1 delivers high-quality speech generation, often comparable to or exceeding leading proprietary systems. While objective audio evaluation remains complex, comparisons with other models—including proprietary solutions such as ElevenLabs and Cartesia, as well as open-source alternatives like FishSpeech-v1.5—highlight Zonos’s ability to produce clear, natural, and expressive speech. The hybrid model, in particular, offers reduced latency and lower memory usage compared to the transformer variant, benefiting from its Mamba2-based architecture, which minimizes reliance on attention mechanisms.
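Claims about real-time operation and latency can be checked by comparing the duration of the generated audio with the wall-clock time spent generating it. The Python sketch below shows one way to do that. `realtime_speed` and `measure_speed` are hypothetical helper names, and `synthesize` stands in for any function that turns text into a sequence of audio samples; none of these come from the Zonos codebase.

```python
import time


def realtime_speed(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of compute (2.0 = twice real time)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_seconds / wall_seconds


def measure_speed(synthesize, text: str, sample_rate_hz: int) -> float:
    """Time one call to `synthesize` and return its speed relative to real time.

    `synthesize` is an assumed callable: text in, sequence of samples out.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return realtime_speed(len(samples) / sample_rate_hz, elapsed)
```

With this definition, a value near 2.0 corresponds to the roughly twice-real-time figure reported for an RTX 4090, and any value above 1.0 means audio is produced faster than it plays back. Running the same measurement on both the transformer and hybrid variants would give a direct view of the latency difference the article describes.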

The beta release of Zonos-v0.1 represents an important step forward in open-source TTS development. By providing a high-fidelity, expressive, and real-time speech synthesis tool under an accessible license, Zyphra offers developers and researchers a powerful resource for advancing TTS applications. Its combination of voice cloning, multilingual support, and fine-grained audio control makes it a versatile addition to the field, with potential applications in assistive technologies, content creation, and beyond.


Check out the Technical details, GitHub Page, Zyphra/Zonos-v0.1-transformer and Zyphra/Zonos-v0.1-hybrid. All credit for this research goes to the researchers of this project.


The post Zyphra Introduces the Beta Release of Zonos: A Highly Expressive TTS Model with High Fidelity Voice Cloning appeared first on MarkTechPost.
