TechCrunch News, March 14
Sesame, the startup behind the viral virtual assistant Maya, releases its base AI model

 

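AI company Sesame has released CSM-1B, the base model behind its voice assistant Maya. The model has 1 billion parameters and is released under an Apache 2.0 license, which permits commercial use. CSM-1B is built on a model from Meta's Llama family paired with an audio decoder, and it generates RVQ audio codes from text and audio inputs. Although the model has not been fine-tuned on any specific voice, it can produce a variety of voices. Sesame is urging developers and users not to use the model to mimic a person's voice without consent or for malicious purposes such as creating misinformation. The technology clones voices very quickly, which has prompted discussion about the safety of AI voice tools. Sesame was co-founded by Oculus co-founder Brendan Iribe and has raised funding from several well-known investors.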

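🤖 Sesame has released an open-source voice model called CSM-1B. It has 1 billion parameters and is under an Apache 2.0 license that permits commercial use, opening up new possibilities for AI voice technology.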

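🗣️ CSM-1B uses RVQ (residual vector quantization) to generate audio codes from text and audio inputs for speech synthesis. Its underlying architecture is based on Meta's Llama model, combined with an audio decoder component.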

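⚠️ Sesame urges developers and users to follow ethical guidelines when using CSM-1B and to avoid mimicking a person's voice without authorization or creating misinformation, but in practice the model lacks effective safeguards.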

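👓 Beyond voice assistant technology, Sesame is also developing AI glasses equipped with custom AI models, intended to provide all-day AI assistance, which reflects its broader ambitions in AI.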

AI company Sesame has released the base model that powers Maya, the impressively realistic voice assistant.

The model, which is 1 billion parameters in size (“parameters” referring to individual components of the model), is under an Apache 2.0 license, meaning it can be used commercially with few restrictions. Called CSM-1B, the model generates “RVQ audio codes” from text and audio inputs, according to Sesame’s description on the AI dev platform Hugging Face.

RVQ refers to “residual vector quantization,” a technique for encoding audio into discrete tokens called codes. RVQ is used in a number of recent AI audio technologies, including Google’s SoundStream and Meta’s Encodec.
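
To make the idea concrete, here is a minimal sketch of residual vector quantization in plain Python. It is an illustration only, not Sesame's code and not how SoundStream or Encodec are implemented. The codebooks `COARSE` and `FINE`, the function names, and the two-dimensional "audio frame" are all hypothetical toy values; real codecs learn their codebooks from data and work on much larger vectors.

```python
def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    def sq_dist(codeword):
        return sum((a - b) ** 2 for a, b in zip(codeword, vec))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Quantize vec in stages; each stage encodes what the previous stages missed."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword so the next stage sees only the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Rebuild an approximation of the input by summing the chosen codewords."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical hand-picked codebooks: one coarse stage, one fine stage.
COARSE = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
FINE = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]]

frame = [1.2, 0.3]  # stand-in for one frame of audio features
codes = rvq_encode(frame, [COARSE, FINE])
print(codes)                              # [1, 3]
print(rvq_decode(codes, [COARSE, FINE]))  # [1.25, 0.25]
```

The two integers in `codes` are the discrete tokens: the first picks a rough match, the second refines it. A model like the one described here would predict such codes, and a decoder would turn them back into audio.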

CSM-1B uses a model from Meta’s Llama family as its backbone paired with an audio “decoder” component. A fine-tuned variant of CSM powers Maya, Sesame says.

“The model open-sourced here is a base generation model,” Sesame writes in CSM-1B’s Hugging Face and GitHub repositories. “It is capable of producing a variety of voices, but it has not been fine-tuned on any specific voice […] The model has some capacity for non-English languages due to data contamination in the training data, but it likely won’t do well.”

It’s unclear what data Sesame used to train CSM-1B. The company didn’t say.

The model has no real safeguards to speak of, it’s worth noting. It’s an “honor system” situation. Sesame is merely urging developers and users not to use the model to mimic a person’s voice without their consent, create misleading content like fake news, or engage in “harmful” or “malicious” activities.

I tried the demo on Hugging Face, and cloning my voice took less than a minute. From there, it was easy to generate speech to my heart’s desire, including on controversial topics like the election and Russian propaganda.

Sesame, co-founded by Oculus co-creator Brendan Iribe, went viral in late February for its assistant tech, which comes close to clearing uncanny valley territory. Maya and Sesame’s other assistant, Miles, take breaths and speak with disfluencies, and can be interrupted while speaking, much like OpenAI’s Voice Mode.

Sesame has raised an undisclosed amount of capital from Andreessen Horowitz, Spark Capital, and Matrix Partners. In addition to building voice assistant tech, the company says it’s prototyping AI glasses “designed to be worn all day” that’ll be equipped with its custom models.
