TechCrunch News, March 6
A year later, OpenAI still hasn’t released its voice cloning tool

Last March, OpenAI announced a “small-scale preview” of Voice Engine, a tool that can clone a person’s voice from just 15 seconds of speech. A year later, the tool remains in preview, and OpenAI has not said when, or whether, it will launch. The delay may stem from concerns about misuse, or from an effort to avoid regulatory scrutiny. OpenAI has previously been accused of prioritizing “shiny products” over safety and of rushing releases to beat competitors to market. For now, OpenAI is testing Voice Engine with a limited set of “trusted partners” and using their feedback to improve the model’s usefulness and safety. Although the technology shows promise in areas such as speech therapy, language learning, and customer support, its path to release remains uncertain.

🔑 OpenAI’s Voice Engine can clone a person’s voice from just a 15-second speech sample with a high degree of fidelity. It was originally slated to reach developers through the API in March 2024, but the launch was postponed.

🛡️ OpenAI delayed Voice Engine mainly over fears of misuse, such as the creation of disinformation during the 2024 U.S. election cycle. It has therefore adopted several safety measures, including watermarking to trace the provenance of generated audio, and it requires developers to obtain explicit consent from speakers and to clearly disclose that voices are AI-generated.

🤝 OpenAI is currently testing Voice Engine with only a handful of partners, including Livox, which builds devices that help people with disabilities communicate more naturally. Although Livox could not integrate Voice Engine into its product because the tool requires an internet connection, the company was impressed by its voice quality and multilingual capability.

💰 OpenAI had planned to charge for Voice Engine at $15 per million characters for standard quality and $30 per million characters for HD quality, but it has not started charging and has not given a firm release timeline.

Late last March, OpenAI announced a “small-scale preview” of an AI service, Voice Engine, that the company claimed could clone a person’s voice with just 15 seconds of speech. Roughly a year later, the tool remains in preview, and OpenAI has given no indication as to when it might launch — or whether it’ll launch at all.

The company’s reluctance to roll out the service widely may point to fears of misuse, but it could also reflect an effort to avoid inviting regulatory scrutiny. OpenAI has historically been accused of prioritizing “shiny products” at the expense of safety, and of rushing releases to beat rival firms to market.

In a statement, an OpenAI spokesperson told TechCrunch that the company is continuing to test Voice Engine with a limited set of “trusted partners.”

“[We’re] learning from how [our partners are] using the technology so we can improve the model’s usefulness and safety,” the spokesperson said. “We’ve been excited to see the different ways it’s being used, from speech therapy, to language learning, to customer support, to video game characters, to AI avatars.”

Voice Engine, which powers the voices available in OpenAI’s text-to-speech API as well as ChatGPT’s Voice Mode, generates natural-sounding speech that closely resembles the original speaker. The tool converts written characters to speech, limited only by certain guardrails on content. But it was subject to delays and shifting release windows from the start.
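
For context, the preset voices that Voice Engine powers are already reachable through OpenAI’s publicly documented text-to-speech endpoint. The sketch below uses that public API via the official Python SDK; it only demonstrates the built-in preset voices, since the 15-second cloning capability itself remains unreleased, and the model and voice names are simply the documented defaults.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Generate speech with one of the preset voices that, per OpenAI, are powered
# by Voice Engine. Custom voice cloning is not exposed through this endpoint.
response = client.audio.speech.create(
    model="tts-1",    # standard-quality text-to-speech model
    voice="alloy",    # one of the built-in preset voices
    input="Voice Engine remains in a limited preview a year after its announcement.",
)

# The response body is the audio itself; write the bytes out as an MP3 file.
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```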

As OpenAI explained in a June 2024 blog post, the Voice Engine model learns to predict the most probable sounds a speaker will make for a given text transcript, taking into account different voices, accents, and speaking styles. After this, the model can generate not just spoken versions of text, but also “spoken utterances” that reflect how different types of speakers would read text aloud.

OpenAI had initially intended to bring Voice Engine, originally called Custom Voices, to its API on March 7, 2024, according to a draft blog post seen by TechCrunch. The plan was to give a group of up to 100 “trusted developers” access ahead of a wider debut, with priority given to devs building apps that provided a “social benefit” or showed “innovative and responsible” uses of the technology. OpenAI had even trademarked and priced it: $15 per million characters for “standard” voices and $30 per million characters for “HD quality” voices.
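
As a back-of-the-envelope illustration of that draft pricing (a hypothetical calculation using only the per-character rates reported above, which never actually took effect):

```python
# Illustrative cost math for the draft pricing reported by TechCrunch:
# $15 per 1M characters (standard) and $30 per 1M characters ("HD quality").
def voice_engine_cost(characters: int, hd: bool = False) -> float:
    rate_per_million_chars = 30.0 if hd else 15.0
    return characters / 1_000_000 * rate_per_million_chars

# Example: a 50,000-character script, roughly an hour of narration.
print(f"standard: ${voice_engine_cost(50_000):.2f}")           # standard: $0.75
print(f"HD:       ${voice_engine_cost(50_000, hd=True):.2f}")  # HD:       $1.50
```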

Then, at the eleventh hour, the company postponed the announcement. OpenAI ended up unveiling Voice Engine a few weeks later without a sign-up option. Access to the tool would remain limited to a cohort of around 10 devs the company began working with in late 2023, OpenAI said.

“We hope to start a dialogue on the responsible deployment of synthetic voices and how society can adapt to these new capabilities,” OpenAI wrote in Voice Engine’s announcement blog post in late March 2024. “Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”

Voice Engine has been in the works since 2022, according to OpenAI. The company claims it demoed the tool to “global policymakers at the highest levels” in summer 2023 to showcase its potential — and risks.

Several partners have access to Voice Engine today, including startup Livox, which is building devices that enable people with disabilities to communicate more naturally. CEO Carlos Pereira told TechCrunch that while Livox ultimately couldn’t build Voice Engine into a product due to the tool’s online requirement (many of Livox’s customers don’t have internet access), he found the technology to be “really impressive.”

“The quality of the voice and the possibility of having the voices speaking in different languages is unique — especially for people with disabilities, our customers,” Pereira told TechCrunch via email. “It is really the most impressive and easy-to-use [tool to] create voices that I’ve seen […] We hope that OpenAI develops an offline version soon.”

Pereira says he hasn’t received guidance from OpenAI on a possible Voice Engine launch, nor has he seen any signs the company plans to begin charging for the service. So far, Livox hasn’t had to pay for its usage.

In that aforementioned June 2024 post, OpenAI hinted that one of its considerations in delaying Voice Engine was the potential for abuse during last year’s U.S. election cycle. Informed by discussions with stakeholders, Voice Engine has several mitigatory safety measures, including watermarking to trace the provenance of generated audio.

Developers must obtain “explicit consent” from the original speaker before using Voice Engine, according to OpenAI, and they must make “clear disclosures” to their audience that voices are AI-generated. The company hasn’t said how it’s enforcing these policies, however. Doing so at scale could prove to be immensely challenging, even for a company with OpenAI’s resources.

In its blog posts, OpenAI also implied that it hoped to build a “voice authentication experience” to verify speakers and a “no-go” list that prevents the creation of voices that sound too similar to prominent figures. Both are technologically ambitious projects, and getting them wrong would reflect poorly on a company that’s often been accused of sidelining safety initiatives.

Effective filtering and ID verification are fast becoming baseline requirements for responsible voice cloning tech releases. AI voice cloning was the third fastest-growing scam of 2024, according to one source. It’s led to fraud and bank security checks being bypassed as privacy and copyright laws struggle to keep up. Malicious actors have used voice cloning to create incendiary deepfakes of celebrities and politicians, and those deepfakes have spread like wildfire across social media.

OpenAI could release Voice Engine next week — or never. The company has repeatedly said that it’s weighing keeping the service small in scope. But one thing’s clear: for optics reasons, safety reasons, or both, Voice Engine’s limited preview has become one of the longest in OpenAI’s history.

