TechCrunch News, April 8, 21:02
Amazon unveils a new AI voice model, Nova Sonic

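Amazon has introduced Nova Sonic, a new generative AI model that natively processes voice and generates natural-sounding speech. Amazon claims that on benchmarks measuring speed, speech recognition, and conversational quality, Nova Sonic performs on par with leading voice models from OpenAI and Google. Nova Sonic is available through Amazon Bedrock for enterprise AI application development, and components of it already power Alexa+. Amazon emphasizes that Nova Sonic is cost-efficient and performs well at routing user requests, fetching real-time information, and multilingual speech recognition. Nova Sonic is part of Amazon's strategy to build artificial general intelligence (AGI), and the company plans to release more multimodal AI models.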

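🗣️ Nova Sonic is a generative AI voice model from Amazon that natively processes voice and generates natural-sounding speech. Amazon claims that on benchmarks measuring speed, speech recognition, and conversational quality, Nova Sonic performs on par with leading voice models from OpenAI and Google.

💡 Nova Sonic is available through Amazon Bedrock, a developer platform for building enterprise AI applications, via a new bi-directional streaming API. Amazon calls Nova Sonic "the most cost-efficient" AI voice model on the market, at around 80% less than OpenAI's GPT-4o.

🌐 Nova Sonic is good at understanding user intent and recognizes speech accurately even in noisy settings. On a multilingual speech recognition benchmark, Nova Sonic achieved an average word error rate (WER) of just 4.2% across English, French, Italian, German, and Spanish. On a benchmark of multi-participant interactions, Nova Sonic was 46.7% more accurate in terms of WER than OpenAI's GPT-4o-transcribe model.

🚀 Amazon says Nova Sonic has industry-leading speed, with an average perceived latency of 1.09 seconds, faster than OpenAI's Realtime API. Amazon plans to release more AI models that understand different modalities, including image, video, and voice, as well as other data relevant to the physical world.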

On Tuesday, Amazon debuted a new generative AI model, Nova Sonic, capable of natively processing voice and generating natural-sounding speech. Amazon claims that Sonic’s performance is competitive with frontier voice models from OpenAI and Google on benchmarks measuring speed, speech recognition, and conversational quality.

Nova Sonic is Amazon’s answer to newer AI voice models such as the model powering ChatGPT’s Voice Mode, which feel more natural to speak with than the more rigid models from Amazon Alexa’s early days. Recent technological breakthroughs have made legacy models and the digital assistants they underpin, such as Alexa and Apple’s Siri, seem incredibly stilted by comparison.

Nova Sonic is available through Bedrock, Amazon’s developer platform for building enterprise AI applications, via a new bi-directional streaming API. In a press release, Amazon called Nova Sonic “the most cost-efficient” AI voice model on the market, and around 80% less expensive than OpenAI’s GPT-4o.

Components of Nova Sonic are already powering Alexa+, Amazon’s upgraded digital voice assistant, according to Amazon SVP and Head Scientist of AGI Rohit Prasad.

In an interview, Prasad told TechCrunch that Nova Sonic builds on Amazon’s expertise in “large orchestration systems,” the technical scaffolding that makes up Alexa. Compared to rival AI voice models, Nova Sonic excels at routing user requests to different APIs, said Prasad. This capability helps Nova Sonic “know” when it needs to fetch real-time information from the internet, parse a proprietary data source, or take action in an external application — and use the appropriate tool to do it.

During a two-way dialogue, Nova Sonic waits to speak “at the appropriate time,” taking into account a speaker’s pauses and interruptions, says Amazon. It also generates a text transcript for the user’s speech, which developers can use for various applications.

Nova Sonic is less prone to speech recognition errors than other AI voice models, according to Prasad, meaning the model is relatively good at understanding a user’s intent even if they mumble, misspeak, or are in a noisy setting. On a benchmark measuring speech recognition across languages and dialects, Multilingual LibriSpeech, Amazon says Nova Sonic achieved a word error rate (WER) of just 4.2% when averaged across English, French, Italian, German, and Spanish. That means that roughly four out of every 100 words from the model differed from a human transcription in those languages.
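For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, insertions, and deletions) between a model transcript and a human reference, divided by the number of reference words. This is an illustration only, not Amazon's evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions, and real benchmarks typically normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word in a 25-word reference gives a WER of 4%.
reference = " ".join(f"word{i}" for i in range(25))
hypothesis = reference.replace("word7", "wrong")
print(f"{word_error_rate(reference, hypothesis):.1%}")
```

Because insertions count as errors, a WER can exceed 100% when the hypothesis is much longer than the reference.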

On another benchmark measuring loud interactions with multiple participants, Augmented Multi Party Interaction, Amazon says Nova Sonic was 46.7% more accurate in terms of WER than OpenAI’s GPT-4o-transcribe model. Nova Sonic also has industry-leading speed, with an average perceived latency of 1.09 seconds, according to Amazon. That makes it faster than the GPT-4o model powering OpenAI’s Realtime API, which responds in 1.18 seconds, per benchmarking by Artificial Analysis.
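To put the two latency figures side by side, the short calculation below works out the gap using only the numbers quoted above. The function name `latency_gap` is illustrative, and the comparison assumes the two figures are directly comparable, which may not hold since one comes from Amazon and the other from Artificial Analysis.

```python
def latency_gap(faster_s: float, slower_s: float) -> tuple[float, float]:
    """Return (absolute gap in milliseconds, gap relative to the slower figure)."""
    gap_s = slower_s - faster_s
    return gap_s * 1000, gap_s / slower_s


# Figures as reported in the article, in seconds.
NOVA_SONIC_S = 1.09        # average perceived latency, per Amazon
GPT4O_REALTIME_S = 1.18    # per Artificial Analysis benchmarking

gap_ms, relative = latency_gap(NOVA_SONIC_S, GPT4O_REALTIME_S)
print(f"gap: {gap_ms:.0f} ms ({relative:.1%} lower)")
```

On these figures the difference is about 90 milliseconds, or roughly 8% of the slower response time.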

Prasad says Nova Sonic is a part of Amazon’s broader strategy to build AGI (artificial general intelligence), which the company defines as “AI systems that can do anything a human can do on a computer.” Moving forward, Prasad says Amazon plans to release more AI models that can understand different modalities, including image, video, and voice, as well as “other sensory data that are relevant if you bring things into the physical world.”

Amazon’s AGI division, which Prasad oversees, seems to be playing a larger role in the company’s product strategy these days. Just last week, Amazon launched a preview of Nova Act, a browser-using AI model that appears to be powering elements of Alexa+ and Amazon’s Buy for Me feature. Starting with Nova Sonic, Prasad says the company wants to offer more of its internal AI models for developers to build with.
