TechCrunch News, February 27
ElevenLabs is launching its own speech-to-text model

 

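ElevenLabs, an AI startup known for its audio generation technology, has launched Scribe, its first standalone speech-to-text model, marking a move in a new technical direction for the company. Scribe supports over 99 languages, and the company rates more than 25 of them as having excellent accuracy, with a word error rate below 5%. The company claims that Scribe outperforms Google Gemini 2.0 Flash and Whisper Large V3 on the multilingual FLEURS and Common Voice benchmarks. Scribe also offers speaker diarization to identify who is speaking, word-level timestamps, and automatic tagging of sound events. For now, Scribe works only with pre-recorded audio, but ElevenLabs plans to release a low-latency real-time version soon.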

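🚀 ElevenLabs has released Scribe, a standalone speech-to-text model, marking the company's move into speech recognition and an expansion of its AI capabilities.

🗣️ Scribe supports over 99 languages. More than 25 of them, including English, French, German, and Hindi, fall in the highest accuracy tier, with a word error rate below 5%.

⏱️ Scribe includes speaker diarization to identify who is speaking, word-level timestamps, and automatic tagging of sound events.

💰 Scribe is priced at $0.40 per hour of transcribed audio. The rate is competitive, but some rivals currently charge less, with some differences in features.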

ElevenLabs, an AI startup that just raised a $180 million mega funding round, has been primarily known for its audio generation prowess. The company took a step in another technological direction by launching its first standalone speech-to-text model called Scribe.

The startup, valued at $3.3 billion, has helped many other companies provide text-to-speech services through its vast library of voices. Now the company is looking to get into speech detection and compete with the likes of Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper models.

ElevenLabs’ Scribe model supports over 99 languages at launch. The company places over 25 of them in its excellent accuracy category, where the word error rate is less than 5%. This list includes English (claimed accuracy rate of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. Other languages fall into lower categories: high (5 to 10% word error rate), good (10 to 20%), and moderate (25 to 50%).
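To make the accuracy categories concrete, here is a minimal Python sketch of how word error rate is commonly computed (word-level edit distance divided by the number of reference words) and how a score could be mapped onto the tiers reported above. This is not ElevenLabs' evaluation code. The function names, the whitespace tokenization, and the handling of boundary values are assumptions made for illustration. The reported ranges leave 20 to 25% unassigned, so the sketch returns "unclassified" there rather than guessing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def accuracy_tier(wer: float) -> str:
    """Map a WER (as a fraction) onto the tiers reported in the article.

    Boundary handling is an assumption; the article gives only ranges.
    """
    if wer < 0.05:
        return "excellent"
    if wer <= 0.10:
        return "high"
    if wer <= 0.20:
        return "good"
    if 0.25 <= wer <= 0.50:
        return "moderate"
    return "unclassified"  # the 20-25% gap, or anything above 50%


wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.3f}", accuracy_tier(wer))
```

In the usage line, one substituted word out of six reference words gives a WER of about 16.7%, which lands in the "good" tier under these assumed boundaries.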

The company said that the model outperformed Google Gemini 2.0 Flash and Whisper Large V3 across multiple languages in the FLEURS and Common Voice benchmark tests.

ElevenLabs previously developed a speech-to-text component for its AI conversational agent platform, which was released last year. However, this is the first time the company is releasing a standalone speech detection model. In a conversation with TechCrunch last month, CEO Mati Staniszewski talked about improving speech detection models.

“We want to understand what’s being said by you in a conversation better. We are working on ways to move away from only generating content and understanding and transcribing speech,” Staniszewski said at that time. “Many people say that speech-to-text is a solved problem. But for many languages, it is pretty bad. We think we can build better speech detection models because we have in-house teams to annotate data and give us quick feedback.”

The model also offers smart speaker diarization to identify who is speaking, word-level timestamps for accurate subtitles, and auto-tagging of sound events such as audience laughter. The startup also lets customers transcribe video content directly in its studio to add subtitles or captions.
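As an illustration of why word-level timestamps matter for subtitles, the following Python sketch groups timed words into SubRip (SRT) cues. The input shape, a list of `(text, start_seconds, end_seconds)` tuples, is hypothetical and is not Scribe's actual response format. The function names and the fixed words-per-cue rule are also assumptions.

```python
from typing import List, Tuple

# Hypothetical input shape: (text, start_seconds, end_seconds) per word.
Word = Tuple[str, float, float]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered cues of at most max_words each."""
    cues = []
    for index, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        text = " ".join(word[0] for word in chunk)
        start = srt_time(chunk[0][1])   # start time of the first word in the cue
        end = srt_time(chunk[-1][2])    # end time of the last word in the cue
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues


sample = [("Hello", 0.0, 0.4), ("world", 0.5, 0.9), ("again", 1.0, 1.5)]
print(words_to_srt(sample, max_words=2))
```

Because each cue takes its start from its first word and its end from its last word, the subtitle timing follows the speech itself rather than a fixed interval. A production captioning tool would likely also split on pauses, punctuation, or speaker changes.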

Scribe currently only works with pre-recorded audio formats. The company said it will release a low-latency real-time version of the model soon. That means it is not yet effective for meeting transcriptions or voice note-taking.

ElevenLabs is pricing Scribe at $0.40 for an hour of transcribed audio. While the rate is competitive, some of its rivals offer a lower price for audio transcriptions at the moment with some feature differentiation.
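For a rough sense of what that rate means in practice, here is a small Python sketch that estimates cost from audio duration. It assumes billing is strictly proportional to duration, with no minimum charge, rounding increment, or volume discount. The article does not state how usage is actually metered, and the function name is hypothetical.

```python
PRICE_PER_HOUR_USD = 0.40  # rate reported in the article


def transcription_cost_usd(audio_seconds: float,
                           price_per_hour: float = PRICE_PER_HOUR_USD) -> float:
    """Estimate cost, assuming purely proportional per-second billing."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return round(audio_seconds / 3600 * price_per_hour, 4)


# A 90-minute recording under the proportional-billing assumption.
print(transcription_cost_usd(90 * 60))
```

Under this assumption, a 90-minute recording comes to about $0.60, and 100 hours of audio to about $40.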
