MarkTechPost@AI, September 28, 2024
Voyage AI Introduces Voyage-3 and Voyage-3-Lite: A New Generation of Small Embedding Models that Outperforms OpenAI v3 Large by 7.55%

 

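Voyage AI has released the Voyage-3 and Voyage-3-Lite models, which outperform existing standards across multiple domains while remaining cost-effective and broadly applicable.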

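🎯 Voyage-3 and Voyage-3-Lite perform strongly across technical, legal, financial, and other domains, surpassing current industry standards. For example, Voyage-3 outperforms OpenAI's V3 large model by an average of 7.55% across all tested domains, and Voyage-3-Lite's retrieval accuracy is 3.82% higher than that of OpenAI's V3 large model.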

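💲 Cost efficiency is at the core of the new Voyage-3 series. Voyage-3 gives businesses high-quality retrieval at lower cost: $0.06 per million tokens, 1.6x cheaper than Cohere English V3. Voyage-3-Lite costs even less, at $0.02 per million tokens.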

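🌐 The Voyage-3 series models are broadly applicable. Over the past nine months, Voyage AI has released a range of Voyage-2 series embedding models, including domain-specific ones. Voyage-Multilingual-2 performs particularly well in multilingual retrieval.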

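📋 The series incorporates several technical innovations: an improved architecture, distillation from larger models, and pre-training on more than 2 trillion high-quality tokens. Retrieval result alignment is refined through human feedback, improving the models' accuracy and relevance.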

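💼 The Voyage-3 series models suit many industries, including technical documentation, code, law, finance, and multilingual applications, and can meet a variety of needs.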

Voyage AI has announced a new generation of embedding models, Voyage-3 and Voyage-3-Lite, designed to outperform existing industry standards in domains including technology, law, finance, multilingual applications, and long-context understanding. According to Voyage AI’s evaluations, Voyage-3 outperforms OpenAI’s V3 large model by an average of 7.55% across all tested domains: technical documentation, code, law, finance, web content, multilingual datasets, long documents, and conversational data. It does so at 2.2x lower cost and with a 3x smaller embedding dimension, which significantly reduces vector database (vectorDB) costs. Similarly, Voyage-3-Lite offers 3.82% better retrieval accuracy than OpenAI’s V3 large model, at roughly 6x lower cost and with a 6x smaller embedding dimension.

Cost Efficiency Without Compromising Quality

Cost efficiency is at the heart of the new Voyage-3 series models. Voyage-3 costs $0.06 per million tokens, making it 1.6x cheaper than Cohere English V3 and substantially more affordable than OpenAI’s V3 large model. Its context length of 32,000 tokens is four times OpenAI’s 8,000, and its smaller embedding dimension (1024 vs. OpenAI’s 3072) lowers vectorDB costs, enabling companies to scale their applications efficiently.
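To make the link between embedding dimension and vectorDB cost concrete, here is a rough sketch of raw index size at the dimensions quoted in the article. The corpus size of 10 million vectors and the 4-byte float32 storage format are assumptions chosen for illustration; real vector databases add index and metadata overhead that this ignores.

```python
# Embedding dimensions quoted in the article.
DIMENSIONS = {
    "OpenAI v3 large": 3072,
    "voyage-3": 1024,
    "voyage-3-lite": 512,
}


def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense vector index, ignoring metadata and index overhead.

    bytes_per_value=4 assumes float32 storage (an assumption, not from the article).
    """
    return num_vectors * dim * bytes_per_value


def size_report(num_vectors: int) -> dict:
    """Map each model to (size in GB, shrink factor relative to OpenAI v3 large)."""
    baseline = index_size_bytes(num_vectors, DIMENSIONS["OpenAI v3 large"])
    report = {}
    for name, dim in DIMENSIONS.items():
        size = index_size_bytes(num_vectors, dim)
        report[name] = (size / 1e9, baseline / size)
    return report


if __name__ == "__main__":
    # Hypothetical corpus of 10 million embedded chunks.
    for name, (gigabytes, factor) in size_report(10_000_000).items():
        print(f"{name:16s} {gigabytes:8.2f} GB  ({factor:.0f}x vs. baseline)")
```

Because storage scales linearly with dimension, the 3x and 6x dimension reductions carry over directly to raw index size under these assumptions.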

Voyage-3-Lite, the lighter variant, is optimized for low-latency operations. At $0.02 per million tokens, it is 6.5x cheaper than OpenAI’s V3 large model, and its embedding dimension is 6x smaller (512 vs. OpenAI’s 3072). This makes Voyage-3-Lite a viable option for organizations that want to maintain high retrieval quality at a fraction of the cost.
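The per-token prices above translate into a simple cost estimate. The sketch below uses only the two prices quoted in the article; the corpus size is hypothetical, and the optional `free_tokens` argument is an assumption about how the free allowance mentioned later in the article (the first 200 million tokens) might be applied.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICE_PER_MILLION_TOKENS_USD = {
    "voyage-3": 0.06,
    "voyage-3-lite": 0.02,
}


def embedding_cost_usd(num_tokens: int, model: str, free_tokens: int = 0) -> float:
    """Estimate the cost of embedding num_tokens with the given model.

    free_tokens is subtracted before billing; how the real free tier is
    applied (per model, per account) is an assumption here.
    """
    billable = max(0, num_tokens - free_tokens)
    return billable / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD[model]


if __name__ == "__main__":
    corpus_tokens = 1_000_000_000  # hypothetical corpus of one billion tokens
    for model in PRICE_PER_MILLION_TOKENS_USD:
        full = embedding_cost_usd(corpus_tokens, model)
        discounted = embedding_cost_usd(corpus_tokens, model, free_tokens=200_000_000)
        print(f"{model:14s} ${full:,.2f} (${discounted:,.2f} after free tokens)")
```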

Versatility Across Multiple Domains

Voyage AI’s lineup extends beyond the Voyage-3 series. Over the past nine months, the company has released a suite of Voyage-2 series embedding models, including the general-purpose Voyage-Large-2 and domain-specific models such as Voyage-Code-2, Voyage-Law-2, Voyage-Finance-2, and Voyage-Multilingual-2. The domain-specific models have been extensively trained on data from their respective domains, demonstrating exceptional performance in specialized use cases.

For example, Voyage-Multilingual-2 delivers superior retrieval quality in French, German, Japanese, Spanish, and Korean while maintaining best-in-class performance in English. These achievements testify to Voyage AI’s commitment to developing robust models tailored to specific business needs.

Technical Specifications and Innovations

Several research innovations underpin the development of Voyage-3 and Voyage-3-Lite. The models feature an improved architecture, leveraging distillation from larger models and pre-training on over 2 trillion high-quality tokens. Additionally, retrieval result alignment is refined through human feedback, further enhancing the accuracy and relevance of the models.

Key technical specifications of the Voyage-3 series models include:

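Voyage-3: 1024-dimensional embeddings, 32,000-token context length, $0.06 per million tokens.

Voyage-3-Lite: 512-dimensional embeddings, 32,000-token context length, $0.02 per million tokens.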

The models’ ability to handle a 32,000-token context length, compared to OpenAI’s 8,000 tokens and Cohere’s 512 tokens, makes them suitable for applications requiring comprehensive understanding and retrieval of large documents, such as technical manuals, academic papers, and legal case summaries.
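A quick way to see what the longer context means in practice is to count how many chunks a long document must be split into before embedding. This is a minimal sketch using the context lengths quoted above; the 30,000-token document is hypothetical, and the calculation assumes non-overlapping chunks, whereas real pipelines often add overlap.

```python
import math

# Context lengths in tokens, as quoted in the article.
CONTEXT_LENGTH_TOKENS = {
    "Voyage-3 series": 32_000,
    "OpenAI": 8_000,
    "Cohere": 512,
}


def chunks_needed(doc_tokens: int, context_length: int) -> int:
    """Minimum number of non-overlapping chunks needed to cover a document."""
    if doc_tokens <= 0:
        return 0
    return math.ceil(doc_tokens / context_length)


if __name__ == "__main__":
    doc_tokens = 30_000  # hypothetical long document, e.g. a technical manual
    for provider, limit in CONTEXT_LENGTH_TOKENS.items():
        print(f"{provider:16s} {chunks_needed(doc_tokens, limit)} chunk(s)")
```

Fewer chunks means fewer vectors to store and less risk of splitting a passage away from the context it depends on.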

Applications and Use Cases

The Voyage-3 series models cater to a wide range of industries, enabling applications in domains such as technical documentation, code, law, finance, and multilingual applications.

Recommendations for Users

Voyage AI recommends that users of general-purpose embeddings upgrade to Voyage-3 for better retrieval quality at a low cost. For those seeking further savings, Voyage-3-Lite offers an excellent balance between performance and affordability. Domain-specific use cases, such as code, law, and finance, can still benefit from Voyage-2 series models like Voyage-Code-2, Voyage-Law-2, and Voyage-Finance-2, although Voyage-3 provides highly competitive performance in these areas as well.
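These recommendations can be read as a small decision rule. The sketch below is one possible encoding, not an official selection tool: the function and its parameters are hypothetical, the lowercase Voyage-2 identifiers are assumed to follow the same naming pattern as "voyage-3", and giving cost priority over domain is a design choice made here for illustration.

```python
from typing import Optional

# Assumed API identifiers for the domain-specific Voyage-2 models named in
# the article; confirm the exact strings in the Voyage documentation.
DOMAIN_MODELS = {
    "code": "voyage-code-2",
    "law": "voyage-law-2",
    "finance": "voyage-finance-2",
}


def pick_model(domain: Optional[str] = None, lowest_cost: bool = False) -> str:
    """Choose a model name following the article's recommendations.

    Order of precedence (an assumption of this sketch):
    1. lowest_cost=True    -> voyage-3-lite
    2. a known domain      -> the matching Voyage-2 domain model
    3. otherwise           -> voyage-3 (general purpose)
    """
    if lowest_cost:
        return "voyage-3-lite"
    if domain is not None and domain.lower() in DOMAIN_MODELS:
        return DOMAIN_MODELS[domain.lower()]
    return "voyage-3"


if __name__ == "__main__":
    print(pick_model())
    print(pick_model(lowest_cost=True))
    print(pick_model(domain="law"))
```

Since the article notes that Voyage-3 is also highly competitive in code, law, and finance, step 2 is optional; dropping it leaves a two-way choice between voyage-3 and voyage-3-lite.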

Future Developments

The Voyage AI team is continuously working to expand the capabilities of the Voyage-3 series models. In the coming weeks, the release of Voyage-3-Large is expected to set a new standard for large-scale general-purpose embeddings, further solidifying Voyage AI’s position as a leader in the field.

For those interested in exploring the potential of the Voyage-3 series, the first 200 million tokens are free to try. Users can use these models immediately by specifying “voyage-3” or “voyage-3-lite” as the model parameter in Voyage API calls.

Voyage AI’s release of Voyage-3 and Voyage-3-Lite represents a giant leap forward in embedding technology, offering a unique combination of high performance, low cost, and versatility. With these new models, Voyage AI continues to lead the way in creating state-of-the-art solutions for businesses and developers worldwide.
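As an illustration of setting the model parameter mentioned above, the following sketch builds an embeddings request using only the Python standard library. The endpoint URL, the JSON field names, the response shape, and the `VOYAGE_API_KEY` environment variable are assumptions based on common REST conventions and are not taken from the article; check them against the Voyage API documentation before relying on them. The request is only sent if an API key is present.

```python
import json
import os
import urllib.request

# Assumed endpoint; confirm in the Voyage API documentation.
API_URL = "https://api.voyageai.com/v1/embeddings"

# Model identifiers quoted in the article.
SUPPORTED_MODELS = {"voyage-3", "voyage-3-lite"}


def build_embedding_request(texts, model="voyage-3", api_key=""):
    """Build (but do not send) a POST request for embedding a list of texts."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(SUPPORTED_MODELS)}")
    # Field names "input" and "model" are assumed, not confirmed by the article.
    payload = {"input": list(texts), "model": model}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("VOYAGE_API_KEY", "")  # assumed variable name
    request = build_embedding_request(
        ["Embedding models map text to vectors."], model="voyage-3-lite", api_key=key
    )
    if key:
        with urllib.request.urlopen(request) as response:
            body = json.load(response)
        # Assumed response shape: {"data": [{"embedding": [...]}, ...]}
        print("embedding dimension:", len(body["data"][0]["embedding"]))
    else:
        print("No API key set; request body would be:", request.data.decode("utf-8"))
```

Switching between the two models is then a one-word change to the `model` argument.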



The post Voyage AI Introduces Voyage-3 and Voyage-3-Lite: A New Generation of Small Embedding Models that Outperforms OpenAI v3 Large by 7.55% appeared first on MarkTechPost.
