AWS Machine Learning Blog, August 20, 2024
Cohere Rerank 3 Nimble now generally available on Amazon SageMaker JumpStart

 

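Cohere Rerank 3 Nimble is the newest foundation model in the Cohere Rerank model series, built to enhance enterprise search and Retrieval Augmented Generation (RAG) systems. The model maintains high accuracy while running approximately 3–5 times faster than its predecessor, Cohere Rerank 3, which makes it a good fit for enterprises that want to enhance their search capabilities without sacrificing performance.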


The Cohere Rerank 3 Nimble foundation model (FM) is now generally available in Amazon SageMaker JumpStart. This model is the newest FM in Cohere’s Rerank model series, built to enhance enterprise search and Retrieval Augmented Generation (RAG) systems.

In this post, we discuss the benefits and capabilities of this new model with some examples.

Overview of Cohere Rerank models

Cohere’s Rerank family of models is designed to enhance existing enterprise search and RAG systems. Rerank models improve search accuracy over both keyword-based and embedding-based search systems. Cohere Rerank 3 is designed to reorder documents retrieved by initial search algorithms based on their relevance to a given query. A reranking model, also known as a cross-encoder, is a type of model that, given a query and document pair, outputs a similarity score. For FMs, words, sentences, or entire documents are often encoded as dense vectors in a semantic space. By calculating the cosine of the angle between these vectors, you can quantify their semantic similarity and output it as a single similarity score. You can use this score to reorder the documents by relevance to your query.
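As a rough illustration of the cosine calculation mentioned above, the following sketch computes the cosine of the angle between two vectors in plain Python. The three-dimensional vectors are made up for this example and stand in for real embeddings; this is not a description of how Cohere Rerank computes its scores internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, not real model embeddings).
query_vec = [1.0, 0.0, 1.0]
doc_same_direction = [2.0, 0.0, 2.0]   # points the same way as the query
doc_orthogonal = [0.0, 3.0, 0.0]       # shares no direction with the query

print(cosine_similarity(query_vec, doc_same_direction))  # close to 1
print(cosine_similarity(query_vec, doc_orthogonal))      # 0.0
```

Vectors that point in the same direction score close to 1 regardless of their length, and vectors with nothing in common score 0.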

Cohere Rerank 3 Nimble is the newest model from Cohere’s Rerank family of models, designed to improve speed and efficiency over its predecessor, Cohere Rerank 3. According to Cohere’s benchmark tests, including BEIR (Benchmarking IR) for accuracy and internal benchmarking datasets, Cohere Rerank 3 Nimble maintains high accuracy while being approximately 3–5 times faster than Cohere Rerank 3. The speed improvement is designed for enterprises looking to enhance their search capabilities without sacrificing performance.

The following diagram represents the two-stage retrieval of a RAG pipeline and illustrates where Cohere Rerank 3 Nimble is incorporated into the search pipeline.

In the first stage of retrieval in the RAG architecture, a set of candidate documents relevant to the query is returned from the knowledge base. In the second stage, Cohere Rerank 3 Nimble analyzes the semantic relevance between the query and each retrieved document, reordering them from most to least relevant. The top-ranked documents augment the original query with additional context. This process improves search result quality by identifying the most pertinent documents. Integrating Cohere Rerank 3 Nimble into a RAG system enables users to send fewer but higher-quality documents to the language model for grounded generation. This results in improved accuracy and relevance of search results without adding latency.
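The two stages described above can be sketched in a few lines of Python. Everything here is a stand-in chosen for illustration: the first stage is a simple keyword-overlap search, the scorer is a Jaccard word-overlap function rather than a reranking model, and the function names and sample knowledge base are hypothetical. In a real pipeline, the `score_fn` argument would be replaced by a call to a reranker.

```python
def first_stage_retrieve(query, knowledge_base, k):
    """Stage 1: cheap keyword-overlap retrieval returning up to k candidates."""
    query_terms = set(query.lower().split())
    scored = []
    for doc in knowledge_base:
        overlap = len(query_terms & set(doc.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def toy_relevance(query, doc):
    """Placeholder scorer (Jaccard word overlap). NOT a real reranking model."""
    query_terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    union = query_terms | doc_terms
    return len(query_terms & doc_terms) / len(union) if union else 0.0


def second_stage_rerank(query, candidates, score_fn, top_n):
    """Stage 2: reorder candidates from most to least relevant, keep top_n."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def build_prompt(query, context_docs):
    """Augment the original query with the top-ranked documents as context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base for illustration only.
knowledge_base = [
    "return policy for defective product",
    "password reset instructions",
    "how to return the wrong item you received and get a refund for the item",
    "shipping times for international orders",
]
query = "return wrong item"

candidates = first_stage_retrieve(query, knowledge_base, k=3)
top_docs = second_stage_rerank(query, candidates, toy_relevance, top_n=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

Keeping `top_n` small is what lets the pipeline pass fewer, more relevant documents to the language model.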

Overview of SageMaker JumpStart

SageMaker JumpStart offers access to a broad selection of publicly available FMs. These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.

Amazon SageMaker is a comprehensive, fully managed machine learning (ML) platform that covers the entire ML workflow. It offers a suite of tools for every stage of the ML lifecycle, from data preparation to model deployment and monitoring. Data scientists and developers can use the SageMaker integrated development environment (IDE) to access a vast array of pre-built algorithms, customize their own models, and seamlessly scale their solutions. The platform’s strength lies in its ability to abstract away the complexities of infrastructure management, allowing you to focus on innovation rather than operational overhead. The automated machine learning (AutoML) capabilities of SageMaker democratize ML by enabling even non-experts to build sophisticated models. Furthermore, its robust governance features help organizations maintain control and transparency over their ML projects, addressing critical concerns around regulatory compliance.

Prerequisites

Make sure your SageMaker AWS Identity and Access Management (IAM) service role has the AmazonSageMakerFullAccess permission policy attached.

To deploy Cohere Rerank 3 Nimble successfully, confirm one of the following:

Deploy Cohere Rerank 3 Nimble on SageMaker JumpStart

You can access the Cohere Rerank 3 family of models using SageMaker JumpStart in Amazon SageMaker Studio, as shown in the following screenshot.

Deployment starts when you choose Deploy, and you may be prompted to subscribe to this model through AWS Marketplace. If you are already subscribed, you can choose Deploy again to deploy the model. After deployment finishes, you will see that an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK.

Subscribe to the model package

To subscribe to the model package, complete the following steps:

    1. Depending on the model you want to deploy, open the model package listing page for cohere-rerank-nimble-english or cohere-rerank-nimble-multilingual.
    2. On the AWS Marketplace listing, choose Continue to subscribe.
    3. On the Subscribe to this software page, review and choose Accept Offer if you and your organization agree with EULA, pricing, and support terms.
    4. Choose Continue to configuration and then choose an AWS Region.

A product ARN will be displayed. This is the model package ARN that you need to specify while creating a deployable model using Boto3.
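Because the ARN is shown after you choose a Region, it can be useful to confirm that the ARN you copied matches the Region you plan to deploy in. The following is a minimal sketch that assumes the general AWS ARN layout (`arn:partition:service:region:account-id:resource`). The helper name, the example ARN, and the account ID are hypothetical and are not taken from the Marketplace listing.

```python
def parse_model_package_arn(arn):
    """Split a SageMaker model package ARN into its named parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sagemaker":
        raise ValueError(f"not a SageMaker ARN: {arn!r}")
    resource_type, _, resource_name = parts[5].partition("/")
    if resource_type != "model-package" or not resource_name:
        raise ValueError(f"not a model package ARN: {arn!r}")
    return {
        "partition": parts[1],
        "region": parts[3],
        "account_id": parts[4],
        "name": resource_name,
    }


# Hypothetical ARN for illustration; use the product ARN from your own subscription.
example_arn = (
    "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-rerank-package"
)
deploy_region = "us-east-1"  # assumed deployment Region

info = parse_model_package_arn(example_arn)
if info["region"] != deploy_region:
    raise ValueError(
        f"ARN is for {info['region']}, but the deployment Region is {deploy_region}"
    )
print(info)
```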

Deploy Cohere Rerank 3 Nimble using the SDK

To deploy the model using the SDK, copy the product ARN from the previous step and specify it in the model_package_arn in the following code:

from cohere_aws import Client
import boto3

region = boto3.Session().region_name
model_package_arn = "Specify the model package ARN here"

After you specify the model package ARN, you can create the endpoint, as shown in the following code. Specify the name of the endpoint, the instance type, and the number of instances being used. Make sure your account-level service quota for ml.g5.xlarge for endpoint usage allows one or more instances. To request a service quota increase, refer to AWS service quotas.

co = Client(region_name=region)
co.create_endpoint(arn=model_package_arn, endpoint_name="cohere-rerank-3/cohere-rerank-nimble-multilingual", instance_type="ml.g5.xlarge", n_instances=1)

If the endpoint is already created, you just need to connect to it with the following code, passing the name of your existing endpoint:

co.connect_to_endpoint(endpoint_name="cohere-rerank-3/cohere-rerank-nimble-multilingual-v3")

Follow a similar process as detailed earlier to deploy Cohere Rerank 3 on SageMaker JumpStart.

Inference example with Cohere Rerank 3 Nimble

Cohere Rerank 3 Nimble offers robust multilingual support. The model is available in both English and multilingual versions supporting over 100 languages.

The following code example illustrates how to perform real-time inference using Cohere Rerank 3 Nimble-English:

documents = [
    {"Title":"Incorrect Password","Content":"Hello, I have been trying to access my account for the past hour and it keeps saying my password is incorrect. Can you please help me?"},
    {"Title":"Confirmation Email Missed","Content":"Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?"},
    {"Title":"Questions about Return Policy","Content":"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title":"Customer Support is Busy","Content":"Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Received Wrong Item","Content":"Hi, I have a question about my recent order. I received the wrong item and I need to return it."},
    {"Title":"Customer Service is Unavailable","Content":"Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Return Policy for Defective Product","Content":"Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title":"Wrong Item Received","Content":"Good morning, I have a question about my recent order. I received the wrong item and I need to return it."},
    {"Title":"Return Defective Product","Content":"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."}
]

In the following code, the top_n inference parameter for Cohere Rerank 3 and Rerank 3 Nimble specifies the number of top-ranked results to return after reranking the input documents. It allows you to control how many of the most relevant documents are included in the final output. To determine an optimal value for top_n, consider factors such as the diversity of your document set, the complexity of your queries, and the desired balance between precision and latency for enterprise search or RAG.

response = co.rerank(documents=documents, query='What emails have been about returning items?', rank_fields=["Title","Content"], top_n=2)
print(f'Documents: {response}')

The following is the output from Cohere Rerank 3 Nimble-English:

Documents: [RerankResult<document: {'Title': 'Received Wrong Item', 'Content': 'Hi, I have a question about my recent order. I received the wrong item and I need to return it.'}, index: 4, relevance_score: 0.0068771075>, RerankResult<document: {'Title': 'Wrong Item Received', 'Content': 'Good morning, I have a question about my recent order. I received the wrong item and I need to return it.'}, index: 7, relevance_score: 0.0064131636>]

Cohere Rerank 3 Nimble multilingual support

The multilingual capabilities of Cohere Rerank 3 Nimble-Multilingual enable global organizations to provide consistent, improved search experiences to users across different Regions and language preferences.

In the following example, we create an input payload for a list of emails in multiple languages. We can take the same set of emails from earlier and translate them to different languages. These examples are available under the SageMaker JumpStart model card and are randomly generated for this example.

documents = [
    {"Title":"Contraseña incorrecta","Content":"Hola, llevo una hora intentando acceder a mi cuenta y sigue diciendo que mi contraseña es incorrecta. ¿Puede ayudarme, por favor?"},
    {"Title":"Confirmation Email Missed","Content":"Hi, I recently purchased a product from your website but I never received a confirmation email. Can you please look into this for me?"},
    {"Title":"أسئلة حول سياسة الإرجاع","Content":"مرحبًا، لدي سؤال حول سياسة إرجاع هذا المنتج. لقد اشتريته قبل بضعة أسابيع وهو معيب"},
    {"Title":"Customer Support is Busy","Content":"Good morning, I have been trying to reach your customer support team for the past week but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Falschen Artikel erhalten","Content":"Hallo, ich habe eine Frage zu meiner letzten Bestellung. Ich habe den falschen Artikel erhalten und muss ihn zurückschicken."},
    {"Title":"Customer Service is Unavailable","Content":"Hello, I have been trying to reach your customer support team for the past hour but I keep getting a busy signal. Can you please help me?"},
    {"Title":"Return Policy for Defective Product","Content":"Hi, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."},
    {"Title":"收到错误物品","Content":"早上好,关于我最近的订单,我有一个问题。我收到了错误的商品,需要退货。"},
    {"Title":"Return Defective Product","Content":"Hello, I have a question about the return policy for this product. I purchased it a few weeks ago and it is defective."}
]

Use the following code to perform real-time inference using Cohere Rerank 3 Nimble-Multilingual:

response = co.rerank(documents=documents, query='What emails have been about returning items?', rank_fields=['Title','Content'], top_n=2)
print(f'Documents: {response}')

The following is the output from Cohere Rerank 3 Nimble-Multilingual:

Documents: [RerankResult<document: {'Title': '收到错误物品', 'Content': '早上好,关于我最近的订单,我有一个问题。我收到了错误的商品,需要退货。'}, index: 7, relevance_score: 0.034553625>, RerankResult<document: {'Title': 'أسئلة حول سياسة الإرجاع', 'Content': 'مرحبًا، لدي سؤال حول سياسة إرجاع هذا المنتج. لقد اشتريته قبل بضعة أسابيع وهو معيب'}, index: 2, relevance_score: 0.00037263767>]

The output translated to English is as follows:

Documents: [RerankResult<document: {'Title': 'Received Wrong Item', 'Content': 'Good morning, I have a question about my recent order. I received the wrong item and need to return it.'}, index: 7, relevance_score: 0.034553625>, RerankResult<document: {'Title': 'Questions about Return Policy', 'Content': 'Hello, I have a question about the return policy for this product. I bought it a few weeks ago and it's defective'}, index: 2, relevance_score: 0.00037263767>]

In both examples, the relevance scores are normalized to be in the range [0, 1]. Scores close to 1 indicate a high relevance to the query, and scores closer to 0 indicate low relevance.
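Because the scores fall in the range [0, 1], one possible way to use them is to drop results below a minimum score before passing documents on. The following sketch uses plain dictionaries that mirror the index and relevance_score values printed in the multilingual output above; they are not the SDK's result objects. The helper name and the 0.01 threshold are arbitrary choices for illustration, not recommended values.

```python
def filter_by_score(results, min_score):
    """Keep only results whose relevance score is at least min_score."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be within [0, 1]")
    return [r for r in results if r["relevance_score"] >= min_score]


# Index and score values copied from the multilingual example output.
results = [
    {"index": 7, "relevance_score": 0.034553625},
    {"index": 2, "relevance_score": 0.00037263767},
]

# Hypothetical threshold; a suitable value depends on your data and queries.
kept = filter_by_score(results, min_score=0.01)
print([r["index"] for r in kept])  # [7]
```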

Use cases suitable for Cohere Rerank 3 Nimble

The Cohere Rerank 3 Nimble model provides an option that prioritizes efficiency. The model is ideal for enterprises looking to enable their customers to accurately search complex documentation, build applications that understand over 100 languages, and retrieve the most relevant information from various data stores. In industries such as retail, where website drop-off increases with every 100 milliseconds added to search response time, having a faster AI model like Cohere Rerank 3 Nimble powering the enterprise search system translates to higher conversion rates.

Conclusion

Cohere Rerank 3 and Rerank 3 Nimble are now available on SageMaker JumpStart. To get started, refer to Train, deploy, and evaluate pretrained models with SageMaker JumpStart.

Interested in diving deeper? Check out the Cohere on AWS GitHub repo.


About the Authors

Breanne Warner is an Enterprise Solutions Architect at Amazon Web Services supporting healthcare and life science (HCLS) customers. She is passionate about supporting customers to use generative AI on AWS and evangelizing model adoption. Breanne is also on the Women@Amazon board as co-director of Allyship with the goal of fostering an inclusive and diverse culture at Amazon. Breanne holds a Bachelor of Science in Computer Engineering from the University of Illinois at Urbana-Champaign (UIUC).

Niithiyn Vijeaswaran is a Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.

Karan Singh is a Generative AI Specialist for third-party models at AWS, where he works with top-tier third-party foundational model providers to define and run joint GTM motions that help customers train, deploy, and scale foundational models. Karan holds a Bachelor of Science in Electrical and Instrumentation Engineering from Manipal University and a Master of Science in Electrical Engineering from Northwestern University, and is currently an MBA Candidate at the Haas School of Business at the University of California, Berkeley.
