AWS Machine Learning Blog, November 19, 2024
Build cost-effective RAG applications with Binary Embeddings in Amazon Titan Text Embeddings V2, Amazon OpenSearch Serverless, and Amazon Bedrock Knowledge Bases


Today, we are happy to announce the availability of Binary Embeddings for Amazon Titan Text Embeddings V2 in Amazon Bedrock Knowledge Bases and Amazon OpenSearch Serverless. With binary embedding support in Amazon Bedrock and a binary vector store in OpenSearch Serverless, you can build Retrieval Augmented Generation (RAG) applications in Amazon Bedrock Knowledge Bases with reduced memory usage and lower overall costs.

Amazon Bedrock is a fully managed service that provides a single API to access and use various high-performing foundation models (FMs) from leading AI companies. Amazon Bedrock also offers a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock Knowledge Bases, FMs and agents can retrieve contextual information from your company’s private data sources for RAG. RAG helps FMs deliver more relevant, accurate, and customized responses.

Amazon Titan Text Embeddings models generate meaningful semantic representations of documents, paragraphs, and sentences. Amazon Titan Text Embeddings takes as an input a body of text and generates a 1,024 (default), 512, or 256 dimensional vector. Amazon Titan Text Embeddings are offered through latency-optimized endpoint invocation for faster search (recommended during the retrieval step) and throughput-optimized batch jobs for faster indexing. With Binary Embeddings, Amazon Titan Text Embeddings V2 will represent data as binary vectors with each dimension encoded as a single binary digit (0 or 1). This binary representation will convert high-dimensional data into a more efficient format for storage and computation.
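To make the storage argument concrete, the following is a minimal sketch of the idea behind binary representation: each dimension of a float vector is reduced to a single bit, which packs a 1,024-dimension embedding into 128 bytes instead of 4,096. The sign-threshold quantization shown here is only an illustration; the actual quantization used by Amazon Titan Text Embeddings V2 is internal to the model.

```python
import numpy as np

# A 1,024-dimension FP32 embedding (random stand-in for a real Titan vector).
rng = np.random.default_rng(0)
float_embedding = rng.standard_normal(1024).astype(np.float32)

# Binarize: one bit per dimension (1 if the component is positive, else 0).
# Illustrative only; Titan's actual quantization scheme is internal.
binary_embedding = (float_embedding > 0).astype(np.int8)

# Pack 8 bits per byte for storage: 1,024 bits -> 128 bytes,
# versus 1,024 * 4 = 4,096 bytes for FP32 (a 32x reduction).
packed = np.packbits(binary_embedding)
print(float_embedding.nbytes, packed.nbytes)  # 4096 128
```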

Amazon OpenSearch Serverless is a serverless deployment option for Amazon OpenSearch Service, a fully managed service that makes it simple to perform interactive log analytics, real-time application monitoring, website search, and vector search with its k-nearest neighbor (kNN) plugin. It supports exact and approximate nearest-neighbor algorithms and multiple storage and matching engines. It makes it simple for you to build modern machine learning (ML) augmented search experiences, generative AI applications, and analytics workloads without having to manage the underlying infrastructure.

The OpenSearch Serverless kNN plugin now supports 16-bit (FP16) and binary vectors, in addition to 32-bit floating point vectors (FP32). You can store the binary embeddings generated by Amazon Titan Text Embeddings V2 for lower costs by setting the kNN vector field type to binary. The vectors can be stored and searched in OpenSearch Serverless using PUT and GET APIs.
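As a sketch of what setting the kNN vector field type to binary looks like, the following builds an index mapping in the style of the OpenSearch k-NN plugin. The field names here follow the k-NN plugin conventions (binary vectors use the Faiss engine with Hamming distance), but verify them against the OpenSearch documentation for your version; the index and field names ("titan_binary_vector", "text_chunk") are placeholders.

```python
import json

# Sketch of an index mapping for binary embeddings in OpenSearch Serverless.
# Verify field names against the current OpenSearch k-NN documentation.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "titan_binary_vector": {
                "type": "knn_vector",
                "dimension": 1024,           # bits, matching the Titan output length
                "data_type": "binary",       # store one bit per dimension
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",       # binary vectors use the Faiss engine
                    "space_type": "hamming"  # distance computed over bits
                },
            },
            "text_chunk": {"type": "text"},
        }
    },
}
print(json.dumps(index_body, indent=2))
```

With a client such as opensearch-py, a body like this would be passed to an index-creation call against your OpenSearch Serverless collection endpoint.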

This post summarizes the benefits of this new binary vector support across Amazon Titan Text Embeddings, Amazon Bedrock Knowledge Bases, and OpenSearch Serverless, and gives you information on how you can get started. The following diagram shows a high-level architecture with Amazon Bedrock Knowledge Bases and Amazon OpenSearch Serverless.

You can lower latency and reduce storage costs and memory requirements in OpenSearch Serverless and Amazon Bedrock Knowledge Bases with minimal reduction in retrieval quality.

We ran the Massive Text Embedding Benchmark (MTEB) retrieval data set with binary embeddings. On this data set, we reduced storage while observing a 25-times improvement in latency. Compared to full-precision (float32) embeddings, binary embeddings maintained 98.5% of the retrieval accuracy with re-ranking, and 97% without re-ranking. In end-to-end RAG benchmark comparisons with full-precision embeddings, Binary Embeddings with Amazon Titan Text Embeddings V2 retained 99.1% of the full-precision answer correctness (98.6% without re-ranking). We encourage customers to run their own benchmarks using Amazon OpenSearch Serverless and Binary Embeddings for Amazon Titan Text Embeddings V2.

OpenSearch Serverless benchmarks using the Hierarchical Navigable Small Worlds (HNSW) algorithm with binary vectors have unveiled a 50% reduction in search OpenSearch Computing Units (OCUs), translating to cost savings for users. The use of binary indexes has resulted in significantly faster retrieval times. Traditional search methods often rely on computationally intensive calculations such as L2 and cosine distances, which can be resource-intensive. In contrast, binary indexes in Amazon OpenSearch Serverless operate on Hamming distances, a more efficient approach that accelerates search queries.
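To see why Hamming distance is so cheap, note that it reduces to an XOR followed by a bit count (popcount), with no floating-point arithmetic at all. A minimal sketch on packed bit vectors:

```python
import numpy as np

def hamming_distance(a_bits: np.ndarray, b_bits: np.ndarray) -> int:
    """Hamming distance between two packed bit vectors (uint8 arrays).

    XOR leaves a 1 wherever the two vectors disagree; counting those 1s
    (a popcount) gives the distance. Unlike L2 or cosine distance, no
    floating-point multiplications or square roots are needed.
    """
    return int(np.unpackbits(np.bitwise_xor(a_bits, b_bits)).sum())

# Two toy 16-bit vectors, packed 8 bits per byte.
a = np.packbits(np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0], dtype=np.uint8))
b = np.packbits(np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1], dtype=np.uint8))
print(hamming_distance(a, b))  # 4
```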

In the following sections, we discuss how to use binary embeddings with Amazon Titan Text Embeddings, binary vectors (and FP16) for the vector engine, and the binary embedding option for Amazon Bedrock Knowledge Bases. To learn more about Amazon Bedrock Knowledge Bases, visit Knowledge Bases now delivers fully managed RAG experience in Amazon Bedrock.

Generate Binary Embeddings with Amazon Titan Text Embeddings V2

Amazon Titan Text Embeddings V2 now supports Binary Embeddings and is optimized for retrieval performance and accuracy across different dimension sizes (1024, 512, 256) with text support for more than 100 languages. By default, Amazon Titan Text Embeddings models produce embeddings at Floating Point 32 bit (FP32) precision. Although using a 1024-dimension vector of FP32 embeddings helps achieve better accuracy, it also leads to large storage requirements and related costs in retrieval use cases.

To generate binary embeddings in code, set the embeddingTypes parameter in your invoke_model API request to Amazon Titan Text Embeddings V2:

import json
import boto3
import numpy as np

rt_client = boto3.client("bedrock-runtime")

response = rt_client.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps(
        {
            "inputText": "What is Amazon Bedrock?",
            "embeddingTypes": ["binary", "float"]
        }
    )
)['body'].read()

embedding = np.array(json.loads(response)["embeddingsByType"]["binary"], dtype=np.int8)

As shown in the preceding request, you can request the binary embedding alone or both binary and float embeddings. The resulting embedding is a 1,024-length binary vector similar to the following:

array([0, 1, 1, ..., 0, 0, 0], dtype=int8)

For more information and sample code, refer to Amazon Titan Embeddings Text.

Configure Amazon Bedrock Knowledge Bases with Binary Vector Embeddings

You can use Amazon Bedrock Knowledge Bases to take advantage of Binary Embeddings with Amazon Titan Text Embeddings V2, and of binary vectors and Floating Point 16 bit (FP16) for the vector engine in Amazon OpenSearch Serverless, without writing a single line of code. Follow these steps:

    1. On the Amazon Bedrock console, create a knowledge base. Provide the knowledge base details, including name and description, and create a new service role or use an existing one with the relevant AWS Identity and Access Management (IAM) permissions. For information on creating service roles, refer to Service roles. Under Choose data source, choose Amazon S3, as shown in the following screenshot. Choose Next.
    2. Configure the data source. Enter a name and description, and define the source S3 URI. Under Chunking and parsing configurations, choose Default. Choose Next to continue.
    3. Complete the knowledge base setup by selecting an embeddings model. For this walkthrough, select Titan Text Embeddings V2. Under Embeddings type, choose Binary vector embeddings. Under Vector dimensions, choose 1024. Choose Quick Create a New Vector Store. This option configures a new Amazon OpenSearch Serverless store that supports the binary data type.
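The console steps above can also be expressed programmatically. The following is a hedged sketch of the embedding configuration that would accompany a CreateKnowledgeBase request: the field names, in particular embeddingDataType, are my assumption based on the Bedrock API shape and should be verified against the current CreateKnowledgeBase API reference, and the Region in the model ARN is a placeholder.

```python
import json

# Sketch of the vector knowledge base configuration for binary embeddings.
# Field names (especially "embeddingDataType") are assumptions; check them
# against the current CreateKnowledgeBase API reference before use.
kb_configuration = {
    "type": "VECTOR",
    "vectorKnowledgeBaseConfiguration": {
        "embeddingModelArn": (
            "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
        ),
        "embeddingModelConfiguration": {
            "bedrockEmbeddingModelConfiguration": {
                "dimensions": 1024,
                "embeddingDataType": "BINARY",  # assumed: "BINARY" vs "FLOAT32"
            }
        },
    },
}
print(json.dumps(kb_configuration, indent=2))
```

A configuration like this would be passed as knowledgeBaseConfiguration to the bedrock-agent client's create_knowledge_base call, alongside a name, roleArn, and a storageConfiguration pointing at your OpenSearch Serverless collection.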

You can check the knowledge base details after creation to monitor the data source sync status. After the sync is complete, you can test the knowledge base and check the FM’s responses.

Conclusion

As we’ve explored throughout this post, Binary Embeddings are an option in the Amazon Titan Text Embeddings V2 models available in Amazon Bedrock, complemented by the binary vector store in OpenSearch Serverless. These features significantly reduce memory and disk needs in Amazon Bedrock and OpenSearch Serverless, resulting in fewer OCUs for the RAG solution. You’ll also see better performance and lower latency, though with some impact on the accuracy of the results compared to using the full float data type (FP32). Although the drop in accuracy is minimal, you have to decide whether it suits your application. The specific benefits will vary based on factors such as the volume of data, search traffic, and storage requirements, but the examples discussed in this post illustrate the potential value.

Binary Embeddings support in Amazon OpenSearch Serverless, Amazon Bedrock Knowledge Bases, and Amazon Titan Text Embeddings V2 is available today in all AWS Regions where these services are already available. Check the Region list for details and future updates. To learn more about Amazon Bedrock Knowledge Bases, visit the Amazon Bedrock Knowledge Bases product page. For more information regarding Amazon Titan Text Embeddings, visit Amazon Titan in Amazon Bedrock. For more information on Amazon OpenSearch Serverless, visit the Amazon OpenSearch Serverless product page. For pricing details, review the Amazon Bedrock pricing page.

Give the new feature a try in the Amazon Bedrock console today. Send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS contacts and engage with the generative AI builder community at community.aws.


About the Authors

Shreyas Subramanian is a principal data scientist and helps customers by using generative AI and deep learning to solve their business challenges using AWS services. Shreyas has a background in large-scale optimization and ML and in the use of ML and reinforcement learning for accelerating optimization tasks.

Ron Widha is a Senior Software Development Manager with Amazon Bedrock Knowledge Bases, helping customers easily build scalable RAG applications.

Satish Nandi is a Senior Product Manager with Amazon OpenSearch Service. He is focused on OpenSearch Serverless and has years of experience in networking, security and AI/ML. He holds a bachelor’s degree in computer science and an MBA in entrepreneurship. In his free time, he likes to fly airplanes and hang gliders and ride his motorcycle.

Vamshi Vijay Nakkirtha is a Senior Software Development Manager working on the OpenSearch Project and Amazon OpenSearch Service. His primary interests include distributed systems.
