AWS Machine Learning Blog, July 19, 2024
Intelligent document processing using Amazon Bedrock and Anthropic Claude


Generative artificial intelligence (AI) not only empowers innovation through ideation, content creation, and enhanced customer service, but also streamlines operations and boosts productivity across various domains. To effectively harness this transformative technology, Amazon Bedrock offers a fully managed service that integrates high-performing foundation models (FMs) from leading AI companies, such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, Mistral AI, and Amazon. By providing access to these advanced models through a single API and supporting the development of generative AI applications with an emphasis on security, privacy, and responsible AI, Amazon Bedrock enables you to use AI to explore new avenues for innovation and improve overall offerings.

Enterprise customers can unlock significant value by harnessing the power of intelligent document processing (IDP) augmented with generative AI. By infusing IDP solutions with generative AI capabilities, organizations can revolutionize their document processing workflows, achieving exceptional levels of automation and reliability. This combination enables advanced document understanding, highly effective structured data extraction, automated document classification, and seamless information retrieval from unstructured text. With these capabilities, organizations can achieve scalable, efficient, and high-value document processing that drives business transformation and competitiveness, ultimately leading to improved productivity, reduced costs, and enhanced decision-making.

In this post, we show how to develop an IDP solution using Anthropic Claude 3 Sonnet on Amazon Bedrock. We demonstrate how to extract data from a scanned document and insert it into a database.

The Anthropic Claude 3 Sonnet model is optimized for speed and efficiency, making it an excellent choice for intelligent tasks—particularly for enterprise workloads. It also possesses sophisticated vision capabilities, demonstrating a strong aptitude for understanding a wide range of visual formats, including photos, charts, graphs, and technical diagrams. Although we demonstrate this solution using the Anthropic Claude 3 Sonnet model, you can alternatively use the Haiku and Opus models if your use case requires them.

Solution overview

The proposed solution uses Amazon Bedrock and the powerful Anthropic Claude 3 Sonnet model to enable IDP capabilities. The architecture consists of several AWS services seamlessly integrated with Amazon Bedrock, enabling efficient and accurate extraction of data from scanned documents.

The following diagram illustrates our solution architecture.

The solution consists of the following steps:

    1. The process begins with scanned documents being uploaded and stored in an Amazon Simple Storage Service (Amazon S3) bucket, which invokes an S3 Event Notification on object upload.
    2. This event invokes an AWS Lambda function, responsible for invoking the Anthropic Claude 3 Sonnet model on Amazon Bedrock.
    3. The Anthropic Claude 3 Sonnet model, with its advanced multimodal capabilities, processes the scanned documents and extracts relevant data in a structured JSON format.
    4. The extracted data from the Anthropic Claude 3 model is sent to an Amazon Simple Queue Service (Amazon SQS) queue. Amazon SQS acts as a buffer, allowing components to send and receive messages reliably without being directly coupled, providing scalability and fault tolerance in the system.
    5. Another Lambda function consumes the messages from the SQS queue, parses the JSON data, and stores the extracted key-value pairs in an Amazon DynamoDB table for retrieval and further processing.

This serverless architecture takes advantage of the scalability and cost-effectiveness of AWS services while harnessing the cutting-edge intelligence of Anthropic Claude 3 Sonnet. By combining the robust infrastructure of AWS with Anthropic’s FMs, this solution enables organizations to streamline their document processing workflows, extract valuable insights, and enhance overall operational efficiency.

The solution uses the following services and features:

- Amazon Bedrock
- Anthropic Claude 3 model family
- Amazon DynamoDB
- AWS Lambda
- Amazon SQS
- Amazon S3

In this solution, we use the generative AI capabilities in Amazon Bedrock to efficiently extract data. As of this writing, Anthropic Claude 3 Sonnet only accepts images as input. The supported file types are GIF, JPEG, PNG, and WebP. You can choose to save images during the scanning process or convert the PDF to images.
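Because only these image types are accepted, it can help to filter uploads before they reach the pipeline. The following is a minimal sketch; the helper name is ours, not part of the solution code:

```python
# File types Anthropic Claude 3 accepts as image input (per the list above).
SUPPORTED_EXTENSIONS = {".gif", ".jpeg", ".jpg", ".png", ".webp"}

def is_supported_image(object_key: str) -> bool:
    """Return True if the S3 object key has an extension Claude 3 can process."""
    key = object_key.lower()
    return any(key.endswith(ext) for ext in SUPPORTED_EXTENSIONS)
```

You could call this in the first Lambda function and skip (or route to an error queue) any object that fails the check.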

You can also enhance this solution by implementing human-in-the-loop and model evaluation features. The goal of this post is to demonstrate how you can build an IDP solution using Amazon Bedrock. To use it as a production-scale solution, you should take additional considerations into account, such as testing for edge case scenarios, improving exception handling, trying additional prompting techniques, model fine-tuning, model evaluation, throughput requirements, the number of concurrent requests to support, and cost and latency implications.

Prerequisites

You need the following prerequisites before you can proceed with this solution. For this post, we use the us-east-1 AWS Region. For details on available Regions, see Amazon Bedrock endpoints and quotas.

Use case and dataset

For our example use case, let’s look at a state agency responsible for issuing birth certificates. The agency may receive birth certificate applications through various methods, such as online applications, forms completed at a physical location, and mailed-in completed paper applications. Today, most agencies spend a considerable amount of time and resources to manually extract the application details. The process begins with scanning the application forms, manually extracting the details, and then entering them into an application that eventually stores the data into a database. This process is time-consuming, inefficient, not scalable, and error-prone. Additionally, it adds complexity if the application form is in a different language (such as Spanish).

For this demonstration, we use sample scanned images of birth certificate application forms. These forms don’t contain any real personal data. Two examples are provided: one in English (handwritten) and another in Spanish (printed). Save these images as .jpeg files to your computer. You need them later for testing the solution.

Create an S3 bucket

On the Amazon S3 console, create a new bucket with a unique name (for example, bedrock-claude3-idp-{random characters to make it globally unique}) and leave the other settings as default. Within the bucket, create a folder named images and a sub-folder named birth_certificates.
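If you prefer to script the console steps above, the following boto3 sketch creates the bucket and the folder placeholders. The function and helper names are ours; note that S3 has no real folders, so the "folders" are zero-byte objects whose keys end in `/`:

```python
import uuid

def unique_bucket_name(prefix: str = "bedrock-claude3-idp") -> str:
    # Bucket names must be globally unique; append random characters.
    return f"{prefix}-{uuid.uuid4().hex[:8]}"

def create_idp_bucket(bucket_name: str) -> None:
    import boto3  # imported here so the name helper above works without boto3 installed
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket_name)  # us-east-1 needs no LocationConstraint
    # "Folders" in S3 are zero-byte objects whose keys end with "/".
    s3.put_object(Bucket=bucket_name, Key="images/birth_certificates/")
```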

Create an SQS queue

On the Amazon SQS console, create a queue with the Standard queue type, provide a name (for example, bedrock-idp-extracted-data), and leave the other settings as default.
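As a scripted alternative, the sketch below creates the queue with boto3 and includes a small helper that builds the queue ARN, which the IAM policies later in this post reference. The helper names are ours:

```python
def queue_arn(region: str, account_id: str, queue_name: str) -> str:
    # The IAM policies in this post reference the queue by ARN in this format.
    return f"arn:aws:sqs:{region}:{account_id}:{queue_name}"

def create_extraction_queue(queue_name: str = "bedrock-idp-extracted-data") -> str:
    import boto3  # imported here so queue_arn() is usable without boto3 installed
    sqs = boto3.client("sqs", region_name="us-east-1")
    # create_queue builds a Standard queue by default and returns the queue URL,
    # which the first Lambda function needs for send_message.
    return sqs.create_queue(QueueName=queue_name)["QueueUrl"]
```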

Create a Lambda function to invoke the Amazon Bedrock model

On the Lambda console, create a function (for example, invoke_bedrock_claude3), choose Python 3.12 for the runtime, and leave the remaining settings as default. Later, you configure this function to be invoked every time a new image is uploaded into the S3 bucket. You can download the entire Lambda function code from invoke_bedrock_claude3.py. Replace the contents of the lambda_function.py file with the code from the downloaded file. Make sure to substitute {SQS URL} with the URL of the SQS queue you created earlier, then choose Deploy.

The Lambda function should perform the following actions:

import base64
import json

import boto3

s3 = boto3.client('s3')
sqs = boto3.client('sqs')
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
QUEUE_URL = {SQS URL}
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

The following code gets the image from the S3 bucket using the get_object method and converts it to base64 data:

image_data = s3.get_object(Bucket=bucket_name, Key=object_key)['Body'].read()
base64_image = base64.b64encode(image_data).decode('utf-8')

Prompt engineering is a critical factor in unlocking the full potential of generative AI applications like IDP. Crafting well-structured prompts makes sure that the AI system’s outputs are accurate, relevant, and aligned with your objectives, while mitigating potential risks.

With the Anthropic Claude 3 model integrated into the Amazon Bedrock IDP solution, you can use the model’s impressive visual understanding capabilities to effortlessly extract data from documents. Simply provide the image or document as input, and Anthropic Claude 3 will comprehend its contents, seamlessly extracting the desired information and presenting it in a human-readable format. All Anthropic Claude 3 models are capable of understanding non-English languages such as Spanish, Japanese, and French. In this particular use case, we demonstrate how to translate Spanish application forms into English by providing the appropriate prompt instructions.

However, LLMs like Anthropic Claude 3 can exhibit variability in their response formats. To achieve consistent and structured output, you can tailor your prompts to instruct the model to return the extracted data in a specific format, such as JSON with predefined keys. This approach enhances the interoperability of the model’s output with downstream applications and streamlines data processing workflows.
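Even with a strict prompt, it is prudent to parse the model's reply defensively, because the JSON can occasionally arrive wrapped in extra prose. The following is a minimal sketch of such a guard (the function name is ours, not part of the solution code):

```python
import json

def parse_model_json(model_text: str) -> dict:
    """Extract the first {...} span from the model's reply and parse it.

    Returns an empty dict if no valid JSON object is found, matching the
    prompt's instruction to return an empty object for non-form images.
    """
    start = model_text.find("{")
    end = model_text.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        return json.loads(model_text[start:end + 1])
    except json.JSONDecodeError:
        return {}
```

A guard like this keeps a single malformed response from failing the whole queue consumer.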

The following is the prompt with the specific JSON output format:

prompt = """This image shows a birth certificate application form. Please precisely copy all the relevant information from the form.
Leave the field blank if there is no information in corresponding field.
If the image is not a birth certificate application form, simply return an empty JSON object.
If the application form is not filled, leave the fees attributes blank.
Translate any non-English text to English.
Organize and return the extracted data in a JSON format with the following keys:
{
    "applicantDetails":{
        "applicantName": "",
        "dayPhoneNumber": "",
        "address": "",
        "city": "",
        "state": "",
        "zipCode": "",
        "email":""
    },
    "mailingAddress":{
        "mailingAddressApplicantName": "",
        "mailingAddress": "",
        "mailingAddressCity": "",
        "mailingAddressState": "",
        "mailingAddressZipCode": ""
    },
    "relationToApplicant":[""],
    "purposeOfRequest": "",
    "BirthCertificateDetails":{
        "nameOnBirthCertificate": "",
        "dateOfBirth": "",
        "sex": "",
        "cityOfBirth": "",
        "countyOfBirth": "",
        "mothersMaidenName": "",
        "fathersName": "",
        "mothersPlaceOfBirth": "",
        "fathersPlaceOfBirth": "",
        "parentsMarriedAtBirth": "",
        "numberOfChildrenBornInSCToMother": "",
        "diffNameAtBirth":""
    },
    "fees":{
        "searchFee": "",
        "eachAdditionalCopy": "",
        "expediteFee": "",
        "totalFees": ""
    }
}"""

Invoke the Anthropic Claude 3 Sonnet model using the Amazon Bedrock API. Pass the prompt and the base64 image data as parameters:

def invoke_claude_3_multimodal(prompt, base64_image_data):
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": prompt,
                    },
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            # Set media_type to match the uploaded image type
                            # (for example, image/jpeg for the .jpeg files used in this post).
                            "media_type": "image/png",
                            "data": base64_image_data,
                        },
                    },
                ],
            }
        ],
    }
    try:
        response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(request_body))
        return json.loads(response['body'].read())
    except bedrock.exceptions.ClientError as err:
        print(f"Couldn't invoke Claude 3 Sonnet. Here's why: {err.response['Error']['Code']}: {err.response['Error']['Message']}")
        raise

Send the Amazon Bedrock API response to the SQS queue using the send_message method:

def send_message_to_sqs(message_body):
    try:
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message_body))
    except sqs.exceptions.ClientError as e:
        print(f"Error sending message to SQS: {e.response['Error']['Code']}: {e.response['Error']['Message']}")

Next, modify the IAM role of the Lambda function to grant the required permissions:

    1. On the Lambda console, navigate to the function.
    2. On the Configuration tab, choose Permissions in the left pane.
    3. Choose the IAM role (for example, invoke_bedrock_claude3-role-{random chars}).

This will open the role on a new tab.

    1. In the Permissions policies section, choose Add permissions and Create inline policy.
    2. On the Create policy page, switch to the JSON tab in the policy editor.
    3. Enter the policy from the following code block, replacing {AWS Account ID} with your AWS account ID and {S3 Bucket Name} with your S3 bucket name.
    4. Choose Next.
    5. Enter a name for the policy (for example, invoke_bedrock_claude3-role-policy), and choose Create policy.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::{S3 Bucket Name}/*"
        },
        {
            "Effect": "Allow",
            "Action": "sqs:SendMessage",
            "Resource": "arn:aws:sqs:us-east-1:{AWS Account ID}:bedrock-idp-extracted-data"
        }
    ]
}

The policy will grant the following permissions:

- bedrock:InvokeModel on the Amazon Bedrock foundation models
- s3:GetObject on the objects in your S3 bucket
- sqs:SendMessage on the bedrock-idp-extracted-data queue

Additionally, modify the Lambda function’s timeout to 2 minutes. By default, it’s set to 3 seconds.
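If you script this change instead of using the console, a sketch like the following would work. The validation helper is ours; Lambda itself accepts timeouts from 1 second up to 15 minutes (900 seconds):

```python
def valid_timeout(seconds: int) -> bool:
    # Lambda permits timeouts between 1 second and 15 minutes (900 seconds).
    return 1 <= seconds <= 900

def set_timeout(function_name: str = "invoke_bedrock_claude3", seconds: int = 120) -> None:
    import boto3  # imported here so valid_timeout() works without boto3 installed
    if not valid_timeout(seconds):
        raise ValueError(f"Lambda timeout must be 1-900 seconds, got {seconds}")
    boto3.client("lambda", region_name="us-east-1").update_function_configuration(
        FunctionName=function_name, Timeout=seconds
    )
```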

Create an S3 Event Notification

To create an S3 Event Notification, complete the following steps:

    1. On the Amazon S3 console, open the bedrock-claude3-idp... S3 bucket.
    2. Navigate to Properties, and in the Event notifications section, create an event notification.
    3. Enter a name for Event name (for example, bedrock-claude3-idp-event-notification).
    4. Enter images/birth_certificates/ for the prefix.
    5. For Event Type, select Put in the Object creation section.
    6. For Destination, select Lambda function and choose invoke_bedrock_claude3.
    7. Choose Save changes.
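The same configuration can be applied with boto3's put_bucket_notification_configuration. The builder function below is our own sketch mirroring the console settings. One caveat: unlike the console, the API call does not automatically add the resource-based permission that lets S3 invoke the Lambda function, so you would also need a lambda add-permission call:

```python
def build_notification_config(lambda_arn: str,
                              prefix: str = "images/birth_certificates/") -> dict:
    # Mirrors the console settings: Put events under the given prefix invoke the function.
    return {
        "LambdaFunctionConfigurations": [{
            "Id": "bedrock-claude3-idp-event-notification",
            "LambdaFunctionArn": lambda_arn,
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}},
        }]
    }

def attach_notification(bucket_name: str, lambda_arn: str) -> None:
    import boto3  # imported here so the builder above works without boto3 installed
    boto3.client("s3", region_name="us-east-1").put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=build_notification_config(lambda_arn),
    )
```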

Create a DynamoDB table

To store the extracted data in DynamoDB, you need to create a table. On the DynamoDB console, create a table called birth_certificates with Id as the partition key, and keep the remaining settings as default.
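Scripted, the table creation looks like the sketch below. The spec mirrors the console step above (Id as a string partition key); the on-demand billing mode is our assumption to avoid capacity planning, not something the post prescribes:

```python
TABLE_SPEC = {
    "TableName": "birth_certificates",
    # Partition key as described above; extracted values are stored as strings.
    "AttributeDefinitions": [{"AttributeName": "Id", "AttributeType": "S"}],
    "KeySchema": [{"AttributeName": "Id", "KeyType": "HASH"}],
    "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
}

def create_birth_certificates_table() -> None:
    import boto3  # imported here so TABLE_SPEC is inspectable without boto3 installed
    boto3.client("dynamodb", region_name="us-east-1").create_table(**TABLE_SPEC)
```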

Create a Lambda function to insert records into the DynamoDB table

On the Lambda console, create a Lambda function (for example, insert_into_dynamodb), choose Python 3.12 for the runtime, and leave the remaining settings as default. You can download the entire Lambda function code from insert_into_dynamodb.py. Replace the contents of the lambda_function.py file with the code from the downloaded file and choose Deploy.

The Lambda function should perform the following actions:

Get the message from the SQS queue that contains the response from the Anthropic Claude 3 Sonnet model:

data = json.loads(event['Records'][0]['body'])['content'][0]['text']
event_id = event['Records'][0]['messageId']
data = json.loads(data)

Create objects representing DynamoDB and its table:

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('birth_certificates')

Get the key objects from the JSON data:

applicant_details = data.get('applicantDetails', {})
mailing_address = data.get('mailingAddress', {})
relation_to_applicant = data.get('relationToApplicant', [])
birth_certificate_details = data.get('BirthCertificateDetails', {})
fees = data.get('fees', {})

Insert the extracted data into the DynamoDB table using the put_item() method:

table.put_item(Item={
    'Id': event_id,
    'applicantName': applicant_details.get('applicantName', ''),
    'dayPhoneNumber': applicant_details.get('dayPhoneNumber', ''),
    'address': applicant_details.get('address', ''),
    'city': applicant_details.get('city', ''),
    'state': applicant_details.get('state', ''),
    'zipCode': applicant_details.get('zipCode', ''),
    'email': applicant_details.get('email', ''),
    'mailingAddressApplicantName': mailing_address.get('mailingAddressApplicantName', ''),
    'mailingAddress': mailing_address.get('mailingAddress', ''),
    'mailingAddressCity': mailing_address.get('mailingAddressCity', ''),
    'mailingAddressState': mailing_address.get('mailingAddressState', ''),
    'mailingAddressZipCode': mailing_address.get('mailingAddressZipCode', ''),
    'relationToApplicant': ', '.join(relation_to_applicant),
    'purposeOfRequest': data.get('purposeOfRequest', ''),
    'nameOnBirthCertificate': birth_certificate_details.get('nameOnBirthCertificate', ''),
    'dateOfBirth': birth_certificate_details.get('dateOfBirth', ''),
    'sex': birth_certificate_details.get('sex', ''),
    'cityOfBirth': birth_certificate_details.get('cityOfBirth', ''),
    'countyOfBirth': birth_certificate_details.get('countyOfBirth', ''),
    'mothersMaidenName': birth_certificate_details.get('mothersMaidenName', ''),
    'fathersName': birth_certificate_details.get('fathersName', ''),
    'mothersPlaceOfBirth': birth_certificate_details.get('mothersPlaceOfBirth', ''),
    'fathersPlaceOfBirth': birth_certificate_details.get('fathersPlaceOfBirth', ''),
    'parentsMarriedAtBirth': birth_certificate_details.get('parentsMarriedAtBirth', ''),
    'numberOfChildrenBornInSCToMother': birth_certificate_details.get('numberOfChildrenBornInSCToMother', ''),
    'diffNameAtBirth': birth_certificate_details.get('diffNameAtBirth', ''),
    'searchFee': fees.get('searchFee', ''),
    'eachAdditionalCopy': fees.get('eachAdditionalCopy', ''),
    'expediteFee': fees.get('expediteFee', ''),
    'totalFees': fees.get('totalFees', '')
})

Next, modify the IAM role of the Lambda function to grant the required permissions. Follow the same steps you used to modify the permissions for the invoke_bedrock_claude3 Lambda function, but enter the following JSON as the inline policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "dynamodb:PutItem",
            "Resource": "arn:aws:dynamodb:us-east-1:{AWS Account ID}:table/birth_certificates"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteMessage",
                "sqs:ReceiveMessage",
                "sqs:GetQueueAttributes"
            ],
            "Resource": "arn:aws:sqs:us-east-1:{AWS Account ID}:bedrock-idp-extracted-data"
        }
    ]
}

Enter a policy name (for example, insert_into_dynamodb-role-policy) and choose Create policy.

The policy will grant the following permissions:

- dynamodb:PutItem on the birth_certificates table
- sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:GetQueueAttributes on the bedrock-idp-extracted-data queue

Configure the Lambda function trigger for SQS

Complete the following steps to create a trigger for the Lambda function:

    1. On the Amazon SQS console, open the bedrock-idp-extracted-data queue.
    2. On the Lambda triggers tab, choose Configure Lambda function trigger.
    3. Select the insert_into_dynamodb Lambda function and choose Save.
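The equivalent API call is create_event_source_mapping, sketched below. The validation helper is ours. We set BatchSize to 1 as an assumption, because the insert_into_dynamodb code shown later reads only event['Records'][0]; larger batches would silently drop messages with that parser:

```python
def looks_like_sqs_arn(arn: str) -> bool:
    # A well-formed SQS ARN has six colon-separated parts starting arn:aws:sqs.
    parts = arn.split(":")
    return len(parts) == 6 and parts[:3] == ["arn", "aws", "sqs"]

def attach_sqs_trigger(queue_arn: str,
                       function_name: str = "insert_into_dynamodb") -> None:
    import boto3  # imported here so looks_like_sqs_arn() works without boto3 installed
    if not looks_like_sqs_arn(queue_arn):
        raise ValueError(f"Not an SQS ARN: {queue_arn}")
    boto3.client("lambda", region_name="us-east-1").create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=1,  # assumption: one message per invocation, matching the parser
    )
```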

Test the solution

Now that you have created all the necessary resources, permissions, and code, it’s time to test the solution.

In the S3 folder birth_certificates, upload the two scanned images that you downloaded earlier. Then open the DynamoDB console and explore the items in the birth_certificates table.

If everything is configured properly, you should see two items in DynamoDB in just a few seconds, as shown in the following screenshots. For the Spanish form, Anthropic Claude 3 automatically translated the keys and labels from Spanish to English based on the prompt.

Troubleshooting

If you don’t see the extracted data in the DynamoDB table, you can investigate the issue:

Clean up

Clean up the resources created as part of this post to avoid incurring ongoing charges:

    1. Delete all the objects from the bedrock-claude3-idp... S3 bucket, then delete the bucket.
    2. Delete the two Lambda functions named invoke_bedrock_claude3 and insert_into_dynamodb.
    3. Delete the SQS queue named bedrock-idp-extracted-data.
    4. Delete the DynamoDB table named birth_certificates.
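The cleanup steps above can be scripted with boto3 as well. This is a sketch under the resource names used in this post; the function name is ours, and the bucket must be emptied before it can be deleted:

```python
def cleanup(bucket_name: str) -> None:
    import boto3
    region = "us-east-1"
    # S3 refuses to delete a non-empty bucket, so empty it first.
    bucket = boto3.resource("s3", region_name=region).Bucket(bucket_name)
    bucket.objects.all().delete()
    bucket.delete()
    lam = boto3.client("lambda", region_name=region)
    for fn in ("invoke_bedrock_claude3", "insert_into_dynamodb"):
        lam.delete_function(FunctionName=fn)
    sqs = boto3.client("sqs", region_name=region)
    url = sqs.get_queue_url(QueueName="bedrock-idp-extracted-data")["QueueUrl"]
    sqs.delete_queue(QueueUrl=url)
    boto3.client("dynamodb", region_name=region).delete_table(TableName="birth_certificates")
```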

Example use cases and business value

The generative AI-powered IDP solution demonstrated in this post can benefit organizations across various industries, such as:

By using the power of generative AI and Amazon Bedrock, organizations can unlock the true potential of their data, driving operational excellence, enhancing customer experiences, and fostering continuous innovation.

Conclusion

In this post, we demonstrated how to use Amazon Bedrock and the powerful Anthropic Claude 3 Sonnet model to develop an IDP solution. By harnessing the advanced multimodal capabilities of Anthropic Claude 3, we were able to accurately extract data from scanned documents and store it in a structured format in a DynamoDB table.

Although this solution showcases the potential of generative AI in IDP, it may not be suitable for all IDP use cases. The effectiveness of the solution may vary depending on the complexity and quality of the documents, the amount of training data available, and the specific requirements of the organization.

To further enhance the solution, consider implementing a human-in-the-loop workflow to review and validate the extracted data, especially for mission-critical or sensitive applications. This helps ensure data accuracy and compliance with regulatory requirements. You can also explore the model evaluation feature in Amazon Bedrock to compare model outputs, and then choose the model best suited for your downstream generative AI applications.

For further exploration and learning, we recommend checking out the following resources:


About the Authors

Govind Palanisamy is a Solutions Architect at AWS, where he helps government agencies migrate and modernize their workloads to increase citizen experience. He is passionate about technology and transformation, and he helps customers transform their businesses using AI/ML and generative AI-based solutions.

Bharath Gunapati is a Sr. Solutions architect at AWS, where he helps clinicians, researchers, and staff at academic medical centers to adopt and use cloud technologies. He is passionate about technology and the impact it can make on healthcare and research.
