AWS Machine Learning Blog, August 22, 2024
Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

Amazon Bedrock now offers batch inference, which lets users process large volumes of data and is particularly well suited to use cases such as call center transcript summarization. The feature provides a scalable solution: batch jobs can be submitted through the Amazon Bedrock console or API, simplifying large-scale data processing tasks. The article walks through data preparation, job submission, and output analysis, and shares best practices to help you optimize your batch inference workflow and get the most value from your data.

**Data preparation**: For batch inference, the data must be prepared in JSONL format, with each line representing one transcript to summarize. Each line in the JSONL file follows a specific structure that includes `recordId` and `modelInput`. `recordId` is an 11-character alphanumeric string that serves as a unique identifier for each entry. `modelInput` is a JSON object whose format should match the body field of the model you use in the InvokeModel request. For example, if you're using Anthropic Claude 3 on Amazon Bedrock, you should use the Messages API, and your model input might look like this: {"recordId": "CALL0000001", "modelInput": {"anthropic_version": "bedrock-2023-05-31", "max_tokens": 1024, "messages": [{"role": "user", "content": [{"type": "text", "text": "Summarize the following call transcript: ...."}]}]}} Also keep the batch inference quotas in mind, such as a maximum of 50,000 records per file, 50,000 records per job, 200 MB per file, and 1 GB total across all files.

**Batch job submission**: After you prepare the batch inference data and store it in Amazon S3, you can start a batch inference job in one of two ways: through the Amazon Bedrock console or through the API. On the Amazon Bedrock console, choose Inference, then Batch inference, and then Create job. Enter a job name, choose an FM, specify the S3 locations for the input and output data, and configure access permissions. You can also start batch inference jobs programmatically with the AWS SDK.

**Output collection and analysis**: When the batch inference job is complete, Amazon Bedrock creates a dedicated folder in the specified S3 bucket, using the job ID as the folder name. The folder contains a summary of the batch inference job along with the processed inference data in JSONL format. You can access the processed output through the Amazon S3 console or programmatically with the AWS SDK.

**Best practices**: To optimize your batch inference workflow, consider the following best practices: * Use appropriately sized datasets: split large datasets into multiple smaller batches for efficient processing. * Use a suitable model: choose a model that matches your data and use case. * Monitor job status: check job status regularly to confirm jobs are running as expected. * Analyze the output: examine the output data carefully to understand the model's performance and results.

Today, we are excited to announce general availability of batch inference for Amazon Bedrock. This new feature enables organizations to process large volumes of data when interacting with foundation models (FMs), addressing a critical need in various industries, including call center operations.

Call center transcript summarization has become an essential task for businesses seeking to extract valuable insights from customer interactions. As the volume of call data grows, traditional analysis methods struggle to keep pace, creating a demand for a scalable solution.

Batch inference offers a compelling way to tackle this challenge. By processing substantial volumes of text transcripts in batches, frequently using parallel processing techniques, this approach offers advantages over real-time or on-demand processing. It is particularly well suited for large-scale call center operations where instantaneous results are not always a requirement.

In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. We also explore best practices for optimizing your batch inference workflows on Amazon Bedrock, helping you maximize the value of your data across different use cases and industries.

Solution overview

The batch inference feature in Amazon Bedrock provides a scalable solution for processing large volumes of data across various domains. This fully managed feature allows organizations to submit batch jobs through the CreateModelInvocationJob API or on the Amazon Bedrock console, simplifying large-scale data processing tasks.

In this post, we demonstrate the capabilities of batch inference using call center transcript summarization as an example. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks. The general workflow for batch inference consists of three main phases: preparing the input data, starting the batch inference job, and collecting and analyzing the output.

By walking through this specific implementation, we aim to showcase how you can adapt batch inference to suit various data processing needs, regardless of the data source or nature.

Prerequisites

To use the batch inference feature, make sure you have satisfied the following requirements:

    An active AWS account with access to Amazon Bedrock.
    Access granted to the FM you plan to use (for example, Anthropic Claude 3 Haiku) through model access in Amazon Bedrock.
    An S3 bucket to store the batch inference input data and outputs.
    An IAM service role that allows Amazon Bedrock to read the input data from, and write the output data to, your S3 locations.

Prepare the data

Before you initiate a batch inference job for call center transcript summarization, it’s crucial to properly format and upload your data. The input data should be in JSONL format, with each line representing a single transcript for summarization.

Each line in your JSONL file should follow this structure:

{"recordId": "11 character alphanumeric string", "modelInput": {JSON body}}

Here, recordId is an 11-character alphanumeric string, serving as a unique identifier for each entry. If you omit this field, the batch inference job automatically adds it to the output.

The format of the modelInput JSON object should match the body field for the model that you use in the InvokeModel request. For example, if you’re using Anthropic Claude 3 on Amazon Bedrock, you should use the Messages API, and your model input might look like the following code:

{"recordId": "CALL0000001",  "modelInput": {     "anthropic_version": "bedrock-2023-05-31",      "max_tokens": 1024,     "messages": [ {            "role": "user",            "content": [{"type":"text", "text":"Summarize the following call transcript: ...." ]} ],      }}

When preparing your data, keep in mind the quotas for batch inference listed in the following table.

Limit name | Value | Adjustable through Service Quotas?
Maximum number of batch jobs per account per model ID using a foundation model | 3 | Yes
Maximum number of batch jobs per account per model ID using a custom model | 3 | Yes
Maximum number of records per file | 50,000 | Yes
Maximum number of records per job | 50,000 | Yes
Minimum number of records per job | 1,000 | No
Maximum size per file | 200 MB | Yes
Maximum size for all files across job | 1 GB | Yes

Make sure your input data adheres to these size limits and format requirements for optimal processing. If your dataset exceeds these limits, consider splitting it into multiple batch jobs, as shown in the sketch that follows.
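As a rough illustration of that splitting step, the following sketch divides one large JSONL file into chunks that stay under the 50,000-record quota. The file names and the chunking constant are assumptions for this example, and you would still want to confirm that each resulting file stays under the 200 MB limit.

# Hypothetical file names; adjust the chunk size so each file also stays under the 200 MB limit
SOURCE_FILE = "batch_input.jsonl"
MAX_RECORDS_PER_FILE = 50_000

with open(SOURCE_FILE) as f:
    lines = f.readlines()

# Write consecutive chunks of records to separate JSONL files, each of which
# can be submitted as its own batch inference job
for i in range(0, len(lines), MAX_RECORDS_PER_FILE):
    chunk = lines[i:i + MAX_RECORDS_PER_FILE]
    part_name = f"batch_input_part_{i // MAX_RECORDS_PER_FILE + 1}.jsonl"
    with open(part_name, "w") as out:
        out.writelines(chunk)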

Start the batch inference job

After you have prepared your batch inference data and stored it in Amazon S3, there are two primary methods to initiate a batch inference job: using the Amazon Bedrock console or API.

Run the batch inference job on the Amazon Bedrock console

Let’s first explore the step-by-step process of starting a batch inference job through the Amazon Bedrock console.

    On the Amazon Bedrock console, choose Inference in the navigation pane.
    Choose Batch inference and choose Create job.
    For Job name, enter a name for the batch inference job, then choose an FM from the list. In this example, we choose Anthropic Claude 3 Haiku as the FM for our call center transcript summarization job.
    Under Input data, specify the S3 location for your prepared batch inference data.
    Under Output data, enter the S3 path for the bucket storing batch inference outputs. Your data is encrypted by default with an AWS managed key. If you want to use a different key, select Customize encryption settings.
    Under Service access, select a method to authorize Amazon Bedrock. You can select Use an existing service role if you have an access role with fine-grained IAM policies, or select Create and use a new service role.
    Optionally, expand the Tags section to add tags for tracking.
    After you have added all the required configurations for your batch inference job, choose Create batch inference job.

You can check the status of your batch inference job by choosing the corresponding job name on the Amazon Bedrock console. When the job is complete, you can see more job information, including model name, job duration, status, and locations of input and output data.

Run the batch inference job using the API

Alternatively, you can initiate a batch inference job programmatically using the AWS SDK. Follow these steps:

    Create an Amazon Bedrock client:

    import boto3

    bedrock = boto3.client(service_name="bedrock")

    Configure the input and output data:

    input_data_config = {
        "s3InputDataConfig": {
            "s3Uri": "s3://{bucket_name}/{input_prefix}/your_input_data.jsonl"
        }
    }

    output_data_config = {
        "s3OutputDataConfig": {
            "s3Uri": "s3://{bucket_name}/{output_prefix}/"
        }
    }

    Start the batch inference job:

    response = bedrock.create_model_invocation_job(
        roleArn="arn:aws:iam::{account_id}:role/{role_name}",
        modelId="model-of-your-choice",
        jobName="your-job-name",
        inputDataConfig=input_data_config,
        outputDataConfig=output_data_config
    )

    Retrieve and monitor the job status:

    job_arn = response.get('jobArn')
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)['status']
    print(f"Job status: {status}")

Replace the placeholders {bucket_name}, {input_prefix}, {output_prefix}, {account_id}, {role_name}, your-job-name, and model-of-your-choice with your actual values.
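If you want to wait for the job to finish rather than checking its status once, the following is a minimal polling sketch under a few assumptions: the job ARN placeholder must be replaced with the jobArn returned by create_model_invocation_job, the 60-second interval is arbitrary, and the terminal status strings should be verified against the Amazon Bedrock documentation.

import time
import boto3

bedrock = boto3.client(service_name="bedrock")

# Placeholder ARN; use the jobArn returned by create_model_invocation_job
job_arn = "arn:aws:bedrock:{region}:{account_id}:model-invocation-job/{job_id}"

while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)['status']
    print(f"Job status: {status}")
    # Stop polling once the job reaches a terminal state (verify the exact
    # status strings against the Amazon Bedrock documentation)
    if status in ('Completed', 'Failed', 'Stopped', 'Expired'):
        break
    time.sleep(60)  # wait 60 seconds between status checks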

By using the AWS SDK, you can programmatically initiate and manage batch inference jobs, enabling seamless integration with your existing workflows and automation pipelines.

Collect and analyze the output

When your batch inference job is complete, Amazon Bedrock creates a dedicated folder in the specified S3 bucket, using the job ID as the folder name. This folder contains a summary of the batch inference job, along with the processed inference data in JSONL format.

You can access the processed output through two convenient methods: on the Amazon S3 console or programmatically using the AWS SDK.

Access the output on the Amazon S3 console

To use the Amazon S3 console, complete the following steps:

    On the Amazon S3 console, choose Buckets in the navigation pane.
    Navigate to the bucket you specified as the output destination for your batch inference job.
    Within the bucket, locate the folder with the batch inference job ID.

Inside this folder, you’ll find the processed data files, which you can browse or download as needed.
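If you'd rather locate these files programmatically than browse the console, the following is a minimal sketch that lists the objects under the job-ID folder with the AWS SDK for Python; the bucket name, output prefix, and job ID are placeholders.

import boto3

s3 = boto3.client('s3')

# Hypothetical bucket, output prefix, and job ID; replace with your own values
bucket_name = 'your-bucket-name'
output_prefix = 'your-output-prefix/'
job_id = 'your-job-id'

# List the objects the batch inference job wrote under its job-ID folder
# (for very large listings, use the list_objects_v2 paginator)
response = s3.list_objects_v2(Bucket=bucket_name, Prefix=f"{output_prefix}{job_id}/")
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])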

Access the output data using the AWS SDK

Alternatively, you can access the processed data programmatically using the AWS SDK. In the following code example, we show the output for the Anthropic Claude 3 model. If you used a different model, update the parameter values according to the model you used.

The output files contain not only the processed text, but also observability data and the parameters used for inference. The following is an example in Python:

import boto3
import json

# Create an S3 client
s3 = boto3.client('s3')

# Set the S3 bucket name and prefix for the output files
bucket_name = 'your-bucket-name'
prefix = 'your-output-prefix'
filename = 'your-output-file.jsonl.out'

# Read the JSONL output file from S3
object_key = f"{prefix}{filename}"
response = s3.get_object(Bucket=bucket_name, Key=object_key)
json_data = response['Body'].read().decode('utf-8')

# Initialize a list
output_data = []

# Process the JSON data. Example shown for the Anthropic Claude 3 model
# (update the JSON keys as necessary for a different model)
for line in json_data.splitlines():
    data = json.loads(line)
    request_id = data['recordId']

    # Access the processed text
    output_text = data['modelOutput']['content'][0]['text']

    # Access observability data
    input_tokens = data['modelOutput']['usage']['input_tokens']
    output_tokens = data['modelOutput']['usage']['output_tokens']
    model = data['modelOutput']['model']
    stop_reason = data['modelOutput']['stop_reason']

    # Access inference parameters
    max_tokens = data['modelInput']['max_tokens']
    temperature = data['modelInput']['temperature']
    top_p = data['modelInput']['top_p']
    top_k = data['modelInput']['top_k']

    # Create a dictionary for the current record
    output_entry = {
        request_id: {
            'output_text': output_text,
            'observability': {
                'input_tokens': input_tokens,
                'output_tokens': output_tokens,
                'model': model,
                'stop_reason': stop_reason
            },
            'inference_params': {
                'max_tokens': max_tokens,
                'temperature': temperature,
                'top_p': top_p,
                'top_k': top_k
            }
        }
    }

    # Append the dictionary to the list
    output_data.append(output_entry)

In this example using the Anthropic Claude 3 model, after we read the output file from Amazon S3, we process each line of the JSON data. We can access the processed text using data['modelOutput']['content'][0]['text'], the observability data such as input/output tokens, model, and stop reason, and the inference parameters like max tokens, temperature, top-p, and top-k.

In the output location specified for your batch inference job, you’ll find a manifest.json.out file that provides a summary of the processed records. This file includes information such as the total number of records processed, the number of successfully processed records, the number of records with errors, and the total input and output token counts.
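The following is a minimal sketch that reads this manifest with the AWS SDK and prints its contents. The bucket name and object key are placeholders, and the example deliberately prints the whole document rather than assuming specific field names.

import json
import boto3

s3 = boto3.client('s3')

# Hypothetical locations; the manifest sits in the job-ID folder of your output bucket
bucket_name = 'your-bucket-name'
manifest_key = 'your-output-prefix/your-job-id/manifest.json.out'

# Read and print the job-level summary (record counts, token totals)
response = s3.get_object(Bucket=bucket_name, Key=manifest_key)
manifest = json.loads(response['Body'].read().decode('utf-8'))
print(json.dumps(manifest, indent=2))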

You can then process this data as needed, such as integrating it into your existing workflows, or performing further analysis.
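As one hypothetical example of such further processing, the following sketch continues from the output_data list built above, writes each summary to a CSV file, and totals the token usage; the call_summaries.csv file name is a placeholder.

import csv

# Continues from the output_data list built in the previous example;
# the call_summaries.csv file name is a placeholder
total_input_tokens = 0
total_output_tokens = 0

with open('call_summaries.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['record_id', 'summary', 'input_tokens', 'output_tokens'])
    for entry in output_data:
        for record_id, details in entry.items():
            writer.writerow([
                record_id,
                details['output_text'],
                details['observability']['input_tokens'],
                details['observability']['output_tokens'],
            ])
            total_input_tokens += details['observability']['input_tokens']
            total_output_tokens += details['observability']['output_tokens']

print(f"Total input tokens: {total_input_tokens}, total output tokens: {total_output_tokens}")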

Remember to replace your-bucket-name, your-output-prefix, and your-output-file.jsonl.out with your actual values.

By using the AWS SDK, you can programmatically access and work with the processed data, observability information, inference parameters, and the summary information from your batch inference jobs, enabling seamless integration with your existing workflows and data pipelines.

Conclusion

Batch inference for Amazon Bedrock provides a solution for processing multiple data inputs in a single API call, as illustrated through our call center transcript summarization example. This fully managed service is designed to handle datasets of varying sizes, offering benefits for various industries and use cases.

We encourage you to implement batch inference in your projects and experience how it can optimize your interactions with FMs at scale.


About the Authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Rahul Virbhadra Mishra is a Senior Software Engineer at Amazon Bedrock. He is passionate about delighting customers through building practical solutions for AWS and Amazon. Outside of work, he enjoys sports and values quality time with his family.

Mohd Altaf is an SDE at AWS AI Services based out of Seattle, United States. He works in the AWS AI/ML space and has helped build various solutions across different teams at Amazon. In his spare time, he likes playing chess, snooker, and indoor games.
