AWS Machine Learning Blog, August 22, 2024
Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock

Amazon Bedrock now offers batch inference, which lets users process large volumes of data and is particularly well suited to use cases such as call center transcript summarization. The feature provides a scalable solution: batch jobs can be submitted through the Amazon Bedrock console or API, simplifying large-scale data processing tasks. The article walks through data preparation, job submission, and output analysis, and shares best practices to help you optimize your batch inference workflow and get the most value from your data.

**Data preparation**: For batch inference, the data must be prepared in JSONL format, with each line representing one transcript to summarize. Each line in the JSONL file follows a specific structure that includes `recordId` and `modelInput`. `recordId` is an 11-character alphanumeric string that serves as a unique identifier for each entry. `modelInput` is a JSON object whose format should match the body field of the model you use in the InvokeModel request. For example, if you're using Anthropic Claude 3 on Amazon Bedrock, you should use the Messages API, and your model input might look like this: {"recordId": "CALL0000001", "modelInput": {"anthropic_version": "bedrock-2023-05-31", "max_tokens": 1024, "messages": [{"role": "user", "content": [{"type": "text", "text": "Summarize the following call transcript: ...."}]}]}} Also keep the batch inference quotas in mind, such as a maximum of 50,000 records per file, 50,000 records per job, 200 MB per file, and 1 GB total across all files.

**Batch job submission**: After you prepare the batch inference data and store it in Amazon S3, you can start a batch inference job in one of two ways: through the Amazon Bedrock console or through the API. On the Amazon Bedrock console, choose Inference, then Batch inference, and then Create job. Enter a job name, choose an FM, specify the S3 locations for the input and output data, and configure access permissions. You can also start batch inference jobs programmatically with the AWS SDK.

**Output collection and analysis**: When the batch inference job is complete, Amazon Bedrock creates a dedicated folder in the specified S3 bucket, using the job ID as the folder name. The folder contains a summary of the batch inference job along with the processed inference data in JSONL format. You can access the processed output through the Amazon S3 console or programmatically with the AWS SDK.

**Best practices**: To optimize your batch inference workflow, consider the following best practices: * Use appropriately sized datasets: split large datasets into multiple smaller batches for efficient processing. * Use a suitable model: choose a model that matches your data and use case. * Monitor job status: check job status regularly to confirm jobs are running as expected. * Analyze the output: examine the output data carefully to understand the model's performance and results.

Today, we are excited to announce general availability of batch inference for Amazon Bedrock. This new feature enables organizations to process large volumes of data when interacting with foundation models (FMs), addressing a critical need in various industries, including call center operations.

Call center transcript summarization has become an essential task for businesses seeking to extract valuable insights from customer interactions. As the volume of call data grows, traditional analysis methods struggle to keep pace, creating a demand for a scalable solution.

Batch inference offers a compelling way to tackle this challenge. By processing substantial volumes of text transcripts in batches, frequently using parallel processing techniques, this approach offers advantages over real-time or on-demand processing. It is particularly well suited for large-scale call center operations where instantaneous results are not always a requirement.

In the following sections, we provide a detailed, step-by-step guide on implementing these new capabilities, covering everything from data preparation to job submission and output analysis. We also explore best practices for optimizing your batch inference workflows on Amazon Bedrock, helping you maximize the value of your data across different use cases and industries.

Solution overview

The batch inference feature in Amazon Bedrock provides a scalable solution for processing large volumes of data across various domains. This fully managed feature allows organizations to submit batch jobs through the CreateModelInvocationJob API or on the Amazon Bedrock console, simplifying large-scale data processing tasks.

In this post, we demonstrate the capabilities of batch inference using call center transcript summarization as an example. This use case serves to illustrate the broader potential of the feature for handling diverse data processing tasks. The general workflow for batch inference consists of three main phases: preparing the input data, starting the batch inference job, and collecting and analyzing the output.

By walking through this specific implementation, we aim to showcase how you can adapt batch inference to suit various data processing needs, regardless of the data source or nature.

Prerequisites

To use the batch inference feature, make sure you have satisfied the following requirements:

    An active AWS account with access to Amazon Bedrock.
    Access granted to the FM you plan to use (for example, Anthropic Claude 3 Haiku) through model access in Amazon Bedrock.
    An S3 bucket to store the batch inference input data and outputs.
    An IAM service role that allows Amazon Bedrock to read the input data from, and write the output data to, your S3 locations.

Prepare the data

Before you initiate a batch inference job for call center transcript summarization, it’s crucial to properly format and upload your data. The input data should be in JSONL format, with each line representing a single transcript for summarization.

Each line in your JSONL file should follow this structure:

{"recordId": "11 character alphanumeric string", "modelInput": {JSON body}}

Here, recordId is an 11-character alphanumeric string, serving as a unique identifier for each entry. If you omit this field, the batch inference job automatically adds it to the output.

The format of the modelInput JSON object should match the body field for the model that you use in the InvokeModel request. For example, if you’re using Anthropic Claude 3 on Amazon Bedrock, you should use the Messages API, and your model input might look like the following code:

{"recordId": "CALL0000001",  "modelInput": {     "anthropic_version": "bedrock-2023-05-31",      "max_tokens": 1024,     "messages": [ {            "role": "user",            "content": [{"type":"text", "text":"Summarize the following call transcript: ...." ]} ],      }}

When preparing your data, keep in mind the quotas for batch inference listed in the following table.

Limit name | Value | Adjustable through Service Quotas?
Maximum number of batch jobs per account per model ID using a foundation model | 3 | Yes
Maximum number of batch jobs per account per model ID using a custom model | 3 | Yes
Maximum number of records per file | 50,000 | Yes
Maximum number of records per job | 50,000 | Yes
Minimum number of records per job | 1,000 | No
Maximum size per file | 200 MB | Yes
Maximum size for all files across job | 1 GB | Yes

Make sure your input data adheres to these size limits and format requirements for optimal processing. If your dataset exceeds these limits, consider splitting it into multiple batch jobs, as shown in the sketch that follows.
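As a rough illustration of that splitting step, the following sketch divides one large JSONL file into chunks that stay under the 50,000-record quota. The file names and the chunking constant are assumptions for this example, and you would still want to confirm that each resulting file stays under the 200 MB limit.

# Hypothetical file names; adjust the chunk size so each file also stays under the 200 MB limit
SOURCE_FILE = "batch_input.jsonl"
MAX_RECORDS_PER_FILE = 50_000

with open(SOURCE_FILE) as f:
    lines = f.readlines()

# Write consecutive chunks of records to separate JSONL files, each of which
# can be submitted as its own batch inference job
for i in range(0, len(lines), MAX_RECORDS_PER_FILE):
    chunk = lines[i:i + MAX_RECORDS_PER_FILE]
    part_name = f"batch_input_part_{i // MAX_RECORDS_PER_FILE + 1}.jsonl"
    with open(part_name, "w") as out:
        out.writelines(chunk)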

Start the batch inference job

After you have prepared your batch inference data and stored it in Amazon S3, there are two primary methods to initiate a batch inference job: using the Amazon Bedrock console or API.

Run the batch inference job on the Amazon Bedrock console

Let’s first explore the step-by-step process of starting a batch inference job through the Amazon Bedrock console.

    On the Amazon Bedrock console, choose Inference in the navigation pane.
    Choose Batch inference and choose Create job.
    For Job name, enter a name for the batch inference job, then choose an FM from the list. In this example, we choose Anthropic Claude 3 Haiku as the FM for our call center transcript summarization job.
    Under Input data, specify the S3 location for your prepared batch inference data.
    Under Output data, enter the S3 path for the bucket storing batch inference outputs. Your data is encrypted by default with an AWS managed key. If you want to use a different key, select Customize encryption settings.
    Under Service access, select a method to authorize Amazon Bedrock. You can select Use an existing service role if you have an access role with fine-grained IAM policies, or select Create and use a new service role.
    Optionally, expand the Tags section to add tags for tracking.
    After you have added all the required configurations for your batch inference job, choose Create batch inference job.

You can check the status of your batch inference job by choosing the corresponding job name on the Amazon Bedrock console. When the job is complete, you can see more job information, including model name, job duration, status, and locations of input and output data.

Run the batch inference job using the API

Alternatively, you can initiate a batch inference job programmatically using the AWS SDK. Follow these steps:

    Create an Amazon Bedrock client:

    import boto3

    bedrock = boto3.client(service_name="bedrock")

    Configure the input and output data:

    input_data_config = {
        "s3InputDataConfig": {
            "s3Uri": "s3://{bucket_name}/{input_prefix}/your_input_data.jsonl"
        }
    }

    output_data_config = {
        "s3OutputDataConfig": {
            "s3Uri": "s3://{bucket_name}/{output_prefix}/"
        }
    }

    Start the batch inference job:

    response = bedrock.create_model_invocation_job(
        roleArn="arn:aws:iam::{account_id}:role/{role_name}",
        modelId="model-of-your-choice",
        jobName="your-job-name",
        inputDataConfig=input_data_config,
        outputDataConfig=output_data_config
    )

    Retrieve and monitor the job status:

    job_arn = response.get('jobArn')
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)['status']
    print(f"Job status: {status}")

Replace the placeholders {bucket_name}, {input_prefix}, {output_prefix}, {account_id}, {role_name}, your-job-name, and model-of-your-choice with your actual values.
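If you want to wait for the job to finish rather than checking its status once, the following is a minimal polling sketch under a few assumptions: the job ARN placeholder must be replaced with the jobArn returned by create_model_invocation_job, the 60-second interval is arbitrary, and the terminal status strings should be verified against the Amazon Bedrock documentation.

import time
import boto3

bedrock = boto3.client(service_name="bedrock")

# Placeholder ARN; use the jobArn returned by create_model_invocation_job
job_arn = "arn:aws:bedrock:{region}:{account_id}:model-invocation-job/{job_id}"

while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)['status']
    print(f"Job status: {status}")
    # Stop polling once the job reaches a terminal state (verify the exact
    # status strings against the Amazon Bedrock documentation)
    if status in ('Completed', 'Failed', 'Stopped', 'Expired'):
        break
    time.sleep(60)  # wait 60 seconds between status checks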

By using the AWS SDK, you can programmatically initiate and manage batch inference jobs, enabling seamless integration with your existing workflows and automation pipelines.

Collect and analyze the output

When your batch inference job is complete, Amazon Bedrock creates a dedicated folder in the specified S3 bucket, using the job ID as the folder name. This folder contains a summary of the batch inference job, along with the processed inference data in JSONL format.

You can access the processed output through two convenient methods: on the Amazon S3 console or programmatically using the AWS SDK.

Access the output on the Amazon S3 console

To use the Amazon S3 console, complete the following steps:

    On the Amazon S3 console, choose Buckets in the navigation pane.
    Navigate to the bucket you specified as the output destination for your batch inference job.
    Within the bucket, locate the folder with the batch inference job ID.

Inside this folder, you’ll find the processed data files, which you can browse or download as needed.
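If you'd rather locate these files programmatically than browse the console, the following is a minimal sketch that lists the objects under the job-ID folder with the AWS SDK for Python; the bucket name, output prefix, and job ID are placeholders.

import boto3

s3 = boto3.client('s3')

# Hypothetical bucket, output prefix, and job ID; replace with your own values
bucket_name = 'your-bucket-name'
output_prefix = 'your-output-prefix/'
job_id = 'your-job-id'

# List the objects the batch inference job wrote under its job-ID folder
# (for very large listings, use the list_objects_v2 paginator)
response = s3.list_objects_v2(Bucket=bucket_name, Prefix=f"{output_prefix}{job_id}/")
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])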

Access the output data using the AWS SDK

Alternatively, you can access the processed data programmatically using the AWS SDK. In the following code example, we show the output for the Anthropic Claude 3 model. If you used a different model, update the parameter values according to the model you used.

The output files contain not only the processed text, but also observability data and the parameters used for inference. The following is an example in Python:

import boto3
import json

# Create an S3 client
s3 = boto3.client('s3')

# Set the S3 bucket name and prefix for the output files
bucket_name = 'your-bucket-name'
prefix = 'your-output-prefix'
filename = 'your-output-file.jsonl.out'

# Read the JSONL output file from S3
object_key = f"{prefix}{filename}"
response = s3.get_object(Bucket=bucket_name, Key=object_key)
json_data = response['Body'].read().decode('utf-8')

# Initialize a list
output_data = []

# Process the JSON data. Example shown for the Anthropic Claude 3 model
# (update the JSON keys as necessary for a different model)
for line in json_data.splitlines():
    data = json.loads(line)
    request_id = data['recordId']

    # Access the processed text
    output_text = data['modelOutput']['content'][0]['text']

    # Access observability data
    input_tokens = data['modelOutput']['usage']['input_tokens']
    output_tokens = data['modelOutput']['usage']['output_tokens']
    model = data['modelOutput']['model']
    stop_reason = data['modelOutput']['stop_reason']

    # Access inference parameters
    max_tokens = data['modelInput']['max_tokens']
    temperature = data['modelInput']['temperature']
    top_p = data['modelInput']['top_p']
    top_k = data['modelInput']['top_k']

    # Create a dictionary for the current record
    output_entry = {
        request_id: {
            'output_text': output_text,
            'observability': {
                'input_tokens': input_tokens,
                'output_tokens': output_tokens,
                'model': model,
                'stop_reason': stop_reason
            },
            'inference_params': {
                'max_tokens': max_tokens,
                'temperature': temperature,
                'top_p': top_p,
                'top_k': top_k
            }
        }
    }

    # Append the dictionary to the list
    output_data.append(output_entry)

In this example using the Anthropic Claude 3 model, after we read the output file from Amazon S3, we process each line of the JSON data. We can access the processed text using data['modelOutput']['content'][0]['text'], the observability data such as input/output tokens, model, and stop reason, and the inference parameters like max tokens, temperature, top-p, and top-k.

In the output location specified for your batch inference job, you’ll find a manifest.json.out file that provides a summary of the processed records. This file includes information such as the total number of records processed, the number of successfully processed records, the number of records with errors, and the total input and output token counts.
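The following is a minimal sketch that reads this manifest with the AWS SDK and prints its contents. The bucket name and object key are placeholders, and the example deliberately prints the whole document rather than assuming specific field names.

import json
import boto3

s3 = boto3.client('s3')

# Hypothetical locations; the manifest sits in the job-ID folder of your output bucket
bucket_name = 'your-bucket-name'
manifest_key = 'your-output-prefix/your-job-id/manifest.json.out'

# Read and print the job-level summary (record counts, token totals)
response = s3.get_object(Bucket=bucket_name, Key=manifest_key)
manifest = json.loads(response['Body'].read().decode('utf-8'))
print(json.dumps(manifest, indent=2))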

You can then process this data as needed, such as integrating it into your existing workflows, or performing further analysis.
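As one hypothetical example of such further processing, the following sketch continues from the output_data list built above, writes each summary to a CSV file, and totals the token usage; the call_summaries.csv file name is a placeholder.

import csv

# Continues from the output_data list built in the previous example;
# the call_summaries.csv file name is a placeholder
total_input_tokens = 0
total_output_tokens = 0

with open('call_summaries.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['record_id', 'summary', 'input_tokens', 'output_tokens'])
    for entry in output_data:
        for record_id, details in entry.items():
            writer.writerow([
                record_id,
                details['output_text'],
                details['observability']['input_tokens'],
                details['observability']['output_tokens'],
            ])
            total_input_tokens += details['observability']['input_tokens']
            total_output_tokens += details['observability']['output_tokens']

print(f"Total input tokens: {total_input_tokens}, total output tokens: {total_output_tokens}")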

Remember to replace your-bucket-name, your-output-prefix, and your-output-file.jsonl.out with your actual values.

By using the AWS SDK, you can programmatically access and work with the processed data, observability information, inference parameters, and the summary information from your batch inference jobs, enabling seamless integration with your existing workflows and data pipelines.

Conclusion

Batch inference for Amazon Bedrock provides a solution for processing multiple data inputs in a single API call, as illustrated through our call center transcript summarization example. This fully managed service is designed to handle datasets of varying sizes, offering benefits for various industries and use cases.

We encourage you to implement batch inference in your projects and experience how it can optimize your interactions with FMs at scale.


About the Authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Rahul Virbhadra Mishra is a Senior Software Engineer at Amazon Bedrock. He is passionate about delighting customers through building practical solutions for AWS and Amazon. Outside of work, he enjoys sports and values quality time with his family.

Mohd Altaf is an SDE at AWS AI Services based out of Seattle, United States. He works in the AWS AI/ML space and has helped build various solutions across different teams at Amazon. In his spare time, he likes playing chess, snooker, and indoor games.
