AWS Machine Learning Blog July 9, 00:08
Classify call center conversations with Amazon Bedrock batch inference

 

This post describes how to build an end-to-end text classification solution using the Amazon Bedrock batch inference capability with Anthropic’s Claude Haiku model. The solution efficiently processes large volumes of text data by generating synthetic training data, automating the workflow, and combining AWS services to classify travel agency call center conversations. It lowers costs through batch processing while preserving accuracy, and offers scalability and flexibility.

💡 The solution uses Amazon Bedrock batch inference, which saves 50% compared to on-demand pricing, a critical factor when processing large volumes of requests.

🤖 To create realistic training data while preserving user privacy, synthetic conversations were generated with Anthropic’s Claude 3.7 Sonnet model.

⚙️ The architecture is serverless, event-driven, and scalable, so it can process and classify large volumes of classification requests with minimal manual intervention.

📊 The solution covers data preparation, batch inference, results processing, and analytics, converting classification results into an easy-to-understand format and providing a business intelligence dashboard for data insights.

✅ In testing across 10 consecutive runs, the solution classified 1,190 synthetic travel agency conversations with 100% accuracy, with processing times holding at 11-12 minutes per batch (200 classifications per batch).

In this post, we demonstrate how to build an end-to-end solution for text classification using the Amazon Bedrock batch inference capability with Anthropic’s Claude Haiku model. Amazon Bedrock batch inference offers a 50% discount compared to the on-demand price, which is an important factor when dealing with a large number of requests. We walk through classifying travel agency call center conversations into categories, showcasing how to generate synthetic training data, process large volumes of text data, and automate the entire workflow using AWS services.

Challenges with high-volume text classification

Organizations across various sectors face a common challenge: the need to efficiently handle high-volume classification tasks. From travel agency call centers categorizing customer inquiries to sales teams analyzing lost opportunities and finance departments classifying invoices, these manual processes are a daily necessity. But these tasks come with significant challenges.

The manual approach to analyzing and categorizing these classification requests is not only time-intensive but also prone to inconsistencies. As teams process the high volume of data, the potential for errors and inefficiencies grows. By implementing automated systems to classify these interactions, multiple departments stand to gain substantial benefits. They can uncover hidden trends in their data, significantly enhance the quality of their customer service, and streamline their operations for greater efficiency.

However, the path to effective automated classification has its own challenges. Organizations must grapple with the complexities of efficiently processing vast amounts of textual information while maintaining consistent accuracy in their classification results. In this post, we demonstrate how to create a fully automated workflow while keeping operational costs under control.

Data

For this solution, we used synthetic call center conversation data. To create realistic training data while maintaining user privacy, we generated synthetic conversations using Anthropic’s Claude 3.7 Sonnet. We used the following prompt to generate the synthetic data:

Task: Generate <N> synthetic conversations from customer calls to an imaginary travel company. Come up with 10 most probable categories that calls of this nature can come from and treat them as classification categories for these calls. For each generated call create a column that indicates the category for that call. Conversations should follow the following format:

"User: ...
Agent: ...
User: ...
Agent: ...
...
Class: One of the 10 following categories that is most relevant to the conversation."

Ten acceptable classes:
1. Booking Inquiry - Customer asking about making new reservations
2. Reservation Change - Customer wanting to modify existing bookings
3. Cancellation Request - Customer seeking to cancel their travel plans
4. Refund Issues - Customer inquiring about getting money back
5. Travel Information - Customer seeking details about destinations, documentation, etc.
6. Complaint - Customer expressing dissatisfaction with service
7. Payment Problem - Customer having issues with billing or payments
8. Loyalty Program - Customer asking about rewards points or membership status
9. Special Accommodation - Customer requesting special arrangements
10. Technical Support - Customer having issues with website, app or booking systems

Instructions:
- Keep conversations concise
- Use John Doe for male names and Jane Doe for female names
- Use john.doe@email.com for male email addresses, jane.doe@email.com for female email addresses, and corporate@email.com for corporate email addresses, whenever you need to generate emails.
- Use \" or ' instead of " whenever there is a quote within the conversation
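The following is a minimal sketch of how such a generation call can be made with the Bedrock Converse API; the model ID, Region, file names, and batch size are illustrative assumptions rather than the exact code we used:

import boto3

# Generate synthetic conversations by sending the prompt above to Claude 3.7 Sonnet.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("synthetic_data_prompt.txt") as f:
    prompt = f.read().replace("<N>", "20")  # request 20 conversations per call

response = bedrock_runtime.converse(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # assumed inference profile ID
    messages=[{"role": "user", "content": [{"text": prompt}]}],
    inferenceConfig={"maxTokens": 4096, "temperature": 0.7},
)

# Persist the generated conversations for later conversion to the batch input format.
with open("synthetic_conversations.txt", "w") as f:
    f.write(response["output"]["message"]["content"][0]["text"])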

The synthetic dataset includes the conversation transcript and the assigned category for each call.

Solution overview

The solution architecture uses a serverless, event-driven, scalable design to effectively handle and classify large quantities of classification requests. Built on AWS, it automatically starts working when new classification request data arrives in an Amazon Simple Storage Service (Amazon S3) bucket. The system then uses Amazon Bedrock batch processing to analyze and categorize the content at scale, minimizing the need for constant manual oversight.

The following diagram illustrates the solution architecture.

The architecture follows a well-structured flow that facilitates reliable processing of classification requests:

We use AWS best practices in this solution, including event-driven and batch processing for optimal resource utilization, batch operations for cost-effectiveness, decoupled components for independent scaling, and least privilege access patterns. We implemented the system using the AWS Cloud Development Kit (AWS CDK) with TypeScript for infrastructure as code (IaC) and Python for application logic, making sure we achieve seamless automation, dynamic scaling, and efficient processing of classification requests, positioning it to effectively address both current requirements and future demands.
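To make the batch step concrete, the following sketch shows how a handler can submit an Amazon Bedrock batch inference job after a prepared JSONL file lands in Amazon S3; the job name, role ARN, and bucket URIs are illustrative assumptions, not the repository’s exact code:

import boto3

bedrock = boto3.client("bedrock")

def submit_batch_job(input_s3_uri: str, output_s3_uri: str) -> str:
    """Submit a Bedrock batch inference job over a prepared JSONL file."""
    response = bedrock.create_model_invocation_job(
        jobName="travel-classification-batch",  # assumed job name
        roleArn="arn:aws:iam::123456789012:role/BedrockBatchRole",  # assumed role
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        inputDataConfig={"s3InputDataConfig": {"s3Uri": input_s3_uri}},
        outputDataConfig={"s3OutputDataConfig": {"s3Uri": output_s3_uri}},
    )
    return response["jobArn"]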

Prerequisites

To implement the solution, you must have the following prerequisites:

Deploy the solution

The solution is accessible in the GitHub repository.

Complete the following steps to set up and deploy the solution:

    1. Clone the repository: Run the following command: git clone git@github.com:aws-samples/sample-genai-bedrock-batch-classifier.git
    2. Set up AWS credentials: Create an AWS Identity and Access Management (IAM) user with appropriate permissions, generate credentials for AWS Command Line Interface (AWS CLI) access, and create a profile. For instructions, see Authenticating using IAM user credentials for the AWS CLI. You can use the Admin Role for testing purposes, although it violates the principle of least privilege and should be avoided in production environments in favor of custom roles with minimal required permissions.
    3. Bootstrap the application: In the CDK folder, run the command npm install && cdk bootstrap --profile {your_profile_name}, replacing {your_profile_name} with your AWS profile name.
    4. Deploy the solution: Run the command cdk deploy --all --profile {your_profile_name}, replacing {your_profile_name} with your AWS profile name.

After you complete the deployment process, you will see a total of six stacks created in your AWS account, as illustrated in the following screenshot.

SharedStack acts as a central hub for resources that multiple parts of the system need to access. Within this stack, there are two S3 buckets: one handles internal operations behind the scenes, and the other serves as a bridge between the system and customers, so they can both submit their classification requests and retrieve their results.

DataPreparationStack serves as a data transformation engine. It’s designed to handle incoming files in three specific formats: XLSX, CSV, and JSON, which at the time of writing are the only supported input formats. This stack’s primary role is to convert these inputs into the specialized JSONL format required by Amazon Bedrock. The data processing script is available in the GitHub repo. This transformation makes sure that incoming data, regardless of its original format, is properly structured before being processed by Amazon Bedrock. The format is as follows:

{ "recordId": ${unique_id},  "modelInput": {     "anthropic_version": "bedrock-2023-05-31",      "max_tokens": 1024,     "messages": [ {            "role": "user",            "content": [{"type":"text", "text": ${initial_text}]} ],      },      "system": ${prompt}}where:initial_text - text that you want to classifyprompt       - instructions to Bedrock service how to classifyunique_id    - id coming from the upstream service, otherwise it will be                automatically generated by the code

BatchClassifierStack handles the classification operations. Although currently powered by Anthropic’s Claude Haiku, the system maintains flexibility by allowing straightforward switches to alternative models as needed. This adaptability is made possible through a comprehensive constants file that serves as the system’s control center. The following configurations are available:

BatchResultsProcessingStack functions as the data postprocessing stage, transforming the Amazon Bedrock JSONL output into user-friendly formats. At the time of writing, the system supports CSV, JSON, and XLSX. These processed files are then stored in a designated output folder in the S3 bucket, organized by date for quick retrieval and management. The conversion scripts are available in the GitHub repo. The output files have the following schema:

AnalyticsStack provides a business intelligence (BI) dashboard that displays a list of classifications and allows filtering based on the categories defined in the prompt. It offers the following key configuration options:

Now that you’ve successfully deployed the system, you can prepare your data file; this can be either real customer data or the synthetic dataset we provided for testing. When your file is ready, go to the S3 bucket named {prefix}-{account_id}-customer-requests-bucket-{region} and upload your file to the input_data folder (or upload it programmatically, as in the sketch that follows). After the batch inference job is complete, you can view the classification results on the dashboard, which you can find under the name {prefix}-{account_id}-classifications-dashboard-{region}. The following screenshot shows a preview of what you can expect.
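For reference, a minimal programmatic upload looks like the following; the profile, prefix, account ID, Region, and file name are illustrative assumptions:

import boto3

# Upload an input file to the customer requests bucket created by the deployment.
session = boto3.Session(profile_name="your_profile_name")
s3 = session.client("s3")
s3.upload_file(
    "synthetic_conversations.xlsx",
    "myprefix-123456789012-customer-requests-bucket-us-east-1",  # assumed bucket name
    "input_data/synthetic_conversations.xlsx",
)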

The dashboard will not display data until Amazon Bedrock finishes processing the batch inference jobs and the AWS Glue crawler creates the Athena table. Without these steps completed, the dashboard can’t connect to the table because it doesn’t exist yet. Additionally, you must update the QuickSight role permissions that were set up during pre-deployment. To update permissions, complete the following steps:

    1. On the QuickSight console, choose the user icon in the top navigation bar and choose Manage QuickSight.
    2. In the navigation pane, choose Security & Permissions.
    3. Verify that the role has been granted proper access to the S3 bucket with the following path format: {prefix}-{account_id}-internal-classifications-{region}.

Results

To test the solution’s performance and reliability, we processed 1,190 synthetically generated travel agency conversations from a single Excel file across multiple runs. The results were remarkably consistent across 10 consecutive runs, with processing times ranging from 11 to 12 minutes per batch (200 classifications in a single batch), and the solution achieved 100% classification accuracy.

Challenges

For certain cases, the generated class didn’t exactly match the class name given in the prompt. For instance, in multiple cases, it output “Hotel/Flight Booking Inquiry” instead of “Booking Inquiry,” which was defined as the class in the prompt. We addressed this through prompt engineering, asking the model to check that its final class output exactly matches one of the provided classes.
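A lightweight postprocessing guard can complement the prompt fix by snapping any remaining near-miss labels back to the allowed set; the following sketch is our own illustration, not part of the repository:

ALLOWED_CLASSES = [
    "Booking Inquiry", "Reservation Change", "Cancellation Request",
    "Refund Issues", "Travel Information", "Complaint", "Payment Problem",
    "Loyalty Program", "Special Accommodation", "Technical Support",
]

def normalize_class(raw: str) -> str:
    """Map a generated label to one of the allowed classes."""
    raw = raw.strip()
    if raw in ALLOWED_CLASSES:
        return raw
    # e.g., "Hotel/Flight Booking Inquiry" contains "Booking Inquiry"
    for cls in ALLOWED_CLASSES:
        if cls.lower() in raw.lower():
            return cls
    return "Unclassified"  # flag the record for manual review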

Error handling

For troubleshooting purposes, the solution includes an Amazon DynamoDB table that tracks batch processing status, along with Amazon CloudWatch Logs. Error tracking is not automated and requires manual monitoring and validation.
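For manual checks, a batch job’s status can be retrieved using the job ARN recorded in the DynamoDB table; a minimal sketch, assuming the ARN has already been looked up:

import boto3

bedrock = boto3.client("bedrock")

def check_job_status(job_arn: str) -> str:
    """Return the current status of a Bedrock batch inference job."""
    job = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
    return job["status"]  # e.g., Submitted, InProgress, Completed, Failed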

Key takeaways

Although our testing focused on travel agency scenarios, the solution’s architecture is flexible and can be adapted to various classification needs across different industries and use cases.

Known limitations

The following are key limitations of the classification solution and should be considered when planning its use:

Clean up

To avoid additional charges, clean up your AWS resources when they’re no longer needed by running the command cdk destroy --all --profile {your_profile_name}, replacing {your_profile_name} with your AWS profile name.

To remove resources associated with this project, complete the following steps:

    Delete the S3 buckets:
      On the Amazon S3 console, choose Buckets in the navigation pane. Locate your buckets by searching for your {prefix}. Delete these buckets to facilitate proper cleanup.
    Clean up the DynamoDB resources:
      On the DynamoDB console, choose Tables in the navigation pane. Delete the table {prefix}-{account_id}-batch-processing-status-{region}.

This cleanup helps ensure that no residual resources from this project remain in your AWS account.

Conclusion

In this post, we explored how Amazon Bedrock batch inference can transform your large-scale text classification workflows. You can now automate time-consuming tasks your teams handle daily, such as analyzing lost sales opportunities, categorizing travel requests, and processing insurance claims. This solution frees your teams to focus on growing and improving your business.

Furthermore, this solution opens the opportunity to build a system that provides real-time classifications, integrates seamlessly with your communication channels, offers enhanced monitoring capabilities, and supports multiple languages for global operations.

This solution was developed for internal use in test and non-production environments only. It is the responsibility of the customer to perform their due diligence to verify the solution aligns with their compliance obligations.

We’re excited to see how you will adapt this solution to your unique challenges. Share your experience or questions in the comments—we’re here to help you get started on your automation journey.


About the authors

Nika Mishurina is a Senior Solutions Architect with Amazon Web Services. She is passionate about delighting customers through building end-to-end production-ready solutions for Amazon. Outside of work, she loves traveling, working out, and exploring new things.

Farshad Harirchi is a Principal Data Scientist at AWS Professional Services. He helps customers across industries, from retail to industrial and financial services, with the design and development of generative AI and machine learning solutions. Farshad brings extensive experience in the entire machine learning and MLOps stack. Outside of work, he enjoys traveling, playing outdoor sports, and exploring board games.
