AWS Machine Learning Blog, June 7, 02:15
Build a serverless audio summarization solution with Amazon Bedrock and Whisper

 

This article describes a solution that uses generative AI to automate recording transcription, summary generation, and redaction of sensitive information. The solution builds on Amazon Bedrock, using the OpenAI Whisper model for transcription, the Anthropic Claude model for summarization, and Guardrails for automatic redaction of PII (personally identifiable information). With a React frontend, AWS Lambda, Step Functions, and other components, it forms a complete audio/video processing pipeline for efficient, secure management of recorded content.

🎧 The core of the solution is combining Amazon Bedrock with multiple AI models to automate transcription, summarization, and redaction of sensitive information.

🎙️ Users upload recordings through a React frontend; the files are stored in an S3 bucket, which triggers a Step Functions state machine that starts the AI processing workflow.

📝 The state machine uses Whisper for transcription and Claude for summarization, and applies Guardrails to automatically redact sensitive information and keep data secure.

⚙️ The solution spans multiple AWS services, including S3, API Gateway, EventBridge, Lambda, Step Functions, and CloudFront, forming a complete audio/video processing pipeline.

🛡️ Users create a Guardrail, configure PII detection and handling, and deploy the Whisper model. The frontend and backend infrastructure are deployed with the AWS CDK to deliver an end-to-end solution.

Recordings of business meetings, interviews, and customer interactions have become essential for preserving important information. However, transcribing and summarizing these recordings manually is often time-consuming and labor-intensive. With the progress in generative AI and automatic speech recognition (ASR), automated solutions have emerged to make this process faster and more efficient.

Protecting personally identifiable information (PII) is a vital aspect of data security, driven by both ethical responsibilities and legal requirements. In this post, we demonstrate how to use the OpenAI Whisper Large V3 Turbo foundation model (FM), available in Amazon Bedrock Marketplace (a dedicated offering that provides access to over 140 models), to produce near real-time transcription. These transcriptions are then processed by Amazon Bedrock for summarization and redaction of sensitive information.

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, DeepSeek, Luma, Meta, Mistral AI, poolside (coming soon), Stability AI, and Amazon Nova through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Additionally, you can use Amazon Bedrock Guardrails to automatically redact sensitive information, including PII, from the transcription summaries to support compliance and data protection needs.

In this post, we walk through an end-to-end architecture that combines a React-based frontend with Amazon Bedrock, AWS Lambda, and AWS Step Functions to orchestrate the workflow, facilitating seamless integration and processing.

Solution overview

The solution highlights the power of integrating serverless technologies with generative AI to automate and scale content processing workflows. The user journey begins with uploading a recording through a React frontend application, hosted on Amazon CloudFront and backed by Amazon Simple Storage Service (Amazon S3) and Amazon API Gateway. When the file is uploaded, it triggers a Step Functions state machine that orchestrates the core processing steps, using AI models and Lambda functions for seamless data flow and transformation. The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

    1. The React application is hosted in an S3 bucket and served to users through CloudFront for fast, global access. API Gateway handles interactions between the frontend and backend services.
    2. Users upload audio or video files directly from the app. These recordings are stored in a designated S3 bucket for processing.
    3. An Amazon EventBridge rule detects the S3 upload event and triggers the Step Functions state machine, initiating the AI-powered processing pipeline.
    4. The state machine performs audio transcription, summarization, and redaction by orchestrating multiple Amazon Bedrock models in sequence. It uses Whisper for transcription, Claude for summarization, and Guardrails to redact sensitive data.
    5. The redacted summary is returned to the frontend application and displayed to the user.
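To make the triggering step concrete, the EventBridge rule matches S3 "Object Created" events for the recordings bucket. The sketch below builds such an event pattern in Python; it is illustrative only, since the actual rule is created by the deployment stack and the bucket name is a placeholder.

```python
def build_upload_event_pattern(bucket_name):
    """Event pattern an EventBridge rule could use to detect uploads to
    the recordings bucket (illustrative; the real rule is defined by the
    CDK stack, and the bucket name is a placeholder)."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }
```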

The following diagram illustrates the state machine workflow.

The Step Functions state machine orchestrates a series of tasks to transcribe, summarize, and redact sensitive information from uploaded audio/video recordings:

    1. A Lambda function is triggered to gather input details (for example, the Amazon S3 object path and metadata) and prepare the payload for transcription.
    2. The payload is sent to the OpenAI Whisper Large V3 Turbo model through the Amazon Bedrock Marketplace to generate a near real-time transcription of the recording.
    3. The raw transcript is passed to Anthropic's Claude 3.5 Sonnet through Amazon Bedrock, which produces a concise and coherent summary of the conversation or content.
    4. A second Lambda function validates and forwards the summary to the redaction step.
    5. The summary is processed through Amazon Bedrock Guardrails, which automatically redacts PII and other sensitive data.
    6. The redacted summary is stored or returned to the frontend application through an API, where it is displayed to the user.
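The first step above can be sketched as a small handler that pulls the object location out of the triggering event. The field names follow the EventBridge "Object Created" event shape; the media-format guess from the file extension is an assumption for illustration, not part of the original code.

```python
def prepare_transcription_input(event):
    # Extract the S3 location from an EventBridge "Object Created" event
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]
    # Guess the media format from the file extension (illustrative assumption)
    media_format = key.rsplit(".", 1)[-1].lower() if "." in key else "unknown"
    return {"bucket": bucket, "key": key, "media_format": media_format}
```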

Prerequisites

Before you start, make sure that you have the following prerequisites in place:

Create a guardrail in the Amazon Bedrock console

For instructions on creating a guardrail in Amazon Bedrock, refer to Create a guardrail. For details on detecting and redacting PII, see Remove PII from conversations by using sensitive information filters. Configure your guardrail with the following key settings:

After you deploy the guardrail, note its Amazon Resource Name (ARN); you will use this ARN when deploying the model.

Deploy the Whisper model

Complete the following steps to deploy the Whisper Large V3 Turbo model:

    1. On the Amazon Bedrock console, choose Model catalog under Foundation models in the navigation pane.
    2. Search for and choose Whisper Large V3 Turbo.
    3. On the options menu (three dots), choose Deploy.
    4. Modify the endpoint name, number of instances, and instance type to suit your specific use case. For this post, we use the default settings.
    5. Modify the Advanced settings section to suit your use case. For this post, we use the default settings.
    6. Choose Deploy.

This creates a new AWS Identity and Access Management (IAM) role and deploys the model.

Choose Marketplace deployments in the navigation pane; in the Managed deployments section, the endpoint status shows as Creating. Wait for the endpoint to finish deploying and the status to change to In Service, then copy the endpoint name; you will use it when deploying the solution infrastructure.

Deploy the solution infrastructure

In the GitHub repo, follow the instructions in the README file to clone the repository, then deploy the frontend and backend infrastructure.

We use the AWS Cloud Development Kit (AWS CDK) to define and deploy the infrastructure. The AWS CDK code deploys the following resources:

Implementation deep dive

The backend is composed of a sequence of Lambda functions, each handling a specific stage of the audio processing pipeline:

Let’s examine some of the key components:

The transcription Lambda function uses the Whisper model to convert audio files to text:

import json

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")


def transcribe_with_whisper(audio_chunk, endpoint_name):
    # Convert audio to hex string format
    hex_audio = audio_chunk.hex()

    # Create payload for Whisper model
    payload = {
        "audio_input": hex_audio,
        "language": "english",
        "task": "transcribe",
        "top_p": 0.9
    }

    # Invoke the SageMaker endpoint running Whisper
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/json',
        Body=json.dumps(payload)
    )

    # Parse the transcription response
    response_body = json.loads(response['Body'].read().decode('utf-8'))
    transcription_text = response_body['text']

    return transcription_text
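The function takes an audio_chunk rather than the whole file because real-time endpoints cap the request size, and hex encoding roughly doubles the on-the-wire payload. A minimal chunking sketch is shown below; the 4 MB default and the injected transcribe_fn parameter (standing in for transcribe_with_whisper) are assumptions for illustration.

```python
def chunk_audio(audio_bytes, max_chunk_bytes=4 * 1024 * 1024):
    """Split raw audio into chunks below the endpoint's request-size limit.
    The 4 MB default is an assumption: SageMaker real-time endpoints cap
    payloads at roughly 6 MB, and hex encoding doubles the payload size."""
    return [audio_bytes[i:i + max_chunk_bytes]
            for i in range(0, len(audio_bytes), max_chunk_bytes)]


def transcribe_full_recording(audio_bytes, endpoint_name, transcribe_fn):
    """Transcribe each chunk (transcribe_fn stands in for
    transcribe_with_whisper) and join the partial transcripts."""
    return " ".join(transcribe_fn(chunk, endpoint_name)
                    for chunk in chunk_audio(audio_bytes))
```

Note that naive byte-boundary splitting can cut a word in half; in practice you would split on silence or at container frame boundaries.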

We use Amazon Bedrock to generate concise summaries from the transcriptions:

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")


def generate_summary(transcription):
    # Format the prompt with the transcription
    prompt = f"{transcription}\n\nGive me the summary, speakers, key discussions, and action items with owners"

    # Call Bedrock for summarization (Claude 3 models require the Messages API)
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "temperature": 0.7,
            "top_p": 0.9,
            "messages": [{"role": "user", "content": prompt}],
        })
    )

    # Extract and return the summary text
    result = json.loads(response['body'].read())
    return result['content'][0]['text']

A critical component of our solution is the automatic redaction of PII. We implemented this using Amazon Bedrock Guardrails to support compliance with privacy regulations:

def apply_guardrail(bedrock_runtime, content, guardrail_id):
    # Format content according to API requirements
    formatted_content = [{"text": {"text": content}}]

    # Call the guardrail API
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion="DRAFT",
        source="OUTPUT",  # Using OUTPUT parameter for proper flow
        content=formatted_content
    )

    # Extract redacted text from response
    if 'action' in response and response['action'] == 'GUARDRAIL_INTERVENED':
        if len(response['outputs']) > 0:
            output = response['outputs'][0]
            if 'text' in output and isinstance(output['text'], str):
                return output['text']

    # Return original content if redaction fails
    return content

When PII is detected, it’s replaced with type indicators (for example, {PHONE} or {EMAIL}), making sure that summaries remain informative while protecting sensitive data.
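To illustrate locally what those placeholders look like, the toy redactor below substitutes {TYPE} markers using regular expressions. This is purely illustrative: the actual detection is performed server-side by Amazon Bedrock Guardrails, and these two patterns are assumptions that cover far less than the service does.

```python
import re

# Minimal local illustration of {TYPE} placeholders; real detection is
# done by Amazon Bedrock Guardrails, not by these narrow patterns.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact_locally(text):
    # Replace each detected span with its {TYPE} placeholder
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub("{" + label + "}", text)
    return text
```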

To manage the complex processing pipeline, we use Step Functions to orchestrate the Lambda functions:

{
  "Comment": "Audio Summarization Workflow",
  "StartAt": "TranscribeAudio",
  "States": {
    "TranscribeAudio": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "WhisperTranscriptionFunction",
        "Payload": {
          "bucket.$": "$.bucket",
          "key.$": "$.key"
        }
      },
      "Next": "IdentifySpeakers"
    },
    "IdentifySpeakers": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "SpeakerIdentificationFunction",
        "Payload": {
          "Transcription.$": "$.Payload"
        }
      },
      "Next": "GenerateSummary"
    },
    "GenerateSummary": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "BedrockSummaryFunction",
        "Payload": {
          "SpeakerIdentification.$": "$.Payload"
        }
      },
      "End": true
    }
  }
}

This workflow makes sure each step completes successfully before proceeding to the next, with automatic error handling and retry logic built in.
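As a sketch of what that retry logic can look like in the Amazon States Language, a Retry/Catch block can be attached to each task state. The fragment below is illustrative: the error names and backoff values are assumptions, HandleFailure is a hypothetical recovery state, and the Parameters block is omitted for brevity.

```json
"TranscribeAudio": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "Retry": [
    {
      "ErrorEquals": ["Lambda.ServiceException", "States.Timeout"],
      "IntervalSeconds": 2,
      "MaxAttempts": 3,
      "BackoffRate": 2.0
    }
  ],
  "Catch": [
    {
      "ErrorEquals": ["States.ALL"],
      "Next": "HandleFailure"
    }
  ],
  "Next": "IdentifySpeakers"
}
```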

Test the solution

After you have successfully completed the deployment, you can use the CloudFront URL to test the solution functionality.

Security considerations

Security is a critical aspect of this solution, and we’ve implemented several best practices to support data protection and compliance:

Clean up

To prevent unnecessary charges, make sure to delete the resources provisioned for this solution when you’re done:

    Delete the Amazon Bedrock guardrail:
      On the Amazon Bedrock console, in the navigation menu, choose Guardrails. Choose your guardrail, then choose Delete.
    Delete the Whisper Large V3 Turbo model deployed through the Amazon Bedrock Marketplace:
      On the Amazon Bedrock console, choose Marketplace deployments in the navigation pane. In the Managed deployments section, select the deployed endpoint and choose Delete.
    Delete the AWS CDK stack by running the command cdk destroy, which deletes the AWS infrastructure.

Conclusion

This serverless audio summarization solution demonstrates the benefits of combining AWS services to create a sophisticated, secure, and scalable application. By using Amazon Bedrock for AI capabilities, Lambda for serverless processing, and CloudFront for content delivery, we’ve built a solution that can handle large volumes of audio content efficiently while helping you align with security best practices.

The automatic PII redaction feature supports compliance with privacy regulations, making this solution well-suited for regulated industries such as healthcare, finance, and legal services where data security is paramount. To get started, deploy this architecture within your AWS environment to accelerate your audio processing workflows.


About the Authors

Kaiyin Hu is a Senior Solutions Architect for Strategic Accounts at Amazon Web Services, with years of experience across enterprises, startups, and professional services. Currently, she helps customers build cloud solutions and drives GenAI adoption to cloud. Previously, Kaiyin worked in the Smart Home domain, assisting customers in integrating voice and IoT technologies.

Sid Vantair is a Solutions Architect with AWS covering Strategic accounts.  He thrives on resolving complex technical issues to overcome customer hurdles. Outside of work, he cherishes spending time with his family and fostering inquisitiveness in his children.
