AWS Machine Learning Blog, October 31, 2024
Unlock organizational wisdom using voice-driven knowledge capture with Amazon Transcribe and Amazon Bedrock

 


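📘 **Build a voice-based application workflow with Amazon Bedrock, Amazon Transcribe, and React that systematically captures and documents institutional knowledge through voice recordings from experienced staff members.** The solution uses Amazon Transcribe for real-time speech-to-text conversion, enabling accurate and immediate documentation of spoken knowledge. Generative AI, powered by Amazon Bedrock, then analyzes and summarizes the transcribed content, extracting key insights and generating comprehensive documentation.

📢 **The application's front end is built with React, a popular JavaScript library for creating dynamic UIs.** The React-based UI integrates with Amazon Transcribe to give users a real-time transcription experience. As employees speak, they can watch their words converted to text in real time, permitting immediate review and editing.

💻 **The solution uses several AWS services, including Amazon Transcribe, Amazon Bedrock, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon CloudFront, to deliver real-time transcription and document generation.** Together, these technologies create a seamless knowledge capture process.

📂 **The solution uses a custom authorization Lambda function with Amazon API Gateway instead of a more comprehensive identity management solution such as Amazon Cognito.** This approach was chosen for several reasons: simplicity, minimal user friction, quick implementation, and temporary credential management.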

Preserving and taking advantage of institutional knowledge is critical for organizational success and adaptability. This collective wisdom, comprising insights and experiences accumulated by employees over time, often exists as tacit knowledge passed down informally. Formalizing and documenting this invaluable resource can help organizations maintain institutional memory, drive innovation, enhance decision-making processes, and accelerate onboarding for new employees. However, effectively capturing and documenting this knowledge presents significant challenges. Traditional methods, such as manual documentation or interviews, are often time-consuming, inconsistent, and prone to errors. Moreover, the most valuable knowledge frequently resides in the minds of seasoned employees, who may find it difficult to articulate or lack the time to document their expertise comprehensively.

This post introduces an innovative voice-based application workflow that harnesses the power of Amazon Bedrock, Amazon Transcribe, and React to systematically capture and document institutional knowledge through voice recordings from experienced staff members. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Our solution uses Amazon Transcribe for real-time speech-to-text conversion, enabling accurate and immediate documentation of spoken knowledge. We then use generative AI, powered by Amazon Bedrock, to analyze and summarize the transcribed content, extracting key insights and generating comprehensive documentation.

The front-end of our application is built using React, a popular JavaScript library for creating dynamic UIs. This React-based UI seamlessly integrates with Amazon Transcribe, providing users with a real-time transcription experience. As employees speak, they can observe their words converted to text in real-time, permitting immediate review and editing.

By combining the React front-end UI with Amazon Transcribe and Amazon Bedrock, we’ve created a comprehensive solution for capturing, processing, and preserving valuable institutional knowledge. This approach not only streamlines the documentation process but also enhances the quality and accessibility of the captured information, supporting operational excellence and fostering a culture of continuous learning and improvement within organizations.

Solution overview

This solution combines AWS services, including Amazon Transcribe, Amazon Bedrock, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon CloudFront, to deliver real-time transcription and document generation in a seamless knowledge capture process.

This solution uses a custom authorization Lambda function with Amazon API Gateway instead of more comprehensive identity management solutions such as Amazon Cognito. This approach was chosen for several reasons: simplicity, minimal user friction, quick implementation, and temporary credential management.

Although this solution works well for this specific use case, it’s important to note that for production applications, especially those dealing with sensitive data or needing user-specific functionality, a more robust identity solution such as Amazon Cognito would typically be recommended.

The following diagram illustrates the architecture of our solution.

The workflow includes the following steps:

    1. Users access the front-end UI application, which is distributed through CloudFront.
    2. The React web application sends an initial request to Amazon API Gateway.
    3. API Gateway forwards the request to the authorization Lambda function.
    4. The authorization function checks the request against the AWS Identity and Access Management (IAM) role to confirm proper permissions.
    5. The authorization function sends temporary credentials back to the front-end application through API Gateway.
    6. With the temporary credentials, the React web application communicates directly with Amazon Transcribe for real-time speech-to-text conversion as the user records their input.
    7. After recording and transcription, the user sends (through the front-end UI) the transcribed texts and audio files to the backend through API Gateway.
    8. API Gateway routes the authorized request (containing transcribed text and audio files) to the orchestration Lambda function.
    9. The orchestration function sends the transcribed text for summarization.
    10. The orchestration function receives summarized text from Amazon Bedrock to generate content.
    11. The orchestration function stores the generated PDF files and recorded audio files in the artifacts S3 bucket.
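The authorization part of this workflow (the request reaching the authorization function and temporary credentials coming back through API Gateway) can be pictured as a small Lambda handler. The post does not show the authorization function, so the following is only a sketch under assumptions: it assumes the credentials come from assuming an IAM role (for example through an AWS STS AssumeRole call), and the names `makeAuthHandler` and `AssumeRoleFn`, the session name, and the 15-minute duration are hypothetical. The STS call is injected as a function so the sketch needs no SDK import; the response field names match the ones the front end reads from `credentials` later in this post.

```typescript
// Shape of the temporary credentials returned by an STS AssumeRole call.
interface TemporaryCredentials {
  AccessKeyId: string;
  SecretAccessKey: string;
  SessionToken: string;
  Expiration: Date;
}

// Hypothetical dependency: wraps the real STS call so this sketch stays self-contained.
type AssumeRoleFn = (
  roleArn: string,
  sessionName: string,
  durationSeconds: number
) => Promise<TemporaryCredentials>;

// Minimal API Gateway Lambda proxy response.
interface ProxyResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Builds the handler; roleArn and region would typically come from environment variables.
function makeAuthHandler(assumeRole: AssumeRoleFn, roleArn: string, region: string) {
  return async (): Promise<ProxyResponse> => {
    try {
      // Assumption: a short-lived (900 second) session is enough for one recording.
      const creds = await assumeRole(roleArn, "transcribe-web-session", 900);
      return {
        statusCode: 200,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          Region: region,
          AccessKeyId: creds.AccessKeyId,
          SecretAccessKey: creds.SecretAccessKey,
          SessionToken: creds.SessionToken,
          Expiration: creds.Expiration.toISOString(),
        }),
      };
    } catch {
      // Do not leak internal error details to the browser.
      return {
        statusCode: 500,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ message: "Unable to issue credentials" }),
      };
    }
  };
}
```

Injecting the STS call also makes the handler easy to exercise locally with a fake before wiring in a real client.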

Prerequisites

You need the following prerequisites:

Deploy the solution with the AWS CDK

The AWS Cloud Development Kit (AWS CDK) is an open source software development framework for defining cloud infrastructure as code and provisioning it through AWS CloudFormation. Our AWS CDK stack deploys resources from the following AWS services:

To deploy the solution, complete the following steps:

    1. Clone the GitHub repository: genai-knowledge-capture-webapp
    2. Follow the Prerequisites section in the README.md file to set up your local environment

As of this writing, this solution supports deployment to the us-east-1 Region. The CloudFront distribution in this solution is geo-restricted to the US and Canada by default. To change this configuration, refer to the react-app-deploy.ts file in the GitHub repo.

    3. Invoke npm install to install the dependencies
    4. Invoke cdk deploy to deploy the solution

The deployment process typically takes 20–30 minutes. When the deployment is complete, CodeBuild will build and deploy the React application, which typically takes 2–3 minutes. After that, you can access the UI at the ReactAppUrl URL that is output by the AWS CDK.

Amazon Transcribe Streaming within React application

Our solution’s front-end is built using React, a popular JavaScript library for creating dynamic user interfaces. We integrate Amazon Transcribe streaming into our React application using the @aws-sdk/client-transcribe-streaming library. This integration enables real-time speech-to-text functionality, so users can observe their spoken words converted to text instantly.

The real-time transcription offers several benefits for knowledge capture:

In this solution, the Amazon Transcribe client is managed in a reusable React hook, useAudioTranscription.ts. An additional React hook, useAudioProcessing.ts, implements the necessary audio stream processing. Refer to the GitHub repo for more information. The following is a simplified code snippet demonstrating the Amazon Transcribe client integration:

// Imports needed by this snippet
import {
  TranscribeStreamingClient,
  StartStreamTranscriptionCommand,
} from "@aws-sdk/client-transcribe-streaming";

// Create Transcribe client
transcribeClientRef.current = new TranscribeStreamingClient({
  region: credentials.Region,
  credentials: {
    accessKeyId: credentials.AccessKeyId,
    secretAccessKey: credentials.SecretAccessKey,
    sessionToken: credentials.SessionToken,
  },
});

// Create Transcribe Start Command
const transcribeStartCommand = new StartStreamTranscriptionCommand({
  LanguageCode: transcribeLanguage,
  MediaEncoding: audioEncodingType,
  MediaSampleRateHertz: audioSampleRate,
  AudioStream: getAudioStreamGenerator(),
});

// Start Transcribe session
const data = await transcribeClientRef.current.send(
  transcribeStartCommand
);
console.log("Transcribe session established ", data.SessionId);
setIsTranscribing(true);

// Process Transcribe result stream
if (data.TranscriptResultStream) {
  try {
    for await (const event of data.TranscriptResultStream) {
      handleTranscriptEvent(event, setTranscribeResponse);
    }
  } catch (error) {
    console.error("Error processing transcript result stream:", error);
  }
}
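The snippet passes `getAudioStreamGenerator()` as the `AudioStream`, but the post does not show it (the actual implementation is in the repo's useAudioProcessing.ts). As a rough sketch only, assuming `pcm` media encoding and microphone samples arriving as `Float32Array` chunks from the Web Audio API, such a generator could convert each chunk to 16-bit little-endian PCM and wrap it in an `AudioEvent`. The names `floatTo16BitPcm` and `toAudioStream` are illustrative and not taken from the repo.

```typescript
// Convert Web Audio samples (floats in [-1, 1]) to 16-bit little-endian PCM bytes.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768 and positive values by 32767 to fill the int16 range.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}

// Wrap raw chunks in the { AudioEvent: { AudioChunk } } shape that the
// streaming command's AudioStream parameter expects.
async function* toAudioStream(
  chunks: AsyncIterable<Float32Array>
): AsyncGenerator<{ AudioEvent: { AudioChunk: Uint8Array } }> {
  for await (const chunk of chunks) {
    yield { AudioEvent: { AudioChunk: floatTo16BitPcm(chunk) } };
  }
}
```

The sample rate of the captured audio has to match the `MediaSampleRateHertz` value sent in the start command; resampling is not handled in this sketch.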

For optimal results, we recommend using a good-quality microphone and speaking clearly. At the time of writing, the system supports major dialects of English, with plans to expand language support in future updates.
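The `handleTranscriptEvent` callback used in the earlier snippet is not shown in this post either. Amazon Transcribe streaming emits partial results that are later superseded by a final result for the same segment, so a handler typically keeps finalized text separate from the in-progress partial. The following is a minimal sketch of that idea, not the repo's implementation: the types are local mirrors of the transcript event fields it reads, and `TranscriptState`, `applyTranscriptEvent`, and `displayText` are hypothetical names.

```typescript
// Minimal local mirror of the fields read from a streaming transcript event.
interface TranscriptResult {
  IsPartial?: boolean;
  Alternatives?: { Transcript?: string }[];
}
interface TranscriptEventLike {
  TranscriptEvent?: { Transcript?: { Results?: TranscriptResult[] } };
}

// Hypothetical UI state: committed text plus the segment still being refined.
interface TranscriptState {
  finalText: string;
  partialText: string;
}

// Pure reducer: returns the next state for one incoming event.
function applyTranscriptEvent(
  state: TranscriptState,
  event: TranscriptEventLike
): TranscriptState {
  let { finalText, partialText } = state;
  for (const result of event.TranscriptEvent?.Transcript?.Results ?? []) {
    const text = result.Alternatives?.[0]?.Transcript ?? "";
    if (result.IsPartial) {
      // Partial results replace each other until the segment is finalized.
      partialText = text;
    } else {
      // A final result is appended once and clears the pending partial.
      finalText = finalText ? `${finalText} ${text}` : text;
      partialText = "";
    }
  }
  return { finalText, partialText };
}

// Text to display: finalized segments followed by the in-progress one.
function displayText(state: TranscriptState): string {
  return [state.finalText, state.partialText].filter(Boolean).join(" ");
}
```

In a React hook, a reducer like this could be applied with a functional state update, for example `setState((prev) => applyTranscriptEvent(prev, event))`.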

Use the application

After deployment, open the ReactAppUrl link (https://<cloud front domain name>.cloudfront.net) in your browser (the solution supports Chrome, Firefox, Edge, Safari, and Brave browsers on Mac and Windows). A web UI opens, as shown in the following screenshot.

To use this application, complete the following steps:

    1. Enter a question or topic.
    2. Enter a file name for the document.
    3. Choose Start Transcription and start recording your input for the given question or topic. The transcribed text will be shown in the Transcription box in real time.
    4. After recording, you can edit the transcribed text.
    5. You can also choose the play icon to play the recorded audio clips.
    6. Choose Generate Document to invoke the backend service to generate a document from the input question and associated transcription. Meanwhile, the recorded audio clips are sent to an S3 bucket for future analysis.

The document generation process uses FMs from Amazon Bedrock to create a well-structured, professional document. The FM performs the following actions:

The audio files and generated documents are stored in a dedicated S3 bucket, as shown in the following screenshot, with appropriate encryption and access controls in place.
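The post does not show the prompt or the model used for the document generation step described above. Purely as an illustration, the orchestration function could assemble a request like the following from the question and the transcription. The object shape follows the input of the Amazon Bedrock Converse API; the prompt wording, the `buildSummaryRequest` name, and the inference settings are assumptions, and the model ID is left as a parameter.

```typescript
// Subset of the Bedrock Converse API input used in this sketch.
interface ConverseRequest {
  modelId: string;
  system: { text: string }[];
  messages: { role: "user" | "assistant"; content: { text: string }[] }[];
  inferenceConfig: { maxTokens: number; temperature: number };
}

// Builds a summarization request from the user's question and transcript.
function buildSummaryRequest(
  modelId: string,
  question: string,
  transcript: string
): ConverseRequest {
  // Collapse the whitespace that speech-to-text output and manual edits tend to leave behind.
  const cleaned = transcript.replace(/\s+/g, " ").trim();
  if (!cleaned) {
    throw new Error("Transcript is empty; nothing to summarize.");
  }
  return {
    modelId,
    system: [
      {
        text:
          "You turn spoken answers into clear, well-structured documentation. " +
          "Use only information present in the transcript.",
      },
    ],
    messages: [
      {
        role: "user",
        content: [
          {
            text:
              `Question or topic: ${question.trim()}\n\n` +
              `Transcript:\n${cleaned}\n\n` +
              "Write a concise, professional summary of the transcript.",
          },
        ],
      },
    ],
    // Assumed settings: a low temperature keeps the summary close to the source text.
    inferenceConfig: { maxTokens: 1024, temperature: 0.2 },
  };
}
```

An object of this shape could then be passed to a `ConverseCommand` from the `@aws-sdk/client-bedrock-runtime` package, with a model ID that is enabled in your account.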

    Choose View Document after you generate the document. A professional PDF document generated from your input opens in your browser, accessed through a presigned URL.

Additional information

To further enhance your knowledge capture solution and address specific use cases, consider the additional features and best practices discussed in this section.

Custom vocabulary with Amazon Transcribe

For industries with specialized terminology, Amazon Transcribe offers a custom vocabulary feature. You can define industry-specific terms, acronyms, and phrases to improve transcription accuracy. To implement this, complete the following steps:

    1. Create a custom vocabulary file with your specialized terms
    2. Use the Amazon Transcribe API to add this vocabulary to your account
    3. Specify the custom vocabulary in your transcription requests
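For the last of these steps, the streaming transcription request accepts a `VocabularyName` parameter. The following small sketch assumes a vocabulary has already been created in the same Region and for the same language as the request; `withCustomVocabulary` and the example vocabulary name are hypothetical.

```typescript
// Subset of the streaming transcription request parameters used in this post.
interface StreamParams {
  LanguageCode: string;
  MediaEncoding: string;
  MediaSampleRateHertz: number;
  VocabularyName?: string;
}

// Returns a copy of the request parameters with the custom vocabulary attached.
function withCustomVocabulary(
  params: StreamParams,
  vocabularyName: string
): StreamParams {
  const name = vocabularyName.trim();
  if (!name) {
    throw new Error("vocabularyName must not be empty");
  }
  // Spread into a new object so the caller's base parameters stay unchanged.
  return { ...params, VocabularyName: name };
}

// Example values only; the vocabulary name below is made up for illustration.
const baseParams: StreamParams = {
  LanguageCode: "en-US",
  MediaEncoding: "pcm",
  MediaSampleRateHertz: 16000,
};
const params = withCustomVocabulary(baseParams, "plant-maintenance-terms");
```

The resulting parameters, together with the audio stream, would be passed to the start command in the same way as in the earlier client snippet.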

Asynchronous file uploads

For handling large audio files or improving user experience, implement an asynchronous upload process:

    1. Create a separate Lambda function for file uploads
    2. Use Amazon S3 presigned URLs to allow direct uploads from the client to Amazon S3
    3. Invoke the upload Lambda function using S3 Event Notifications
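When the upload Lambda function is invoked through S3 Event Notifications, it receives an event listing the new objects. Object keys in these events are URL-encoded (spaces arrive as `+`), which is easy to miss. The following is a minimal sketch of extracting the uploaded objects, using a locally defined event type that mirrors only the fields it reads; `extractUploadedObjects` is a hypothetical helper name.

```typescript
// Minimal local mirror of the S3 event notification fields used here.
interface S3EventLike {
  Records: {
    eventName: string;
    s3: {
      bucket: { name: string };
      object: { key: string; size?: number };
    };
  }[];
}

interface UploadedObject {
  bucket: string;
  key: string;
  size: number;
}

// Returns the objects created by this event, with keys decoded for use in later S3 calls.
function extractUploadedObjects(event: S3EventLike): UploadedObject[] {
  return event.Records
    // Ignore deletions and other non-creation events.
    .filter((record) => record.eventName.startsWith("ObjectCreated:"))
    .map((record) => ({
      bucket: record.s3.bucket.name,
      // Keys are URL-encoded in the notification; "+" stands for a space.
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
      size: record.s3.object.size ?? 0,
    }));
}
```

A handler built on this could loop over the returned objects and start any follow-up processing per file.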

Multi-topic document generation

For generating comprehensive documents covering multiple topics, refer to the following AWS Prescriptive Guidance pattern: Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe. This pattern provides a scalable approach to combining multiple voice inputs into a single, coherent document.

Key benefits of this approach include:

Use captured knowledge as a knowledge base

The knowledge captured through this solution can serve as a valuable, searchable knowledge base for your organization. To maximize its utility, you can integrate with enterprise search solutions such as Amazon Bedrock Knowledge Bases to make the captured knowledge quickly discoverable. Additionally, you can set up regular review and update cycles to keep the knowledge base current and relevant.

Clean up

When you’re done testing the solution, remove it from your AWS account to avoid future costs:

    1. Invoke cdk destroy to remove the solution
    2. You may also need to manually remove the S3 buckets created by the solution

Summary

This post demonstrates the power of combining AWS services such as Amazon Transcribe and Amazon Bedrock with popular front-end frameworks such as React to create a robust knowledge capture solution. By using real-time transcription and generative AI, organizations can efficiently document and preserve valuable institutional knowledge, fostering innovation, improving decision-making, and maintaining a competitive edge in dynamic business environments.

We encourage you to explore this solution further by deploying it in your own environment and adapting it to your organization’s specific needs. The source code and detailed instructions are available in our genai-knowledge-capture-webapp GitHub repository, providing a solid foundation for your knowledge capture initiatives.

By embracing this innovative approach to knowledge capture, organizations can unlock the full potential of their collective wisdom, driving continuous improvement and maintaining their competitive edge.


About the Authors

Jundong Qiao is a Machine Learning Engineer at AWS Professional Services, where he specializes in implementing and enhancing AI/ML capabilities across various sectors. His expertise encompasses building next-generation AI solutions, including chatbots and predictive models that drive efficiency and innovation.

Michael Massey is a Cloud Application Architect at Amazon Web Services. He helps AWS customers achieve their goals by building highly-available and highly-scalable solutions on the AWS Cloud.

Praveen Kumar Jeyarajan is a Principal DevOps Consultant at AWS, supporting Enterprise customers and their journey to the cloud. He has 13+ years of DevOps experience and is skilled in solving myriad technical challenges using the latest technologies. He holds a Master's degree in Software Engineering. Outside of work, he enjoys watching movies and playing tennis.
