AWS Machine Learning Blog, July 16, 2024
Video auto-dubbing using Amazon Translate, Amazon Bedrock, and Amazon Polly

This post describes a video auto-dubbing solution built with Amazon Translate and Amazon Bedrock that aims to lower dubbing costs and improve efficiency. The solution uses Amazon Translate to translate video captions and Amazon Bedrock to further improve translation quality and enable automatic time scaling, keeping audio and video in sync. It also uses Amazon Augmented AI for content review and Amazon Polly to generate synthetic voices. The pipeline is orchestrated with AWS Step Functions and deployed automatically with AWS CloudFormation, making it easy to reuse for dubbing into additional languages.

🤔 **Amazon Translate for caption translation:** Amazon Translate supports more than 75 languages and delivers fast, high-quality, and affordable translation. The solution uses it to translate video captions and applies a custom terminology dictionary so that translations reflect the video's specialized terms and the organization's vocabulary.

🧐 **Amazon Bedrock for improving translation quality:** Amazon Bedrock is a fully managed service that offers high-performing foundation models (FMs) from leading AI companies, along with a broad set of capabilities for building secure, private, and responsible generative AI applications. The solution uses Amazon Bedrock for post-editing, including idiom detection and replacement as well as automatic time scaling.

🚀 **Other technical components:** The solution also uses Amazon Augmented AI for content review, Amazon Polly to generate synthetic voices, AWS Step Functions for orchestration, and AWS CloudFormation for automated deployment.

💡 **Benefits:** The solution offers an efficient, cost-effective approach to video auto-dubbing, helping companies lower dubbing costs, improve efficiency, and reach broader markets.

💪 **Use cases:** The solution suits organizations that need to internationalize video content, such as video streaming platforms, educational institutions, and corporate training providers.

🌎 **Outlook:** In the future, the solution can be extended to support more languages and to provide more refined translation and dubbing capabilities, giving users an even more convenient and efficient path to video internationalization.

This post is co-written with MagellanTV and Mission Cloud. 

Video dubbing, or content localization, is the process of replacing the original spoken language in a video with another language while synchronizing audio and video. Video dubbing has emerged as a key tool in breaking down linguistic barriers, enhancing viewer engagement, and expanding market reach. However, traditional dubbing methods are costly (about $20 per minute with human review effort) and time consuming, making them a common challenge for companies in the Media & Entertainment (M&E) industry. Video auto-dubbing that uses the power of generative artificial intelligence (generative AI) offers creators an affordable and efficient solution.

This post shows you a cost-saving solution for video auto-dubbing. We use Amazon Translate for initial translation of video captions and use Amazon Bedrock for post-editing to further improve the translation quality. Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to help you build generative AI applications with security, privacy, and responsible AI.

MagellanTV, a leading streaming platform for documentaries, wants to broaden its global presence through content internationalization. Faced with manual dubbing challenges and prohibitive costs, MagellanTV sought out AWS Premier Tier Partner Mission Cloud for an innovative solution.

Mission Cloud’s solution distinguishes itself with idiomatic detection and automatic replacement, seamless automatic time scaling, and flexible batch processing capabilities with increased efficiency and scalability.

Solution overview

The following diagram illustrates the solution architecture. The inputs of the solution are specified by the user, including the folder path containing the original video and caption file, target language, and toggles for idiom detector and formality tone. You can specify these inputs in an Excel template and upload the Excel file to a designated Amazon Simple Storage Service (Amazon S3) bucket. This will launch the whole pipeline. The final outputs are a dubbed video file and a translated caption file.
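
The following minimal sketch shows how such an input file could be uploaded with the AWS SDK for Python (Boto3); the bucket and key names here are placeholders for illustration, not values from the published solution.

## Sketch: upload the completed input template to the designated S3 bucket
## to launch the pipeline (bucket and key names are placeholders)
import boto3

s3 = boto3.client('s3')
s3.upload_file(
    Filename='dubbing_job_inputs.xlsx',   # local Excel file with folder path, target language, and toggles
    Bucket='my-dubbing-input-bucket',     # hypothetical designated bucket
    Key='jobs/dubbing_job_inputs.xlsx'    # hypothetical key the pipeline watches
)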

We use Amazon Translate to translate the video caption, and Amazon Bedrock to enhance the translation quality and enable automatic time scaling to synchronize audio and video. We use Amazon Augmented AI for editors to review the content, which is then sent to Amazon Polly to generate synthetic voices for the video. To assign a voice whose gender expression matches the speaker, we developed a model that predicts each speaker's gender expression.
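
To illustrate the voice synthesis step, the following sketch calls Amazon Polly for one translated caption segment; the mapping from predicted gender expression to a specific voice is an assumption made for this example rather than the solution's exact logic.

## Sketch: synthesize one translated caption segment with Amazon Polly
## (the gender-to-voice mapping here is illustrative only)
import boto3

polly = boto3.client('polly')

def synthesize_segment(text, predicted_gender):
    # Choose a German neural voice based on the predicted gender expression (assumed mapping)
    voice_id = 'Vicki' if predicted_gender == 'female' else 'Daniel'
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        Engine='neural',
        LanguageCode='de-DE',
        OutputFormat='mp3')
    return response['AudioStream'].read()

audio_bytes = synthesize_segment('Lassen Sie mich Ihnen etwas zeigen.', 'male')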

In the backend, AWS Step Functions orchestrates the preceding steps as a pipeline. Each step runs on AWS Lambda or AWS Batch. Because the pipeline is defined with AWS CloudFormation, an infrastructure as code (IaC) tool, it can be reused to dub videos into additional languages.
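
As a rough illustration of how the upload can kick off the orchestration, the following sketch shows an AWS Lambda handler that starts a Step Functions execution when the input file lands in Amazon S3; the state machine ARN and payload shape are assumptions, not details from the solution.

## Sketch: Lambda handler that starts the dubbing state machine on an S3 upload event
## (state machine ARN and payload shape are assumptions)
import json
import boto3

sfn = boto3.client('stepfunctions')

def lambda_handler(event, context):
    record = event['Records'][0]['s3']
    execution_input = {
        'bucket': record['bucket']['name'],
        'key': record['object']['key']
    }
    sfn.start_execution(
        stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:video-dubbing',
        input=json.dumps(execution_input)
    )
    return {'statusCode': 200}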

In the following sections, you will learn how to use the unique features of Amazon Translate for setting formality tone and for custom terminology. You will also learn how to use Amazon Bedrock to further improve the quality of video dubbing.

Why choose Amazon Translate?

We chose Amazon Translate to translate the video captions based on three factors: its broad language coverage (more than 75 languages), its fast, high-quality, and cost-effective translations, and its customization features, such as custom terminology and formality settings, which are described in the following sections.

Use Amazon Translate for custom terminology

Amazon Translate allows you to input a custom terminology dictionary, ensuring translations reflect the organization’s vocabulary or specialized terminology. We use the custom terminology dictionary to compile frequently used terms within video transcription scripts.

Here’s an example. In a documentary video, the caption file typically displays “(speaking in foreign language)” on the screen when the interviewee speaks in a foreign language. The phrase “(speaking in foreign language)” isn’t a grammatically complete English sentence: it lacks a subject, yet it’s commonly accepted as an English caption. When the caption is translated into German with the default settings, the output also lacks a subject, which can be confusing to German audiences, as shown in the code block that follows.

## Translate - without custom terminology (default)
import boto3

# Initialize a session of Amazon Translate
translate = boto3.client(service_name='translate', region_name='us-east-1', use_ssl=True)

def translate_text(text, source_lang, target_lang):
    result = translate.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang)
    return result.get('TranslatedText')

text = "(speaking in a foreign language)"
output = translate_text(text, "en", "de")
print(output)
# Output: (in einer Fremdsprache sprechen)

Because the phrase “(speaking in foreign language)” appears frequently in video transcripts, we added it, along with a vetted translation, to the custom terminology CSV file translation_custom_terminology_de.csv and supplied the file to the Amazon Translate job. The translation output is as intended, as shown in the following code.

## Translate - with custom terminology
import boto3

# Initialize a session of Amazon Translate
translate = boto3.client('translate')

# Import the custom terminology file so it can be referenced by name
with open('translation_custom_terminology_de.csv', 'rb') as ct_file:
    translate.import_terminology(
        Name='CustomTerminology_boto3',
        MergeStrategy='OVERWRITE',
        Description='Terminology for Demo through boto3',
        TerminologyData={
            'File': ct_file.read(),
            'Format': 'CSV',
            'Directionality': 'MULTI'
        }
    )

text = "(speaking in foreign language)"
result = translate.translate_text(
    Text=text,
    TerminologyNames=['CustomTerminology_boto3'],
    SourceLanguageCode="en",
    TargetLanguageCode="de")
print(result['TranslatedText'])
# Output: (Person spricht in einer Fremdsprache)
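
For reference, an Amazon Translate custom terminology file is a plain CSV whose header row lists the language codes; a minimal translation_custom_terminology_de.csv covering this example could look like the following.

en,de
(speaking in foreign language),(Person spricht in einer Fremdsprache)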

Set formality tone in Amazon Translate

Some documentary genres tend to be more formal than others. Amazon Translate allows you to define the desired level of formality for translations to supported target languages. By using the default setting (Informal) of Amazon Translate, the translation output in German for the phrase, “[Speaker 1] Let me show you something,” is informal, according to a professional translator.

## Translate - with informal tone (default)
import boto3

# Initialize a session of Amazon Translate
translate = boto3.client(service_name='translate', region_name='us-east-1', use_ssl=True)

def translate_text(text, source_lang, target_lang):
    result = translate.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang)
    return result.get('TranslatedText')

text = "[Speaker 1] Let me show you something."
output = translate_text(text, "en", "de")
print(output)
# Output: [Sprecher 1] Lass mich dir etwas zeigen.

By adding the Formal setting, the output translation has a formal tone, which fits the documentary’s genre as intended.

## Translate - with formal tone
import boto3

# Initialize a session of Amazon Translate
translate = boto3.client(service_name='translate', region_name='us-east-1', use_ssl=True)

def translate_text(text, source_lang, target_lang):
    result = translate.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
        Settings={'Formality': 'FORMAL'})
    return result.get('TranslatedText')

text = "[Speaker 1] Let me show you something."
output = translate_text(text, "en", "de")
print(output)
# Output: [Sprecher 1] Lassen Sie mich Ihnen etwas zeigen.

Use Amazon Bedrock for post-editing

In this section, we use Amazon Bedrock to improve the quality of video captions after we obtain the initial translation from Amazon Translate.

Idiom detection and replacement

Idiom detection and replacement is vital in dubbing English videos to accurately convey cultural nuances. Adapting idioms prevents misunderstandings, enhances engagement, preserves humor and emotion, and ultimately improves the global viewing experience. Hence, we developed an idiom detection function using Amazon Bedrock to resolve this issue.

You can turn the idiom detector on or off by specifying the inputs to the pipeline. For example, for science genres that contain fewer idioms, you can turn the idiom detector off, whereas for genres with more casual conversation, you can turn it on. For a 25-minute video, the total processing time is about 1.5 hours, of which about 1 hour is spent on video preprocessing and composing. Turning the idiom detector on adds only about 5 minutes to the total processing time.

We developed a function, bedrock_api_idiom, that uses Amazon Bedrock to detect and replace idioms. The function first uses Amazon Bedrock LLMs to detect idioms in the text and then replaces them. In the example that follows, Amazon Bedrock successfully detects the idiom in the input text “well, I hustle” and rephrases it as “I work hard,” which Amazon Translate can then translate correctly into Spanish.

## A rare idiom is well-detected and rephrased by Amazon Bedrock
text_rephrased = bedrock_api_idiom(text)
print(text_rephrased)
# Output: I work hard

response = translate_text(text_rephrased, "en", "es-MX")
print(response)
# Output: yo trabajo duro

response = translate_text(response, "es-MX", "en")
print(response)
# Output: I work hard
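
For illustration, a minimal version of such a bedrock_api_idiom helper, sketched here with the Amazon Bedrock Converse API, could look like the following; the model ID and prompt wording are assumptions rather than the solution's exact implementation.

## Sketch of an idiom detection and replacement helper built on Amazon Bedrock
## (model ID and prompt wording are assumptions, not the solution's exact implementation)
import boto3

bedrock = boto3.client('bedrock-runtime')

def bedrock_api_idiom(text, model_id='anthropic.claude-3-sonnet-20240229-v1:0'):
    prompt = (
        'Rewrite the following caption so that any idioms or slang are replaced '
        'with plain, literal English. Keep the meaning and any speaker labels. '
        'Return only the rewritten caption.\n\n' + text
    )
    response = bedrock.converse(
        modelId=model_id,
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
        inferenceConfig={'maxTokens': 256, 'temperature': 0}
    )
    return response['output']['message']['content'][0]['text'].strip()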

Sentence shortening

Time scaling, that is, fitting the dubbed speech into the timing of the original video, can be handled with third-party video dubbing tools, but doing it manually is costly. In our pipeline, we used Amazon Bedrock to develop a sentence shortening algorithm for automatic time scaling.

For example, a typical caption file consists of a section number, timestamp, and the sentence. The following is an example of an English sentence before shortening.

Original sentence:

A large portion of the solar energy that reaches our planet is reflected back into space or absorbed by dust and clouds.

Here’s the same sentence after the sentence shortening algorithm is applied. By using Amazon Bedrock this way, we can significantly improve video-dubbing performance and reduce the human review effort, resulting in cost savings.

Shortened sentence:

A large part of solar energy is reflected into space or absorbed by dust and clouds.
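
A minimal sketch of how such a shortening step could be driven through Amazon Bedrock follows, using the same Converse pattern as the idiom sketch above; the characters-per-second budget and the prompt are assumptions for illustration, not the exact algorithm developed for the pipeline.

## Sketch: shorten a caption so its synthesized audio fits the original time slot
## (the character budget and prompt are assumptions, not the pipeline's exact algorithm)
import boto3

bedrock = boto3.client('bedrock-runtime')

def char_budget(start_seconds, end_seconds, chars_per_second=15):
    # Rough budget of characters that fit the caption's time slot (rate is an assumed constant)
    return int((end_seconds - start_seconds) * chars_per_second)

def shorten_sentence(sentence, max_chars, model_id='anthropic.claude-3-sonnet-20240229-v1:0'):
    prompt = (
        f'Shorten this caption to at most {max_chars} characters while preserving its meaning, '
        f'so the dubbed audio fits the original timing. Return only the shortened caption.\n\n{sentence}'
    )
    response = bedrock.converse(
        modelId=model_id,
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
        inferenceConfig={'maxTokens': 128, 'temperature': 0}
    )
    return response['output']['message']['content'][0]['text'].strip()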

Conclusion

This new and constantly developing pipeline has been a revolutionary step for MagellanTV because it efficiently resolved some challenges they were facing that are common within Media & Entertainment companies in general. The unique localization pipeline developed by Mission Cloud creates a new frontier of opportunities to distribute content across the world while saving on costs. Using generative AI in tandem with brilliant solutions for idiom detection and resolution, sentence length shortening, and custom terminology and tone results in a truly special pipeline bespoke to MagellanTV’s growing needs and ambitions.

If you want to learn more about this use case or have a consultative session with the Mission team to review your specific generative AI use case, feel free to request one through AWS Marketplace.


About the Authors

Na Yu is a Lead GenAI Solutions Architect at Mission Cloud, specializing in developing ML, MLOps, and GenAI solutions in AWS Cloud and working closely with customers. She received her Ph.D. in Mechanical Engineering from the University of Notre Dame.

Max Goff is a data scientist/data engineer with over 30 years of software development experience. A published author, blogger, and music producer, he sometimes dreams in A.I.

Marco Mercado is a Sr. Cloud Engineer specializing in developing cloud native solutions and automation. He holds multiple AWS Certifications and has extensive experience working with high-tier AWS partners. Marco excels at leveraging cloud technologies to drive innovation and efficiency in various projects.

Yaoqi Zhang is a Senior Big Data Engineer at Mission Cloud. She specializes in leveraging AI and ML to drive innovation and develop solutions on AWS. Before Mission Cloud, she worked as an ML and software engineer at Amazon for six years, specializing in recommender systems for Amazon fashion shopping and NLP for Alexa. She received her Master of Science Degree in Electrical Engineering from Boston University.

Adrian Martin is a Big Data/Machine Learning Lead Engineer at Mission Cloud. He has extensive experience in English/Spanish interpretation and translation.

Ryan Ries holds over 15 years of leadership experience in data and engineering, over 20 years of experience working with AI and 5+ years helping customers build their AWS data infrastructure and AI models. After earning his Ph.D. in Biophysical Chemistry at UCLA and Caltech, Dr. Ries has helped develop cutting-edge data solutions for the U.S. Department of Defense and a myriad of Fortune 500 companies.

Andrew Federowicz is the IT and Product Lead Director for Magellan VoiceWorks at MagellanTV. With a decade of experience working in cloud systems and IT in addition to a degree in mechanical engineering, Andrew designs, builds, deploys, and scales inventive solutions to unique problems. Before Magellan VoiceWorks, Andrew architected and built the AWS infrastructure for MagellanTV’s 24/7 globally available streaming app. In his free time, Andrew enjoys sim racing and horology.

Qiong Zhang, PhD, is a Sr. Partner Solutions Architect at AWS, specializing in AI/ML. Her current areas of interest include federated learning, distributed training, and generative AI. She holds 30+ patents and has co-authored 100+ journal/conference papers. She is also the recipient of the Best Paper Award at IEEE NetSoft 2016, IEEE ICC 2011, ONDM 2010, and IEEE GLOBECOM 2005.

Cristian Torres is a Sr. Partner Solutions Architect at AWS. He has 10 years of experience working in technology in several roles, such as Support Engineer, Presales Engineer, Sales Specialist, and Solutions Architect. He works as a generalist with AWS services, focusing on migrations to help strategic AWS Partners develop successfully from a technical and business perspective.
