AWS Machine Learning Blog | February 22
AWS and DXC collaborate to deliver customizable, near real-time voice-to-voice translation capabilities for Amazon Connect

This post explores the scalable voice-to-voice (V2V) translation prototype developed through a collaboration between AWS and DXC Technology to address the difficulty of multilingual customer support in global businesses, covering the challenges faced, the business impact, an overview of the solution, and the issues and optimizations encountered during implementation.

🎯 DXC faces multilingual customer service challenges and needs to address cost and related issues

💡 AWS and DXC collaborated to define requirements, establish baselines, and more

📋 The solution includes key components such as speech recognition and machine translation

🚧 Implementing near real-time voice translation raises user experience issues

🎧 Audio streaming add-ons optimize the customer/agent experience

Providing effective multilingual customer support in global businesses presents significant operational challenges. Through collaboration between AWS and DXC Technology, we’ve developed a scalable voice-to-voice (V2V) translation prototype that transforms how contact centers handle multi-lingual customer interactions.

In this post, we discuss how AWS and DXC used Amazon Connect and other AWS AI services to deliver near real-time V2V translation capabilities.

Challenge: Serving customers in multiple languages

In Q3 2024, DXC Technology approached AWS with a critical business challenge: their global contact centers needed to serve customers in multiple languages without the exponential cost of hiring language-specific agents for the lower-volume languages. DXC had previously explored several existing alternatives but found limitations in each approach, from communication constraints to infrastructure requirements that impacted reliability, scalability, and operational costs. DXC and AWS decided to organize a focused hackathon where DXC and AWS Solutions Architects collaborated to define the requirements and establish performance baselines for the solution.

Business impact

For DXC, this prototype served as an enabler, supporting better use of technical talent, operational transformation, and cost improvements.

Solution overview

The Amazon Connect V2V translation prototype uses AWS advanced speech recognition and machine translation technologies to enable near real-time conversation translation between agents and customers, allowing them to speak in their preferred languages while having natural conversations. It consists of the following key components:

    - Speech recognition – The customer’s or agent’s speech is transcribed in near real time.
    - Machine translation – The transcript is translated into the other party’s language.
    - Text-to-speech – The translated text is synthesized as an audio stream and played back to the other party.
    - Agent web application – Ties the pipeline to the Amazon Connect contact and hosts the controls for the audio streaming add-ons described later in this post.
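
To make the pipeline concrete, the following is a minimal sketch of a single translation hop in Python, assuming a finalized transcript segment has already been produced by a streaming transcription step. It uses the standard boto3 APIs for Amazon Translate and Amazon Polly; the function name, language codes, and voice are illustrative, and the sample project integrates equivalent calls into the Amazon Connect audio path rather than running them standalone.

    import boto3

    # Minimal sketch of one translation hop, assuming a finalized
    # transcript segment from a streaming transcription step. The
    # language codes and voice ID are illustrative defaults.
    translate = boto3.client("translate")
    polly = boto3.client("polly")

    def translate_and_synthesize(transcript: str,
                                 source_lang: str = "es",
                                 target_lang: str = "en",
                                 voice_id: str = "Joanna") -> bytes:
        """Translate a transcript segment and synthesize it as PCM audio."""
        # Machine translation of the finalized transcript segment
        translation = translate.translate_text(
            Text=transcript,
            SourceLanguageCode=source_lang,
            TargetLanguageCode=target_lang,
        )
        # Text-to-speech for the translated text; the PCM output can be
        # streamed into the listener's audio path
        speech = polly.synthesize_speech(
            Text=translation["TranslatedText"],
            OutputFormat="pcm",
            SampleRate="8000",
            VoiceId=voice_id,
        )
        return speech["AudioStream"].read()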

The prototype can be extended with other AWS AI services to further customize the translation capabilities. It’s open source and ready for customization to meet your specific needs.

The following diagram illustrates the solution architecture.

The following screenshot illustrates a sample agent web application.

The user interface consists of three sections.

Challenges when implementing near real-time voice translation

The Amazon Connect V2V sample project was designed to minimize the audio processing time from the moment the customer or agent finishes speaking until the translated audio stream starts. However, even with the shortest audio processing time, the user experience still doesn’t match that of a real conversation in which both parties speak the same language. This is because of the specific pattern in which the customer only hears the agent’s translated speech, and the agent only hears the customer’s translated speech. The following diagram displays that pattern.

The example workflow consists of the following steps:

    1. The customer starts speaking in their own language, and speaks for 10 seconds. Because the agent only hears the customer’s translated speech, the agent first hears 10 seconds of silence.
    2. When the customer finishes speaking, the audio processing takes 1–2 seconds, during which both the customer and agent hear silence.
    3. The customer’s translated speech is streamed to the agent. During that time, the customer hears silence.
    4. When the customer’s translated speech playback is complete, the agent starts speaking, and speaks for 10 seconds. Because the customer only hears the agent’s translated speech, the customer hears 10 seconds of silence.
    5. When the agent finishes speaking, the audio processing takes 1–2 seconds, during which both the customer and agent hear silence.
    6. The agent’s translated speech is streamed to the customer. During that time, the agent hears silence.

In this scenario, the customer hears a single block of 22–24 seconds of complete silence, from the moment they finished speaking until they hear the agent’s translated voice. This creates a suboptimal experience, because the customer might not be certain what is happening during these 22–24 seconds: for instance, whether the agent was able to hear them, or whether there was a technical issue.
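
The arithmetic behind that window is straightforward; here is a quick sketch using the figures from the example (10-second utterances, 1–2 seconds of processing, and translated playback roughly as long as the original utterance):

    # Back-of-the-envelope timeline for the baseline pattern above.
    utterance = 10        # seconds the customer or agent speaks
    playback = 10         # translated speech is roughly as long as the original
    processing = (1, 2)   # audio processing time, best and worst case

    # From the moment the customer stops talking, they wait through:
    # processing of their own speech, playback of their translation to
    # the agent, the agent's 10-second reply, and processing of that reply.
    for label, p in zip(("best", "worst"), processing):
        silence = p + playback + utterance + p
        print(f"{label} case: {silence} seconds of silence for the customer")
    # best case: 22 seconds, worst case: 24 seconds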

Audio streaming add-ons

In a face-to-face conversation between two people who don’t speak the same language, a third person often acts as a translator or interpreter. An example workflow consists of the following steps:

    1. Person A speaks in their own language, which is heard by Person B and the translator.
    2. The translator translates what Person A said into Person B’s language. The translation is heard by Person B and Person A.

Essentially, Person A and Person B hear each other speaking their own language, and they also hear the translation (from the translator). There’s no waiting in silence, which is even more important in non-face-to-face conversations (such as contact center interactions).

To optimize the customer/agent experience, the Amazon Connect V2V sample project implements audio streaming add-ons to simulate a more natural conversation experience. The following diagram illustrates an example workflow.

The workflow consists of the following steps:

    1. The customer starts speaking in their own language, and speaks for 10 seconds. The agent hears the customer’s original voice at a lower volume (“Stream Customer Mic to Agent” enabled).
    2. When the customer finishes speaking, the audio processing takes 1–2 seconds. During that time, the customer and agent hear subtle audio feedback (contact center background noise) at a very low volume (“Audio Feedback” enabled).
    3. The customer’s translated speech is then streamed to the agent. During that time, the customer hears their own translated speech at a lower volume (“Stream Customer Translation to Customer” enabled).
    4. When the customer’s translated speech playback is complete, the agent starts speaking, and speaks for 10 seconds. The customer hears the agent’s original voice at a lower volume (“Stream Agent Mic to Customer” enabled).
    5. When the agent finishes speaking, the audio processing takes 1–2 seconds. During that time, the customer and agent hear subtle audio feedback at a very low volume (“Audio Feedback” enabled).
    6. The agent’s translated speech is then streamed to the customer. During that time, the agent hears their own translated speech at a lower volume (“Stream Agent Translation to Agent” enabled).

In this scenario, the customer hears two short blocks (1–2 seconds) of subtle audio feedback, instead of a single block of 22–24 seconds of complete silence. This pattern is much closer to a face-to-face conversation that includes a translator.
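
One plausible way to implement the “at a lower volume” behavior is to attenuate the side stream’s PCM samples by a gain factor before mixing them into the listener’s audio. The following sketch illustrates that approach; the gain value and function name are illustrative assumptions, not taken from the sample project.

    import numpy as np

    def mix_with_side_stream(primary: np.ndarray,
                             side: np.ndarray,
                             side_gain: float = 0.25) -> np.ndarray:
        """Mix full-volume primary audio with an attenuated side stream.

        Both inputs are int16 PCM sample arrays of equal length; the
        side stream might be the caller's original voice or the subtle
        audio feedback described above. The gain value is illustrative.
        """
        attenuated = (side.astype(np.float32) * side_gain).astype(np.int32)
        mixed = primary.astype(np.int32) + attenuated
        # Clip back into the int16 range to avoid wrap-around distortion
        return np.clip(mixed, -32768, 32767).astype(np.int16)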

The audio streaming add-ons provide additional benefits as well.

Get started with Amazon Connect V2V

Ready to transform your contact center’s communication? Our Amazon Connect V2V sample project is now available on GitHub. We invite you to explore, deploy, and experiment with this powerful prototype. You can use it as a foundation for developing innovative multilingual communication solutions in your own contact center, through the following key steps:

    1. Clone the GitHub repository.
    2. Test different configurations for the audio streaming add-ons.
    3. Review the sample project’s limitations in the README.
    4. Develop your implementation strategy:
       - Implement robust security and compliance controls that meet your organization’s standards.
       - Collaborate with your customer experience team to define your specific use case requirements.
       - Balance automation against the agent’s manual controls (for example, use an Amazon Connect contact flow to automatically set contact attributes for preferred languages and audio streaming add-ons, as shown in the sketch after this list).
       - Use your preferred transcription, translation, and text-to-speech engines, based on specific language support requirements and business, legal, and regional preferences.
       - Plan a phased rollout, starting with a pilot group, then iteratively optimize your transcription custom vocabularies and translation custom terminologies.
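
As an illustration of that contact flow automation, a backend step can stamp preferred languages and add-on settings onto the contact as attributes so the agent workspace can configure itself automatically. This sketch uses the standard boto3 Amazon Connect API; the attribute keys are hypothetical and may not match the keys the sample project actually reads.

    import boto3

    connect = boto3.client("connect")

    # The attribute keys below are hypothetical, for illustration only;
    # check the sample project's README for the keys it actually reads.
    connect.update_contact_attributes(
        InstanceId="<your Amazon Connect instance ID>",
        InitialContactId="<the in-flight contact ID>",
        Attributes={
            "customerLanguage": "es-US",
            "agentLanguage": "en-US",
            "streamCustomerMicToAgent": "true",
            "streamAgentMicToCustomer": "true",
            "audioFeedback": "true",
        },
    )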

Conclusion

The Amazon Connect V2V sample project demonstrates how Amazon Connect and advanced AWS AI services can break down language barriers, enhance operational flexibility, and reduce support costs. Get started now and revolutionize how your contact center communicates across language barriers!


About the Authors

Milos Cosic is a Principal Solutions Architect at AWS.

EJ Ferrell is a Senior Solutions Architect at AWS.

Adam El Tanbouli is a Technical Program Manager for Prototyping and Support Services at DXC Modern Workplace.
