AWS Machine Learning Blog, August 9, 2024
How Cisco accelerated the use of generative AI with Amazon SageMaker Inference


This post is co-authored with Travis Mehlinger and Karthik Raghunathan from Cisco.

Webex by Cisco is a leading provider of cloud-based collaboration solutions, including video meetings, calling, messaging, events, polling, asynchronous video, and customer experience solutions like contact center and purpose-built collaboration devices. Webex’s focus on delivering inclusive collaboration experiences fuels its innovation, which uses artificial intelligence (AI) and machine learning (ML) to remove the barriers of geography, language, personality, and familiarity with technology. Its solutions are underpinned with security and privacy by design. Webex works with the world’s leading business and productivity apps—including AWS.

Cisco’s Webex AI (WxAI) team plays a crucial role in enhancing these products with AI-driven features and functionalities, and in the past year has increasingly focused on building capabilities powered by large language models (LLMs) to improve user productivity and experiences. Notably, the team’s work extends to Webex Contact Center, a cloud-based omni-channel contact center solution that empowers organizations to deliver exceptional customer experiences. By integrating LLMs, the WxAI team enables advanced capabilities such as intelligent virtual assistants, natural language processing (NLP), and sentiment analysis, allowing Webex Contact Center to provide more personalized and efficient customer support. However, as these LLMs grew to contain hundreds of gigabytes of data, the WxAI team faced challenges in efficiently allocating resources and starting applications with the embedded models. To optimize its AI/ML infrastructure, Cisco migrated its LLMs to Amazon SageMaker Inference, improving speed, scalability, and price-performance.

This post highlights how Cisco implemented new functionalities and migrated existing workloads to Amazon SageMaker inference components for their industry-specific contact center use cases. By integrating generative AI, they can now analyze call transcripts to better understand customer pain points and improve agent productivity. Cisco has also implemented conversational AI experiences, including chatbots and virtual agents that can generate human-like responses, to automate personalized communications based on customer context. Additionally, they are using generative AI to extract key call drivers, optimize agent workflows, and gain deeper insights into customer sentiment. Cisco’s adoption of SageMaker Inference has enabled them to streamline their contact center operations and provide more satisfying, personalized interactions that address customer needs.

In this post, we discuss the following:

* How Webex uses generative AI to enhance collaboration and customer engagement
* How the WxAI team optimized resources by migrating its LLMs to SageMaker Inference
* The solution architecture and the contact center Topic Analytics use case
* The benefits Cisco realized and its contributions to SageMaker Inference capabilities

Enhancing collaboration and customer engagement with generative AI: Webex’s AI-powered solutions

In this section, we discuss Cisco’s AI-powered use cases.

Meeting summaries and insights

For Webex Meetings, the platform uses generative AI to automatically summarize meeting recordings and transcripts. This extracts the key takeaways and action items, helping distributed teams stay informed even if they missed a live session. The AI-generated summaries provide a concise overview of important discussions and decisions, allowing employees to quickly get up to speed. Beyond summaries, Webex’s generative AI capabilities also surface intelligent insights from meeting content. This includes identifying action items, highlighting critical decisions, and generating personalized meeting notes and to-do lists for each participant. These insights help make meetings more productive and hold attendees accountable.
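The post doesn’t include implementation details, but at its core a feature like this amounts to sending the transcript to an LLM behind a SageMaker endpoint with a summarization prompt. The following is a minimal sketch; the endpoint name and the request/response schema are hypothetical and depend on the deployed model container:

```python
import json

import boto3

# Runtime client for invoking real-time SageMaker endpoints
runtime = boto3.client("sagemaker-runtime")


def summarize_meeting(transcript: str) -> str:
    """Ask an LLM hosted on a SageMaker endpoint for a meeting summary."""
    prompt = (
        "Summarize the following meeting transcript. "
        "List the key decisions and the action items per participant.\n\n"
        f"{transcript}"
    )
    response = runtime.invoke_endpoint(
        EndpointName="webex-meeting-summarizer",  # hypothetical endpoint name
        ContentType="application/json",
        # The request schema varies by serving container; this assumes a
        # Hugging Face TGI-style {"inputs": ..., "parameters": ...} payload.
        Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 512}}),
    )
    return json.loads(response["Body"].read())[0]["generated_text"]
```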

Enhancing contact center experiences

Webex is also applying generative AI to its contact center solutions, enabling more natural, human-like conversations between customers and agents. The AI can generate contextual, empathetic responses to customer inquiries, as well as automatically draft personalized emails and chat messages. This helps contact center agents work more efficiently while maintaining a high level of customer service.

Webex customers realize positive outcomes with generative AI

Webex’s adoption of generative AI is driving tangible benefits for customers. Clients using the platform’s AI-powered meeting summaries and insights have reported productivity gains. Webex customers using the platform’s generative AI for contact centers have handled hundreds of thousands of calls with improved customer satisfaction and reduced handle times, enabling more natural, empathetic conversations between agents and clients. Webex’s strategic integration of generative AI is empowering users to work smarter and deliver exceptional experiences.

For more details on how Webex is harnessing generative AI to enhance collaboration and customer engagement, see Webex | Exceptional Experiences for Every Interaction on the Webex blog.

Using SageMaker Inference to optimize resources for Cisco

Cisco’s WxAI team is dedicated to delivering advanced collaboration experiences powered by cutting-edge ML. The team develops a comprehensive suite of AI and ML features for the Webex ecosystem, including audio intelligence capabilities like noise removal and optimizing speaker voices, language intelligence for transcription and translation, and video intelligence features like virtual backgrounds. At the forefront of WxAI’s innovations is the AI-powered Webex Assistant, a virtual assistant that provides voice-activated control and seamless meeting support in multiple languages. To build these sophisticated capabilities, WxAI uses LLMs, which can contain up to hundreds of gigabytes of training data.

Initially, WxAI embedded LLM models directly into the application container images running on Amazon Elastic Kubernetes Service (Amazon EKS). However, as the models grew larger and more complex, this approach faced significant scalability and resource utilization challenges. Operating the resource-intensive LLMs through the applications required provisioning substantial compute resources, which slowed down processes like allocating resources and starting applications. This inefficiency hampered WxAI’s ability to rapidly develop, test, and deploy new AI-powered features for the Webex portfolio. To address these challenges, the WxAI team turned to SageMaker Inference—a fully managed AI inference service that allows seamless deployment and scaling of models independently from the applications that use them. By decoupling the LLM hosting from the Webex applications, WxAI could provision the necessary compute resources for the models without impacting the core collaboration and communication capabilities.

“The applications and the models work and scale fundamentally differently, with entirely different cost considerations; by separating them rather than lumping them together, it’s much simpler to solve issues independently.”

– Travis Mehlinger, Principal Engineer at Cisco.

This architectural shift has enabled Webex to harness the power of generative AI across its suite of collaboration and customer engagement solutions.

Solution overview: Improving efficiency and reducing costs by migrating to SageMaker Inference

To address the scalability and resource utilization challenges faced with embedding LLMs directly into their applications, the WxAI team migrated to SageMaker Inference. By taking advantage of this fully managed service for deploying LLMs, Cisco unlocked significant performance and cost-optimization opportunities. Key benefits include the ability to deploy multiple LLMs behind a single endpoint for faster scaling and improved response latencies, as well as cost savings. Additionally, the WxAI team implemented an LLM proxy to simplify access to LLMs for Webex teams, enable centralized data collection, and reduce operational overhead. With SageMaker Inference, Cisco can efficiently manage and scale their LLM deployments, harnessing the power of generative AI across the Webex portfolio while maintaining optimal performance, scalability, and cost-effectiveness.
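To make “multiple LLMs behind a single endpoint” concrete, the sketch below uses the SageMaker inference components API, which lets several models share one endpoint’s instances while each model’s compute reservation and copy count scale independently. The endpoint name, instance type, and resource sizings are illustrative assumptions, not Cisco’s actual configuration:

```python
import boto3

sm = boto3.client("sagemaker")

ENDPOINT_NAME = "wxai-shared-llm-endpoint"  # hypothetical

# One endpoint config whose instances are shared by multiple models.
sm.create_endpoint_config(
    EndpointConfigName=f"{ENDPOINT_NAME}-config",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "InstanceType": "ml.g5.12xlarge",
        "InitialInstanceCount": 1,
        "ManagedInstanceScaling": {
            "Status": "ENABLED",
            "MinInstanceCount": 1,
            "MaxInstanceCount": 30,
        },
        "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"},
    }],
)
sm.create_endpoint(
    EndpointName=ENDPOINT_NAME,
    EndpointConfigName=f"{ENDPOINT_NAME}-config",
)

# Each model becomes an inference component that reserves a slice of the
# endpoint's accelerators and can be scaled independently of the others.
for model_name, gpus in [("summarizer-llm", 2), ("topic-llm", 1)]:
    sm.create_inference_component(
        InferenceComponentName=f"{model_name}-ic",
        EndpointName=ENDPOINT_NAME,
        VariantName="AllTraffic",
        Specification={
            "ModelName": model_name,  # assumes a SageMaker Model already exists
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": gpus,
                "MinMemoryRequiredInMb": 16384,
            },
        },
        RuntimeConfig={"CopyCount": 1},
    )
```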

The following diagram illustrates the WxAI architecture on AWS.

The architecture is built on a robust and secure AWS foundation:

* The architecture uses AWS services such as Application Load Balancer, AWS WAF, and Amazon EKS clusters for seamless inbound traffic handling, threat mitigation, and containerized workload management.
* The LLM proxy, a microservice deployed on EKS pods within the service VPC, simplifies LLM integration for Webex teams, providing a streamlined interface and reducing operational overhead. It supports LLMs deployed on SageMaker Inference, Amazon Bedrock, or other LLM providers.
* The architecture uses SageMaker Inference for optimized model deployment, auto scaling, and routing mechanisms.
* The system integrates Loki for logging, Amazon Managed Service for Prometheus for metrics, and Grafana for unified visualization, with seamless Cisco SSO integration.
* A data VPC hosts the data layer components, including Amazon ElastiCache for caching and Amazon Relational Database Service (Amazon RDS) for data storage.
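The post describes the LLM proxy only at this high level. As a rough illustration, here is a minimal sketch of such a routing microservice; FastAPI, the registry layout, and all model and endpoint names are our assumptions, not details from the post:

```python
import json

import boto3
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
smr = boto3.client("sagemaker-runtime")
bedrock = boto3.client("bedrock-runtime")

# Hypothetical registry mapping a logical model ID to its hosting backend.
MODEL_REGISTRY = {
    "summarizer": {
        "backend": "sagemaker",
        "endpoint": "wxai-shared-llm-endpoint",
        "component": "summarizer-llm-ic",
    },
    "assistant": {
        "backend": "bedrock",
        "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    },
}


class GenerateRequest(BaseModel):
    model: str
    prompt: str


@app.post("/v1/generate")
def generate(req: GenerateRequest) -> dict:
    """Route a request to whichever provider hosts the model, giving callers
    one interface and a single place to collect usage data."""
    target = MODEL_REGISTRY[req.model]
    if target["backend"] == "sagemaker":
        resp = smr.invoke_endpoint(
            EndpointName=target["endpoint"],
            # Selects one model on an endpoint shared via inference components
            InferenceComponentName=target["component"],
            ContentType="application/json",
            Body=json.dumps({"inputs": req.prompt}),
        )
        return {"output": resp["Body"].read().decode()}
    # Bedrock's Converse API offers a uniform schema across Bedrock models.
    resp = bedrock.converse(
        modelId=target["model_id"],
        messages=[{"role": "user", "content": [{"text": req.prompt}]}],
    )
    return {"output": resp["output"]["message"]["content"][0]["text"]}
```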

Use case overview: Contact center topic analytics

A key focus area for the WxAI team is to enhance the capabilities of the Webex Contact Center platform. A typical Webex Contact Center installation has hundreds of agents handling many interactions through channels like phone calls and digital messaging. Webex’s AI-powered Topic Analytics feature extracts the key reasons customers are calling by analyzing aggregated historical interactions and clustering them into meaningful topic categories, as shown in the following screenshot. The contact center administrator can then use these insights to optimize operations, enhance agent performance, and ultimately deliver a more satisfying customer experience.

The Topic Analytics feature is powered by a pipeline of three models: a call driver extraction model, a topic clustering model, and a topic labeling model, as illustrated in the following diagram.

Each stage feeds the next: the call driver extraction model distills each transcript into a concise reason for the call, the topic clustering model groups similar call drivers together, and the topic labeling model assigns a human-readable name to each cluster.
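The post doesn’t detail the individual models, so the sketch below only illustrates how such a three-stage pipeline could be chained together; the endpoint names and payload schema are hypothetical, and scikit-learn clustering stands in for the real topic clustering model:

```python
import json

import boto3
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

runtime = boto3.client("sagemaker-runtime")


def invoke_llm(endpoint_name: str, prompt: str) -> str:
    """Call an LLM behind a SageMaker endpoint (hypothetical schema)."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": prompt}),
    )
    return json.loads(response["Body"].read())[0]["generated_text"]


def topic_analytics(transcripts: list[str], n_topics: int = 10) -> dict[str, list[str]]:
    # Stage 1: distill each transcript into a concise call driver.
    drivers = [
        invoke_llm("call-driver-extractor",
                   f"In one sentence, state why the customer called:\n\n{t}")
        for t in transcripts
    ]
    # Stage 2: cluster similar call drivers (stand-in for the real model).
    vectors = TfidfVectorizer().fit_transform(drivers)
    labels = KMeans(n_clusters=n_topics, n_init="auto").fit_predict(vectors)
    # Stage 3: have an LLM name each cluster from a sample of its members.
    topics: dict[str, list[str]] = {}
    for cluster_id in range(n_topics):
        members = [d for d, lbl in zip(drivers, labels) if lbl == cluster_id]
        label = invoke_llm("topic-labeler",
                           "Give a short topic label for these call reasons:\n"
                           + "\n".join(members[:20]))
        topics[label] = members
    return topics
```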

This solution also used the auto scaling capabilities of SageMaker to dynamically adjust the number of instances, with a minimum of 1 and a maximum of 30. This approach provides efficient resource utilization while maintaining high throughput, allowing the WxAI platform to handle batch jobs overnight and scale to hundreds of inferences per minute during peak hours. By deploying the model on SageMaker Inference with auto scaling, the WxAI team was able to deliver reliable and accurate responses to customer interactions for their Topic Analytics use case.

By accurately pinpointing the call driver, the system can suggest appropriate actions, resources, and next steps to the agent, streamlining the customer support process and leading to more personalized and accurate responses to customer questions.

To handle fluctuating demand and optimize resource utilization, the WxAI team implemented auto scaling for their SageMaker Inference endpoints. They configured the endpoints to scale from a minimum to a maximum instance count based on GPU utilization. Additionally, the LLM proxy routed requests between the different LLMs deployed on SageMaker Inference. This proxy abstracts the complexities of communicating with various LLM providers and enables centralized data collection and analysis. This led to enhanced generative AI workflows, optimized latency, and personalized use case implementations.
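The post doesn’t publish the exact policy, but a GPU-utilization-based target-tracking configuration for a SageMaker endpoint variant, using the 1-to-30 instance range mentioned earlier, would look roughly like the following; the endpoint and variant names, target value, and cooldowns are assumptions to be tuned per workload:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

ENDPOINT = "wxai-shared-llm-endpoint"  # hypothetical
VARIANT = "AllTraffic"
resource_id = f"endpoint/{ENDPOINT}/variant/{VARIANT}"

# Register the variant's instance count as a scalable target: 1..30 instances.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=30,
)

# Target-tracking policy on the endpoint's average GPU utilization.
autoscaling.put_scaling_policy(
    PolicyName="gpu-utilization-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,  # assumed utilization target
        "CustomizedMetricSpecification": {
            "MetricName": "GPUUtilization",
            "Namespace": "/aws/sagemaker/Endpoints",
            "Dimensions": [
                {"Name": "EndpointName", "Value": ENDPOINT},
                {"Name": "VariantName", "Value": VARIANT},
            ],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,  # seconds; assumed
        "ScaleOutCooldown": 60,  # seconds; assumed
    },
)
```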

Benefits

Through the strategic adoption of AWS AI services, Cisco’s WxAI team has realized significant benefits, enabling them to build cutting-edge, AI-powered collaboration capabilities more rapidly and cost-effectively:

* Faster scaling and improved response latencies by serving multiple LLMs behind a single SageMaker endpoint
* Cost savings from decoupling model hosting from applications and right-sizing compute with auto scaling
* Reduced operational overhead and centralized data collection through the LLM proxy
* Faster development, testing, and deployment of new AI-powered features across the Webex portfolio

Cisco’s contributions to SageMaker Inference: Enhancing generative AI inference capabilities

Building upon the success of their strategic migration to SageMaker Inference, Cisco has partnered closely with the SageMaker Inference team to build and enhance key generative AI capabilities within the SageMaker platform. Since the early days of generative AI, Cisco has provided the SageMaker Inference team with valuable input and expertise, enabling the introduction of several new features and optimizations.

By closely partnering with the SageMaker Inference team, Cisco has played a pivotal role in driving the rapid evolution of generative AI Inference capabilities in SageMaker. The features and optimizations introduced through this collaboration are empowering AWS customers to unlock the transformative potential of generative AI with greater ease, cost-effectiveness, and performance.

“Our partnership with the SageMaker Inference product team goes back to the early days of generative AI, and we believe the features we have built in collaboration, from cost optimizations to high-performance model deployment, will broadly help other enterprises rapidly adopt and scale generative AI workloads on SageMaker, unlocking new frontiers of innovation and business transformation.”

– Travis Mehlinger, Principal Engineer at Cisco.

Conclusion

By using AWS services like SageMaker Inference and Amazon Bedrock for generative AI, Cisco’s WxAI team has been able to optimize their AI/ML infrastructure, enabling them to build and deploy AI-powered features more efficiently, reliably, and cost-effectively. This strategic approach has unlocked significant benefits for Cisco in deploying and scaling its generative AI capabilities for the Webex platform. Cisco’s journey with generative AI, as showcased in this post, offers valuable lessons and insights for other organizations adopting SageMaker Inference.

Recognizing the impact of generative AI, Cisco has played a crucial role in shaping the future of these capabilities within SageMaker Inference. By providing valuable insights and hands-on collaboration, Cisco has helped AWS develop a range of powerful features that are making generative AI more accessible and scalable for organizations. From optimizing infrastructure costs and performance to streamlining model deployment and scaling, Cisco’s contributions have been instrumental in enhancing the SageMaker Inference service.

Moving forward, the Cisco-AWS partnership aims to drive further advancements in areas like conversational and generative AI inference. As generative AI adoption accelerates across industries, Cisco’s Webex platform is designed to scale and streamline user experiences through the use cases discussed in this post and beyond. You can expect ongoing innovation in SageMaker Inference capabilities from this collaboration, as Cisco and the SageMaker Inference team continue to push the boundaries of what’s possible in the world of AI.

For more information on Webex Contact Center’s Topic Analytics feature and related AI capabilities, refer to The Webex Advantage: Navigating Customer Experience in the Age of AI on the Webex blog.


About the Authors

Travis Mehlinger is a Principal Software Engineer in the Webex Collaboration AI group, where he helps teams develop and operate cloud-centered AI and ML capabilities to support Webex AI features for customers around the world. In his spare time, Travis enjoys cooking barbecue, playing video games, and traveling around the US and UK to race go-karts.

Karthik Raghunathan is the Senior Director for Speech, Language, and Video AI in the Webex Collaboration AI Group. He leads a multidisciplinary team of software engineers, machine learning engineers, data scientists, computational linguists, and designers who develop advanced AI-driven features for the Webex collaboration portfolio. Prior to Cisco, Karthik held research positions at MindMeld (acquired by Cisco), Microsoft, and Stanford University.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Ravi Thakur is a Senior Solutions Architect at AWS, based in Charlotte, NC. He specializes in solving complex business challenges using distributed, cloud-centered, and well-architected patterns. Ravi’s expertise includes microservices, containerization, AI/ML, and generative AI. He empowers AWS strategic customers on digital transformation journeys, delivering bottom-line benefits. In his spare time, Ravi enjoys motorcycle rides, family time, reading, movies, and traveling.

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Madhur Prashant is an AI and ML Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.
