AWS Machine Learning Blog, December 5, 2024
Real value, real time: Production AI with Amazon SageMaker and Tecton

 

This post explores how Tecton and Amazon SageMaker simplify the development and deployment of production-grade, real-time AI applications. Many AI projects struggle to reach production; combining Tecton and SageMaker addresses this by abstracting away engineering complexity and accelerating time to value. Using fraud detection as an example, the post shows how to manage features with Tecton and train and deploy models with SageMaker to meet the low-latency requirements of real-time applications. It also describes how the architecture extends to generative AI use cases, such as pairing an LLM with customer support. With Tecton's declarative framework, developers can streamline feature engineering and focus on building new AI capabilities, speeding up the development and deployment of AI applications.

🤔 **Productionization challenges for AI:** Many machine learning prototypes never reach production, and the rate is even lower for generative AI. This is largely because building and managing accurate, reliable AI applications is complex, requiring substantial engineering work across data pipelines, compute infrastructure, model serving, and more.

🚀 **Tecton and SageMaker together:** Combining Tecton and SageMaker simplifies AI application development and deployment. Tecton handles feature management and computation, while SageMaker handles model training and deployment; together they form a complete AI development workflow that accelerates time to value.

📊 **Feature management and online serving:** Tecton provides a declarative framework that simplifies defining and managing features, supports both offline and online feature serving, and easily handles batch, streaming, and real-time data. Tecton also offers enterprise-grade feature store capabilities for data lineage tracking and data quality monitoring.

⏱️ **Real-time fraud detection example:** Using fraud detection as an example, the post shows how to build a real-time AI application with Tecton and SageMaker that meets low-latency requirements: user behavior features are retrieved from Tecton and sent to a model on SageMaker for prediction, enabling real-time fraud detection.

💡 **Extending to generative AI:** The existing Tecton and AWS architecture extends readily to generative AI use cases, such as LLM-powered customer support. Tecton can supply contextual data or features to the LLM, improving the performance and efficiency of generative AI applications.

This post is cowritten with Isaac Cameron and Alex Gnibus from Tecton.

Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases make it to production.

ROI isn’t just about getting to production—it’s about model accuracy and performance. You need a scalable, reliable system with high accuracy and low latency for the real-time use cases that directly impact the bottom line every millisecond.

Fraud detection, for example, requires extremely low latency because decisions need to be made in the time it takes to swipe a credit card. With fraud on the rise, more organizations are pushing to implement successful fraud detection systems. The US nationwide fraud losses topped $10 billion in 2023, a 14% increase from 2022. Global ecommerce fraud is predicted to exceed $343 billion by 2027.

But building and managing an accurate, reliable AI application that can make a dent in that $343 billion problem is overwhelmingly complex.

ML teams often start by manually stitching together different infrastructure components. This can seem straightforward at first for batch data, but the engineering gets much more complicated when you need to move from batch data to incorporating real-time and streaming data sources, and from batch inference to real-time serving.

Engineers need to build and orchestrate the data pipelines, juggle the different processing needs for each data source, manage the compute infrastructure, build reliable serving infrastructure for inference, and more. Without the capabilities of Tecton, the architecture might look like the following diagram.

Accelerate your AI development and deployment with Amazon SageMaker and Tecton

All that manual complexity gets simplified with Tecton and Amazon SageMaker. Together, Tecton and SageMaker abstract away the engineering needed for production, real-time AI applications. This enables faster time to value, and engineering teams can focus on building new features and use cases instead of struggling to manage the existing infrastructure.

Using SageMaker, you can build, train and deploy ML models. Meanwhile, Tecton makes it straightforward to compute, manage, and retrieve features to power models in SageMaker, both for offline training and online serving. This streamlines the end-to-end feature lifecycle for production-scale use cases, resulting in a simpler architecture, as shown in the following diagram.

How does it work? With Tecton’s simple-to-use declarative framework, you define the transformations for your features in a few lines of code, and Tecton builds the pipelines needed to compute, manage, and serve the features. Tecton takes care of the full deployment into production and online serving.
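To make the declarative pattern concrete, here is a toy, stdlib-only sketch of a registry of feature transformations. The decorator, registry, and feature names are hypothetical stand-ins, not Tecton's actual API; Tecton's framework additionally builds and runs the pipelines and serving infrastructure behind such definitions.

```python
# Illustrative only: a toy "declarative registry" mimicking the pattern where a
# decorated transformation is defined once and the platform handles the rest.
# Names here are hypothetical, not Tecton's API.
FEATURE_REGISTRY = {}

def feature_view(name):
    """Register a feature transformation under a name (stand-in for a platform decorator)."""
    def wrap(fn):
        FEATURE_REGISTRY[name] = fn
        return fn
    return wrap

@feature_view("user_transaction_stats")
def user_transaction_stats(transactions):
    """Compute simple aggregate features from a user's transaction records."""
    amounts = [t["amount"] for t in transactions]
    return {
        "txn_count": len(amounts),
        "total_spend": round(sum(amounts), 2),
        "max_txn": max(amounts) if amounts else 0.0,
    }

txns = [{"amount": 25.0}, {"amount": 110.5}, {"amount": 9.99}]
print(FEATURE_REGISTRY["user_transaction_stats"](txns))
```

Once registered, the same definition can back both offline training-data generation and online serving, which is the property the framework relies on.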

It doesn’t matter if it’s batch, streaming, or real-time data or whether it’s offline or online serving. It’s one common framework for every data processing need in end-to-end feature production.

This framework creates a central hub for feature management and governance with enterprise feature store capabilities, making it straightforward to observe the data lineage for each feature pipeline, monitor data quality, and reuse features across multiple models and teams.

The following diagram shows the Tecton declarative framework.

The next section examines a fraud detection example to show how Tecton and SageMaker accelerate both training and real-time serving for a production AI system.

Streamline feature development and model training

First, you need to develop the features and train the model. Tecton's declarative framework makes it simple to define features and generate accurate training data for SageMaker models.
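Generating accurate training data hinges on point-in-time correctness: each training row must use only the feature values that were known at that event's timestamp. The stdlib-only sketch below shows the core "as-of join" idea; the function and data are hypothetical simplifications of what a feature platform performs at scale.

```python
# A minimal sketch of a point-in-time ("as-of") join: for each labeled event,
# pick the latest feature value computed at or before the event time, so no
# future information leaks into training data. Illustrative, not Tecton's code.
from bisect import bisect_right

def as_of_join(feature_history, event_times):
    """feature_history: time-sorted list of (timestamp, value); returns the value as of each event."""
    ts = [t for t, _ in feature_history]
    results = []
    for event_t in event_times:
        i = bisect_right(ts, event_t) - 1
        results.append(feature_history[i][1] if i >= 0 else None)
    return results

history = [(1, 0.2), (5, 0.7), (9, 0.4)]  # e.g., a fraud-risk feature over time
print(as_of_join(history, [0, 5, 7, 12]))  # [None, 0.7, 0.7, 0.4]
```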

Next, the features need to be served online for the final model to consume in production.

Serve features with robust, real-time online inference

Tecton’s declarative framework extends to online serving. Tecton’s real-time infrastructure is designed to meet the demands of large-scale applications and can reliably serve 100,000 requests per second.

For critical ML apps, it’s hard to meet demanding service level agreements (SLAs) in a scalable and cost-efficient manner. Real-time use cases such as fraud detection typically have a p99 latency budget of 100 to 200 milliseconds. That means 99% of requests need to be faster than 200ms for the end-to-end process from feature retrieval to model scoring and post-processing.

Feature serving only gets a fraction of that end-to-end latency budget, which means you need your solution to be especially quick. Tecton accommodates these latency requirements by integrating with both disk-based and in-memory data stores, supporting in-memory caching, and serving features for inference through a low-latency REST API, which integrates with SageMaker endpoints.
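To make the SLA arithmetic concrete, here is a small illustrative sketch that computes a nearest-rank p99 from latency samples and checks it against an assumed feature-serving sub-budget. The 40 ms figure and the sample latencies are assumptions for illustration, not Tecton guarantees.

```python
# Illustrative: compute a nearest-rank p99 latency from request samples and
# compare it against the slice of the end-to-end budget reserved for feature
# serving (the 40 ms sub-budget is an assumption for this sketch).
def percentile(samples, pct):
    """Nearest-rank percentile: the value at or below which pct% of samples fall."""
    ordered = sorted(samples)
    rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[rank]

END_TO_END_BUDGET_MS = 200      # p99 budget for the whole request
FEATURE_SERVING_BUDGET_MS = 40  # assumed fraction reserved for feature retrieval

latencies_ms = [12, 15, 18, 22, 25, 27, 30, 33, 35, 38]  # sample feature-fetch times
p99 = percentile(latencies_ms, 99)
print(p99, p99 <= FEATURE_SERVING_BUDGET_MS)
```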

Now we can complete our fraud detection use case. In a fraud detection system, when someone makes a transaction (such as buying something online), your app might follow these steps:

1. It checks with other services to get more information (for example, “Is this merchant known to be risky?”) from third-party APIs.
2. It pulls important historical data about the user and their behavior (for example, “How often does this person usually spend this much?” or “Have they made purchases from this location before?”), requesting the ML features from Tecton.
3. It will likely use streaming features to compare the current transaction with recent spending activity over the last few hours or minutes.
4. It sends all this information to the model hosted on Amazon SageMaker that predicts whether the transaction looks fraudulent.

This process is shown in the following diagram.
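The steps above can be sketched end to end with every external call stubbed out. The function names, feature values, and scoring thresholds below are hypothetical; in production the feature lookup would be a low-latency Tecton REST call and the scoring an invocation of a SageMaker endpoint.

```python
# A stubbed sketch of the fraud-check flow. Every external dependency
# (third-party risk API, Tecton feature retrieval, SageMaker inference) is
# replaced with a local stand-in; names and thresholds are hypothetical.
def check_merchant_risk(merchant_id):
    """Stand-in for a third-party merchant-risk API call."""
    return {"risky_merchant": merchant_id in {"m-risky-01"}}

def get_tecton_features(user_id):
    """Stand-in for retrieving historical/streaming features from Tecton."""
    return {"avg_spend_30d": 42.0, "txn_count_1h": 3}

def score_with_sagemaker(features):
    """Stand-in for a SageMaker endpoint; returns a fraud probability."""
    score = 0.05
    if features["risky_merchant"]:
        score += 0.5
    if features["amount"] > 10 * features["avg_spend_30d"]:
        score += 0.3
    return min(score, 1.0)

def check_transaction(user_id, merchant_id, amount):
    """Assemble enrichment data plus ML features, then score the transaction."""
    features = {"amount": amount}
    features.update(check_merchant_risk(merchant_id))
    features.update(get_tecton_features(user_id))
    return score_with_sagemaker(features)

print(check_transaction("u-1", "m-risky-01", 500.0))
```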

Expand to generative AI use cases with your existing AWS and Tecton architecture

After you’ve developed ML features using the Tecton and AWS architecture, you can extend your ML work to generative AI use cases.

For instance, in the fraud detection example, you might want to add an LLM-powered customer support chat that helps a user answer questions about their account. To generate a useful response, the chat would need to reference different data sources, including the unstructured documents in your knowledge base (such as policy documentation about what causes an account suspension) and structured data such as transaction history and real-time account activity.

If you’re using a Retrieval Augmented Generation (RAG) system to provide context to your LLM, you can use your existing ML feature pipelines as context. With Tecton, you can either enrich your prompts with contextual data or provide features as tools to your LLM—all using the same declarative framework.
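As a minimal sketch of prompt enrichment, the snippet below fills an LLM prompt template with structured feature values before the model is called. The template, feature names, and values are hypothetical; in a real system they would come from the same Tecton feature pipelines described earlier.

```python
# Illustrative: enriching an LLM prompt with structured feature values.
# Template and feature names are hypothetical, chosen for this sketch.
PROMPT_TEMPLATE = (
    "You are a support assistant for a payments product.\n"
    "Customer context:\n"
    "- Transactions in the last 24h: {txn_count_24h}\n"
    "- Most recent decline reason: {last_decline_reason}\n\n"
    "Customer question: {question}\n"
)

def build_prompt(features, question):
    """Fill the template with retrieved feature values before calling the LLM."""
    return PROMPT_TEMPLATE.format(question=question, **features)

prompt = build_prompt(
    {"txn_count_24h": 7, "last_decline_reason": "suspected_fraud_hold"},
    "Why was my card declined?",
)
print(prompt)
```

The same idea applies whether the enriched prompt goes to a foundation model on Amazon Bedrock or a model hosted on SageMaker.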

To choose and customize the model that will best suit your use case, Amazon Bedrock provides a range of pre-trained foundation models (FMs) for inference, or you can use SageMaker for more extensive model building and training.

The following graphic shows how Amazon Bedrock is incorporated to support generative AI capabilities in the fraud detection system architecture.

Build valuable AI apps faster with AWS and Tecton

In this post, we walked through how SageMaker and Tecton enable AI teams to train and deploy a high-performing, real-time AI application—without the complex data engineering work. Tecton combines production ML capabilities with the convenience of doing everything from within SageMaker, whether that’s at the development stage for training models or doing real-time inference in production.

To get started, refer to Getting Started with Amazon SageMaker & Tecton’s Feature Platform, a more detailed guide on how to use Tecton with Amazon SageMaker. And if you can’t wait to try it yourself, check out the Tecton interactive demo and observe a fraud detection use case in action.

You can also find Tecton at AWS re:Invent. Reach out to set up a meeting with experts onsite about your AI engineering needs.


About the Authors

Isaac Cameron is Lead Solutions Architect at Tecton, guiding customers in designing and deploying real-time machine learning applications. Having previously built a custom ML platform from scratch at a major U.S. airline, he brings firsthand experience of the challenges and complexities involved—making him a strong advocate for leveraging modern, managed ML/AI infrastructure.

Alex Gnibus is a technical evangelist at Tecton, making technical concepts accessible and actionable for engineering teams. Through her work educating practitioners, Alex has developed deep expertise in identifying and addressing the practical challenges teams face when productionizing AI systems.

Arnab Sinha is a Senior Solutions Architect at AWS, specializing in designing scalable solutions that drive business outcomes in AI, machine learning, big data, digital transformation, and application modernization. With expertise across industries like energy, healthcare, retail and manufacturing, Arnab holds all AWS Certifications, including the ML Specialty, and has led technology and engineering teams before joining AWS.
