AWS Machine Learning Blog, December 5, 2024
Real value, real time: Production AI with Amazon SageMaker and Tecton

 

This post explores how Tecton and Amazon SageMaker simplify the development and deployment of production-grade, real-time AI applications. Many AI projects struggle to reach production; combining Tecton and SageMaker addresses this by abstracting away engineering complexity and accelerating time to value. Using fraud detection as an example, the post shows how to manage features with Tecton and train and deploy models with SageMaker to meet the low-latency requirements of real-time applications. It also describes how the architecture extends to generative AI use cases, such as pairing an LLM with customer support. With Tecton's declarative framework, developers can streamline feature engineering and focus on building new AI capabilities, speeding up the development and deployment of AI applications.

🤔 **Productionization challenges for AI:** Many machine learning prototypes never reach production, and the rate is even lower for generative AI. This is largely because building and managing accurate, reliable AI applications is complex, requiring substantial engineering work across data pipelines, compute infrastructure, model serving, and more.

🚀 **Tecton and SageMaker together:** Combining Tecton and SageMaker simplifies AI application development and deployment. Tecton handles feature management and computation, while SageMaker handles model training and deployment; together they form a complete AI development workflow that accelerates time to value.

📊 **Feature management and online serving:** Tecton provides a declarative framework that simplifies defining and managing features, supports both offline and online feature serving, and easily handles batch, streaming, and real-time data. Tecton also offers enterprise-grade feature store capabilities for data lineage tracking and data quality monitoring.

⏱️ **Real-time fraud detection example:** Using fraud detection as an example, the post shows how to build a real-time AI application with Tecton and SageMaker that meets low-latency requirements: user behavior features are retrieved from Tecton and sent to a model on SageMaker for prediction, enabling real-time fraud detection.

💡 **Extending to generative AI:** The existing Tecton and AWS architecture extends readily to generative AI use cases, such as LLM-powered customer support. Tecton can supply contextual data or features to the LLM, improving the performance and efficiency of generative AI applications.

This post is cowritten with Isaac Cameron and Alex Gnibus from Tecton.

Businesses are under pressure to show return on investment (ROI) from AI use cases, whether predictive machine learning (ML) or generative AI. Only 54% of ML prototypes make it to production, and only 5% of generative AI use cases make it to production.

ROI isn’t just about getting to production—it’s about model accuracy and performance. You need a scalable, reliable system with high accuracy and low latency for the real-time use cases that directly impact the bottom line every millisecond.

Fraud detection, for example, requires extremely low latency because decisions need to be made in the time it takes to swipe a credit card. With fraud on the rise, more organizations are pushing to implement successful fraud detection systems. The US nationwide fraud losses topped $10 billion in 2023, a 14% increase from 2022. Global ecommerce fraud is predicted to exceed $343 billion by 2027.

But building and managing an accurate, reliable AI application that can make a dent in that $343 billion problem is overwhelmingly complex.

ML teams often start by manually stitching together different infrastructure components. This can seem straightforward at first for batch data, but the engineering gets much more complicated when you need to move from batch data to incorporating real-time and streaming data sources, and from batch inference to real-time serving.

Engineers need to build and orchestrate the data pipelines, juggle the different processing needs for each data source, manage the compute infrastructure, build reliable serving infrastructure for inference, and more. Without the capabilities of Tecton, the architecture might look like the following diagram.

Accelerate your AI development and deployment with Amazon SageMaker and Tecton

All that manual complexity gets simplified with Tecton and Amazon SageMaker. Together, Tecton and SageMaker abstract away the engineering needed for production, real-time AI applications. This enables faster time to value, and engineering teams can focus on building new features and use cases instead of struggling to manage the existing infrastructure.

Using SageMaker, you can build, train and deploy ML models. Meanwhile, Tecton makes it straightforward to compute, manage, and retrieve features to power models in SageMaker, both for offline training and online serving. This streamlines the end-to-end feature lifecycle for production-scale use cases, resulting in a simpler architecture, as shown in the following diagram.

How does it work? With Tecton’s simple-to-use declarative framework, you define the transformations for your features in a few lines of code, and Tecton builds the pipelines needed to compute, manage, and serve the features. Tecton takes care of the full deployment into production and online serving.
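To make the declarative pattern concrete, here is a toy, stdlib-only sketch of a registry of feature transformations. The decorator, registry, and feature names are hypothetical stand-ins, not Tecton's actual API; Tecton's framework additionally builds and runs the pipelines and serving infrastructure behind such definitions.

```python
# Illustrative only: a toy "declarative registry" mimicking the pattern where a
# decorated transformation is defined once and the platform handles the rest.
# Names here are hypothetical, not Tecton's API.
FEATURE_REGISTRY = {}

def feature_view(name):
    """Register a feature transformation under a name (stand-in for a platform decorator)."""
    def wrap(fn):
        FEATURE_REGISTRY[name] = fn
        return fn
    return wrap

@feature_view("user_transaction_stats")
def user_transaction_stats(transactions):
    """Compute simple aggregate features from a user's transaction records."""
    amounts = [t["amount"] for t in transactions]
    return {
        "txn_count": len(amounts),
        "total_spend": round(sum(amounts), 2),
        "max_txn": max(amounts) if amounts else 0.0,
    }

txns = [{"amount": 25.0}, {"amount": 110.5}, {"amount": 9.99}]
print(FEATURE_REGISTRY["user_transaction_stats"](txns))
```

Once registered, the same definition can back both offline training-data generation and online serving, which is the property the framework relies on.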

It doesn’t matter if it’s batch, streaming, or real-time data or whether it’s offline or online serving. It’s one common framework for every data processing need in end-to-end feature production.

This framework creates a central hub for feature management and governance with enterprise feature store capabilities, making it straightforward to observe the data lineage for each feature pipeline, monitor data quality, and reuse features across multiple models and teams.

The following diagram shows the Tecton declarative framework.

The next section examines a fraud detection example to show how Tecton and SageMaker accelerate both training and real-time serving for a production AI system.

Streamline feature development and model training

First, you need to develop the features and train the model. Tecton's declarative framework makes it simple to define features and generate accurate training data for SageMaker models.
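Generating accurate training data hinges on point-in-time correctness: each training row must use only the feature values that were known at that event's timestamp. The stdlib-only sketch below shows the core "as-of join" idea; the function and data are hypothetical simplifications of what a feature platform performs at scale.

```python
# A minimal sketch of a point-in-time ("as-of") join: for each labeled event,
# pick the latest feature value computed at or before the event time, so no
# future information leaks into training data. Illustrative, not Tecton's code.
from bisect import bisect_right

def as_of_join(feature_history, event_times):
    """feature_history: time-sorted list of (timestamp, value); returns the value as of each event."""
    ts = [t for t, _ in feature_history]
    results = []
    for event_t in event_times:
        i = bisect_right(ts, event_t) - 1
        results.append(feature_history[i][1] if i >= 0 else None)
    return results

history = [(1, 0.2), (5, 0.7), (9, 0.4)]  # e.g., a fraud-risk feature over time
print(as_of_join(history, [0, 5, 7, 12]))  # [None, 0.7, 0.7, 0.4]
```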

Next, the features need to be served online for the final model to consume in production.

Serve features with robust, real-time online inference

Tecton’s declarative framework extends to online serving. Tecton’s real-time infrastructure is designed to meet the demands of large-scale applications and can reliably serve 100,000 requests per second.

For critical ML apps, it’s hard to meet demanding service level agreements (SLAs) in a scalable and cost-efficient manner. Real-time use cases such as fraud detection typically have a p99 latency budget of 100 to 200 milliseconds. That means 99% of requests need to be faster than 200ms for the end-to-end process from feature retrieval to model scoring and post-processing.

Feature serving only gets a fraction of that end-to-end latency budget, which means you need your solution to be especially quick. Tecton accommodates these latency requirements by integrating with both disk-based and in-memory data stores, supporting in-memory caching, and serving features for inference through a low-latency REST API, which integrates with SageMaker endpoints.
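To make the SLA arithmetic concrete, here is a small illustrative sketch that computes a nearest-rank p99 from latency samples and checks it against an assumed feature-serving sub-budget. The 40 ms figure and the sample latencies are assumptions for illustration, not Tecton guarantees.

```python
# Illustrative: compute a nearest-rank p99 latency from request samples and
# compare it against the slice of the end-to-end budget reserved for feature
# serving (the 40 ms sub-budget is an assumption for this sketch).
def percentile(samples, pct):
    """Nearest-rank percentile: the value at or below which pct% of samples fall."""
    ordered = sorted(samples)
    rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[rank]

END_TO_END_BUDGET_MS = 200      # p99 budget for the whole request
FEATURE_SERVING_BUDGET_MS = 40  # assumed fraction reserved for feature retrieval

latencies_ms = [12, 15, 18, 22, 25, 27, 30, 33, 35, 38]  # sample feature-fetch times
p99 = percentile(latencies_ms, 99)
print(p99, p99 <= FEATURE_SERVING_BUDGET_MS)
```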

Now we can complete our fraud detection use case. In a fraud detection system, when someone makes a transaction (such as buying something online), your app might follow these steps:

1. It checks with other services to get more information (for example, “Is this merchant known to be risky?”) from third-party APIs.
2. It pulls important historical data about the user and their behavior (for example, “How often does this person usually spend this much?” or “Have they made purchases from this location before?”), requesting the ML features from Tecton.
3. It will likely use streaming features to compare the current transaction with recent spending activity over the last few hours or minutes.
4. It sends all this information to the model hosted on Amazon SageMaker that predicts whether the transaction looks fraudulent.

This process is shown in the following diagram.
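The steps above can be sketched end to end with every external call stubbed out. The function names, feature values, and scoring thresholds below are hypothetical; in production the feature lookup would be a low-latency Tecton REST call and the scoring an invocation of a SageMaker endpoint.

```python
# A stubbed sketch of the fraud-check flow. Every external dependency
# (third-party risk API, Tecton feature retrieval, SageMaker inference) is
# replaced with a local stand-in; names and thresholds are hypothetical.
def check_merchant_risk(merchant_id):
    """Stand-in for a third-party merchant-risk API call."""
    return {"risky_merchant": merchant_id in {"m-risky-01"}}

def get_tecton_features(user_id):
    """Stand-in for retrieving historical/streaming features from Tecton."""
    return {"avg_spend_30d": 42.0, "txn_count_1h": 3}

def score_with_sagemaker(features):
    """Stand-in for a SageMaker endpoint; returns a fraud probability."""
    score = 0.05
    if features["risky_merchant"]:
        score += 0.5
    if features["amount"] > 10 * features["avg_spend_30d"]:
        score += 0.3
    return min(score, 1.0)

def check_transaction(user_id, merchant_id, amount):
    """Assemble enrichment data plus ML features, then score the transaction."""
    features = {"amount": amount}
    features.update(check_merchant_risk(merchant_id))
    features.update(get_tecton_features(user_id))
    return score_with_sagemaker(features)

print(check_transaction("u-1", "m-risky-01", 500.0))
```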

Expand to generative AI use cases with your existing AWS and Tecton architecture

After you’ve developed ML features using the Tecton and AWS architecture, you can extend your ML work to generative AI use cases.

For instance, in the fraud detection example, you might want to add an LLM-powered customer support chat that helps a user answer questions about their account. To generate a useful response, the chat would need to reference different data sources, including the unstructured documents in your knowledge base (such as policy documentation about what causes an account suspension) and structured data such as transaction history and real-time account activity.

If you’re using a Retrieval Augmented Generation (RAG) system to provide context to your LLM, you can use your existing ML feature pipelines as context. With Tecton, you can either enrich your prompts with contextual data or provide features as tools to your LLM—all using the same declarative framework.
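As a minimal sketch of prompt enrichment, the snippet below fills an LLM prompt template with structured feature values before the model is called. The template, feature names, and values are hypothetical; in a real system they would come from the same Tecton feature pipelines described earlier.

```python
# Illustrative: enriching an LLM prompt with structured feature values.
# Template and feature names are hypothetical, chosen for this sketch.
PROMPT_TEMPLATE = (
    "You are a support assistant for a payments product.\n"
    "Customer context:\n"
    "- Transactions in the last 24h: {txn_count_24h}\n"
    "- Most recent decline reason: {last_decline_reason}\n\n"
    "Customer question: {question}\n"
)

def build_prompt(features, question):
    """Fill the template with retrieved feature values before calling the LLM."""
    return PROMPT_TEMPLATE.format(question=question, **features)

prompt = build_prompt(
    {"txn_count_24h": 7, "last_decline_reason": "suspected_fraud_hold"},
    "Why was my card declined?",
)
print(prompt)
```

The same idea applies whether the enriched prompt goes to a foundation model on Amazon Bedrock or a model hosted on SageMaker.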

To choose and customize the model that will best suit your use case, Amazon Bedrock provides a range of pre-trained foundation models (FMs) for inference, or you can use SageMaker for more extensive model building and training.

The following graphic shows how Amazon Bedrock is incorporated to support generative AI capabilities in the fraud detection system architecture.

Build valuable AI apps faster with AWS and Tecton

In this post, we walked through how SageMaker and Tecton enable AI teams to train and deploy a high-performing, real-time AI application—without the complex data engineering work. Tecton combines production ML capabilities with the convenience of doing everything from within SageMaker, whether that’s at the development stage for training models or doing real-time inference in production.

To get started, refer to Getting Started with Amazon SageMaker & Tecton’s Feature Platform, a more detailed guide on how to use Tecton with Amazon SageMaker. And if you can’t wait to try it yourself, check out the Tecton interactive demo and observe a fraud detection use case in action.

You can also find Tecton at AWS re:Invent. Reach out to set up a meeting with experts onsite about your AI engineering needs.


About the Authors

Isaac Cameron is Lead Solutions Architect at Tecton, guiding customers in designing and deploying real-time machine learning applications. Having previously built a custom ML platform from scratch at a major U.S. airline, he brings firsthand experience of the challenges and complexities involved—making him a strong advocate for leveraging modern, managed ML/AI infrastructure.

Alex Gnibus is a technical evangelist at Tecton, making technical concepts accessible and actionable for engineering teams. Through her work educating practitioners, Alex has developed deep expertise in identifying and addressing the practical challenges teams face when productionizing AI systems.

Arnab Sinha is a Senior Solutions Architect at AWS, specializing in designing scalable solutions that drive business outcomes in AI, machine learning, big data, digital transformation, and application modernization. With expertise across industries like energy, healthcare, retail and manufacturing, Arnab holds all AWS Certifications, including the ML Specialty, and has led technology and engineering teams before joining AWS.
