AWS Machine Learning Blog, July 3, 2024
Identify idle endpoints in Amazon SageMaker

 

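This post explores how to identify and manage idle endpoints in Amazon SageMaker in a fast-moving technology landscape, in order to optimize resource utilization and reduce operational costs. It presents a Python script that integrates with Amazon CloudWatch and identifies idle endpoints by monitoring endpoint invocation counts, and it suggests actions to take once idle endpoints are found, such as deleting or scaling down endpoints and reviewing and refining the model deployment strategy.

🎯 **Identify idle endpoints** The post provides a Python script that uses the AWS SDK for Python (Boto3) to interact with SageMaker and CloudWatch. It automates querying CloudWatch metrics to determine endpoint activity and identifies idle endpoints based on the number of invocations over a specified time period. The script does this by defining global variables, initializing AWS clients, retrieving SageMaker endpoints, querying CloudWatch metrics, and logging endpoint activity.

🎯 **Required permissions** To run the script, make sure your AWS Identity and Access Management (IAM) user or role has the necessary permissions. Specifically, the script needs CloudWatch permissions (cloudwatch:GetMetricData and cloudwatch:ListMetrics) and a SageMaker permission (sagemaker:ListEndpoints).

🎯 **Take action** After identifying idle endpoints, you can take steps to optimize resource utilization and reduce operational costs, for example deleting or scaling down endpoints, reviewing and refining the model deployment strategy, implementing auto scaling policies, and exploring serverless inference options.

🎯 **Conclusion** The post explains why identifying idle endpoints in SageMaker matters and provides a Python script to automate the process. By implementing proactive monitoring and optimizing resource utilization, SageMaker users can manage their endpoints effectively, reduce operational costs, and maximize the efficiency of their machine learning workflows.

🎯 **Resources** The post also links to additional resources on inference cost optimization best practices, using Amazon CloudWatch metrics, and serverless inference.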

Amazon SageMaker is a machine learning (ML) platform designed to simplify the process of building, training, deploying, and managing ML models at scale. With a comprehensive suite of tools and services, SageMaker offers developers and data scientists the resources they need to accelerate the development and deployment of ML solutions.

In today’s fast-paced technological landscape, efficiency and agility are essential for businesses and developers striving to innovate. AWS plays a critical role in enabling this innovation by providing a range of services that abstract away the complexities of infrastructure management. By handling tasks such as provisioning, scaling, and managing resources, AWS allows developers to focus more on their core business logic and iterate quickly on new ideas.

As developers deploy and scale applications, unused resources such as idle SageMaker endpoints can accumulate unnoticed, leading to higher operational costs. This post addresses the issue of identifying and managing idle endpoints in SageMaker. We explore methods to monitor SageMaker endpoints effectively and distinguish between active and idle ones. Additionally, we walk through a Python script that automates the identification of idle endpoints using Amazon CloudWatch metrics.

Identify idle endpoints with a Python script

To manage SageMaker endpoints effectively and optimize resource utilization, we use a Python script built on the AWS SDK for Python (Boto3) to interact with SageMaker and CloudWatch. The script queries CloudWatch metrics to determine endpoint activity and flags an endpoint as idle when it records no invocations over a specified time period.

Let’s break down the key components of the Python script and explain how each part contributes to the identification of idle endpoints:

from datetime import datetime, timedelta, timezone

import boto3

# AWS clients initialization
cloudwatch = boto3.client("cloudwatch")
sagemaker = boto3.client("sagemaker")

# Global variables
NAMESPACE = "AWS/SageMaker"
METRIC = "Invocations"
LOOKBACK = 1  # Number of days to look back for activity
PERIOD = 86400  # 1-day granularity reduces the volume of metrics retrieved while maintaining accuracy

# Calculate time range for querying CloudWatch metrics
now = datetime.now(timezone.utc)
ago = now - timedelta(days=LOOKBACK)


# Helper function to extract the endpoint name from a CloudWatch metric
def get_endpoint_name_from_metric(metric):
    for d in metric["Dimensions"]:
        if d["Name"] in ("EndpointName", "InferenceComponentName"):
            yield d["Value"]


# Helper function to list every Invocations metric in the AWS/SageMaker namespace
def list_metrics():
    paginator = cloudwatch.get_paginator("list_metrics")
    response_iterator = paginator.paginate(Namespace=NAMESPACE, MetricName=METRIC)
    return [m for r in response_iterator for m in r["Metrics"]]


# Helper function to check whether an endpoint is in use. It sums the
# datapoints of the given metric over the lookback window; a total above
# zero means the endpoint was invoked during the period.
def is_endpoint_busy(metric):
    metric_values = cloudwatch.get_metric_data(
        MetricDataQueries=[{
            "Id": "metricname",
            "MetricStat": {
                "Metric": {
                    "Namespace": metric["Namespace"],
                    "MetricName": metric["MetricName"],
                    "Dimensions": metric["Dimensions"],
                },
                "Period": PERIOD,
                "Stat": "Sum",
                "Unit": "None",
            },
        }],
        StartTime=ago,
        EndTime=now,
        ScanBy="TimestampAscending",
        MaxDatapoints=24 * (LOOKBACK + 1),
    )
    return sum(metric_values.get("MetricDataResults", [{}])[0].get("Values", [])) > 0


# Helper function to log endpoint activity
def log_endpoint_activity(endpoint_name, is_busy):
    status = "BUSY" if is_busy else "IDLE"
    log_message = f"{datetime.now(timezone.utc)} - Endpoint {endpoint_name} {status}"
    print(log_message)


# Main function to identify idle endpoints and log their activity status
def main():
    # Paginate so that endpoints beyond the first page of results are included
    paginator = sagemaker.get_paginator("list_endpoints")
    existing_endpoints_name = {
        endpoint["EndpointName"]
        for page in paginator.paginate()
        for endpoint in page["Endpoints"]
    }

    if not existing_endpoints_name:
        print("No endpoints found")
        return

    for metric in list_metrics():
        for endpoint_name in get_endpoint_name_from_metric(metric):
            if endpoint_name in existing_endpoints_name:
                is_busy = is_endpoint_busy(metric)
                log_endpoint_activity(endpoint_name, is_busy)
            else:
                print(f"Endpoint {endpoint_name} not active")


if __name__ == "__main__":
    main()

By following along with the explanation of the script, you’ll gain a deeper understanding of how to automate the identification of idle endpoints in SageMaker, paving the way for more efficient resource management and cost optimization.
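The busy check ultimately reduces to summing the `Values` that CloudWatch returns for the `Invocations` metric. The sketch below isolates that step so it can be exercised without AWS credentials. The names `total_invocations` and `is_idle`, the `threshold` parameter, and the sample responses are illustrative assumptions rather than part of the script, and the responses are trimmed to the fields the logic reads.

```python
def total_invocations(response):
    """Sum every datapoint in a GetMetricData-style response dictionary."""
    results = response.get("MetricDataResults", [])
    return sum(value for result in results for value in result.get("Values", []))


def is_idle(response, threshold=0):
    """Treat an endpoint as idle when its invocations do not exceed threshold."""
    return total_invocations(response) <= threshold


# Hand-written samples shaped like a GetMetricData response (assumed values).
sample_busy = {"MetricDataResults": [{"Id": "metricname", "Values": [12.0, 3.0]}]}
sample_idle = {"MetricDataResults": [{"Id": "metricname", "Values": []}]}

print(is_idle(sample_busy))  # False
print(is_idle(sample_idle))  # True
```

Unlike the script, this variant sums across all returned results and treats an empty result list as idle, which may or may not be the behavior you want for endpoints that have never emitted the metric.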

Permissions required to run the script

Before you run the provided Python script to identify idle endpoints in SageMaker, make sure your AWS Identity and Access Management (IAM) user or role has the necessary permissions. The script requires the CloudWatch permissions cloudwatch:GetMetricData and cloudwatch:ListMetrics and the SageMaker permission sagemaker:ListEndpoints.
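As a rough illustration, a policy granting cloudwatch:GetMetricData, cloudwatch:ListMetrics, and sagemaker:ListEndpoints could be generated as follows. The `build_policy` helper and the `Sid` value are hypothetical, and `Resource` is set to `"*"` on the assumption that these read-only actions are not scoped to individual resources; adjust the document to your organization's conventions before attaching it.

```python
import json

# Actions the script calls (taken from the permissions named in this post).
REQUIRED_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:ListMetrics",
    "sagemaker:ListEndpoints",
]


def build_policy(actions):
    """Return an IAM policy document that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "IdentifyIdleSageMakerEndpoints",  # hypothetical label
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # assumption: actions are not resource-scoped
            }
        ],
    }


policy = build_policy(REQUIRED_ACTIONS)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into a customer managed policy and attached to the user or role that runs the script.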

Run the Python script

You can run the Python script in several ways. In this post, we demonstrate running it through the AWS CLI.
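Once the script has run, its output can be filtered for idle endpoints. The sketch below assumes the script's standard output was captured as text, with each status line in the form the script prints (a timestamp, then ` - Endpoint <name> <BUSY|IDLE>`). The `idle_endpoints` function and the sample output are hypothetical and only illustrate one way to post-process the results.

```python
import re

# Matches the status lines printed by the script: "... - Endpoint <name> <status>"
STATUS_LINE = re.compile(r" - Endpoint (?P<name>\S+) (?P<status>BUSY|IDLE)\s*$")


def idle_endpoints(output):
    """Return sorted names of endpoints that have no BUSY line in the output."""
    idle, busy = set(), set()
    for line in output.splitlines():
        match = STATUS_LINE.search(line)
        if not match:
            continue  # skip lines such as "Endpoint x not active"
        (busy if match["status"] == "BUSY" else idle).add(match["name"])
    # An endpoint can be reported once per metric (for example, per variant),
    # so count it as idle only if none of its metrics showed traffic.
    return sorted(idle - busy)


sample_output = """\
2024-07-03 10:00:00.000001 - Endpoint demo-a IDLE
2024-07-03 10:00:00.000002 - Endpoint demo-b BUSY
2024-07-03 10:00:00.000003 - Endpoint demo-b IDLE
Endpoint old-endpoint not active
"""
print(idle_endpoints(sample_output))  # ['demo-a']
```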

Actions to take after identifying idle endpoints

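After you’ve identified idle endpoints in your SageMaker environment using the Python script, you can take proactive steps to optimize resource utilization and reduce operational costs. Measures you can implement include deleting or scaling down idle endpoints, reviewing and refining your model deployment strategy, implementing auto scaling policies, and exploring serverless inference options.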
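For the first of these measures, deleting endpoints that are confirmed idle, a cautious approach is to default to a dry run. The following is a minimal sketch under that assumption. `delete_idle_endpoints` and `FakeSageMakerClient` are hypothetical names; the fake client stands in for `boto3.client("sagemaker")` so the example runs offline, and only the `delete_endpoint(EndpointName=...)` call mirrors the real Boto3 API.

```python
def delete_idle_endpoints(sagemaker_client, endpoint_names, dry_run=True):
    """Delete the named endpoints, or only report them when dry_run is True."""
    deleted = []
    for name in endpoint_names:
        if dry_run:
            print(f"[dry run] would delete endpoint {name}")
            continue
        sagemaker_client.delete_endpoint(EndpointName=name)
        deleted.append(name)
    return deleted


class FakeSageMakerClient:
    """Offline stand-in that records which endpoints were requested for deletion."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append(EndpointName)


fake = FakeSageMakerClient()
print(delete_idle_endpoints(fake, ["demo-a"]))                 # []
print(delete_idle_endpoints(fake, ["demo-a"], dry_run=False))  # ['demo-a']
```

Deleting an endpoint does not remove its endpoint configuration or model, so those resources would need separate cleanup if they are no longer required.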

Conclusion

In this post, we discussed the importance of identifying idle endpoints in SageMaker and provided a Python script to help automate this process. By implementing proactive monitoring solutions and optimizing resource utilization, SageMaker users can effectively manage their endpoints, reduce operational costs, and maximize the efficiency of their machine learning workflows.

Get started with the techniques demonstrated in this post to automate cost monitoring for SageMaker inference. Explore AWS re:Post for valuable resources on optimizing your cloud infrastructure and maximizing AWS services.

Resources

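For more information about the features and services used in this post, refer to the AWS documentation on inference cost optimization best practices, using Amazon CloudWatch metrics, and serverless inference.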


About the authors

Pablo Colazurdo is a Principal Solutions Architect at AWS where he enjoys helping customers to launch successful projects in the Cloud. He has many years of experience working on varied technologies and is passionate about learning new things. Pablo grew up in Argentina but now enjoys the rain in Ireland while listening to music, reading or playing D&D with his kids.

Ozgur Canibeyaz is a Senior Technical Account Manager at AWS with 8 years of experience. Ozgur helps customers optimize their AWS usage by navigating technical challenges, exploring cost-saving opportunities, achieving operational excellence, and building innovative services using AWS products.
