Unknown data source, October 2, 2024
Monitoring Data Ingestion Tasks with Amazon CloudWatch Metrics and Alarms

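This article covers data ingestion on AWS. It shows how to set up monitoring and alerting for a data ingestion workload using the Amazon CloudWatch embedded metric format, presents the architecture of a common data ingestion pattern with an enhanced monitoring approach, covers monitoring for several scenarios, and details the steps and prerequisites for building the solution.

📄 Data ingestion is a common task. The article provides a guide to setting up monitoring and alerting with the Amazon CloudWatch embedded metric format, which enables ingesting complex high-cardinality application data and easily creating custom metrics.

💻 The article presents the architecture of a common data ingestion pattern: after an object is uploaded to Amazon S3, an S3 event triggers an AWS Lambda function that processes the data and creates custom Amazon CloudWatch metrics, while a timeliness checker function runs on a schedule and populates its own metrics.

📋 The solution monitors several scenarios, such as errors from data quality checks and successful file ingestion within the expected timeframe for each vendor, and details the steps and prerequisites for building it.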

<p>Data is produced every day in increasing volumes and varieties in on-premises and cloud environments. <a href="https://aws.amazon.com/blogs/storage/easily-ingest-data-into-aws-for-building-data-lakes-archiving-and-more/">Data ingestion into AWS</a> is a common task and there are many services and architecture patterns that customers use to bring in data. In this post, we provide a guide for establishing monitoring and alerting on a data ingestion workload using the <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format.html">Amazon CloudWatch embedded metric format</a>.</p><p>Telemetry is a vital component that must be included when designing your data ingestion workloads. To help our customers design for operational excellence, AWS has created services and mechanisms to emit and collect logs, metrics, and traces to enable you to understand the internal state and health of the workload. <a href="https://aws.amazon.com/cloudwatch/">Amazon CloudWatch</a> launched the embedded metric format in November 2019, which enables the ingestion of complex high-cardinality application data in the form of logs. These logs in the embedded metric format can be used to <a href="https://aws.amazon.com/blogs/mt/enhancing-workload-observability-using-amazon-cloudwatch-embedded-metric-format/">easily create custom metrics</a> without having to use multiple libraries or maintain separate code.</p><p>This post includes an architecture of a common data ingestion pattern and an enhanced monitoring approach using the CloudWatch embedded metric format. 
We also provide a link to the open-source <a href="https://github.com/aws-samples/amazon-cloudwatch-monitoring-data-ingestion-tasks">GitHub repository</a> containing the <a href="https://aws.amazon.com/cloudformation/">AWS CloudFormation</a> template and steps to create the solution in your own account.</p><h2>Solution overview</h2><div id="attachment_32917" class="wp-caption aligncenter c4"><img aria-describedby="caption-attachment-32917" class="wp-image-32917" src="https://d2908q01vomqb2.cloudfront.net/972a67c48192728a34979d9a35164c1295401b71/2022/09/21/couldops_737_1.png" alt="Figure 1: Objects are uploaded into Amazon S3. An S3 Event Notification triggers an AWS Lambda Function. The Lambda Function ingests the object and creates custom Amazon CloudWatch metrics. The Timeliness Checker Function runs on an interval and populates Timeliness Checker Metrics." width="600" height="564" /><p id="caption-attachment-32917" class="wp-caption-text">Figure 1: Objects are uploaded into Amazon S3. An S3 Event Notification triggers an AWS Lambda Function. The Lambda Function ingests the object and creates custom Amazon CloudWatch metrics. The Timeliness Checker Function runs on an interval and populates Timeliness Checker Metrics.</p></div><h2>Walkthrough</h2><p>In the architecture provided, JSON files from two vendors are uploaded to the raw <a href="https://aws.amazon.com/s3/">Amazon Simple Storage Service (Amazon S3)</a> bucket. The files are partitioned by the vendor name and the date. 
After a vendor file is uploaded, an S3 event triggers an <a href="https://aws.amazon.com/lambda/">AWS Lambda</a> function that reads the JSON file, performs validation checks, and writes to the processed S3 bucket if the data fields and types are valid.</p><p><strong>Vendor A JSON Sample:</strong></p><p><strong>Vendor B JSON Sample:</strong></p><p>This solution provides monitoring for the following scenarios:</p><ul><li>Errors<ul><li>KeyError or TypeError from data quality checks</li><li>Any other exceptions raised in the Lambda function for data processing</li></ul></li><li>Successful file ingestion within the expected timeframe for each vendor<ul><li>Vendor A = at least one file ingested in the past 1 hour</li><li>Vendor B = at least one file ingested in the past 24 hours</li></ul></li></ul><h2>Steps</h2><p>These are the high-level steps and components needed to build the solution in your own account. The following sections have details and screenshots for reference.</p><ol><li>Create the CloudFormation stack.</li><li>View the CloudWatch alarm for alerting on exceptions raised in the Lambda data processor function.</li><li>View the Lambda function instrumentation.</li><li>View the Lambda function for monitoring file ingestion within the expected timeframes.</li></ol><p>The GitHub repository containing the scripts and templates required to build the solution is located <a href="https://github.com/aws-samples/amazon-cloudwatch-monitoring-data-ingestion-tasks">here</a>.</p><p>To avoid unexpected charges, make sure to follow the clean-up procedures at the end of this post.</p><h2>Prerequisites</h2><p>The following prerequisites are required for this post:</p><h2>Create the CloudFormation stack</h2><p>The following commands can be used to deploy the solution into your AWS account:</p><ol><li>Configure the AWS CLI with your AWS account and preferred region. 
Refer to the guide here for different methods and detailed instructions.</li><li>Run the following command to initialize the AWS SAM project from the GitHub source:</li></ol><ol start="3"><li>Run the following command which processes the AWS SAM template file, application code, and dependencies:</li></ol><ol start="4"><li>Run the following command to package the AWS SAM application as a .zip file and upload to Amazon S3:</li></ol><ol start="5"><li>Run the following command to deploy the AWS SAM application which includes all of the required infrastructure:</li></ol><h2>View CloudWatch Alarm for alerting on Data Processor Function errors</h2><p>Once the solution infrastructure has been deployed, you should be able to navigate to the CloudWatch service and see three different CloudWatch alarms. The CloudWatch alarm in the following screenshot enters the ALARM state if one or more errors occur in the past five minutes for the Lambda function which processes the data.</p><div id="attachment_32921" class="wp-caption aligncenter c4"><img aria-describedby="caption-attachment-32921" class="wp-image-32921" src="https://d2908q01vomqb2.cloudfront.net/972a67c48192728a34979d9a35164c1295401b71/2022/09/21/couldops_737_2.png" alt="Figure 2: The CloudWatch alarm is in the ALARM state due to an error encountered in the last five minutes." width="600" height="361" /><p id="caption-attachment-32921" class="wp-caption-text">Figure 2: The CloudWatch alarm is in the ALARM state due to an error encountered in the last five minutes</p></div><p>In this example, a KeyError exception was raised when the Lambda function was processing a data file which was missing the required key “object_id” and contained an invalid key. The CloudWatch metric for this Lambda function’s error triggered the CloudWatch alarm. 
The following is part of the traceback uploaded to CloudWatch Logs.</p><h2>View Lambda function instrumentation</h2><p>There are several methods you can use to generate logs in the CloudWatch embedded metric format. In this solution, the client library for Python is used to generate the logs and send them to CloudWatch. The languages for which Amazon provides open-source client libraries, along with instructions on how to use them, are listed <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format_Libraries.html">here</a>. Alternatively, you can manually generate the log using the <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format_Specification.html">specified format</a> and either call the PutLogEvents API or use the CloudWatch agent to send the embedded metric format logs.</p><p>In the following code snippet, the <code>metric_scope</code> decorator from the <code>aws_embedded_metrics</code> library is used on the Lambda function’s handler to get a metric logger object. The dimension is set to the ingestion data source for the metric, which in our solution would be either <code>vendor_a</code> or <code>vendor_b</code>. The <code>put_metric</code> call adds the Success metric of 1 to the current logger context. Note that when additional dimensions are added, every distinct value will result in a new CloudWatch metric. 
If the cardinality of a particular value is expected to be high, then you should consider using <code>set_property</code> instead.</p><p>The following screenshot displays the graphed metric of Vendor A’s successful data processing over the span of a week.</p><div id="attachment_32922" class="wp-caption aligncenter c4"><img aria-describedby="caption-attachment-32922" class="wp-image-32922" src="https://d2908q01vomqb2.cloudfront.net/972a67c48192728a34979d9a35164c1295401b71/2022/09/21/couldops_737_3.png" alt="Figure 3: The graphed metric of Vendor A’s successful data processing over the span of a week." width="600" height="442" /><p id="caption-attachment-32922" class="wp-caption-text">Figure 3: The graphed metric of Vendor A’s successful data processing over the span of a week</p></div><p>The following screenshot displays the graphed metric of Vendor B’s successful data processing over the span of two weeks.</p><div id="attachment_32923" class="wp-caption aligncenter c4"><img aria-describedby="caption-attachment-32923" class="wp-image-32923" src="https://d2908q01vomqb2.cloudfront.net/972a67c48192728a34979d9a35164c1295401b71/2022/09/21/couldops_737_4.png" alt="" width="600" height="451" /><p id="caption-attachment-32923" class="wp-caption-text">Figure 4: The graphed metric of Vendor B’s successful data processing over the span of two weeks.</p></div><p>The following is the log output from CloudWatch Logs for the data processing Lambda function which successfully ran. You can see the format and additional context injected into the log from the embedded metric format client library.</p><p>CloudWatch Logs Insights enables you to interactively search and analyze your log data in CloudWatch Logs. The following example shows the daily total count of successfully processed files from vendor_a over the span of a week. 
See this <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html">page</a> for additional information on CloudWatch Logs Insights.</p><div id="attachment_32924" class="wp-caption aligncenter c4"><img aria-describedby="caption-attachment-32924" class="wp-image-32924" src="https://d2908q01vomqb2.cloudfront.net/972a67c48192728a34979d9a35164c1295401b71/2022/09/21/couldops_737_5.png" alt="Figure 5: Query interface for CloudWatch Logs Insights displaying results for Vendor A ingestion over the span of one week." width="600" height="535" /><p id="caption-attachment-32924" class="wp-caption-text">Figure 5: Query interface for CloudWatch Logs Insights displaying results for Vendor A ingestion over the span of one week</p></div><h2>View the Lambda function for monitoring ingestion within expected timeframes</h2><p>This is the expected timeframe for successful file ingestion for each vendor:</p><ul><li>Vendor A = at least one file ingested in the past 1 hour</li><li>Vendor B = at least one file ingested in the past 24 hours</li></ul><p>The Data Processing Function in the previous section emits the Success metric after each successful run. The Timeliness Checker Function runs every hour, queries CloudWatch for each vendor’s Success metric statistics for their expected timeframes, and emits a Timeliness metric of 1 or 0, with 1 indicating that at least one file has been successfully ingested in the timeframe.</p><p>The following is the call to CloudWatch to get the Success metric statistics. The response contains a “Datapoints” array that shows whether at least one successful file ingestion has occurred in the expected timeframe.</p><p>In the following code snippet, the metric logger is created and used by the function which emits the Timeliness metric. The dimension is set to the ingestion data source which would be either <code>vendor_a</code> or <code>vendor_b</code>. 
Since the two vendors have different expected timeframes for file ingestion, a property is added to include the timeframe in the logger context. The <code>put_metric</code> call adds the Timeliness metric of 0 or 1.</p><p>The following screenshot displays the graphed Timeliness metric for Vendor A over the timespan of almost two days. From the graph, you can see the periods of time in which the data ingestion process for Vendor A failed to meet the Timeliness expectation.</p><div id="attachment_32925" class="wp-caption aligncenter c4"><img aria-describedby="caption-attachment-32925" class="wp-image-32925" src="https://d2908q01vomqb2.cloudfront.net/972a67c48192728a34979d9a35164c1295401b71/2022/09/21/couldops_737_6.png" alt="Figure 6: The graphed Timeliness metric for Vendor A over the span of two days." width="600" height="445" /><p id="caption-attachment-32925" class="wp-caption-text">Figure 6: The graphed Timeliness metric for Vendor A over the span of two days</p></div><p>The following screenshot displays the graphed Timeliness metric for Vendor B over the timespan of a week. From the graph, you can see the periods of time in which the data ingestion process for Vendor B failed to meet the Timeliness expectation.</p><div id="attachment_32926" class="wp-caption aligncenter c4"><img aria-describedby="caption-attachment-32926" class="wp-image-32926" src="https://d2908q01vomqb2.cloudfront.net/972a67c48192728a34979d9a35164c1295401b71/2022/09/21/couldops_737_7.png" alt="Figure 7: The graphed Timeliness metric for Vendor B over the span of one week." width="600" height="455" /><p id="caption-attachment-32926" class="wp-caption-text">Figure 7: The graphed Timeliness metric for Vendor B over the span of one week</p></div><p>A CloudWatch alarm is created in the solution for each vendor to alarm when the Timeliness expectation has not been met. 
The following screenshot shows the alarm triggered by a Timeliness metric of 0 for Vendor A.</p><div id="attachment_32927" class="wp-caption aligncenter c4"><img aria-describedby="caption-attachment-32927" class="wp-image-32927" src="https://d2908q01vomqb2.cloudfront.net/972a67c48192728a34979d9a35164c1295401b71/2022/09/21/couldops_737_8.png" alt="Figure 8: Timeliness Alarm in ALARM state due to no successful ingestion activity within expected timeframe." width="600" height="322" /><p id="caption-attachment-32927" class="wp-caption-text">Figure 8: Timeliness Alarm in ALARM state due to no successful ingestion activity within expected timeframe.</p></div><h2>Cleaning up</h2><p>To avoid incurring further charges, use the following instructions to delete all of the resources created from this solution.</p><p>Run the following command to delete the resources with the SAM CLI:</p><p>Alternatively, use the steps <a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html">here</a> to delete the resources on the <a href="https://aws.amazon.com/console/">AWS Console</a>.</p><h2>Conclusion</h2><p>This post walks through implementing monitoring and alerting on a data ingestion workload using the CloudWatch embedded metric format. 
We demonstrate an approach to monitoring data ingestion tasks with CloudWatch metrics and alarms, and to monitoring file ingestion activity within an expected timeframe.</p><p>For more information on the CloudWatch Logs embedded metric format, visit the <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format.html">service documentation</a> and related <a href="https://aws.amazon.com/blogs/mt/enhancing-workload-observability-using-amazon-cloudwatch-embedded-metric-format/">AWS post</a>.</p><p>For more granular monitoring, consider creating additional custom metrics measuring object attributes, such as object size or row count, and then create a CloudWatch alarm that uses anomaly detection to identify outliers.</p>
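<p>As a recap of the manual option mentioned in the instrumentation section, the following is a minimal sketch of building an embedded metric format log line by hand with only the Python standard library, rather than through the <code>aws_embedded_metrics</code> client library the solution uses. It is not the code from the repository. The namespace <code>DataIngestion</code>, the dimension key <code>Vendor</code>, the property name <code>TimeframeHours</code>, and both function names are assumptions made for illustration. The timeliness helper also assumes the metric statistics were requested with the <code>Sum</code> statistic, so that each datapoint carries a <code>Sum</code> field.</p>

```python
import json
import time


def build_emf_record(namespace, vendor, metric_name, value,
                     unit="Count", properties=None, timestamp_ms=None):
    """Build one embedded metric format log line as a JSON string.

    The dimension key "Vendor" is a hypothetical name chosen for this sketch.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # milliseconds since epoch
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                # One dimension set containing a single dimension key.
                "Dimensions": [["Vendor"]],
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        # Dimension and metric values live at the top level of the record.
        "Vendor": vendor,
        metric_name: value,
    }
    # Properties are extra top-level fields: they stay searchable in the
    # logs but do not create additional CloudWatch metrics.
    for key, val in (properties or {}).items():
        if key in record:
            raise ValueError(f"property {key!r} collides with a reserved key")
        record[key] = val
    return json.dumps(record)


def timeliness_from_datapoints(datapoints):
    """Return 1 if any datapoint reports at least one success, else 0.

    Assumes each datapoint dict has a "Sum" field (Sum statistic requested).
    """
    return 1 if any(dp.get("Sum", 0) >= 1 for dp in datapoints) else 0


if __name__ == "__main__":
    # Success metric, as the data processing function might emit it.
    print(build_emf_record("DataIngestion", "vendor_a", "Success", 1))
    # Timeliness metric for an empty Datapoints array (no ingestion seen).
    print(build_emf_record("DataIngestion", "vendor_a", "Timeliness",
                           timeliness_from_datapoints([]),
                           properties={"TimeframeHours": 1}))
```

<p>Keeping the expected timeframe as a property rather than a dimension follows the guidance above about cardinality: it adds context to the log line without multiplying the number of CloudWatch metrics.</p>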
