AWS Blogs, July 16, 07:40
Amazon S3 Metadata now supports metadata for all your S3 objects

 


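😊 Amazon S3 Metadata now supports metadata for all your existing objects, extending the earlier capability that gave visibility only into new objects and changes. You can analyze and query metadata across your entire S3 storage footprint to better manage and understand your data.

📊 S3 Metadata introduces live inventory tables that work with familiar SQL-based tools. Thanks to backfill support, they provide a complete, current snapshot of the objects in your bucket and their metadata, including existing objects, so you can discover data quickly and optimize your workflows.

🔍 S3 Metadata journal tables, previously known as S3 Metadata tables, are enabled automatically when you configure live inventory tables. They provide a near real-time view of object-level changes in your bucket, including uploads, deletions, and metadata updates, so you can track the object lifecycle and generate event-driven insights.

💡 These new tables help you avoid waiting for metadata discovery before processing can begin, which makes them well suited to large-scale analytics and machine learning workloads. By querying metadata ahead of time, you can schedule GPU jobs more efficiently and reduce idle time in compute-intensive environments.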
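The summary above contrasts two views of the same bucket: the live inventory table holds the current state, while the journal table records each change. One way to picture the relationship is to fold the change stream into a snapshot. The Python sketch below is a conceptual model only, not how S3 builds these tables. The `JournalRecord` class and `replay_journal` function are hypothetical; the column names (`key`, `record_type`, `record_timestamp`, `size`, `storage_class`, `object_tags`) are borrowed from the Athena queries in the post, and the `'CREATE'` and `'UPDATE_METADATA'` record types are assumptions (only `'DELETE'` appears in the post).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class JournalRecord:
    """Simplified stand-in for one journal table row (hypothetical schema).

    Only a few columns that appear in the post's example queries are modeled;
    the real tables have more columns and may use different types.
    """
    key: str
    record_type: str  # 'DELETE' removes the object; any other value is treated as an upsert
    record_timestamp: datetime
    size: int = 0
    storage_class: str = "STANDARD"
    object_tags: Dict[str, str] = field(default_factory=dict)


def replay_journal(records: List[JournalRecord]) -> Dict[str, JournalRecord]:
    """Fold change records into a current-state snapshot keyed by object key.

    Conceptual illustration only: the newest record per key wins and a DELETE
    drops the key. Versioning and delete markers are ignored.
    """
    snapshot: Dict[str, JournalRecord] = {}
    # Replay in timestamp order so the result does not depend on arrival order.
    for rec in sorted(records, key=lambda r: r.record_timestamp):
        if rec.record_type == "DELETE":
            snapshot.pop(rec.key, None)
        else:
            snapshot[rec.key] = rec
    return snapshot


# Usage with made-up records ('CREATE' and 'UPDATE_METADATA' are assumed type names).
journal = [
    JournalRecord("a.txt", "CREATE", datetime(2025, 7, 8, 12, 0), size=10),
    JournalRecord("b.txt", "CREATE", datetime(2025, 7, 8, 12, 1), size=20,
                  object_tags={"Source": "Swift"}),
    JournalRecord("a.txt", "DELETE", datetime(2025, 7, 8, 12, 2)),
    JournalRecord("b.txt", "UPDATE_METADATA", datetime(2025, 7, 8, 12, 3), size=20,
                  object_tags={"Source": "swift"}),
]
inventory = replay_journal(journal)
print(sorted(inventory))                # ['b.txt']
print(inventory["b.txt"].object_tags)  # {'Source': 'swift'}
```

A real pipeline would also need to handle versioned buckets, where a deletion creates a delete marker (the `is_delete_marker` column used in the post) instead of removing the object.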

<section class="blog-post-content lb-rtxt"><p><a href="https://aws.amazon.com/s3/features/metadata/">Amazon S3 Metadata</a> now provides complete visibility into all the existing objects in your <a href="https://aws.amazon.com/s3/">Amazon Simple Storage Service (Amazon S3)</a> buckets, expanding beyond new objects and changes. With this expanded coverage, you can analyze and query metadata for your entire S3 storage footprint.</p><p>Today, many customers rely on Amazon S3 to store unstructured data at scale. To understand what’s in a bucket, you often need to build and maintain custom systems that scan for objects, track changes, and manage metadata over time. These systems are expensive to maintain and hard to keep up to date as data grows.</p><p>Since <a href="https://aws.amazon.com/blogs/aws/introducing-queryable-object-metadata-for-amazon-s3-buckets-preview/">the launch of S3 Metadata at re:Invent 2024</a>, you’ve been able to query new and updated object metadata using metadata tables instead of relying on <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html">Amazon S3 Inventory</a> or on object-level APIs such as <code>ListObjects</code>, <code>HeadObject</code>, and <code>GetObject</code>, which can introduce latency and affect downstream workflows.</p><p>To make it easier for you to work with this expanded metadata, S3 Metadata introduces live inventory tables that work with familiar SQL-based tools. After your existing objects are backfilled into the system, updates such as uploads or deletions typically appear in your live inventory tables within an hour.</p><p>With <strong>S3 Metadata live inventory tables</strong>, you get a fully managed Apache Iceberg table that provides a complete and current snapshot of the objects in your bucket and their metadata, including existing objects, thanks to backfill support. 
These tables are refreshed automatically within an hour of changes such as uploads or deletions, so you stay up to date. You can use them to identify objects with specific properties (such as unencrypted data, missing tags, or particular storage classes) and to support analytics, cost optimization, auditing, and governance.</p><p><strong>S3 Metadata journal tables</strong>, previously known as <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/metadata-tables-overview.html">S3 Metadata tables</a>, are enabled automatically when you configure live inventory tables and provide a near real-time view of object-level changes in your bucket, including uploads, deletions, and metadata updates. These tables are ideal for auditing activity, tracking the lifecycle of objects, and generating event-driven insights. For example, you can use them to find out which objects were deleted in the past 24 hours, identify the requester making the most <code>PUT</code> operations, or monitor updates to object metadata over time.</p><p>S3 Metadata tables are created in a namespace whose name is similar to your bucket name, which makes them easier to discover. The tables are stored in AWS system table buckets, grouped by account and <a href="https://docs.aws.amazon.com/glossary/latest/reference/glos-chap.html#region">Region</a>. After you enable S3 Metadata for a general purpose S3 bucket, the system creates and maintains these tables for you. You don’t need to manage compaction or garbage collection processes: <a href="https://aws.amazon.com/s3/features/tables/">S3 Tables</a> takes care of <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-maintenance.html">table maintenance</a> tasks in the background.</p><p>These new tables help you avoid waiting for metadata discovery before processing can begin, which makes them ideal for large-scale analytics and <a href="https://aws.amazon.com/ai/machine-learning/">machine learning (ML)</a> workloads. 
By querying metadata ahead of time, you can schedule GPU jobs more efficiently and reduce idle time in compute-intensive environments.</p><p><strong>Let’s see how it works<br /></strong> To see how this works in practice, I configure S3 Metadata for a general purpose bucket using the <a href="https://console.aws.amazon.com">AWS Management Console</a>.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/03/2025-07-03_09-39-10.png"><img class="aligncenter size-full wp-image-97696" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/03/2025-07-03_09-39-10.png" alt="S3 Metadata, start from general purpose bucket" width="800" height="523" /></a></p><p>After choosing a general purpose bucket, I choose the <strong>Metadata</strong> tab, then I choose <strong>Create metadata configuration</strong>.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_12-23-39.png"><img class="aligncenter wp-image-97901 size-full" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_12-23-39.png" alt="S3 Metadata, configure journal and inventory table" width="800" height="822" /></a>For <strong>Journal table</strong>, I can choose the <strong>Server-side encryption</strong> option and the <strong>Record expiration</strong> period. For <strong>Live Inventory table</strong>, I choose <strong>Enabled</strong>, and I can select the <strong>Server-side encryption</strong> options.</p><p>I configure <strong>Record expiration</strong> on the journal table. 
Journal table records expire after the specified number of days: 365 days (one year) in my example.</p><p>Then, I choose <strong>Create metadata configuration</strong>.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_12-25-18.png"><img class="aligncenter wp-image-97902 size-full" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_12-25-18.png" alt="S3 Metadata, backfilling" width="800" height="504" /></a></p><p>S3 Metadata creates the live inventory table and the journal table. In the <strong>Live Inventory table</strong> section, I can observe the <strong>Table status</strong>: the system immediately starts to <strong>backfill</strong> the table with existing object metadata. This can take anywhere from minutes to hours, depending on the number of objects in your S3 bucket.</p><p>While waiting, I also upload and delete objects to generate data in the journal table.</p><p>Then, I navigate to <a href="https://aws.amazon.com/athena">Amazon Athena</a> to start querying the new tables.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_16-10-21.png"><img class="aligncenter wp-image-97914 size-full" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_16-10-21.png" alt="S3 Metadata, query with Athena" width="510" height="383" /></a></p><p>I choose <strong>Query table with Athena</strong> to start querying the table. 
I can choose between a couple of default queries on the console.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/03/2025-07-03_10-35-22.png"><img class="aligncenter size-full wp-image-97700" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/03/2025-07-03_10-35-22.png" alt="S3 MetaData table structure" width="392" height="719" /></a></p><p>In Athena, I observe the structure of the tables in the <strong>AWSDataCatalog</strong> <strong>Data source</strong>, and I start with a short query to check how many records are available in the journal table. I already have 6,488 entries:</p><pre class="lang-sql">SELECT count(*) FROM "b_aws_news_blog_metadata_inventory_ns"."journal";

-- #  _col0
-- 1  6488</pre><p>Here are a few example queries I tried on the journal table:</p><pre class="lang-sql">-- Query deleted objects in the last 24 hours
-- Use is_delete_marker = true for versioned buckets and record_type = 'DELETE' otherwise
SELECT bucket, key, version_id, last_modified_date
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."journal"
WHERE last_modified_date &gt;= (current_date - interval '1' day) AND is_delete_marker = true;

-- #  bucket  key  version_id  last_modified_date  is_delete_marker
-- 1  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/G0/NSURLSession.h-JET61D329FG0
-- 2  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/G5/cdefs.h-PJ21EUWKMWG5
-- 3  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/FX/buf.h-25EDY57V6ZXFX
-- 4  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/G6/NSMeasurementFormatter.h-3FN8J9CLVMYG6
-- 5  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/G8/NSXMLDocument.h-1UO2NUJK0OAG8

-- Query source IP addresses of recent PUT requests
SELECT source_ip_address, count(source_ip_address)
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."journal"
GROUP BY source_ip_address;

-- #  source_ip_address     _col1
-- 1  my_laptop_IP_address  12488

-- Query S3 Lifecycle expired objects in the last 7 days
SELECT bucket, key, version_id, last_modified_date, record_timestamp
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."journal"
WHERE requester = 's3.amazonaws.com' AND record_type = 'DELETE' AND record_timestamp &gt; (current_date - interval '7' day);

-- (not applicable to my demo bucket)</pre><p>The results helped me track the specific objects that were removed, including their timestamps.</p><p>Now, I look at the live inventory table:</p><pre class="lang-sql">-- Distribution of object tags
SELECT object_tags, count(object_tags)
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."inventory"
GROUP BY object_tags;

-- #  object_tags     _col1
-- 1  {Source=Swift}  1
-- 2  {Source=swift}  1
-- 3  {}              12486

-- Query storage class and size for specific tags
SELECT storage_class, count(*) AS count, sum(size) / 1024 / 1024 AS usage
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."inventory"
GROUP BY object_tags['pii=true'], storage_class;

-- #  storage_class  count  usage
-- 1  STANDARD       12488  4165

-- Find objects with specific user-defined metadata
SELECT key, last_modified_date, user_metadata
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."inventory"
WHERE cardinality(user_metadata) &gt; 0
ORDER BY last_modified_date DESC;

-- (not applicable to my demo bucket)</pre><p>These are just a few examples of what is possible with S3 Metadata. Your preferred queries will depend on your use cases. 
Refer to <a href="https://aws.amazon.com/blogs/storage/analyzing-amazon-s3-metadata-with-amazon-athena-and-amazon-quicksight/">Analyzing Amazon S3 Metadata with Amazon Athena and Amazon QuickSight</a> in the <a href="https://aws.amazon.com/blogs/storage/">AWS Storage Blog</a> for more examples.</p><p><strong>Pricing and availability<br /></strong> S3 Metadata live inventory and journal tables are available today in the US East (Ohio, N. Virginia) and US West (N. California) Regions.</p><p>Journal tables are charged at $0.30 per million updates, a 33 percent drop from our previous price.</p><p>For inventory tables, there’s a one-time backfill cost of $0.30 per million objects to set up the table and generate metadata for existing objects. There are no additional costs if your bucket has fewer than one billion objects. For buckets with more than one billion objects, there is a fee of $0.10 per million objects per month.</p><p>As usual, the <a href="https://aws.amazon.com/s3/pricing/">Amazon S3 pricing page</a> has all the details.</p><p>With S3 Metadata live inventory and journal tables, you can reduce the time and effort required to explore and manage large datasets. You get an up-to-date view of your storage and a record of changes, both available as Iceberg tables you can query on demand. You can discover data faster, power compliance workflows, and optimize your ML pipelines.</p><p>You can get started by enabling metadata inventory on your S3 bucket through the AWS Management Console, the <a href="https://aws.amazon.com/cli/">AWS Command Line Interface (AWS CLI)</a>, or the <a href="https://aws.amazon.com/tools/">AWS SDKs</a>. Once enabled, the journal and live inventory tables are created and updated automatically. 
To learn more, visit the <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html">S3 Metadata documentation page</a>.</p><a href="https://linktr.ee/sebsto">— seb</a></section>
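The prices in the Pricing and availability section can be combined into a back-of-the-envelope estimate. The following Python sketch is hypothetical: the function name and parameters are illustrative, it uses the launch prices quoted in the post, and it assumes that each upload, deletion, or metadata change counts as one journal update and that the object count stays constant over the period. The post doesn't say whether the $0.10 monthly fee applies to every object or only to objects beyond the first billion, so that choice is exposed as a flag. The Amazon S3 pricing page remains the authoritative source.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
BILLION = 1_000_000_000

# Launch prices quoted in the post (USD); verify against the Amazon S3 pricing page.
JOURNAL_PER_MILLION_UPDATES = Decimal("0.30")
BACKFILL_PER_MILLION_OBJECTS = Decimal("0.30")
INVENTORY_PER_MILLION_OBJECTS_PER_MONTH = Decimal("0.10")


def estimate_s3_metadata_cost(object_count: int, updates_per_month: int, months: int,
                              fee_on_excess_only: bool = False) -> dict:
    """Return a rough cost breakdown in USD (hypothetical helper, not an AWS tool).

    Assumes one journal update per upload, deletion, or metadata change and a
    constant object count. fee_on_excess_only selects whether the monthly
    inventory fee covers all objects or only those above one billion.
    """
    # Journal tables: billed per million updates, every month.
    journal = JOURNAL_PER_MILLION_UPDATES * Decimal(updates_per_month) / MILLION * months
    # Inventory tables: one-time backfill for existing objects.
    backfill = BACKFILL_PER_MILLION_OBJECTS * Decimal(object_count) / MILLION
    inventory_fee = Decimal(0)
    if object_count > BILLION:  # no monthly fee at or below one billion objects
        billable = object_count - BILLION if fee_on_excess_only else object_count
        inventory_fee = (INVENTORY_PER_MILLION_OBJECTS_PER_MONTH
                         * Decimal(billable) / MILLION * months)
    return {
        "journal": journal,
        "backfill": backfill,
        "inventory_fee": inventory_fee,
        "total": journal + backfill + inventory_fee,
    }


# 50 million objects, 10 million changes per month, over one year.
costs = estimate_s3_metadata_cost(50_000_000, 10_000_000, months=12)
print(f"{costs['total']:.2f}")  # 51.00 (36.00 journal + 15.00 backfill)
```

Using `Decimal` keeps the dollar amounts exact. For a bucket with 2 billion objects and no updates, the first month comes to $800 under the all-objects reading and $700 under the excess-only reading, which shows why the unstated detail matters at that scale.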
