Amazon S3 Metadata now supports metadata for all your S3 objects

<section class="blog-post-content lb-rtxt"><a href="https://aws.amazon.com/s3/features/metadata/">Amazon S3 Metadata</a> now provides complete visibility into all your existing objects in your <a href="https://aws.amazon.com/s3/">Amazon Simple Storage Service (Amazon S3)</a> buckets, expanding beyond new objects and changes. With this expanded coverage, you can analyze and query metadata for your entire S3 storage footprint.

Today, many customers rely on Amazon S3 to store unstructured data at scale. To understand what’s in a bucket, you often need to build and maintain custom systems that scan for objects, track changes, and manage metadata over time. These systems are expensive to maintain and hard to keep up to date as data grows.

Since <a href="https://aws.amazon.com/blogs/aws/introducing-queryable-object-metadata-for-amazon-s3-buckets-preview/">the launch of S3 Metadata at re:Invent 2024</a>, you’ve been able to query new and updated object metadata using metadata tables instead of relying on <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html">Amazon S3 Inventory</a> or object-level APIs such as <code>ListObjects</code>, <code>HeadObject</code>, and <code>GetObject</code>, which can introduce latency and impact downstream workflows.

To make it easier for you to work with this expanded metadata, S3 Metadata introduces live inventory tables that work with familiar SQL-based tools. After your existing objects are backfilled into the system, any updates like uploads or deletions typically appear within an hour in your live inventory tables.

With S3 Metadata live inventory tables, you get a fully managed Apache Iceberg table that provides a complete and current snapshot of the objects and their metadata in your bucket, including existing objects, thanks to backfill support.
These tables are refreshed automatically within an hour of changes such as uploads or deletions, so you stay up to date. You can use them to identify objects with specific properties (like unencrypted data, missing tags, or particular storage classes) and to support analytics, cost optimization, auditing, and governance.

S3 Metadata journal tables, previously known as <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/metadata-tables-overview.html">S3 Metadata tables</a>, are automatically enabled when you configure live inventory tables. They provide a near real-time view of object-level changes in your bucket, including uploads, deletions, and metadata updates. These tables are ideal for auditing activity, tracking the lifecycle of objects, and generating event-driven insights. For example, you can use them to find out which objects were deleted in the past 24 hours, identify the requester making the most <code>PUT</code> operations, or monitor updates to object metadata over time.

S3 Metadata tables are created in a namespace whose name is similar to your bucket name for easier discovery. The tables are stored in AWS system table buckets, grouped by account and <a href="https://docs.aws.amazon.com/glossary/latest/reference/glos-chap.html#region">Region</a>. After you enable S3 Metadata for a general purpose S3 bucket, the system creates and maintains these tables for you. You don’t need to manage compaction or garbage collection processes: <a href="https://aws.amazon.com/s3/features/tables/">S3 Tables</a> takes care of <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables-maintenance.html">table maintenance</a> tasks in the background.

These new tables help avoid waiting for metadata discovery before processing can begin, making them ideal for large-scale analytics and <a href="https://aws.amazon.com/ai/machine-learning/">machine learning (ML)</a> workloads.
By querying metadata ahead of time, you can schedule GPU jobs more efficiently and reduce idle time in compute-intensive environments.

Let’s see how it works

To see how this works in practice, I configure S3 Metadata for a general purpose bucket using the <a href="https://console.aws.amazon.com">AWS Management Console</a>.

<a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/03/2025-07-03_09-39-10.png"><img class="aligncenter size-full wp-image-97696" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/03/2025-07-03_09-39-10.png" alt="S3 Metadata, start from general purpose bucket" width="800" height="523" /></a>

After choosing a general purpose bucket, I choose the Metadata tab, then I choose Create metadata configuration.

<a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_12-23-39.png"><img class="aligncenter wp-image-97901 size-full" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_12-23-39.png" alt="S3 Metadata, configure journal and inventory table" width="800" height="822" /></a>

For Journal table, I can choose the Server-side encryption option and the Record expiration period. For Live Inventory table, I choose Enabled and I can select the Server-side encryption options.

I configure Record expiration on the journal table.
Journal table records expire after the specified number of days, 365 days (one year) in my example.

Then, I choose Create metadata configuration.

<a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_12-25-18.png"><img class="aligncenter wp-image-97902 size-full" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_12-25-18.png" alt="S3 Metadata, backfilling" width="800" height="504" /></a>

S3 Metadata creates the live inventory table and journal table. In the Live Inventory table section, I can observe the Table status: the system immediately starts to backfill the table with existing object metadata. This can take from minutes to hours, depending on the number of objects in your S3 bucket.

While waiting, I also upload and delete objects to generate data in the journal table.

Then, I navigate to <a href="https://aws.amazon.com/athena">Amazon Athena</a> to start querying the new tables.

<a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_16-10-21.png"><img class="aligncenter wp-image-97914 size-full" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/08/2025-07-08_16-10-21.png" alt="S3 Metadata, query with Athena" width="510" height="383" /></a>

I choose Query table with Athena to start querying the table.
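Because the tables are queried with plain SQL, the statements can also be generated in code when the same check has to run for several buckets or time windows. The following is a minimal Python sketch, not an AWS API: the function name `journal_deletes_query` and the namespace validation are illustrative assumptions, while the catalog, table, and column names are the ones used in the journal queries in this post.

```python
import re

# Catalog name as it appears in the journal queries in this post.
CATALOG = "s3tablescatalog/aws-managed-s3"


def journal_deletes_query(namespace: str, days: int = 1, versioned: bool = True) -> str:
    """Build a 'recently deleted objects' query for a journal table.

    Hypothetical helper: it only formats SQL text and does not call AWS.
    """
    # Assumption: namespaces contain only lowercase letters, digits, and
    # underscores; anything else is rejected to keep the SQL well-formed.
    if not re.fullmatch(r"[a-z0-9_]+", namespace):
        raise ValueError(f"unexpected namespace: {namespace!r}")
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive integer")

    # Versioned buckets record deletions as delete markers; otherwise the
    # journal record type is used, as noted in the query comments in this post.
    delete_filter = "is_delete_marker = true" if versioned else "record_type = 'DELETE'"

    return (
        "SELECT bucket, key, version_id, last_modified_date\n"
        f'FROM "{CATALOG}"."{namespace}"."journal"\n'
        f"WHERE last_modified_date >= (current_date - interval '{days}' day)\n"
        f"  AND {delete_filter};"
    )


if __name__ == "__main__":
    print(journal_deletes_query("b_aws_news_blog_metadata_inventory_ns", days=7))
```

The returned string can be pasted into the Athena query editor or passed to whichever SQL client you already use.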
I can choose between a couple of default queries on the console.

<a href="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/03/2025-07-03_10-35-22.png"><img class="aligncenter size-full wp-image-97700" src="https://d2908q01vomqb2.cloudfront.net/da4b9237bacccdf19c0760cab7aec4a8359010b0/2025/07/03/2025-07-03_10-35-22.png" alt="S3 Metadata table structure" width="392" height="719" /></a>

In Athena, I observe the structure of the tables in the AWSDataCatalog data source and I start with a short query to check how many records are available in the journal table. I already have 6,488 entries:

<pre class="lang-sql">SELECT count(*) FROM "b_aws_news_blog_metadata_inventory_ns"."journal";

-- Result:
-- #  _col0
-- 1  6488</pre>

Here are a couple of example queries I tried on the journal table:

<pre class="lang-sql">-- Query deleted objects in last 24 hours
-- Use is_delete_marker=true for versioned buckets and record_type='DELETE' otherwise
SELECT bucket, key, version_id, last_modified_date
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."journal"
WHERE last_modified_date >= (current_date - interval '1' day) AND is_delete_marker = true;

-- Result:
-- #  bucket  key  version_id  last_modified_date  is_delete_marker
-- 1  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/G0/NSURLSession.h-JET61D329FG0
-- 2  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/G5/cdefs.h-PJ21EUWKMWG5
-- 3  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/FX/buf.h-25EDY57V6ZXFX
-- 4  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/G6/NSMeasurementFormatter.h-3FN8J9CLVMYG6
-- 5  aws-news-blog-metadata-inventory  .build/index-build/arm64-apple-macosx/debug/index/store/v5/records/G8/NSXMLDocument.h-1UO2NUJK0OAG8

-- Query recent PUT requests IP addresses
SELECT source_ip_address, count(source_ip_address)
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."journal"
GROUP BY source_ip_address;

-- Result:
-- #  source_ip_address     _col1
-- 1  my_laptop_IP_address  12488

-- Query S3 Lifecycle expired objects in last 7 days
SELECT bucket, key, version_id, last_modified_date, record_timestamp
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."journal"
WHERE requester = 's3.amazonaws.com' AND record_type = 'DELETE' AND record_timestamp > (current_date - interval '7' day);

-- (not applicable to my demo bucket)</pre>

The results helped me track the specific objects that were removed, including their timestamps.

Now, I look at the live inventory table:

<pre class="lang-sql">-- Distribution of object tags
SELECT object_tags, count(object_tags)
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."inventory"
GROUP BY object_tags;

-- Result:
-- #  object_tags     _col1
-- 1  {Source=Swift}  1
-- 2  {Source=swift}  1
-- 3  {}              12486

-- Query storage class and size for specific tags
SELECT storage_class, count(*) as count, sum(size) / 1024 / 1024 as usage
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."inventory"
GROUP BY object_tags['pii=true'], storage_class;

-- Result:
-- #  storage_class  count   usage
-- 1  STANDARD       124884  165

-- Find objects with specific user defined metadata
SELECT key, last_modified_date, user_metadata
FROM "s3tablescatalog/aws-managed-s3"."b_aws_news_blog_metadata_inventory_ns"."inventory"
WHERE cardinality(user_metadata) > 0 ORDER BY last_modified_date DESC;

-- (not applicable to my demo bucket)</pre>

These are just a few examples of what is possible with S3 Metadata. Your preferred queries will depend on your use cases.
Refer to <a href="https://aws.amazon.com/blogs/storage/analyzing-amazon-s3-metadata-with-amazon-athena-and-amazon-quicksight/">Analyzing Amazon S3 Metadata with Amazon Athena and Amazon QuickSight</a> in the <a href="https://aws.amazon.com/blogs/storage/">AWS Storage Blog</a> for more examples.

Pricing and availability

S3 Metadata live inventory and journal tables are available today in US East (Ohio, N. Virginia) and US West (N. California).

The journal tables are charged $0.30 per million updates. This is a 33 percent drop from our previous price.

For inventory tables, there’s a one-time backfill cost of $0.30 per million objects to set up the table and generate metadata for existing objects. There are no additional costs if your bucket has fewer than one billion objects. For buckets with more than a billion objects, there is a fee of $0.10 per million objects per month.

As usual, the <a href="https://aws.amazon.com/s3/pricing/">Amazon S3 pricing page</a> has all the details.

With S3 Metadata live inventory and journal tables, you can reduce the time and effort required to explore and manage large datasets. You get an up-to-date view of your storage and a record of changes, and both are available as Iceberg tables you can query on demand. You can discover data faster, power compliance workflows, and optimize your ML pipelines.

You can get started by enabling metadata inventory on your S3 bucket through the AWS console, <a href="https://aws.amazon.com/cli/">AWS Command Line Interface (AWS CLI)</a>, or <a href="https://aws.amazon.com/tools/">AWS SDKs</a>. When they’re enabled, the journal and live inventory tables are automatically created and updated.
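As a rough aid for the pricing figures quoted in this post, here is a small Python sketch that turns them into an estimate. The function names are hypothetical, and one point is an assumption rather than something the post states: the sketch charges the monthly inventory fee on every object once a bucket holds more than one billion objects, and treats exactly one billion as free. Check the Amazon S3 pricing page before relying on either choice.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# Rates quoted in this post (USD).
BACKFILL_PER_MILLION_OBJECTS = Decimal("0.30")   # one-time, inventory table
JOURNAL_PER_MILLION_UPDATES = Decimal("0.30")    # journal table
INVENTORY_MONTHLY_PER_MILLION = Decimal("0.10")  # buckets above the threshold
INVENTORY_FREE_THRESHOLD = 1_000_000_000         # objects


def backfill_cost(existing_objects: int) -> Decimal:
    """One-time cost to backfill the live inventory table."""
    return Decimal(existing_objects) / MILLION * BACKFILL_PER_MILLION_OBJECTS


def journal_cost(updates: int) -> Decimal:
    """Cost of journal table updates."""
    return Decimal(updates) / MILLION * JOURNAL_PER_MILLION_UPDATES


def inventory_monthly_cost(objects: int) -> Decimal:
    """Monthly live inventory fee.

    ASSUMPTION: above the threshold, the fee applies to all objects in the
    bucket, not only to those beyond the first billion.
    """
    if objects <= INVENTORY_FREE_THRESHOLD:
        return Decimal(0)
    return Decimal(objects) / MILLION * INVENTORY_MONTHLY_PER_MILLION


if __name__ == "__main__":
    # The demo bucket in this post holds 12,488 objects.
    print("backfill:", backfill_cost(12_488))
    print("journal, 5M updates:", journal_cost(5_000_000))
    print("inventory, 2B objects:", inventory_monthly_cost(2_000_000_000))
```

`Decimal` is used instead of floats so that small per-object amounts are not distorted by binary rounding.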
To learn more, visit the <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html">S3 Metadata Documentation page</a>.

<a href="https://linktr.ee/sebsto">- seb</a></section>
