Unknown data source, October 2, 2024
Access Bitcoin and Ethereum open datasets for cross-chain analytics

 

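This post introduces the publicly available Bitcoin and Ethereum blockchain datasets launched by AWS, along with an open-source solution for running cross-chain analytics. Although the datasets are still experimental, they give users direct access to the data without operating dedicated full nodes or building complicated ingestion pipelines. The post also covers the extract, transform, and load architecture and how to access and analyze the data with several tools in an AWS environment.

🎯 AWS has launched Bitcoin and Ethereum blockchain datasets for public use. They help Web3 builders with the challenge of accessing and analyzing data across chains, and they capture a large volume of activity such as token transactions, information sharing, and smart contract deployments.

💻 To ease data access and speed up analysis, an architecture was deployed that extracts, transforms, and loads blockchain data into a column-oriented storage format. Users can load the data, partitioned by date, into their AWS environment and query it efficiently with services such as Amazon Athena or Amazon Redshift.

📄 The Bitcoin and Ethereum data in the public Amazon S3 bucket use different folder structures, each with its own tables (blocks, transactions, and so on), and the schema of the Parquet files is documented in detail.

📊 On AWS, users can access and analyze these datasets with several tools. Sample Jupyter notebooks show how to perform cross-chain analytics and how to combine blockchain data with market trends for fundamental on-chain analytics.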
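
The date-partitioned layout summarized above can be illustrated with a minimal sketch. It assumes only the bucket layout given in the post, `s3://aws-public-blockchain/v1.0/{btc|eth}/{table}/date={YYYY-MM-DD}/`. The helper names `partition_prefix` and `partition_prefixes` are hypothetical and not part of any AWS SDK; the code only builds prefix strings and makes no network calls.

```python
from datetime import date, timedelta

# Bucket and version prefix as given in the post.
BASE = "s3://aws-public-blockchain/v1.0"

# Tables listed in the post for each chain.
TABLES = {
    "btc": {"blocks", "transactions"},
    "eth": {"blocks", "transactions", "logs",
            "token_transfers", "traces", "contracts"},
}


def partition_prefix(chain: str, table: str, day: date) -> str:
    """Return the S3 prefix that holds one day's Parquet files for a table."""
    if table not in TABLES.get(chain, set()):
        raise ValueError(f"unknown chain/table: {chain}/{table}")
    return f"{BASE}/{chain}/{table}/date={day.isoformat()}/"


def partition_prefixes(chain: str, table: str, start: date, end: date) -> list[str]:
    """Return one prefix per day from start to end, inclusive."""
    n_days = (end - start).days
    return [
        partition_prefix(chain, table, start + timedelta(days=i))
        for i in range(n_days + 1)
    ]


if __name__ == "__main__":
    # Prefixes covering the day used in the post's "birth block" example.
    for chain in ("btc", "eth"):
        print(partition_prefix(chain, "blocks", date(2016, 1, 14)))
```

Restricting a query or download to a handful of such prefixes is what makes date partitioning useful: only the files for the requested days need to be read.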

<p><em>In this post, we share an open-source solution for running cross-chain analytics on public blockchain data along with public datasets for Bitcoin and Ethereum available through AWS Open Data. These datasets are still experimental and are not recommended for production workloads. You can find the open-source project on GitHub <a href="https://github.com/aws-samples/digital-assets-examples/blob/main/analytics/README.md" target="_blank" rel="noopener noreferrer">here</a> and the public blockchain datasets <a href="https://registry.opendata.aws/aws-public-blockchain/" target="_blank" rel="noopener noreferrer">here</a>.</em></p><p>Today, AWS launches accessible Bitcoin and Ethereum blockchain datasets for public use. With the increase of Web3 activity around the world, more and more data is hosted on public blockchains. Although these blockchains are public, accessing and analyzing data across multiple chains continues to be a challenge for Web3 builders. Terabytes of data sit on these blockchains as users transact tokens, share information, and deploy smart contracts. However, querying these distributed ledgers directly is time-consuming, inefficient, and unsuited for analytics.</p><p>Blocks on each chain contain information about transactions across the network. This includes public keys and addresses where tokens were exchanged, transaction volume and times, and metadata that highlights mining difficulty, network hash rates, and available supply. Additionally, these blockchains are often used to host metadata that doesn’t affect the transfer of tokens on that particular network. A growing number of distributed applications embed metadata in other blockchains to validate ownership of assets beyond cryptocurrency. The growing NFT market also has a wealth of metadata, ripe for exploration and analysis.</p><p>Each distributed ledger is designed in a unique way and uses different technology stacks and consensus algorithms. 
The public blockchain datasets give you immediate access to this data without operating dedicated full nodes for the different blockchains and without building complicated ingestion pipelines. In addition, these datasets normalize data into tabular structures, so you can instantly access years’ worth of data across chains in a format that data scientists and other analytics professionals can easily analyze and query.</p><h2>Solution overview</h2><p>For these datasets, we deployed an architecture to extract, transform, and load blockchain data into a column-oriented storage format that allows for easy access and expedited analysis.</p><p>You can load these files partitioned by date into your AWS environment and use AWS services like <a href="http://aws.amazon.com/athena" target="_blank" rel="noopener noreferrer">Amazon Athena</a> or <a href="https://aws.amazon.com/pm/redshift/" target="_blank" rel="noopener noreferrer">Amazon Redshift</a> on top of this data to query it efficiently with SQL.</p><p>The following architecture diagram shows which AWS services are used to extract the data from the public blockchains and how it is delivered to <a href="https://aws.amazon.com/s3/" target="_blank" rel="noopener noreferrer">Amazon Simple Storage Service</a> (Amazon S3). 
You can also see which AWS services can be used to access this data from the public Amazon S3 bucket.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/16/DBBLOG-2500-image001.png" target="_blank" rel="noopener noreferrer"><img class="alignnone wp-image-24494 size-full" src="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/16/DBBLOG-2500-image001.png" alt="" width="1329" height="663" /></a></p><p>After an initial download of the full blockchain, starting from the first block in 2009 for Bitcoin and in 2015 for Ethereum, an on-chain listener continuously delivers new data to the public Amazon S3 bucket that provides the open datasets. The blockchain data is then transformed into multiple tables as compressed Parquet files partitioned by date to allow efficient access for most common analytics queries.</p><p>The following folder structure is currently provided for Bitcoin and Ethereum blockchain data in the public Amazon S3 bucket.</p><p><strong>Bitcoin: s3://aws-public-blockchain/v1.0/btc/</strong></p><ul><li><strong>blocks</strong>/date={YYYY-MM-DD}/{id}.snappy.parquet</li><li><strong>transactions</strong>/date={YYYY-MM-DD}/{id}.snappy.parquet</li></ul><p><strong>Ethereum: s3://aws-public-blockchain/v1.0/eth/</strong></p><ul><li><strong>blocks</strong>/date={YYYY-MM-DD}/{id}.snappy.parquet</li><li><strong>transactions</strong>/date={YYYY-MM-DD}/{id}.snappy.parquet</li><li><strong>logs</strong>/date={YYYY-MM-DD}/{id}.snappy.parquet</li><li><strong>token_transfers</strong>/date={YYYY-MM-DD}/{id}.snappy.parquet</li><li><strong>traces</strong>/date={YYYY-MM-DD}/{id}.snappy.parquet</li><li><strong>contracts</strong>/date={YYYY-MM-DD}/{id}.snappy.parquet</li></ul>
<p>The schema of the Parquet files is documented for each table and field <a href="https://github.com/aws-samples/digital-assets-examples/blob/main/analytics/schema.md" target="_blank" rel="noopener noreferrer">here</a>. Currently, we provide the historical block and transaction data for both chains and some additional tables for Ethereum that are most commonly used for queries.</p><h2>How to use this data?</h2><p>On AWS, you can take advantage of multiple tools to access and analyze these datasets. Parquet files in Amazon S3 can be directly queried in <a href="https://aws.amazon.com/athena/" target="_blank" rel="noopener noreferrer">Amazon Athena</a> or <a href="https://aws.amazon.com/redshift/" target="_blank" rel="noopener noreferrer">Amazon Redshift</a>. In addition, we provide Jupyter notebooks <a href="https://github.com/aws-samples/digital-assets-examples/blob/main/analytics/notebooks.md" target="_blank" rel="noopener noreferrer">here</a> for <a href="https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html" target="_blank" rel="noopener noreferrer">Amazon SageMaker Studio</a> that demonstrate how to perform cross-chain analytics and how to combine blockchain data with market trends for fundamental on-chain analytics.</p><p>Before you can run these examples, you need to deploy the following <a href="https://aws.amazon.com/cloudformation/" target="_blank" rel="noopener noreferrer">AWS CloudFormation</a> template. 
This template sets up the <a href="https://aws.amazon.com/glue/" target="_blank" rel="noopener noreferrer">AWS Glue</a> Data Catalog, an Amazon Athena workgroup with an S3 bucket for the query results, and <a href="https://aws.amazon.com/lambda/" target="_blank" rel="noopener noreferrer">AWS Lambda</a> functions to keep partitions up to date:</p><p><a href="https://console.aws.amazon.com/cloudformation/home?region=us-east-2#/stacks/new?stackName=aws-public-blockchain&amp;templateURL=https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/DBBLOG-2500/aws-public-blockchain.yaml" target="_blank" rel="noopener noreferrer"><img class="alignnone wp-image-2960 size-full" src="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2018/03/20/Launch-Stack.jpg" alt="" width="144" height="27" /></a></p><p><strong>Example 1)</strong> <em>Tell me the “birth block” for my child</em></p><p>Bitcoin has been around since January 2009 and Ethereum since July 2015. On average, a new Bitcoin block is created every ten minutes. New Ethereum blocks are created every 12-14 seconds. Every block has a time stamp, so we can identify the closest block for any given time since each chain’s first block. 
In our example, we pick a birth time of 2016-01-14 18:23 UTC.</p><p>The following screenshot shows the SQL query in Amazon Athena and its output: block 393,323 for Bitcoin and block 848,182 for Ethereum.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/DBBLOG-2500-image003.png" target="_blank" rel="noopener noreferrer"><img class="alignnone size-full wp-image-24431" src="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/DBBLOG-2500-image003.png" alt="" width="1285" height="770" /></a></p><p><strong>Example 2)</strong> <em>Show me the largest stablecoin transactions</em></p><p>Transactions for Ethereum-based stablecoins are captured in the table “token_transfers” in the Ethereum dataset, and each stablecoin has a unique token address.</p><p>The following screenshot shows how to query the largest transactions for a specific stablecoin in the query editor of Amazon Redshift.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/DBBLOG-2500-image005.png" target="_blank" rel="noopener noreferrer"><img class="alignnone size-full wp-image-24432" src="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/DBBLOG-2500-image005.png" alt="" width="1488" height="712" /></a></p><p><strong>Example 3)</strong> <em>Show me the weekly BTC transaction volume in USD</em></p><p>To calculate the transaction volume in USD, we also need historical price data for public blockchains. In our GitHub project, we provide a sample Jupyter notebook that pulls prices from a public crypto exchange. Once the market data is loaded, we can combine it with the public blockchain data and visualize the result in a chart. 
This analysis can help you better understand how network adoption changes over time.</p><p>The following screenshot shows the example Jupyter notebook for this query in Amazon SageMaker Studio and the output as a chart.</p><p><a href="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/DBBLOG-2500-image007.png" target="_blank" rel="noopener noreferrer"><img class="alignnone size-full wp-image-24433" src="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/DBBLOG-2500-image007.png" alt="" width="1323" height="928" /></a></p><h2>Clean up</h2><p>Don’t forget to remove the resources created if you don’t plan to use them anymore.</p><p>Empty the S3 bucket created by the stack and <a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html" target="_blank" rel="noopener noreferrer">delete the CloudFormation stack</a>.</p><h2>Conclusion</h2><p>These publicly available blockchain datasets can be used to jumpstart your projects, making it easier to get started working with blockchain data. You can deploy the underlying network of nodes using services like <a href="https://aws.amazon.com/managed-blockchain/" target="_blank" rel="noopener noreferrer">Amazon Managed Blockchain</a> to submit transactions and access real-time data directly from the node. If you’re not deploying your own nodes or blockchain protocols, you can use these datasets to create an analytics layer that sits on top of these blockchains to extract insights from the underlying data. You can use transaction volume, time, mining difficulty, and hash rates to identify trends and observations across multiple chains. You can also take advantage of the unique metadata within each blockchain, running analyses on unstructured fields.</p><p>If you are looking for a real-time ingestion pipeline from these networks, you can deploy the open-source solution in your own AWS account. 
This also allows you to create your own data repositories with finer controls for your data access requirements as your application scales and more users use your platform. For customers interested in production-grade reliability, real-time access to blockchain data, or other advanced blockchain data query needs, please contact us at aws-public-blockchain@amazon.com to connect and discuss your use case.</p><p>As other blockchains become more widely used, the open-source architecture can be adapted to other blockchains in the ecosystem. Any tokens developed using the ERC-20 or ERC-721 standards can be easily supported because they run on the same Ethereum protocol that the open datasets already cover. The same extensibility exists for tokens that are forks or variants of Bitcoin.</p><p>As we plan to extend this solution, let us know how we can help you with your use cases and improve our open-source solution and open datasets.</p><p>Learn more about <a href="https://aws.amazon.com/managed-blockchain/" target="_blank" rel="noopener noreferrer">Amazon Managed Blockchain</a> and how you can share and access petabytes of open data through <a href="https://opendata.aws" target="_blank" rel="noopener noreferrer">AWS Open Data</a>.</p><h3>About the authors</h3><p class="c4"><a href="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/olli.png"><img class="size-full wp-image-24440 alignleft" src="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/olli.png" alt="" width="100" height="133" /></a> <strong>Oliver Steffmann</strong> is a Principal Solutions Architect at AWS based in New York and is passionate about public blockchain use cases. He has over 20 years of experience working with financial institutions and helps his customers get their cloud transformation off the ground. 
Outside of work, he enjoys spending time with his family and training for the next Ironman.</p><p class="c4"><a href="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/stfndckr.png"><img class="size-full wp-image-24441 alignleft" src="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/stfndckr.png" alt="" width="100" height="100" /></a><strong>Stefan Dicker</strong> works in AWS’ Startup Business Development function focused on supporting the growing Venture Studio ecosystem. He first started working with blockchain startups in 2015 and helped execute the world’s first blockchain-based trade-finance deal. Outside of work, you can find him on the ultimate field, gardening, or geeking out on the latest in technical innovations.</p><p class="c4"><a href="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/baskar-rav.png"><img class="size-full wp-image-24438 alignleft" src="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/baskar-rav.png" alt="" width="100" height="100" /></a><strong>Bhaskar Ravat</strong> is a Senior Solutions Architect at AWS based in New York and is passionate about public blockchain use cases and the technology landscape, including Ethereum, Web3, and DeFi. You can find him reading four books at a time when not helping or building solutions for customers.</p><p class="c4"><a href="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/gsreeji.jpg"><img class="size-full wp-image-24439 alignleft" src="https://d2908q01vomqb2.cloudfront.net/887309d048beef83ad3eabf2a79a64a389ab1c9f/2022/09/13/gsreeji.jpg" alt="" width="100" height="125" /></a><strong>Sreeji Gopal</strong> is a Data Lake Architect in Big Data at AWS. 
He helps customers create meaningful experiences with data analytics, focused on their vision for products and services. Sreeji is a CrossFit enthusiast and likes to spend his free time with family, hiking, and traveling.</p>
