Unknown data source, October 2, 2024
How government agencies can vet external data in minutes with data interchange zones

 

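As government services go digital, data exchange is becoming a challenge. This article describes how government agencies can use AWS to build data interchange zones that ingest and validate data automatically and securely. It covers the challenges of data interchange, the approach to building a zone on AWS, how each layer works, the benefits of a data interchange zone, and where to learn more.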

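Government agencies face several data interchange challenges: receiving datasets in different formats and sizes, the need for a multi-layer dataset vetting process, and seamless integration with different internal platforms. Addressing them requires a protected data interchange platform with a rigorous, multi-layered dataset vetting process.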

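A data interchange zone built on AWS acts as the public interface to a protected data lake, letting external entities simply and securely upload or download validated datasets or reports. The solution consists of three key layers: the user interface (UI) layer, the validation layer, and the integration interface layer.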

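In the UI layer, external entities must pass authentication and authorization. They first log in through Amazon Cognito to obtain valid JWTs, then call a Lambda function through API Gateway to generate an Amazon S3 pre-signed URL for uploading data.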
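The upload step can be sketched as a Lambda handler. This is a minimal sketch under stated assumptions, not the article's implementation: the bucket name `landing-bucket`, the `uploads/<sub>/<filename>` key layout, the `filename` query parameter, and the 15-minute expiry are all hypothetical. It assumes a REST API with Lambda proxy integration and a Cognito user pool authorizer, in which case API Gateway passes the verified token claims under `requestContext.authorizer.claims`. The S3 client is passed in so the handler can be exercised without AWS credentials; in Lambda it would be `boto3.client("s3")`, whose `generate_presigned_url` method is the one called here.

```python
import json

LANDING_BUCKET = "landing-bucket"  # hypothetical bucket name
URL_TTL_SECONDS = 900              # hypothetical expiry: 15 minutes


def make_handler(s3_client):
    """Build a Lambda handler around an S3 client.

    In Lambda, s3_client would be boto3.client("s3"); any object with a
    compatible generate_presigned_url method works for local testing.
    """

    def handler(event, context=None):
        # With a Cognito user pool authorizer on a REST API (proxy
        # integration), API Gateway has already validated the JWT and
        # exposes its claims here.
        claims = (
            event.get("requestContext", {})
            .get("authorizer", {})
            .get("claims", {})
        )
        subject = claims.get("sub")
        if not subject:
            return {"statusCode": 401,
                    "body": json.dumps({"error": "unauthenticated"})}

        params = event.get("queryStringParameters") or {}
        filename = params.get("filename", "")
        # Require a plain file name so every key has the form
        # uploads/<sub>/<name>.
        if not filename or "/" in filename:
            return {"statusCode": 400,
                    "body": json.dumps({"error": "invalid filename"})}

        key = f"uploads/{subject}/{filename}"  # hypothetical key layout
        url = s3_client.generate_presigned_url(
            "put_object",
            Params={"Bucket": LANDING_BUCKET, "Key": key},
            ExpiresIn=URL_TTL_SECONDS,
        )
        return {"statusCode": 200,
                "body": json.dumps({"uploadUrl": url, "key": key})}

    return handler
```

The client would then send the file with an HTTP PUT to the returned `uploadUrl` before it expires. Scoping the key to the caller's `sub` claim is one possible way to keep each entity's uploads separate; the article does not specify a key layout.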

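In the validation and integration interface layers, validation starts with the file upload event triggered by Amazon S3 and proceeds through malware checking and further checks. The results are stored in DynamoDB, the UI is updated through DynamoDB Streams, and files that pass validation are moved to the data lake for further processing.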
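The routing and basic validation steps can be illustrated with plain Python. This is a sketch under assumptions rather than the article's code: the bucket names, the `ValidationRule` shape, and the function names are hypothetical, and the scan result is taken as an already-computed `clean` or `infected` string, since the article delegates the malware scan itself to third-party software. In the described architecture each function would run as a separate Lambda step orchestrated by Step Functions, with the rules loaded from DynamoDB.

```python
import csv
import io
from dataclasses import dataclass

QUARANTINE_BUCKET = "quarantine-bucket"  # hypothetical bucket name
CLEAN_BUCKET = "clean-bucket"            # hypothetical bucket name


@dataclass(frozen=True)
class ValidationRule:
    """Hypothetical rule shape; the article stores its rules in DynamoDB."""
    allowed_extensions: tuple
    required_columns: tuple


def route_after_scan(scan_result: str) -> str:
    """Pick the destination bucket from the malware check result."""
    if scan_result == "clean":
        return CLEAN_BUCKET
    if scan_result == "infected":
        return QUARANTINE_BUCKET
    raise ValueError(f"unexpected scan result: {scan_result!r}")


def validate_csv(filename: str, content: str, rule: ValidationRule) -> list:
    """Run a file format check, then a header check; return error messages."""
    errors = []
    # File format check: stop early if the extension is not allowed.
    if not filename.lower().endswith(rule.allowed_extensions):
        errors.append(f"unsupported file extension: {filename}")
        return errors
    # Schema check: every required column must appear in the header row.
    header = next(csv.reader(io.StringIO(content)), [])
    missing = [c for c in rule.required_columns if c not in header]
    if missing:
        errors.append("missing columns: " + ", ".join(missing))
    return errors
```

An empty list from `validate_csv` means the file passed both checks; a non-empty list is the kind of result that could be written to DynamoDB and surfaced in the UI.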

<section class="blog-post-content"><p><img class="aligncenter size-full wp-image-17111" src="https://d2908q01vomqb2.cloudfront.net/9e6a55b6b4563e652a23be9d623ca5055c356940/2022/09/29/government-data-interchange-zone-vet-external-data-aws-featured-image.jpg" alt="" width="1536" height="768" /></p><p>The rapid digitalization of government services has transformed the way <a href="https://aws.amazon.com/government-education/government/">government agencies</a> interact and exchange data with each other and with their communities. These data interchanges can take different forms; one example is how regulatory agencies can provide entities with a simple mechanism to ingest regulatory reporting. However, the manual processes agencies use to exchange data are no longer viable. Agencies are challenged by the volume of data, slow processes, and the potential for human error. Plus, government agencies are endeavouring to improve their customer service experience to meet the rapid change in their citizens’ behaviour and expectations in the digital age.</p><p>In this blog post, learn how government agencies can use Amazon Web Services (AWS) to build data interchange zones to automate their ability to ingest and validate data from other agencies or external entities in a secure manner. Automating this process can help agencies save time to focus on more strategic aspects of their mission.</p><h2>Data interchange challenges for government agencies</h2><p>Obtaining secure data from external entities poses many challenges for government agencies. These challenges include receiving datasets with different file formats and sizes; the need for multi-layer dataset vetting processes like malware checking and schema validation; and seamless integration with different internal platforms, such as protected data lakes and authentication and authorization systems. 
To address these obstacles, government agencies can establish a protected data interchange platform to create a rigorous, multi-layered dataset vetting process between external entities and internal systems.</p><h2>How to build data interchange zones on AWS</h2><p>A data interchange zone acts as a public interface to a protected data lake, enabling external entities to simply and securely upload or download validated datasets or reports. In the following architecture framework (Figure 1) for a data interchange zone using AWS services, the solution is built on a serverless architecture. Using a serverless architecture can help agencies scale and meet workload demand during different data collection cycles. Serverless architecture features automatic scaling, built-in high availability, and a pay-only-for-what-you-use billing model to increase agility and optimize costs.</p><p><img class="aligncenter wp-image-17106 size-full" src="https://d2908q01vomqb2.cloudfront.net/9e6a55b6b4563e652a23be9d623ca5055c356940/2022/09/29/Screen-Shot-2022-09-29-at-7.08.42-AM.png" alt="Figure 1. The high-level architecture for a data interchange zone." width="1202" height="660" /></p><p><em>Figure 1. The high-level architecture for a data interchange zone.</em></p><p>This solution is composed of three key layers: the user interface (UI) layer, the validation layer, and the integration interface layer. For the UI layer, the platform has a UI component that communicates and integrates with the other systems (e.g., regulatory portals) to authenticate the users using <a href="https://aws.amazon.com/api-gateway/">Amazon API Gateway</a> and <a href="https://aws.amazon.com/lambda/">AWS Lambda</a>. Authenticated users can then upload files to the data interchange zone.</p><p>This is where services in the data validation layer take over. 
The solution uses <a href="https://aws.amazon.com/step-functions/">AWS Step Functions</a> as a workflow service to orchestrate Lambda functions, which perform malware checks (using <a href="https://aws.amazon.com/blogs/apn/amazon-s3-malware-scanning-using-trend-micro-cloud-one-and-aws-security-hub/">third-party antivirus software</a>) and basic data validations. This includes validations such as file format checks and schema validation.</p><p>To communicate with the platform, an interface layer uses private API Gateways to allow internal platforms, such as a protected data lake, to move the curated datasets to the raw bucket. In return, the protected data lake can provide reports on the quality of the datasets received from the external entities after further processing, and serve those reports back to end users, using API Gateway and Lambda supported by the interface and UI layer.</p><h3>UI layer</h3><p>To interact with the data interchange zone, external entities need to pass through authentication and authorization processes established in the UI layer. Figure 2 shows the high-level flow along with the different services involved to allow external entities to upload the data securely into the landing bucket.</p><p><img class="aligncenter wp-image-17102 size-full" src="https://d2908q01vomqb2.cloudfront.net/9e6a55b6b4563e652a23be9d623ca5055c356940/2022/09/29/government-data-interchange-zone-vet-external-data-figure-2-authentication-authorization-high-level-flow.jpg" alt="Figure 2. The authentication and authorization high-level flow." width="1514" height="1038" /></p><p><em>Figure 2. 
The authentication and authorization high-level flow.</em></p><p>First, external entities can use the UI layer to log into the data interchange through an <a href="https://aws.amazon.com/cognito/">Amazon Cognito</a> user pool or through the <a href="https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-identity.html">Amazon Cognito federated identity provider</a> (IdP).</p><p>Once an entity successfully logs in, the entity client receives valid JSON Web Tokens (JWTs), namely an ID token, an access token, and a refresh token, as part of the authentication response. The external entity then makes a REST API call to the API Gateway endpoint with the valid JWT ID token as an authorization header. Because API Gateway is configured with Amazon Cognito as the authorizer, it checks whether the token is valid. If the token is valid, API Gateway invokes the Lambda function to generate an Amazon Simple Storage Service (<a href="https://aws.amazon.com/s3/">Amazon S3</a>) pre-signed URL to upload the data.</p><p>Finally, the external entity can upload files to the data interchange via the pre-signed URL returned in the previous step. With Amazon Cognito integrated with API Gateway and Lambda, only authenticated and authorized entities can upload files into the landing bucket.</p><h3>Validation and integration interface layer</h3><p>The validation process starts with a file upload event triggered by Amazon S3. The event is picked up by an AWS Lambda function, which routes it to the AWS Step Functions workflow, where processing begins with a malware check, as shown in Figure 3. Based on the malware check result, which can be either clean or infected, the file is moved to either the clean or the quarantine bucket in Amazon S3. 
If the file is clean, basic validation runs checks against the predefined validation rules stored in <a href="https://aws.amazon.com/dynamodb/">Amazon DynamoDB</a>.</p><p>Once the validation is complete, the results are persisted in DynamoDB. By enabling <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html">Amazon DynamoDB Streams</a>, you can capture a time-ordered sequence of item-level modifications. Lambda polls the stream and, when it detects new stream records, synchronously invokes the data interchange Lambda function, which updates the UI through a WebSocket API call, as shown in Figure 3.</p><p>Once all the mandatory files have passed validation, the private API waits for a signal from the other systems, like the protected data lake, to move the file from a conformed bucket to the data lake for further processing. All the processing steps throughout the flow are recorded in a DynamoDB table for audit and logging purposes.</p><p><img class="aligncenter wp-image-17103 size-full" src="https://d2908q01vomqb2.cloudfront.net/9e6a55b6b4563e652a23be9d623ca5055c356940/2022/09/29/government-data-interchange-zone-vet-external-data-figure-3-validation-integration-interfaces.jpg" alt="Figure 3. The validation and integration interfaces." width="2708" height="1446" /></p><p><em>Figure 3. The validation and integration interfaces.</em></p><h2>Get started with data interchange zones</h2><p>Creating a data interchange zone as an extensible secure data acquisition pattern can provide government agencies with a secure, cost-effective, and automated approach to onboard data sources from external entities. 
By running rigorous data validation and security checks in near real time with AWS services, government agencies can expand their secure data acquisition cycle to external entities or the community in a simple and automated way.</p><p>If you want to learn more, reach out to your AWS Account Team or <a href="https://aws.amazon.com/professional-services/">AWS Professional Services</a>. Not yet an AWS customer and want to learn more? Send an inquiry to the <a href="https://aws.amazon.com/government-education/contact/?trkCampaign=ps&amp;trk=ps_blog_body">AWS Public Sector Sales team</a>.</p><h3>Read more about <a href="https://aws.amazon.com/blogs/publicsector/category/public-sector/government/">AWS for government</a>:</h3><p><a href="https://pages.awscloud.com/aws-public-sector-blog-newsletter.html?trk=ta_a134p000006vtafAAA&amp;trkCampaign=AWS_Public_Sector_Blog_Newsletter_Opt-In&amp;sc_channel=ta&amp;sc_campaign=Blog-opt-in-internal-newsletter&amp;sc_outcome=WWPS" target="_blank" rel="noopener noreferrer"><em>Subscribe to the AWS Public Sector Blog newsletter</em></a> <em>to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or</em> <a href="https://aws.amazon.com/government-education/contact/?trkCampaign=ps&amp;trk=ps_blog_body" target="_blank" rel="noopener noreferrer"><em>contact us</em></a><em>.</em></p><p><em><a href="https://amazonmr.au1.qualtrics.com/jfe/form/SV_erMC5rmaj1pKfgG" target="_blank" rel="noopener noreferrer">Please take a few minutes to share insights regarding your experience with the AWS Public Sector Blog in this survey</a>, and we’ll use feedback from the survey to create more content aligned with the preferences of our readers.</em></p></section>
