AWS Machine Learning Blog, October 16, 2024
Create a data labeling project with Amazon SageMaker Ground Truth Plus

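Amazon SageMaker Ground Truth is a data labeling service from AWS. It provides a scalable platform for labeling many types of data, including text, images, video, and 3D point clouds, with a diverse workforce of human annotators. Beyond traditional custom deep learning models, SageMaker Ground Truth also supports generative AI use cases, producing high-quality training data for artificial intelligence and machine learning (AI/ML) models. It comes in two forms: a self-service option and an AWS managed option called SageMaker Ground Truth Plus. This post shows how to get started with SageMaker Ground Truth Plus by creating a project and sharing the data that needs labeling.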

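🤔 **Project request:** First, fill out the consultation form on the Get Started with Amazon SageMaker Ground Truth page or, if you already have an AWS account, submit a request project form on the SageMaker Ground Truth Plus console. An AWS expert contacts you to review your data labeling requirements, and you can share specific needs such as subject matter expertise, language expertise, or the geographic location of labelers. If you started with the consultation form, you then submit a request project form on the console, and it is approved without further discussion. If you started with a project request, its status changes from Review in progress to Request approved.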

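👷 **Create a project team:** Next, create your project team, which consists of the people working with you on the project. Each team member receives an invitation to join. Then upload the data that needs labeling to an Amazon Simple Storage Service (Amazon S3) bucket. To add that data to your project, go to your project portal and create a batch that includes the S3 bucket URL. Every project contains one or more batches, and each batch is made up of the data objects to be labeled.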

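🤖 **Data labeling:** The SageMaker Ground Truth Plus team now takes over. It selects annotators based on your labeling needs, trains them on your labeling requirements, and creates a UI for them to label your data. After the labeled data passes internal quality checks, it is delivered back to an S3 bucket for you to use in training your ML models.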

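📈 **Solution architecture:** The steps in this post show how to set up your data labeling project quickly: requesting a new project, setting up a project team, and creating a batch that contains the data objects you need labeled.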

🧰 **Prerequisites:** To follow this walkthrough, you need the following:

    An AWS account.
    The URI of the S3 bucket that stores your data. The bucket should be in the US East (N. Virginia) AWS Region.
    An AWS Identity and Access Management (IAM) user. If you own the AWS account, you have administrator access and can skip this step. If your AWS account is part of an AWS organization, you can ask your AWS administrator to grant your IAM user the required permissions.

The following identity-based policy specifies the minimum permissions your IAM user needs to perform all the steps in this post (provide the name of the S3 bucket that stores your data):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "groundtruthlabeling:ListProjects",
                "groundtruthlabeling:GetIntakeFormStatus",
                "groundtruthlabeling:CreateProject",
                "sagemaker:ListWorkforces",
                "sagemaker:PutLabelingPortalPolicy",
                "sagemaker:GetLabelingPortalPolicy",
                "iam:GetRole",
                "iam:PassRole",
                "iam:ListRoles",
                "iam:CreateRole",
                "iam:CreatePolicy",
                "iam:AttachRolePolicy",
                "s3:ListAllMyBuckets",
                "cognito-idp:CreateUserPool",
                "cognito-idp:ListUserPools",
                "cognito-idp:ListGroups",
                "cognito-idp:AdminAddUserToGroup",
                "cognito-idp:AdminCreateUser",
                "cognito-idp:CreateGroup",
                "cognito-idp:CreateUserPoolClient",
                "cognito-idp:CreateUserPoolDomain",
                "cognito-idp:DescribeUserPool",
                "cognito-idp:DescribeUserPoolClient",
                "cognito-idp:ListUsersInGroup",
                "cognito-idp:UpdateUserPool",
                "cognito-idp:UpdateUserPoolClient"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<your-S3-bucket-name>",
                "arn:aws:s3:::<your-S3-bucket-name>/*"
            ]
        }
    ]
}

Amazon SageMaker Ground Truth is a powerful data labeling service offered by AWS that provides a comprehensive and scalable platform for labeling various types of data, including text, images, videos, and 3D point clouds, using a diverse workforce of human annotators. In addition to traditional custom-tailored deep learning models, SageMaker Ground Truth also supports generative AI use cases, enabling the generation of high-quality training data for artificial intelligence and machine learning (AI/ML) models. SageMaker Ground Truth includes a self-serve option and an AWS managed option known as SageMaker Ground Truth Plus. In this post, we focus on getting started with SageMaker Ground Truth Plus by creating a project and sharing your data that requires labeling.

Overview of solution

First, you fill out a consultation form on the Get Started with Amazon SageMaker Ground Truth page or, if you already have an AWS account, you submit a request project form on the SageMaker Ground Truth Plus console. An AWS expert contacts you to review your specific data labeling requirements. You can share any specific requirements such as subject matter expertise, language expertise, or geographic location of labelers. If you started with a consultation form, you then submit a request project form on the SageMaker Ground Truth Plus console, and it is approved without any further discussion. If you started with a project request, your project status changes from Review in progress to Request approved.

Next, you create your project team, which includes the people who are collaborating with you on the project. Each team member receives an invitation to join your project. You then upload the data that requires labeling to an Amazon Simple Storage Service (Amazon S3) bucket. To add that data to your project, go to your project portal and create a batch that includes the S3 bucket URL. Every project consists of one or more batches. Each batch is made up of data objects to be labeled.
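The project, batch, and data object relationship described above can be pictured as a small data model. The following is only an illustrative sketch: the class names, field names, bucket name, and object keys are hypothetical and are not part of any SageMaker Ground Truth Plus API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Batch:
    """A group of data objects submitted for labeling together."""
    name: str
    input_s3_uri: str  # S3 location that holds the unlabeled data objects
    object_keys: List[str] = field(default_factory=list)


@dataclass
class Project:
    """A labeling project; it holds one or more batches."""
    name: str
    batches: List[Batch] = field(default_factory=list)

    def add_batch(self, batch: Batch) -> None:
        self.batches.append(batch)

    def total_objects(self) -> int:
        # Count the data objects across every batch in the project.
        return sum(len(batch.object_keys) for batch in self.batches)


# Hypothetical project with two batches in an example bucket.
project = Project(name="StreetScenes")
project.add_batch(Batch("batch1", "s3://example-input-bucket/batch1/", ["a.jpg", "b.jpg"]))
project.add_batch(Batch("batch2", "s3://example-input-bucket/batch2/", ["c.jpg"]))
print(project.total_objects())
```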

Now, the SageMaker Ground Truth Plus team takes over and sources annotators based on your specific data labeling needs, trains them on your labeling requirements, and creates a UI for them to label your data. After the labeled data passes internal quality checks, it is delivered back to an S3 bucket for you to use for training your ML models.

The following diagram illustrates the solution architecture.

Using the steps outlined in this post, you'll be able to quickly get set up for your data labeling project. This includes requesting a new project, setting up a project team, and creating a batch, which includes the data objects you need labeled.

Prerequisites

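For this walkthrough, you should have the following prerequisites:

    An AWS account.
    The URI of the S3 bucket where your data is stored. The bucket should be in the US East (N. Virginia) AWS Region.
    An AWS Identity and Access Management (IAM) user with the minimum permissions listed in the identity-based policy in the prerequisites summary at the top of this post. If you own the AWS account, you have administrator access and can skip this step.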
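The prerequisites summarized at the top of this post ask that the S3 bucket be in the US East (N. Virginia) Region. A minimal sketch of that check follows. It relies on the fact that S3 reports an empty `LocationConstraint` for buckets in `us-east-1`. The helper names are hypothetical, and the boto3 lookup is shown but not called here because it needs AWS credentials and the `s3:GetBucketLocation` permission, which the minimum policy in this post does not include.

```python
from typing import Optional

REQUIRED_REGION = "us-east-1"  # US East (N. Virginia)


def region_from_location_constraint(constraint: Optional[str]) -> str:
    """Map an S3 LocationConstraint value to a Region name.

    S3 returns an empty (None) LocationConstraint for buckets in us-east-1.
    """
    return constraint or "us-east-1"


def is_in_required_region(constraint: Optional[str]) -> bool:
    return region_from_location_constraint(constraint) == REQUIRED_REGION


def bucket_location_constraint(bucket_name: str) -> Optional[str]:
    """Look up the constraint with boto3 (needs credentials; not run here)."""
    import boto3  # imported lazily so the pure helpers work without boto3

    response = boto3.client("s3").get_bucket_location(Bucket=bucket_name)
    return response.get("LocationConstraint")


print(is_in_required_region(None))         # bucket in us-east-1
print(is_in_required_region("eu-west-1"))  # bucket in another Region
```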

Request a project 

Complete the following steps to request a project:

    On the SageMaker console, under Ground Truth in the navigation pane, choose Plus.
    Choose Request project.
    For Business email address, enter a valid email.
    For Project name, enter a descriptive name with no spaces or special characters.
    For Task type, choose the option that best describes the type of data you need labeled.
    For Contains PII, only turn on if your data contains personally identifiable information (PII).
    For IAM role, the role you choose grants SageMaker Ground Truth Plus permissions to access your data in Amazon S3 and perform a labeling job. You can use any of the following options to specify the IAM role:
      Choose Create an IAM role (recommended), which provides access to the S3 buckets you specify and automatically attaches the required permissions and trust policy to the role.
      Enter a custom IAM role ARN.
      Choose an existing role.

If you don’t have permissions to create an IAM role, you may ask your AWS administrator to create the role for you. When using an existing role or a custom IAM role ARN, the IAM role should have the following permissions policy and trust policy.

The following code is the permissions policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::<input-bucket-name>",
                "arn:aws:s3:::<input-bucket-name>/*",
                "arn:aws:s3:::<output-bucket-name>",
                "arn:aws:s3:::<output-bucket-name>/*"
            ]
        }
    ]
}

The following code is the trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker-ground-truth-plus.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
    Choose Request project.
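If you manage the role yourself, the permissions policy and trust policy shown above can be generated from your bucket names rather than edited by hand. The sketch below only builds the two JSON documents. The function names and bucket names are hypothetical, and nothing here calls AWS.

```python
import json
from typing import Dict, List

GTP_SERVICE_PRINCIPAL = "sagemaker-ground-truth-plus.amazonaws.com"


def bucket_arns(bucket: str) -> List[str]:
    # One ARN for the bucket itself and one for the objects inside it.
    return [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]


def build_permissions_policy(input_bucket: str, output_bucket: str) -> Dict:
    resources = bucket_arns(input_bucket)
    if output_bucket != input_bucket:
        resources += bucket_arns(output_bucket)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:PutObject",
                ],
                "Resource": resources,
            }
        ],
    }


def build_trust_policy() -> Dict:
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": GTP_SERVICE_PRINCIPAL},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical bucket names, for illustration only.
permissions = build_permissions_policy("example-input-bucket", "example-output-bucket")
trust = build_trust_policy()
print(json.dumps(permissions, indent=4))
print(json.dumps(trust, indent=4))
```

The serialized documents could then be supplied when the role is created, for example through the IAM console or the boto3 IAM client's `create_role` and `put_role_policy` calls.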

Under Ground Truth in the navigation pane, you can choose Plus to see your project listed in the Projects section with the status Review in progress.

An AWS representative will contact you within 72 hours to review your project requirements. When this review is complete, your project status will change from Review in progress to Request approved.
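Before you submit the request form, it can help to check the field values locally. The following is a rough sketch under stated assumptions: "no spaces or special characters" is interpreted here as letters and digits only, the email pattern is deliberately loose, and the class name, function name, and the task type string are hypothetical. The console applies its own validation.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: "no spaces or special characters" means letters and digits only.
PROJECT_NAME_RE = re.compile(r"^[A-Za-z0-9]+$")
# Loose shape check: something@something.something with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class ProjectRequest:
    business_email: str
    project_name: str
    task_type: str
    contains_pii: bool = False  # turn on only if the data contains PII


def validation_errors(request: ProjectRequest) -> List[str]:
    """Return a list of problems; an empty list means the form looks fine."""
    errors = []
    if not EMAIL_RE.match(request.business_email):
        errors.append("business_email is not a valid email address")
    if not PROJECT_NAME_RE.match(request.project_name):
        errors.append("project_name must not contain spaces or special characters")
    if not request.task_type.strip():
        errors.append("task_type is required")
    return errors


good = ProjectRequest("ml-team@example.com", "StreetScenes", "Image classification")
bad = ProjectRequest("not-an-email", "street scenes!", "")
print(validation_errors(good))
print(validation_errors(bad))
```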

Create project team

SageMaker Ground Truth uses Amazon Cognito to manage the members of your workforce and work teams. Amazon Cognito is a service that you use to create identities for your workers. Complete the following steps to create a project team:

    On the SageMaker console, under Ground Truth in the navigation pane, choose Plus.
    Choose Create project team.

The remaining steps depend on whether you create a new user group or import an existing group.

Option 1: Create a new Amazon Cognito user group

If you don’t want to import members from an existing Amazon Cognito user group in your account, or you don’t have any Amazon Cognito user groups in your account, you can use this option.

    When creating your project team, select Create a new Amazon Cognito user group.
    For Amazon Cognito user group name, enter a descriptive name with no spaces.
    For Email addresses, enter up to 50 addresses. Use a comma between addresses.
    Choose Preview invitation to see the email that is sent to the email addresses you provided.
    Choose Create project team.

Under Ground Truth in the navigation pane, choose Plus to see your project team listed in the Project team section. The email addresses you added are included in the Members section.
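The constraints for Option 1 (a group name with no spaces, and up to 50 comma-separated email addresses) are easy to check before you paste values into the console. This is a minimal sketch with hypothetical helper names. The email check is intentionally simple and is not the validation that Amazon Cognito performs.

```python
from typing import List

MAX_EMAILS = 50  # limit stated for the Email addresses field


def parse_member_emails(raw: str) -> List[str]:
    """Split a comma-separated string into a de-duplicated list of addresses."""
    candidates = [part.strip() for part in raw.split(",")]
    # dict.fromkeys removes duplicates while keeping the original order.
    emails = list(dict.fromkeys(e for e in candidates if e))
    invalid = [e for e in emails if e.count("@") != 1 or " " in e]
    if invalid:
        raise ValueError(f"Invalid email addresses: {invalid}")
    if len(emails) > MAX_EMAILS:
        raise ValueError(f"Too many addresses: {len(emails)} (maximum {MAX_EMAILS})")
    return emails


def is_valid_group_name(name: str) -> bool:
    """A user group name must be non-empty and contain no whitespace."""
    return bool(name) and not any(ch.isspace() for ch in name)


members = parse_member_emails("ana@example.com, ben@example.com,ana@example.com")
print(members)
print(is_valid_group_name("labeling-reviewers"))
print(is_valid_group_name("labeling reviewers"))
```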

Option 2: Import existing Amazon Cognito user groups

If you have an existing Amazon Cognito user group in your account from which you want to import members, you can use this option.

    When creating your project team, select Import existing Amazon Cognito user groups.
    For Select existing Amazon Cognito user groups, choose the user group from which you want to import members.
    Choose Create project team.

Under Ground Truth in the navigation pane, choose Plus to see your project team listed in the Project team section. The members you imported from the user group are included in the Members section.

Access the project portal and create a batch

You can use the project portal to create batches containing your unlabeled input data and track the status of your previously created batches in a project. To access the project portal, make sure that you have created at least one project and at least one project team with one verified member.

    On the SageMaker console, under Ground Truth in the navigation pane, choose Plus.
    Choose Open project portal.
    Log in to the project portal using your project team’s user credentials created in the previous step.

A list of all your projects is displayed on the project portal.

    Choose a project to open its details page.
    In the Batches section, choose Create batch.
    Enter the batch name, batch description, S3 location for input datasets, and S3 location for output datasets.
    Choose Submit.
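The batch form takes two S3 locations, so a quick local check that both are well-formed `s3://` URIs can catch typos before you submit. The sketch below uses only the Python standard library. The helper name, bucket names, and prefixes are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical input and output locations for one batch.
input_bucket, input_prefix = parse_s3_uri("s3://example-input-bucket/batch1/")
output_bucket, output_prefix = parse_s3_uri("s3://example-output-bucket/labels/batch1/")
print(input_bucket, input_prefix)
print(output_bucket, output_prefix)
```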

To create a batch successfully, make sure you meet the following criteria:

Next, the SageMaker Ground Truth Plus team sets up data labeling workflows, which changes the batch status to In progress. Annotators label the data, and you complete your data quality check by accepting or rejecting the labeled data. Rejected objects go back to annotators to re-label. Accepted objects are delivered to an S3 bucket for you to use for training your ML models.
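The accept or reject loop described above can be illustrated with a toy simulation. Everything in this sketch is hypothetical: the object structure, the acceptance rule (a non-empty label), and the re-labeling step merely stand in for the review you perform in the project portal and the work annotators do.

```python
from typing import Callable, Dict, List, Tuple

LabeledObject = Dict[str, str]


def review_batch(
    objects: List[LabeledObject],
    accept: Callable[[LabeledObject], bool],
) -> Tuple[List[LabeledObject], List[LabeledObject]]:
    """Split labeled objects into (accepted, rejected) using the reviewer's rule."""
    accepted, rejected = [], []
    for obj in objects:
        (accepted if accept(obj) else rejected).append(obj)
    return accepted, rejected


def has_label(obj: LabeledObject) -> bool:
    # Toy acceptance rule: the object must carry a non-empty label.
    return bool(obj["label"])


labeled = [
    {"key": "a.jpg", "label": "car"},
    {"key": "b.jpg", "label": ""},
    {"key": "c.jpg", "label": "truck"},
]

accepted, to_relabel = review_batch(labeled, has_label)

# Simulate annotators re-labeling the rejected objects, then review again.
relabeled = [{**obj, "label": "bus"} for obj in to_relabel]
accepted_again, still_rejected = review_batch(relabeled, has_label)

delivered = accepted + accepted_again
print([obj["key"] for obj in delivered])
```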

Conclusion

SageMaker Ground Truth Plus provides a seamless solution for building high-quality training datasets for your ML models. By using AWS managed expert labelers and automating the data labeling workflow, SageMaker Ground Truth Plus eliminates the overhead of building and managing your own labeling workforce. With its user-friendly interface and integrated tools, you can submit your data, specify labeling requirements, and monitor the progress of your projects with ease. As you receive accurately labeled data, you can confidently train your models, maintaining optimal performance and accuracy. Streamline your ML projects and focus on building innovative solutions with the power of SageMaker Ground Truth Plus.

To learn more, see Use Amazon SageMaker Ground Truth Plus to Label Data.


About the Authors

Joydeep Saha is a System Development Engineer at AWS with expertise in designing and implementing solutions to deliver business outcomes for customers. His current focus revolves around building cloud-native end-to-end data labeling solutions, empowering customers to unlock the full potential of their data and drive success through accurate and reliable machine learning models.

Ami Dani is a Senior Technical Program Manager at AWS focusing on AI/ML services. During her career, she has focused on delivering transformative software development projects for the federal government and large companies in industries as diverse as advertising, entertainment, and finance. Ami has experience driving business growth, implementing innovative training programs, and successfully managing complex, high-impact projects.
