AWS Machine Learning Blog, 15 hours ago
Implement user-level access control for multi-tenant ML platforms on Amazon SageMaker AI

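This article takes an in-depth look at how to use attribute-based access control (ABAC) patterns to manage permissions for Amazon SageMaker AI resources in enterprise machine learning environments. By using IAM policy variables and the source identity, you can implement fine-grained user access control while minimizing the number of AWS IAM roles. The article shares best practices that help organizations maintain security and compliance without sacrificing operational efficiency.

🔑 A centralized account structure can unify governance policies, improve resource utilization, and simplify compliance management, but it requires solving workload isolation between teams and permission management.

💡 Creating a separate Amazon SageMaker Studio domain for each business unit isolates workloads, and resources are tagged with the domain ARN.

🛡️ The ABAC pattern uses IAM policy variables to implement user-level access control while keeping domain-level execution roles, so IAM scales effectively.

👤 The IAM source identity supports auditing by tracking user activity in CloudTrail, whereas the SageMaker AI context keys are available only within SageMaker Studio.

📦 Controlling access to the sagemaker:AddTags action prevents resource ownership and access permissions from being modified by changing SageMaker AI resource tags.

☁️ Aligning user profile names with internal user identifiers keeps users unique across the platform and enables finer-grained resource access control.

🔒 Combining the source identity with IAM identity policies restricts users to their own S3 prefix.

🔑 With Secrets Manager, the source identity can be used to allow access to a user-specific secret hierarchy.

⚙️ Tags on EMR clusters, combined with the source identity, control access to EMR clusters.

📁 The sagemaker:FileSystemDirectoryPath condition restricts SageMaker training jobs to the user's own directory.

🔍 The source identity tracks user actions in CloudTrail logs, enabling precise monitoring of resource access by users within SageMaker Studio.

📊 Auditing CloudTrail logs makes it possible to track SageMaker Studio users' access to the AWS Glue Data Catalog, improving auditability and compliance.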

Managing access control in enterprise machine learning (ML) environments presents significant challenges, particularly when multiple teams share Amazon SageMaker AI resources within a single Amazon Web Services (AWS) account. Although Amazon SageMaker Studio provides user-level execution roles, this approach becomes unwieldy as organizations scale and team sizes grow. Refer to the Operating model whitepaper for best practices on account structure.

In this post, we discuss permission management strategies, focusing on attribute-based access control (ABAC) patterns that enable granular user access control while minimizing the proliferation of AWS Identity and Access Management (IAM) roles. We also share proven best practices that help organizations maintain security and compliance without sacrificing operational efficiency in their ML workflows.

Challenges with resource isolation across workloads

Consider a centralized account structure at an enterprise in a regulated industry such as finance or healthcare: a single ML platform team manages a comprehensive set of infrastructure that serves hundreds of data science teams across different business units. With such a structure, the platform team can implement consistent governance policies that enforce best practices. By centralizing these resources and controls, you can achieve better resource utilization, maintain security compliance and audit trails, and unify operational standards across ML initiatives. However, the challenge lies in maintaining workload isolation between teams and managing permissions between users of the same team.

With SageMaker AI, platform teams can create dedicated Amazon SageMaker Studio domains for each business unit, thereby maintaining resource isolation between workloads. Resources created within a domain are visible only to users within the same domain, and are tagged with the domain Amazon Resource Name (ARN). With tens or hundreds of domains, using a team-level or domain-level role compromises security and impairs auditing, whereas maintaining user-level roles leads to hundreds of roles to create and manage, and often runs into IAM service quotas.

We demonstrate how to implement ABAC that uses IAM policy variables to enforce user-level access controls while maintaining domain-level execution roles, so you can scale IAM in SageMaker AI securely and effectively. We share some common scenarios and sample IAM policies for permissions management; however, the patterns can be extended to other services as well.

Key concepts

In this solution, we use two key IAM concepts: source identity and context keys.

An IAM source identity is a custom string that administrators can require to be passed when a role is assumed; it identifies the person or application performing the actions. The source identity is logged to AWS CloudTrail and persists through role chaining, which takes place when one role is used to assume a second role through the AWS Command Line Interface (AWS CLI) or API (refer to Roles terms and concepts for additional information).

In SageMaker Studio, if the domain is set up to use a source identity, the user profile name is passed as the source identity to any API calls made by the user from a user’s private space using signed requests (using AWS Signature Version 4). The source identity enables auditability because API requests made with the assumed execution role carry the source identity in their session context. If external AWS credentials (such as access keys) are used to access AWS services from the SageMaker Studio environment, the SourceIdentity from the execution role assumption will not be set for those credentials.

SageMaker Studio supports two condition context keys: sagemaker:DomainId and sagemaker:UserProfileName for certain actions related to SageMaker domains. These context keys are powerful IAM policy variables that make it possible for admins to create dynamic ABAC policies that automatically scope permissions based on a user’s identity and domain. As the name implies, the DomainId key can be used to scope actions to specific domains, and the UserProfileName key enables user-specific resource access patterns.
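
To make the idea of policy variables concrete, the following is a minimal Python sketch of how a `${...}` placeholder in a policy string could be filled in from the request context. It only illustrates the substitution idea and is not how IAM is implemented; the function name `resolve_policy_variables`, the account ID, and the sample domain and user values are hypothetical.

```python
import re


def resolve_policy_variables(template: str, context: dict[str, str]) -> str:
    """Replace every ${key} placeholder with the matching request-context value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in context:
            # This sketch simply refuses to guess a value for an unknown key.
            raise KeyError(f"context key not present in request: {key}")
        return context[key]

    return re.sub(r"\$\{([^}]+)\}", substitute, template)


# Hypothetical values; in AWS the service supplies these keys, not the caller.
template = (
    "arn:aws:sagemaker:us-east-1:111122223333:user-profile/"
    "${sagemaker:DomainId}/${sagemaker:UserProfileName}"
)
context = {"sagemaker:DomainId": "d-example", "sagemaker:UserProfileName": "alice"}
resolved = resolve_policy_variables(template, context)
```

Because the same template resolves to a different ARN for each user, one policy attached to a shared execution role can express per-user permissions.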

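Although the source identity and sagemaker:UserProfileName can be used interchangeably in IAM policies, there are key differences: the source identity is logged to CloudTrail and persists through role chaining, so it also supports auditing, whereas the SageMaker AI context keys are available only within SageMaker Studio.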

In the following sections, we explore a few common scenarios and share IAM policy samples using the preceding principles. With this approach, you can maintain user-level resource isolation without using user-level IAM roles, and adhere to principles of least privilege.

Prerequisites

Before implementing this solution, make sure your SageMaker Studio domain meets the following criteria:

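The trust policy of the execution role allows SageMaker AI to assume the role and set the source identity:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker.amazonaws.com"
            },
            "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"]
        }
    ]
}

The domain is configured to pass the user profile name as the source identity:

aws sagemaker update-domain \
    --domain-id <value> \
    --domain-settings-for-update "ExecutionRoleIdentityConfig=USER_PROFILE_NAME"

The execution role is allowed to add tags only as part of a resource creation action:

{
    "Sid": "AddTagsOnCreate",
    "Effect": "Allow",
    "Action": ["sagemaker:AddTags"],
    "Resource": ["arn:aws:sagemaker:<region>:<account_number>:*"],
    "Condition": {
        "Null": {
            "sagemaker:TaggingAction": "false"
        }
    }
}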

When using ABAC-based approaches with SageMaker AI, access to the sagemaker:AddTags action must be tightly controlled; otherwise, ownership of resources, and therefore access to them, can be changed by editing the SageMaker AI resource tags.
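
One way to keep this control from drifting is to check policy documents before they are attached. The following Python sketch flags Allow statements that grant sagemaker:AddTags without the sagemaker:TaggingAction guard. It is a simplified check under stated assumptions: the function name `unguarded_add_tags_statements` is hypothetical, and partial wildcards such as `sagemaker:Add*` are not expanded.

```python
def unguarded_add_tags_statements(policy: dict) -> list[str]:
    """Return the Sid of each Allow statement that grants AddTags without the guard."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]

    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        # Simplification: only exact matches and full wildcards are recognized.
        grants_add_tags = any(
            action.lower() in ("sagemaker:addtags", "sagemaker:*", "*")
            for action in actions
        )
        null_block = statement.get("Condition", {}).get("Null", {})
        guarded = str(null_block.get("sagemaker:TaggingAction", "")).lower() == "false"
        if grants_add_tags and not guarded:
            findings.append(statement.get("Sid", f"statement[{index}]"))
    return findings


example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddTagsOnCreate",
            "Effect": "Allow",
            "Action": ["sagemaker:AddTags"],
            "Resource": "*",
            "Condition": {"Null": {"sagemaker:TaggingAction": "false"}},
        },
        {
            "Sid": "AddTagsAnytime",
            "Effect": "Allow",
            "Action": "sagemaker:AddTags",
            "Resource": "*",
        },
    ],
}
findings = unguarded_add_tags_statements(example_policy)
```

In this example only the hypothetical `AddTagsAnytime` statement is reported, because it would let a user retag an existing resource.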

Solution overview

In this post, we demonstrate how to use IAM policy variables and source identity to implement scalable, user-level access control in SageMaker AI. With this approach, you can do the following:

In the following sections, we share some common scenarios for implementing access control and how it can be achieved using the SourceIdentity and policy variables.

We recommend aligning SageMaker user profile names with your internal user identifiers, so that each user profile name maps to exactly one platform user and is unique across the platform.

SageMaker AI access control

When managing a shared SageMaker Studio domain with multiple users, administrators often need to implement resource-level access controls to prevent users from accessing or modifying each other’s resources. For instance, you might want to make sure data scientists can’t accidentally delete another team member’s endpoints or access SageMaker training jobs they don’t own. In such cases, the sagemaker:DomainId and sagemaker:UserProfileName keys can be used to place this restriction. See the following sample policy:

{
    "Sid": "TrainingJobPermissions",
    "Effect": "Allow",
    "Action": [
        "sagemaker:StopTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:UpdateTrainingJob"
    ],
    "Resource": "arn:aws:sagemaker:<region>:<account_number>:training-job/*",
    "Condition": {
        "StringLike": {
            "sagemaker:ResourceTag/sagemaker:user-profile-arn": "arn:aws:sagemaker:<region>:<account_number>:user-profile/${sagemaker:DomainId}/${sagemaker:UserProfileName}"
        }
        ...
    }
}

You can also write a similar policy using a source identity. The only limitation is that in this case, the domain ID must be specified or left as a wildcard. See the following example:

{
    "Sid": "TrainingJobPermissions",
    "Effect": "Allow",
    "Action": [
        "sagemaker:StopTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:UpdateTrainingJob"
    ],
    "Resource": "arn:aws:sagemaker:<region>:<account_number>:training-job/*",
    "Condition": {
        "StringLike": {
            "sagemaker:ResourceTag/sagemaker:user-profile-arn": "arn:aws:sagemaker:<region>:<account_number>:user-profile/<domain_id>/${aws:SourceIdentity}"
        }
        ...
    }
}
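
The practical effect of leaving the domain ID as a wildcard can be seen with a small Python sketch that approximates the StringLike comparison using `fnmatchcase` from the standard library. This is an approximation for illustration only; the account ID, domain IDs, and the helper name `tag_matches` are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical account and Region, used only to build example ARNs.
ARN_PREFIX = "arn:aws:sagemaker:us-east-1:111122223333:user-profile"


def tag_matches(tag_value: str, resolved_pattern: str) -> bool:
    """Approximate StringLike: '*' and '?' wildcards, case-sensitive comparison."""
    return fnmatchcase(tag_value, resolved_pattern)


# Tag on a training job created by the profile "alice" in domain d-teamb.
resource_tag = f"{ARN_PREFIX}/d-teamb/alice"

# Patterns after ${aws:SourceIdentity} has resolved to "alice" for a caller in d-teama.
pinned_domain = f"{ARN_PREFIX}/d-teama/alice"
wildcard_domain = f"{ARN_PREFIX}/*/alice"

matches_pinned = tag_matches(resource_tag, pinned_domain)
matches_wildcard = tag_matches(resource_tag, wildcard_domain)
```

With the domain pinned, the job from the other domain does not match. With the wildcard it does, so a wildcard domain ID is only safe when user profile names are unique across domains.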

Amazon S3 access control

Amazon S3 is the primary storage service for SageMaker AI, and is deeply integrated across different job types. It enables efficient reading and writing of datasets, code, and model artifacts. Amazon S3 features like lifecycle policies, encryption, versioning, and IAM controls help keep SageMaker AI workflow data secure, durable, and cost-effective. Using a source identity with IAM identity policies, we can restrict users in SageMaker Studio, and the jobs they launch, so that they can access only their own S3 prefix:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::my_bucket",
            "Condition": {
                "StringLikeIfExists": {
                    "s3:prefix": ["my_domain/users/${aws:SourceIdentity}/*"]
                }
            }
        },
        {
            "Sid": "AccessBucketObjects",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [
                "arn:aws:s3:::my_bucket/my_domain/users/${aws:SourceIdentity}/*"
            ]
        }
    ]
}
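
If the same policy is needed for several buckets or domains, it can be generated rather than hand-edited. The following Python sketch builds the identity policy above as a dictionary and serializes it with the standard `json` module, which also guards against malformed JSON. The function name `build_user_prefix_policy` and its parameters are hypothetical.

```python
import json


def build_user_prefix_policy(bucket: str, base_prefix: str) -> dict:
    """Build an identity policy that scopes S3 access to one prefix per source identity."""
    # The doubled braces emit a literal ${aws:SourceIdentity}; IAM resolves it per request.
    user_prefix = f"{base_prefix}/${{aws:SourceIdentity}}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLikeIfExists": {"s3:prefix": [user_prefix]}},
            },
            {
                "Sid": "AccessBucketObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{user_prefix}"],
            },
        ],
    }


policy = build_user_prefix_policy("my_bucket", "my_domain/users")
policy_json = json.dumps(policy, indent=4)
```

The resulting `policy_json` string can then be supplied wherever a policy document is expected.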

You can also implement deny statements in resource policies, such as S3 bucket policies, to make sure the resources can only be accessed by the appropriate users. The following is an example S3 bucket policy that denies get, put, and delete object actions to every caller except the user-a and user-b SageMaker Studio user profiles:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnlessMatchingUser",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {
                "StringNotLike": {
                    "aws:SourceIdentity": ["user-a", "user-b"]
                }
            }
        }
    ]
}

Secrets Manager secret access control

With Secrets Manager, you can securely store and automatically rotate credentials such as database passwords, API tokens, and Git personal access tokens (PATs). With SageMaker AI, you can simply request the secret at runtime, so your notebooks, training jobs, and inference endpoints stay free of hard-coded keys. Secrets stored in AWS Secrets Manager are encrypted by AWS Key Management Service (AWS KMS) keys that you own and control. By granting the SageMaker AI execution role GetSecretValue permission on the specific secret ARN, your code can pull it with one API call, giving you audit trails in CloudTrail for your secrets access. Using the source identity, you can allow access to a user-specific hierarchy level of secrets, such as in the following example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UserSpecificSecretsAccess",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": "arn:aws:secretsmanager:<region>:<account_number>:secret:user-secrets/${aws:SourceIdentity}/*"
        }
    ]
}
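
On the consuming side, retrieving a secret from this hierarchy could look like the following Python sketch. It assumes secrets are named `user-secrets/<user profile name>/<secret name>` to match the policy above, and that the caller passes in a boto3 Secrets Manager client. The helper `get_user_secret` and the secret name `git-pat` are hypothetical.

```python
def get_user_secret(secrets_client, user_profile_name: str, secret_name: str) -> str:
    """Fetch a string secret stored under the caller's own user-secrets/ prefix."""
    # Assumed naming convention, mirroring the resource ARN pattern in the policy.
    secret_id = f"user-secrets/{user_profile_name}/{secret_name}"
    response = secrets_client.get_secret_value(SecretId=secret_id)
    # SecretString is present for text secrets; binary secrets use SecretBinary instead.
    return response["SecretString"]


# Usage inside SageMaker Studio (requires boto3 and the execution role's credentials):
#   import boto3
#   client = boto3.client("secretsmanager")
#   token = get_user_secret(client, "alice", "git-pat")
```

A request for another user's path, for example `user-secrets/bob/git-pat` while the source identity is `alice`, would fall outside the allowed resource pattern and be denied.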

Amazon EMR cluster access control

SageMaker Studio users can create, discover, and manage Amazon EMR clusters directly from SageMaker Studio. The following policy restricts SageMaker Studio users’ access to EMR clusters by requiring that the cluster be tagged with a user key matching the user’s SourceIdentity. When SageMaker Studio users connect to EMR clusters, this makes sure they can only access clusters explicitly assigned to them through the tag, preventing unauthorized access and enabling user-level access control (refer to IAM policies for tag-based access to clusters and EMR notebooks and Connect to an Amazon EMR cluster from SageMaker Studio or Studio Classic).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowClusterCreds",
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:GetClusterSessionCredentials"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "elasticmapreduce:ResourceTag/user": "${aws:SourceIdentity}"
                }
            }
        }
    ]
}

File system access control in SageMaker training jobs

SageMaker training jobs can use file systems such as Amazon Elastic File System (Amazon EFS) or Amazon FSx for Lustre to efficiently handle large-scale data processing and model scaling workloads. This is particularly valuable when dealing with large datasets that require high throughput, when multiple training jobs need concurrent access to the same data, or when you want to maintain a persistent storage location that can be shared across different training jobs. The file system is provided to the SageMaker training job through the FileSystemDataSource parameter.

When using a common file system across users, administrators might want to restrict access to specific folders, so users don’t overwrite each other’s work or data. This can be achieved using a condition called sagemaker:FileSystemDirectoryPath. In this pattern, each user has a directory on the file system that’s the same as their user profile name. You can then use an IAM policy such as the following to make sure the training jobs they run are only able to access their directory:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SageMakerLaunchJobs",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateTrainingJob"
            ],
            "Resource": "arn:aws:sagemaker:<region>:<account_number>:training-job/*",
            "Condition": {
                "ForAllValues:StringEquals": {
                    "sagemaker:FileSystemDirectoryPath": [
                        "/fsx/users/${aws:SourceIdentity}"
                    ]
                }
            }
        }
    ]
}
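
For reference, a training job that satisfies this condition would point its input channel at the user's own directory. The following Python sketch builds one entry for the `InputDataConfig` list of a CreateTrainingJob request. The helper name `user_file_system_channel`, the channel name, the file system ID, and the read-only access mode are illustrative assumptions; the `/fsx/users/` layout follows the policy above.

```python
def user_file_system_channel(
    file_system_id: str,
    user_profile_name: str,
    channel_name: str = "training",
) -> dict:
    """Build an InputDataConfig channel that reads from the user's own directory."""
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": "FSxLustre",  # "EFS" is the other supported type
                "FileSystemAccessMode": "ro",   # read-only here; "rw" allows writes
                # Must equal the path allowed by sagemaker:FileSystemDirectoryPath.
                "DirectoryPath": f"/fsx/users/{user_profile_name}",
            }
        },
    }


# Hypothetical file system ID and user profile name.
channel = user_file_system_channel("fs-0123456789abcdef0", "alice")
```

If a user submitted the same request with another user's directory in `DirectoryPath`, the condition in the policy would not be met and the CreateTrainingJob call would not be allowed by this statement.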

Monitor user access with the source identity

The previous examples demonstrate how the source identity can serve as a global context key to control which AWS resources a SageMaker Studio user can access, such as SageMaker training jobs or S3 prefixes. In addition to access control, the source identity is also valuable for monitoring individual user resource access from SageMaker Studio.

The source identity enables precise tracking of individual user actions in CloudTrail logs by propagating the SageMaker Studio user profile name as the sourceIdentity within the SageMaker Studio execution role session or a chained role. This means that API calls made from SageMaker Studio notebooks, SageMaker training and processing jobs, and SageMaker pipelines include the specific user’s identity in CloudTrail events. For more details on the supported scenarios, refer to Considerations when using sourceIdentity.

As a result, administrators can monitor and audit resource access at the individual user level rather than only by the IAM role, providing clearer visibility and stronger security even when users share the same execution role.
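
As an illustration of this kind of per-user review, the following Python sketch counts API calls per source identity from CloudTrail event records that have already been loaded as dictionaries. The function name `actions_per_user` and the sample events are hypothetical, and how the records are exported from CloudTrail is left out.

```python
from collections import Counter


def actions_per_user(events: list[dict]) -> Counter:
    """Count CloudTrail events per (source identity, event name) pair."""
    counts = Counter()
    for event in events:
        session = event.get("userIdentity", {}).get("sessionContext", {})
        # Events made without a propagated source identity are grouped together.
        user = session.get("sourceIdentity", "<none>")
        counts[(user, event.get("eventName", "unknown"))] += 1
    return counts


# Hypothetical, heavily simplified records that all share one execution role.
sample_events = [
    {"eventName": "CreateTrainingJob",
     "userIdentity": {"sessionContext": {"sourceIdentity": "alice"}}},
    {"eventName": "CreateTrainingJob",
     "userIdentity": {"sessionContext": {"sourceIdentity": "alice"}}},
    {"eventName": "GetObject",
     "userIdentity": {"sessionContext": {"sourceIdentity": "bob"}}},
    {"eventName": "GetObject",
     "userIdentity": {"sessionContext": {}}},
]
counts = actions_per_user(sample_events)
```

Events that land in the `<none>` group are worth a closer look, because they were made without a propagated user identity.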

Monitor access to the AWS Glue Data Catalog with AWS Lake Formation permissions

You can also use source identity auditing capabilities to track which specific SageMaker Studio user accessed the AWS Glue Data Catalog with AWS Lake Formation permissions. When a user accesses an AWS Glue table governed by Lake Formation, the lakeformation:GetDataAccess API call is logged in CloudTrail. This event records not only the IAM role used, but also the sourceIdentity propagated from the SageMaker Studio user profile, enabling precise attribution of data access to the individual user.

By reviewing these CloudTrail logs, administrators can see which SageMaker Studio user (using the sourceIdentity field) accessed which Data Catalog resources, enhancing auditability and compliance. Refer to Apply fine-grained data access controls with Lake Formation and Amazon EMR from SageMaker Studio for additional information.

Accessing an AWS Glue table with Amazon Athena

When a SageMaker Studio user queries an AWS Glue table through Amazon Athena using a library like the AWS SDK for Pandas, such as running a simple SELECT query from a SageMaker Studio notebook or from a SageMaker processing or training job, this access is logged by Lake Formation in CloudTrail as a GetDataAccess event. The event captures key details, including the IAM role used, the propagated sourceIdentity (which corresponds to the SageMaker Studio user profile name), the AWS Glue table and database accessed, the permissions used (for example, SELECT), and metadata like the Athena query ID.

The following is a typical CloudTrail log entry for this event (simplified for readability):

{
    "userIdentity": {
        "type": "AssumedRole",
        "sessionContext": {
            "sessionIssuer": {
                "arn": "arn:aws:iam::012345678901:role/my_role"
            },
            "sourceIdentity": "STUDIO_USER_PROFILE_NAME"
        }
    },
    "eventTime": "2025-04-18T13:16:36Z",
    "eventSource": "lakeformation.amazonaws.com",
    "eventName": "GetDataAccess",
    "requestParameters": {
        "tableArn": "arn:aws:glue:us-east-1:012345678901:table/my_database/my_table",
        "permissions": [
            "SELECT"
        ],
        "auditContext": {
            "additionalAuditContext": "{queryId: XX-XX-XX-XX-XXXXXX}"
        }
    },
    "additionalEventData": {
        "requesterService": "ATHENA"
    }
}

Accessing an AWS Glue table with Amazon EMR

When a SageMaker Studio user queries an AWS Glue table through Amazon EMR (PySpark), such as running a simple SELECT query from a SageMaker Studio notebook connected to an EMR cluster with IAM runtime roles (see Configure IAM runtime roles for EMR cluster access in Studio) or from a SageMaker pipeline with an Amazon EMR step, this access is logged by Lake Formation in CloudTrail as a GetDataAccess event. The event captures key details, including the IAM role used, the propagated sourceIdentity (which corresponds to the SageMaker Studio user profile name), the AWS Glue table and database accessed, and the permissions used (for example, SELECT).

The following is a typical CloudTrail log entry for this event (simplified for readability):

{
    "userIdentity": {
        "type": "AssumedRole",
        "sessionContext": {
            "sessionIssuer": {
                "arn": "arn:aws:iam::012345678901:role/my-role"
            },
            "sourceIdentity": "STUDIO_USER_PROFILE_NAME"
        }
    },
    "eventTime": "2025-04-18T13:16:36Z",
    "eventSource": "lakeformation.amazonaws.com",
    "eventName": "GetDataAccess",
    "requestParameters": {
        "tableArn": "arn:aws:glue:us-east-1:012345678901:table/my_database/my_table",
        "permissions": [
            "SELECT"
        ]
    },
    "additionalEventData": {
        "LakeFormationAuthorizedSessionTag": "LakeFormationAuthorizedCaller:Amazon EMR"
    }
}
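
Records shaped like the two simplified GetDataAccess entries in this section can be reduced to a per-user data access report. The following Python sketch does this for events that are already loaded as dictionaries. The function name `data_access_records` is hypothetical, and real CloudTrail records contain more fields than the simplified entries shown here.

```python
def data_access_records(events: list[dict]) -> list[dict]:
    """Extract who accessed which table from Lake Formation GetDataAccess events."""
    records = []
    for event in events:
        if event.get("eventSource") != "lakeformation.amazonaws.com":
            continue
        if event.get("eventName") != "GetDataAccess":
            continue
        session = event.get("userIdentity", {}).get("sessionContext", {})
        params = event.get("requestParameters", {})
        extra = event.get("additionalEventData", {})
        records.append({
            "user": session.get("sourceIdentity"),
            "table_arn": params.get("tableArn"),
            "permissions": params.get("permissions", []),
            # The Athena sample carries requesterService; the EMR sample carries a session tag.
            "requester": extra.get("requesterService")
            or extra.get("LakeFormationAuthorizedSessionTag"),
        })
    return records


# A record shaped like the simplified Athena entry in this section.
athena_event = {
    "userIdentity": {"sessionContext": {"sourceIdentity": "STUDIO_USER_PROFILE_NAME"}},
    "eventSource": "lakeformation.amazonaws.com",
    "eventName": "GetDataAccess",
    "requestParameters": {
        "tableArn": "arn:aws:glue:us-east-1:012345678901:table/my_database/my_table",
        "permissions": ["SELECT"],
    },
    "additionalEventData": {"requesterService": "ATHENA"},
}
report = data_access_records([athena_event, {"eventName": "GetObject"}])
```

Each entry in `report` ties a table ARN and permission set to a single SageMaker Studio user profile, even though every user shares the same execution role.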

Best practices

To effectively secure and manage access in environments using ABAC, it’s important to follow proven best practices that enhance security, simplify administration, and maintain clear auditability. The following guidelines can help you implement ABAC with a source identity in a scalable and maintainable way:

Refer to SageMaker Studio Administration Best Practices for additional information on identity and permission management.

Conclusion

In this post, we demonstrated how to implement user-level access control in SageMaker Studio without the overhead of managing individual IAM roles. By combining SageMaker AI resource tags, SageMaker AI context keys, and source identity propagation, you can create dynamic IAM policies that automatically scope permissions based on user identity while maintaining shared execution roles. We showed how to apply these patterns across various AWS services, including SageMaker AI, Amazon S3, Secrets Manager, and Amazon EMR. Additionally, we discussed how the source identity enhances monitoring by propagating the SageMaker Studio user profile name into CloudTrail logs, enabling precise tracking of individual user access to resources like SageMaker jobs and Data Catalog tables. This includes access using Athena and Amazon EMR, providing administrators with clear, user-level visibility for stronger security and compliance across shared execution roles. We encourage you to implement these user-level access control techniques today and experience the benefits of simplified administration and compliance tracking.


About the authors

Durga Sury is a Senior Solutions Architect at Amazon SageMaker, where she helps enterprise customers build secure and scalable AI/ML platforms. When she’s not architecting solutions, you can find her enjoying sunny walks with her dog, immersing herself in murder mystery books, or catching up on her favorite Netflix shows.

Itziar Molina Fernandez is a Machine Learning Engineer in the AWS Professional Services team. In her role, she works with customers building large-scale machine learning platforms and generative AI use cases on AWS. In her free time, she enjoys cycling, reading, and exploring new places.

Will Parr is a Machine Learning Engineer at AWS Professional Services, helping customers build scalable ML platforms and production-ready generative AI solutions. With deep expertise in MLOps and cloud-based architecture, he focuses on making machine learning reliable, repeatable, and impactful. Outside of work, he can be found on a tennis court or hiking in the mountains.
