How to implement role-based access control for AWS (Cognito, DynamoDB)

Tue Jun 22 2021

tags: public howto documentation programming

Introduction

We have projects. Each project can have one or more S3 buckets. Each project can have one or more users. Each user can belong to one or more projects. (There is a one-to-many mapping between projects and buckets, and a many-to-many mapping between users and projects.)

We want to make sure that a user can log in and access only the buckets that belong to projects he is a member of. Because AWS is a huge and complicated system, there were many ways to do this, all of which seemed plausible. I worked with Chris to explore different approaches, eventually settling on the current one. In this design document I list the five approaches, explore their advantages and disadvantages, and justify the implementation we chose.

Approaches

Five different approaches were tried, and four were rejected. I eventually settled on the AssumeRole method, which I believe offers the right tradeoff.

  1. Using IAM User/Role/Bucket policies [Rejected]
  2. Using Cognito User Groups [Rejected]
  3. Using Attribute-based access control [Rejected]
  4. Using AssumeRoleWithWebIdentity [Rejected]
  5. Using AssumeRole [Accepted]

Using IAM User/IAM Role/Bucket policies [Rejected]

When a user is onboarded, we create an IAM User and a Cognito User in the Cognito User Pool.

When the person logs into the Cognito User Pool, the Cognito Identity Pool exchanges the resulting JWT for his IAM Role, which has the same policies as the Bedrock/Talend IAM user.

On each bucket we create a bucket policy that allows only a particular IAM Role/IAM User to access it.

The user flow is as follows:

  1. Use Session Token to assume IAM Role.
  2. Use IAM Role to access S3 Bucket.

The dealbreaker of this approach is that every user would potentially need a unique IAM User/Role, and we could not attach all of them to a bucket. This does not scale: a bucket policy has a maximum size (20 KB), so as more users are added, the bucket policy eventually cannot accommodate any more of them. We needed a way to do group-based authorization, which is what I turned to next.
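To make the scaling problem concrete, here is a minimal TypeScript sketch (an illustration, not code from this project) that builds a bucket policy listing one IAM role ARN per user and searches for the largest user count whose compact JSON still fits under an assumed 20 KB (20,480-character) limit. The account ID, role naming scheme and bucket name are hypothetical.

```typescript
// Minimal sketch: one bucket policy that lists every user's IAM role as a principal.
// The account ID and role naming scheme below are hypothetical.
interface BucketPolicy {
  Version: string;
  Statement: Array<{
    Effect: string;
    Principal: { AWS: string[] };
    Action: string[];
    Resource: string[];
  }>;
}

// Assumption: S3 bucket policies are capped at 20 KB.
const BUCKET_POLICY_LIMIT = 20 * 1024;

function userRoleArn(index: number): string {
  return `arn:aws:iam::111122223333:role/project-user-${index}`;
}

function perUserBucketPolicy(bucketName: string, userCount: number): BucketPolicy {
  const principals: string[] = [];
  for (let i = 0; i < userCount; i++) {
    principals.push(userRoleArn(i));
  }
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { AWS: principals },
        Action: ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        Resource: [`arn:aws:s3:::${bucketName}`, `arn:aws:s3:::${bucketName}/*`],
      },
    ],
  };
}

// Size of the compact JSON (no whitespace); for ASCII text, characters equal bytes.
function policySize(policy: BucketPolicy): number {
  return JSON.stringify(policy).length;
}

// Largest number of per-user principals that still fits under the limit.
function maxUsersPerBucket(bucketName: string): number {
  let users = 0;
  while (policySize(perUserBucketPolicy(bucketName, users + 1)) <= BUCKET_POLICY_LIMIT) {
    users++;
  }
  return users;
}

console.log(maxUsersPerBucket("project-a-data"));
```

Every new user adds another ARN to the Principal list, so the policy grows linearly until it hits the cap, no matter how the users are spread across projects.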

Using Cognito User Groups [Rejected]

The disadvantage of the previous method was that there was no grouping.

So now we institute a one-to-one mapping between projects and Cognito user groups. Each project becomes a Cognito user group, and that user group is allowed access to all buckets in that project.

The onboarding flow is as follows:

  1. Create bucket. Get bucket ARN.
  2. [Requires bucket ARN] Create IAM role that allows access to bucket. Get IAM role.
  3. [Requires IAM Role ARN] Create Cognito User Group, passing in IAM Role ARN.
  4. [Requires Cognito User Group ARN] Set bucket policy on S3 bucket allowing user group access.
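The onboarding steps form a strict dependency chain. The TypeScript below is a hypothetical outline of that ordering: OnboardingClients is an assumed interface whose methods stand in for the S3, IAM and Cognito calls, so the sequencing can be shown (and run against fakes) without the AWS SDK.

```typescript
// Hypothetical interface: each method stands in for one AWS call
// (S3 bucket creation, IAM role creation, Cognito group creation, S3 bucket policy).
interface OnboardingClients {
  createBucket(bucketName: string): Promise<string>; // resolves to the bucket ARN
  createRole(roleName: string, bucketArn: string): Promise<string>; // resolves to the role ARN
  createUserGroup(groupName: string, roleArn: string): Promise<string>; // resolves to a group identifier
  putBucketPolicy(bucketName: string, groupId: string): Promise<void>;
}

// Each step awaits the previous one because it needs that step's output.
async function onboardProject(
  clients: OnboardingClients,
  projectName: string,
  bucketName: string
): Promise<string> {
  const bucketArn = await clients.createBucket(bucketName); // 1. bucket -> ARN
  const roleArn = await clients.createRole(`${projectName}-role`, bucketArn); // 2. role for that bucket
  const groupId = await clients.createUserGroup(projectName, roleArn); // 3. group carrying the role
  await clients.putBucketPolicy(bucketName, groupId); // 4. bucket policy for the group
  return groupId;
}
```

Because each step consumes the previous step's output, the calls cannot run in parallel, and a failure midway leaves the already-created resources to clean up.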

Putting the permissions on the user group is done as follows:

{
  // ...
  Effect: "Allow",
  Resource: [
    `arn:aws:s3:::${bucketName}`,
    `arn:aws:s3:::${bucketName}/*`,
  ]
}
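The snippet above elides the action list. As a hedged sketch, the TypeScript below builds a complete permissions policy for one project's group role from that project's bucket names. The action list is borrowed from the base role shown later in this document and is an assumption for this approach.

```typescript
interface PolicyStatement {
  Effect: string;
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: string;
  Statement: PolicyStatement[];
}

// Assumption: the same S3 actions as the base role used in the AssumeRole approach.
const PROJECT_S3_ACTIONS: string[] = [
  "s3:GetObject",
  "s3:PutObject",
  "s3:PutObjectAcl",
  "s3:PutObjectVersionAcl",
  "s3:ListBucket",
];

// Build the permissions policy for one project's group role.
function projectGroupRolePolicy(bucketNames: string[]): PolicyDocument {
  const resources: string[] = [];
  bucketNames.forEach((bucketName) => {
    // Bucket ARN (needed for s3:ListBucket) and object ARN (needed for object Get/Put).
    resources.push(`arn:aws:s3:::${bucketName}`, `arn:aws:s3:::${bucketName}/*`);
  });
  return {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: PROJECT_S3_ACTIONS, Resource: resources }],
  };
}

console.log(JSON.stringify(projectGroupRolePolicy(["project-a-raw", "project-a-processed"]), null, 2));
```

Both ARN forms are needed per bucket: s3:ListBucket applies to the bucket itself, while the Get/Put actions apply to the objects under `/*`.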

Users would obtain temporary credentials for their group's IAM Role via getCredentialsForIdentity.

The main disadvantage of this approach is that many frontend requests would be required when a user belongs to several groups. When a user logs in, he has to assume a role, but each role maps one-to-one to a project. So what happens if a user is part of multiple projects? Then he has to get the first user group's role, list the buckets in that project, then switch to the second user group's role, list the buckets in that project, and so on. Chris and I felt that this was untenable.
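The role-switching cost can be sketched as follows. This is a hypothetical TypeScript outline, assuming the frontend already has the user's role ARNs (for example from the cognito:roles claim of the ID token) and that getCredentials wraps Cognito's GetCredentialsForIdentity call with its CustomRoleArn parameter set. Both getCredentials and listBuckets are injected stand-ins, not SDK functions.

```typescript
interface TemporaryCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken: string;
}

// Stand-in for GetCredentialsForIdentity called with CustomRoleArn = roleArn.
type GetCredentialsForRole = (roleArn: string) => Promise<TemporaryCredentials>;
// Stand-in for listing one project's buckets with a given set of credentials.
type ListProjectBuckets = (credentials: TemporaryCredentials) => Promise<string[]>;

interface MultiProjectListing {
  buckets: string[];
  credentialRequests: number;
}

// One credentials request (a "role switch") per project, then one listing per project.
async function listBucketsAcrossProjects(
  roleArns: string[],
  getCredentials: GetCredentialsForRole,
  listBuckets: ListProjectBuckets
): Promise<MultiProjectListing> {
  const buckets: string[] = [];
  let credentialRequests = 0;
  for (const roleArn of roleArns) {
    const credentials = await getCredentials(roleArn); // switch to this project's role
    credentialRequests++;
    const projectBuckets = await listBuckets(credentials);
    projectBuckets.forEach((bucket) => buckets.push(bucket));
  }
  return { buckets, credentialRequests };
}
```

For a user in n projects this means n credential requests plus n listing requests just to show all of his buckets.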

You could try to have a "super user group" that encompassed the permissions of multiple IAM roles. But this could conceivably lead to an exponential number of user groups. For example, if you have 3 projects (A, B, C), then you would need 7 user groups to cover all eventualities: A, B, C, AB, AC, BC, ABC. And if you have 4 projects then you need 15 user groups, and so on... We decided that this approach was not feasible.
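The growth is 2^n - 1 because every non-empty subset of the n projects is a possible membership. The short TypeScript function below (an illustration only) enumerates those subsets with a bitmask and reproduces the 7 and 15 above; the bit-shift approach assumes fewer than 31 projects.

```typescript
// Enumerate every non-empty combination of projects. Each combination would need
// its own "super user group", so n projects need 2^n - 1 groups.
function requiredUserGroups(projects: string[]): string[] {
  const groups: string[] = [];
  const subsetCount = 1 << projects.length; // 2^n subsets, including the empty one
  // Start at mask 1 to skip the empty set; bit i set means project i is in the group.
  for (let mask = 1; mask < subsetCount; mask++) {
    let groupName = "";
    for (let i = 0; i < projects.length; i++) {
      if (mask & (1 << i)) {
        groupName += projects[i];
      }
    }
    groups.push(groupName);
  }
  return groups;
}

console.log(requiredUserGroups(["A", "B", "C"]).join(", ")); // A, B, AB, C, AC, BC, ABC
console.log(requiredUserGroups(["A", "B", "C", "D"]).length); // 15
```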

Attribute-based access control [Rejected]

The next idea was to put attributes on Cognito users, where each attribute is a project. For example, if the user has three different projects, then he would have {project1: true, project2: true, project16: true}. These project names would have to be globally unique. Then you would create a global policy that grants permissions to the project's resources as long as the user has that project name in his tags.

This approach was recommended by the AWS ABAC tutorial:

You can map the membership attribute to a tag key for principal to be passed on to the IAM permissions policy. This way you can create a single permissions policy and conditionally allow access to premium content based on the value of membership level and tag on the content files.
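As a hedged sketch of what the single global policy might look like, the TypeScript below emits one statement per project, each conditioned on the aws:PrincipalTag/<project> condition key, which is how ABAC policies refer to tags passed in from the identity provider. It assumes each project attribute is mapped to a principal tag of the same name with the string value "true"; the bucket names and the action list are hypothetical.

```typescript
interface AbacStatement {
  Effect: string;
  Action: string[];
  Resource: string[];
  Condition: { StringEquals: { [conditionKey: string]: string } };
}

// Build the single, global ABAC policy: one statement per project in the system.
function abacGlobalPolicy(projectBuckets: { [project: string]: string[] }): {
  Version: string;
  Statement: AbacStatement[];
} {
  const statements: AbacStatement[] = Object.keys(projectBuckets).map((project) => {
    const resources: string[] = [];
    projectBuckets[project].forEach((bucket) => {
      resources.push(`arn:aws:s3:::${bucket}`, `arn:aws:s3:::${bucket}/*`);
    });
    // Only callers whose principal tag for this project is "true" match this statement.
    const condition: { [conditionKey: string]: string } = {};
    condition[`aws:PrincipalTag/${project}`] = "true";
    return {
      Effect: "Allow",
      Action: ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      Resource: resources,
      Condition: { StringEquals: condition },
    };
  });
  return { Version: "2012-10-17", Statement: statements };
}

const examplePolicy = abacGlobalPolicy({
  project1: ["project1-data"],
  project2: ["project2-data"],
  project16: ["project16-data", "project16-logs"],
});
console.log(JSON.stringify(examplePolicy).length);
```

Every new project appends another statement, so the policy's length grows with the total number of projects in the system rather than with any one user's memberships.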

This approach had several advantages. The first advantage is that you only need one front-end request, and you don't need an exponential number of user groups. It would allow fine-grained user permissions: for example, you could have {project1dev: true, project2dev: true, project16admin: true}, to make the user an admin of project 16.

The key disadvantage was that for this method to work, we would need one long policy document that grants permissions to each project's resources as long as the user has that project name in his tags. Every time a new project was created, we would have to append to this policy document. But there is a maximum character limit on a single permissions policy, which would cap the total number of projects in our system.

While there's a way to get around this with a slightly different ABAC mechanism, there is no free lunch. We faced a dilemma: either we limit the total number of projects per user, or we limit the total number of projects in our system. Eventually we decided to try something else.

AssumeRoleWithWebIdentity [Rejected]

The idea here is to dynamically generate access control based on the web identity provided. We look up the user in the DynamoDB table, check which projects the user belongs to, and build a custom IAM Role that gives the user access to only those buckets.

User flow:

  1. User signs in with Cognito User Pool to get a JWT.
  2. User calls getId to get an identity ID from the identity pool.
  3. User calls AssumeRoleWithWebIdentity using that identity's OpenID token.
  4. A Lambda function looks at the ID, looks up the DynamoDB table to check which projects the user is in, and builds a custom IAM role which is the intersection of the base IAM role (which allows access to all S3 buckets) and the dynamically generated InlinePolicy.

Advantages:

The key advantage is that this allows us to dynamically generate a custom IAM role depending on the user's projects, which solves a lot of the problems of the previous approaches:

  1. It would require fewer requests to access buckets from different projects, making frontend development much simpler. In the Cognito User Groups approach, a user has multiple roles, and the frontend needs to make multiple getCredentialsForIdentity calls to switch between these roles in order to access buckets from different projects. This approach would not require switching roles with getCredentialsForIdentity, as there is only one role for a given identity.
  2. It would potentially allow more fine-grained control of project resources. For example, a user group might have "superusers" and "regular users", and this could be reflected in the DynamoDB table. In fact, we could store any condition or property in the DynamoDB table and check for it.

The dealbreaker disadvantage is that this approach is not secure, because AssumeRoleWithWebIdentity is a call that does not require AWS credentials to make. (This is not quite the same as an unauthenticated endpoint: you still need a valid Cognito OpenID token.) Why is this a problem? Because a user could technically write their own custom frontend and call AssumeRoleWithWebIdentity themselves. They could then pass in no inline policy at all, which would give them the full permissions of the base IAM role, i.e. access to all S3 buckets. While this is a bit of an edge case, we nevertheless considered it a security flaw and abandoned this idea.
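The flaw comes from how session policies work: the caller decides whether to pass one, and without it the session receives the role's full permissions. The toy TypeScript model below is a deliberate simplification that treats a policy as a list of allowed bucket names (it is not the real IAM evaluation logic, and the bucket names are hypothetical); it contrasts the intended call with a hand-written one.

```typescript
// Toy model: a "policy" is just the list of bucket names it allows.
type AllowedBuckets = string[];

// Effective permissions of a role session: the role's permissions intersected with
// the session policy, or the role's full permissions when no session policy is passed.
function effectiveBuckets(rolePolicy: AllowedBuckets, sessionPolicy?: AllowedBuckets): AllowedBuckets {
  if (sessionPolicy === undefined) {
    return rolePolicy.slice();
  }
  const allowedBySession = sessionPolicy;
  return rolePolicy.filter((bucket) => allowedBySession.indexOf(bucket) !== -1);
}

// Base role: every bucket in the account.
const allBuckets = ["project1-data", "project2-data", "project16-data"];

// Intended flow: the inline policy built from DynamoDB is passed along.
console.log(effectiveBuckets(allBuckets, ["project1-data"]).join(",")); // project1-data
// A custom frontend can simply leave it out.
console.log(effectiveBuckets(allBuckets).length); // 3
```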

AssumeRole [Accepted]

Like before, the idea here is to dynamically generate the IAM role based on the user.

User flow:

  1. User signs in with Cognito User Pool to get a JWT Bearer Token.
  2. User calls the auths/resources endpoint using a GET request, passing in the JWT Bearer Token.
  3. The auths/resources endpoint calls a Lambda function, which reads the user ID contained in the JWT Bearer Token and looks up the DynamoDB table to check which projects the user is in.
  4. The Lambda function generates a custom InlinePolicy from the information it obtains in the DynamoDB Table.
  5. The Lambda function calls sts.assumeRole to generate temporary credentials for a custom IAM role session whose permissions are the intersection of the base IAM role (which allows access to all S3 buckets) and the dynamically generated InlinePolicy.
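Steps 3 to 5 can be sketched in TypeScript as follows. This is a hypothetical outline, not our production code: ProjectRecord, lookupProjects and the injected assumeRole function are assumed names, so the logic can run without AWS. In a deployed Lambda, assumeRole could wrap the AWS SDK v2 call used later in this document, `(params) => new AWS.STS().assumeRole(params).promise()`, and the user ID might come from the `sub` claim that an API Gateway Cognito authorizer places in `event.requestContext.authorizer.claims`.

```typescript
// Hypothetical shape of one DynamoDB item describing a project the user belongs to.
interface ProjectRecord {
  projectId: string;
  bucketNames: string[];
}

interface AssumeRoleParams {
  RoleArn: string;
  RoleSessionName: string;
  Policy: string; // STS expects the session policy as a JSON string
  DurationSeconds: number;
}

type LookupProjects = (userId: string) => Promise<ProjectRecord[]>;
type AssumeRole<T> = (params: AssumeRoleParams) => Promise<T>;

// Same S3 actions as the base role; the session policy can only narrow them.
const S3_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl", "s3:PutObjectVersionAcl", "s3:ListBucket"];

// Step 4: turn the user's projects into an inline session policy.
function buildInlinePolicy(projects: ProjectRecord[]): string {
  const resources: string[] = [];
  projects.forEach((project) => {
    project.bucketNames.forEach((bucket) => {
      resources.push(`arn:aws:s3:::${bucket}`, `arn:aws:s3:::${bucket}/*`);
    });
  });
  return JSON.stringify({
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: S3_ACTIONS, Resource: resources }],
  });
}

// Steps 3-5: look up projects, build the policy, assume the base role with it.
async function getScopedCredentials<T>(
  userId: string,
  baseRoleArn: string,
  lookupProjects: LookupProjects,
  assumeRole: AssumeRole<T>
): Promise<T> {
  const projects = await lookupProjects(userId);
  if (projects.every((project) => project.bucketNames.length === 0)) {
    // Nothing to grant: refuse early rather than send an empty Resource list.
    throw new Error(`User ${userId} has no project buckets`);
  }
  return assumeRole({
    RoleArn: baseRoleArn,
    RoleSessionName: userId, // e.g. the Cognito sub; STS requires 2-64 characters
    Policy: buildInlinePolicy(projects),
    DurationSeconds: 3600,
  });
}
```

Injecting the DynamoDB lookup and the STS call keeps the policy-building logic testable with fakes, while the production wiring stays a one-line adapter around each client.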

The auths/resources endpoint is private. Anyone who tries to access it without a valid JWT Bearer Token receives a 401 Unauthorized error.

This approach is very similar to the previous AssumeRoleWithWebIdentity approach. The key difference is that the user himself cannot call assumeRole: I have granted permission to call assumeRole only to the Lambda function. So the user cannot build his own frontend that skips the inline policy, as he could with AssumeRoleWithWebIdentity.

Advantages:

  • Secure
  • No exponential number of user groups
  • No need to switch roles
  • Fine-grained access control

Disadvantages:

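  • There is a maximum character limit on a single permissions policy (10,240 characters). This means that we can have at most ~50 concurrent projects per user. We decided that this was an acceptable limit. (AWS also caps the session policy passed to sts.assumeRole at 2,048 characters, so the practical per-user limit may be lower than this estimate.)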
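The per-user ceiling depends on how long each project's resource ARNs are. Here is a rough TypeScript estimator, a sketch under stated assumptions: every project has the same number of buckets, every bucket name has the same length, and the policy is serialised as compact JSON with the same single-statement shape as the base role. It finds how many projects fit under a given character limit.

```typescript
const ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl", "s3:PutObjectVersionAcl", "s3:ListBucket"];

// Length of a single-statement inline policy; bucket names are placeholders of a fixed length.
function inlinePolicyLength(projectCount: number, bucketsPerProject: number, bucketNameLength: number): number {
  const resources: string[] = [];
  const bucketName = "x".repeat(bucketNameLength);
  for (let p = 0; p < projectCount; p++) {
    for (let b = 0; b < bucketsPerProject; b++) {
      resources.push(`arn:aws:s3:::${bucketName}`, `arn:aws:s3:::${bucketName}/*`);
    }
  }
  const policy = {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: ACTIONS, Resource: resources }],
  };
  return JSON.stringify(policy).length; // compact JSON, no whitespace
}

// Largest number of projects whose inline policy stays within limitChars.
function maxProjectsWithinLimit(limitChars: number, bucketsPerProject: number, bucketNameLength: number): number {
  if (bucketsPerProject < 1) {
    throw new Error("bucketsPerProject must be at least 1"); // otherwise the loop never ends
  }
  let projects = 0;
  while (inlinePolicyLength(projects + 1, bucketsPerProject, bucketNameLength) <= limitChars) {
    projects++;
  }
  return projects;
}

console.log(maxProjectsWithinLimit(10240, 2, 20)); // limit quoted in this document
console.log(maxProjectsWithinLimit(2048, 2, 20)); // AssumeRole's documented Policy length limit
```

IAM's own character counting may differ slightly from the JSON.stringify length, so treat the result as an estimate rather than an exact ceiling.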

AssumeRole's Base IAM Role

The Base IAM role allows access to all S3 buckets, and looks like this:

AllowAccessToAllS3BucketsRole:
  Type: "AWS::IAM::Role"
  Properties:
    RoleName: ${self:custom.IAMRolePrefix}-allow-access-to-all-s3-buckets-role
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: 'Allow'
          Principal:
            AWS:
              Fn::GetAtt:
                - AllowAssumeRoleRole
                - Arn
          Action:
            - sts:AssumeRole
    PermissionsBoundary: 'arn:aws:iam::052567997892:policy/GCCIAccountBoundary'
    Policies:
      - PolicyName: 'allow-access-to-all-s3-buckets'
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            Action:
              - s3:GetObject
              - s3:PutObject
              - s3:PutObjectAcl
              - s3:PutObjectVersionAcl
              - s3:ListBucket
            Effect: Allow
            Resource:
              - arn:aws:s3:::*
              - arn:aws:s3:::*/*
            Condition:
              Bool:
                aws:SecureTransport: true

For security purposes, only the AllowAssumeRoleRole is allowed to assume this role, and the base role should never be passed to the user. Instead, we call sts.assumeRole on it with an inline policy, which yields a new, scoped-down role session like this:

import * as AWS from 'aws-sdk'

export async function assumeRole() {
  const baseRoleArn = 'arn:aws:iam::052567997892:role/iamrole-imda-psf-lieudev-allow-access-to-all-s3-buckets-role'
  const inlinePolicy = ... // create the inline policy here after a DynamoDB query
  const assumeRoleParams = {
    RoleArn: baseRoleArn, // Role the caller is assuming (base role). In this case, AllowAccessToAllS3BucketsRole
    RoleSessionName: 'TestUser', // pass the name associated with the user
    Policy: inlinePolicy, // must be a JSON string
    DurationSeconds: 3600,
  }

  try {
    const sts = new AWS.STS()
    const tokenResult = await sts.assumeRole(assumeRoleParams).promise()
    ...

We pass inlinePolicy into assumeRoleParams, so the credentials returned by sts.assumeRole belong to a session whose permissions are the intersection of inlinePolicy and the base role's policy. Since the base role allows access to all S3 buckets, the session can access only the resources allowed by inlinePolicy.