We replaced 14 long-lived IAM users with SSO + temporary credentials. The migration plan, the gotchas, and the policies we now enforce.

Zero Trust on AWS: Lessons From Implementing IAM Identity Center

Six months ago we had 14 long-lived IAM users, three of which had AdministratorAccess. Today we have zero IAM users with console access and zero static access keys for humans. Every human action traces back to an SSO identity whose session expires within 8 hours.

Why We Did This

We had three near-misses:

An access key checked into a public Gist (caught by GitHub secret scanning in 4 minutes — but still).
A laptop stolen with ~/.aws/credentials containing a long-lived key.
An ex-contractor's key was still active 3 months after their last day. Nobody noticed because nothing forced a review: the key was never rotated or audited.

Any of those could have been catastrophic. Static credentials had to go.

The Architecture We Landed On

code

┌─────────────────┐    SAML/OIDC    ┌─────────────────────────┐
│ Google Workspace│ ──────────────► │  AWS IAM Identity Center│
└─────────────────┘                 └────────────┬────────────┘
                                                 │ AssumeRole
                              ┌──────────────────┼──────────────────┐
                              ▼                  ▼                  ▼
                       ┌────────────┐     ┌────────────┐     ┌────────────┐
                       │ prod acct  │     │ stage acct │     │  dev acct  │
                       │  (3 roles) │     │  (4 roles) │     │  (5 roles) │
                       └────────────┘     └────────────┘     └────────────┘

Identity provider: Google Workspace, federated to IAM Identity Center
Permission sets: 7 in total, with a subset assigned to each account (e.g. ReadOnly, Developer, SREOnCall, BillingViewer, ProdBreakGlass)
Session duration: 1h for ProdBreakGlass, 8h for everything else
CLI access: aws sso login instead of static keys

The 4-Week Rollout

Week 1: Inventory and Mapping

bash.bash

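# Users with creation date and last console login
aws iam list-users --query 'Users[*].[UserName,CreateDate,PasswordLastUsed]' --output table
# Access keys for each user in turn
for user in $(aws iam list-users --query 'Users[*].UserName' --output text); do aws iam list-access-keys --user-name "$user"; done
# Full snapshot of users, groups, roles, and policies
aws iam get-account-authorization-details > iam-snapshot.json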

We mapped every IAM user → real human → permission set. Two "users" turned out to be service accounts that had been repurposed for a human because someone needed admin quickly. We split those into separate service and human identities.

Week 2: Build Permission Sets in Terraform

hcl.hcl

resource "aws_ssoadmin_permission_set" "developer" {
  name             = "Developer"
  instance_arn     = local.sso_instance_arn
  session_duration = "PT8H"
  description      = "Read most things, write to dev/stage, no prod write"
}

resource "aws_ssoadmin_managed_policy_attachment" "developer_readonly" {
  instance_arn       = local.sso_instance_arn
  managed_policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn
}

resource "aws_ssoadmin_permission_set_inline_policy" "developer_dev_write" {
  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn
  inline_policy      = data.aws_iam_policy_document.developer_dev_write.json
}

We code-reviewed every permission set. Three reviewers minimum for anything touching prod.
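The Terraform above references local.sso_instance_arn and data.aws_iam_policy_document.developer_dev_write without showing them, and a permission set does nothing until it is assigned to a group in an account. The following is a minimal sketch of those missing pieces, not our exact configuration. The variables, the group name "developers", and the action list are hypothetical; the aws:ResourceAccount condition is one possible way to express "no prod write", assumed here for illustration.

```hcl
# Assumption: a single Identity Center instance is visible in this region.
data "aws_ssoadmin_instances" "this" {}

locals {
  sso_instance_arn  = tolist(data.aws_ssoadmin_instances.this.arns)[0]
  identity_store_id = tolist(data.aws_ssoadmin_instances.this.identity_store_ids)[0]
}

variable "non_prod_account_ids" {
  description = "Hypothetical: account IDs of the dev and stage accounts"
  type        = list(string)
}

variable "developer_account_ids" {
  description = "Hypothetical: every account where Developer is assigned"
  type        = list(string)
}

# Illustrative write policy. The actions are placeholders; the condition
# limits writes to resources owned by the non-prod accounts.
data "aws_iam_policy_document" "developer_dev_write" {
  statement {
    sid       = "WriteInNonProdOnly"
    effect    = "Allow"
    actions   = ["lambda:UpdateFunctionCode", "ecs:UpdateService", "s3:PutObject"]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "aws:ResourceAccount"
      values   = var.non_prod_account_ids
    }
  }
}

# Hypothetical group name, assumed to be synced from Google Workspace.
data "aws_identitystore_group" "developers" {
  identity_store_id = local.identity_store_id

  alternate_identifier {
    unique_attribute {
      attribute_path  = "DisplayName"
      attribute_value = "developers"
    }
  }
}

# One assignment per account: group + permission set + target account.
resource "aws_ssoadmin_account_assignment" "developer" {
  for_each = toset(var.developer_account_ids)

  instance_arn       = local.sso_instance_arn
  permission_set_arn = aws_ssoadmin_permission_set.developer.arn
  principal_id       = data.aws_identitystore_group.developers.group_id
  principal_type     = "GROUP"
  target_id          = each.value
  target_type        = "AWS_ACCOUNT"
}
```

Assigning to groups rather than individual users keeps offboarding in the identity provider: removing someone from the group removes their access in every account.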

Week 3: Parallel Run

For one week, both old IAM users and new SSO access were live. Engineers used SSO for daily work; old creds were the fallback. We measured CloudTrail events per identity to see who was still using the old path.
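One way to measure who is still on the old path is a saved CloudWatch Logs Insights query over the CloudTrail log group. This is a sketch under the assumption that CloudTrail is already delivered to CloudWatch Logs; the variable and query name are hypothetical.

```hcl
variable "cloudtrail_log_group_name" {
  description = "Hypothetical: name of the log group that receives CloudTrail events"
  type        = string
}

# Counts CloudTrail events made by IAM users (the legacy path), per user.
# SSO sessions appear with a different userIdentity.type, so they are excluded.
resource "aws_cloudwatch_query_definition" "legacy_iam_user_activity" {
  name            = "migration/legacy-iam-user-activity"
  log_group_names = [var.cloudtrail_log_group_name]

  query_string = <<-EOT
    filter userIdentity.type = "IAMUser"
    | stats count(*) as events by userIdentity.userName
    | sort events desc
  EOT
}
```

A user whose count stays at zero for the whole week is safe to cut over; anyone still showing events needs a conversation before Week 4.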

Week 4: Disable and Delete

Disable console password on all IAM users (one click in console).
Deactivate all access keys.
Wait one week. No support tickets → delete the IAM users.

Gotchas We Hit

1. Service Accounts Need Their Own Path

Service accounts don't belong in IAM Identity Center. We were tempted to put them there anyway; don't. Use IAM Roles for Service Accounts (IRSA) on EKS, EC2 instance profiles, or GitHub OIDC for CI. Static keys for services are a step backward.

yaml.yaml

# GitHub Actions assuming an AWS role via OIDC: no static creds.
# The job also needs `permissions: id-token: write` to request the OIDC token.
- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::123456789012:role/github-deploy
    aws-region: us-east-1
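The workflow step only works if the github-deploy role trusts GitHub's OIDC provider. A minimal sketch of that trust policy follows. It assumes the IAM OIDC provider for token.actions.githubusercontent.com already exists in the account; the variable and the repository example-org/example-repo are hypothetical placeholders.

```hcl
variable "github_oidc_provider_arn" {
  description = "Hypothetical: ARN of the IAM OIDC provider for token.actions.githubusercontent.com"
  type        = string
}

data "aws_iam_policy_document" "github_deploy_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.github_oidc_provider_arn]
    }

    # The token must have been issued for AWS STS.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflows on the main branch of one (placeholder) repository may assume the role.
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/example-repo:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "github_deploy" {
  name                 = "github-deploy"
  assume_role_policy   = data.aws_iam_policy_document.github_deploy_trust.json
  max_session_duration = 3600 # seconds; keeps CI credentials short-lived
}
```

The sub condition is the important part: without it, any repository on GitHub that knows the role ARN could request credentials.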

2. CLI Tools That Don't Speak SSO

Some older CLIs (and some Terraform providers in 2023) didn't refresh SSO sessions cleanly. The fix was to use aws-vault or granted as a wrapper that handles refresh:

bash.bash

$ granted sso login --sso-region us-east-1
$ assume Developer.dev
[Developer.dev] $ terraform plan

3. Break-Glass Access for Real Emergencies

You will eventually need to do something the normal SSO roles don't allow. We have a ProdBreakGlass permission set that:

Has AdministratorAccess
Has a session duration of 1 hour
Requires approval from a second human via PagerDuty before it can be assumed
Posts every assume event to the #sec-emergency Slack channel
Triggers a follow-up review ticket the next business day

We've used it twice in 6 months. Both times the post-incident review found the SSO permission sets were missing a legitimate permission, and we added it.
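The first two properties of the break-glass permission set map directly onto Terraform, and the "every assume event is announced" property can be approximated with a metric filter on the CloudTrail log group. This is a sketch, not our exact setup. It reuses local.sso_instance_arn from the Week 2 code and assumes that sessions on Identity Center roles show up in CloudTrail as AssumeRoleWithSAML calls on a role named AWSReservedSSO_ProdBreakGlass_<suffix>; verify that against your own trail. Both variables are hypothetical, and the PagerDuty approval and Slack delivery are left out.

```hcl
variable "prod_trail_log_group_name" {
  description = "Hypothetical: CloudTrail log group in the prod account"
  type        = string
}

variable "sec_emergency_sns_topic_arn" {
  description = "Hypothetical: SNS topic that forwards to the #sec-emergency channel"
  type        = string
}

resource "aws_ssoadmin_permission_set" "prod_break_glass" {
  name             = "ProdBreakGlass"
  instance_arn     = local.sso_instance_arn
  session_duration = "PT1H"
  description      = "Emergency admin access; every use is reviewed"
}

resource "aws_ssoadmin_managed_policy_attachment" "prod_break_glass_admin" {
  instance_arn       = local.sso_instance_arn
  managed_policy_arn = "arn:aws:iam::aws:policy/AdministratorAccess"
  permission_set_arn = aws_ssoadmin_permission_set.prod_break_glass.arn
}

# Emits a data point each time the break-glass role is assumed (assumed event shape).
resource "aws_cloudwatch_log_metric_filter" "break_glass_assumed" {
  name           = "prod-break-glass-assumed"
  log_group_name = var.prod_trail_log_group_name
  pattern        = "{ ($.eventName = \"AssumeRoleWithSAML\") && ($.requestParameters.roleArn = \"*AWSReservedSSO_ProdBreakGlass_*\") }"

  metric_transformation {
    name      = "ProdBreakGlassAssumed"
    namespace = "Security"
    value     = "1"
  }
}

# Fires on the first assume event in any one-minute window.
resource "aws_cloudwatch_metric_alarm" "break_glass_assumed" {
  alarm_name          = "prod-break-glass-assumed"
  namespace           = "Security"
  metric_name         = "ProdBreakGlassAssumed"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 1
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [var.sec_emergency_sns_topic_arn]
}
```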

4. CloudTrail Cost

Per-account CloudTrail with management and data events, delivered to S3 and CloudWatch, added ~$180/month across our org. Worth it for the audit trail. To keep the cost in check, we log data events selectively (S3 and Lambda only).
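Restricting data events to S3 and Lambda is a matter of listing only those resource types in the trail's event selector. The sketch below assumes the bucket, log group, and delivery role already exist; all three variables and the trail name are hypothetical.

```hcl
variable "trail_bucket_name" {
  description = "Hypothetical: S3 bucket that receives CloudTrail logs"
  type        = string
}

variable "cloudtrail_log_group_arn" {
  description = "Hypothetical: ARN of the CloudWatch log group for CloudTrail"
  type        = string
}

variable "cloudtrail_logs_role_arn" {
  description = "Hypothetical: role CloudTrail assumes to write to CloudWatch Logs"
  type        = string
}

resource "aws_cloudtrail" "account" {
  name                          = "account-trail"
  s3_bucket_name                = var.trail_bucket_name
  is_multi_region_trail         = true
  include_global_service_events = true
  enable_log_file_validation    = true

  # CloudTrail expects the log group ARN with a trailing ":*".
  cloud_watch_logs_group_arn = "${var.cloudtrail_log_group_arn}:*"
  cloud_watch_logs_role_arn  = var.cloudtrail_logs_role_arn

  event_selector {
    read_write_type           = "All"
    include_management_events = true

    # Data events for all S3 objects in the account.
    data_resource {
      type   = "AWS::S3::Object"
      values = ["arn:aws:s3"]
    }

    # Data events for all Lambda function invocations.
    data_resource {
      type   = "AWS::Lambda::Function"
      values = ["arn:aws:lambda"]
    }
  }
}
```

Narrowing the S3 values to specific bucket ARN prefixes is the next lever if data-event volume grows.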

The Policies We Enforce Now

hcl.hcl

# SCP on the org root
data "aws_iam_policy_document" "no_iam_users" {
  statement {
    sid    = "DenyIAMUserCreation"
    effect = "Deny"
    actions = [
      "iam:CreateUser",
      "iam:CreateLoginProfile",
      "iam:CreateAccessKey",
    ]
    resources = ["*"]
    condition {
      test     = "StringNotEquals"
      variable = "aws:PrincipalTag/AllowIAMUsers"
      values   = ["true"]
    }
  }
}

This SCP denies creating IAM users, console login profiles, and access keys unless the calling principal carries the tag AllowIAMUsers=true. Exactly one role holds that tag, and it is used only for emergency provisioning.
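A policy document on its own enforces nothing; it has to be registered as an SCP and attached to a target. A minimal sketch of that wiring, attaching to the organization root as the comment in the policy suggests (resource names are our own choice here):

```hcl
data "aws_organizations_organization" "this" {}

# Register the policy document as a service control policy.
resource "aws_organizations_policy" "no_iam_users" {
  name    = "deny-iam-user-creation"
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.no_iam_users.json
}

# Attach it to the org root so every member account inherits it.
resource "aws_organizations_policy_attachment" "no_iam_users_root" {
  policy_id = aws_organizations_policy.no_iam_users.id
  target_id = data.aws_organizations_organization.this.roots[0].id
}
```

Note that SCPs do not restrict principals in the organization's management account, so that account still needs its own controls.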

What we would tell anyone starting the same migration:

Map every long-lived credential before deleting anything. One forgotten cron job is enough to ruin a Friday.
Use SCPs to enforce, not just remind. An org-level deny is the only guarantee.
Set short session durations. 1h for break-glass, 8h for everything else. It builds the muscle memory of aws sso login.
Make break-glass painful but possible. Easy break-glass becomes the new normal.
Audit weekly for the first month. aws iam list-users, aws iam list-access-keys, CloudTrail review. Catch regressions before they accumulate.

Numbers After 6 Months

IAM users with console access: 0 (was 14)
Static access keys for humans: 0 (was 28)
Average session duration: 3.4h (well under the 8h cap; people log in fresh each day)
Break-glass uses: 2
Time to revoke access for a departing employee: < 1 minute (suspend in Google Workspace)

The migration was 4 weeks of focused work. The reduction in mental overhead since has been worth every hour.

About Kiril Urbonas
