One Terraform state file per environment sounds obvious until you watch a dev plan touch a prod resource. Here's how we actually isolate state and the mistakes we made getting there.

Terraform State Isolation by Environment

About two years ago I worked on a team where someone ran terraform apply against staging and accidentally destroyed an RDS instance in production. The state files for both environments were in the same S3 bucket, named identically except for a directory prefix. The engineer had cd'd into the wrong folder, typed apply, and answered yes to a destroy plan that referenced what they thought was staging.

Nothing apocalyptic happened — we restored from a snapshot in 40 minutes — but the close call drove a project to lock down environment isolation properly. This post is what stuck.

The cheap version that almost works

The pattern most teams start with: one S3 backend, one DynamoDB lock table, separate keys per environment.

```hcl
# environments/dev/main.tf
terraform {
  backend "s3" {
    bucket         = "my-tfstate"
    key            = "environments/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "my-tfstate-locks"
  }
}

# environments/prod/main.tf (same bucket, different key)
terraform {
  backend "s3" {
    bucket         = "my-tfstate"
    key            = "environments/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "my-tfstate-locks"
  }
}
```

This works in the sense that terraform apply in dev/ touches the dev state file and apply in prod/ touches the prod one. The directories are separate. The locks are separate too: both environments share one DynamoDB table, but each state key gets its own lock entry. Most workflows are fine.

What it doesn't protect against: a human with credentials valid for both environments running the wrong command in the wrong directory. The accident I described above happened with this exact setup.

The version we actually run

Three changes to the cheap version, in order of importance:

1. Separate AWS accounts per environment

This is the one that did the most heavy lifting. Production resources live in a production AWS account. Staging in a staging account. Dev in a dev account.

The provider block in each environment's Terraform code carries an explicit allowed_account_ids allowlist:

```hcl
provider "aws" {
  region              = "us-east-1"
  allowed_account_ids = ["123456789012"] # production only
}
```

If anyone tries to apply this configuration with credentials for a different account, Terraform refuses before it touches any resource. This single check would have prevented the original incident: credentials for one account cannot be used to apply another environment's configuration, whichever directory the engineer ends up in.

2. Separate state buckets per account

Each account hosts its own Terraform state bucket. Staging's bucket lives in the staging account, prod's in the prod account. The bucket policies allow access only from within the same account, full stop.
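One way to express "same account only" is an explicit deny keyed on the caller's account. This is a minimal sketch rather than our exact policy: the bucket name and resource labels are placeholders, and it assumes the AWS provider's aws_iam_policy_document data source together with the aws:PrincipalAccount condition key.

```hcl
# Hypothetical names throughout; adjust to your own account and bucket.
data "aws_caller_identity" "current" {}

resource "aws_s3_bucket" "tfstate" {
  bucket = "example-prod-tfstate" # placeholder bucket name
}

data "aws_iam_policy_document" "same_account_only" {
  statement {
    sid     = "DenyOtherAccounts"
    effect  = "Deny"
    actions = ["s3:*"]

    resources = [
      aws_s3_bucket.tfstate.arn,
      "${aws_s3_bucket.tfstate.arn}/*",
    ]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    # Deny any caller whose account is not the one that owns the bucket.
    condition {
      test     = "StringNotEquals"
      variable = "aws:PrincipalAccount"
      values   = [data.aws_caller_identity.current.account_id]
    }
  }
}

resource "aws_s3_bucket_policy" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  policy = data.aws_iam_policy_document.same_account_only.json
}
```

S3 already refuses cross-account access unless a policy grants it, so the deny is defence in depth. A blanket deny like this may also block AWS service principals, which do not carry an account ID in the request, so check what else reads the bucket before adopting it.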

The directory structure looks like this:

```
infra/
├── modules/
│   ├── networking/
│   ├── eks/
│   └── rds/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── backend.tf       # points at dev account's bucket
│   │   └── provider.tf      # allowed_account_ids = dev account
│   ├── staging/
│   │   ├── main.tf
│   │   ├── backend.tf
│   │   └── provider.tf
│   └── prod/
│       ├── main.tf
│       ├── backend.tf
│       └── provider.tf
```

The shared modules in modules/ are environment-agnostic. The environment-specific composition lives under each environments/{env}/.
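As a rough illustration of that split, here is what the prod directory's backend.tf and main.tf might contain. The bucket and table names are placeholders, and the environment input is a hypothetical module variable, not something the modules above are known to expose.

```hcl
# environments/prod/backend.tf
# State lives in a bucket owned by the prod account (placeholder names).
terraform {
  backend "s3" {
    bucket         = "example-prod-tfstate"
    key            = "terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-prod-tfstate-locks"
  }
}

# environments/prod/main.tf
# The module itself is environment-agnostic; only the inputs differ.
module "networking" {
  source = "../../modules/networking"

  environment = "prod" # hypothetical input variable
}
```

Because the bucket sits in the environment's own account, the key no longer needs an environment prefix.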

3. Different IAM roles for each environment, assumed via SSO

Engineers don't have static AWS keys. They authenticate to our SSO provider, which lets them assume a role in the appropriate account. The role for dev allows broad access; the role for staging is more restricted; the role for prod is restricted further and limited to a small group.

```bash
# what an engineer types day-to-day
aws sso login --profile prod
aws --profile prod s3 ls           # this only works for the production team
cd infra/environments/prod
AWS_PROFILE=prod terraform plan    # uses the assumed prod role
```

The role for prod requires MFA on assumption, with a 1-hour session lifetime. Plan-only operations are allowed broadly; apply operations require an additional explicit approval policy that Terraform Cloud (or in our case, a homemade equivalent) enforces.
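If the SSO provider happens to be AWS IAM Identity Center (an assumption; nothing above depends on it), the 1-hour session limit can be captured as a permission set. The names below are hypothetical, and MFA is assumed to be enforced in the identity provider's sign-in settings rather than in this resource.

```hcl
# Looks up the IAM Identity Center instance in the management account.
data "aws_ssoadmin_instances" "this" {}

# Hypothetical permission set for the prod Terraform role.
resource "aws_ssoadmin_permission_set" "prod_terraform" {
  name         = "ProdTerraform"
  description  = "Restricted prod access for Terraform runs"
  instance_arn = tolist(data.aws_ssoadmin_instances.this.arns)[0]

  # ISO 8601 duration: sessions expire after one hour.
  session_duration = "PT1H"
}
```

Attaching policies and assigning the permission set to the small prod group are separate resources, omitted here to keep the sketch short.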

What broke when we set this up

A few real issues during the migration:

Cross-environment data references stopped working. Some of our Terraform code had data "aws_ssm_parameter" lookups that crossed account boundaries — staging reading a value from a prod-account parameter store. When we split accounts, those lookups failed. The fix was to copy the values into each environment's parameter store, or use terraform_remote_state with explicit cross-account read permissions where unavoidable.
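For the unavoidable cases, a cross-account terraform_remote_state read might look like the sketch below. The bucket, role, and output names are all placeholders, and the read-only role in the prod account is an assumption about how the permission would be granted.

```hcl
# Hypothetical: staging reads one output from the prod state through a
# read-only role that lives in the prod account.
data "terraform_remote_state" "prod_network" {
  backend = "s3"

  config = {
    bucket = "example-prod-tfstate"
    key    = "terraform.tfstate"
    region = "us-east-1"

    # Nested form used by recent Terraform releases; older releases
    # take a top-level role_arn argument instead.
    assume_role = {
      role_arn = "arn:aws:iam::123456789012:role/example-tfstate-read"
    }
  }
}

locals {
  # vpc_id is a placeholder output name.
  prod_vpc_id = data.terraform_remote_state.prod_network.outputs.vpc_id
}
```

The prod bucket policy has to allow that role explicitly, which keeps every cross-account read visible in code review.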

Karpenter / Crossplane-style "operators that create infrastructure" got complicated. We had a controller running in dev that was supposed to provision dev-account resources. Splitting accounts meant the controller needed cross-account IAM roles. Fixable, but not free.

Module-internal aws_caller_identity lookups assumed a single account. A few modules had data "aws_caller_identity" "current" and used the result to construct ARNs. When the modules were used in different accounts, the ARNs were correctly different — usually fine, occasionally not. We audited these and made the assumptions explicit (passing account IDs as variables instead of inferring them).
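Making the assumption explicit can be as small as the following sketch. The variable and role names are illustrative, not taken from our modules.

```hcl
# The caller passes the account ID in; the module no longer infers it.
variable "account_id" {
  type        = string
  description = "AWS account ID that the ARNs built by this module refer to."

  validation {
    condition     = can(regex("^[0-9]{12}$", var.account_id))
    error_message = "account_id must be a 12-digit AWS account ID."
  }
}

locals {
  # Placeholder role name, used only to show the ARN construction.
  example_role_arn = "arn:aws:iam::${var.account_id}:role/example-role"
}
```

Each environment then sets account_id in its own main.tf, so a module that points at another account does so visibly.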

Common mistakes we still see

Questions people still bring to the team when they ask for help:

"Can I use a single workspace for dev/staging/prod?" Workspaces (terraform workspace) share a backend. If you're using terraform workspace select prod, your prod state lives in the same place as your dev state. We don't allow this for environments. Workspaces are fine for ephemeral previews; they're not isolation.
"Can I just use directories like before but skip the accounts?" Sometimes, if you don't have access to AWS Organizations and can only have one account. In that case, IAM roles with strict policies per environment are the next best thing. It's strictly weaker than account separation; document it.
"Do I really need a separate state bucket per account?" Yes. The bucket has to live somewhere; if it lives in a "shared" account that all engineers can access, you've recreated the problem. Bucket per account, with no cross-account read by default.

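To make the workspace point concrete: with the S3 backend, every workspace's state lands in the same bucket under a key prefix. The sketch below reuses the my-tfstate bucket from the cheap version and assumes the backend's default workspace_key_prefix of "env:".

```hcl
terraform {
  backend "s3" {
    bucket = "my-tfstate"
    key    = "terraform.tfstate"
    region = "us-east-1"
  }
}

# Where each workspace's state ends up with this backend:
#   default workspace -> s3://my-tfstate/terraform.tfstate
#   "dev" workspace   -> s3://my-tfstate/env:/dev/terraform.tfstate
#   "prod" workspace  -> s3://my-tfstate/env:/prod/terraform.tfstate
```

Same bucket, same credentials, same blast radius; only the key differs.
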
What we don't bother with

Encrypting state at the state-file level (beyond S3's default encryption). Sensitive values in state are a problem we addressed by NOT putting sensitive values in state — passwords come from Vault at runtime, not from random_password resources stored in state.
Versioning state files via additional tooling. S3 versioning is on. We've never needed point-in-time state recovery; the few times we've corrupted state, the fix was a terraform import cycle, not a rewind.
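For reference, the two S3 features we do rely on can be declared alongside the state bucket. This is a sketch with placeholder names, using SSE-S3 (AES256) to stand in for "S3's default encryption".

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "example-dev-tfstate" # placeholder bucket name
}

# Keep previous versions of every state object.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Server-side encryption with S3-managed keys.
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}
```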

Day-2 cost: not zero

Multi-account adds friction. Logging is split across accounts (we use a centralised CloudTrail aggregation in a shared "audit" account). Cost reports are per account (Cost Explorer with a payer account). Some monitoring tools need to be configured per account.

The friction is real. Worth it. Once a year there's an incident in a peer org where someone trashes prod from a dev terminal, and we get to say "that can't happen here." That's worth quite a lot of friction.

What I'd tell a team starting out

If you can use AWS Organizations and separate accounts: do it from day one. Refactoring later is genuinely painful (we did it; it took a quarter).

If you can't: at minimum, separate state buckets, separate IAM roles per environment, the allowed_account_ids provider check on every config, and a written rule that "apply" against prod requires a second human to be on a call. The rule is the lever; the tooling enforces it.
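In the single-account case, the per-environment role boundary is mostly an IAM policy scoped to that environment's state bucket. A minimal sketch, with placeholder bucket and policy names:

```hcl
# Hypothetical policy attached only to the dev role: it can reach the
# dev state bucket and nothing else.
data "aws_iam_policy_document" "dev_state_access" {
  statement {
    sid       = "ListDevStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::example-dev-tfstate"]
  }

  statement {
    sid       = "ReadWriteDevState"
    actions   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::example-dev-tfstate/*"]
  }
}

resource "aws_iam_policy" "dev_state_access" {
  name   = "example-dev-tfstate-access"
  policy = data.aws_iam_policy_document.dev_state_access.json
}
```

The staging and prod roles get their own equivalents, and a role that uses DynamoDB locking also needs access to its lock table. None of this stops a role with broad resource permissions from touching another environment's infrastructure, which is why it stays weaker than separate accounts.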

The only Terraform structure I've seen reliably prevent the kind of incident that started this story is account-level isolation. Anything weaker has worked sometimes and failed sometimes. Account isolation has, for us, always worked.

About Kiril Urbonas
