We test infrastructure code with three layers: validation, plan review, and integration tests. This setup catches real bugs without slowing down PRs.


Infrastructure Testing: Three Layers That Work

Testing infrastructure-as-code is harder than testing application code. The "tests" run against real cloud APIs (slow, expensive), the failure modes are different (resource X exists but is misconfigured), and traditional unit tests don't apply directly. After iterating on testing for our Terraform codebase, we landed on three layers that together catch most bugs without making CI unbearably slow. This is what we run.

Why infra testing is hard

Some specific challenges:

Tests cost money. Spinning up real AWS resources costs real money per test run. A test suite that runs on every PR could cost hundreds of dollars per day.

Tests are slow. Provisioning real infrastructure takes minutes. A test that creates a VPC, subnets, and an instance isn't returning in 30 seconds.

Tests are flaky. Cloud APIs have transient errors, eventual consistency, and rate limits. Tests that pass 99 times in a row will fail the 100th time for reasons outside your control.

Failure modes are weird. "Resource was created but tagged wrong" is a real bug class that traditional tests don't cover well.

The three-layer approach addresses these by using different tests at different points in the pipeline.

Layer 1: validation (fast, runs on every PR)

The cheapest tests. They check syntax, structure, and obvious bugs without provisioning anything:

terraform validate: parses the config, catches syntax errors and obvious issues like undefined variables.

tflint: catches Terraform-specific antipatterns. Deprecated arguments, unused variables, naming convention violations.

tfsec or checkov: security and compliance checks. "S3 bucket without encryption," "security group with 0.0.0.0/0 ingress on a non-web port," etc.

Custom OPA / Sentinel policies: organization-specific rules. "Every resource must have specific tags," "RDS must use KMS encryption," etc.

These run in seconds. CI fails the PR if any check fails.

In our pipeline:

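```yaml
- name: terraform validate
  # validate needs initialized providers; -backend=false skips remote state setup
  run: |
    terraform init -backend=false
    terraform validate

- name: tflint
  run: tflint --recursive

- name: tfsec
  run: tfsec . --soft-fail-warnings

- name: opa policy check
  # Reads the JSON plan written by the Layer 2 steps, so it runs after them.
  # --fail-defined fails the step if any deny rule returns a message.
  run: opa eval --fail-defined -i plan.json -d policies/ "data.policies.deny[x]"
```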

These four steps catch ~60-70% of bugs that would otherwise slip through. They're free to run, fast, and reliable.
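The OPA step evaluates the plan JSON against organization rules written in Rego. To show the shape of a rule like "every resource must have specific tags", here is a minimal sketch of the same check written in Go instead of Rego. It assumes the JSON layout that terraform show -json produces (resource_changes, change.actions, change.after), and the tag names in requiredTags are hypothetical. It only inspects the tags attribute in change.after, so tags supplied through the AWS provider's default_tags (which land in tags_all) are not counted, and tag values unknown until apply may be misreported.

```go
// tagcheck: a Go stand-in for a Rego "required tags" rule, run against
// the output of `terraform show -json plan.tfplan`.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// requiredTags is a hypothetical organization policy.
var requiredTags = []string{"owner", "service", "environment"}

// planJSON models only the parts of Terraform's JSON plan this check reads.
type planJSON struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Change  struct {
			Actions []string               `json:"actions"`
			After   map[string]interface{} `json:"after"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// missingTagViolations returns one message per required tag missing from a
// resource's planned "tags" attribute. Resources being deleted (after is null)
// and resources without a "tags" attribute are skipped.
func missingTagViolations(raw []byte) ([]string, error) {
	var p planJSON
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	var violations []string
	for _, rc := range p.ResourceChanges {
		if rc.Change.After == nil {
			continue
		}
		rawTags, ok := rc.Change.After["tags"]
		if !ok {
			continue
		}
		tags, _ := rawTags.(map[string]interface{}) // JSON null leaves tags nil
		for _, key := range requiredTags {
			if _, present := tags[key]; !present {
				violations = append(violations, fmt.Sprintf("%s: missing tag %q", rc.Address, key))
			}
		}
	}
	sort.Strings(violations)
	return violations, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: tagcheck plan.json")
		os.Exit(2)
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	violations, err := missingTagViolations(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	for _, v := range violations {
		fmt.Println(v)
	}
	if len(violations) > 0 {
		os.Exit(1) // a non-zero exit fails the CI step
	}
}
```

Run as go run tagcheck.go plan.json; any violation produces exit code 1, which fails the pipeline step the same way a non-empty deny set does with --fail-defined.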

Layer 2: plan review (medium speed, runs on PR)

For each PR, we run terraform plan against the actual environment (without applying):

```yaml
- name: terraform plan
  run: |
    terraform plan -out=plan.tfplan
    terraform show -json plan.tfplan > plan.json

- name: post plan to PR
  run: ./post-plan-comment.sh plan.json
```

The plan output gets posted as a comment on the PR. Reviewers see what will change before approving.

Why this is valuable:

Plans show the actual diff, not just "code changed"
Reviewers can spot unintended changes (a new resource being created they didn't expect)
Catches changes that look small in code but have large effect (e.g., a parameter change that requires resource replacement)

We have automated plan analysis that flags risky operations:

Resource destructions (any "destroy" operation)
Resource replacements (any "destroy and recreate")
Changes to security-relevant resources (IAM, security groups, KMS keys)

The PR comment highlights these.
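The risk flags above can be sketched as a small Go program over the plan.json that the Layer 2 steps write. In Terraform's JSON plan, each entry in resource_changes carries an actions array: ["delete"] is a destroy, while ["delete","create"] or ["create","delete"] (create-before-destroy) is a replacement. This is an illustration, not our actual analysis tool, and the securityPrefixes list is an assumption to adapt to your providers.

```go
// planrisk: flag destroys, replacements, and changes to security-relevant
// resources in the output of `terraform show -json plan.tfplan`.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

type resourceChange struct {
	Address string `json:"address"`
	Type    string `json:"type"`
	Change  struct {
		Actions []string `json:"actions"`
	} `json:"change"`
}

type plan struct {
	ResourceChanges []resourceChange `json:"resource_changes"`
}

// securityPrefixes is an assumed list of security-relevant AWS resource types.
var securityPrefixes = []string{"aws_iam_", "aws_security_group", "aws_kms_"}

// classify maps Terraform's action list to a short label; "" means no change.
func classify(actions []string) string {
	switch strings.Join(actions, ",") {
	case "delete":
		return "destroy"
	case "delete,create", "create,delete":
		return "replace"
	case "create":
		return "create"
	case "update":
		return "update"
	default: // "no-op" and "read"
		return ""
	}
}

func isSecurityRelevant(resourceType string) bool {
	for _, p := range securityPrefixes {
		if strings.HasPrefix(resourceType, p) {
			return true
		}
	}
	return false
}

// riskyChanges returns one warning per destroy or replace, and one per
// create/update of a security-relevant resource, in plan order.
func riskyChanges(raw []byte) ([]string, error) {
	var p plan
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	var warnings []string
	for _, rc := range p.ResourceChanges {
		kind := classify(rc.Change.Actions)
		switch {
		case kind == "destroy" || kind == "replace":
			warnings = append(warnings, fmt.Sprintf("%s: %s", strings.ToUpper(kind), rc.Address))
		case kind != "" && isSecurityRelevant(rc.Type):
			warnings = append(warnings, fmt.Sprintf("SECURITY %s: %s", kind, rc.Address))
		}
	}
	return warnings, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: planrisk plan.json")
		os.Exit(2)
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	warnings, err := riskyChanges(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, w := range warnings {
		fmt.Println(w)
	}
}
```

The output lines are meant to be prepended to the PR comment so reviewers see destroys, replacements, and IAM, security group, or KMS changes before the full diff. It reports rather than fails, since some destroys are intentional.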

Layer 3: integration tests (slow, runs less often)

For our most-used Terraform modules, we have real integration tests. They:

Provision the module in a sandbox AWS account
Assert on the actual created resources
Tear down

We use Terratest (a Go library) for this:

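```go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

// TestVPCModule applies the example, checks the real VPC, then tears it down.
func TestVPCModule(t *testing.T) {
	options := &terraform.Options{
		TerraformDir: "../examples/basic",
		Vars: map[string]interface{}{
			"vpc_cidr": "10.99.0.0/16",
		},
	}
	// Deferred before apply so resources are destroyed even if apply fails.
	defer terraform.Destroy(t, options)
	terraform.InitAndApply(t, options)

	vpcID := terraform.Output(t, options, "vpc_id")
	vpc := aws.GetVpcById(t, vpcID, "us-east-1")

	assert.Equal(t, "10.99.0.0/16", vpc.CidrBlock)
	assert.Len(t, vpc.Subnets, 6) // 3 public + 3 private
	// ... more assertions
}
```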

Each test takes ~5-15 minutes and costs ~$0.50-2 per run.

We don't run these on every PR. We run them:

On PRs that touch the module being tested
Nightly across all modules (catches regressions from upstream provider changes)
Before releasing a new version of a module
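The first trigger needs CI to map a PR's changed files to the modules they belong to. A minimal sketch in Go, assuming a hypothetical repository layout of modules/<name>/... and changed paths supplied by something like git diff --name-only origin/main...HEAD:

```go
// selectmodules: print the module directories touched by a set of changed files.
package main

import (
	"fmt"
	"os"
	"path"
	"sort"
	"strings"
)

// modulesTouched returns the sorted, de-duplicated set of modules/<name>
// directories containing at least one changed file. Files elsewhere in the
// repo, or directly under modules/, select nothing.
func modulesTouched(changedFiles []string) []string {
	seen := map[string]bool{}
	for _, f := range changedFiles {
		parts := strings.Split(path.Clean(f), "/")
		if len(parts) >= 3 && parts[0] == "modules" {
			seen[path.Join(parts[0], parts[1])] = true
		}
	}
	out := make([]string, 0, len(seen))
	for m := range seen {
		out = append(out, m)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Changed paths arrive as arguments, e.g. via xargs.
	for _, m := range modulesTouched(os.Args[1:]) {
		fmt.Println(m)
	}
}
```

For example, git diff --name-only origin/main...HEAD | xargs go run selectmodules.go prints one module directory per line, and the CI job runs only those modules' integration tests. An empty result means the PR skips Layer 3.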

For modules with high reuse, integration tests are worth the cost. For one-off Terraform code, validation and plan review (Layers 1 and 2) are sufficient.

What to test in integration tests

The assertions that pay off:

Resources exist and have expected attributes. "VPC exists with the right CIDR," "instance has the right tags."

Network connectivity works. "From the public subnet, can I curl 169.254.169.254?" "From the private subnet, can I reach the database?"

IAM permissions are correct. Assume the role; try operations; assert pass/fail.

Lifecycle: tear down works. Some misconfigurations only show up at destroy time.
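The IAM check reduces to a table of actions, each with the expected allow or deny outcome under the assumed role. The sketch below shows only that comparison logic. probe is a hypothetical stand-in for an AWS SDK call made with the role's credentials, and errAccessDenied stands in for however your SDK surfaces an AccessDenied error code. Non-denial errors are reported separately because a throttled or failed call is not evidence about the policy.

```go
package main

import (
	"errors"
	"fmt"
)

// errAccessDenied stands in for the SDK's AccessDenied error (hypothetical).
var errAccessDenied = errors.New("AccessDenied")

// probe attempts one API action as the assumed role (hypothetical signature).
type probe func(action string) error

// expectation records whether an action should succeed under the role.
type expectation struct {
	Action  string
	Allowed bool
}

// checkPermissions runs every probe and returns one message per mismatch
// between expected and observed access, plus any non-denial errors.
func checkPermissions(try probe, expected []expectation) []string {
	var problems []string
	for _, e := range expected {
		err := try(e.Action)
		denied := errors.Is(err, errAccessDenied)
		switch {
		case err != nil && !denied:
			problems = append(problems, fmt.Sprintf("%s: unexpected error: %v", e.Action, err))
		case e.Allowed && denied:
			problems = append(problems, fmt.Sprintf("%s: expected allow, got deny", e.Action))
		case !e.Allowed && !denied:
			problems = append(problems, fmt.Sprintf("%s: expected deny, got allow", e.Action))
		}
	}
	return problems
}

func main() {
	// Fake role for demonstration: it may read objects and nothing else.
	readOnly := func(action string) error {
		if action == "s3:GetObject" {
			return nil
		}
		return errAccessDenied
	}
	problems := checkPermissions(readOnly, []expectation{
		{"s3:GetObject", true},
		{"s3:DeleteObject", false},
		{"iam:CreateUser", false},
	})
	fmt.Println(len(problems), "problems") // prints "0 problems"
}
```

Listing denied actions alongside allowed ones is the point: a role that is accidentally too broad passes every "can it do X" check and only fails the "it must not do Y" rows.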

What we don't test (because it's the cloud's job, not ours):

"Does AWS create the resource correctly?" — we trust the provider.
"Does the resource respond to API calls?" — same.

We test our composition and configuration, not AWS's correctness.

Sandbox account discipline

Integration tests need their own AWS account. Why:

Isolation: tests can't accidentally affect production
Easy cleanup: an account dedicated to tests can be aggressive about deletion
Cost attribution: tests' AWS spend is visible separately

Our sandbox account has:

A nightly job that finds and deletes any resources older than 24 hours (catches tests that didn't clean up)
Service Control Policies preventing high-cost resources (no r5.24xlarge instances, no databases above small sizes)
Budget alerts at $500/month (haven't hit it)
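The selection logic of the nightly cleanup job is simple enough to sketch. This Go version assumes the job has already enumerated resources with their creation times (not shown; where that timestamp comes from varies by service) and adds a hypothetical sandbox-keep tag so long-lived fixtures can opt out. It prints a dry-run list rather than deleting anything.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// resource is a minimal, hypothetical view of something found in the sandbox.
type resource struct {
	ARN       string
	CreatedAt time.Time
	Tags      map[string]string
}

const maxAge = 24 * time.Hour

// keepTag is an assumed opt-out tag for fixtures that must survive cleanup.
const keepTag = "sandbox-keep"

// expired returns the sorted ARNs of resources strictly older than maxAge
// at time now, skipping anything carrying keepTag.
func expired(resources []resource, now time.Time) []string {
	var out []string
	for _, r := range resources {
		if _, keep := r.Tags[keepTag]; keep {
			continue
		}
		if now.Sub(r.CreatedAt) > maxAge {
			out = append(out, r.ARN)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	now := time.Date(2024, 5, 2, 3, 0, 0, 0, time.UTC)
	found := []resource{
		{ARN: "arn:aws:ec2:us-east-1:111111111111:vpc/vpc-old", CreatedAt: now.Add(-30 * time.Hour)},
		{ARN: "arn:aws:ec2:us-east-1:111111111111:vpc/vpc-new", CreatedAt: now.Add(-2 * time.Hour)},
	}
	for _, arn := range expired(found, now) {
		fmt.Println("would delete", arn) // dry run: only vpc-old is listed
	}
}
```

Passing now in explicitly, rather than calling time.Now inside expired, keeps the cutoff deterministic and easy to test.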

What we DON'T do

A few testing approaches we've abandoned:

Mocking cloud APIs. Tools like LocalStack mock AWS APIs. Useful for local development, not great for testing — the mocks have their own quirks that don't match real AWS. We use real AWS in sandbox accounts instead.

Testing every module. Some modules are too small to justify integration tests (e.g., a wrapper around a single S3 bucket). Validation + plan review is enough.

Testing every PR with full integration. Too slow, too expensive. Instead, we run focused integration tests on the modules being changed.

Snapshot testing of plan output. Plan output changes frequently for benign reasons (provider version updates, etc.). False positives hurt more than they help.

What we caught with this setup

Real bugs the testing layers caught:

A typo that would have destroyed and recreated a database. Layer 2 (plan review) flagged the destructive change. Reviewer caught it before merge.

A new variable's default value caused a security group to be open to the internet. Layer 1 (tfsec) flagged the SG rule on the PR. Fixed before merge.

A module change broke the IAM role configuration in subtle ways. Layer 3 (integration test) caught it — the test that tries to use the role failed.

A provider version bump introduced different default tagging behavior. The Layer 3 nightly run caught the regression, and we pinned the provider to the previous version while we updated the module.

Without these layers, each of these would have shipped to staging or production before being caught. The cost of catching in CI is much lower.

What's still hard

Things our testing doesn't cover well:

Cross-state interactions. Module A's state interacts with Module B's state via remote state references. Integration tests run modules in isolation; they don't catch cross-state issues. We rely on staging environment testing for this.

Long-tail provider quirks. A specific combination of options that the provider handles weirdly. Hard to test for these proactively; we add tests as we discover them.

Behaviors that depend on cloud account state. "The account has hit a service quota; the next resource creation will fail" — hard to test until it happens in production.

Drift after deployment. A resource changed via the console after Terraform created it. Tests don't catch this; drift detection (a separate tool / scheduled task) does.
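One common way to build that scheduled task is terraform plan -detailed-exitcode, which exits 0 when there are no changes, 1 on error, and 2 when changes are present. Run against a configuration that is fully applied (for example, the main branch after deploy), any planned change points to drift. A minimal Go wrapper, assuming terraform is on PATH and the directory is already initialized; -lock=false is a judgment call that keeps a read-only check from holding the state lock during deploys.

```go
// driftcheck: run a read-only plan and report whether infrastructure drifted.
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// driftStatus interprets `terraform plan -detailed-exitcode`:
// 0 = no changes, 2 = changes present, anything else = failure.
func driftStatus(exitCode int) (bool, error) {
	switch exitCode {
	case 0:
		return false, nil
	case 2:
		return true, nil
	default:
		return false, fmt.Errorf("terraform plan failed with exit code %d", exitCode)
	}
}

// checkDrift runs the plan in dir, streaming Terraform's output.
func checkDrift(dir string) (bool, error) {
	cmd := exec.Command("terraform", "plan", "-detailed-exitcode", "-input=false", "-lock=false")
	cmd.Dir = dir
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	err := cmd.Run()
	var exitErr *exec.ExitError
	switch {
	case err == nil:
		return driftStatus(0)
	case errors.As(err, &exitErr):
		return driftStatus(exitErr.ExitCode())
	default:
		return false, err // e.g. terraform not found on PATH
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: driftcheck <terraform-dir>")
		os.Exit(1)
	}
	drifted, err := checkDrift(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "drift check failed:", err)
		os.Exit(1)
	}
	if drifted {
		fmt.Println("drift detected in", os.Args[1])
		os.Exit(2) // lets the scheduler alert on exit code 2
	}
	fmt.Println("no drift in", os.Args[1])
}
```

Exit codes are kept aligned with Terraform's own, so the scheduler can distinguish "drifted" from "the check itself broke".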

Cost reality

Per-month testing infrastructure cost:

CI runner time: included in our overall CI budget
Sandbox AWS account: ~$80-150/month average, ~$300 in busy months
Engineer time on test maintenance: ~2 hours/week

Total: ~$200-400/month + engineer time. Compared to the cost of one production incident from a missed bug, the math is clearly favorable.

What I'd tell a team starting

Start with Layer 1. Validation, linting, security scanning. Cheap and high-leverage.

Layer 2 (plan review) on every PR. The "what's actually changing" view catches a class of bugs nothing else does.

Layer 3 (integration tests) for shared modules. The infrastructure that many services use is worth the investment.

Don't try to test everything. Per-PR full integration tests are too slow and expensive. Focus.

Sandbox account from day one. Don't run integration tests in production-adjacent accounts.

Keep tests stable. Flaky infra tests train people to ignore failures. If a test is flaky, fix it or delete it; don't leave it.

Infrastructure testing is a different discipline from application testing. The patterns are different, the costs are different, the failure modes are different. The three-layer approach (validation → plan → integration) covers the bulk of what you need without each layer trying to do everything. The teams that struggle with infra testing are usually missing one of the layers (no plan review, no integration tests, etc.) and getting bitten by the class of bugs that layer would catch.


About Kiril Urbonas