We test infrastructure code with three layers: validation, plan review, and integration tests. This is the setup that catches real bugs without slowing down PRs.
Testing infrastructure-as-code is harder than testing application code. The "tests" run against real cloud APIs (slow, expensive), the failure modes are different (resource X exists but is misconfigured), and traditional unit tests don't apply directly. After iterating on testing for our Terraform codebase, we landed on three layers that together catch most bugs without making CI unbearably slow. This is what we run.
Some specific challenges:
Tests cost money. Spinning up real AWS resources costs real money per test run. A test suite that runs on every PR could cost hundreds of dollars per day.
Tests are slow. Provisioning real infrastructure takes minutes. A test that creates a VPC, subnets, and an instance isn't returning in 30 seconds.
Tests are flaky. Cloud APIs have transient errors, eventual consistency, and rate limits. Tests that pass 99 times in a row will fail the 100th time for reasons outside your control.
Failure modes are weird. "Resource was created but tagged wrong" is a real bug class that traditional tests don't cover well.
The three-layer approach addresses these by using different tests at different points in the pipeline.
Layer 1 is static validation: the cheapest tests. They check syntax, structure, and obvious bugs without provisioning anything:
terraform validate: parses the config, catches syntax errors and obvious issues like undefined variables.
tflint: catches Terraform-specific antipatterns. Deprecated arguments, unused variables, naming convention violations.
tfsec or checkov: security and compliance checks. "S3 bucket without encryption," "security group with 0.0.0.0/0 ingress on a non-web port," etc.
Custom OPA / Sentinel policies: organization-specific rules. "Every resource must have specific tags," "RDS must use KMS encryption," etc.
These run in seconds. CI fails the PR if any check fails.
In our pipeline:
- name: terraform validate
  run: terraform validate
- name: tflint
  run: tflint --recursive
- name: tfsec
  run: tfsec . --soft-fail-warnings
- name: opa policy check
  run: opa eval -i tfplan.json -d policies/ "data.policies.deny"
These four steps catch ~60-70% of bugs that would otherwise slip through. They're free to run, fast, and reliable.
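To make the organization-specific rules concrete, here is a minimal Python sketch of the "every resource must have specific tags" check. It illustrates the logic only and is not the OPA policy itself. It assumes the input is the JSON plan export (the `resource_changes` list produced by `terraform show -json`); the tag names and the sample plan are hypothetical.

```python
REQUIRED_TAGS = {"owner", "environment"}  # hypothetical tag names, for illustration


def missing_tag_violations(plan, required=REQUIRED_TAGS):
    """Return one message per created/updated resource that lacks a required tag."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = set(change.get("actions", []))
        # Only resources being created or updated need the check.
        if not actions & {"create", "update"}:
            continue
        after = change.get("after") or {}
        # Resource types without a tags attribute are skipped.
        if "tags" not in after:
            continue
        tags = after.get("tags") or {}
        missing = sorted(set(required) - set(tags))
        if missing:
            violations.append(f"{rc['address']}: missing tags {', '.join(missing)}")
    return violations


# Hypothetical plan fragment: one bucket missing a tag, one compliant VPC, one deletion.
sample_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs",
         "change": {"actions": ["create"], "after": {"tags": {"owner": "platform"}}}},
        {"address": "aws_vpc.main",
         "change": {"actions": ["create"],
                    "after": {"tags": {"owner": "platform", "environment": "dev"}}}},
        {"address": "aws_iam_policy.old",
         "change": {"actions": ["delete"], "after": None}},
    ]
}

violations = missing_tag_violations(sample_plan)
for message in violations:
    print(message)
```

A CI step built on a check like this would exit non-zero whenever the list is non-empty, which is what fails the PR.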
Layer 2 is plan review. For each PR, we run terraform plan against the actual environment (without applying):
- name: terraform plan
  run: |
    terraform plan -out=plan.tfplan
    terraform show -json plan.tfplan > plan.json
- name: post plan to PR
  run: ./post-plan-comment.sh plan.json
The plan output gets posted as a comment on the PR. Reviewers see what will change before approving.
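Why this is valuable: reviewers see exactly what is going to change, which catches a class of bugs nothing else does.
We also have automated plan analysis that flags risky operations, such as a change that would destroy and recreate a database. The PR comment highlights these.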
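Plan analysis of this kind can work directly on the JSON export. The sketch below is one possible approach, not the script we run: it walks `resource_changes` in the output of `terraform show -json` and flags any change whose `actions` include a delete. The function names and the sample plan are hypothetical.

```python
def classify_change(actions):
    """Map a plan's action list to a risk label, or None if the change is not risky."""
    if "delete" in actions and "create" in actions:
        return "replace"  # destroy and recreate, in either order
    if "delete" in actions:
        return "destroy"
    return None


def risky_changes(plan):
    """Return (address, label) pairs for every destructive change in the plan."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        label = classify_change(rc.get("change", {}).get("actions", []))
        if label is not None:
            flagged.append((rc["address"], label))
    return flagged


# Hypothetical plan fragment for illustration.
sample_plan = {
    "resource_changes": [
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
        {"address": "aws_s3_bucket.tmp", "change": {"actions": ["delete"]}},
    ]
}

flagged = risky_changes(sample_plan)
for address, label in flagged:
    print(f"RISKY ({label}): {address}")
```

The labels can then be rendered at the top of the PR comment so a replace on a stateful resource is hard to miss.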
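Layer 3 is integration testing. For our most-used Terraform modules, we have real integration tests. They apply the module in a real AWS account, assert on the resources it creates, and destroy everything afterwards.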
We use Terratest (Go library) for this:
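package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVPCModule(t *testing.T) {
	options := &terraform.Options{
		TerraformDir: "../examples/basic",
		Vars: map[string]interface{}{
			"vpc_cidr": "10.99.0.0/16",
		},
	}
	// Register cleanup first so resources are destroyed even if an assertion fails.
	defer terraform.Destroy(t, options)
	terraform.InitAndApply(t, options)

	// Look up the VPC the module created and check it against the inputs.
	vpcID := terraform.Output(t, options, "vpc_id")
	vpc := aws.GetVpcById(t, vpcID, "us-east-1")
	// Depending on the Terratest version, CidrBlock may be a *string and need dereferencing.
	assert.Equal(t, "10.99.0.0/16", vpc.CidrBlock)
	assert.Len(t, vpc.Subnets, 6) // 3 public + 3 private
	// ... more assertions
}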
Each test takes ~5-15 minutes and costs ~$0.50-2 per run.
We don't run these on every PR. We run them nightly and when a PR changes the module under test.
For modules with high reuse, integration tests are worth the cost. For one-off Terraform code, validation and plan review are sufficient.
The assertions that pay off:
Resources exist and have expected attributes. "VPC exists with the right CIDR," "instance has the right tags."
Network connectivity works. "From the public subnet, can I curl 169.254.169.254?" "From the private subnet, can I reach the database?"
IAM permissions are correct. Assume the role; try operations; assert pass/fail.
Lifecycle: tear down works. Some misconfigurations only show up at destroy time.
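The IAM item ("assume the role; try operations; assert pass/fail") lends itself to a table of expected outcomes. The sketch below shows the shape of such a check in Python rather than Terratest. Everything in it is hypothetical: `AccessDenied` stands in for the error a real AWS client raises, and the two fake operations stand in for calls made with the assumed role's credentials.

```python
class AccessDenied(Exception):
    """Stand-in for the access-denied error a real AWS client would raise."""


def check_permissions(cases):
    """Run each (name, operation, should_succeed) case; return names that behave unexpectedly."""
    failures = []
    for name, operation, should_succeed in cases:
        try:
            operation()
            succeeded = True
        except AccessDenied:
            succeeded = False
        # A denied call that should be allowed is a bug, and so is the reverse.
        if succeeded != should_succeed:
            failures.append(name)
    return failures


# Fake operations for illustration only.
def read_bucket():
    return ["object-a"]


def delete_bucket():
    raise AccessDenied("s3:DeleteBucket is not allowed for this role")


cases = [
    ("role can read the bucket", read_bucket, True),
    ("role cannot delete the bucket", delete_bucket, False),
]

failures = check_permissions(cases)
print("unexpected results:", failures)
```

Asserting the denials matters as much as asserting the allows: a role that can do too much passes every "it works" check.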
What we don't test is the cloud's job, not ours: we test our composition and configuration, not AWS's correctness.
Integration tests need their own AWS account; we don't run them in production-adjacent accounts.
Our sandbox account has restrictions on expensive resources (no r5.24xlarge instances, no databases above small sizes).
A few testing approaches we've abandoned:
Mocking cloud APIs. Tools like LocalStack mock AWS APIs. Useful for local development, not great for testing — the mocks have their own quirks that don't match real AWS. We use real AWS in sandbox accounts instead.
Testing every module. Some modules are too small to justify integration tests (e.g., a wrapper around a single S3 bucket). Validation + plan review is enough.
Testing every PR with full integration. Too slow, too expensive. We run focused integration tests on the modules being changed instead.
Snapshot testing of plan output. Plan output changes frequently for benign reasons (provider version updates, etc.). False positives hurt more than they help.
Real bugs the testing layers caught:
A typo that would have destroyed and recreated a database. Layer 2 (plan review) flagged the destructive change. Reviewer caught it before merge.
A new variable's default value caused a security group to be open to the internet. Layer 1 (tfsec) flagged the SG rule on the PR. Fixed before merge.
A module change broke the IAM role configuration in subtle ways. Layer 3 (integration test) caught it — the test that tries to use the role failed.
A provider version bump introduced different default tagging behavior. Layer 3 nightly run caught the regression. Pinned to the previous version while we updated the module.
Without these layers, each of these would have shipped to staging or production before being caught. Catching them in CI is much cheaper.
Things our testing doesn't cover well:
Cross-state interactions. Module A's state interacts with Module B's state via remote state references. Integration tests run modules in isolation; they don't catch cross-state issues. We rely on staging environment testing for this.
Long-tail provider quirks. A specific combination of options that the provider handles weirdly. Hard to test for these proactively; we add tests as we discover them.
Behaviors that depend on cloud account state. "The account has hit a service quota; the next resource creation will fail" — hard to test until it happens in production.
Drift after deployment. A resource changed via the console after Terraform created it. Tests don't catch this; drift detection (a separate tool / scheduled task) does.
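For the drift case, a scheduled job can lean on `terraform plan -detailed-exitcode`, which exits 0 when the plan is empty, 2 when it contains changes, and 1 on error. The Python sketch below is one way to wrap that, under the assumption that the checked-out configuration is already fully applied, so a non-empty plan means the real resources have drifted. The function names and status labels are hypothetical.

```python
import subprocess


def interpret_plan_exit_code(code):
    """Translate the exit code of `terraform plan -detailed-exitcode` into a status label."""
    if code == 0:
        return "in-sync"  # plan succeeded with no changes
    if code == 2:
        return "drift"    # plan succeeded and found changes
    return "error"        # plan itself failed


def check_drift(workdir):
    """Run a plan in workdir (must already be initialized) and report its status."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)


# The mapping can be exercised without running Terraform.
for code in (0, 1, 2):
    print(code, "->", interpret_plan_exit_code(code))
```

A nightly job could call `check_drift` per state directory and alert on anything other than "in-sync".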
Per-month testing infrastructure cost totals ~$200-400, plus engineer time. Compared to the cost of one production incident from a missed bug, the math is clearly favorable.
Start with Layer 1. Validation, linting, security scanning. Cheap and high-leverage.
Layer 2 (plan review) on every PR. The "what's actually changing" view catches a class of bugs nothing else does.
Layer 3 (integration tests) for shared modules. The infrastructure that many services use is worth the investment.
Don't try to test everything. Per-PR full integration tests are too slow and expensive. Focus.
Sandbox account from day one. Don't run integration tests in production-adjacent accounts.
Keep tests stable. Flaky infra tests train people to ignore failures. If a test is flaky, fix it or delete it; don't leave it.
Infrastructure testing is a different discipline from application testing. The patterns are different, the costs are different, the failure modes are different. The three-layer approach (validation → plan → integration) covers the bulk of what you need without each layer trying to do everything. The teams that struggle with infra testing are usually missing one of the layers (no plan review, no integration tests, etc.) and getting bitten by the class of bugs that layer would catch.