How a small team moved from single-region risk to a simple active/passive multi-region setup without doubling complexity.

A Pragmatic Multi-Region Strategy for Small Teams

Multi-region can easily become a science project. This is what worked for a five-person platform team supporting a SaaS product.

Starting Point: Single Region, Shared VPC

We began with everything in one AWS region: RDS, EKS, S3, and a shared VPC.

- RTO and RPO targets existed only on paper.
- Failover docs existed, but no one had run them end-to-end.

Step 1: Read Replicas and S3 Replication

- Added an RDS read replica in a second region.
- Enabled S3 cross-region replication for critical buckets.
- Agreed on an RPO of 15 minutes.
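
The two replication pieces above could be sketched in Terraform roughly as follows. This is a minimal illustration, not our exact configuration: the secondary region, the resource names, the instance class, and all four variables are placeholders. It also assumes versioning is already enabled on both buckets, which S3 replication requires.

```hcl
# Sketch only: names, regions, and variables below are placeholders.
provider "aws" {
  alias  = "secondary"
  region = "us-west-2" # assumed secondary region
}

variable "primary_db_arn" {
  type        = string
  description = "ARN of the primary RDS instance (cross-region replicas reference the ARN)"
}

variable "source_bucket_id" { type = string }
variable "destination_bucket_arn" { type = string }
variable "replication_role_arn" { type = string }

# Cross-region read replica, created through the secondary-region provider.
# An encrypted source would additionally need kms_key_id set to a key
# in the secondary region.
resource "aws_db_instance" "replica" {
  provider            = aws.secondary
  identifier          = "app-db-replica"
  replicate_source_db = var.primary_db_arn
  instance_class      = "db.t3.medium"
  skip_final_snapshot = true
}

# Replication rule on the source bucket, managed by the default
# (primary-region) provider.
resource "aws_s3_bucket_replication_configuration" "critical" {
  bucket = var.source_bucket_id
  role   = var.replication_role_arn

  rule {
    id     = "replicate-everything"
    status = "Enabled"

    filter {} # empty filter: the rule applies to every object

    delete_marker_replication {
      status = "Disabled"
    }

    destination {
      bucket        = var.destination_bucket_arn
      storage_class = "STANDARD"
    }
  }
}
```

Replica lag and replication latency are what actually bound the RPO, so both are worth alerting on against the 15-minute target.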

Step 2: Terraform Modules for Two Regions

Instead of cloning the entire stack, we:

- Created a region-aware module for shared resources.
- Used workspaces to differentiate primary vs secondary.

```hcl
module "vpc" {
  source  = "./modules/vpc"
  region  = var.region
  primary = var.is_primary
}
```
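
One way to tie the module inputs to the workspace is to derive them from `terraform.workspace` instead of passing variables by hand. The sketch below is a variant of the call above under assumed names: the workspace names `primary` and `secondary` and both regions are hypothetical.

```hcl
# Sketch only: workspace names and regions are assumptions.
locals {
  is_primary = terraform.workspace == "primary"

  # Map each workspace to the region it deploys into.
  region_by_workspace = {
    primary   = "us-east-1"
    secondary = "us-west-2"
  }

  # Fails at plan time if run from a workspace that is not in the map,
  # which guards against applying from "default" by accident.
  region = local.region_by_workspace[terraform.workspace]
}

provider "aws" {
  region = local.region
}

module "vpc" {
  source  = "./modules/vpc"
  region  = local.region
  primary = local.is_primary
}
```

With this layout, switching targets is `terraform workspace select secondary` followed by `terraform apply`, and each workspace keeps its own state.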

Step 3: DNS and Runbooks

- Used Route 53 failover records with health checks on a lightweight /healthz endpoint.
- Wrote an explicit runbook:
  - who can declare a disaster,
  - which Terraform workspaces to apply,
  - how to flip DNS and confirm.
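
The failover records and health check described above might look like the following sketch. The hostnames, the hosted zone variable, and the thresholds are illustrative assumptions; only the /healthz path comes from our setup.

```hcl
# Sketch only: hostnames, zone, and thresholds are placeholders.
variable "zone_id" {
  type        = string
  description = "Route 53 hosted zone ID"
}

# Health check that probes the primary region's /healthz endpoint.
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/healthz"
  request_interval  = 30
  failure_threshold = 3
}

# Served while the health check passes.
resource "aws_route53_record" "primary" {
  zone_id         = var.zone_id
  name            = "app.example.com"
  type            = "CNAME"
  ttl             = 60
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id
  records         = ["primary.example.com"]

  failover_routing_policy {
    type = "PRIMARY"
  }
}

# Served when the primary health check fails.
resource "aws_route53_record" "secondary" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "secondary"
  records        = ["secondary.example.com"]

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```

A short TTL keeps the DNS flip fast, at the cost of more queries. DNS failover alone does not promote the read replica, so the runbook still has to cover that step.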

We didn’t solve every theoretical edge case, but we can now lose a region and recover in under an hour with a plan the team has actually rehearsed.

About Kiril Urbonas
