We expanded from one Kubernetes cluster to four across two regions. The traffic-routing layer was the hardest piece. Here's what we tried, what worked, and what we'd do again.
A year ago we ran one Kubernetes cluster — a comfortable mid-size EKS in us-east-1, hosting our entire production stack. Today we run four: two in us-east-1 (one for general workloads, one isolated for our heaviest data-intensive services), and two in us-west-2 (the same split, mirrored). The motivation was a mix of compliance, regional latency, and a hard separation we needed for one specific workload.
The compute side of going multi-cluster is well-trodden territory. The hard part, the part that took most of our planning, was traffic routing. This post is about that layer specifically — what we evaluated, what we shipped, what surprised us.
A request enters the system. It's destined for some service. We need to:
For a single cluster, "route to cluster" is trivial. With multiple clusters, you've added a layer of decision-making. Where that decision lives — DNS, an external load balancer, a service mesh, an application — defines your operational model.
Simplest model: each cluster has its own ingress and its own subdomain. Route53 with weighted records or latency-based routing decides which cluster the user hits.
Pros: zero new infrastructure. Easy to reason about. Survives a cluster failure cleanly (remove records, traffic shifts).
Cons: TTL granularity. Even at 30s TTL (we wouldn't want lower), failover takes 30s+ to propagate. Per-request routing decisions aren't possible — DNS is sticky per resolution. Cross-cluster failover during a partial degradation is awkward.
We didn't choose this for our user-facing traffic but kept it for some internal admin-tier traffic where the simplicity wins.
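To put a number on the TTL problem, here is a minimal back-of-the-envelope sketch. It assumes failover time is the health-check detection window plus one full TTL for resolvers that cached the old answer. The 10-second check interval and the failure threshold of 3 are illustrative assumptions, not values from our setup, and clients that ignore TTLs will take longer.

```python
def worst_case_dns_failover_seconds(
    ttl_seconds: int,
    check_interval_seconds: int,
    failure_threshold: int,
) -> int:
    """Upper bound on how long clients may keep hitting a dead cluster.

    Detection: the health checker needs `failure_threshold` consecutive
    failed checks before the record is withdrawn.
    Propagation: a resolver that cached the old answer just before the
    withdrawal keeps serving it for one full TTL.
    """
    detection = check_interval_seconds * failure_threshold
    propagation = ttl_seconds
    return detection + propagation


# Assumed inputs: 30s TTL (from the text), 10s checks, 3 failures to trip.
print(worst_case_dns_failover_seconds(30, 10, 3))  # 60
```

Lowering the TTL only shrinks the second term, so the detection window ends up dominating.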
Put an L7 load balancer (AWS ALB or GCP HTTP LB) in front of all clusters, with the clusters as backends. The load balancer decides which cluster gets the request based on health checks and configured rules.
For us, this meant a single ALB in us-east-1 with target groups pointing at NLBs in front of each cluster. Cross-region targets are possible but trickier; they typically require Global Accelerator or a separate routing layer.
Pros: per-request routing. Failover within seconds of a health-check change, with no DNS TTL to wait out. Familiar AWS-native model.
Cons: ALB doesn't natively span regions. We'd need either Global Accelerator (works but adds cost and a layer) or duplicate ALBs with DNS as the cross-region fallback (back to DNS).
We use this model for routing within a region (one ALB → multi-cluster backends in same region). Cross-region failover sits at DNS.
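As a rough illustration of what "per-request routing" means at this layer, the sketch below models priority-ordered listener rules that map a path prefix to a cluster, with a default for everything else. The path prefixes and cluster names are hypothetical; the post does not describe our actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int      # lower number is evaluated first
    path_prefix: str   # hypothetical match condition
    cluster: str       # backend that receives matching requests


def route(path: str, rules: list[Rule], default_cluster: str) -> str:
    """Return the cluster for one request: first matching rule wins."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if path.startswith(rule.path_prefix):
            return rule.cluster
    return default_cluster


# Hypothetical rules: data-heavy paths go to the isolated data cluster.
RULES = [
    Rule(priority=10, path_prefix="/exports/", cluster="cluster-2-data"),
    Rule(priority=20, path_prefix="/reports/", cluster="cluster-2-data"),
]

print(route("/exports/2024.csv", RULES, "cluster-1-general"))  # cluster-2-data
print(route("/users/42", RULES, "cluster-1-general"))          # cluster-1-general
```

Because the decision is made on every request, changing a rule takes effect immediately, with no cached answer to wait out as there is with DNS.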
A service mesh that spans clusters can route at the application layer with full awareness of every cluster. Istio multi-cluster, Linkerd multi-cluster: both work.
Pros: per-request decisions including intelligent routing (e.g., "route to nearest cluster unless it's saturated, then spill to the next"). Automatic mTLS across clusters. Failover and circuit breaking baked in.
Cons: enormous operational complexity. Cross-cluster control plane sync. Certificate management across clusters. The blast radius of a mesh misconfiguration is the entire mesh.
We trialed Istio multi-cluster for two months. We didn't stick with it. The complexity was too high for our team size, and the operations of "the mesh itself" became its own product. Larger organizations make it work; for us it was too much.
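For a sense of the "route to nearest cluster unless it's saturated, then spill to the next" policy mentioned above, here is a minimal sketch of the decision itself. It is not Istio or Linkerd configuration. The `Cluster` fields and the 0.8 saturation threshold are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    latency_ms: float   # measured from the caller's location
    in_flight: int      # requests currently being served
    capacity: int       # in-flight requests it can absorb (must be > 0)


def choose_cluster(clusters: list[Cluster], saturation: float = 0.8) -> Cluster:
    """Pick the nearest cluster below the saturation threshold.

    If every cluster is saturated, fall back to the one with the lowest
    utilisation ratio rather than failing the request.
    """
    for cluster in sorted(clusters, key=lambda c: c.latency_ms):
        if cluster.in_flight < saturation * cluster.capacity:
            return cluster
    return min(clusters, key=lambda c: c.in_flight / c.capacity)
```

The policy needs live load data from every cluster, which is exactly the cross-cluster state a mesh has to keep in sync.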
A pragmatic mix:
                    [ Route53 latency-based ]
                                │
           ┌────────────────────┴────────────────────┐
           ▼                                         ▼
   [ ALB us-east-1 ]                         [ ALB us-west-2 ]
     │           │                             │           │
     ▼           ▼                             ▼           ▼
[cluster 1] [cluster 2]                   [cluster 3] [cluster 4]
 (general)    (data)                       (general)    (data)
No service mesh spanning clusters. Some clusters internally use Istio for east-west service-to-service traffic, but the cross-cluster layer is plain ALB + DNS.
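The two tiers can be sketched as two small functions: DNS picks the lowest-latency healthy region, then the regional load balancer picks the cluster for the workload type. The cluster layout comes from the diagram. The latency numbers, the health flags, and the idea of a "workload" label on the request are assumptions made for illustration.

```python
# Layout taken from the diagram: (region, workload type) -> cluster.
CLUSTERS = {
    ("us-east-1", "general"): "cluster-1",
    ("us-east-1", "data"): "cluster-2",
    ("us-west-2", "general"): "cluster-3",
    ("us-west-2", "data"): "cluster-4",
}


def pick_region(latency_ms: dict[str, float], healthy: dict[str, bool]) -> str:
    """Tier 1 (DNS): lowest-latency region among those passing health checks."""
    candidates = [region for region in latency_ms if healthy.get(region, False)]
    if not candidates:
        raise RuntimeError("no healthy region")
    return min(candidates, key=lambda region: latency_ms[region])


def resolve(latency_ms: dict[str, float], healthy: dict[str, bool], workload: str) -> str:
    """Tier 2 (regional load balancer): map the workload type to a cluster."""
    region = pick_region(latency_ms, healthy)
    return CLUSTERS[(region, workload)]


# A client close to us-east-1 (hypothetical latencies).
latency = {"us-east-1": 20.0, "us-west-2": 80.0}
print(resolve(latency, {"us-east-1": True, "us-west-2": True}, "data"))   # cluster-2
print(resolve(latency, {"us-east-1": False, "us-west-2": True}, "data"))  # cluster-4
```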
A user hits api.example.com:
Failover scenarios:
Most services are cluster-local — they only talk to other services in the same cluster. A few need to talk across clusters (e.g., a data-export service in cluster A needs to read from a database that lives in cluster B).
We use Kubernetes Services of type ExternalName that resolve to the destination cluster's external load balancer:
apiVersion: v1
kind: Service
metadata:
  name: shared-database
  namespace: app
spec:
  type: ExternalName
  externalName: db.cluster-b.internal.example.com
  ports:
    - port: 5432
The application connects to shared-database.app.svc.cluster.local. Cluster DNS answers that name with a CNAME to the externalName value, so the connection goes to the destination cluster's address. Network ACLs allow only specific source clusters to reach the destination port.
This is operationally simple. It does mean cross-cluster traffic crosses the cluster boundary (and the AZ/region boundary) at the network level, which has cost and latency implications. We minimize these connections deliberately — most workloads run within a single cluster.
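One way to see the latency implication is to time a plain TCP connect to the Service name from inside a pod. This is a minimal sketch using only the standard library. The measurement includes DNS resolution, and the helper name is made up for this example.

```python
import socket
import time


def connect_latency_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Open and close one TCP connection; return elapsed time in ms.

    The figure includes name resolution, so for an ExternalName Service it
    covers the CNAME lookup as well as the cross-cluster network hop.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


# Run from a pod in the source cluster, for example:
#   connect_latency_ms("shared-database.app.svc.cluster.local", 5432)
```

Comparing this against a connect to a same-cluster Service gives a quick feel for what each cross-cluster call costs.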
Health check granularity. ALB health checks are configured per target group and evaluated per target, not per listener rule. We had a case where one rule's backend was failing but the same backend was fine for another rule. The health check marked the whole target unhealthy. Workaround: separate target groups per use case, and accept the duplication.
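A toy model makes the difference visible. Each listener rule forwards to a target group, and the group's single health check decides whether the rule can serve traffic. The health-check paths and rule names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetGroup:
    name: str
    health_check_path: str  # one health check per target group


def rule_status(
    rule_to_group: dict[str, TargetGroup],
    check_passing: dict[str, bool],
) -> dict[str, bool]:
    """For each listener rule, report whether its target group is healthy."""
    return {
        rule: check_passing[group.health_check_path]
        for rule, group in rule_to_group.items()
    }


# Hypothetical state: the export path is broken, the API path is fine, and
# the combined /healthz check fails because it covers both.
checks = {"/healthz": False, "/healthz/api": True, "/healthz/export": False}

shared = TargetGroup("shared", "/healthz")
print(rule_status({"api": shared, "export": shared}, checks))
# {'api': False, 'export': False}

split = {
    "api": TargetGroup("api", "/healthz/api"),
    "export": TargetGroup("export", "/healthz/export"),
}
print(rule_status(split, checks))
# {'api': True, 'export': False}
```

With one shared group, the failing export path takes the healthy API rule down with it. With split groups, only the broken rule loses its backend.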
Cross-region replication lag. Our database is replicated us-east-1 → us-west-2 with async replication. During a region failover from east to west, requests can briefly see stale data from the replica. Customers occasionally noticed (our app surfaces "your last action might take a moment to reflect"). We added a banner during failover events.
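A notice like this could also be driven by the data rather than by the failover event alone. The sketch below is one hypothetical approach, not a description of our implementation. It assumes the app records when the user last wrote and can read the replica's current lag.

```python
def may_be_stale(last_write_at: float, now: float, replica_lag_seconds: float) -> bool:
    """True if the user's latest write may not have reached the replica yet."""
    return (now - last_write_at) < replica_lag_seconds


def should_show_notice(
    serving_from_replica: bool,
    last_write_at: float,
    now: float,
    replica_lag_seconds: float,
) -> bool:
    """Show the notice only when reads come from the replica and the user
    wrote recently enough that the write could still be in flight."""
    return serving_from_replica and may_be_stale(last_write_at, now, replica_lag_seconds)


# Timestamps are seconds on any monotonic scale; values are illustrative.
print(should_show_notice(True, last_write_at=100.0, now=102.0, replica_lag_seconds=5.0))  # True
print(should_show_notice(True, last_write_at=100.0, now=110.0, replica_lag_seconds=5.0))  # False
```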
Cost from inter-AZ data transfer. When workloads in cluster 1 talked to workloads in cluster 2 (same region but different AZs), AWS charged inter-AZ data transfer. We didn't notice for a month, then the bill caught us. Solution: keep noisy inter-service traffic intra-cluster; use cluster-2 for genuinely separate workloads.
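A quick estimate would have caught this earlier. The sketch below assumes inter-AZ transfer is billed per GB in each direction and uses $0.01/GB as a placeholder rate; check current AWS pricing before relying on it. The 500 GB/day volume is made up for illustration and is not our actual traffic.

```python
def monthly_inter_az_cost(
    gb_per_day: float,
    price_per_gb_each_direction: float = 0.01,  # assumed rate, verify against AWS pricing
    days: int = 30,
) -> float:
    """Estimate monthly cost of traffic that crosses an AZ boundary.

    Each GB is counted twice because it is billed on the way out of one AZ
    and on the way into the other.
    """
    return gb_per_day * days * price_per_gb_each_direction * 2


# Hypothetical volume: 500 GB/day between two clusters in different AZs.
print(f"${monthly_inter_az_cost(500):.2f} per month")  # $300.00 per month
```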
Active-active database. We considered it. The complexity of multi-master conflict resolution (or a CRDT layer) wasn't worth it for our use case. We use active-passive: writes go to the primary region, with replication to the secondary. Failover involves promoting the replica.
Cluster federation (KCP, Karmada, etc.). Promising but immature for our needs. The operational story for "which cluster does this resource actually live in" was unclear. We may revisit.
Mesh-managed cross-cluster traffic. Istio multi-cluster, as mentioned. Theoretically powerful; in practice, more complexity than we needed for the routing requirements we had.
If I were doing this for a new team:
The temptation with multi-cluster is to reach for the most sophisticated tool first (a mesh, federation, etc.). The simpler tools deliver roughly 80% of the value at 10% of the operational cost.
If you're considering this kind of architecture, look at:
Routing across clusters is one of those problems where the obviously-elegant solution (a service mesh) is operationally heavy, and the obviously-crude solution (DNS) is operationally light but loses fidelity. The interesting answer for most teams is something boring in the middle: regional load balancers with health checks, DNS for cross-region, simple service discovery patterns within. Boring is what scales.