We ran Istio for a year, then switched to Linkerd. Both can do the job. The decision came down to operational fit, not features.
About three years ago we wanted mTLS between services and per-route observability. We picked Istio. After roughly a year of running it in production, we migrated to Linkerd. Both meshes work. The migration wasn't because Istio is "bad" — it was a fit problem. This post is the comparison from someone who's run both in anger.
Listing what we wanted helps because mesh decisions are often driven by features people don't end up using:
What we did NOT want:
This list determined the comparison criteria.
We ran Istio 1.10-ish for about a year. We kept it on roughly the latest minor version.
What worked:
STRICT mode, everything inter-service was encrypted with rotating certs.
What didn't:
The deal-breaker was a slow-burn problem: every quarter, an Istio upgrade or config change introduced a regression somewhere. We spent 1-2 engineer-weeks per quarter on Istio operational toil. That added up.
We migrated to Linkerd 2.x. Took about 2 months including testing and gradual rollout.
What worked:
linkerd viz tap, linkerd diagnostics) give clear, focused information.
What didn't:
Same workload (all web services in one cluster, ~800 pods, ~40 services):
| Metric | Istio | Linkerd |
|---|---|---|
| Sidecar memory per pod | ~80MB | ~12MB |
| Sidecar CPU per pod (baseline) | ~50m | ~10m |
| Total cluster overhead | ~64GB RAM, ~40 cores | ~10GB RAM, ~8 cores |
| p50 service-to-service latency overhead | ~1.5ms | ~0.8ms |
| p99 latency overhead | ~6ms | ~3ms |
| Ops engineer time per quarter | ~80 hours | ~15 hours |
The latency numbers are within margin of error for most services but real for high-throughput ones. The ops time difference is the most material.
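The cluster totals in the table follow from the per-pod figures multiplied across roughly 800 pods. Here is a minimal Python sketch of that arithmetic. It assumes one sidecar per pod and decimal units (1 GB = 1000 MB); the function and variable names are illustrative only.

```python
# Per-pod sidecar figures from the table; the pod count is the ~800 quoted for the workload.
PODS = 800

SIDECARS = {
    "Istio": {"memory_mb": 80, "cpu_millicores": 50},
    "Linkerd": {"memory_mb": 12, "cpu_millicores": 10},
}


def cluster_overhead(pods: int, memory_mb: float, cpu_millicores: float) -> tuple[float, float]:
    """Return (total RAM in GB, total CPU cores) for `pods` sidecars."""
    ram_gb = pods * memory_mb / 1000      # assumption: decimal GB, to match the rounded table
    cores = pods * cpu_millicores / 1000  # 1000 millicores = 1 core
    return ram_gb, cores


for mesh, per_pod in SIDECARS.items():
    ram_gb, cores = cluster_overhead(PODS, **per_pod)
    print(f"{mesh}: ~{ram_gb:.1f} GB RAM, ~{cores:.0f} cores")
```

With these inputs the totals come out at about 64 GB and 40 cores for Istio and about 9.6 GB and 8 cores for Linkerd, which matches the rounded values in the table. Swapping in your own pod count gives a quick first estimate before benchmarking.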
The two meshes can't run in the same pod. Migration is per-namespace:
We did this over 8 weeks, namespace by namespace, with rollback ready. The migration tool was just kubectl and our standard deploy pipeline. We didn't try to do a "shadow traffic" cutover; we just moved one service at a time.
The hardest part was reviewing every Istio CRD we'd written and translating to Linkerd equivalents (or determining the equivalent didn't exist and we had to do it differently). We had ~30 VirtualServices and DestinationRules. Most translated to "nothing — Linkerd handles this by default." A few translated to ServiceProfiles. One had to be reimplemented at the application layer.
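Because the migration went namespace by namespace, it can help to turn the CRD review into a per-namespace checklist. The sketch below is one way to do that, not the tooling we actually used. It assumes you have saved the JSON from `kubectl get virtualservices,destinationrules -A -o json`; the sample data and resource names are made up for illustration.

```python
import json

# Hypothetical sample shaped like a kubectl JSON list; namespaces and names are invented.
SAMPLE = """
{"kind": "List", "items": [
  {"kind": "VirtualService",  "metadata": {"namespace": "checkout", "name": "cart-routes"}},
  {"kind": "DestinationRule", "metadata": {"namespace": "checkout", "name": "cart-pool"}},
  {"kind": "VirtualService",  "metadata": {"namespace": "search",   "name": "search-routes"}}
]}
"""


def inventory(raw_json: str) -> dict[str, list[str]]:
    """Group Istio resources by namespace as sorted 'Kind/name' entries."""
    by_namespace: dict[str, list[str]] = {}
    for item in json.loads(raw_json)["items"]:
        meta = item["metadata"]
        by_namespace.setdefault(meta["namespace"], []).append(f'{item["kind"]}/{meta["name"]}')
    # Sort namespaces and entries so the checklist is stable between runs.
    return {ns: sorted(entries) for ns, entries in sorted(by_namespace.items())}


for namespace, entries in inventory(SAMPLE).items():
    print(namespace)
    for entry in entries:
        print(f"  [ ] {entry}")
```

Each entry still needs a human decision: covered by Linkerd defaults, translated to a ServiceProfile, or reimplemented elsewhere. The script only makes sure nothing is missed.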
I want to be clear: Istio is the right pick for some teams. Specifically:
If any of those apply, Istio's feature ceiling is higher than Linkerd's. The features are real and useful for the teams that need them.
For everyone else (probably most teams):
Linkerd is simpler, lighter, and has a smaller surface area for things to go wrong. For our team — and for many teams that "just want a mesh" — it's a better fit.
Did mTLS actually deliver value?
Yes, but less dramatic than the marketing implies. Most of our value came from the observability and per-route metrics, not from "we now encrypt our internal traffic." Internal traffic was already on a private network. mTLS adds defense-in-depth and identity-based authorization (we use it to gate sensitive services to only specific callers). Both are useful. Neither was the killer feature for us.
Does mesh latency overhead matter?
For most workloads, no. ~1ms of added latency is invisible inside a request that takes 50ms. For latency-sensitive workloads (real-time trading, high-frequency RPCs), it matters and you'd want to benchmark. We have one service that's sensitive enough that we excluded it from the mesh.
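To make the "invisible at 50ms" point concrete, here is a small Python sketch that expresses mesh overhead as a share of total request time. The function name is illustrative, and the 2ms request is a hypothetical latency-sensitive case, not a measurement from our cluster.

```python
def mesh_share(total_request_ms: float, mesh_overhead_ms: float) -> float:
    """Fraction of the total request time attributable to the mesh."""
    if total_request_ms <= 0:
        raise ValueError("total_request_ms must be positive")
    if not 0 <= mesh_overhead_ms <= total_request_ms:
        raise ValueError("mesh_overhead_ms must be between 0 and total_request_ms")
    return mesh_overhead_ms / total_request_ms


# The example from the text: ~1ms of overhead inside a 50ms request.
print(f"{mesh_share(50, 1):.0%}")  # 2%

# Hypothetical latency-sensitive RPC: the same 1ms inside a 2ms request.
print(f"{mesh_share(2, 1):.0%}")   # 50%
```

The same absolute overhead goes from noise to half the request budget as the request gets faster, which is why a service like that is worth benchmarking or excluding from the mesh.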
What about the eBPF / sidecar-less direction?
Cilium has been pushing toward sidecar-less mesh via eBPF. Istio has Ambient Mode (also reduces sidecar dependency). These are interesting but not yet the default for either project. For now, sidecars are the path. We'll re-evaluate when sidecar-less is more mature.
Should you adopt a service mesh at all?
Not always. If you have <10 services, the operational cost of running a mesh probably exceeds the value. mTLS can be done with cert-manager + app-side TLS. Observability can be done with OpenTelemetry instrumentation. Mesh becomes worth the cost when you have many services with consistent cross-cutting concerns.
We hit the threshold around 15-20 services. Below that, we'd skip the mesh.
For a new cluster with no mesh:
Feature lists drive a lot of mesh evaluation, but operational fit is what determines whether the mesh is a net positive. A mesh that's technically more capable but consumes 4x the engineer-time is usually not the right pick.
We're watching Linkerd's policy framework (AuthorizationPolicy CRDs) for fine-grained access control between services. We currently use a mix of Linkerd auth + NetworkPolicies; consolidating to one would simplify things.
We're also watching Gateway API as the standard for ingress + traffic management, which both meshes are converging toward. Mesh portability via standard CRDs is appealing for the long term.
But the day-to-day reality is: mesh runs, we don't think about it most weeks, traffic between services is encrypted and observable. That's the outcome we wanted three years ago. We got there via a route that involved running two different meshes. The destination matters more than the path.