We track the four DORA metrics plus a handful of others. What follows covers the trade-off between what's measurable and what's meaningful, and how we use the numbers.
The DORA metrics (deployment frequency, lead time, change failure rate, MTTR) are widely cited and rarely well-implemented. We've tracked them for about three years across our engineering org. The metrics are useful when you use them right and misleading when you don't. This post covers what we measure, how we use the numbers, and the metrics conversation we've had to have repeatedly.
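For reference:
Deployment frequency: how often a team deploys to production.
Lead time: how long a change takes to go from commit to production.
Change failure rate: the share of deployments that cause a failure in production.
MTTR: how long it takes to restore service after a failure.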
The original DORA research correlates these with org performance. Faster, more reliable shipping → better business outcomes. The relationship is real; the implementation matters.
Our metric set, with target ranges:
| Metric | Target | Current |
|---|---|---|
| Deployment frequency (per team) | Daily | Daily for 80% of teams |
| Lead time (commit to prod) | < 1 day | ~14 hours median |
| Change failure rate | < 15% | ~9% |
| MTTR | < 1 hour | ~45 min median |
| Code review time | < 1 day | ~6 hours median |
| Test suite duration | < 15 min | ~12 min |
| Production p99 latency | < 500ms | varies per service |
| Error budget burn rate | < 1.5x | varies |
The DORA four are the headline metrics; the others are leading indicators or operational signals.
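
The table lists an error budget burn rate target without defining it. A minimal sketch, assuming the common SRE definition (observed error ratio divided by the error ratio the SLO allows); the function name and the example SLO are illustrative, not taken from our tooling:

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    1.0 means the budget is being spent exactly at the rate the SLO allows;
    values above 1.0 mean it will run out before the SLO window ends.
    `slo_target` is a fraction such as 0.999 (an assumed example value).
    """
    if total_events == 0:
        return 0.0  # no traffic, nothing burned
    budget = 1.0 - slo_target  # allowed error ratio
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    observed = bad_events / total_events
    return observed / budget


# 15 failed requests out of 10,000 against a 99.9% SLO
rate = burn_rate(15, 10_000, 0.999)
print(f"burn rate: {rate:.2f}x")
```

Under this definition, the "< 1.5x" target in the table would mean errors arrive at less than one and a half times the rate the SLO budgets for.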
Most metrics derive from existing systems:
Total system that produces metrics: ~3,000 lines of code + a Postgres table + a dashboard. Not glamorous, but reliable.
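
To make the computation concrete, here is a rough sketch of deriving the four headline numbers from deploy and incident records. The `Deploy` and `Incident` shapes and the `dora_summary` function are hypothetical, chosen for illustration; they are not the schema of the Postgres table mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deploy:
    team: str
    commit_at: datetime    # when the change was committed
    deployed_at: datetime  # when it reached production
    failed: bool           # True if it caused an incident or rollback


@dataclass
class Incident:
    team: str
    opened_at: datetime
    resolved_at: datetime


def dora_summary(deploys, incidents, days):
    """Summarize the four DORA metrics over a window of `days` days."""
    lead_times = [d.deployed_at - d.commit_at for d in deploys]
    restore_times = [i.resolved_at - i.opened_at for i in incidents]
    return {
        "deploys_per_day": len(deploys) / days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": (
            sum(d.failed for d in deploys) / len(deploys) if deploys else None
        ),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deploy("payments", t0, t0 + timedelta(hours=10), failed=False),
    Deploy("payments", t0, t0 + timedelta(hours=14), failed=False),
    Deploy("payments", t0, t0 + timedelta(hours=20), failed=True),
    Deploy("payments", t0, t0 + timedelta(hours=30), failed=False),
]
incidents = [Incident("payments", t0, t0 + timedelta(minutes=45))]
summary = dora_summary(deploys, incidents, days=2)
print(summary)
```

Medians are used rather than means because the table above reports medians; a handful of very slow changes would otherwise dominate the number.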
The metrics are diagnostics, not goals. Reading them:
Low deployment frequency might mean: heavyweight deploy process, fear of breaking things, large batch sizes, inefficient code review. Each has different fixes.
Long lead time might mean: slow CI, painful code review, environment setup overhead, manual approvals.
High change failure rate might mean: insufficient testing, lack of canary deploys, complex changes, cultural pressure to ship fast at the expense of quality.
Long MTTR might mean: poor observability, missing runbooks, on-call engineers who don't know the system, complex rollback path.
When a metric trends wrong, the diagnostic is "what specifically is causing this?" Not "the metric is bad; ship faster."
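
One way to answer "what specifically is causing this?" for lead time is to split it into stages and see which one dominates. A sketch under assumptions: the stage names in `STAGES` and the per-change timestamp dictionaries are hypothetical, and a real pipeline may record different events.

```python
from datetime import datetime
from statistics import median

# Hypothetical ordered checkpoints in a change's life.
STAGES = ["opened", "first_review", "approved", "merged", "deployed"]


def stage_medians(changes):
    """Median time spent between each pair of consecutive stages.

    `changes` is a list of dicts mapping every stage name to a datetime.
    """
    result = {}
    for start, end in zip(STAGES, STAGES[1:]):
        durations = [change[end] - change[start] for change in changes]
        result[f"{start}->{end}"] = median(durations)
    return result


def slowest_stage(changes):
    """Name of the stage transition with the largest median duration."""
    medians = stage_medians(changes)
    return max(medians, key=medians.get)
```

If `slowest_stage` points at the wait for a first review, speeding up deploy mechanics would not help, which is the kind of distinction the raw lead time number cannot make on its own.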
The mistakes that ruin DORA metrics:
Don't use them in performance reviews. "Sara's team has lower deployment frequency than David's team" → Sara's team starts shipping smaller PRs to game the metric, and quality drops. The metric becomes useless.
Don't set arbitrary targets across teams. Different teams have different shapes of work. A platform team that ships once a week might be perfectly healthy; a feature team that ships once a week probably isn't. Compare to baseline, not to other teams.
Don't optimize the metric directly. "We need to improve deployment frequency" → batches get smaller, but if the bottleneck was code review (not deploy mechanics), nothing has actually improved. Diagnose the bottleneck; fix that.
Don't ignore quality metrics. Deployment frequency without change failure rate is incomplete. Fast and broken is worse than slow and stable.
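
"Compare to baseline, not to other teams" can be sketched as a small calculation. This assumes the baseline is the median of the team's own recent history; the function name and the sample numbers are made up for illustration.

```python
from statistics import median


def change_vs_baseline(history, current):
    """Relative change of `current` against the team's own baseline.

    `history` holds the team's past values for one metric (for example,
    weekly median lead time in hours). Returns a fraction: -0.25 means
    the current value is 25% below the team's baseline.
    """
    baseline = median(history)
    if baseline == 0:
        raise ValueError("baseline is zero; relative change is undefined")
    return (current - baseline) / baseline


# Weekly median lead time in hours for one team, then this week's value.
delta = change_vs_baseline([20, 24, 22, 26], 14)
print(f"{delta:+.0%} vs. own baseline")
```

The same value means different things for different teams; a number that looks slow next to a feature team may be an improvement for a platform team measured against its own history.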
When someone (usually leadership) sees the metrics, the first reaction is often "let's improve [metric] by X% next quarter." This is the wrong framing.
The right framing: "We see [metric] is at [value]. What's blocking it from being better? Are those blockers worth removing?"
Sometimes the answer is: yes, these blockers are pure friction; let's remove them. We've cut lead time from 5 days to 14 hours over two years by removing specific blockers (slow CI, painful manual approval steps, etc.).
Sometimes the answer is: the blockers are there for a reason. A regulated payment service has slower lead time because of review requirements; that's correct, not a bug.
The metric tells you "where to look"; it doesn't tell you "what to fix."
We compute metrics per team and as an org-wide rollup.
Per-team is more useful diagnostically. The org-wide rollup shows aggregate trends but can hide per-team signal.
Per-team has its own pitfall: teams with very different work shapes (a research team, a customer-facing team, a platform team) shouldn't be compared on the same metrics with the same targets.
We have separate "tiers" of teams:
DORA is one framework; there are others:
SPACE (Satisfaction, Performance, Activity, Communication, Efficiency): broader, includes developer experience.
Engineer satisfaction survey (we run quarterly): captures things metrics can't (frustration, blockers, feeling of progress).
Code review quality: not just speed but rigor. We sample reviewed PRs occasionally to check that reviews aren't rubber-stamps.
We use DORA as the primary numerical headline because it's simple and well-understood. SPACE-style measurement and surveys add the qualitative side.
Examples of metric-driven improvements:
Lead time was 5 days. Investigation: code review averaged 2.5 days. Fix: dedicated review time blocks, automated reviewers for small PRs, escalation if a PR sat un-reviewed > 24 hours. Lead time dropped to ~14 hours.
Deployment frequency stalled. Investigation: deploys took 45 min and required manual approvals at 4 stages. Fix: trimmed approvals to 2 (one technical, one business for sensitive changes), parallelized deploy steps. Deploy frequency increased; team morale improved (fewer "waiting on a deploy" frustrations).
Change failure rate spiked one quarter. Investigation: a new architecture pattern caused a class of bugs we hadn't seen before. Fix: added integration tests for the pattern, training on the new failure modes. Change failure rate dropped back to baseline within 2 sprints.
MTTR was bimodal: most incidents resolved in 15 min, a few took 8+ hours. Investigation: the long ones lacked runbooks for specific subsystems. Fix: runbook gap analysis, dedicated work to fill gaps. Long-tail MTTR improved.
The metrics didn't fix anything by themselves. They surfaced where to look.
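
The bimodal MTTR case is a reminder that a median alone can look healthy while a long tail goes unnoticed. A minimal sketch of reporting the tail alongside the median; the function name and the sample durations are illustrative, not real incident data.

```python
from statistics import median, quantiles


def restore_time_profile(minutes):
    """Median, 90th percentile, and max of incident restore times (minutes).

    Needs at least two data points, as required by statistics.quantiles.
    """
    p90 = quantiles(minutes, n=10, method="inclusive")[-1]
    return {"median": median(minutes), "p90": p90, "max": max(minutes)}


# Mostly quick incidents, plus two that took hours (made-up values).
sample = [10, 12, 15, 15, 18, 20, 14, 16, 480, 540]
profile = restore_time_profile(sample)
print(profile)
```

A large gap between the median and the 90th percentile is a hint to look at the slow incidents individually rather than at the headline number.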
Things the dashboard misses:
Quality of decisions: a team can ship daily and ship the wrong thing. Deployment frequency doesn't capture business impact.
Tech debt accrual: a team can hit all velocity metrics while building unmaintainable code. The debt eventually catches up.
Engineer well-being: the metrics can be green while the team is burning out.
Innovation: experimental work and research often don't produce shippable code on a regular cadence. Metric-targeting can discourage exploration.
These are the reasons the metrics aren't a complete picture. They're a useful piece of the picture, not the whole.
Start with the four DORA metrics. They're well-understood and easy to compute.
Collect them for at least a quarter before acting. Trend matters more than a snapshot.
Use metrics for diagnosis, not as performance targets. "Where's the bottleneck?" not "improve this number."
Don't compare teams with different work shapes. Per-team baselines, not relative rankings.
Pair quantitative metrics with qualitative signals (surveys, retros). The numbers can hide problems people see directly.
Resist the "let's set a 20% improvement target" framing. Identify specific blockers; remove them; the metric improves as a side effect.
DevOps metrics are valuable when they point at things you can change. They're harmful when they become the goal itself. The discipline is in keeping the focus on what's slowing teams down, and trusting that fixing those things will move the metrics naturally.