We launched Backstage in October. Six months in: 127 of 158 services in the catalog (80%), on-boarding time for new engineers down from 3 days to 1, and we mostly know what owns what. Below is the rollout plan, the parts that worked, and the ones we'd skip.
Three separate incidents in two months had the same root cause: nobody knew who owned the service that was breaking. Slack threads chained through six teams before someone said "actually, that's been ours since Karen left." We needed a system of record for ownership, dependencies, and basic operational data.
We evaluated the alternatives; Backstage won.
Standard Backstage Helm install on EKS, Postgres for backend, GitHub auth, GitHub provider for entity discovery.
```yaml
catalog:
  providers:
    github:
      providerId: # an arbitrary name for this provider instance
        organization: 'kirilurbonas'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main'
          repository: '.*'
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 5 }
```
This auto-discovers any repo that contains a catalog-info.yaml file. Repos without it don't show up. That's the magic: the catalog is sourced from the same repos as the code.
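As a toy restatement of that rule (a hypothetical helper, not Backstage code): a repo shows up in the catalog exactly when it matches the provider's filters and carries the file on the filtered branch.

```python
import re

def is_discovered(repo_name, branch, has_catalog_file,
                  repo_pattern=r".*", required_branch="main"):
    """Mirror the provider filters above: the name matches the regex,
    the branch is the filtered one, and catalog-info.yaml exists."""
    return (re.fullmatch(repo_pattern, repo_name) is not None
            and branch == required_branch
            and bool(has_catalog_file))

print(is_discovered("payments-api", "main", True))   # True
print(is_discovered("payments-api", "main", False))  # False
```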
We didn't try to onboard everything at once. We picked 20 critical services and wrote their catalog-info.yaml ourselves:
```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-api
  description: Customer-facing payments endpoint
  annotations:
    github.com/project-slug: kirilurbonas/payments-api
    pagerduty.com/integration-key: ${PAGERDUTY_KEY}
    grafana/dashboard-selector: "service=payments-api"
    sonarqube.org/project-key: payments-api
spec:
  type: service
  lifecycle: production
  owner: team-checkout
  system: checkout
  providesApis:
    - payments-public-api
  consumesApis:
    - billing-internal-api
  dependsOn:
    - resource:default/postgres-checkout-prod
    - resource:default/redis-checkout-prod
```
The annotations are the secret sauce: each one tells a Backstage plugin where to find the corresponding data in an existing tool. A team doesn't write 200 lines of metadata; they write 20 lines of annotations and Backstage stitches the rest together from the tools they already use.
Once 20 services were in, we did something specific: we made new-engineer onboarding go through Backstage. New starters got access to Backstage on day one and were asked to find a specific set of facts about several services. If any of those facts were missing for a service, they couldn't complete the task. The team responsible got a polite ping. Within 2 weeks of this practice, 50+ services had been added.
By week 10 we had 95 services in the catalog. The "who owns what" question became answerable for most things.
Backstage's TechDocs feature renders Markdown from your repos as a documentation site. We required every catalogued service to have:

- a `docs/` folder in the repo
- an `index.md` with a 2-paragraph "what is this and what does it do"
- a `runbook.md` (literally a copy-paste of the existing runbook, in any format)

plus the TechDocs annotation in `catalog-info.yaml`:

```yaml
metadata:
  annotations:
    backstage.io/techdocs-ref: dir:.
```
Six weeks later: documentation for 80% of services, accessible from the service's catalog page. Most of it was just moving existing docs into the right place; the value was in discoverability, not creation.
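TechDocs builds those folders with MkDocs under the hood, so each repo also carries a minimal `mkdocs.yml`. A sketch (the `site_name` and `nav` entries vary per repo):

```yaml
site_name: payments-api
plugins:
  - techdocs-core
nav:
  - Overview: index.md
  - Runbook: runbook.md
```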
Backstage Software Templates let teams scaffold a new service through a UI:
```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: typescript-service
  title: TypeScript Service
spec:
  parameters:
    - title: Service Info
      properties:
        name: { type: string, pattern: '^[a-z0-9-]+$' }
        team: { type: string, enum: [team-checkout, team-platform, team-ml] }
  steps:
    - id: fetch-base
      name: Fetch base template
      action: fetch:template
      input: { url: ./skeleton, values: { name: '${{ parameters.name }}' } }
    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: github.com?owner=kirilurbonas&repo=${{ parameters.name }}
    - id: register
      name: Register in catalog
      action: catalog:register
      input: { repoContentsUrl: '${{ steps.publish.output.repoContentsUrl }}' }
```
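The skeleton directory carries its own templated `catalog-info.yaml`, which is how every generated repo arrives pre-registered. A sketch, assuming only `name` is passed through `values` as above (nunjucks syntax per the scaffolder's `fetch:template` action):

```yaml
# skeleton/catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
spec:
  type: service
  lifecycle: experimental
  owner: team-checkout  # placeholder; the requesting team edits this
```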
A new service from this template comes with:

- `catalog-info.yaml` already wired up
- a `docs/` skeleton

New service spin-up time dropped from ~2 days to ~30 minutes. The 30 minutes is mostly waiting for repo permissions to propagate.
| Entity Type | Count |
|---|---|
| Service (Component) | 127 |
| Library (Component) | 41 |
| Website (Component) | 12 |
| API | 84 |
| Resource (DBs, queues) | 67 |
| System (logical grouping) | 14 |
| User (engineer) | ~80 |
| Group (team) | 18 |
Coverage: 127 services in catalog / 158 services total = 80%. The missing 31 are mostly internal tools and "we'll get to it" services.
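The coverage number is just a set difference between a service inventory and the catalog. A sketch of the check (`missing_from_catalog` and the sample names are hypothetical; in practice the catalog side comes from Backstage's `/api/catalog/entities` endpoint):

```python
def missing_from_catalog(inventory, catalog_names):
    """Services that exist in the inventory but have no catalog entry."""
    catalogued = set(catalog_names)
    return sorted(s for s in inventory if s not in catalogued)

inventory = ["payments-api", "billing-internal", "legacy-cron"]
catalogued = ["payments-api", "billing-internal"]
print(missing_from_catalog(inventory, catalogued))  # ['legacy-cron']
```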
The single biggest lever. We didn't ask teams to fill out metadata; we asked them to point at existing tools. Annotations link Backstage to the source of truth elsewhere: update PagerDuty and Backstage follows, so the catalog can't drift.
Making the new engineer's day-one task run through Backstage created social pressure to keep it current. No mandates, no audits; missing data was visible immediately.
Old way to start a service: 2 days
New way: 30 min, all integrations work
Once teams used the template once, they wanted every new service in Backstage so it could use templates. Pull > push.
We never had to ask a team to "add their service to Backstage." Adding a catalog-info.yaml to their repo did it automatically within 30 minutes, the discovery schedule's interval. The barrier to adoption was as small as we could make it.
We tried writing a custom plugin to integrate an internal cost-allocation tool. Two weeks of work, broke on a Backstage upgrade, replaced with an annotation that links to a dashboard. Lesson: link to the source of truth instead of duplicating it.
We tried showing per-service cost on the catalog page. The data was always 24h stale, sometimes wrong, and led to "why does my service cost X" rabbit holes. Removed.
We initially required every internal library to have an entry. Most weren't owned by anyone in particular and the friction was higher than the value. Now we only catalog libraries with > 3 internal consumers.
Out-of-the-box search prioritized by entity name, not relevance. Teams searching "payment" got the payments-tests library before payments-api. We tuned the search index weights to fix this; took longer than expected.
| Metric | Pre-Backstage | Month 6 |
|---|---|---|
| New engineer time-to-productive | 3 days | 1 day |
| "Who owns this service?" question | hours/days | minutes |
| New service spin-up | ~2 days | ~30 min |
| Services with documentation | ~40% | ~80% |
| Services with runbook | ~25% | ~75% |
| Time spent maintaining catalog | n/a | ~3 hr/week (1 person) |
Despite our best efforts, ~5% of catalog entries are wrong at any given time. Wrong owners (team renamed; catalog says old name), wrong APIs (deprecated, not removed). We have a quarterly "catalog hygiene week" to clean up.
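Part of hygiene week is mechanical. A sketch of the stale-owner check (the entity shape is simplified and the helper is hypothetical):

```python
def stale_owners(entities, current_teams):
    """Catalog entries whose owner no longer names a current team."""
    teams = set(current_teams)
    return [(e["name"], e["owner"]) for e in entities if e["owner"] not in teams]

entities = [
    {"name": "payments-api", "owner": "team-checkout"},
    {"name": "legacy-cron", "owner": "team-growth"},  # team renamed last year
]
print(stale_owners(entities, ["team-checkout", "team-platform"]))
# [('legacy-cron', 'team-growth')]
```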
Our Confluence and Backstage docs aren't fully merged. New stuff goes to TechDocs; old stuff stays in Confluence with cross-links. Not ideal but acceptable.
Backstage Groups should map to teams. Our HR system, PagerDuty, GitHub Teams, and Slack all have slightly different ideas of what "team-checkout" is. We treat HR as truth and reconcile downstream — error-prone.
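Our reconciliation is a one-way transform from the HR export into Backstage Group entities. A sketch (field names on the HR side are hypothetical; the Group shape follows Backstage's system model):

```python
def hr_team_to_group(team):
    """Emit a Backstage Group entity dict from one HR-export row."""
    return {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Group",
        "metadata": {"name": team["slug"]},
        "spec": {"type": "team", "profile": {"displayName": team["display_name"]}},
    }

group = hr_team_to_group({"slug": "team-checkout", "display_name": "Checkout"})
print(group["metadata"]["name"])  # team-checkout
```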
Put catalog-info.yaml in repos; don't require humans to register entities. For everyone else, the question isn't "should we use Backstage" but "what would take its place if we don't." If the answer is "nothing currently does this and we feel it," start.
Backstage doesn't fix culture problems. It surfaces them. The "who owns this" question still requires teams that can own things; Backstage just makes the answer findable when those teams exist.