We cut our average CI build time from 28 minutes to 6 minutes. The changes that mattered, ranked by impact.
A while ago our average CI build took 28 minutes. Engineers context-switched away during builds, PRs sat for hours waiting for checks, and the delays compounded in the deploy queue. We did focused work to bring the median down to 6 minutes. Most of the savings came from a handful of specific changes, some of them less obvious than others. This post covers what worked, ranked by impact.
A typical PR triggered:
Total wall-clock: ~28 min. Several stages ran in parallel but the long pole was unit tests + integration tests.
The CI runners were self-hosted on a fixed-size EC2 fleet (~12 runners). At peak times jobs queued, adding 5-15 minutes of waiting.
The biggest single win: aggressive caching of dependencies and build artifacts.
What we cache:
- node_modules keyed by package-lock.json hash
- pip cache keyed by requirements.txt hash
- .m2 repository keyed by pom.xml hash

GitHub Actions has a built-in actions/cache action. It uses backend storage tied to the repo (10GB limit). For our scale, that is sufficient.
A specific example: a Node service's CI did npm ci every run, taking ~3 minutes. With cache, ~30 seconds when the lockfile is unchanged.
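The Node case can be sketched as a few workflow steps. This is a minimal sketch and not our exact workflow: the step names, the `deps-cache` id, and the Node version are assumptions made for illustration.

```yaml
# Hypothetical job steps; names, ids, and the Node version are assumptions.
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-node@v4
    with:
      node-version: 20
  - name: Restore node_modules
    id: deps-cache
    uses: actions/cache@v4
    with:
      path: node_modules
      # The key changes whenever the lockfile changes, which forces a fresh install.
      key: node-modules-${{ runner.os }}-node20-${{ hashFiles('package-lock.json') }}
  - name: Install dependencies
    # Skip npm ci entirely when the cache was restored for this exact lockfile.
    if: steps.deps-cache.outputs.cache-hit != 'true'
    run: npm ci
```

Including the OS and Node version in the key is a precaution: node_modules can contain native binaries that are not portable across either.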
Saving across all services: ~6 minutes typical. Largest single change.
Unit tests took 12 minutes serially. The test suite had ~1500 tests. Single-threaded.
Changes:
- pytest-xdist to parallelize by file across N workers
- --maxWorkers to parallelize across cores

A 12-minute serial run became a ~3-minute parallel run with 4 workers, with no test logic changes.
Watch out for test interdependencies. Some tests assumed previous tests had set up state. Parallel runs broke them. Fixing the inter-test coupling was real work but the right thing — those tests were brittle anyway.
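As a rough sketch, the parallel invocations could look like the steps below. The worker counts are illustrative, and the use of Jest for the `--maxWorkers` flag is an assumption; adjust both to your own runner and test stack.

```yaml
# Hypothetical CI steps; worker counts and the Jest runner are assumptions.
steps:
  - name: Python unit tests
    # -n 4 starts four pytest-xdist workers; --dist loadfile keeps all
    # tests from one file on the same worker.
    run: pytest -n 4 --dist loadfile
  - name: JavaScript unit tests
    run: npx jest --maxWorkers=4
```

Grouping by file limits the blast radius of state shared within a module, but it does not fix tests that depend on state set up in another file; those still need to be decoupled.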
Saving: ~9 minutes.
Not every PR touches every service. A doc-only change shouldn't run all the integration tests.
We added path filters:
- A change that touches only *.md files skips all CI except docs validation.

GitHub Actions' paths: filter for workflows handles this. We set it up per-service.
Watch out: dependencies between services. A change to a shared library might affect three services. We err on the side of running too many tests rather than too few.
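A per-service workflow with a path filter might look like the sketch below. The directory layout (`services/api`, `libs/shared`), the workflow name, and the `make test` command are all hypothetical.

```yaml
# Hypothetical per-service workflow; paths, name, and test command are assumptions.
name: api-ci
on:
  pull_request:
    paths:
      - 'services/api/**'
      - 'libs/shared/**'   # shared library: changes here also trigger this service
      - '!**/*.md'         # a PR that only touches Markdown does not trigger the workflow
jobs:
  test:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: make test
```

One caveat worth checking in your own setup: when a workflow is skipped by a path filter, checks marked as required in branch protection stay pending, so required checks and path filters need to be planned together.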
Saving: ~3 minutes on average (depends on the PR; some save more, most less).
Our CI runners were m5.xlarge (4 vCPU, 16GB). Compute-bound jobs (large compilations, parallel test runs) bottlenecked on CPU.
We switched to c6i.4xlarge for compute-heavy jobs (16 vCPU, 32GB). Cost per minute is about 4x, but jobs run about 3x faster, so cost per CI run rises by only about a third (4 ÷ 3 ≈ 1.3x) while engineers wait less.
For light jobs (lint, security scans), we kept smaller instances; cost matters more than time.
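One way to route jobs to different instance sizes is with runner labels. In this sketch the `light` and `compute` labels and the `make` targets are hypothetical; the labels would have to be assigned to the matching self-hosted runners when they are registered.

```yaml
# Hypothetical labels and commands; assign the labels when registering runners.
jobs:
  lint:
    runs-on: [self-hosted, light]     # small instances: cost matters more than time
    steps:
      - uses: actions/checkout@v4
      - run: make lint
  unit-tests:
    runs-on: [self-hosted, compute]   # larger instances for CPU-bound work
    steps:
      - uses: actions/checkout@v4
      - run: make test
```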
Saving: ~2 minutes on heavy jobs.
Building containers without cache means every layer rebuilds. With proper layer caching, only changed layers rebuild.
We use BuildKit with registry-based cache:
- uses: docker/build-push-action@v5
  with:
    cache-from: type=registry,ref=our-ecr-repo/cache:latest
    cache-to: type=registry,ref=our-ecr-repo/cache:latest,mode=max
The cache image is updated on every successful build. New PRs pull from this cache; only changed layers actually rebuild.
The Dockerfile structure matters too: put dependencies in their own layer (so npm ci is cached when the lockfile is unchanged) and copy the source code last.
For a Node service: image build went from 4 min to 45 seconds in the typical case (only app code changed; deps cached).
Saving: ~3 minutes per build typically.
CI runner saturation was causing 5-15 min queue times during peak.
Changes:
The "kill previous run" change is important. If a developer pushes 3 commits in 10 minutes, only the third needs full CI; the first two are obsolete. Without this, all three queue and run.
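In GitHub Actions, cancelling superseded runs can be expressed with a `concurrency` block at the top level of the workflow. This is a minimal sketch; the group name format is one common choice, not necessarily the one we use.

```yaml
# One concurrency group per workflow and ref. For pull_request events
# github.ref is the PR merge ref, so each PR gets its own group.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # A new push to the same PR cancels the run that is still in progress.
  cancel-in-progress: true
```

For workflows that deploy from the main branch, cancelling in-progress runs is usually not what you want, so this setting is best scoped to PR workflows.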
Saving: queue time dropped from 5-15 min to 0-3 min during peak.
Integration tests originally ran the full suite for every PR. We split into:
The split is by code paths exercised, not arbitrary categorization. Tests that exercise critical paths are in the core; tests that exercise edge cases of specific features are extended.
Saving: ~4 minutes on PRs not affected by extended tests.
Some checks (formatting, basic linting) are fast and obvious. We moved them from CI to pre-commit hooks:
- ruff format for Python
- prettier for TypeScript / JSON / YAML
- golangci-lint for Go
- markdownlint for docs

Pre-commit catches these at commit-time; CI doesn't need to run them. CI still does a final check (in case someone bypassed the hook), but the failure mode is "your local hook should have caught this," not "wait 30 seconds for CI to tell you."
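A `.pre-commit-config.yaml` for these four tools could look roughly like this. The sketch uses local hooks and assumes each tool is already installed on the developer's machine; the hook ids and names are illustrative, and pinned upstream hook repositories are an equally valid choice.

```yaml
# Hypothetical config using local hooks; assumes the tools are on PATH.
repos:
  - repo: local
    hooks:
      - id: ruff-format
        name: ruff format
        entry: ruff format
        language: system
        types: [python]
      - id: prettier
        name: prettier
        entry: prettier --write
        language: system
        types_or: [ts, json, yaml]
      - id: golangci-lint
        name: golangci-lint
        entry: golangci-lint run
        language: system
        types: [go]
        pass_filenames: false   # lint packages rather than individual files
      - id: markdownlint
        name: markdownlint
        entry: markdownlint
        language: system
        types: [markdown]
```

Running `pre-commit run --all-files` in CI gives the final check with the same configuration, so local and CI behavior stay aligned.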
Saving: ~30 seconds. Small in CI time but improves developer experience.
For some workflows, we kick off the deploy build before tests pass. The image gets built and pushed; if tests fail, the image is just orphaned. If tests pass, the deploy can use the already-built image.
This trades some ECR storage cost for parallelism. For services with long build times (5+ min), worthwhile.
Saving: ~2 minutes when it pays off (most successful PRs).
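The shape of this in a workflow is a build job with no dependency on the test job, and a deploy job that waits for both. The job names, image tag, `make test` command, and `deploy.sh` script below are hypothetical, and registry login is omitted for brevity.

```yaml
# Hypothetical jobs; names, tags, and commands are assumptions. Registry login omitted.
jobs:
  test:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: make test
  build-image:
    # No `needs:` here, so the build starts alongside `test`.
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: our-ecr-repo/app:${{ github.sha }}
  deploy:
    # Runs only if both jobs succeed; a failed test leaves the image orphaned.
    needs: [test, build-image]
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh our-ecr-repo/app:${{ github.sha }}
```

Tagging by commit SHA keeps orphaned images easy to identify, and a registry lifecycle policy can expire them to limit the storage cost.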
Some changes we tried that didn't help:
Bigger CI runners across the board. Diminishing returns. Beyond c6i.4xlarge, our tests didn't get faster. Some bottlenecks are I/O or single-threaded.
Distributed test sharding via custom orchestration. We considered breaking test runs across many small workers via a custom orchestration layer. The plumbing complexity wasn't worth the marginal speedup over pytest-xdist.
Pre-warmed test databases via persistent state. We tried keeping a long-running test database in CI with state pre-loaded. State drift between tests caused flakiness; reverted to per-job database setup.
Reducing test count. We considered "are all these tests needed?" Mostly yes. The few we removed were tests of long-deprecated code paths.
Median CI time: ~6 minutes. p95: ~9 minutes. Queue time: usually < 1 minute.
Per-PR breakdown:
Most things run in parallel; the longest path is what we see.
Maintaining fast CI requires ongoing discipline:
Review CI durations weekly. A creep up by 30 seconds per build is invisible per-build but adds up.
Fail fast, fail clear. Expensive checks (integration tests) run after cheap checks (lint, unit). A failure surfaces in the cheaper tier first; engineer fixes it without paying the full CI cost.
Test code is code. Bad tests cause flakiness, slow runs, and false confidence. We treat test code with the same review standards as production code.
Don't add more checks blindly. Each new check adds wall-clock time. Adding checks should require justification.
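The "fail fast" ordering above can be sketched with `needs:` between jobs, so an expensive tier only starts once the cheaper one has passed. The job names and `make` targets are hypothetical.

```yaml
# Hypothetical jobs; names and commands are assumptions.
jobs:
  lint:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: make lint
  unit:
    needs: lint          # starts only after lint passes
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: make unit-test
  integration:
    needs: unit          # the most expensive tier runs last
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: make integration-test
```

Strict chaining lengthens the critical path on green builds, so it is a trade-off: chain only where the cheaper tier fails often enough to justify the wait.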
Self-hosted CI runners on Spot:
Compared to the engineer time saved (every engineer waits less per PR), the ROI is large.
Compared to managed CI alternatives (CircleCI, Buildkite cloud, GitHub-hosted runners): we're cheaper and faster. The trade-off is the operational overhead.
Cache aggressively. Dependencies, build artifacts, container layers. Largest single lever.
Parallelize tests. Most test suites parallelize fine with off-the-shelf tools.
Path-based skipping for unaffected jobs. Doc PRs shouldn't run integration tests.
Faster runners for compute-heavy jobs. Cost-per-CI-run stays similar; engineers wait less.
Per-PR concurrency limits. Don't run obsolete commits' jobs.
Pre-commit hooks for fast local checks. Faster feedback for developers; less CI work.
Watch the trends. Build time creeps up. Weekly check, action when it does.
CI optimization is one of the highest-ROI engineering investments per hour of effort. Engineers wait less; PRs merge faster; deploys flow more smoothly. The 22-minute improvement per build, multiplied across hundreds of PRs per week, is significant time recaptured.