Bills hit $3,400/mo for runner minutes. We moved to self-hosted on EKS spot. The savings were real; the surprises were too.

On this page

GitHub Actions Self-Hosted Runners: Why We Switched and What Broke

Our GitHub-hosted runner bill grew to $3,420/month across ~140 active workflows. We migrated to self-hosted runners on EKS using spot instances. The new bill is $210/month. Here's what worked, what broke, and what we'd do differently.

Why It Got Expensive #

Three forces compounded:

Build matrix expansion. Adding macOS jobs (5× cost multiplier) for an Electron app.
Test parallelism. Splitting a 22-min suite into 8 shards × 3-min each. Total wall time dropped, total minutes climbed.
CI-driven security scans running on every PR push. Each scan was 4 minutes; PRs averaged 11 pushes.

GitHub's Linux x64 minutes are cheap ($0.008/min), but $3,400/mo gets attention.

The New Architecture #

code

┌─────────────────────────────────────────────────────────┐
│ GitHub Actions Workflow                                 │
│   runs-on: [self-hosted, linux, x64, prod]              │
└────────────────────────┬────────────────────────────────┘
                         │ webhook
                         ▼
┌─────────────────────────────────────────────────────────┐
│ Actions Runner Controller (ARC) on EKS                  │
│   - Watches GitHub queue                                │
│   - Spins up ephemeral runner pod per job               │
│   - Pod runs on spot c7i.xlarge node pool               │
└─────────────────────────────────────────────────────────┘

We use the official actions-runner-controller Helm chart. Each runner is a fresh pod, scheduled on a Karpenter-managed spot node pool.

yaml.yaml

# values.yaml (trimmed)
template:
  spec:
    nodeSelector:
      karpenter.sh/capacity-type: spot
    tolerations:
      - key: ci
        operator: Equal
        value: "true"
        effect: NoSchedule
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        resources:
          requests: { cpu: 2, memory: 4Gi }
          limits:   { cpu: 4, memory: 8Gi }

A custom Karpenter NodePool provisions c7i.xlarge and c7i.2xlarge spot nodes with a CI-only taint so other workloads don't land there.

What Broke (And How We Fixed It)#

1. macOS Jobs Were Stuck on GitHub-Hosted #

ARC doesn't run macOS. Our Electron build needs macOS for code signing.

Fix: kept macOS jobs on GitHub-hosted runners (the expensive ones), moved everything else to self-hosted. macOS still costs ~$800/mo but it's 5 jobs/day, not 50.

2. Spot Interruptions Killed In-Flight Jobs #

Roughly 1 in 30 jobs got terminated mid-run. The job retried, but engineers saw red Xs and got nervous.

Fix: two-tier setup. Critical jobs (deploys, release builds) run on runs-on: [self-hosted, linux, x64, on-demand] with a small on-demand node pool. Bulk jobs (tests, lints, scans) tolerate spot interruption and just retry.

yaml.yaml

deploy:
  runs-on: [self-hosted, linux, x64, on-demand]   # never spot
test:
  runs-on: [self-hosted, linux, x64, spot]        # spot is fine

3. Cache Hit Rate Dropped to Zero #

GitHub-hosted runners have caching for actions/cache baked in. Self-hosted runners need their own cache backend or each job downloads dependencies fresh.

yaml.yaml

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ hashFiles('package-lock.json') }}

Without a cache backend, this works against a brand-new pod each run. Every job redownloaded every dependency.

Fix: deployed an S3-backed cache server using runs-on/cache-action so actions/cache writes/reads from S3. Cache hit rate went from 0% to 78%; average job time dropped from 6.2 min to 2.4 min.

4. Docker-in-Docker Drama #

About 30% of our jobs build container images. The runner pod can't docker build without privileged mode (security risk) or a daemon.

Fix: switched to buildkit as a sidecar with rootless mode + remote builder pattern.

yaml.yaml

- uses: docker/setup-buildx-action@v3
  with:
    driver: remote
    endpoint: tcp://buildkitd.ci-system.svc:1234

A long-lived buildkitd deployment handles all builds. Cache layers are shared across PR branches. Image build time dropped 40% from the cache reuse alone.

5. Secrets Mounting on the Runner #

Our previous workflows used AWS_ACCESS_KEY_ID from GitHub secrets. On self-hosted we wanted to use IAM Roles for Service Accounts (IRSA).

Fix: each runner pod has a service account with a scoped IAM role. The job assumes the role automatically; no AWS keys in GitHub secrets at all.

yaml.yaml

serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/github-runner-deploy

This was the biggest security win of the migration. Zero static AWS credentials in GitHub.

Numbers After 90 Days #

Metric	Before	After
Monthly cost	$3,420	$210
Avg PR feedback time	12 min	9 min
Cache hit rate	64%	78%
Spot interruption rate	n/a	3.4%
Static AWS credentials in GH	11	0
On-call pages from CI	2	4

The slight uptick in CI-related on-call (2 → 4) is because we now own more of the stack. None were severe.

Two pools, on-demand + spot. Critical paths get on-demand; bulk work gets spot.
Always set up an S3 cache backend on day one. It's the single biggest performance lever.
Use a remote buildkit instance for Docker builds. Your CI gets faster and more secure simultaneously.
IAM Roles for Service Accounts is the killer feature. Stop putting AWS keys in GitHub secrets.
Tag runner labels carefully. [self-hosted, linux, x64, prod] means anything matching self-hosted qualifies. Be specific to avoid wrong pools picking up jobs.
Monitor runner queue depth. If jobs queue > 30s consistently, scale your pool.

What's Worse Than GitHub-Hosted #

You own incidents. When the runner controller restarts, jobs queue. We had two outages in 90 days.
More config drift potential. Each team can request labels/pools, and discipline is needed.
macOS still expensive. No good self-hosted answer for Apple silicon CI.

When to Stay on GitHub-Hosted #

Don't migrate if any of these apply:

Your monthly bill is < $500. The savings won't be worth the operational cost.
You don't have a Kubernetes cluster you can extend cheaply.
You're a team of 1–3 engineers. Maintenance overhead vs. benefit doesn't pencil out.

When to Migrate #

Strong case if all of these are true:

Monthly bill > $1,500 and growing.
You already operate Kubernetes for product workloads.
Your CI workflows are mostly Linux x64.
You can dedicate ~1 engineer-month to the migration.

We hit those thresholds; the migration paid for itself in 6 weeks.

GitHub Actions Self-Hosted Runners: Why We Switched and What Broke

GitHub Actions Self-Hosted Runners: Why We Switched and What Broke

Why It Got Expensive #

The New Architecture #

What Broke (And How We Fixed It)#

1. macOS Jobs Were Stuck on GitHub-Hosted #

2. Spot Interruptions Killed In-Flight Jobs #

3. Cache Hit Rate Dropped to Zero #

4. Docker-in-Docker Drama #

5. Secrets Mounting on the Runner #

Numbers After 90 Days #

What's Worse Than GitHub-Hosted #

When to Stay on GitHub-Hosted #

When to Migrate #

Stay Updated

Vector Database Selection: Pinecone, pgvector, Qdrant After 6 Months in Production

Linux Memory Management: When OOM Killer Strikes Your K8s Pods

More from DevOps

HashiCorp Vault as a Secrets Backend for Kubernetes

Kafka Partition Strategies — Scaling Consumers Without Reshuffling Everything

Pipeline Observability — Why CI Failures Don't Trigger Alerts (And Should)

HashiCorp Vault as a Secrets Backend for Kubernetes

Kafka Partition Strategies — Scaling Consumers Without Reshuffling Everything

Pipeline Observability — Why CI Failures Don't Trigger Alerts (And Should)

Burn-Rate Alerting — The SLO Discipline That Prevents Alert Fatigue

Container Resource Limits — What They Actually Do at the Kernel Level

Kubernetes Resource Requests — Right-Sizing Without Guessing

About Kiril Urbonas

You might have missed

Prompt Engineering Best Practices: Maximizing LLM Performance

AI Agents in DevOps: From Copilots to Autonomous Automation in 2025

Linux Performance Tuning for Containers and Kubernetes Nodes