We mapped every byte that ends up in our production containers. The map showed three places trust was implicit. Each became a control.

Secure Container Supply Chain Controls — Deep Dive

We started this work after a news cycle about a compromised npm package. Nothing of ours was affected, but the question that came up internally was uncomfortable: if a malicious package were introduced into one of our dependencies tomorrow, would we ship it to production?

Honest answer: probably yes, within a day or two. We had no controls between "PR merged" and "container running in prod" that would catch a malicious dependency. The work below was the response.

Mapping where bytes come from #

Before adding controls, we mapped where bytes that end up in our production containers come from. The chain looked like:

1. Source code: our git repo
2. Direct dependencies: declared in package.json/requirements.txt/go.mod
3. Transitive dependencies: pulled in by direct deps
4. Base image: e.g., node:20-alpine
5. Base image's dependencies: APT/APK packages baked in
6. Build-time tools: TypeScript compiler, webpack, gradle, etc.
7. Container registry: where we push the built image
8. Cluster pulling and running: Kubernetes nodes

Each numbered item represents a place where an attacker could insert hostile code if they compromised the corresponding party. Roughly half of those points (3, 5, 7, 8) we hadn't actively controlled.

The three controls that did most of the work #

Control 1: Pin direct dependencies + lockfile + content-hash verification #

The first instance of trust we had to make explicit was the resolver. Saying "we depend on react: ^19" trusts npm's resolver to pick a safe version. Saying "we depend on react@19.2.4 with hash sha512-..." trusts only what we've previously verified.
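To make the "with hash sha512-..." part concrete, here is a minimal sketch of the check a lockfile-aware installer performs on each downloaded tarball. The function names are ours, chosen for illustration, and the sketch assumes a single sha512 entry in the integrity string; real lockfiles may carry other algorithms.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Format a SHA-512 digest the way lockfile integrity fields are written."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def verify_integrity(data: bytes, expected: str) -> bool:
    """Return True only if the bytes hash to exactly the recorded value."""
    algorithm, _, _ = expected.partition("-")
    if algorithm != "sha512":
        # Assumption: this sketch handles only sha512 entries.
        raise ValueError(f"unsupported integrity algorithm: {algorithm}")
    return sri_sha512(data) == expected


if __name__ == "__main__":
    tarball = b"pretend these are the bytes of a package tarball"
    recorded = sri_sha512(tarball)  # what the lockfile would store
    print(verify_integrity(tarball, recorded))            # unchanged bytes
    print(verify_integrity(tarball + b"\x00", recorded))  # tampered bytes
```

The point of the design is that the expected value comes from a file we reviewed and committed, not from whatever the registry serves at install time.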

We require:

Lockfiles checked in (package-lock.json, poetry.lock, go.sum)
CI runs npm ci (not npm install) — fails if lockfile is stale or any hash mismatches
Renovate/Dependabot updates lockfiles with explicit PRs we review

The CI check is the one that closes the gap. Without it, an engineer could accidentally update a dependency by running npm install foo locally and forget to commit the lockfile change. The next CI run would re-resolve, possibly to a different version, possibly a compromised one.

This caught one issue in our codebase — a dev dependency had drifted in a feature branch. Renovate noticed and opened a PR. Mundane fix; the value is the visibility.
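A complementary check, sketched here under assumptions, is to audit the lockfile itself for entries that carry no integrity hash. The helper below is hypothetical and assumes the "packages" map used by package-lock.json lockfileVersion 2 and 3. It skips the root project, workspace links, and bundled dependencies, which do not have their own hash.

```python
import json


def entries_without_integrity(lockfile_text: str) -> list[str]:
    """List package paths in a package-lock.json that have no integrity hash."""
    lock = json.loads(lockfile_text)
    missing = []
    for path, meta in lock.get("packages", {}).items():
        if path == "":
            continue  # the root project itself, not a downloaded package
        if meta.get("link") or meta.get("inBundle"):
            continue  # symlinked workspaces and bundled deps have no own hash
        if not meta.get("integrity"):
            missing.append(path)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical lockfile fragment for illustration only.
    example = json.dumps({
        "lockfileVersion": 3,
        "packages": {
            "": {"name": "app", "version": "1.0.0"},
            "node_modules/left": {"version": "1.0.0", "integrity": "sha512-AAAA"},
            "node_modules/right": {"version": "2.0.0"},
        },
    })
    print(entries_without_integrity(example))
```

A CI job could fail the build when the returned list is non-empty.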

Control 2: Pin base images by digest #

The tag in FROM node:20-alpine is mutable. Yesterday's node:20-alpine can differ from today's; the image content is whatever Docker Hub served when you built. If node:20-alpine were silently replaced by a malicious version (we've never seen this happen, but it's possible), we'd have shipped it.

We pin to digest:

dockerfile.dockerfile

FROM node:20-alpine@sha256:1a7d234c8b00b001a... AS builder

The digest is content-addressed. The bytes can't change without the digest changing. Renovate updates these via PR; we see the digest change and approve the new one explicitly.

For our self-built builder image (with our toolchain), we tag by content hash and pin to that:

dockerfile.dockerfile

FROM ourorg/builder@sha256:8a4f2c1...

Nothing in our Dockerfiles references :latest or :main or any moving tag.
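A rule like this is easy to enforce mechanically. The following is a rough sketch of a linter, not our actual tooling: it scans FROM lines and reports any base image that is not pinned by a full sha256 digest. It treats earlier stage names and scratch as fine, and it would flag images supplied through ARG substitution, which a real checker would need to handle separately.

```python
import re

# FROM [--flag=value ...] image [AS stage]
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--\S+\s+)*(?P<image>\S+)(?:\s+AS\s+(?P<stage>\S+))?",
    re.IGNORECASE,
)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return base images referenced by tag (or nothing) instead of a digest."""
    stages: set[str] = set()
    unpinned: list[str] = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group("image")
        # "scratch" and references to earlier build stages pull nothing.
        if image.lower() != "scratch" and image.lower() not in stages:
            if not DIGEST_RE.search(image):
                unpinned.append(image)
        if match.group("stage"):
            stages.add(match.group("stage").lower())
    return unpinned


if __name__ == "__main__":
    sample = "FROM node:20-alpine AS builder\nFROM builder\n"
    print(unpinned_base_images(sample))
```

Running this over every Dockerfile in the org also gives the raw numbers for a pinning-compliance metric.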

Control 3: Sign images we build, verify on pull #

Pinning by digest protects against a tampered base layer. It doesn't stop compromise of OUR builds: if our CI were compromised, an attacker could push a malicious image under a legitimate name to our own registry, and our cluster would pull it.

We sign every image we build and require valid signatures on pull. The implementation uses sigstore/cosign:

bash.bash

# In CI, after build:
cosign sign --key cosign.key "$REGISTRY/$IMAGE@$DIGEST"

# In Kubernetes, via admission policy:
# Reject any pod whose image isn't signed by our key

The Kubernetes admission policy is a Kyverno rule:

yaml.yaml

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-image-signature
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "registry.internal/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      ...our public key...
                      -----END PUBLIC KEY-----

Now even if an attacker pushed an image to our registry, the cluster would refuse to admit any pod that uses it unless the image carries a valid signature from our CI key.

The signing key lives in HashiCorp Vault and is only accessible to the CI workflow, gated by GitHub OIDC + branch restriction (only main branch builds can sign). Even a compromised PR can't sign an image.
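Outside the cluster, the same check can be run by hand or from a deploy script. The sketch below wraps cosign verify in a small Python helper. The helper names and the cosign.pub path are assumptions for illustration, and the wrapper deliberately refuses tag references so that what gets verified is the exact content that will run.

```python
import re
import subprocess

# Only accept references of the form name@sha256:<64 hex chars>.
DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def build_verify_cmd(image_ref: str, public_key_path: str) -> list[str]:
    """Build the cosign command line for verifying one digest-pinned image."""
    if not DIGEST_REF.match(image_ref):
        raise ValueError(f"refusing to verify a non-digest reference: {image_ref}")
    return ["cosign", "verify", "--key", public_key_path, image_ref]


def is_signed(image_ref: str, public_key_path: str = "cosign.pub") -> bool:
    """Run cosign and report whether verification succeeded (exit code 0)."""
    result = subprocess.run(
        build_verify_cmd(image_ref, public_key_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    ref = "registry.internal/app@sha256:" + "0" * 64  # placeholder digest
    print(" ".join(build_verify_cmd(ref, "cosign.pub")))
```

This does not replace the admission policy; it is a convenience for checking an image before handing it to the cluster.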

What this doesn't protect against #

Honest about scope:

A compromised direct dependency you've blessed. If one of our pinned dependencies is malicious (e.g., a popular package that someone took over), we'll ship it. Pinning slows the attack but doesn't prevent it. Mitigation: Trivy scans + Snyk for known vulnerabilities; vigilance on Dependabot PRs.

A compromised CI runner. If GitHub Actions itself is compromised, our signing key is still safe (it's in Vault), but anything that the CI is allowed to do (build, scan, sign) could be done with malicious intent. Mitigation: limit blast radius with role-scoped credentials, monitor for unusual activity.

A zero-day in a base image. If node:20-alpine has a critical CVE we don't know about, we ship the CVE. Mitigation: regular base-image updates, pinned but rotated weekly.

A malicious commit by an authorized contributor. If a developer with merge rights goes rogue, none of these controls help. Mitigation: code review requirement, audit trails, separation of duties for sensitive paths.

These aren't gaps in our supply chain controls per se — they're attacks at a different layer. We document them so people know what we're not promising.

SBOM as audit trail #

For every image we build, we generate a Software Bill of Materials (SBOM) and attach it as a sigstore attestation:

bash.bash

syft "$REGISTRY/$IMAGE@$DIGEST" -o spdx-json > sbom.spdx.json
cosign attest --key cosign.key --predicate sbom.spdx.json --type spdxjson \
  "$REGISTRY/$IMAGE@$DIGEST"

When a CVE in a popular package gets disclosed, we can answer "which of our running images contains the affected version" by querying the SBOMs. Without SBOMs, that's a manual exercise that takes hours and gets stale immediately.

We've used this once for real, when an xz-utils issue was reported in early 2024. Within an hour of the disclosure, we knew which of our images were affected and could prioritize rebuild and rollout. Without SBOMs we'd have spent half a day at minimum.
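The query itself can be very small. Here is a sketch, assuming the SBOMs have already been downloaded and parsed into dictionaries keyed by image reference, and relying on the "packages", "name", and "versionInfo" fields of SPDX JSON. The image names and versions in the example are made up, and the version match is deliberately simple: exact, or exact plus a distro revision suffix.

```python
def _version_matches(version: str, bad: str) -> bool:
    """Match '5.6.1' against '5.6.1' or '5.6.1-r0', but not '5.6.10'."""
    return version == bad or version.startswith(bad + "-")


def images_containing(
    sboms: dict[str, dict], package: str, bad_versions: set[str]
) -> list[str]:
    """Return image references whose SBOM lists an affected package version."""
    hits = []
    for image_ref, document in sboms.items():
        for pkg in document.get("packages", []):
            version = pkg.get("versionInfo", "")
            if pkg.get("name") == package and any(
                _version_matches(version, bad) for bad in bad_versions
            ):
                hits.append(image_ref)
                break  # one hit per image is enough
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    inventory = {
        "registry.internal/api@sha256:aaa": {
            "packages": [{"name": "xz", "versionInfo": "5.6.1-r0"}]
        },
        "registry.internal/web@sha256:bbb": {
            "packages": [{"name": "xz", "versionInfo": "5.4.5-r0"}]
        },
    }
    print(images_containing(inventory, "xz", {"5.6.0", "5.6.1"}))
```

Because the SBOM is attached to a digest, the answer refers to exactly the bytes that are running, not to whatever a tag points at today.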

The GitHub OIDC + Vault dance #

A subtle but important thing: our CI doesn't have any long-lived AWS or signing credentials. It authenticates to AWS via GitHub OIDC (assuming a role) and to Vault via the same mechanism (Vault is configured to trust GitHub OIDC tokens for specific repos and branches).

The trust policy is tight:

json.json

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Federated": "arn:aws:iam::123:oidc-provider/token.actions.githubusercontent.com"},
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:sub": "repo:ourorg/ourrepo:ref:refs/heads/main"
      }
    }
  }]
}

Only the main branch's Actions can assume this role. PR builds (which run untrusted code from contributors) cannot. This means: a malicious PR cannot publish a signed image. Only main-branch builds can. Combined with branch protection rules requiring code review before merge, the path from "malicious PR" to "running in prod" requires compromising a reviewer's account.
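To see why PR builds are excluded, it helps to model the StringEquals condition as an exact, case-sensitive comparison against the token's sub claim. The function below is a simplified illustration of that matching and not AWS's evaluation engine. The claim values in the example follow the sub formats GitHub documents for branch pushes and pull_request events.

```python
CLAIM_PREFIX = "token.actions.githubusercontent.com:"


def satisfies_string_equals(
    claims: dict[str, str], string_equals: dict[str, str]
) -> bool:
    """Return True if every condition key matches the token claim exactly."""
    for key, expected in string_equals.items():
        claim_name = key.removeprefix(CLAIM_PREFIX)
        if claims.get(claim_name) != expected:
            return False
    return True


if __name__ == "__main__":
    condition = {
        "token.actions.githubusercontent.com:sub":
            "repo:ourorg/ourrepo:ref:refs/heads/main"
    }
    main_build = {"sub": "repo:ourorg/ourrepo:ref:refs/heads/main"}
    pr_build = {"sub": "repo:ourorg/ourrepo:pull_request"}
    print(satisfies_string_equals(main_build, condition))
    print(satisfies_string_equals(pr_build, condition))
```

Exact matching is the reason the policy avoids wildcards: a pattern such as repo:ourorg/* would admit every branch and every pull request in the org.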

What we measure #

Three things, monthly:

Pinning compliance: percentage of FROM references in the org's Dockerfiles that are pinned by @sha256:... digest rather than by a bare :tag. Goal: 100%. Currently 100% in main branches.
Image signing coverage: percentage of images currently running in prod that pass signature verification. Goal: 100%. Currently 100% (admission policy enforces).
SBOM coverage: percentage of running images with attached SBOMs. Goal: 100%. Currently 100%.

The numbers being 100% is the point. If they slip, something has slipped past the controls; we'd want to know.
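The monthly report boils down to a couple of ratios over an inventory of running images. A sketch of that calculation follows, with a hypothetical RunningImage record standing in for whatever the cluster inventory actually returns.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class RunningImage:
    ref: str            # image reference as deployed
    signature_ok: bool  # passed signature verification
    has_sbom: bool      # an SBOM attestation is attached


def coverage(
    images: Sequence[RunningImage], check: Callable[[RunningImage], bool]
) -> float:
    """Percentage of images passing the check; an empty inventory counts as 100."""
    if not images:
        return 100.0
    return 100.0 * sum(1 for image in images if check(image)) / len(images)


def gaps(
    images: Sequence[RunningImage], check: Callable[[RunningImage], bool]
) -> list[str]:
    """References of the images that fail the check, for follow-up."""
    return [image.ref for image in images if not check(image)]


if __name__ == "__main__":
    inventory = [
        RunningImage("registry.internal/api", True, True),
        RunningImage("registry.internal/web", True, False),
    ]
    print(coverage(inventory, lambda i: i.signature_ok))
    print(gaps(inventory, lambda i: i.has_sbom))
```

Reporting the failing references alongside the percentage matters more than the number itself, since anything below 100% needs a named owner.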

Operational cost #

The controls add real friction for engineers:

Renovate PRs need review. Maybe 4-5 per week across the org. Each takes 1-2 minutes.
Image-signing failures are loud. When the signing step fails (rare; usually a CI runner network blip), the whole build fails and someone has to investigate.
Adding a new base image requires going through the pin-and-verify process. Not difficult but requires intent.

Total overhead: maybe 30-60 minutes per engineer per month. We accept the cost.

What I'd tell a team adopting this #

Do it in this order:

1. Pin everything (dependencies and base images). Lowest cost, highest immediate benefit.
2. Add scanning (Trivy + Snyk for known vulnerabilities). Catches stuff you wouldn't have caught otherwise.
3. Move CI auth to OIDC, get rid of static credentials. Reduces blast radius.
4. Sign images and verify on pull. The full supply chain story.
5. Generate and attach SBOMs. Pays off the next time a critical CVE drops.

Skipping ahead is tempting but each layer relies on the previous one. SBOMs without pinning are useless because the SBOM doesn't reflect what you actually shipped. Signing without OIDC means a leaked CI key is a disaster.

The whole thing took us about a quarter of dedicated effort, spread across two engineers. Most of the benefit came from steps 1-3; steps 4-5 are paying it forward against a class of incidents we hope to never see.

About Kiril Urbonas