Vault + Kubernetes auth + Vault Agent Injector. The setup, the failure modes during pod startup, and the patterns that beat raw Kubernetes Secrets.

HashiCorp Vault as a Secrets Backend for Kubernetes

By default, Kubernetes Secrets are base64-encoded data sitting in etcd. That's not encryption; it's encoding (etcd encryption at rest exists, but it has to be configured). For anything sensitive, you want a real secrets backend: encrypted at rest, encrypted in transit, audit-logged, with rotation. Vault is the most common answer for self-managed setups; we've run it as the Kubernetes secrets backend for about two years. This post covers what works, what bit us, and the patterns that earn their place.
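To see why encoding isn't protection, here is a minimal Python sketch that decodes the `data` map of a Secret as `kubectl get secret -o json` returns it. The Secret name and its `password` key are hypothetical; the point is that anyone who can read the object recovers the plaintext with one standard-library call.

```python
import base64
import json
from typing import Dict

# A Secret as the API server might return it (hypothetical values).
# Each value under .data is base64 text, not ciphertext.
SECRET_JSON = json.dumps({
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "payments-db", "namespace": "payments"},
    "data": {"password": base64.b64encode(b"hunter2").decode("ascii")},
})


def decode_secret_data(raw: str) -> Dict[str, str]:
    """Return the plaintext of every key in a Secret's .data map."""
    obj = json.loads(raw)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in obj.get("data", {}).items()
    }


if __name__ == "__main__":
    # No key material needed: decoding is the whole "attack".
    print(decode_secret_data(SECRET_JSON))
```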

The mental model #

Kubernetes Secrets are convenient but weak. Vault is strong but operationally demanding. The integration model:

Vault stores the actual secret material, encrypted with its own master keys, audit-logged.
Kubernetes service accounts authenticate to Vault using a JWT (the pod's service account token).
Vault returns short-lived secrets (15 min to 24 hours).
The pod uses the secret; when it expires, the pod re-fetches.

The pod never has long-lived credentials. The secret material is in Vault, behind audit logging and access controls. Kubernetes Secrets aren't eliminated — they often still hold session-cache-style values that aren't secret enough to need Vault — but the high-value secrets all move.
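The fetch, expire, re-fetch loop in the model above can be sketched in a few lines of Python. This is a minimal illustration, not Vault Agent's actual logic: `fetch` is a hypothetical stand-in for a Vault call that returns a value plus a TTL, and the clock is injectable so the expiry behavior is easy to test.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Lease:
    value: str
    ttl_seconds: float


class ShortLivedSecret:
    """Caches a secret until its TTL runs out, then fetches a fresh one.

    `fetch` stands in for a Vault call (hypothetical); `clock` is injectable
    so expiry can be tested without sleeping.
    """

    def __init__(self, fetch: Callable[[], Lease],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._clock = clock
        self._lease: Optional[Lease] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._lease is None or now >= self._expires_at:
            # Never fetched, or the TTL has elapsed: go back to the source.
            self._lease = self._fetch()
            self._expires_at = now + self._lease.ttl_seconds
        return self._lease.value
```

The design choice worth noting: the application never holds a value past its TTL, so a leaked copy in memory goes stale on the same schedule as the lease.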

Auth method: Kubernetes JWT #

Vault's Kubernetes auth method validates the service account JWTs that Kubernetes issues to pods, using the cluster's TokenReview API. The flow:

Pod starts; its service account token is mounted as a JWT.
Pod calls Vault, presenting the JWT.
Vault calls back to Kubernetes's TokenReview API to confirm the JWT is valid.
Vault checks the service account name and namespace against the role's bindings.
If they match, Vault returns a Vault token carrying the role's policies, valid for the role's TTL.
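A minimal Python sketch of the pod's side of this flow, using only the standard library. It assumes the auth method is mounted at the default `kubernetes` path and that the service account token sits at its usual in-pod location; the Vault address in the example is hypothetical. Steps 3 to 5 happen inside Vault, so the client only sends the JWT and reads back `auth.client_token` and `auth.lease_duration`.

```python
import json
import urllib.request
from typing import Tuple

# Default location of the service account token inside a pod.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(vault_addr: str, role: str, jwt: str) -> urllib.request.Request:
    """Build (but do not send) a Kubernetes auth login call to Vault.

    Assumes the auth method is mounted at the default path `kubernetes`.
    """
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/kubernetes/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_login_response(raw: bytes) -> Tuple[str, int]:
    """Extract the Vault token and its TTL in seconds from the login response."""
    auth = json.loads(raw)["auth"]
    return auth["client_token"], auth["lease_duration"]


def login(vault_addr: str, role: str) -> Tuple[str, int]:
    # Steps 1-2: read the pod's JWT and present it to Vault.
    with open(SA_TOKEN_PATH, encoding="utf-8") as f:
        jwt = f.read().strip()
    req = build_login_request(vault_addr, role, jwt)
    # Steps 3-5 (TokenReview, binding checks, token issuance) run inside Vault.
    with urllib.request.urlopen(req, timeout=5) as resp:
        return parse_login_response(resp.read())
```

Subsequent reads send the returned token in the `X-Vault-Token` header until it expires or is renewed.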

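On the Vault side, a policy grants read access to the secret paths:

```hcl
path "secret/data/payments/*" {
  capabilities = ["read"]
}
```

A role then binds that policy to a Kubernetes service account and namespace:

```shell
# Store the policy above as payments-policy (assumes it was saved to payments-policy.hcl)
vault policy write payments-policy payments-policy.hcl

vault write auth/kubernetes/role/payments \
  bound_service_account_names=payments-sa \
  bound_service_account_namespaces=payments \
  policies=payments-policy \
  ttl=15m
```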

Now any pod running as payments-sa in the payments namespace can log in and get a Vault token that reads secret/data/payments/*; the token is valid for 15 minutes unless renewed.

The TTL is the key knob. Shorter TTL = smaller blast radius if a token leaks; longer TTL = fewer renewals, less Vault load.
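To put numbers on the trade-off, here is a back-of-the-envelope calculation in Python. It assumes each pod renews once two-thirds of the TTL has elapsed; that fraction is an illustrative assumption, not necessarily what Vault Agent does.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def renewals_per_day(ttl_seconds: float, renew_fraction: float = 2 / 3) -> float:
    """Renewals one pod makes per day if it renews after `renew_fraction`
    of each TTL has elapsed (an assumed renewal policy)."""
    if ttl_seconds <= 0 or not 0 < renew_fraction <= 1:
        raise ValueError("ttl_seconds must be > 0 and renew_fraction in (0, 1]")
    return SECONDS_PER_DAY / (ttl_seconds * renew_fraction)


if __name__ == "__main__":
    for label, ttl in [("15m", 15 * 60), ("1h", 60 * 60), ("24h", SECONDS_PER_DAY)]:
        print(f"{label:>4}: {renewals_per_day(ttl):6.1f} renewals per pod per day")
```

Under that assumption, a 15-minute TTL costs 144 renewals per pod per day, 1 hour costs 36, and 24 hours costs 1.5; multiply by pod count to estimate the load on Vault.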

Vault Agent Injector #

This is the piece that makes Vault usable from applications. Without it, every app needs Vault client code, token renewal logic, and so on. With it, the Vault Agent Injector (a mutating admission webhook) watches for pods with specific annotations and injects a Vault Agent that handles all the Vault interaction.

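Pod annotations (the pod must also run as the service account bound to the role):

```yaml
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "payments"
    vault.hashicorp.com/agent-inject-secret-db-creds: "secret/data/payments/db"
spec:
  # Must match bound_service_account_names on the Vault role
  serviceAccountName: payments-sa
```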

What happens at pod startup:

Injector mutates the pod, adding a Vault Agent init container and sidecar.
Init container authenticates to Vault, fetches the secret, writes it to a shared emptyDir volume.
Main container starts; reads the secret from disk.
Sidecar continues running; renews the secret as it nears expiry.

The application sees a file like /vault/secrets/db-creds that contains the secret. App reads the file; nothing Vault-specific in app code.
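Because the sidecar can rewrite the file, an app that reads it once at startup may miss updates. Below is a minimal Python sketch of a reader that re-parses the file when it changes. It assumes the file holds `KEY=VALUE` lines, which you would get by adding a `vault.hashicorp.com/agent-inject-template-db-creds` annotation with a matching template; the agent's default rendering is not in that format. The class and function names are hypothetical.

```python
import os
from typing import Dict, Optional


def parse_kv_lines(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    parsed: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            parsed[key.strip()] = value.strip()
    return parsed


class SecretFile:
    """Re-reads a rendered secret file whenever its modification time changes."""

    def __init__(self, path: str = "/vault/secrets/db-creds") -> None:
        self.path = path
        self._mtime_ns: Optional[int] = None
        self._values: Dict[str, str] = {}

    def values(self) -> Dict[str, str]:
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            # The agent rewrote the file (e.g. new credentials): reload it.
            with open(self.path, encoding="utf-8") as f:
                self._values = parse_kv_lines(f.read())
            self._mtime_ns = mtime_ns
        return self._values
```

Checking the modification time on each access is a cheap heuristic; it keeps the app free of Vault-specific code while still picking up re-rendered credentials.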

What about the static secret problem? #

Some secrets really are static — API keys for third-party services that don't support rotation. For these, Vault's KV-v2 engine still adds value:

Encrypted at rest in Vault (not just base64 in etcd).
Audited (every read is logged).
Access controlled (RBAC via policies).
Versioned (previous versions of the secret retrievable).
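The versioning, and the `data/` segment in the policy path, both come from KV-v2's HTTP API: reads go to `/v1/<mount>/data/<path>`, optionally with `?version=N`, and the response nests the secret under `data.data` with its version under `data.metadata.version`. Here is a small Python sketch of building the URL and parsing the response. The mount name `secret` matches the paths above, and the Vault address is hypothetical. A real request also needs the Vault token in the `X-Vault-Token` header.

```python
import json
import urllib.parse
from typing import Any, Dict, Optional, Tuple


def kv2_read_url(vault_addr: str, mount: str, path: str,
                 version: Optional[int] = None) -> str:
    """URL for reading a KV-v2 secret. KV-v2 inserts `data/` after the mount,
    which is why policies grant paths like `secret/data/payments/*`."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path.lstrip('/')}"
    if version is not None:
        # Ask for a specific (e.g. previous) version instead of the latest.
        url += "?" + urllib.parse.urlencode({"version": version})
    return url


def parse_kv2_response(raw: str) -> Tuple[Dict[str, Any], int]:
    """Return (secret key/values, version number) from a KV-v2 read."""
    body = json.loads(raw)["data"]
    return body["data"], body["metadata"]["version"]
```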

The TTL on the Vault token stays short even for static secrets: the secret material doesn't rotate, but access to it does. If a Vault token leaks from a compromised pod, it stops working once its TTL runs out unless someone renews it. Any static value already read, however, stays valid until it is rotated.

Dynamic secrets: the real win #

The biggest Vault feature most teams underuse: dynamic database credentials.

Vault's database engine connects to your DB (Postgres, MySQL, etc.) and creates a per-request user with limited permissions, returning the credentials. The user expires after the TTL; Vault revokes it automatically.

Pod calls Vault → "give me read-only access to the analytics DB." Vault returns (user: vault_abc123, password: ...). Pod uses those for 1 hour. Then they're revoked.
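In HTTP terms, the pod reads `/v1/database/creds/<role>` (assuming the engine is mounted at the default `database` path) and gets back a `lease_id`, a `lease_duration` in seconds, and `data.username` / `data.password`. Here is a minimal Python sketch that turns that response into credentials with an explicit expiry; the role name `analytics-readonly` in the example is hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class DbCredentials:
    username: str
    password: str
    lease_id: str
    expires_at: datetime

    def expired(self, now: datetime) -> bool:
        return now >= self.expires_at


def parse_db_creds(raw: str, issued_at: Optional[datetime] = None) -> DbCredentials:
    """Build credentials plus an explicit expiry from a database engine response.

    `lease_duration` is in seconds; once it passes without renewal, Vault
    revokes the database user it created.
    """
    body = json.loads(raw)
    issued_at = issued_at or datetime.now(timezone.utc)
    return DbCredentials(
        username=body["data"]["username"],
        password=body["data"]["password"],
        lease_id=body["lease_id"],
        expires_at=issued_at + timedelta(seconds=body["lease_duration"]),
    )
```

Keeping `lease_id` around matters: it is the handle Vault uses for renewing or revoking that specific user.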

Benefits:

No shared passwords. Each pod has unique credentials. Compromise of one = compromise of one.
Auto-revocation. A killed pod's credentials expire automatically.
Audit. Every credential issuance is logged along with the identity (service account and namespace) that requested it.

We use dynamic credentials for our analytics DB (where every service gets a unique read-only user) and for some operator workloads that occasionally need admin access.

The startup ordering problem #

The biggest gotcha: pod startup ordering. The Vault Agent init container fetches secrets before the main container starts. If Vault is unavailable, the init container retries, then fails, then the pod fails to start.

Vault outage = no new pods start until Vault recovers. On a Vault outage during a deploy, you stop being able to roll out new versions.

Mitigations:

Vault HA. Run Vault in HA mode (multiple replicas, separate AZs). Doesn't help against Vault-backend-database outages.
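agent-pre-populate-only. The vault.hashicorp.com/agent-pre-populate-only: "true" annotation injects only the init container, with no sidecar: the agent fetches secrets once at startup, then exits. Useful when secrets don't need rotation, because running pods stop depending on Vault; the pod keeps whatever Vault returned at start. Pods that start during an outage still fail.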
Cached secrets. Some Vault setups cache the last-good secret on local disk; if Vault is unavailable on restart, the pod uses the cached value with a warning.
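Here is a minimal Python sketch of the cached-secret fallback, combined with the retry-then-fail behavior described above. `fetch` is a hypothetical stand-in for the Vault read, and where the cache lives is up to you. Keep in mind that a cache which survives restarts is a copy of the secret at rest outside Vault, and it needs its own protection.

```python
import logging
import time
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger("secret-bootstrap")


def fetch_with_fallback(fetch: Callable[[], str], cache: Path,
                        attempts: int = 3, base_delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Try Vault a few times; on success refresh the cache, on failure
    fall back to the last-good cached value (if any) with a warning."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            value = fetch()
            cache.write_text(value, encoding="utf-8")
            return value
        except Exception as exc:  # narrow to your Vault client's error types
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, ...
    if cache.exists():
        log.warning("Vault unavailable (%s); using cached secret from %s", last_error, cache)
        return cache.read_text(encoding="utf-8")
    raise RuntimeError("Vault unavailable and no cached secret") from last_error
```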

The "Vault is critical infrastructure" reality means you treat its availability with the same seriousness as the cluster itself.

Rotation discipline #

Vault doesn't rotate the static secrets it stores on its own. What you have to do depends on the secret type:

For dynamic secrets: configured per engine; expiry and revocation happen automatically based on the lease TTL.
For static secrets: rotate them periodically via Vault APIs, scheduled from cron or CI. For database passwords, the database engine's static roles can rotate them on a rotation_period.
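For a static secret whose value you generate yourself, a rotation job boils down to three steps: create a new value, write it as a new KV-v2 version, then roll the pods that consume it. Here is a minimal Python sketch of the first two. KV-v2 writes go to `/v1/<mount>/data/<path>` with a body of `{"data": {...}}`, and an optional `options.cas` (check-and-set) makes the write fail if the current version isn't the one you expected. The function names are hypothetical.

```python
import json
import secrets
from typing import Any, Dict, Optional


def new_secret_value(nbytes: int = 32) -> str:
    """Random URL-safe value for a secret you control (hypothetical helper)."""
    return secrets.token_urlsafe(nbytes)


def kv2_write_body(data: Dict[str, str], expected_version: Optional[int] = None) -> str:
    """JSON body for a KV-v2 write (POST /v1/<mount>/data/<path>).

    With `expected_version`, check-and-set makes Vault reject the write if
    the secret's current version is not the one we last read.
    """
    body: Dict[str, Any] = {"data": data}
    if expected_version is not None:
        body["options"] = {"cas": expected_version}
    return json.dumps(body)
```

Check-and-set protects against two rotation jobs racing each other; the loser gets an error instead of silently overwriting a value that pods may already be using.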

We rotate static secrets quarterly. Database root passwords (used by Vault to create dynamic users) get rotated yearly with a careful plan because changing the root means re-bootstrapping the dynamic engine.

What we got wrong initially #

Treating Vault like another optional component. It ended up on the critical path, and we didn't have HA for the first 6 months. One Vault restart took down deploys for an hour. Now Vault is treated with the same operational discipline as the cluster control plane.

Tokens with long TTLs to avoid renewals. Set TTLs to 24 hours to reduce Vault load; never bothered with renewals. When a pod's token expired, the pod broke. Now TTLs are short and renewals happen automatically via the sidecar.

One Vault role per service. Started with hundreds of fine-grained roles; became unmanageable. Refactored to fewer roles with more careful policies. ~30 roles across the whole cluster is what we ended up with.

No backup story. Vault has its own state (in Consul, integrated storage, or external). We didn't think about it until we needed to restore. Now Vault snapshots are part of the backup discipline.

When Vault isn't worth it #

Honest list:

Small clusters (< 5 services). The operational cost of Vault outweighs the security benefit. AWS Secrets Manager (or equivalent) with External Secrets Operator is simpler.
Teams without operational depth. Vault is real infrastructure; not for "occasional sysadmin" teams.
Workloads where the secrets aren't actually sensitive. Internal-only API keys, dev configs — Kubernetes Secrets with proper RBAC is enough.

What we monitor #

Vault response time p99. Slow Vault = slow pod startup.
Token issuance rate. Sudden drops = something broken in the auth flow.
Failed authentications. Indicates misconfigured roles or compromised tokens.
Vault storage backend health. The data store behind Vault (Consul, integrated storage) is the real availability bottleneck.
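Two of these signals reduce to simple computations over whatever your metrics pipeline collects. Here is a Python sketch of a nearest-rank p99 and a drop check for token issuance. The 50% threshold and the per-minute baseline are illustrative assumptions, not recommended values.

```python
from typing import Sequence


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of request latencies."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math, 1-based
    return ordered[rank - 1]


def issuance_dropped(current_per_min: float, baseline_per_min: float,
                     threshold: float = 0.5) -> bool:
    """True when token issuance falls below `threshold` times its baseline."""
    return baseline_per_min > 0 and current_per_min < threshold * baseline_per_min
```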

What to read next #

Cross-cloud identity federation patterns — the cloud-native version of the same problem
Secrets management in practice — from env files to Vault — the broader maturity arc
Operational checklist — Kubernetes secrets and external Vault integration — checklist version of the patterns above
Cloud security best practices — securing AWS infrastructure — adjacent infrastructure-level security

Vault solves real problems and creates real operational responsibility. For teams that need it (regulated industries, large clusters, dynamic secrets) it's hard to beat. For teams that don't, simpler answers are usually right. The wrong choice in either direction creates work that doesn't pay back.

About Kiril Urbonas