K8s Secrets aren't encrypted by default. We moved every secret to Vault with the Vault Agent injector and never went back. The setup checklist.
Kubernetes Secrets are not really secret. By default they're base64-encoded strings stored unencrypted in etcd (unless you configure encryption at rest), readable by anyone whose RBAC lets them read Secrets in the namespace, and by anyone with access to etcd or its backups. For development that's fine; for production it's a gap that bothered us enough to fix. About a year ago we moved every production secret to HashiCorp Vault, fetched at runtime via the Vault Agent injector.
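To make "not really secret" concrete: base64 is an encoding, reversible without any key. This sketch does what anyone with read access to the Secret can do with kubectl and base64 -d. The Secret and its value are made-up examples.

```python
import base64

# A Secret as returned by `kubectl get secret db -o json`, trimmed to the
# relevant fields. The value is an illustrative example, not real data.
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db", "namespace": "production"},
    "data": {"password": "aHVudGVyMg=="},
}


def reveal(secret_obj: dict) -> dict:
    """Decode every value in a Secret's `data` map. No key is needed,
    because base64 is an encoding, not encryption."""
    return {k: base64.b64decode(v).decode("utf-8") for k, v in secret_obj["data"].items()}


print(reveal(secret))  # {'password': 'hunter2'}
```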
This is the checklist of what made the integration actually work, in the order we'd tackle it on a new cluster.
Before installing anything, we made an inventory. A "secret" is anything where leakage causes harm — API keys, DB passwords, signing keys, OAuth client secrets, encryption keys. Random configuration values (feature flags, log levels) aren't secrets and don't belong in Vault.
The inventory came to about 40 distinct secrets across our 14 production services, fewer than we expected. Much of what people had been treating as secret was actually configuration.
We use the Plus tier of HCP Vault (HashiCorp Cloud Platform). Self-hosting is fine but adds operational cost we didn't want. Whatever you pick, the cluster needs network access to Vault's API endpoint.
Verify before continuing:
# from inside the cluster
kubectl run -it --rm vault-test --image=hashicorp/vault \
--restart=Never -- vault status -address=https://your-vault.example.com
Expect it to report Initialized: true and Sealed: false. If the pod can't reach Vault, fix the networking before trying anything else.
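If you'd rather have this check in a preflight script than eyeball it, `vault status -format=json` prints the same fields as JSON. A minimal sketch that checks the two fields that matter; the function name is our own, and the sample input is trimmed.

```python
import json


def vault_ready(status_json: str) -> list:
    """Return the problems found in `vault status -format=json` output.
    An empty list means Vault is initialized and unsealed."""
    status = json.loads(status_json)
    problems = []
    if not status.get("initialized", False):
        problems.append("Vault is not initialized")
    # Treat a missing field as sealed so a malformed response fails closed.
    if status.get("sealed", True):
        problems.append("Vault is sealed")
    return problems


sample = '{"type": "shamir", "initialized": true, "sealed": false}'
print(vault_ready(sample))  # []
```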
Vault needs to trust the cluster's service account tokens. From the Vault side:
# Enable the Kubernetes auth method
vault auth enable kubernetes
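# Tell Vault about the cluster. Vault runs outside the cluster here (HCP), so
# kubernetes_host must be an API server address Vault can reach; the in-cluster
# name kubernetes.default.svc.cluster.local doesn't resolve from outside.
vault write auth/kubernetes/config \
kubernetes_host="https://your-k8s-api.example.com:443" \
kubernetes_ca_cert=@/path/to/ca.crt \
token_reviewer_jwt=@/path/to/token-reviewer-jwt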
The token_reviewer_jwt is a long-lived service account token from the cluster; Vault presents it to the Kubernetes TokenReview API to validate other service accounts' tokens. We create a dedicated vault-auth-delegator service account for this, bound to the system:auth-delegator cluster role.
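Since Kubernetes 1.24, service account tokens are no longer auto-created as Secrets, so the long-lived reviewer token has to be requested explicitly. A sketch that emits the three objects as a JSON List, which kubectl apply accepts. It assumes the service account lives in vault-system; that namespace, the binding name, and the token Secret name are our own choices, not fixed by anything above.

```python
import json

SA_NAME = "vault-auth-delegator"
NAMESPACE = "vault-system"  # assumption: any namespace works, this is our choice


def token_reviewer_manifests(sa_name: str = SA_NAME, namespace: str = NAMESPACE) -> dict:
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": sa_name, "namespace": namespace},
    }
    # Lets the service account call the TokenReview API on Vault's behalf.
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{sa_name}-binding"},  # hypothetical name
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "system:auth-delegator",
        },
        "subjects": [{"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}],
    }
    # A Secret of this type asks the token controller to mint a long-lived
    # token for the named service account and store it under data.token.
    token_secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/service-account-token",
        "metadata": {
            "name": f"{sa_name}-token",  # hypothetical name
            "namespace": namespace,
            "annotations": {"kubernetes.io/service-account.name": sa_name},
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [service_account, binding, token_secret]}


if __name__ == "__main__":
    print(json.dumps(token_reviewer_manifests(), indent=2))
```

Usage, under the same assumptions: pipe the output into `kubectl apply -f -`, then extract the token with `kubectl get secret vault-auth-delegator-token -n vault-system -o jsonpath='{.data.token}' | base64 -d > token-reviewer-jwt`.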
The injector is a mutating admission webhook that watches pod annotations and injects an init container plus a sidecar; together they fetch secrets and write them to a shared volume. Install it via Helm:
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install vault hashicorp/vault \
--namespace vault-system --create-namespace \
--set "injector.enabled=true" \
--set "server.enabled=false" \
--set "injector.externalVaultAddr=https://your-vault.example.com"
We set server.enabled=false because we use HCP Vault rather than an in-cluster Vault server. The injector itself still runs in-cluster.
Verify:
kubectl get pods -n vault-system
# expects: vault-agent-injector-xxxxx Running
For each app/service that needs secrets, create a Vault role bound to a Kubernetes service account:
vault write auth/kubernetes/role/checkout-app \
bound_service_account_names=checkout \
bound_service_account_namespaces=production \
policies=checkout-app-policy \
ttl=1h
The role says: any pod running with the checkout service account in the production namespace can authenticate, and gets a token carrying checkout-app-policy that is valid for 1 hour.
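The binding rule is easy to misread, so here is a simplified model of it. This is a hypothetical class, not Vault code: real Vault also verifies the token itself via TokenReview, and this sketch models only exact names and the "*" wildcard.

```python
from dataclasses import dataclass


@dataclass
class KubernetesRole:
    # Hypothetical, simplified model of a Vault Kubernetes auth role.
    bound_service_account_names: list
    bound_service_account_namespaces: list
    policies: list
    ttl_seconds: int

    def login(self, service_account: str, namespace: str) -> dict:
        """Both the service account name AND its namespace must be bound."""
        if not self._allowed(service_account, self.bound_service_account_names):
            raise PermissionError(f"service account {service_account!r} not authorized")
        if not self._allowed(namespace, self.bound_service_account_namespaces):
            raise PermissionError(f"namespace {namespace!r} not authorized")
        return {"policies": list(self.policies), "ttl_seconds": self.ttl_seconds}

    @staticmethod
    def _allowed(value: str, bound: list) -> bool:
        return "*" in bound or value in bound


checkout_app = KubernetesRole(["checkout"], ["production"], ["checkout-app-policy"], 3600)
print(checkout_app.login("checkout", "production"))
# {'policies': ['checkout-app-policy'], 'ttl_seconds': 3600}
```

The same service account in a staging namespace is rejected, which is exactly the isolation the role is there for.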
Then write the policy that grants read access to specific secret paths:
# checkout-app-policy.hcl
path "secret/data/production/checkout/*" {
capabilities = ["read"]
}
Apply it:
vault policy write checkout-app-policy checkout-app-policy.hcl
The naming convention matters. We follow secret/data/{env}/{app}/{secret-name}, and each policy scopes to {env}/{app}/*, so an application can read its own secrets and nothing else.
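A sketch of the convention and of how a policy path matches a request. Two details are worth seeing in code: KV v2 inserts data/ after the mount in API and policy paths (vault kv get secret/production/checkout/db reads secret/data/production/checkout/db), and a trailing * is a plain prefix match. The helper names are our own; Vault's + segment wildcard is not modeled.

```python
def kv2_api_path(mount: str, env: str, app: str, name: str) -> str:
    """API/policy path for a KV v2 secret under {env}/{app}/{secret-name}."""
    return f"{mount}/data/{env}/{app}/{name}"


def app_policy_path(mount: str, env: str, app: str) -> str:
    """What an app's read policy grants: everything under its own prefix."""
    return f"{mount}/data/{env}/{app}/*"


def policy_matches(policy_path: str, request_path: str) -> bool:
    """Simplified Vault matching: a trailing '*' is a prefix match,
    anything else must match exactly."""
    if policy_path.endswith("*"):
        return request_path.startswith(policy_path[:-1])
    return request_path == policy_path


granted = app_policy_path("secret", "production", "checkout")
print(policy_matches(granted, kv2_api_path("secret", "production", "checkout", "db")))  # True
print(policy_matches(granted, kv2_api_path("secret", "production", "payments", "db")))  # False
```

Because the match is a raw string prefix, the slash before * matters: a policy on secret/data/production/checkout* would also cover a hypothetical checkout-admin app.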
In the application's Deployment manifest:
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "checkout-app"
        vault.hashicorp.com/agent-inject-secret-db: "secret/data/production/checkout/db"
        # `export` matters: without it the sourced values are plain shell
        # variables and never reach /app/checkout after `exec`.
        vault.hashicorp.com/agent-inject-template-db: |
          {{- with secret "secret/data/production/checkout/db" -}}
          export DB_HOST="{{ .Data.data.host }}"
          export DB_USER="{{ .Data.data.user }}"
          export DB_PASSWORD="{{ .Data.data.password }}"
          {{- end }}
    spec:
      serviceAccountName: checkout
      containers:
        - name: app
          image: ourorg/checkout
          # `.` is the POSIX spelling of `source`, which some /bin/sh implementations lack
          command: ["/bin/sh", "-c", ". /vault/secrets/db && exec /app/checkout"]
The injector adds an init container that fetches the secret and writes it to /vault/secrets/db before the app starts, plus a vault-agent sidecar that keeps running alongside it. The application code doesn't know about Vault; it just reads env-style files from a shared volume.
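If an app would rather load the rendered file itself than go through sh -c, parsing it takes a few lines. A sketch (parse_env_file is our own helper, not part of any Vault library) that accepts both plain KEY=value lines and the export KEY="value" form, using Python's shlex for shell-style quoting. Unlike a shell, it executes nothing and expands no variables.

```python
import shlex


def parse_env_file(text: str) -> dict:
    """Parse KEY=value lines, optionally prefixed with `export` and optionally
    quoted, into a dict. No variable expansion or command execution happens."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)  # handles "double" and 'single' quoting
        if tokens and tokens[0] == "export":
            tokens = tokens[1:]
        for token in tokens:
            key, sep, value = token.partition("=")
            if sep:
                env[key] = value
    return env


def load_secrets(path: str = "/vault/secrets/db") -> dict:
    with open(path, encoding="utf-8") as f:
        return parse_env_file(f.read())


rendered = 'export DB_HOST="db.internal"\nexport DB_USER="checkout"\nexport DB_PASSWORD="p@ss word"\n'
print(parse_env_file(rendered))
# {'DB_HOST': 'db.internal', 'DB_USER': 'checkout', 'DB_PASSWORD': 'p@ss word'}
```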
After deploying, check:
kubectl logs deployment/checkout -c vault-agent
# Should show: "[INFO] template server: rendered ... /vault/secrets/db"
kubectl exec deployment/checkout -c app -- ls -la /vault/secrets/
# Should show: -rw------- 1 root root ... db
If the pod is stuck in Init:Error or Init:CrashLoopBackOff, check the init container logs first:
kubectl logs deployment/checkout -c vault-agent-init
The most common failure is a role binding that doesn't match (a typo in the service account name or namespace). Vault rejects the login with a 403, the init container exits with an error, and the app container never starts: Kubernetes only starts app containers once every init container has succeeded.
With the setup above, the app reads its secrets once, at pod start, and never again. For most secrets that's fine. For credentials with short TTLs (like those from Vault's database secrets engine), you want continuous renewal:
vault.hashicorp.com/agent-inject-secret-db: "database/creds/checkout-role"
vault.hashicorp.com/agent-inject-template-db: |
  {{- with secret "database/creds/checkout-role" -}}
  export DB_USER="{{ .Data.username }}"
  export DB_PASSWORD="{{ .Data.password }}"
  {{- end }}
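The agent renews the lease as it approaches expiry; once the lease can't be renewed any further, it fetches fresh credentials and re-renders the file. The application has to either re-read the file periodically or receive a SIGHUP-style reload signal. Vault Agent can send one: the vault.hashicorp.com/agent-inject-command-db annotation runs a command after each render, though signalling a process in the app container also requires shareProcessNamespace: true on the pod.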
We don't use this for most apps; the rotation complexity isn't worth it for static API keys. We do use it for our DB credentials, where Vault issues short-lived per-pod credentials (5-minute TTL) and the agent rotates them automatically.
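The "re-read the file periodically" option can be as small as checking the file's modification time before each use. A sketch with a hypothetical helper class (not part of any Vault library); it re-parses only when the agent has rewritten the file, so calling it on every request or from a timer stays cheap.

```python
import os
import shlex


def parse_exports(text: str) -> dict:
    """Minimal parser for `export KEY="value"` / `KEY=value` lines (no expansion)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)
        if tokens and tokens[0] == "export":
            tokens = tokens[1:]
        for token in tokens:
            key, sep, value = token.partition("=")
            if sep:
                env[key] = value
    return env


class ReloadingSecretFile:
    """Hypothetical helper: re-parse a rendered secrets file whenever its
    modification time changes, so rotated credentials are picked up
    without restarting the pod."""

    def __init__(self, path: str):
        self.path = path
        self._mtime_ns = None
        self._values = {}

    def get(self) -> dict:
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self.path, encoding="utf-8") as f:
                self._values = parse_exports(f.read())
            self._mtime_ns = mtime_ns
        return self._values
```

Usage: create `ReloadingSecretFile("/vault/secrets/db")` once and call `.get()` wherever the app opens a new database connection; existing connections keep their old credentials until they are recycled.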
CI doesn't run as a Kubernetes pod, so the Kubernetes auth method doesn't apply. Instead we use Vault's JWT auth method, fed the OIDC tokens that GitHub Actions issues to each job:
vault auth enable jwt
vault write auth/jwt/config \
oidc_discovery_url="https://token.actions.githubusercontent.com" \
bound_issuer="https://token.actions.githubusercontent.com"
vault write auth/jwt/role/github-actions-deploy \
role_type="jwt" \
user_claim="actor" \
bound_audiences="https://github.com/yourorg" \
bound_claims='{"repository_owner":"yourorg","ref":"refs/heads/main"}' \
policies="ci-deploy" \
ttl=15m
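Why a pull-request workflow can't use this role: Vault compares the token's aud claim with bound_audiences and every bound_claims entry with the token's claims. A simplified model of those checks; signature, issuer, and expiry validation are omitted, and the claim values (including the PR ref) are illustrative.

```python
def jwt_role_accepts(claims: dict, bound_audiences: list, bound_claims: dict) -> bool:
    """Simplified model of a Vault JWT role's claim checks against a decoded
    GitHub Actions OIDC token. A list value in bound_claims means "any of these"."""
    aud = claims.get("aud")
    token_audiences = aud if isinstance(aud, list) else [aud]
    if not any(a in bound_audiences for a in token_audiences):
        return False
    for key, expected in bound_claims.items():
        allowed = expected if isinstance(expected, list) else [expected]
        if claims.get(key) not in allowed:
            return False
    return True


ROLE_AUDIENCES = ["https://github.com/yourorg"]
ROLE_CLAIMS = {"repository_owner": "yourorg", "ref": "refs/heads/main"}

main_push = {"aud": "https://github.com/yourorg", "repository_owner": "yourorg", "ref": "refs/heads/main"}
pr_run = dict(main_push, ref="refs/pull/42/merge")

print(jwt_role_accepts(main_push, ROLE_AUDIENCES, ROLE_CLAIMS))  # True
print(jwt_role_accepts(pr_run, ROLE_AUDIENCES, ROLE_CLAIMS))  # False
```

Note that repository_owner scopes the role to every repository in the org, not just one; binding the repository claim instead would narrow it further.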
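In GitHub Actions (the job needs the id-token: write permission to request an OIDC token):
permissions:
  id-token: write
  contents: read
steps:
  - uses: hashicorp/vault-action@<sha>
    with:
      url: https://your-vault.example.com
      method: jwt
      role: github-actions-deploy
      # Must match bound_audiences on the Vault role; vault-action requests a different audience by default.
      jwtGithubAudience: https://github.com/yourorg
      secrets: |
        secret/data/ci/aws-deploy-key access_key | AWS_ACCESS_KEY_ID ;
        secret/data/ci/aws-deploy-key secret_key | AWS_SECRET_ACCESS_KEY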
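CI gets a short-lived Vault token (15 minutes), and only workflows running on main in yourorg's repositories can obtain one. No long-lived static secrets are stored in GitHub. (The AWS keys themselves are still static; they just live in Vault instead.)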
Vault's audit devices log every request, including every secret read. Enable one before going live; on self-hosted Vault that's:
vault audit enable file file_path=/var/log/vault/audit.log
On HCP Vault, the audit log instead streams through the HashiCorp Cloud Platform's logging integration; we forward it to our SIEM.
Every read of every secret is logged with the requesting identity, timestamp, path, operation, and whether it succeeded; the secret values themselves are HMAC-hashed rather than written in plaintext. When (not if) you have a security review, the audit log is what answers "did anyone read X secret outside normal operations?"
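Answering that question is mostly a filter over the log. A sketch over Vault's JSON audit format (one entry per line, with type, time, auth.display_name, request.operation, request.path, and request.remote_address fields). The two sample entries are made up and heavily trimmed; real entries carry many more fields.

```python
import json


def reads_of(audit_lines, path):
    """Yield (time, identity, remote_address) for every read request on `path`."""
    for line in audit_lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("type") != "request":
            continue  # each call also logs a "response" entry; count requests only
        request = entry.get("request", {})
        if request.get("operation") == "read" and request.get("path") == path:
            auth = entry.get("auth", {})
            yield entry.get("time"), auth.get("display_name"), request.get("remote_address")


sample = [
    '{"time": "2024-05-14T10:02:11Z", "type": "request", "auth": {"display_name": "kubernetes-production-checkout"}, "request": {"operation": "read", "path": "secret/data/production/checkout/db", "remote_address": "10.0.3.17"}}',
    '{"time": "2024-05-14T10:02:11Z", "type": "response", "auth": {"display_name": "kubernetes-production-checkout"}, "request": {"operation": "read", "path": "secret/data/production/checkout/db", "remote_address": "10.0.3.17"}}',
]
for hit in reads_of(sample, "secret/data/production/checkout/db"):
    print(hit)
# ('2024-05-14T10:02:11Z', 'kubernetes-production-checkout', '10.0.3.17')
```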
A short list of issues people on the team have run into:
Pods stuck in Init. The Vault Agent init container can't reach Vault: a network policy issue, a DNS issue, or Vault itself is unreachable. Check kubectl logs <pod> -c vault-agent-init.
Secrets not refreshing. Leased secrets (like database credentials) are re-fetched as their leases run out. Static secrets are different: even if the sidecar re-renders the file, an app that read it once at boot never sees the new values. Restart the pod (or have the app re-read the file, as in the short-TTL setup above) to pick up changes.
Permission denied at login. The Kubernetes auth role's bound_service_account_namespaces (or bound_service_account_names) doesn't include the pod's namespace (or service account). Easy to typo. Quick check: vault read auth/kubernetes/role/your-role.
Token TTL too short. The ttl=1h we set on the role is fine for most apps. For services that read secrets only at boot, the TTL is irrelevant: they don't need the token after init. For services that re-fetch, make sure the TTL covers your refresh interval.
We watch all of this with metrics from Vault's metrics endpoint, scraped by Prometheus.
Start with one app. Get the full path working — Kubernetes auth method, role, policy, agent injector, app reads secrets — before you try to roll out widely. The first one is the slowest; subsequent apps take maybe 30 minutes each.
Decide your secret-path naming convention early. We use {env}/{app}/{secret-name}. Changing convention later is annoying because you have to update both Vault paths AND application code.
Don't migrate everything at once. We did it service-by-service over a sprint. Each migration is small but the cumulative effect is significant. By the end, no one had to think "wait, where does this secret live" — the answer was always "Vault, fetched by the agent."
The biggest win isn't the cryptographic improvement (real but boring). It's the audit trail. Six months in, when somebody asks "who accessed the prod DB password during last Tuesday's incident," the Vault audit log answers in 5 minutes. Without it, that question is unanswerable.