Version-pinned modules across many repos. The release process, semver discipline, and the breaking-change communication that keeps a shared registry sane.
The first version of any "shared Terraform module" lives in one repo, sourced by Git URL with a branch reference. Six months later there are ten consumers, three of them are broken by recent module changes, and nobody's sure what version anyone is on. The fix is real version management: semver, a registry, and discipline about breaking changes. This post is the model we run.
When module consumers source by branch:
```hcl
module "vpc" {
  source = "git::ssh://git@github.com/company/tf-modules.git//vpc?ref=main"
}
```
Every fresh terraform init (a clean CI checkout, or a local terraform init -upgrade) pulls whatever main points to at that moment. The module owner ships a change; one consumer's next apply silently inherits it. Sometimes that's fine. Often it isn't.
We've had:
Each of these would have been avoidable with version pinning.
The shape we run:
- A single monorepo (tf-modules/) with subdirectories per module. One repo, one CI pipeline, easy to reason about.
- Per-module Git tags: vpc-v1.2.3, eks-cluster-v3.0.0. Tags are the source of truth for "version X of module Y."

Consumers pin the tag in the source ref:

```hcl
module "vpc" {
  source = "git::ssh://git@github.com/company/tf-modules.git//vpc?ref=vpc-v1.2.3"
}
```
That's the whole system. No private registry server, no fancy tooling. Git tags + discipline.
Standard semver: MAJOR.MINOR.PATCH.
The hard part: deciding what's "breaking." Some changes look minor and aren't:
- Adding a validation block on an existing variable (existing usage might fail validation now).
- […] terraform state mv to manage drift might be affected).

When in doubt, treat it as breaking. The cost of an unnecessary major version is low; the cost of breaking consumers silently is high.
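To make the validation case concrete, here is a minimal sketch. The variable name is borrowed from the consumer example in this post; the rule itself (prefix length of /16 or longer) is an assumption made up for illustration, not something our vpc module enforces.

```hcl
variable "cidr_block" {
  type        = string
  description = "CIDR range for the VPC."

  # Hypothetical rule added in a later release. Before this block existed,
  # any string was accepted; afterwards, a consumer passing "10.0.0.0/8"
  # fails at plan time without having changed anything on their side.
  validation {
    # try() turns a malformed value (no "/" or a non-numeric suffix) into false
    # instead of an evaluation error.
    condition     = try(tonumber(split("/", var.cidr_block)[1]) >= 16, false)
    error_message = "The cidr_block prefix length must be /16 or longer."
  }
}
```

Nothing in the module's interface changed shape here (same variable, same type), which is why this kind of change is easy to mislabel as minor.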
A module PR's full lifecycle:
- The PR is labeled with its version impact: breaking, minor, or patch.
- After merge, release automation (release-please or just a script) updates CHANGELOG.md and creates the Git tag.

The version-impact label is the integrity check. We require reviewer agreement on the label: the author can propose minor, but the reviewer confirms whether the change is actually breaking.
Each consumer of a module pins to an exact version:
```hcl
module "vpc" {
  source = "git::ssh://git@github.com/company/tf-modules.git//vpc?ref=vpc-v1.2.3"

  cidr_block = "10.0.0.0/16"
  azs        = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
```
When the module ships v1.3.0 with a new feature, the consumer doesn't get it automatically. They have to update the ref and run terraform plan to see what changes. This is deliberate — explicit upgrades are the point.
Tools like Dependabot (which now supports Terraform) can open PRs against the consumer repos when new module versions ship. We use this for minor and patch updates: the PR is suggested automatically, then reviewed and merged. Major version updates require human consideration.
For breaking releases, the CHANGELOG.md entry includes a "Migration" section:
```markdown
## v3.0.0 - 2026-04-15

### Breaking changes
- Renamed `enable_logs` variable to `enable_flow_logs` for clarity.
- Removed deprecated `legacy_subnet_mode` variable.

### Migration
- Replace `enable_logs = true` with `enable_flow_logs = true` in your consumer.
- If you were using `legacy_subnet_mode`, set `subnet_strategy = "split-az"` to get equivalent behavior.

### New features
- Added support for IPv6 dual-stack mode (`enable_ipv6 = true`).
```
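On the consumer side, following that migration section might look roughly like the block below. This is a sketch under assumptions: that the changelog entry belongs to the vpc module, that the consumer was previously on a 2.x tag (vpc-v2.4.0 is a made-up version), and that it used both of the affected variables.

```hcl
module "vpc" {
  # Previously pinned to ?ref=vpc-v2.4.0 (hypothetical earlier tag).
  source = "git::ssh://git@github.com/company/tf-modules.git//vpc?ref=vpc-v3.0.0"

  cidr_block = "10.0.0.0/16"
  azs        = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # Was: enable_logs = true (variable renamed in v3.0.0).
  enable_flow_logs = true

  # Replaces the removed legacy_subnet_mode variable.
  subnet_strategy = "split-az"
}
```

Because the source ref changed, the consumer needs to run terraform init again before terraform plan; the plan output is where they confirm the migration produced no unintended changes.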
Major releases also trigger a notification to a Slack channel where consumers are subscribed. The notification includes the migration guide link.
We don't force consumers to upgrade on a schedule. They upgrade when they're ready. Older versions of the module remain installable indefinitely, since Git tags stay in place unless someone deletes them.
Old versions remaining usable forever sounds great but accumulates debt. After a year, you have consumers spread across 5+ major versions of the same module. Security fixes only land on new versions; old ones stay vulnerable.
Mitigations:
A few things that look reasonable for some teams and not for us:
Run a Terraform Cloud / Terraform Enterprise registry. Yes, it's nice — has a UI, semver enforcement. Costs money, adds another moving piece. For our scale (one org, ~20 modules, ~40 consumer repos), Git tags + CHANGELOG.md is enough.
Floating version ranges (>=1.0.0). Terraform allows them for registry-sourced modules through the version argument; a Git ref has no range syntax. We never use them. The implicit upgrade is exactly the problem we're avoiding.
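For contrast, here is what the two styles look like side by side. The registry address in the first block is hypothetical (we don't run a registry), and module inputs are omitted for brevity.

```hcl
# Registry source with a floating range: the next terraform init -upgrade
# may select any newer matching release. The address below is made up.
module "vpc_floating" {
  source  = "app.terraform.io/company/vpc/aws"
  version = ">= 1.0.0"
}

# Git source pinned to an exact tag: nothing changes until someone edits ref.
# The version argument is not valid here; it only applies to registry sources.
module "vpc_pinned" {
  source = "git::ssh://git@github.com/company/tf-modules.git//vpc?ref=vpc-v1.2.3"
}
```

One side effect of sourcing from Git is that the exact pin is the only option available, so the rule doesn't depend on anyone remembering it.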
Per-module Git repos. A repo per module is one extreme; a monorepo for all modules is the other. We use the monorepo — one CI pipeline, one set of conventions, easier cross-module changes. The downside (heavier merge contention) hasn't been a problem at our scale.
Auto-bumping based on commit message conventions. Tools like commitlint + conventional commits can derive the version automatically. We tried it; the manual label-on-PR step turned out to be more reliable.
Operational metrics for the module ecosystem:
Skipping the version label on a PR. "It's a small change." Then a consumer breaks. The label discipline only works if applied universally.
Adding required variables in a minor release. A new required variable breaks every consumer who hasn't set it. Required variables = major version bump. (Or: add as optional with a default, then a later major release removes the default if you must.)
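The difference is small in code and large in effect. In this sketch, enable_ipv6 comes from the changelog example above, while owner_team is a hypothetical variable invented for illustration.

```hcl
# Safe for a minor release: optional, and the default preserves prior behavior.
variable "enable_ipv6" {
  type        = bool
  description = "Enable IPv6 dual-stack mode."
  default     = false
}

# Breaking: no default, so every existing consumer fails at plan time
# until they set it. This needs a major version bump.
variable "owner_team" {
  type        = string
  description = "Team that owns the resources created by this module (hypothetical)."
}
```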
Combining breaking and non-breaking changes in one release. Bigger blast radius; harder migration. One PR, one logical change.
Letting the CHANGELOG drift. Without the changelog, consumers don't know what changed between versions. Make it part of the merge gate.
Module versioning is one of those problems that looks bureaucratic and turns out to be operational reliability. The discipline takes a couple of months to land; afterward it's invisible, which is the goal. Modules ship; consumers upgrade when ready; breaking changes are visible and signposted.