We use feature flags on roughly every customer-facing change. The pitch — "deploy code dark, then turn it on gradually" — is true and we've leaned on it heavily. The operational reality is messier than the pitch suggests: flag debt, evaluation latency, accidental fail-open states, and the question of "who's actually allowed to flip what." This post is what we've learned after a couple of years of running flags at scale.
Three primary use cases:
Gradual rollouts. New feature ships to 1% of users, then 10%, then 50%, then 100% over a few days. Catches issues that don't show up in staging — bad inputs, weird edge cases, scale problems. Roughly 80% of our flag usage.
Kill switches. A piece of code that we might need to disable fast. Payments retry logic, AI feature paths, expensive computations. If something starts misbehaving in prod, we flip the kill switch and dig in. ~10% of usage.
Customer-specific overrides. A specific customer needs a feature ahead of general availability (or after it's been deprecated). Targeting rules in the flag platform handle this. ~10% of usage.
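The three use cases above can be sketched with a hypothetical minimal flag client — the names (`FlagClient`, `is_enabled`) and the state shape are illustrative, not any real SDK's API:

```python
import zlib

class FlagClient:
    def __init__(self, state):
        # state: flag name -> {"enabled": bool, "percent": int, "allow": set}
        self.state = state

    def is_enabled(self, name, user_id="", customer=None):
        flag = self.state.get(name)
        if flag is None:
            return False  # unknown flag: fail closed
        if customer in flag.get("allow", set()):
            return True   # customer-specific override wins
        if not flag.get("enabled", False):
            return False  # kill switch: flipped off, nothing below runs
        percent = flag.get("percent", 100)
        # stable bucket so a user stays in (or out) as the rollout ramps
        return zlib.crc32(f"{name}:{user_id}".encode()) % 100 < percent
```

The ordering matters: overrides beat the kill switch, the kill switch beats the rollout percentage.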
We do NOT use flags for:
A/B experimentation. Measuring which variant performs better needs stats tooling and exposure tracking, not a rollout toggle.
Application configuration. Values that differ per environment belong in config management, not in the flag platform.
Billing and entitlements. Whether a customer has paid for a feature is a billing-system question, not a flag.
Each of these tools-vs-flags confusions has bitten us at least once. Keeping the categories separate matters.
We evaluated several providers and have run two of them in production. Brief comparison from actually using them:
LaunchDarkly — most mature feature set, broadest platform support, expensive at scale. Best fit for large teams that need fine-grained targeting, experimentation, and lots of language SDKs. We ran this for the first ~18 months; the bill grew faster than our team did.
GrowthBook — open source, can self-host or use their cloud, simpler model. Lacks some of LaunchDarkly's advanced targeting. Fine for "I need feature flags, not feature experiments." We moved here for the cost/simplicity trade.
Unleash — similar shape to GrowthBook, also OSS. We didn't run it but it's the alternative we'd consider.
Roll your own — for small teams with simple needs, a feature_flags table in your database with a small SDK is enough. The maintenance gets real once you cross ~30 flags or need percentage rollouts with consistent bucketing.
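The roll-your-own stage really can be this small — a sketch using an in-memory SQLite table, with illustrative schema and helper names (this is the "simplest thing that works" stage, not a product):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE feature_flags (
        name    TEXT PRIMARY KEY,
        enabled INTEGER NOT NULL DEFAULT 0
    )
""")
conn.execute("INSERT INTO feature_flags VALUES ('checkout.express-pay.enabled', 1)")

def is_enabled(name: str) -> bool:
    row = conn.execute(
        "SELECT enabled FROM feature_flags WHERE name = ?", (name,)
    ).fetchone()
    return bool(row and row[0])  # unknown flags fail closed
```

What this deliberately lacks — percentage rollouts, consistent bucketing, targeting, audit history — is exactly the gap that eventually pushes you onto a platform.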
The general pattern: start with the simplest thing that works, switch when the gap to "real" flag platform features is causing pain. We probably should have started with GrowthBook instead of LaunchDarkly. The migration cost was real.
A few practices that survived the migration and a couple of incidents:
Flag-on-by-default for cleanup. New flags default to "on" in code, with the platform default being "off" for rollouts. Once a flag has been fully rolled out and stable, we can remove the platform configuration and the code keeps working.
Naming conventions. Every flag is <area>.<feature>.<purpose>, like checkout.express-pay.enabled or payments.retry-v2.kill-switch. Searchable, scannable, hard to mix up.
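The convention is cheap to enforce mechanically — a hypothetical lint check for the `<area>.<feature>.<purpose>` shape:

```python
import re

# three lowercase dot-separated segments, hyphens and digits allowed
FLAG_NAME = re.compile(r"^[a-z0-9-]+\.[a-z0-9-]+\.[a-z0-9-]+$")

def valid_flag_name(name: str) -> bool:
    return FLAG_NAME.fullmatch(name) is not None
```

Running this in CI against the flag platform's export catches drift before a badly named flag ships.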
Owner per flag. Each flag has a tagged owner — usually the team that introduced it. When flag debt accumulates, we know who to ask. We've been burned by orphan flags from departed engineers.
Expiry dates on rollouts. Every rollout flag has a target removal date. After full ramp + stability, the flag should be removed within a few weeks. Without expiry, flags accumulate forever.
SDK init in service template. All services init the flag SDK the same way, with the same fallback behavior, the same logging. Reduces the surface area for "this service handles flags differently."
What's bitten us:
Evaluation latency. Some SDKs do remote evaluation per check — a network round-trip every time you ask "is this flag on?" Latency adds up. We use SDKs that bulk-fetch flag state at startup and re-fetch every 30 seconds in the background. Each isEnabled() check becomes a local map lookup.
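The bulk-fetch pattern is roughly this — a sketch where `fetch_all_flags` stands in for whatever your platform's bulk endpoint returns:

```python
import threading
import time

class CachedFlags:
    def __init__(self, fetch_all_flags, refresh_seconds=30):
        self._fetch = fetch_all_flags
        self._state = fetch_all_flags()  # blocking fetch at startup
        self._lock = threading.Lock()
        t = threading.Thread(
            target=self._loop, args=(refresh_seconds,), daemon=True
        )
        t.start()

    def _loop(self, interval):
        while True:
            time.sleep(interval)
            try:
                fresh = self._fetch()
                with self._lock:
                    self._state = fresh
            except Exception:
                pass  # keep serving the last known state on fetch failure

    def is_enabled(self, name: str) -> bool:
        with self._lock:
            return self._state.get(name, False)  # local lookup, no network
```

The important property is that a platform outage degrades to "stale flags" rather than "slow or failing flag checks."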
Fail-open vs fail-closed. When the flag platform is unreachable, what's the default? Both options have failure modes: fail-open can silently enable a half-finished feature during a platform outage; fail-closed can silently disable a feature your users depend on.
We pick per-flag. Kill switches fail-open (we'd rather keep the feature working in a platform outage than be unable to disable broken code). New-feature flags fail-closed (better to not ship to users than to ship a half-tested feature without supervision).
The SDK has explicit fail_open / fail_closed defaults per flag, set at creation time.
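A sketch of what per-flag failure defaults look like — the client shape is hypothetical, but the point is that the fallback lives with the flag definition, not with each caller:

```python
FAIL_OPEN, FAIL_CLOSED = True, False

# set at creation time, alongside the flag itself
FLAG_DEFAULTS = {
    "payments.retry-v2.kill-switch": FAIL_OPEN,    # keep the feature running
    "checkout.express-pay.enabled":  FAIL_CLOSED,  # don't ship unsupervised
}

def is_enabled(name, platform_value=None):
    # platform_value is None when the flag platform is unreachable
    if platform_value is None:
        return FLAG_DEFAULTS.get(name, FAIL_CLOSED)
    return platform_value
```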
Bucketing consistency. Percentage rollouts should consistently hash the same user into the same bucket. Switching providers changed the hash function — a user who was at 50% with LaunchDarkly might be at 30% with GrowthBook. We migrated users by hand for the dozen flags that needed continuity.
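Consistent bucketing is a small amount of code — hash (flag, user) to a stable 0-99 bucket so a user's position never moves as the percentage ramps. A sketch (MD5 here is for distribution, not security; real SDKs use their own, mutually incompatible hashes, which is exactly why switching providers reshuffles users):

```python
import hashlib

def bucket(flag: str, user_id: str) -> int:
    digest = hashlib.md5(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    return bucket(flag, user_id) < percent
```

A user whose bucket is 7 is in at 10%, still in at 50%, still in at 100% — they never flap in and out as you ramp.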
Flag dependencies. Flag A is on only when Flag B is also on. The platforms support this but it gets messy fast. We avoid chained dependencies; if logic requires multiple flags, encode it in code with one flag as input rather than wiring dependencies in the platform.
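"Encode it in code with one flag as input" looks like this — a hypothetical express-pay gate where the platform holds exactly one flag and the extra conditions live in code where they can be unit-tested:

```python
def show_express_pay(express_pay_enabled: bool, region: str) -> bool:
    # one platform flag in; local conditions layered on top in code,
    # instead of chaining Flag A -> Flag B inside the platform
    return express_pay_enabled and region in {"us", "ca"}
```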
The flag that became permanent. A handful of our oldest flags have been "100% rolled out" for over a year but never got removed because the code paths under them are subtly different. They're now de facto configuration toggles. We're slowly cleaning them up; the lesson is that "remove this flag" is a real piece of work that has to be scheduled, not assumed.
A quarterly flag review: we walk the full list of active flags, owner by owner. Each item gets one of: keep (with reason), remove, or rewrite. The review takes about 30 minutes and catches 3-5 flags ripe for removal each quarter.
Without this, flag count grows monotonically and the platform turns into a graveyard of dead toggles. We've seen orgs with 1000+ flags and no idea which are live — every code search returns multiple flag checks per file.
A few gaps we work around:
Code-side cleanup. Removing a flag requires removing the SDK calls AND deleting the platform configuration. No platform we've used reliably finds dead flag references in your codebase. We grep periodically for flags that no longer exist in the platform.
Cross-environment coordination. Flag values in dev vs staging vs prod are managed independently by default. We've shipped code that worked because dev had the flag on and broke in prod where it was off. We now keep a synced flag spec per environment with explicit per-env overrides documented.
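The synced spec can be as simple as one source of truth with explicit per-env overrides — the structure below is illustrative, the point being that "dev had it on, prod didn't" shows up in code review rather than in an incident:

```python
FLAG_SPEC = {
    "checkout.express-pay.enabled": {
        "default": False,
        "overrides": {"dev": True, "staging": True},  # prod stays at default
    },
}

def flag_value(name: str, env: str) -> bool:
    spec = FLAG_SPEC[name]
    # any env not listed explicitly gets the default -- no silent divergence
    return spec["overrides"].get(env, spec["default"])
```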
Auditing who flipped what. Most platforms log changes but the UIs aren't great for "show me every flag change in the last month." We export the audit log weekly to S3 and grep it when we need to.
Use the platform, don't roll your own. Past ~30 flags, you want bucketing, targeting, audit logs, and a UI for non-engineers. Building this is real work; using a platform is cheaper.
Pick the simplest provider that fits your scale. GrowthBook or Unleash for most teams; LaunchDarkly when you actually need its advanced features.
Flag debt is real. Schedule removal. Don't assume someone will get to it.
Decide fail-open vs fail-closed per flag, at creation. Not at incident time.
Kill switches need to be tested. A kill switch that's never been flipped is theater. Run tabletop exercises that include actually flipping them.
Don't reach for flags for non-flag problems. A/B experimentation, configuration, billing — different tools, different shapes.
Feature flags are one of the most useful patterns in modern deployment. The operational discipline around them — naming, ownership, expiry, audits — is what determines whether they help long-term or turn into a slow swamp of dead toggles. The platforms do part of the work; the discipline is on you.