
Add feature flag system with rollouts, plan gating, and back-office admin#896

Merged
tjementum merged 155 commits into main from feature-flags May 15, 2026


@tjementum (Member) commented May 14, 2026

Summary & Motivation

Introduce a full feature flag system spanning the backend (definitions, persistence, evaluator, reconciler), authentication plumbing (JWT refresh, cookie middleware, YARP transform), the SPA infrastructure (typed hook, header-driven cache), the back-office admin surface, and the user-facing settings panels. Flag state is computed server-side at token refresh and propagated to the SPA via the x-user-feature-flags response header, eliminating per-render polling and keeping evaluation consistent across refreshes, navigations, and cross-tab sessions.

Two of the declared flags are load-bearing: google-oauth gates the OpenID Connect sign-in path and subscriptions toggles the Stripe billing surface. The rest (beta-features, sso, account-overview, compact-view, experimental-ui) are illustrative — they demonstrate each subtype and exercise the back-office UI but aren't part of the framework. Downstream products should delete the illustrative ones and declare their own.

Definitions and registry

  • Flags are public static readonly FeatureFlagDefinition fields on SharedKernel.FeatureFlags.FeatureFlags. The registry is reflected from those fields at startup, so adding a flag is a one-line declaration with no manual array maintenance. Definitions live in FeatureFlags.cs; the reflection/validation mechanism lives in FeatureFlagsRegistry.cs (partial class) so the file developers edit stays free of plumbing
  • Sealed subtypes — SystemFeatureFlag, TenantAbTestFlag, UserAbTestFlag, PlanGatedTenantFlag, TenantOwnerConfigurableFlag, UserConfigurableFlag — enforce every cross-property invariant (scope, configurability, AB eligibility, plan tier, kill-switch, stable-module) at compile time. trackInTelemetry and isKillSwitchEnabled are required parameters on every subtype so the decisions are explicit at the call site
  • Every backend build runs the GenerateFeatureFlagsManifest MSBuild target which serializes the registry to JSON and then runs a Node script that produces labels.generated.ts (Lingui t macros for every flag's Label and Description) and registry.generated.ts (typed runtime registry used by useFeatureFlag<Key>()). The result: the SAME flag declaration becomes a strongly-typed key union on both sides — removing a flag in C# raises a TypeScript compile error in every SPA call site — and Lingui's extractor picks up the generated t macros so each flag's English text lands in the shared .po catalogue ready for translators
  • Keys are validated as lowercase kebab-case so they are safe to use verbatim in URLs, JWT claim payloads, telemetry property names, and frontend route params
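The key rule and the C#-to-TypeScript key union can be illustrated with a small TypeScript sketch. The regex is an assumption about what "lowercase kebab-case" means here (starts with a letter, single hyphens between lowercase alphanumeric segments); the hand-written featureFlagKeys array is a hypothetical stand-in for registry.generated.ts, and isEnabled is a simplified, non-React stand-in for useFeatureFlag, not its actual signature.

```typescript
// Assumed reading of "lowercase kebab-case": starts with a letter, single hyphens
// between lowercase alphanumeric segments, no leading, trailing, or double hyphens.
const KEBAB_CASE_KEY = /^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$/;

export function isValidFlagKey(key: string): boolean {
  return KEBAB_CASE_KEY.test(key);
}

// Hypothetical stand-in for registry.generated.ts: `as const` turns the array
// into a tuple of string literals, so FeatureFlagKey is a literal union.
export const featureFlagKeys = ["google-oauth", "subscriptions", "beta-features"] as const;
export type FeatureFlagKey = (typeof featureFlagKeys)[number];

// Fail at module load if a declared key breaks the rule
// (the PR performs this validation in C# at startup).
for (const key of featureFlagKeys) {
  if (!isValidFlagKey(key)) {
    throw new Error(`Invalid feature flag key: ${key}`);
  }
}

// Simplified stand-in for useFeatureFlag<Key>(): only keys in the union compile.
export function isEnabled(enabledKeys: ReadonlySet<string>, key: FeatureFlagKey): boolean {
  return enabledKeys.has(key);
}

// isEnabled(new Set(["google-oauth"]), "google-oauth"); // compiles
// isEnabled(new Set(), "removed-flag");                  // compile error: not a FeatureFlagKey
```

Deleting a key from the tuple turns every call site that still passes it into a type error, which is the effect the generated registry has on the SPA.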

Schema and bucket assignment

  • One feature_flags table holds both the global base row per flag and the override rows scoped to tenants and users, deduplicated by a unique (flag_key, tenant_id, user_id) index with NULLS NOT DISTINCT
  • Indexes target the hot evaluator paths, and a partial index covers plan-driven rows
  • rollout_bucket columns are added to tenants and users and back-filled via a one-shot van der Corput sequence so even a 1% rollout is evenly distributed from day one; subsequent inserts pull from dedicated Postgres sequences to remain race-safe under concurrent signups
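The van der Corput back-fill can be sketched in TypeScript. The base-2 radical inverse is the standard van der Corput construction; the 100-bucket range (one bucket per rollout percent) and the mapping from row index to bucket are assumptions, since the PR does not state the bucket count or the order in which existing rows are visited.

```typescript
// Base-2 van der Corput radical inverse: reflect the binary digits of n
// across the radix point (1 -> 0.1b = 0.5, 2 -> 0.01b = 0.25, 3 -> 0.11b = 0.75).
export function vanDerCorput(n: number): number {
  let result = 0;
  let denominator = 1;
  let remaining = n;
  while (remaining > 0) {
    denominator *= 2;
    result += (remaining % 2) / denominator;
    remaining = Math.floor(remaining / 2);
  }
  return result;
}

// Assumption: 100 buckets, one per rollout percent.
export const BUCKET_COUNT = 100;

// Assign buckets to existing rows in a fixed order (e.g. by creation time).
// Every prefix of the sequence is spread across the whole range instead of
// filling buckets 0, 1, 2, ... in order.
export function backfillBuckets(rowCount: number): number[] {
  const buckets: number[] = [];
  for (let i = 0; i < rowCount; i++) {
    buckets.push(Math.floor(vanDerCorput(i) * BUCKET_COUNT));
  }
  return buckets;
}

// backfillBuckets(8) -> [0, 50, 25, 75, 12, 62, 37, 87]
```

Because consecutive rows land far apart in the range, a small rollout window picks up a spread of existing tenants and users rather than only the oldest or newest ones.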

Evaluation

  • FeatureFlagEvaluator runs at JWT refresh and returns the keys enabled for the (tenant, user) pair. Precedence per flag: manual per-flag override > entity-global A/B inclusion pin > rollout bucket range > plan tier > default off
  • RolloutBucketHasher derives a stable per-flag starting bucket from the flag key and handles the wrap-around case (e.g. range 90..10)
  • A/B inclusion pins are unconditional and trump the rollout: AlwaysOn forces inclusion regardless of rollout percentage, NeverOn forces exclusion regardless. Use them to escape-hatch a tenant or user out of (or into) a rollout decision without changing the global percentage
  • Flags are evaluated in topological order and parent dependencies are limited to a single level, so a child flag is only evaluated after its parent has been considered
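The per-flag starting bucket, the wrap-around range check, and the precedence chain can be sketched together in TypeScript. FNV-1a is swapped in as the hash purely for illustration (the PR does not say which hash RolloutBucketHasher uses), the 100-bucket range is an assumption, FlagInputs, AbPin, and evaluateFlag are hypothetical names, and parent dependencies are left out.

```typescript
// Assumption: 100 buckets, one per rollout percent.
const BUCKETS = 100;

// FNV-1a (32-bit), used here only as an example of a stable string hash.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // reinterpret as unsigned 32-bit
}

// Stable per-flag starting bucket, so different flags cover different slices of users.
export function startBucket(flagKey: string): number {
  return fnv1a32(flagKey) % BUCKETS;
}

// A rollout of `percent` covers `percent` consecutive buckets beginning at `start`,
// wrapping past the last bucket (start 90 at 21% covers 90..99 and 0..10).
export function isInRollout(bucket: number, start: number, percent: number): boolean {
  const offset = (bucket - start + BUCKETS) % BUCKETS;
  return offset < percent;
}

export type AbPin = "AlwaysOn" | "NeverOn";

export interface FlagInputs {
  manualOverride?: boolean; // explicit per-flag override for this tenant or user
  abPin?: AbPin;            // entity-global A/B inclusion pin
  rolloutPercent?: number;  // 0..100
  planGranted?: boolean;    // a plan-driven row exists for this tenant
}

// Precedence: manual override > A/B pin > rollout range > plan tier > off.
// Assumption: a bucket outside the rollout range falls through to the plan tier.
export function evaluateFlag(flagKey: string, bucket: number, inputs: FlagInputs): boolean {
  if (inputs.manualOverride !== undefined) return inputs.manualOverride;
  if (inputs.abPin === "AlwaysOn") return true;
  if (inputs.abPin === "NeverOn") return false;
  if (
    inputs.rolloutPercent !== undefined &&
    isInRollout(bucket, startBucket(flagKey), inputs.rolloutPercent)
  ) {
    return true;
  }
  return inputs.planGranted === true;
}
```

Measuring the distance from the start bucket modulo the bucket count handles the wrap-around case with a single comparison instead of two range checks.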

Reconciliation, soft-delete, and orphan handling

  • FeatureFlagDefinitionReconciler runs once at startup and converges the database to the C# definitions: missing base rows get inserted, rows whose key was removed from code are marked orphaned, and re-adding the same key auto-restores the base row plus any orphaned tenant/user overrides
  • Reconciliation fails fast if a developer adds a flag whose key matches a previously hard-deleted row, an explicit guard against silent data resurrection
  • The back-office surfaces orphans, lets admins inspect previously-overridden tenants and users, and offers a manual delete that cascades to every override row
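The reconciler's convergence rules amount to a set difference between the declared keys and the stored base rows. The TypeScript sketch below computes that plan; BaseRow, ReconcilePlan, planReconcile, and the separate hardDeletedKeys set are hypothetical (the PR does not say how hard-deleted keys are remembered), and applying the plan, including restoring orphaned tenant and user overrides, is left to the caller.

```typescript
export interface BaseRow {
  key: string;
  orphaned: boolean; // key is no longer declared in code
}

export interface ReconcilePlan {
  insert: string[];  // declared keys with no base row yet
  orphan: string[];  // stored keys that are no longer declared
  restore: string[]; // orphaned keys declared again (overrides come back with them)
}

export function planReconcile(
  declaredKeys: readonly string[],
  baseRows: readonly BaseRow[],
  hardDeletedKeys: ReadonlySet<string>,
): ReconcilePlan {
  const declared = new Set(declaredKeys);
  const rowsByKey = new Map(baseRows.map((row): [string, BaseRow] => [row.key, row]));
  const plan: ReconcilePlan = { insert: [], orphan: [], restore: [] };

  for (const key of declaredKeys) {
    // Fail fast rather than silently resurrecting data an admin deleted on purpose.
    if (hardDeletedKeys.has(key)) {
      throw new Error(`Flag "${key}" was hard-deleted; choose a new key.`);
    }
    const row = rowsByKey.get(key);
    if (row === undefined) plan.insert.push(key);
    else if (row.orphaned) plan.restore.push(key);
  }

  for (const row of baseRows) {
    if (!row.orphaned && !declared.has(row.key)) plan.orphan.push(row.key);
  }
  return plan;
}
```

Computing the plan before touching the database keeps the fail-fast check ahead of any write, so a bad key aborts startup with the database unchanged.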

Plan-gated flags

  • PlanGatedTenantFlag ties a flag to a PlanTier (Basis / Standard / Premium). A background subscription-state evaluator writes plan-driven rows on upgrade and clears them on downgrade, so feature gating survives subscription changes without the customer-facing app having to know about feature flags
  • Manual overrides on a plan-gated flag are honored and tagged separately so admins can distinguish a deliberate grant from a plan-driven entitlement
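The plan-driven part of PlanGatedTenantFlag reduces to comparing tiers and diffing against the rows already written. This TypeScript sketch assumes the tiers are ordered Basis < Standard < Premium with higher tiers inheriting lower-tier flags, and uses hypothetical flag keys and function names. It only computes plan-driven rows, so manual overrides, which the PR tags separately, are left alone.

```typescript
// Assumption: tiers are ordered, and a higher tier includes every lower tier's flags.
const PLAN_TIERS = ["Basis", "Standard", "Premium"] as const;
export type PlanTier = (typeof PLAN_TIERS)[number];

export interface PlanGatedFlag {
  key: string;
  requiredTier: PlanTier;
}

// null models a tenant without an active subscription.
export function meetsTier(current: PlanTier | null, required: PlanTier): boolean {
  return current !== null && PLAN_TIERS.indexOf(current) >= PLAN_TIERS.indexOf(required);
}

// Plan-driven rows to write (upgrade) or clear (downgrade) after a subscription change.
export function diffPlanRows(
  flags: readonly PlanGatedFlag[],
  newTier: PlanTier | null,
  existingPlanRowKeys: ReadonlySet<string>,
): { write: string[]; clear: string[] } {
  const write: string[] = [];
  const clear: string[] = [];
  for (const flag of flags) {
    const entitled = meetsTier(newTier, flag.requiredTier);
    const hasRow = existingPlanRowKeys.has(flag.key);
    if (entitled && !hasRow) write.push(flag.key);
    if (!entitled && hasRow) clear.push(flag.key);
  }
  return { write, clear };
}
```

Diffing against existing rows makes the evaluator idempotent: replaying the same subscription event produces an empty diff.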

Authentication and header propagation

  • AuthenticationCookieMiddleware and the YARP response transform were merged so cookie refresh and x-user-feature-flags emission happen sequentially in one hook, eliminating the race the split design had between cookie swap and flag evaluation. The refresh token's jti is rotated on every inline refresh so the next endpoint-triggered refresh uses a fresh identifier
  • AuthenticationProvider (shared-webapp) reads the header from every authenticated response and pushes it into a React state slice; useFeatureFlag reads straight from that state instead of polling the server
  • When the endpoint-triggered refresh fails transiently, the gateway suppresses the x-user-feature-flags header rather than emitting a stale claim; the SPA keeps its previous state and reconciles on the next successful refresh
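On the SPA side, the header handling boils down to a small state-transition rule. The sketch assumes the header carries a comma-separated list of enabled keys (the PR does not spell out the wire format) and uses hypothetical function names; ResponseLike is a structural subset of the Fetch API Response, so a real fetch response can be passed in.

```typescript
export const FEATURE_FLAGS_HEADER = "x-user-feature-flags";

// Structural subset of the Fetch API Response: Headers.get returns null when absent.
export interface ResponseLike {
  headers: { get(name: string): string | null };
}

// Absent header: assumed to mean the gateway suppressed it (e.g. after a transient
// refresh failure), so keep the previous flags. Present but empty: no flags enabled.
export function nextFeatureFlags(
  previous: ReadonlySet<string>,
  headerValue: string | null,
): ReadonlySet<string> {
  if (headerValue === null) return previous;
  const keys = headerValue
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
  return new Set(keys);
}

export function featureFlagsFromResponse(
  previous: ReadonlySet<string>,
  response: ResponseLike,
): ReadonlySet<string> {
  return nextFeatureFlags(previous, response.headers.get(FEATURE_FLAGS_HEADER));
}
```

In a React provider the result would go into state; returning the same set instance when the header is missing lets React bail out of the state update instead of re-rendering consumers.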

Back-office admin UI

  • List page with status, scope, A/B eligibility, configurability, and orphan/delete badges
  • Detail page with live rollout %, audience stats chips, and sortable/filterable paginated tenant and user tables backed by server-side sort and a stable tie-break by ID
  • Activate/Deactivate toggle is shown only for kill-switch flags so non-kill-switch flags can never be globally deactivated by accident; the toggle is hidden everywhere else (admins use overrides instead)
  • Single-click override toggle with optimistic state, A/B-aware color coding, and a three-click cycle that clears a redundant override when the resulting state would match the rollout default
  • Per-entity "Feature flag rollouts" dialog launched from the "..." menu on account and user detail pages (First / Default / Last) with a corresponding badge on the page header
  • Manual orphan delete dialog, read-only badge for stable modules, and the 5-minute claim refresh window surfaced in the page subtitle and as the description on every mutation toast
  • Every back-office write endpoint has its expected authorization policy pinned with an architecture-test guard so future changes can't silently widen access
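Sorting on a non-unique column (tenant name, override state) makes page boundaries unstable unless ties are broken by a unique key. The PR does this server-side; the TypeScript comparator below only illustrates the ordering rule, with hypothetical row shapes and the assumption that the ID tie-break is always ascending regardless of sort direction. In SQL the equivalent is appending the ID column to the ORDER BY clause.

```typescript
export type SortDirection = "asc" | "desc";

function compareValues(a: string | number, b: string | number): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  return String(a).localeCompare(String(b));
}

// Compare by the selected column, then by ID, so rows with equal values
// always come out in the same order and never jump between pages.
export function withIdTieBreak<T extends { id: string }>(
  sortValue: (row: T) => string | number,
  direction: SortDirection,
): (a: T, b: T) => number {
  return (a, b) => {
    const primary = compareValues(sortValue(a), sortValue(b));
    if (primary !== 0) return direction === "asc" ? primary : -primary;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
  };
}

export function page<T>(rows: readonly T[], pageIndex: number, pageSize: number): T[] {
  return rows.slice(pageIndex * pageSize, (pageIndex + 1) * pageSize);
}

// Usage (hypothetical row shape):
// const sorted = [...tenants].sort(withIdTieBreak((t: { id: string; name: string }) => t.name, "asc"));
// const secondPage = page(sorted, 1, 25);
```

Without the tie-break, two rows with the same name can swap places between requests, so one shows up on both pages and the other on neither.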

Self-service UI

  • /account/settings Features panel surfaces every TenantOwnerConfigurableFlag to tenant owners; non-owners see a read-only state
  • A user preferences panel surfaces UserConfigurableFlag instances so each user can toggle preferences independently of the tenant
  • Both panels share the same OpenAPI-typed mutations as the back-office, so behavior stays consistent

Telemetry

  • Telemetry events fire on every flag mutation (activation, rollout %, override, A/B inclusion pin, manual delete) with snake_case property names aligned to OpenTelemetry conventions
  • Override events carry a FeatureFlagOverrideTrigger axis (Internal / Owner / Self) so dashboards can attribute every change to its originator. Plan-source transitions emit their own dedicated FeatureFlagPlanOverrideActivated/Deactivated events
  • OpenTelemetryEnricher and ApplicationInsightsTelemetryInitializer emit a single comma-separated user.feature_flags dimension carrying every TrackInTelemetry flag the user has enabled, so dashboards can group-by feature flag with native KQL. The feature_flag.* namespace is intentionally avoided because OpenTelemetry reserves it
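The dimension value itself is a filtered, joined list. The PR builds it in C# (OpenTelemetryEnricher, ApplicationInsightsTelemetryInitializer); this TypeScript sketch only shows its shape, assuming keys are sorted for stable values and that the dimension is omitted when no tracked flag is enabled. EnabledFlag and featureFlagsDimension are hypothetical names.

```typescript
// Dimension name from the PR; the feature_flag.* namespace is avoided on purpose.
export const FEATURE_FLAGS_DIMENSION = "user.feature_flags";

export interface EnabledFlag {
  key: string;
  trackInTelemetry: boolean;
}

// Comma-separated keys of enabled flags marked for telemetry. Sorting (an assumption)
// keeps the value identical for users who have the same set of flags.
export function featureFlagsDimension(enabled: readonly EnabledFlag[]): string | undefined {
  const keys = enabled
    .filter((flag) => flag.trackInTelemetry)
    .map((flag) => flag.key)
    .sort();
  return keys.length > 0 ? keys.join(",") : undefined;
}
```

On the query side, a comma-separated value like this can be expanded with KQL's split() function and the mv-expand operator before summarizing by flag.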

Checklist

  • I have added tests, or done manual regression tests
  • I have updated the documentation, if necessary

@tjementum tjementum requested a review from a team as a code owner May 14, 2026 21:58
@tjementum tjementum added the Enhancement and Deploy to Staging labels May 14, 2026
@tjementum tjementum self-assigned this May 14, 2026
tjementum added 21 commits May 15, 2026 00:04
…ble groups, Account rename, and user-scoped detail
tjementum added 22 commits May 15, 2026 13:30
@tjementum tjementum moved this to 🏗 In Progress in Kanban board May 15, 2026
@tjementum tjementum linked an issue May 15, 2026 that may be closed by this pull request
@tjementum tjementum removed the Deploy to Staging label May 15, 2026

@tjementum tjementum merged commit ad4fc63 into main May 15, 2026
29 checks passed
@tjementum tjementum deleted the feature-flags branch May 15, 2026 16:26
@github-project-automation github-project-automation Bot moved this from 🏗 In Progress to ✅ Done in Kanban board May 15, 2026

Labels

Enhancement

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

Feature flag system

1 participant