84 changes: 84 additions & 0 deletions docs/config-manager/DESIGN.md
@@ -0,0 +1,84 @@
# Configuration manager — design

> This PR ships the design only; implementation lands as separate PRs (sei-config + sei-chain). Phase vocabulary from [CLAUDE.md](../../CLAUDE.md): Phase 2 = today's two-file layout; Phase 3 = unified `sei.toml`.

## Background

A `seid` node's config is spread across `config.toml` (Tendermint), `app.toml` (Cosmos + Sei sections: `evm`, `state-store`, `giga_executor`, …), `client.toml`, cobra flags, and `SEID_*`/`SEI_*` env vars resolved by Viper — loaded in `PersistentPreRunE` (`root.go:79-104` → `InterceptConfigsPreRunHandler` → `interceptConfigs`, in the vendored `sei-cosmos` fork). The **sei-config library already exists and is the asset** (unified `SeiConfig`, `DefaultForMode()`, `Validate()`, a key→env→file registry, `SEI_*`/`SEID_*` resolution, an empty `MigrationRegistry` at `CurrentVersion=1`, atomic two-file IO) — **but nothing calls it yet.** This project makes `seid` a **consumer** of that library: a second, *experimental* configuration manager that `seid` selects over the legacy loader when an experimental env var (or config setting) says so. The risk is entirely at that selection seam and in round-trip fidelity against *real* config files — not in the library.

## Goals

- `seid` **selects** its configuration manager at startup via an experimental env var/config — legacy loader (default) vs the sei-config-backed manager; legacy path byte-for-byte unchanged.
- The sei-config-backed manager exposes the library's capabilities in-binary (`doctor` / `generate --mode` / `migrate`).
- A versioning contract for migrating config safely across `seid` releases.
- All changes stay **inside the sei-chain repo**, with zero lines changed in the vendored `sei-cosmos` fork.

## Non-goals

- Phase 3 (one physical `sei.toml`).
- Owning `seid init`, `client.toml`, secrets, or hot-reload.
- Writing migration functions (`CurrentVersion=1`, nothing to migrate).
- Folding sei-config into sei-chain *now* (it stays a dependency; see *Future: fold-in*).
- Mode extensions (`replayer`, `seed`/CRD; see [Appendix A](#appendix-a--modes-beyond-the-core)).

## ⚠️ Decision: the experimental manager lives in `seid`, driven by urfave/cli v3

The sei-config-backed manager — including its `config …` surface (`doctor`/`generate`/`migrate`/`show`) — runs **inside the `seid` binary** on urfave/cli v3. There is no separate config tool: `seid` is the single consumer and single binary, so CLI-vs-node version skew is impossible (the tool *is* the node binary). The one seam-relevant constraint: **`config` must skip `PersistentPreRunE`** — that hook runs for every subcommand, so without a guard `seid config …` would trigger the legacy interception (and under `SEI_CONFIG_MANAGER=v2`, the gated seam *recursively*), mutating files just to inspect them. Extend the `init` skip at `root.go:97` to also short-circuit `config`. Delegation pattern + accepted costs: [Appendix B](#appendix-b--seid-config-cobraurfave-integration).
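The guard described above can be sketched without cobra. This is a minimal illustration, assuming the hook can see the command's space-separated path (cobra exposes one via `CommandPath()`); `skipInterception` is a hypothetical helper, and the real check at `root.go:97` may be shaped differently.

```go
package main

import (
	"fmt"
	"strings"
)

// skipInterception reports whether config interception should be bypassed for
// a command, given its space-separated path (for example "seid config doctor").
// Hypothetical helper: the real guard at root.go:97 may be shaped differently.
func skipInterception(commandPath string) bool {
	parts := strings.Fields(commandPath)
	if len(parts) < 2 {
		return false // bare root command: no subcommand to exempt
	}
	switch parts[1] { // first subcommand under the root
	case "init", "config":
		return true
	default:
		return false
	}
}

func main() {
	for _, path := range []string{"seid start", "seid init", "seid config doctor"} {
		fmt.Printf("%-20s skip=%v\n", path, skipInterception(path))
	}
}
```

Matching on the first subcommand rather than the leaf name means every verb under `config` inherits the skip, so adding a verb later cannot reintroduce the file-mutating side effect.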

## Architecture — two repos

| Piece | Repo | Role |
|---|---|---|
| `seiconfig` library | sei-config | The brain (exists). Resolution, modes, validate, migrate, legacy IO. **Consumed by** `seid`. |
| Selection seam | sei-chain | `PersistentPreRunE` reads the experimental flag; routes to legacy or the sei-config-backed manager. |
| Experimental manager | sei-chain | In-binary, urfave/cli v3: resolves config + `config …` verbs (`doctor \| generate \| migrate \| show`). |

**Future (deferred).** Two consolidations may follow, both reversible and out of scope now: (1) **fold-in** — sei-config collapses into the sei-chain tree so `seid` owns its own config; the dependency arrow is one-way (sei-chain → sei-config), so it's a `git mv` + import change with the seam unchanged. (2) **controller consolidation** — `sei-k8s-controller` stops calling the library directly and drives config through `seid` itself, making `seid` the single config authority. Both deferred; the only discipline now is keeping sei-config a clean leaf, which CLAUDE.md already mandates.

**The seam contract (load-bearing).** Inject at `root.go:101-103`, after the `init` skip. When gated on, the new path must produce exactly what the legacy path does, because `start.go`/`newApp` read config through **two** channels: `serverCtx.Config` (a fully-populated, `SetRoot`/`ValidateBasic`-passing `*tmcfg.Config`) **and** `serverCtx.Viper` (used as `AppOptions` — every Sei section read via `appOpts.Get("evm.http_port")` dotted lookups). So the gated path **materializes the two legacy files, then re-enters the same Viper read+merge tail** (`sei-cosmos/server/util.go:162-219, 317-323`) and calls `bindFlags` for flag>env>file precedence. **It must not feed `app.New` from the in-memory struct** — that silently drops unmodeled keys. Test for: Viper left unpopulated, flag-precedence inversion, `init`-vs-`start` divergence. `client.toml` is handled before the gate, out of scope.
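The flag > env > file ordering that the gated path must preserve can be pictured as a layered lookup over dotted keys. The sketch below is only a model: plain maps stand in for Viper, the `layer` type and `resolve` function are hypothetical, and the port values are invented. In the real seam, precedence comes from re-entering the existing Viper tail and `bindFlags`, not from new code like this.

```go
package main

import "fmt"

// layer is one configuration source, keyed by dotted config key.
type layer map[string]string

// resolve returns the value from the highest-precedence layer that sets key.
// Layers are passed lowest precedence first.
func resolve(key string, lowToHigh ...layer) (string, bool) {
	for i := len(lowToHigh) - 1; i >= 0; i-- {
		if v, ok := lowToHigh[i][key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Invented values; only the key name comes from the design.
	modeDefault := layer{"evm.http_port": "8545"}
	file := layer{"evm.http_port": "9545"}
	env := layer{"evm.http_port": "9646"}
	flags := layer{} // no flag passed on this run

	v, ok := resolve("evm.http_port", modeDefault, file, env, flags)
	fmt.Println(v, ok)
}
```

A flag-precedence inversion, one of the failure modes listed above, would show up in this model as the layers being passed in the wrong order.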

## Env-var gate contract

`SEI_CONFIG_MANAGER` (experimental, opt-in) is **value-based**: `v2` → the sei-config-backed manager; unset/`legacy` → legacy (default); anything else → hard startup error (never a silent fallback). It is read via raw `os.Getenv` at the top of `PersistentPreRunE`, which keeps the gate a clean two-way door. **`SEI_` collision (must-fix):** `seid` already claims `SEI_` via `WithViper("SEI")` and the library uses `SEI_` too, so with the gate on both resolve the same env vars, which is fine *only if they agree on destination*. The implementation PR ships a **collision audit** (diff Viper `AutomaticEnv` `SEI_*` keys vs the library's `buildEnvMap`; any disagreement blocks release). Precedence (low→high): mode default < file < `SEI_*` env < flag; `SEI_*` beats deprecated `SEID_*` (with a stderr warning).
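The value-based gate can be sketched as a small pure function plus one `os.Getenv` call. The `Manager` type and `selectManager` name are hypothetical. Two choices here are assumptions the design does not settle: an empty value is treated the same as unset, and matching is exact (no trimming or case folding).

```go
package main

import (
	"fmt"
	"os"
)

// Manager identifies which configuration manager seid should use.
type Manager int

const (
	ManagerLegacy Manager = iota
	ManagerV2
)

func (m Manager) String() string {
	if m == ManagerV2 {
		return "v2"
	}
	return "legacy"
}

const gateEnvVar = "SEI_CONFIG_MANAGER"

// selectManager maps the raw env value to a manager. Unknown values are a
// hard error so a typo can never fall back silently to the legacy loader.
func selectManager(raw string) (Manager, error) {
	switch raw {
	case "", "legacy": // assumption: empty is treated like unset
		return ManagerLegacy, nil
	case "v2":
		return ManagerV2, nil
	default:
		return ManagerLegacy, fmt.Errorf("%s=%q: expected \"v2\", \"legacy\", or unset", gateEnvVar, raw)
	}
}

func main() {
	m, err := selectManager(os.Getenv(gateEnvVar))
	if err != nil {
		fmt.Fprintln(os.Stderr, "startup error:", err)
		os.Exit(1)
	}
	fmt.Println("selected manager:", m)
}
```

Keeping the parse separate from the environment read makes the three outcomes unit-testable without touching process state.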

## Versioning & migration

A `schema_version` integer owned by the registry, **decoupled from the seid release** (bumps only on shape change). `doctor` compares it to `CurrentVersion`: newer → refuse to start; older → "migration available." **No auto-migrate on boot** — migration is explicit (`seid config migrate`), dry-run by default, `.bak` before `--write`, no-ops when current; auto-migrate + no-downgrade is a per-pod one-way door that breaks rollback. The MVP seam only **stamps `schema_version` on write**; `doctor`/refuse-on-newer/migrate ride with the (deferred) CLI and the first real migration.
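The three-way comparison that `doctor` performs can be sketched as follows. The `Verdict` type and `checkSchemaVersion` function are hypothetical names; only the rule itself (newer refuses, older reports a migration, equal is clean) comes from the text above.

```go
package main

import "fmt"

// CurrentVersion mirrors the value stated in the design.
const CurrentVersion = 1

// Verdict is the outcome of comparing an on-disk schema_version with the binary's.
type Verdict int

const (
	UpToDate Verdict = iota
	MigrationAvailable
	RefuseNewer
)

func (v Verdict) String() string {
	switch v {
	case MigrationAvailable:
		return "migration available"
	case RefuseNewer:
		return "refuse to start"
	default:
		return "up to date"
	}
}

// checkSchemaVersion never mutates anything: migration stays an explicit,
// separate step, so a rollback to an older binary remains possible.
func checkSchemaVersion(onDisk, current int) Verdict {
	switch {
	case onDisk > current:
		return RefuseNewer
	case onDisk < current:
		return MigrationAvailable
	default:
		return UpToDate
	}
}

func main() {
	for _, onDisk := range []int{0, 1, 2} {
		fmt.Printf("on-disk=%d current=%d: %s\n", onDisk, CurrentVersion, checkSchemaVersion(onDisk, CurrentVersion))
	}
}
```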

## Modes

Keep the prototype's four — `validator / full / seed / archive`. Modes own **static, role-shaped defaults at generate time only** (which APIs/EVM/state-store are on, pruning, listen addresses); **nothing at runtime**. Per-node identity (`moniker`, `persistent_peers`, `external_address`, keys) comes from operator/controller overrides, **never** a mode default — guard test: `Validate()` fails CI if an identity key appears in any mode's defaults. `DefaultForMode(mode)` stays pure. (Taxonomy nuance → [Appendix A](#appendix-a--modes-beyond-the-core).)
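The identity guard can be sketched as a scan over each mode's defaults. This is an illustration under assumptions: defaults are modeled as flat maps of dotted keys, only the final key segment is compared, and the `p2p.` prefix and sample values are invented. Node keys, which the text also treats as identity, are not modeled here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// identityKeys lists per-node identity settings named in the design.
var identityKeys = map[string]bool{
	"moniker":          true,
	"persistent_peers": true,
	"external_address": true,
}

// leakedIdentityKeys returns a sorted list of "mode/key" entries for every
// identity key found in a mode's defaults. Only the final dotted segment is
// compared, which is an assumption about how keys are named.
func leakedIdentityKeys(defaults map[string]map[string]string) []string {
	var leaks []string
	for mode, kv := range defaults {
		for key := range kv {
			last := key[strings.LastIndex(key, ".")+1:]
			if identityKeys[last] {
				leaks = append(leaks, mode+"/"+key)
			}
		}
	}
	sort.Strings(leaks)
	return leaks
}

func main() {
	defaults := map[string]map[string]string{
		"validator": {"evm.http_port": "8545"},
		"seed":      {"p2p.persistent_peers": "node-a:26656"}, // deliberate violation
	}
	fmt.Println(leakedIdentityKeys(defaults))
}
```

A CI test would assert that the returned slice is empty for every mode the library ships.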

## MVP — the first implementation PR

**Value:** *a real `seid` home dir resolves through the library and produces a node that behaves identically to the legacy path, behind an off-by-default flag*, which de-risks everything downstream. **In:** gate + seam (both channels); the collision audit; a **fidelity test against a sanitized real `config.toml`/`app.toml`** asserting that every operator-set key `seid` consumes survives read→write (the non-negotiable safety property); a `KNOWN_UNMAPPED_FIELDS` list (e.g. `ChainID` lives in genesis.json). **Done:** legacy path provably unchanged with the flag unset; gated path starts identically and refuses to start on `Validate()` errors; `make ci` green.
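The core assertion of the fidelity test can be sketched as a key-by-key comparison of the config before and after a read→write round trip. This is a model only: configs are flattened to string maps, `lostKeys` is a hypothetical helper, the sample keys and values are invented, and treating `KNOWN_UNMAPPED_FIELDS` as an exemption set is an assumption about how that list would be used.

```go
package main

import (
	"fmt"
	"sort"
)

// knownUnmapped lists keys exempt from the round-trip check. "ChainID" is the
// example from the design; using the list as an exemption set is an assumption.
var knownUnmapped = map[string]bool{"ChainID": true}

// lostKeys returns, sorted, every key in before whose value is missing or
// different in after, skipping known-unmapped keys.
func lostKeys(before, after map[string]string) []string {
	var lost []string
	for key, want := range before {
		if knownUnmapped[key] {
			continue
		}
		if got, ok := after[key]; !ok || got != want {
			lost = append(lost, key)
		}
	}
	sort.Strings(lost)
	return lost
}

func main() {
	before := map[string]string{"evm.http_port": "9545", "state-store.enable": "true", "ChainID": "example-1"}
	after := map[string]string{"evm.http_port": "9545"} // round trip dropped a key
	fmt.Println("lost:", lostKeys(before, after))
}
```

The real test would run this comparison over sanitized production fixtures and fail on any non-empty result.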

## Deferred (un-defer trigger)

- **`seid config …` CLI** → once the seam is proven on a non-prod node. Thin wrappers over `Validate()`/`DefaultForMode()`; must ship a deterministic exit-code scheme (0 = clean/no-op; nonzero = validation-fail/migration-aborted; distinct code for refuse-on-newer) for initContainer/Job use.
- **Unified `sei.toml` (Phase 3)** → after the two-file round-trip is trusted on ≥3 real fixtures for a release cycle.
- **Migration functions** → when the first breaking change forces `CurrentVersion`→2.
- **K8s render-at-init + secret-field enforcement** → before any env uses ConfigMap delivery (until then, secret deny-list documented, not enforced).
- **Mode/CRD alignment** ([Appendix A](#appendix-a--modes-beyond-the-core)) → when generating `replayer` nodes is required.
- **Controller consolidation onto `seid`** → `sei-k8s-controller` drives config through `seid` instead of calling the library directly (see *Future*); until then the library exposes `ConfigIntent` for direct use.
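The deterministic exit-code scheme called for in the `seid config …` item above could look like the sketch below. The specific numbers (`1`, `3`) and the sentinel error names are assumptions; the design fixes only that `0` means clean/no-op, failures are nonzero, and refuse-on-newer gets its own code.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors a config verb might return (hypothetical names).
var (
	ErrValidation       = errors.New("validation failed")
	ErrMigrationAborted = errors.New("migration aborted")
	ErrNewerSchema      = errors.New("on-disk schema_version is newer than this binary")
)

// Exit codes. Only 0 is fixed by the design; the other values are assumptions.
const (
	ExitOK          = 0
	ExitFailure     = 1
	ExitNewerSchema = 3
)

// exitCode maps an error to a stable process exit code. errors.Is is used so
// that wrapped errors still map to the right code.
func exitCode(err error) int {
	switch {
	case err == nil:
		return ExitOK
	case errors.Is(err, ErrNewerSchema):
		return ExitNewerSchema
	default:
		return ExitFailure
	}
}

func main() {
	wrapped := fmt.Errorf("doctor: %w", ErrNewerSchema)
	for _, err := range []error{nil, ErrValidation, ErrMigrationAborted, wrapped} {
		fmt.Printf("%v -> exit %d\n", err, exitCode(err))
	}
}
```

An initContainer or Job can then branch on the code alone, without parsing stderr.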

## Open questions

1. Replicate `SEI_LOG_LEVEL` extrapolation (`util.go:187-217`) in the gated path, or accept a documented delta?
2. Where does on-disk `schema_version` live for legacy-only checkouts — a managed header in `app.toml`, or only the future `sei.toml`?
3. Final `seid config …` naming/flags.

## Cross-repo coordination

**This PR (sei-config):** the design; any library contract change (e.g. version stamping) follows as a sei-config PR, tagged for sei-chain to pin. **Follow-up sei-chain PR(s):** the env-gated seam + fidelity test + collision audit, then the in-binary CLI. The seam PR must not merge until the collision audit passes and the fidelity test is green.

---

## Appendix A — modes beyond the core

Out of core scope; captured so the analysis isn't lost.

- The deployed `SeiNode` CRD union is `validator / fullNode / archive / replayer` — **no `seed`**, and **`fullNode`** (not `full`). The prototype ships `validator / full / seed / archive`.
- `seed` produces operator-CLI defaults only and has no CRD target; `replayer` (mandatory snapshot + peers) is first-class in the fleet but absent from the prototype.
- Aligning the enum — add `replayer`, reconcile `seed`, settle `full` vs `fullNode` — is a one-way door only on the **`generate --mode` CLI surface** (the public contract). The migration registry keys on integer version, not mode strings, so a `v1→v2` migration function rewrites `cfg.Mode` in-place and absorbs the rename cleanly. **Deferred per owner decision; un-defer when generating `replayer` nodes is required.**
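How a version-keyed registry absorbs a mode rename can be sketched as follows. Everything here is hypothetical: `Config`, `Migration`, `registry`, and `migrate` are stand-ins rather than the library's types, and the `full` → `fullNode` direction is chosen purely for illustration, since the design leaves that question open.

```go
package main

import "fmt"

// Config is a stand-in for the library's config struct (hypothetical shape).
type Config struct {
	SchemaVersion int
	Mode          string
}

// Migration upgrades a config by exactly one schema version.
type Migration func(cfg *Config) error

// registry maps a source version to the function that upgrades from it.
// It is keyed on integers, so mode strings never appear as keys.
var registry = map[int]Migration{
	1: renameFullMode,
}

// renameFullMode rewrites the mode in place; other modes pass through untouched.
func renameFullMode(cfg *Config) error {
	if cfg.Mode == "full" {
		cfg.Mode = "fullNode" // illustrative direction only
	}
	return nil
}

// migrate applies registered steps until cfg reaches target; it is a no-op
// when cfg is already at or beyond target.
func migrate(cfg *Config, target int) error {
	for cfg.SchemaVersion < target {
		step, ok := registry[cfg.SchemaVersion]
		if !ok {
			return fmt.Errorf("no migration registered from version %d", cfg.SchemaVersion)
		}
		if err := step(cfg); err != nil {
			return fmt.Errorf("migrating from version %d: %w", cfg.SchemaVersion, err)
		}
		cfg.SchemaVersion++
	}
	return nil
}

func main() {
	cfg := Config{SchemaVersion: 1, Mode: "full"}
	if err := migrate(&cfg, 2); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```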

## Appendix B — `seid config` cobra↔urfave integration

- **Delegation:** one `config` cobra command with `DisableFlagParsing: true` (already used at `root.go:176,200`) hands the raw arg tail to the urfave `cli.Command`; urfave owns only the `config` subtree and never sees global cobra flags. Errors propagate via `RunE` to `main.go`'s `os.Exit`; urfave's own exit handler is a no-op.
- **Accepted costs:** go.mod already carries urfave/cli **v2** (load-bearing in `sei-db/…/litt/cli`); v3 adds a second major version — legal, deliberate. Shell completion can't introspect a `DisableFlagParsing` subtree, so `config` subcommands won't autocomplete (deferred).