From dfd73b2d010c3b553dcb09ec22c5f1054ece2a0e Mon Sep 17 00:00:00 2001 From: bdchatham Date: Mon, 8 Jun 2026 11:40:11 -0700 Subject: [PATCH] docs(planner): bring doc.go to the data-structure documentation standard MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Expand internal/planner/doc.go to the package data-structure documentation standard: add Type Roles, Invariants (each tagged with its guarding test or marked documented-but-unguarded), Zero-Value & Sentinel Semantics, and Concurrency & Staleness. The optimistic-lock + single-patch story previously lived only in CLAUDE.md — this resolves that doc drift by stating it at the structure's definition. Existing sections are preserved verbatim. Also wire the idiomatic-reviewer subagent into CLAUDE.md so idiom-conformance review is automatically in the loop for this repo. Produced by the Tide /idiomatic expert (sei-protocol/Tide#126) piloting on this package. Co-Authored-By: Claude Opus 4.8 (1M context) --- CLAUDE.md | 1 + internal/planner/doc.go | 99 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 100 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index 1f6e020..6095e9e 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -15,6 +15,7 @@ Kubernetes operator for managing Sei blockchain nodes. Single binary, two contro Always use the available subagents for relevant work: - **kubernetes-specialist**: Use for all Kubernetes design, deployment, troubleshooting, and operational decisions. Consult before making changes to CRDs, RBAC, kustomize configs, StatefulSet specs, or any cluster-facing resource definitions. - **platform-engineer**: Use for architectural decisions about platform composability, developer experience, CI/CD workflows, and infrastructure abstraction patterns. +- **idiomatic-reviewer**: Use to review code changes for idiomatic conformance to Go, controller-runtime, and this repo's own documented patterns. It digests this CLAUDE.md + the package's `doc.go` into a local idiom profile that outranks generic idiom, then gives two-altitude feedback (design + surgical), each finding cited. Reviews for idiom; it does not author the system (that's kubernetes-specialist). Backed by the `/idiomatic` skill. ## Code Standards diff --git a/internal/planner/doc.go b/internal/planner/doc.go index 8684150..93876b6 100644 --- a/internal/planner/doc.go +++ b/internal/planner/doc.go @@ -26,6 +26,105 @@ // conditions when creating plans and when observing terminal plans. The executor // does not set conditions — it only mutates plan/task state and phase transitions. // +// # Type Roles +// +// TaskPlan: the structure this package owns. Constructed by the plan builders +// +// (buildBasePlan, buildNodeUpdatePlan, buildMarkReadyPlan, the group +// builders); persisted by the controller into .status.plan; read by the +// planner on subsequent reconciles; mutated in-memory by Executor. Carries +// Phase, the ordered Tasks, TargetPhase, FailedPhase, and on failure +// FailedTaskIndex/FailedTaskDetail. +// +// PlannedTask: one step of a TaskPlan. Built by buildPlannedTask with a +// +// deterministic ID (task.DeterministicTaskID), serialized params, and a +// per-type retry budget (taskMaxRetries). Lives inside TaskPlan.Tasks in +// .status; advanced in-memory by advanceTask. +// +// NodeResolver / NodePlanner / GroupPlanner: the builders. NodeResolver.ResolvePlan +// +// is the node entry point; ForGroup selects a GroupPlanner. Stateless apart +// from the optional BuildSidecarClient factory (nil in tests to skip the probe). +// +// Executor[T client.Object]: drives a persisted plan. Stateless per reconcile — +// +// re-deserializes the current task each call. Mutates only plan/task state +// and, for *SeiNode, .Status.Phase. It never writes to the cluster and never +// persists status; the controller owns the single flushing patch. +// +// # Invariants +// +// Assertions that must always hold. Each names its guarding test, or is marked +// unguarded. +// +// - Atomic plan creation: a newly built plan is persisted before any task +// executes. The controller requeues immediately after the creating +// reconcile (planAlreadyActive gate in controller/node/controller.go); +// execution begins on the next reconcile, so external observers see the +// plan before any side effect. Guarded for persistence by +// TestReconcile_CreatesPlanOnFirstRun; the persist-BEFORE-execute ordering +// (no task submitted on the creating reconcile) is documented but unguarded. +// - Single-patch model: each reconcile snapshots obj.DeepCopy() once, +// accumulates all status mutations (plan state, conditions, phase, +// currentImage) in-memory, and flushes one Status().Patch at the end. The +// executor and the planner mutate only the in-memory object. Enforced at the +// call site (controller/node/controller.go); patch-count-per-path is +// documented but unguarded. +// - Optimistic-lock writes: every .status patch base is built with +// client.MergeFromWithOptimisticLock{} so a stale reconcile cannot silently +// clobber a fresher one and corrupt plan-creation idempotency. Enforced at +// the call site; the stale-write race itself is unguarded (would require an +// envtest concurrency harness). +// - FailedPhase == "" means retry, not terminal: a NodeUpdate plan leaves +// FailedPhase empty, so on failure setTargetPhase is a no-op and the node +// stays Running; the planner rebuilds the plan on the next reconcile. An +// Init plan sets FailedPhase=Failed and transitions out. The empty value is +// guarded by node_update_test.go; the resulting retry-not-terminal phase +// behavior (node remains Running after a NodeUpdate failure) is documented +// but unguarded. +// - nil plan == steady state: when no drift is detected for a Running node, +// buildRunningPlan returns nil and .status.plan stays nil. Absence of a plan +// is the steady state, not an error. Guarded by +// TestBuildRunningPlan_SidecarReady_NoPlan / _SidecarUnknown_NoPlan. +// - Terminal-plan cleanup: when the planner observes a Complete or Failed +// plan it updates conditions and nils .status.plan; an Active plan is left +// untouched. Guarded by the TestHandleTerminalPlan_* tests. +// - Conditions are always-present: condition state is expressed as +// True/False/Unknown with a stable Reason, never by removal (see +// CLAUDE.md "Conditions"). isConditionTrue asserts on Status, not presence. +// +// # Zero-Value & Sentinel Semantics +// +// - .status.plan == nil: no active plan — steady state for a Running node, or +// a resource that has not yet been resolved. Not an error. +// - TaskPlan.FailedPhase == "": failures retry rather than transitioning the +// node to a terminal phase. A non-empty FailedPhase (e.g. PhaseFailed on Init +// plans) is the terminal target. +// - TaskPlan.TargetPhase == "": no phase transition on completion. Group plans +// (SeiNodeDeployment) leave it empty; setTargetPhase is a no-op for any +// object that is not a *SeiNode, so only SeiNode plans drive .Status.Phase. +// - PlannedTask.SubmittedAt == nil: the task has not been submitted yet (or was +// reset for retry). Set once on first submission; used as the plan-duration +// start. Cleared on retry so resubmission re-stamps it. +// - PlannedTask.MaxRetries == 0: the first ExecutionFailed is terminal. +// - FailedTaskIndex / FailedTaskDetail == nil: set only on a Failed plan; +// nil on Active/Complete plans. +// +// # Concurrency & Staleness +// +// Reconciles for the same object run serially within the controller, but two +// near-simultaneous reconciles (e.g. across a leader change or a watch + requeue) +// can both observe .status.plan == nil and both build a plan. To prevent the +// second from silently overwriting the first and corrupting plan-creation +// idempotency, all status writes use a base built with +// client.MergeFromWithOptimisticLock{} (resourceVersion-checked); a stale patch +// is rejected and that reconcile retries. The planner and executor perform no +// I/O against status themselves — they mutate the in-memory object only, and the +// controller flushes exactly one optimistic-lock-protected Status().Patch per +// reconcile (the single-patch model). See sei-k8s-controller/CLAUDE.md +// "Status patches" and "Single-patch model" for the repo-wide rule. +// // # Plan Types // // Init plans transition a node from Pending to Running. They include