nodetask: add RestartSeid kind (sidecar restart-seid), remove RestartPod #389
RestartSeid is the SeiNode-scoped, sidecar-backed successor to RestartPod. It dispatches the seictl restart-seid task (v0.0.56), which restarts the seid process in place: seid re-reads config.toml WITHOUT bouncing the sidecar, so the sidecar's ready flag survives and there is no ~30-40s mark-ready reapproval gap. Empty payload, no caller-supplied pod UID.

Poll-to-completion (registry false): the controller polls until restart-seid reports terminal. The sidecar waits for seid's RPC to come back and fails loud (no SIGKILL) if seid outlives the grace window; that failure surfaces as a Failed SeiNodeTask. Completion means "seid RPC serving again", not caught-up/voting; gate height with a downstream AwaitNodesAtHeight.

Named RestartSeid (not RestartNode) to avoid conflating a Kubernetes Node with a SeiNode, and to align the kind with the restart-seid sidecar task.

RestartPod is removed entirely (kind, RestartPodPayload/podUID, spec field, the three podUID CEL rules, restartPodParams, restart_pod.go + tests, registry entry). pod_cycle.go stays; replace_pod still uses it. Verified zero live kind:RestartPod CRs on harbor/prod/dev before removal.

Requires seictl v0.0.56. Roll out the controller image (built on v0.0.56) before/with the CRD; a new-CRD + old-controller window fails closed (UnsupportedKind), never silently.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
podCycle was extracted to share the StatefulSet-fetch / owned-pod / delete helpers between replace-pod and restart-pod. With RestartPod removed, replace-pod is the sole user, so the shared abstraction is no longer justified (CLAUDE.md: no premature helpers). Dissolve it:

- fetchStatefulSet/ownedPods/deletePod become methods on replacePodExecution (which now holds cfg ExecutionConfig directly, no embedded podCycle); guardSelectorAndReplicas/ownedByStatefulSet are plain unexported funcs.
- Delete pod_cycle.go / pod_cycle_test.go; the still-relevant unit tests move into replace_pod_test.go and the leftover restart-named fixtures are renamed to replace-pod equivalents.
- Drop podReady (dead: only restart-pod used it).

Pure internal refactor: replace-pod's revision-gated, readiness-blind behavior is byte-for-byte preserved; no API/CRD change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Linear: PLT-438 · Consumes seictl#199 (v0.0.56)
What
`RestartSeid`: the SeiNode-scoped, sidecar-backed successor to `RestartPod`. Dispatches the seictl `restart-seid` task, which restarts the seid process in place (SIGTERM the process → kubelet restarts the container) so seid re-reads `config.toml` without bouncing the sidecar. Removes `RestartPod` entirely.

Named RestartSeid (not RestartNode) to avoid conflating a Kubernetes Node with a SeiNode, and to align the kind with the `restart-seid` sidecar task + `RestartSeidTask` client.

Why
`RestartPod` deleted the whole pod → restarted the sidecar → lost its in-process readiness flag → seid's start-gate + rbac-proxy probe (both on `/v0/healthz`) waited for the controller's ~30s mark-ready reapproval → ~30–40s not-signing gap per restart. Restarting only the seid process keeps the sidecar (and its `ready` flag) alive → no gap. Validated on harbor `arctic-1/syncer-0-0-0` (2026-06-07).

How
- `RestartSeid` kind, empty `RestartSeidPayload`, CEL union + `has(self.restartSeid)`.
- `restartSeidParams` → `(sidecar.TaskTypeRestartSeid, sidecar.RestartSeidTask{})`.
- Registry `(false)` = poll-to-completion: the sidecar task waits for seid's RPC to come back and fails loud (no SIGKILL) if seid outlives the grace window; that failure surfaces as a Failed SeiNodeTask (tested).
- `effectiveTimeout` 10m (envelope > the sidecar's ~6.5m worst case).
- Removed: `RestartPodPayload`/`podUID`, spec field, the three `podUID` CEL rules, `restartPodParams`, `restart_pod.go` + tests, registry entry. `pod_cycle.go` kept (used by `replace_pod`).

Completion = "seid RPC serving again", NOT caught-up/voting; gate height with a downstream `AwaitNodesAtHeight`.

Breaking change: verified safe
Removing `RestartPod` from the kind enum is a one-way door. Confirmed zero live `kind: RestartPod` SeiNodeTask CRs on harbor/prod/dev (the RestartPod-carrying controller image was never rolled out), and no platform/workflow/runbook reference. `kind: RestartPod` is now rejected by the enum (regression-tested).

Rollout order
Requires the controller image rebuilt against seictl v0.0.56. Roll out the controller image before/with the CRD: a new-CRD + old-controller window fails closed (`UnsupportedKind`: a RestartSeid CR Fails synthesis, never silently no-ops), so the ordering is safe either way; no live consumer depends on RestartSeid yet.

Test
- `SeiNodeTaskParamsFor` maps RestartSeid → restart-seid + `RestartSeidTask{}`; nil payload → reasoned `ParamsBuildFailed`.
- `restartSeid`; union rejects multi-payload; `kind: RestartPod` rejected by the enum.
- `Reason=TaskFailed`).
- `make manifests generate`, `make test`, `make test-integration`, `golangci-lint --new-from-rev=origin/main` → 0.

🤖 Generated with Claude Code