feat(sidecar): add restart-seid task for in-place seid restart #199
Adds a `restart-seid` sidecar task that restarts the co-located seid process in place. seid re-reads config.toml on the restart without bouncing the sidecar, so the sidecar's in-process readiness flag survives and /v0/healthz stays 200 (no mark-ready reapproval gap).

The handler finds the running `seid start` process via /proc (corroborating argv[0]==seid with the "start" subcommand so it never matches seid-init or the bash wrapper), drives the existing actions.GracefulStop (SIGTERM → 30s grace → SIGKILL), and completes when seid's local RPC serves /status again. Signalling seid from the sidecar works because the two share the pod PID namespace and run as the same UID. The kubelet restarts the seid container (restartPolicy: Always) once its main process exits. The handler never starts seid and never flips the engine ready flag: it is not a readiness operation.

The three OS interactions (find-pid, signal, probe-rpc-up) are injectable for unit testing.

Adds the RestartSeidTask client struct + SubmitRestartSeidTask helper. Bumps version.json v0.0.55 → v0.0.56.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
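The process-matching rule described in this commit can be illustrated with a small sketch. This is not the PR's code: the function body, its signature, and the choice to accept `start` anywhere after argv[0] are assumptions made for illustration. It only relies on the documented layout of /proc/<pid>/cmdline (NUL-separated argv).

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
)

// isSeidStart reports whether a raw /proc/<pid>/cmdline buffer
// (NUL-separated argv) looks like a `seid start` invocation.
// Hypothetical sketch: the real handler's matching rule may differ.
func isSeidStart(cmdline []byte) bool {
	// Drop the trailing NUL terminator(s), then split into argv.
	args := bytes.Split(bytes.TrimRight(cmdline, "\x00"), []byte{0})
	if len(args) < 2 {
		return false
	}
	// argv[0] must be exactly the seid binary (any directory),
	// which rules out seid-init and a bash wrapper.
	if filepath.Base(string(args[0])) != "seid" {
		return false
	}
	// Corroborate with the "start" subcommand as its own argument.
	for _, a := range args[1:] {
		if string(a) == "start" {
			return true
		}
	}
	return false
}

func main() {
	samples := []string{
		"seid\x00start\x00--home\x00/sei\x00",
		"seid-init\x00start\x00",
		"bash\x00-c\x00until seid start; do sleep 1; done\x00",
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, isSeidStart([]byte(s)))
	}
}
```

Requiring both conditions is what keeps the bash wrapper out: its command string contains the text "seid start", but its argv[0] is `bash` and no single argument equals `start`.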
Address cross-review (k8s + platform + sei-network):

- sei-network BLOCKER: never silently force-kill a validator. Grace 30s→90s (the ~3s figure was idle-only; loaded shutdown = WAL flush + PebbleDB/IAVL close, possibly mid-compaction). Replace the inherited unconditional-SIGKILL GracefulStop with a graceful-only stop: SIGTERM, poll until exit or the grace deadline; if seid is still alive at the deadline, FAIL the task and leave seid running (a stuck-but-alive validator is safer than a force-kill mid-commit). No SIGKILL path remains; no force opt-in added (deferred, YAGNI).
- k8s: close the silent no-op. If the seid process isn't found but its RPC is already serving, return a hard error rather than completing a restart that didn't happen. Genuinely-down (RPC not serving) still proceeds to wait-for-up.
- platform: fix the inaccurate seidRPCUp comment (it checks latest_block_height parses, not node_info.network).
- Document the completion contract: complete = "seid RPC serving again", NOT caught-up/voting; gate height downstream (AwaitNodesAtHeight).

Tests: grace-timeout → fail-without-SIGKILL (asserts only SIGTERM sent); not-found+RPC-down → wait; not-found+RPC-up → hard error; waitForUp context-cancellation; plus happy-path, timeout, isSeidStart table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Linear: PLT-438
What

Adds a `restart-seid` sidecar task that restarts the co-located seid process in place: seid re-reads `config.toml` on the restart without bouncing the sidecar.

Why
Today the only way to make a running node re-read config (e.g. a refreshed `persistent-peers` set from `discover-peers`) is to delete the whole pod (the controller's `RestartPod` kind). That restarts the sidecar too, which loses its in-process readiness flag; seid's start-gate and the rbac-proxy readiness probe both sit on `/v0/healthz`, which only returns 200 after `mark-ready`, re-marked by the controller on a ~30s poll. Net: a ~30–40s not-signing gap per restart on a validator.

Restarting only the seid process keeps the sidecar (and its `ready` flag) alive → `/v0/healthz` stays 200 → seid reboots immediately, no gap. Validated on harbor `arctic-1/syncer-0-0-0` (2026-06-07): seid restarts in place, sidecar `restarts=0`/ready throughout, pod UID unchanged.

How
- Finds the running `seid start` process via `/proc`: `comm == seid` corroborated with the `start` subcommand, so it never matches `seid-init` or the bash wait-loop wrapper. (The sidecar image is distroless; this is done in Go, not via `ps`.)
- Drives the existing `actions.GracefulStop`: SIGTERM → 30s grace → SIGKILL. Works because seid + sidecar share the pod PID namespace (`shareProcessNamespace: true`) and run as the same UID (65532); no `CAP_KILL`.
- Completes when seid's local RPC serves `/status` again (the sidecar's own `/v0/healthz` stays 200, so completion probes seid directly). The kubelet restarts the seid container (`restartPolicy: Always`) once its process exits.
- Never starts seid and never flips the engine `ready` flag: not a readiness op.

The three OS interactions (find-pid / signal / probe-rpc-up) are injectable for unit tests.
Changes

- `sidecar/engine/types.go`: `TaskRestartSeid` task type
- `sidecar/tasks/restart_seid.go` (+ test): the handler; reuses `actions.GracefulStop` / `SignalPID` / `PIDAlive`
- `serve.go`: register the handler
- `sidecar/client/tasks.go`: `TaskTypeRestartSeid` + `RestartSeidTask{}` (mirrors `MarkReadyTask`); `sidecar/client/client.go`: `SubmitRestartSeidTask`
- `version.json`: v0.0.55 → v0.0.56 (cuts the release + container build on merge)

Test
- `isSeidStart` cmdline table.
- `RestartSeidTask` round-trip.
- `go build ./...`, `go test ./sidecar/...` green.

Consumed by
sei-k8s-controller `RestartNode` SeiNodeTask kind (supersedes `RestartPod`), wired after this releases.

🤖 Generated with Claude Code