feat(sidecar): add restart-seid task for in-place seid restart by bdchatham · Pull Request #199 · sei-protocol/seictl

bdchatham · 2026-06-07T02:34:47Z

Linear: PLT-438

What

Adds a restart-seid sidecar task that restarts the co-located seid process in place — seid re-reads config.toml on the restart without bouncing the sidecar.

Why

Today the only way to make a running node re-read config (e.g. a refreshed persistent-peers set from discover-peers) is to delete the whole pod (the controller's RestartPod kind). That restarts the sidecar too, which loses its in-process readiness flag; seid's start-gate and the rbac-proxy readiness probe both sit on /v0/healthz, which only returns 200 after mark-ready — re-marked by the controller on a ~30s poll. Net: a ~30–40s not-signing gap per restart on a validator.

Restarting only the seid process keeps the sidecar (and its ready flag) alive → /v0/healthz stays 200 → seid reboots immediately, no gap. Validated on harbor arctic-1/syncer-0-0-0 (2026-06-07): seid restarts in place, sidecar restarts=0/ready throughout, pod UID unchanged.

How

Find the running seid start process via /proc — comm == seid corroborated with the start subcommand, so it never matches seid-init or the bash wait-loop wrapper. (The sidecar image is distroless; this is done in Go, not via ps.)
Drive the existing actions.GracefulStop: SIGTERM → 30s grace → SIGKILL. Works because seid + sidecar share the pod PID namespace (shareProcessNamespace: true) and run as the same UID (65532) — no CAP_KILL.
Complete when seid's local RPC serves /status again (the sidecar's own /v0/healthz stays 200, so completion probes seid directly). The kubelet restarts the seid container (restartPolicy: Always) once its process exits.
Does not flip the engine ready flag — not a readiness op.
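The `/proc` match described above could look roughly like the sketch below. It is not the PR's code: `looksLikeSeidStart` and `findSeidStart` are hypothetical names, and treating any argument equal to `start` as the subcommand is an assumption made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// looksLikeSeidStart is a hypothetical helper (not the PR's isSeidStart).
// comm is the content of /proc/<pid>/comm; cmdline is the raw,
// NUL-separated content of /proc/<pid>/cmdline.
func looksLikeSeidStart(comm string, cmdline []byte) bool {
	// comm must be exactly "seid": rejects seid-init and the bash wrapper.
	if strings.TrimSpace(comm) != "seid" {
		return false
	}
	args := bytes.Split(bytes.TrimRight(cmdline, "\x00"), []byte{0})
	if len(args) < 2 || filepath.Base(string(args[0])) != "seid" {
		return false
	}
	// Assumption: any argument equal to "start" marks the start subcommand.
	for _, a := range args[1:] {
		if string(a) == "start" {
			return true
		}
	}
	return false
}

// findSeidStart scans procRoot (normally "/proc") for the first matching PID.
func findSeidStart(procRoot string) (int, bool) {
	entries, err := os.ReadDir(procRoot)
	if err != nil {
		return 0, false
	}
	for _, e := range entries {
		pid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // not a PID directory
		}
		comm, err := os.ReadFile(filepath.Join(procRoot, e.Name(), "comm"))
		if err != nil {
			continue // process exited or is unreadable
		}
		cmdline, err := os.ReadFile(filepath.Join(procRoot, e.Name(), "cmdline"))
		if err != nil {
			continue
		}
		if looksLikeSeidStart(string(comm), cmdline) {
			return pid, true
		}
	}
	return 0, false
}

func main() {
	pid, found := findSeidStart("/proc")
	fmt.Println(pid, found)
}
```

Taking the proc root as a parameter keeps the scan testable against a temporary directory.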

The three OS interactions (find-pid / signal / probe-rpc-up) are injectable for unit tests.
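One way to make the three OS interactions injectable is to hold them as function fields, as in the sketch below. `restartDeps` and `restart` are hypothetical names, and the flow is deliberately reduced to find, signal, and a single probe, with no grace period or polling.

```go
package main

import (
	"errors"
	"fmt"
)

// restartDeps is a hypothetical bundle of the three OS interactions.
// Production code would back these with /proc, kill(2) and an HTTP probe.
type restartDeps struct {
	findPID func() (pid int, found bool)
	sigterm func(pid int) error
	rpcUp   func() bool
}

// restart is a simplified flow for illustration: signal seid if it is
// found, then require that the RPC probe reports up.
func restart(d restartDeps) error {
	if pid, found := d.findPID(); found {
		if err := d.sigterm(pid); err != nil {
			return fmt.Errorf("signal seid pid %d: %w", pid, err)
		}
	}
	if !d.rpcUp() {
		return errors.New("seid RPC not serving")
	}
	return nil
}

func main() {
	// Fakes stand in for the real OS calls, as a unit test would do.
	var signaled []int
	fake := restartDeps{
		findPID: func() (int, bool) { return 42, true },
		sigterm: func(pid int) error { signaled = append(signaled, pid); return nil },
		rpcUp:   func() bool { return true },
	}
	fmt.Println(restart(fake), signaled)
}
```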

Changes

sidecar/engine/types.go — TaskRestartSeid task type
sidecar/tasks/restart_seid.go (+ test) — the handler; reuses actions.GracefulStop / SignalPID / PIDAlive
serve.go — register the handler
sidecar/client/tasks.go — TaskTypeRestartSeid + RestartSeidTask{} (mirrors MarkReadyTask); sidecar/client/client.go — SubmitRestartSeidTask
version.json — v0.0.55 → v0.0.56 (cuts the release + container build on merge)

Test

Handler: happy path (found → SIGTERM → gone → RPC up), grace-timeout → SIGKILL escalation, seid-not-found → wait-for-up, RPC-never-up → timeout error; isSeidStart cmdline table.
Client: RestartSeidTask round-trip.
go build ./..., go test ./sidecar/... green.

Consumed by

sei-k8s-controller RestartNode SeiNodeTask kind (supersedes RestartPod) — wired after this releases.

🤖 Generated with Claude Code

Adds a `restart-seid` sidecar task that restarts the co-located seid process in place: seid re-reads config.toml on the restart WITHOUT bouncing the sidecar, so the sidecar's in-process readiness flag survives and /v0/healthz stays 200 (no mark-ready reapproval gap).

The handler finds the running `seid start` process via /proc (corroborating argv[0]==seid with the "start" subcommand so it never matches seid-init or the bash wrapper), drives the existing actions.GracefulStop (SIGTERM → 30s grace → SIGKILL), and completes when seid's local RPC serves /status again. The kubelet restarts the seid container (restartPolicy: Always) once its main process exits; this works because seid and the sidecar share the pod PID namespace and run as the same UID.

The handler never starts seid and never flips the engine ready flag; it is not a readiness operation. The three OS interactions (find-pid, signal, probe-rpc-up) are injectable for unit testing.

Adds the RestartSeidTask client struct + SubmitRestartSeidTask helper. Bumps version.json v0.0.55 → v0.0.56.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cursor · 2026-06-07T02:34:51Z

PR Summary

High Risk
The task sends SIGTERM to a live validator and can leave signing interrupted for minutes if shutdown or RPC recovery fails; graceful-only policy avoids SIGKILL but increases stuck-process risk.

Overview
Adds a restart-seid sidecar task so operators can recycle the co-located seid start process without restarting the sidecar—intended to reload config.toml (e.g. after peer discovery) while keeping the in-process ready flag and /v0/healthz behavior unchanged.

The new RestartSeider handler locates seid start via /proc (not generic seid / init / bash wrappers), sends SIGTERM, waits up to 90s for exit without SIGKILL, then polls local CometBFT /status for up to 5m. It fails if RPC is up but the process is invisible in /proc, and does not treat “RPC up” as caught-up or voting.
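A rough sketch of the "RPC up" check follows, assuming `/status` returns the usual CometBFT JSON-RPC envelope with `result.sync_info.latest_block_height` encoded as a string. `statusLooksUp` is a hypothetical name, and fetching the body over HTTP is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// statusResponse models only the fields this check needs. The envelope
// shape is an assumption based on the standard CometBFT /status response.
type statusResponse struct {
	Result struct {
		SyncInfo struct {
			LatestBlockHeight string `json:"latest_block_height"`
		} `json:"sync_info"`
	} `json:"result"`
}

// statusLooksUp reports whether body decodes and latest_block_height
// parses as an integer. It says nothing about caught-up or voting state.
func statusLooksUp(body []byte) bool {
	var s statusResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return false
	}
	_, err := strconv.ParseInt(s.Result.SyncInfo.LatestBlockHeight, 10, 64)
	return err == nil
}

func main() {
	sample := []byte(`{"jsonrpc":"2.0","id":-1,"result":{"sync_info":{"latest_block_height":"12345"}}}`)
	fmt.Println(statusLooksUp(sample))
}
```

Because the check only proves that the RPC answers with a parseable height, a caller that needs a synced node still has to gate on height separately.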

Wiring: TaskRestartSeid in the engine, handler registration in serve.go, client RestartSeidTask + SubmitRestartSeidTask, unit tests, and version v0.0.56.

Reviewed by Cursor Bugbot for commit 967c90e. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes using default effort and found 3 potential issues.

Reviewed by Cursor Bugbot for commit a7b70b3.

Address cross-review (k8s + platform + sei-network):

- sei-network BLOCKER: never silently force-kill a validator. Grace 30s→90s (the ~3s figure was idle-only; loaded shutdown = WAL flush + PebbleDB/IAVL close, possibly mid-compaction). Replace the inherited unconditional-SIGKILL GracefulStop with a graceful-only stop: SIGTERM, poll until exit or the grace deadline; if seid is still alive at the deadline, FAIL the task and leave seid running (a stuck-but-alive validator is safer than a force-kill mid-commit). No SIGKILL path remains; no force opt-in added (deferred, YAGNI).
- k8s: close the silent no-op. If the seid process isn't found but its RPC is already serving, return a hard error rather than completing a restart that didn't happen. Genuinely-down (RPC not serving) still proceeds to wait-for-up.
- platform: fix the inaccurate seidRPCUp comment (it checks latest_block_height parses, not node_info.network).
- Document the completion contract: complete = "seid RPC serving again", NOT caught-up/voting; gate height downstream (AwaitNodesAtHeight).

Tests: grace-timeout → fail-without-SIGKILL (asserts only SIGTERM sent); not-found+RPC-down → wait; not-found+RPC-up → hard error; waitForUp context-cancellation; plus happy-path, timeout, isSeidStart table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
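The graceful-only stop and the not-found check from this commit message might be sketched as follows. `stopGracefully`, `checkNotFound`, and both error values are hypothetical names; the signal and liveness calls are injected, so the example runs with fakes rather than touching a real process.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	// errStillAlive: the grace period ended and seid was left running.
	errStillAlive = errors.New("seid still alive after grace period; left running")
	// errSilentNoOp: no process was found, yet the RPC is already serving.
	errSilentNoOp = errors.New("seid not found but RPC is serving; no restart happened")
)

// stopGracefully sends SIGTERM once via sigterm, then polls alive until the
// process exits or grace elapses. It never escalates to SIGKILL.
// In production sigterm could wrap syscall.Kill(pid, syscall.SIGTERM).
func stopGracefully(pid int, grace, poll time.Duration,
	sigterm func(int) error, alive func(int) bool) error {
	if err := sigterm(pid); err != nil {
		return fmt.Errorf("SIGTERM pid %d: %w", pid, err)
	}
	deadline := time.Now().Add(grace)
	for {
		if !alive(pid) {
			return nil // exited within the grace period
		}
		if !time.Now().Before(deadline) {
			return errStillAlive // fail the task, leave seid running
		}
		time.Sleep(poll)
	}
}

// checkNotFound covers the case where no seid process was found:
// RPC up is a hard error, RPC down lets the caller proceed to wait-for-up.
func checkNotFound(rpcUp bool) error {
	if rpcUp {
		return errSilentNoOp
	}
	return nil
}

func main() {
	polls := 0
	err := stopGracefully(7, 50*time.Millisecond, time.Millisecond,
		func(int) error { return nil },
		func(int) bool { polls++; return polls < 3 }) // fake: exits on 3rd poll
	fmt.Println(err, polls)
	fmt.Println(checkNotFound(true))
}
```

Returning a distinct error at the deadline lets the caller fail the task while the process keeps running, which matches the "stuck-but-alive is safer" rationale above.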

github-actions · 2026-06-07T02:45:56Z

Suggested version: v0.0.56

Comparing to: v0.0.55 (diff)

Changes in go.mod file(s):

(empty)

gorelease says:

gorelease: preparing to load packages for github.com/sei-protocol/seictl: looking for missing dependencies: go: -d flag is deprecated. -d=true is a no-op
go: github.com/gogo/protobuf@v1.3.3: reading github.com/gogo/protobuf/go.mod at revision v1.3.3: unknown revision v1.3.3

gocompat says:

Your branch is up to date with 'origin/main'.

Cutting a Release (and modifying non-markdown files)

This PR is modifying both version.json and non-markdown files.
The Release Checker is not able to analyse files that are not checked in to main. This might cause the above analysis to be inaccurate.
Please consider performing all the code changes in a separate PR before cutting the release.

Automatically created GitHub Release

A draft GitHub Release has been created.
It is going to be published when this PR is merged.
You can modify its body to include any release notes you wish to include with the release.

cursor Bot reviewed Jun 7, 2026

Comment thread sidecar/tasks/restart_seid.go Outdated

Comment thread sidecar/tasks/restart_seid.go

Comment thread sidecar/tasks/restart_seid.go

bdchatham merged commit 640a599 into main Jun 7, 2026
4 checks passed

bdchatham deleted the brandon2/plt-438-restart-seid branch June 7, 2026 02:50

bdchatham merged 2 commits into main from brandon2/plt-438-restart-seid
