feat(operator): park multi-deployment barrier copy phase at the cutover barrier #417
Merged
…er barrier
OC-2 of the ordered-cutover workstream. An operation of a multi-deployment apply under the barrier policy now auto-defers its cutover, releases the copy drive at waiting_for_cutover, and is persisted as parked for the deployment-ordered cutover claim (OC-3), instead of inline-driving to completed or blocking for a manual cutover. The copy claim no longer re-leases a parked multi-op barrier row.
Scoped to multi-op barrier operations, so single-op, rolling, and manual --defer-cutover behaviour is unchanged. Dormant until fan-out lands (one op per apply today). Local operator drive only; the gRPC remote drive parks via a later transport change. OC-3 must land before fan-out enablement (tracker PR 11).
Pull request overview
This PR updates SchemaBot’s local Tern/operator execution path so that, under the barrier cutover policy for multi-deployment applies, an operation’s copy phase can stop at waiting_for_cutover and release its claim—enabling a later, deployment-ordered cutover claim (OC-3) to drive the high-risk swap.
Changes:
- Threaded “effective” execution options through the local apply/resume/polling stack so operation-scoped drives can auto-defer cutover without persisting that decision onto the apply.
- Added “park + release” behavior at waiting_for_cutover for eligible multi-op barrier operations, including timeout/cancel exemptions and operator-side persistence of the parked operation state.
- Updated MySQL store claiming to avoid re-leasing stale parked multi-op barrier rows (reserved for cutover-claim), with new unit/integration coverage.
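The first change above, computing effective options for a single drive without persisting the auto-defer decision onto the apply, can be pictured with a minimal sketch. The `map[string]string` type, the `defer_cutover` key, and the `effectiveOptions` helper are assumptions for illustration, not the names used in pkg/tern/cutover_barrier.go.

```go
package main

import "fmt"

// deferCutoverKey is a hypothetical option key, chosen for this sketch only.
const deferCutoverKey = "defer_cutover"

// effectiveOptions returns a copy of the apply's persisted options with the
// cutover deferred for this drive only. The persisted map is never mutated,
// so the auto-defer decision does not leak back onto the apply.
func effectiveOptions(persisted map[string]string, autoDefer bool) map[string]string {
	out := make(map[string]string, len(persisted)+1)
	for k, v := range persisted {
		out[k] = v
	}
	if autoDefer {
		out[deferCutoverKey] = "true"
	}
	return out
}

func main() {
	persisted := map[string]string{"cutover_policy": "barrier"}
	eff := effectiveOptions(persisted, true)

	_, leaked := persisted[deferCutoverKey]
	fmt.Println("effective defer:", eff[deferCutoverKey])
	fmt.Println("persisted options changed:", leaked)
}
```

Copying the map keeps the decision scoped to the operation being driven, which is the property the review summary describes.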
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/tern/local_control_resume.go | Operation-scoped resume now computes effective options and can release at the cutover barrier. |
| pkg/tern/local_client.go | Fresh Apply dispatch explicitly does not auto-park (no operation context). |
| pkg/tern/local_client_test.go | Updates call signatures and adds a unit test for “release at cutover barrier” behavior. |
| pkg/tern/local_client_integration_test.go | Updates atomic polling call signatures for integration coverage. |
| pkg/tern/local_apply.go | Grouped-apply mode selection now uses the effective options map. |
| pkg/tern/local_apply_grouped.go | Threads options + barrier-release through grouped execution and atomic polling; adds parking behavior. |
| pkg/tern/cutover_barrier.go | Introduces helpers to decide auto-defer and compute effective copy-drive options. |
| pkg/tern/cutover_barrier_test.go | Unit tests for auto-defer truth table and grouped-apply gating using effective options. |
| pkg/storage/mysqlstore/apply_operations.go | Adds stale-active exemption so copy-claim won’t re-lease parked multi-op barrier rows. |
| pkg/storage/mysqlstore/apply_operations_test.go | Integration tests covering the new stale-parked exemption scoping. |
| pkg/api/operator.go | Persists waiting_for_cutover on apply_operation rows (non-terminal, resumable). |
| pkg/api/operator_test.go | Unit test ensuring parked operations are persisted via UpdateState (not terminalized). |
Derive the barrier claim-release decision separately from the auto-defer decision: when an apply was started with manual --defer-cutover, hold the claim and poll for a manual cutover (documented contract) instead of releasing at the barrier. The cutover is still deferred either way.
When tasks exist but the apply_operation row is missing, return a distinct ErrApplyOperationRowMissing that wraps ErrNoTasksForApplyOperation, so the fail-closed errors.Is handling is preserved while the message reads accurately.
aparajon
approved these changes
Jun 18, 2026
What & why
OC-2 of the ordered-cutover workstream. Under the barrier cutover policy, an operation of a multi-deployment apply now runs its copy phase and parks at waiting_for_cutover instead of inline-driving to completed, so the high-risk atomic swaps can later be driven in deployment order by the cutover claim (OC-3).
Today the operator drives each claimed operation inline (copy + cutover in one claim). OC-1 added the dormant cutover-claim predicate; OC-2 makes the copy drive actually stop at the barrier and hand off.
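The parking decision can be summarised as two separate predicates: whether the cutover is deferred, and whether the copy drive releases its claim at the barrier. The sketch below is a reading of this PR's description and its follow-up commit, not the code in pkg/tern/cutover_barrier.go; the `driveContext` type, its fields, and all three function names are hypothetical.

```go
package main

import "fmt"

// driveContext is a hypothetical summary of the inputs to the decision.
type driveContext struct {
	Policy         string // "barrier" or "rolling"
	OperationCount int    // operations in the apply (one per apply until fan-out lands)
	HasOperation   bool   // false for a fresh Apply dispatch with no operation context
	ManualDefer    bool   // apply was started with manual --defer-cutover
}

// autoDefer: only an operation of a multi-op apply under the barrier policy
// defers its cutover automatically.
func autoDefer(c driveContext) bool {
	return c.HasOperation && c.Policy == "barrier" && c.OperationCount > 1
}

// deferCutover: the cutover is deferred by either a manual request or auto-defer.
func deferCutover(c driveContext) bool {
	return c.ManualDefer || autoDefer(c)
}

// releaseAtBarrier: the claim is released only for auto-deferred operations.
// A manual --defer-cutover apply holds the claim and polls for a manual cutover.
func releaseAtBarrier(c driveContext) bool {
	return autoDefer(c) && !c.ManualDefer
}

func main() {
	cases := []struct {
		name string
		ctx  driveContext
	}{
		{"multi-op barrier", driveContext{"barrier", 3, true, false}},
		{"multi-op barrier + manual", driveContext{"barrier", 3, true, true}},
		{"single-op barrier", driveContext{"barrier", 1, true, false}},
		{"multi-op rolling", driveContext{"rolling", 3, true, false}},
		{"fresh apply dispatch", driveContext{"barrier", 3, false, false}},
	}
	for _, tc := range cases {
		fmt.Printf("%-28s defer=%-5t release=%t\n",
			tc.name, deferCutover(tc.ctx), releaseAtBarrier(tc.ctx))
	}
}
```

Keeping the two predicates apart is what lets a manual --defer-cutover apply keep its documented hold-and-poll contract while the cutover is still deferred.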
Behaviour
Safety / scope
Testing / validation
References