Skip to content

Harden live-run reliability from AP-939 test#244

Merged
leobaldock merged 3 commits into
mainfrom
codex/live-run-reliability-clean
May 28, 2026
Merged

Harden live-run reliability from AP-939 test#244
leobaldock merged 3 commits into
mainfrom
codex/live-run-reliability-clean

Conversation

@leobaldock
Copy link
Copy Markdown
Contributor

@leobaldock leobaldock commented May 28, 2026

Summary

  • make Jira runs visible immediately during intake and parked states
  • write active phase state before phase handlers run so the dashboard does not lag live work
  • route unanimous peer-review approval through Phase 9/10 so PR/conformance state completes instead of stopping at review gates
  • add a shared writer lifecycle contract: Cube owns base refresh, verify, staging, commits, and pushes; agents only edit and use read-only Git inspection
  • refresh origin/main before worktree reset/create, and fail clearly if no base ref is available
  • hide known low-signal Codex runtime diagnostics while preserving actionable warnings
  • capture AP-939 live-run follow-up issues in planning/ap-939-live-run-test-fixes.md

Root Cause

AP-939 exposed Cube-owned workflow responsibilities leaking into agents or UI state. Writers were instructed to sync, stage, commit, and push inside a sandbox that cannot write shared Git worktree metadata. The dashboard also depended on completed phase writes, so active/parked runs could be invisible or stale. Finally, the all-approved peer-review path created a PR in Phase 8 and exited before Phase 10 could mark PR/conformance complete.

Validation

  • rtk pytest -q tests/core/test_state.py tests/core/test_writer_contract.py tests/core/test_display_events.py tests/cli/test_adapters.py tests/cli/test_single_writer.py tests/ui/test_run_snapshots.py tests/cli/test_orchestrate_handlers.py tests/core/test_git_refresh.py
  • rtk ruff check python/cube tests
  • rtk mypy python/cube
  • rtk pytest -q tests
  • rtk git diff --check

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This pull request refactors Cube's writer lifecycle model to centralise state management, defer phase-terminal operations, and enforce consistent ownership semantics across workflows. A new mark_complete parameter on update_phase allows phases to transition without immediate completion, enabling executor pre-phase state recording. The CLI now persists workflow state at intake and Jira boundaries via the updated API. A new refresh_origin_main() helper safely fetches origin/main with fallback verification, replacing direct git fetch calls. Writer and judge prompts are updated to reference Cube-owned git/verify lifecycle and the locally prepared origin/main ref respectively. Phase handlers defer PR creation to allow subsequent phase ownership. UI snapshot detection refactors status computation to use observed boolean signals rather than session strings. Test coverage spans state persistence, git operations, orchestration deferred execution, CLI state workflows, and Codex noise filtering.

Possibly related PRs

  • aetheronhq/agent-cube#182: Deterministic verify-gate and feedback-loop implementation in python/cube/automation/verify.py aligns with this PR's updates to writer-facing verify-failure prompt and post-failure instructions.
  • aetheronhq/agent-cube#192: Stale worktree validation and reset logic in python/cube/core/git.py shares the same create_worktree function modified here to integrate refresh_origin_main() calls.
  • aetheronhq/agent-cube#235: Jira metadata propagation into WorkflowContext and create_pr() directly supports this PR's plumbing of jira_metadata through phase state updates and executor/CLI workflows.
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly aligns with the main objective: hardening live-run reliability based on the AP-939 test findings and issues revealed during testing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • AP-939: Request failed with status code 401

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cube/cli.py`:
- Around line 859-866: The ensure_state call unconditionally overwrites stored
run fields (mode/writer_key) on resume; change the logic so before calling
ensure_state you read the current persisted state for task_id (the stored entry
for path "INTAKE") and only pass mode and writer_key through when the user
explicitly requested a change (e.g., single_mode flag set or writer_key
provided) or when the stored values are missing/blank; otherwise preserve the
stored state values and call ensure_state with the preserved mode/writer_key to
avoid clobbering on resume.

In `@python/cube/core/writer_contract.py`:
- Around line 32-35: The current replacement using
text.split(WRITER_LIFECYCLE_START, 1) and rest.split(WRITER_LIFECYCLE_END, 1)
can raise if the lifecycle block is malformed; update the logic in
writer_contract.py to robustly locate the sentinel pair (use text.find/ rfind or
guard the splits) and only perform the in-place replace when both start and end
are found and in correct order (preserve the existing names before, rest, _old,
after, updated, section); if locating/parsing fails, fall back to prepending the
lifecycle section (i.e., set updated = section + text) so corrupted partial
blocks do not break writer startup.

In `@python/cube/ui/routes/tasks.py`:
- Around line 1123-1126: _writer_log_exists currently only matches
writer-...-*.json which misses other valid artefact extensions (.jsonl, .log,
.txt) and causes false "not-started" states; update the function to detect any
file with the writer-{writer_name}-{task_id}-* prefix (e.g.,
LOGS_DIR.glob(f"writer-{writer_name}-{task_id}-*")) or explicitly include the
extensions ('.json','.jsonl','.log','.txt'), and apply the same fix to the
analogous check found around lines referenced (the block at 1078-1085) so all
writer log detection is extension-agnostic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e213a3cf-a91d-4d71-be48-1fbbae8ef981

📥 Commits

Reviewing files that changed from the base of the PR and between 75a430f and d4aa67c.

📒 Files selected for processing (23)
  • planning/ap-939-live-run-test-fixes.md
  • python/cube/automation/judge_panel.py
  • python/cube/automation/prompts.py
  • python/cube/automation/verify.py
  • python/cube/cli.py
  • python/cube/commands/orchestrate/executor.py
  • python/cube/commands/orchestrate/handlers.py
  • python/cube/commands/orchestrate/phases.py
  • python/cube/commands/orchestrate/prompts.py
  • python/cube/commands/peer_review.py
  • python/cube/core/git.py
  • python/cube/core/parsers/codex.py
  • python/cube/core/state.py
  • python/cube/core/writer_contract.py
  • python/cube/ui/routes/tasks.py
  • tests/cli/test_adapters.py
  • tests/cli/test_orchestrate_handlers.py
  • tests/cli/test_single_writer.py
  • tests/core/test_display_events.py
  • tests/core/test_git_refresh.py
  • tests/core/test_state.py
  • tests/core/test_writer_contract.py
  • tests/ui/test_run_snapshots.py
📜 Review details
🧰 Additional context used
🪛 LanguageTool
planning/ap-939-live-run-test-fixes.md

[style] ~24-~24: To elevate your writing, try using a synonym here.
Context: ...nch showed the real state. - Impact: hard to tell whether the run is active, park...

(HARD_TO)


[formatting] ~65-~65: Insert a comma before quoting reported speech: “says, "”…
Context: ...139 and 226. Cube's parent run log then says "Changes committed and pushed", so the pa...

(SAID_COMMA_SPEECH)


[uncategorized] ~66-~66: Possible missing comma found.
Context: ...s unclear. It also encourages raw git commands despite the repo instruction to use `rt...

(AI_HYDRA_LEO_MISSING_COMMA)


[grammar] ~76-~76: “Run” is a singular noun. It appears that the verb form is incorrect.
Context: ...step as pending. - Evidence: the run log shows unanimous peer-review approval fo...

(PCT_SINGULAR_NOUN_PLURAL_VERB_AGREEMENT)

🪛 Ruff (0.15.14)
python/cube/core/git.py

[error] 28-28: Starting a process with a partial executable path

(S607)


[error] 36-36: Starting a process with a partial executable path

(S607)

🔇 Additional comments (14)
planning/ap-939-live-run-test-fixes.md (1)

1-79: LGTM!

tests/core/test_writer_contract.py (1)

9-33: LGTM!

python/cube/core/state.py (1)

115-154: LGTM!

Also applies to: 182-182

tests/core/test_state.py (1)

8-8: LGTM!

Also applies to: 13-13, 84-110

python/cube/cli.py (1)

578-585: LGTM!

Also applies to: 743-748

python/cube/commands/orchestrate/executor.py (1)

5-5: LGTM!

Also applies to: 60-67

python/cube/commands/orchestrate/handlers.py (1)

626-627: LGTM!

python/cube/commands/orchestrate/prompts.py (1)

12-12: LGTM!

Also applies to: 23-23, 42-46, 61-61, 97-97, 199-202, 212-212, 237-237

tests/core/test_git_refresh.py (1)

1-45: LGTM!

tests/cli/test_orchestrate_handlers.py (1)

1-84: LGTM!

tests/cli/test_single_writer.py (1)

9-10: LGTM!

Also applies to: 89-90, 93-95, 125-130, 133-177

tests/ui/test_run_snapshots.py (1)

139-156: LGTM!

Also applies to: 158-177, 229-229, 262-262

tests/cli/test_adapters.py (1)

245-257: LGTM!

tests/core/test_display_events.py (1)

90-97: LGTM!

Also applies to: 103-103

Comment thread python/cube/cli.py Outdated
Comment thread python/cube/core/writer_contract.py Outdated
Comment thread python/cube/ui/routes/tasks.py Outdated
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cube/cli.py (1)

862-885: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the persisted mode/writer for runtime execution, not the pre-override values.

You correctly persist persisted_mode and persisted_writer_key, but execution still uses single_mode and writer_key. On resume, this can run a different mode than the stored workflow state.

Suggested fix
@@
     if existing_state is None or explicit_mode_override:
         persisted_mode = computed_mode
         persisted_writer_key = writer_key
@@
         else:
             persisted_writer_key = None
+
+    effective_single_mode = persisted_mode == "single"
+    effective_writer_key = persisted_writer_key
@@
         asyncio.run(
             orchestrate_auto_command(
                 task_file,
                 resolved_resume_from,
                 task_id=task_id,
                 resume_alias=resume_alias,
-                single_mode=single_mode,
-                writer_key=writer_key,
+                single_mode=effective_single_mode,
+                writer_key=effective_writer_key,
                 fresh_writer=fresh_writer,
                 fresh_judges=fresh,
                 resume_prompt=prompt,
                 jira_metadata=jira_metadata,
             )
         )

Also applies to: 908-916

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cube/cli.py` around lines 862 - 885, The runtime execution is still
using the pre-override variables (single_mode and writer_key) instead of the
persisted values, which can cause resumed runs to execute in a different mode
than stored; update the places that determine runtime behavior (after load_state
and update_phase, and the analogous block around lines 908-916) to use
persisted_mode and persisted_writer_key for any downstream decision-making or
execution (e.g., task execution, mode checks, writer selection) so the running
process honors the persisted state from load_state and update_phase.
🧹 Nitpick comments (1)
python/cube/automation/verify.py (1)

100-100: ⚡ Quick win

Consider adding missing git commands for consistency with the main writer prompt.

The prohibition list here omits git fetch, git merge, and git pull, which are explicitly prohibited in the main writer prompt (prompts.py lines 44-45). While writers have already seen the full constraints and the most critical commands (add/commit/push) are present, adding these for completeness would reinforce the lifecycle contract consistently across all writer-facing prompts.

♻️ Proposed addition for consistency
-**Do not run verify, tests, lint, typecheck, `git add`, `git commit`, or `git push`.** Cube re-runs verify and commits/pushes outside the writer sandbox once you exit.
+**Do not run verify, tests, lint, typecheck, `git fetch`, `git merge`, `git pull`, `git add`, `git commit`, or `git push`.** Cube re-runs verify and commits/pushes outside the writer sandbox once you exit.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cube/automation/verify.py` at line 100, Update the prohibition message
in verify.py (the "Do not run verify, tests, lint, typecheck, `git add`, `git
commit`, or `git push`." text) to also include `git fetch`, `git merge`, and
`git pull` so it matches the main writer prompt constraints; locate the
prohibition string in python/cube/automation/verify.py (around the displayed "Do
not run verify..." message) and append those three commands to the list for
consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@python/cube/cli.py`:
- Around line 862-885: The runtime execution is still using the pre-override
variables (single_mode and writer_key) instead of the persisted values, which
can cause resumed runs to execute in a different mode than stored; update the
places that determine runtime behavior (after load_state and update_phase, and
the analogous block around lines 908-916) to use persisted_mode and
persisted_writer_key for any downstream decision-making or execution (e.g., task
execution, mode checks, writer selection) so the running process honors the
persisted state from load_state and update_phase.

---

Nitpick comments:
In `@python/cube/automation/verify.py`:
- Line 100: Update the prohibition message in verify.py (the "Do not run verify,
tests, lint, typecheck, `git add`, `git commit`, or `git push`." text) to also
include `git fetch`, `git merge`, and `git pull` so it matches the main writer
prompt constraints; locate the prohibition string in
python/cube/automation/verify.py (around the displayed "Do not run verify..."
message) and append those three commands to the list for consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 98a24c15-bb73-45e0-a605-8cf8c3d2a86f

📥 Commits

Reviewing files that changed from the base of the PR and between 2334260 and 45cbdc5.

📒 Files selected for processing (9)
  • python/cube/automation/verify.py
  • python/cube/cli.py
  • python/cube/commands/orchestrate/executor.py
  • python/cube/commands/orchestrate/phases.py
  • python/cube/commands/orchestrate/prompts.py
  • python/cube/core/state.py
  • python/cube/ui/routes/tasks.py
  • tests/core/test_state.py
  • tests/ui/test_run_snapshots.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/ui/test_run_snapshots.py
  • python/cube/commands/orchestrate/phases.py
📜 Review details
🔇 Additional comments (7)
python/cube/core/state.py (1)

126-151: LGTM!

tests/core/test_state.py (1)

82-87: LGTM!

Also applies to: 98-104

python/cube/commands/orchestrate/executor.py (1)

5-5: LGTM!

Also applies to: 60-68

python/cube/commands/orchestrate/prompts.py (3)

40-47: LGTM!


63-63: LGTM!


200-210: LGTM!

python/cube/ui/routes/tasks.py (1)

1084-1084: LGTM!

Also applies to: 1122-1131

@leobaldock leobaldock merged commit 803e1bd into main May 28, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant