Harden live-run reliability from AP-939 test #243

Closed
leobaldock wants to merge 2 commits into main from codex/live-run-reliability-fixes

Conversation

@leobaldock
Contributor

@leobaldock leobaldock commented May 28, 2026

Summary

  • make Jira runs visible immediately during intake and parked states
  • write active phase state before phase handlers run so the dashboard does not lag live work
  • route unanimous peer-review approval through Phase 9/10 so PR/conformance state completes instead of stopping at review gates
  • add a shared writer lifecycle contract: Cube owns base refresh, verify, staging, commits, and pushes; agents only edit and use read-only Git inspection
  • refresh origin/main before worktree reset/create, and fail clearly if no base ref is available
  • hide known low-signal Codex runtime diagnostics while preserving actionable warnings
  • capture AP-939 live-run follow-up issues in planning/ap-939-live-run-test-fixes.md

Root Cause

AP-939 exposed Cube-owned workflow responsibilities leaking into agents or UI state. Writers were instructed to sync, stage, commit, and push inside a sandbox that cannot write shared Git worktree metadata. The dashboard also depended on completed phase writes, so active/parked runs could be invisible or stale. Finally, the all-approved peer-review path created a PR in Phase 8 and exited before Phase 10 could mark PR/conformance complete.

Validation

  • rtk pytest -q tests/core/test_state.py tests/core/test_writer_contract.py tests/core/test_display_events.py tests/cli/test_adapters.py tests/cli/test_single_writer.py tests/ui/test_run_snapshots.py tests/cli/test_orchestrate_handlers.py tests/core/test_git_refresh.py
  • rtk ruff check python/cube tests
  • rtk mypy python/cube
  • rtk pytest -q tests
  • rtk git diff --check

Harden Live-Run Reliability from AP-939 Test

State Visibility & Persistence

  • Runs now become visible immediately during intake and parked states by calling ensure_state() early in the workflow, persisting initial state before orchestration begins
  • Active phase state is written via start_phase() before phase handlers execute, preventing dashboard lag
  • State deserialisation tolerates missing JSON fields with sensible defaults
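
The three state behaviours above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `RunState` fields, the one-JSON-file-per-task layout, and the function signatures are all assumptions made for the example. Only the names `ensure_state` and `start_phase` come from the summary.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class RunState:
    # Hypothetical field set; the real state model is not shown in this PR text.
    task_id: str
    current_phase: int = 0
    phase_status: str = "pending"
    completed_phases: List[int] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RunState":
        # Drop unknown keys and let dataclass defaults fill in missing ones.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


def _state_path(state_dir: Path, task_id: str) -> Path:
    # Assumed layout: one JSON file per task.
    return state_dir / f"{task_id}.json"


def save_state(state_dir: Path, state: RunState) -> None:
    state_dir.mkdir(parents=True, exist_ok=True)
    _state_path(state_dir, state.task_id).write_text(json.dumps(asdict(state)))


def ensure_state(state_dir: Path, task_id: str) -> RunState:
    """Load existing state, or persist an initial record so the run is visible."""
    path = _state_path(state_dir, task_id)
    if path.exists():
        return RunState.from_dict(json.loads(path.read_text()))
    state = RunState(task_id=task_id)
    save_state(state_dir, state)
    return state


def start_phase(state_dir: Path, task_id: str, phase: int) -> RunState:
    """Write the active phase before its handler runs."""
    state = ensure_state(state_dir, task_id)
    state.current_phase = phase
    state.phase_status = "active"
    save_state(state_dir, state)
    return state
```

In this sketch a caller would invoke `ensure_state()` during intake and `start_phase()` immediately before dispatching each phase handler, so a reader of the state file sees the phase as active while the work is still in progress.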

Workflow Phase Routing

  • Unanimous peer-review approvals no longer create a PR and exit at Phase 8; the handler instead returns all_approved=True so that Phases 9 and 10 run and PR/conformance state completes properly
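
A rough sketch of the routing change follows. The `PeerReviewOutcome` type, the decision strings, and the behaviour for a non-unanimous result are hypothetical; the PR text only states that unanimous approval returns `all_approved=True` and lets Phases 9 and 10 run.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PeerReviewOutcome:
    # Hypothetical result type for illustration only.
    all_approved: bool
    next_phases: Tuple[int, ...]


def handle_peer_review(decisions: Sequence[str]) -> PeerReviewOutcome:
    """Decide what runs after Phase 8 based on the judges' decisions."""
    # An empty panel is not treated as approval.
    all_approved = bool(decisions) and all(d == "approve" for d in decisions)
    if all_approved:
        # Continue into Phases 9 and 10 rather than creating a PR and exiting here.
        return PeerReviewOutcome(all_approved=True, next_phases=(9, 10))
    # Assumption: anything short of unanimity stays at the review gate.
    return PeerReviewOutcome(all_approved=False, next_phases=())
```

The point of returning a result instead of exiting is that one code path owns PR and conformance completion, whichever way the review gate was passed.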

Writer Lifecycle Contract

  • New WRITER_LIFECYCLE_CONTRACT module defines shared Cube-owned responsibilities: Cube handles base refresh, verify, staging, commits, and pushes; agents only edit files and use read-only Git inspection
  • Contract is injected into all writer prompts and enforced via ensure_writer_lifecycle_contract(), replacing conflicting hardcoded instructions
  • Dual/single writers receive explicit target parameters (target_agent_id, target_agent_label, delivery) when steers are injected

Git Operations

  • New refresh_origin_main() helper refreshes refs/remotes/origin/main before worktree operations, with graceful fallback to existing ref or clear failure when unavailable
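
The refresh-then-fallback behaviour might look roughly like this. It is a sketch under assumptions: the return shape, the `BaseRefUnavailable` exception, and the injectable `run` parameter are invented for the example. The Git commands themselves (`git fetch` with an explicit refspec, `git rev-parse --verify --quiet`) are standard.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence, Tuple

ORIGIN_MAIN = "refs/remotes/origin/main"


class BaseRefUnavailable(RuntimeError):
    """Hypothetical error raised when no usable base ref exists."""


def run_git(repo: Path, args: Sequence[str]) -> "subprocess.CompletedProcess":
    # check=False: the caller inspects return codes instead of catching exceptions.
    return subprocess.run(
        ["git", "-C", str(repo), *args],
        capture_output=True,
        text=True,
        check=False,
    )


def refresh_origin_main(
    repo: Path,
    run: Callable[[Path, Sequence[str]], "subprocess.CompletedProcess"] = run_git,
) -> Tuple[str, bool]:
    """Return (commit_sha, refreshed) for origin/main, or raise if it is missing."""
    # Try to update the remote-tracking ref; a failure here is tolerated.
    fetch = run(repo, ["fetch", "origin", f"+refs/heads/main:{ORIGIN_MAIN}"])
    # Resolve whatever ref exists now, whether fresh or stale.
    resolved = run(
        repo, ["rev-parse", "--verify", "--quiet", f"{ORIGIN_MAIN}^{{commit}}"]
    )
    if resolved.returncode != 0:
        raise BaseRefUnavailable(
            f"{ORIGIN_MAIN} is unavailable and could not be fetched: "
            f"{(fetch.stderr or '').strip()}"
        )
    return resolved.stdout.strip(), fetch.returncode == 0
```

Returning the `refreshed` flag lets the caller log that a worktree was built from a stale base, while the exception covers the case where continuing would mean resetting to nothing.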

Steering Injection Durability

  • Steers are now persisted to steering-injections.jsonl with record metadata (target agent, delivery type, timing) via record_injected_steers()
  • Legacy steers can be read from older logs, with defensive error handling that never fails the workflow
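
A minimal sketch of durable, failure-tolerant steer recording is shown below. The record keys, the `read_injected_steers` helper, and the assumption that legacy records simply lack the metadata fields are hypothetical. The file name and the parameter names `target_agent_id`, `target_agent_label`, and `delivery` come from the PR text.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional


def record_injected_steers(
    log_path: Path,
    steers: Iterable[str],
    *,
    target_agent_id: Optional[str],
    target_agent_label: Optional[str],
    delivery: str,
) -> None:
    """Append one JSON line per steer; I/O errors never propagate."""
    try:
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with log_path.open("a", encoding="utf-8") as fh:
            for text in steers:
                record = {
                    "text": text,
                    "target_agent_id": target_agent_id,
                    "target_agent_label": target_agent_label,
                    "delivery": delivery,
                    "injected_at": datetime.now(timezone.utc).isoformat(),
                }
                fh.write(json.dumps(record) + "\n")
    except OSError:
        # Recording is best-effort: a logging failure must not fail the workflow.
        pass


def read_injected_steers(log_path: Path) -> List[Dict[str, Any]]:
    """Read records, skipping corrupt lines and defaulting legacy fields."""
    try:
        lines = log_path.read_text(encoding="utf-8").splitlines()
    except OSError:
        return []
    records: List[Dict[str, Any]] = []
    for line in lines:
        try:
            item = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(item, dict):
            continue
        # Assumption: legacy records lack the metadata keys added later.
        item.setdefault("target_agent_id", None)
        item.setdefault("target_agent_label", None)
        item.setdefault("delivery", "unknown")
        records.append(item)
    return records
```

Append-only JSONL suits this case because a partially written final line damages at most one record, and the reader above skips it.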

Dashboard & UI Redesign

  • Display events architecture refactored for provider-neutral log parsing: DisplayEventAdapter abstracts Codex/fallback parsers, generating normalised AgentDisplayEvent objects
  • New /tasks/snapshots and /tasks/{id}/snapshot endpoints return comprehensive run state snapshots (workflow steps, verify status, review cycles, PR/conformance, communications)
  • Agent timeline UI (new AgentTimeline component) renders merged writer/steerer messages with grouped tool events and optional debug output
  • Navigation and layout simplified using new shared ds component library (Container, Navbar, Badge, StatusPill, etc.)
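
The provider-neutral parsing idea in the first bullet can be illustrated with a small adapter chain. Everything here is a stand-in: `JsonLineAdapter` assumes a made-up JSON line shape and does not reflect the real Codex log format, and the `AgentDisplayEvent` fields are guesses. Only the names `DisplayEventAdapter` and `AgentDisplayEvent` come from the PR text.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional, Protocol, Sequence


@dataclass(frozen=True)
class AgentDisplayEvent:
    # Hypothetical normalised shape consumed by the UI.
    kind: str
    text: str
    provider: str


class DisplayEventAdapter(Protocol):
    provider: str

    def parse_line(self, line: str) -> Optional[AgentDisplayEvent]: ...


class JsonLineAdapter:
    """Stand-in provider parser; assumes JSON objects with 'type' and 'text'."""

    provider = "json"

    def parse_line(self, line: str) -> Optional[AgentDisplayEvent]:
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            return None
        if not isinstance(payload, dict) or "text" not in payload:
            return None
        kind = str(payload.get("type", "message"))
        return AgentDisplayEvent(kind, str(payload["text"]), self.provider)


class FallbackAdapter:
    """Treats any non-blank line as a raw event."""

    provider = "fallback"

    def parse_line(self, line: str) -> Optional[AgentDisplayEvent]:
        text = line.strip()
        return AgentDisplayEvent("raw", text, self.provider) if text else None


def parse_log(
    lines: Iterable[str], adapters: Sequence[DisplayEventAdapter]
) -> List[AgentDisplayEvent]:
    """Give each line to the first adapter that recognises it."""
    events: List[AgentDisplayEvent] = []
    for line in lines:
        for adapter in adapters:
            event = adapter.parse_line(line)
            if event is not None:
                events.append(event)
                break
    return events
```

Ordering the adapters from most to least specific means a new provider only needs its own parser; unrecognised output still reaches the timeline through the fallback.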

Diagnostics & Noise Filtering

  • Known low-signal Codex stderr patterns are suppressed during parsing, reducing log pollution whilst preserving actionable warnings
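
Suppression of this kind can be sketched as a deny-list with an actionable override. The patterns below are placeholders and are not real Codex diagnostics. The rule that any line mentioning an error or warning is always kept is an assumption about what "preserving actionable warnings" means.

```python
import re
from typing import Iterable, List

# Placeholder patterns for illustration; not actual Codex stderr output.
LOW_SIGNAL_PATTERNS = (
    re.compile(r"^trace: "),
    re.compile(r"telemetry disabled", re.IGNORECASE),
)

# Assumed rule: anything that mentions an error or warning is always kept.
ACTIONABLE = re.compile(r"\b(error|warning)\b", re.IGNORECASE)


def filter_stderr(lines: Iterable[str]) -> List[str]:
    """Drop known-noisy lines unless they also look actionable."""
    kept: List[str] = []
    for line in lines:
        noisy = any(p.search(line) for p in LOW_SIGNAL_PATTERNS)
        if ACTIONABLE.search(line) or not noisy:
            kept.append(line)
    return kept
```

Checking the override first keeps the filter conservative: a line is hidden only when it matches a known pattern and carries no sign of something the operator needs to act on.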

Planning Documentation

  • AP-939 issues captured in planning/ap-939-live-run-test-fixes.md (14 observed operational/UI/workflow issues with reproduction, impact, and candidate fixes)

Test Coverage

Extended validation across state management, Git refresh, writer contracts, display events, steering injection, handler orchestration, UI snapshots, and E2E flows.

@coderabbitai

coderabbitai Bot commented May 28, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 247be72c-1da1-44af-8822-59928d6869c2

📥 Commits

Reviewing files that changed from the base of the PR and between 75a430f and 5a8acba.

⛔ Files ignored due to path filters (1)
  • web-ui/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (67)
  • .github/workflows/e2e.yml
  • planning/ap-939-live-run-test-fixes.md
  • python/cube/automation/dual_writers.py
  • python/cube/automation/judge_panel.py
  • python/cube/automation/prompter/steering/__init__.py
  • python/cube/automation/prompter/steering/inject.py
  • python/cube/automation/prompter/steering/injections.py
  • python/cube/automation/prompts.py
  • python/cube/automation/single_writer.py
  • python/cube/automation/verify.py
  • python/cube/cli.py
  • python/cube/commands/feedback.py
  • python/cube/commands/orchestrate/executor.py
  • python/cube/commands/orchestrate/handlers.py
  • python/cube/commands/orchestrate/phases.py
  • python/cube/commands/orchestrate/prompts.py
  • python/cube/commands/peer_review.py
  • python/cube/core/display_events/__init__.py
  • python/cube/core/display_events/base.py
  • python/cube/core/display_events/codex.py
  • python/cube/core/display_events/fallback.py
  • python/cube/core/display_events/models.py
  • python/cube/core/display_events/registry.py
  • python/cube/core/display_events/service.py
  • python/cube/core/display_events/steering.py
  • python/cube/core/git.py
  • python/cube/core/parsers/codex.py
  • python/cube/core/state.py
  • python/cube/core/writer_contract.py
  • python/cube/ui/routes/stream.py
  • python/cube/ui/routes/tasks.py
  • python/cube/ui/server.py
  • tests/automation/test_steering_inject.py
  • tests/cli/test_adapters.py
  • tests/cli/test_orchestrate_handlers.py
  • tests/cli/test_single_writer.py
  • tests/core/test_display_events.py
  • tests/core/test_git_refresh.py
  • tests/core/test_state.py
  • tests/core/test_writer_contract.py
  • tests/e2e/test_web_ui.py
  • tests/ui/test_agent_events.py
  • tests/ui/test_run_snapshots.py
  • tests/ui/test_stream_log_parsers.py
  • web-ui/index.html
  • web-ui/package.json
  • web-ui/postcss.config.js
  • web-ui/src/App.tsx
  • web-ui/src/components/AgentTimeline.tsx
  • web-ui/src/components/DualLayout.tsx
  • web-ui/src/components/JudgeVote.tsx
  • web-ui/src/components/Navigation.tsx
  • web-ui/src/components/OutputStream.tsx
  • web-ui/src/components/PromptsPanel.tsx
  • web-ui/src/components/StatusPill.tsx
  • web-ui/src/components/SynthesisView.tsx
  • web-ui/src/components/TaskCard.tsx
  • web-ui/src/components/ThinkingBox.tsx
  • web-ui/src/components/TripleLayout.tsx
  • web-ui/src/components/WorkflowTimeline.tsx
  • web-ui/src/components/ds.tsx
  • web-ui/src/hooks/useSSE.ts
  • web-ui/src/index.css
  • web-ui/src/pages/Dashboard.tsx
  • web-ui/src/pages/Decisions.tsx
  • web-ui/src/pages/TaskDetail.tsx
  • web-ui/src/types/index.ts

Walkthrough

This pull request spans backend display events, workflow state durability, and a web UI redesign. It:

  • adds provider-neutral display-event parsing infrastructure (Codex and fallback adapters) that converts raw agent output into timeline events
  • implements durable steering advisory recording with delivery metadata and writer-targeted injection
  • standardises writer behaviour via a lifecycle contract embedded across synthesis, fixes, and feedback phases
  • improves state initialisation with ensure_state/start_phase helpers for run visibility
  • consolidates Git workflows via refresh_origin_main()
  • expands the backend API with snapshot endpoints providing run metadata, agent events, and source plans
  • redesigns the web UI from a task-card dashboard to a snapshot-driven triage view with tabbed task detail, a new design system, and agent timeline rendering

The frontend type system is rebuilt around run snapshots, routing is simplified, and deprecated components are removed.

Possibly related PRs

  • aetheronhq/agent-cube#242: Overlaps on steering advisory injection signature changes, JSONL recording infrastructure, and display-event groundwork used by both PRs for steerer message rendering.
  • aetheronhq/agent-cube#176: Shares changes to judge-panel and peer-review prompts, updating what judges read (worktree/HEAD vs. Cube-prepared origin/main base).
  • aetheronhq/agent-cube#235: Overlaps on Jira metadata threading, state persistence, and PR creation flow changes that this PR also modifies.

@leobaldock
Contributor Author

Superseded by clean-branch PR #244. This branch was based on the already-merged UI PR history, so the new branch keeps the reliability diff reviewable.

@leobaldock leobaldock closed this May 28, 2026